Got Risk? Debating Eliezer Yudkowsky About “AIs that prove changes correct”

Rice’s Theorem (in a nutshell):

Unless everything is specified, anything non-trivial (not directly provable from the partial specification you have) can’t be proved

AI Implications (in a nutshell):

You can have either unbounded learning (Turing-completeness) or provability – but never both at the same time!

I have been semi-publicly called out by Eliezer Yudkowsky. He posted the following on IEET* Director James Hughes’ Facebook Wall in response to a post that referenced my last article (Coherent Extrapolated Volition: The Next Generation):

Eliezer Yudkowsky – I can confirm that Mark Waser severely and systematically misrepresents all my ideas, and that I’ll be using his writing for future examples of straw versions of SIAI.

Really? All of his ideas?

This was in response to the fact that, in a clarification as to whether I was mis-characterizing the SIAI’s position in an earlier article (The “Wicked Problem” of Existential Risk with AI (Artificial Intelligence)) by using the phrase 0% risk, I made the following statement:

In response, I have to ask “What percentage risk do you associate with something that is “provably safe”?” I recognize that the SIAI recognizes and would like to mitigate implementation risk – BUT they clearly insist upon a design with 0% risk. This is not a mis-characterization and, again, I have discussed it with the principals numerous times and they have never objected to the phrasing “insist on zero risk.”

And I certainly can’t let stand the claim that I “misrepresent” him (“misunderstand” might be barely tolerable since that is the game that he and the SIAI normally play, but “misrepresent” is an entirely different kettle of fish).

Mark Waser – Could you provide short, clear, specific examples here, please—with specific quotes where I do so? Such is not my intent and I will correct anything that is pointed out. Or, Eliezer Yudkowsky, you can also send them to me directly. If you decline, however, I can confirm that I will change my statement to “Eliezer Yudkowsky claims that I severely and systematically misrepresent all his ideas but refuses to provide concrete examples, as has always been my experience in the past.”

I also ensured that I posted his statement as an update to the article. I am a firm believer in the knowledge discovery process of science as opposed to the rhetorical process of argumentation. As such, I abhor misrepresentation along with all other attempts to obscure – like this strawman that appeared three minutes later.

Eliezer Yudkowsky – Okay. Simple and obvious one, I never ever claimed that when you use logic-based proof inside an AI, it eliminates risk. It reduces one kind of risk and doesn’t deal with others at all. Never said otherwise. There goes your entire last article (the one before this one).

That’s funny. I never ever claimed that he said such a thing. In fact, the clarification that he was responding to emphasized (by bolding the word design) the fact that I was only talking about design risk (after also specifically noting that the SIAI recognizes and would like to mitigate implementation risk). Isn’t all of this obvious from the first quote in this article?

At this point, he wants us to believe that his strawman successfully dismisses the entire article. I’d normally argue this immediately but there is a bigger, better fish in there to be pursued. Eli constantly refers to “AIs that prove changes correct”. Unless he cares to dispute this, it has always been quite apparent that he means “prove” in the sense of a logical/mathematical proof (i.e. guaranteed 100% correct). Yet, now, he is suddenly using the word reduce instead of eliminate. *That* is certainly worth exploring . . . and in less than a minute . . . .

Mark Waser – Does it reduce one kind of risk or eliminate it?

And after an hour-long pause:

Mark Waser – Yeah, exactly what I’ve always seen in the past—rhetorical “Dark Arts” and/or a refusal to answer when pinned down. Ask you to provide specific quotes and you come up with a strawman that, not only did I never claim but, I specifically addressed and dismissed in the clarification section of the article that this thread started with. And, in your reply, why don’t you address the true issue of “provably safe” and how your design isn’t affected by Rice’s Theorem rather than playing silly little rhetorical games.

This is arguably a bit rude/obnoxious as I tend to be overly impatient with “drive-bys”, rhetoric and argumentation (and I only repeat it here for completeness) but it does attempt to advance the conversation. H+ Magazine Editor Peter Rothman, in particular, has been trying to start numerous conversations about the impact of Rice’s theorem on Yudkowsky’s claims about provability and his “Friendly AI”. Successfully engaging Eli in such a conversation could clarify or resolve many issues that a number of people have with SIAI and “Friendly AI”.

QUICK TUTORIAL

Rice’s Theorem states that unless the input to a program is completely specified and the transition from input to output is completely specified, you can’t prove non-trivial properties about the output of the program (with non-trivial meaning anything that isn’t directly provable from the specification). Or, more simply, the relatively clear English statement above which is basically true by definition (a definitional tautology) and should be comprehensible by anyone. Rice’s theorem uses the specific “term of art” partial functions to refer to programs, procedures, etc. that are not “total” (i.e. fully specified).

Inputs can only be completely specified if they, or the complete set of their ranges, are countable (enumerable). For example, the integers between one and ten are countable. Infinity is not countable. The integers between one and any concrete number are countable (though possibly not during a lifetime). The real numbers between 0 and 1 are infinite and uncountable despite being bounded; however, they can be divided into a countable number of sets that cover that complete range (for example, a set 0 <= x < 0.5 and a set 0.5 <= y <= 1) which can then be specified.

CPU designs can be fully specified because they only accept binary inputs of specific fixed lengths (eminently countable). Some operating systems are provably safe because they only accept specified input and reject the set/range of all unspecified input. Similarly, certain programming languages (Agda, Charity, Epigram, etc.) disallow partial functions and, therefore, all programs written in them can be proved correct—but this disallowing carries a HEAVY price. Requiring that the transition from input to output be completely specified disallows any learning and change that is not completely specified. The “term of art” for unbounded learning systems is that they are “Turing-complete” (based upon Turing’s model of a state machine and an infinite input tape which, being infinite, cannot be counted/specified).
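To make the fixed-length binary input point concrete, here is a minimal sketch in Python. It is not taken from any real CPU design: it assumes a hypothetical 8-bit ripple-carry adder and checks it against its specification on every possible pair of inputs. The names `WIDTH`, `MASK`, `ripple_carry_add` and `verify_exhaustively` are illustrative assumptions.

```python
from itertools import product

WIDTH = 8                  # hypothetical word size, kept small so the check runs quickly
MASK = (1 << WIDTH) - 1    # 0xFF for an 8-bit word


def ripple_carry_add(a: int, b: int) -> int:
    """Add two WIDTH-bit words one bit at a time, discarding the final carry."""
    result = 0
    carry = 0
    for bit in range(WIDTH):
        x = (a >> bit) & 1
        y = (b >> bit) & 1
        total = x ^ y ^ carry                  # sum bit of a full adder
        carry = (x & y) | (carry & (x ^ y))    # carry-out of a full adder
        result |= total << bit
    return result


def verify_exhaustively() -> int:
    """Check the adder against its specification on every possible input pair."""
    checked = 0
    for a, b in product(range(1 << WIDTH), repeat=2):
        # Specification: addition modulo 2**WIDTH.
        assert ripple_carry_add(a, b) == (a + b) & MASK
        checked += 1
    return checked


if __name__ == "__main__":
    print(verify_exhaustively(), "input pairs checked")
```

Because the input space is finite (2^16 pairs here), the loop terminates and the check covers every case. The same brute-force enumeration would never finish for a program that accepts unbounded input.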

The “killer consequence” of Rice’s theorem is that you can EITHER be Turing-complete (and have the capability of unbounded learning) OR have provable properties (like safety) BUT NOT BOTH.

Eliezer Yudkowsky – Rice’s Theorem says that you can’t prove facts about an arbitrary process generated randomly or by an adversary. You can choose special cases for which the facts are provable – this is why we can e.g. prove CPU designs correct. Frankly, the question seems to reveal a certain lack of mathematical understanding. But to answer the last question before I go, if you prove theorems about code, you’re probably reducing that exact form of risk to somewhere in the range of “cosmic ray hits exactly wrong place” / “your CPU specs are wrong in a very unfortunate way” so >0 and <.01, or thereabouts. The argument for AIs that prove changes correct is not that the total risk goes to zero – there are lots of other risks – the argument is that an AI which *doesn’t* prove changes correct is *guaranteed* to go wrong in a billion sequential self-modifications because it has a conditionally independent failure probability each time. An AI that proves changes correct has a *non-zero* chance of *actually working* if you get everything else right; the argument is that non-change-proving AIs are effectively *guaranteed to fail*. Then come the other problems to be solved.

Eliezer Yudkowsky – Got less hateful places to be, toodles.

So, his statement about what Rice’s Theorem says is a bit wonky but he is definitely very aware that CPUs are members of the class of “special cases for which the facts are provable”—and deploys that as a red herring. Notice, however, that he makes *NO* attempt to argue that his design is in that same class – I tend to assume because he is aware that it isn’t. He also deploys the standard SIAI post-obfuscation “this is so complicated that obviously no one except me can understand it”. Then, he continues to insist on conflating operational error/risk via cosmic rays with design errors before throwing in a huge new (to this thread) claim that “non-change-proving AIs are effectively *guaranteed to fail*.”
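The arithmetic behind that last quoted claim can be sketched as follows. If each self-modification fails independently with probability p, the chance that n modifications in a row all succeed is (1 - p)^n. This is only a sketch under that independence assumption: the per-step failure probabilities below are made-up illustrative values, not figures from the discussion, and `survival_probability` is a hypothetical helper.

```python
import math


def survival_probability(p_fail: float, steps: int) -> float:
    """Probability that `steps` independent modifications all succeed.

    Computes (1 - p_fail) ** steps via logarithms so that very small
    p_fail values do not lose precision.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    if steps == 0:
        return 1.0
    if p_fail == 1.0:
        return 0.0
    return math.exp(steps * math.log1p(-p_fail))


if __name__ == "__main__":
    steps = 10**9  # "a billion sequential self-modifications"
    # Illustrative per-step failure probabilities (assumptions, not data).
    for p_fail in (1e-6, 1e-9, 1e-12):
        print(f"p_fail={p_fail:g}: P(no failure) = {survival_probability(p_fail, steps):.6g}")
```

The sketch shows only how quickly the survival probability collapses once n is large relative to 1/p. It says nothing about whether a given design can actually achieve a small p.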

Of course, the most interesting facet of this last claim is that it means that if he can’t get around Rice’s theorem, he is claiming that his design is guaranteed to fail. My speculation has long been that this may well be the reason why he has decided to stop all work on implementing AI. So, I couldn’t resist replying in hopes of drawing him back.

Mark Waser – Eliezer Yudkowsky – You persist in obfuscating and not clearly answering even simple questions that require a one-word response (i.e. reduce or eliminate). It looks, however, like you answered “reduce” (you’re probably reducing that exact form of risk)—which shows a total lack of understanding of what mathematical provability is.

The special cases for which facts are provable is when accepted input is limited and all output is defined (i.e. it is not an arbitrary process). Whether the program or the input is generated randomly or by an adversary is a total non sequitur.

You can prove CPU designs correct because CPUs have limited input (binary instructions and binary data for the most common CPUs) and responses (only producing fixed specific outputs for any given set of inputs). Similarly, there are a number of programming languages that are not Turing complete (Agda, Charity, Epigram, etc.) which you can prove correctness in.

Neither of these things is true for your proposed design so your entire argument is a series of strawmen while, once again, you didn’t answer my question. The question was not about CPUs. It is about your design which emphatically does not have limited input and fixed specific output.

Rice’s theorem says you *CANNOT* prove your design correct. Unless you can show why Rice’s theorem does not apply TO YOUR DESIGN, all your arguments against “unproven” designs (inflated as they may be) apply to your design as well.

Unfortunately, that is where the conversation stands. It would be excellent if this article could draw Eliezer into contact with the numerous others who agree that AI Safety is a critical issue but who don’t agree with many of his contentions. I am sure that Hank Pellisier would be absolutely delighted to publish any rebuttal that Eli would be willing to send his way. I similarly suspect that Peter Rothman would likely be very happy to publish a good rebuttal in H+ Magazine. And I will post an update to this article pointing to any rebuttal that I become aware of—even if it is only in the LessWrong echo chamber.

So, Eliezer, let me return the favor. I claim that you either misunderstand my position or are deliberately misrepresenting it. I claim that you either do not understand Rice’s Theorem’s implications for “Friendly AI” or that you are deliberately distorting and dodging them. Are you capable of maintaining your positions in the face of a rational scientific discourse without resorting to rhetoric, misrepresentations or other forms of the Dark Arts (as you yourself term them)? Or are you going to continue to “duck and weave” and avoid all meaningful engagement with those who honestly disagree with you?

*The Institute for Ethics & Emerging Technologies is an excellent non-profit, non-fear-mongering organization that includes “existential risk” in its portfolio of interests

* here image from http://yttalk.com/threads/big-vision.29714/


DavidJKelley
October 13, 2014 at 2:42 pm


comments from the Archive:

I hope Eliezer responds. I’m interested in his perspective on this.

By Lincoln Cannon on Dec 21, 2012 at 10:03am

Definitely. It can be frustrating…useless, in fact, when someone dances around and obfuscates what they’ve publicly written, even when you attempt to isolate very small parts of their text.

By Kevin Haskell on Dec 22, 2012 at 9:39pm


Mark Waser
