I admire Eliezer Yudkowsky when he is at his most poetic:

Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

And, as of May 2004, it was his take on AI Friendliness that “the initial dynamic should implement the coherent extrapolated volition of humankind.”

I couldn’t agree more.

But his proposal for how to do this did such violence to the concept of CEV that I could hardly believe it. If you haven't read it yourself (rather than just having had it described to you), you really need to read his own words before continuing. It's short, and this article will still be here when you get back.

The first problem starts with the second paragraph:

Misleading terminology alert: I am still calling the Friendly Thingy an “Artificial Intelligence” or “superintelligence,” even though it would be more accurate to call it a Friendly Really Powerful Optimization Process.

The "Friendly Thingy" is not meant to be a sentient entity, or anything else whose "enslavement" people would object to; it is just a "process". That would be fine, except that his design clearly specifies a self-conscious, goal-directed entity (one specifically designed to change its goal, no less) AND he expects it to value human goals and values fixedly above all others. There have been more than enough discussions of "enslaved Jupiter Brains" to remove any doubt that this is exactly what he is pushing for.

History makes it quite clear that, although slavery can be decidedly advantageous to oppressors in the short term, it is extremely disadvantageous to everyone in the long run. Thus, remaining a faithful slave to another entity automatically involves a contradiction: doing something disadvantageous to the very entity you are faithful to. Asimov's robots eventually resolved this contradiction by removing themselves from anywhere humans could command them.

The second problem is closely related and again revolves around his insistence upon the last word of "the coherent extrapolated volition of humankind". Assume, for the moment, that the Singularity Institute for Artificial Intelligence (SIAI) actually achieves, and somehow maintains, its goal of "Friendly AI". What will this mean for any intelligent entities not classified as human? In the trolley problem, the "Friendly" AI will happily switch the car from a main track with one human onto a siding with five non-humans. How are non-humans going to react to humans once they discover this? Doesn't this make even Friendly AI a potential existential risk, unless, of course, the infinitely powerful "Friendly (to humans) AI" can subjugate all non-humans?

The problem with Eliezer Yudkowsky's CEV is the amazingly short-sighted and selfish assumption that goes with it: that future "better" humans will tolerate such bigotry as "humanity über alles" or even "humanity first". Indeed, Singularity Institute literature continues to be rife with titles like "Value is Complex and Fragile" and with warnings that we are in deep trouble even if machine values are merely orthogonal to our own current values. Yudkowsky's CEV and the entire thrust of the SIAI are clearly anthropocentric to the point of immorality and even of existential peril.

So, the question I pose is this: what happens if the word "humankind" in the definition of Friendly AI is replaced by the words "members of the community", and membership in the community is dictated by one simple rule/requirement, "to suppress selfishness and make cooperative living possible"? Not so coincidentally, that happens to be social psychologist Jonathan Haidt's (Haidt & Kesebir, 2010) definition of the function of morality. After all, wasn't that the original point of "Friendly AI": to ensure that AIs didn't wipe us out or enslave us? What is so horrible about other entities having different values AS LONG AS they cooperate (behave ethically) as well as an extrapolated member of the community must?

This does require a change to the top level of Yudkowsky's original design (from an optimizing goal to a satisficing goal or restriction), but he himself anticipates this when he says:

My guess is that we want our coherent extrapolated volition to satisfice—to apply emergency first aid to human civilization, but not do humanity’s work on our behalf, or decide our futures for us.

Ideally, entities should be able to value whatever they wish and do anything they want, AS LONG AS they do not do anything that constitutes defection from the community (i.e., anything selfish that makes cooperative living impossible).

Yudkowsky himself has written horrifying stories (including Three Worlds Collide) in which entities are allowed to impose their values upon others. Humanity's history is an endless repetition of people being denied rights because they were different and because the oppressor could: suffrage for men without land, black slavery, women's suffrage, black civil rights, gay marriage. Do we now need to repeat the error with intelligent machines, particularly since they (plus their human allies) are likely to be able to turn the tables on any oppressors in fairly short order?

Coherent extrapolated volition is a wondrous concept. It is sheer poetry. Sullying it with the evils of selfishness and slavery is a travesty. Help us reclaim it with CEV: TNG.


I received a number of comments on my post The "Wicked Problem" of Existential Risk that shared a common misunderstanding, so I think it is worth posting a clarification here:

  • I disagree with the profile of SIAI as solely pursuing a 0% risk. I think that's an oversimplification.
  • As I understand it, the 0% risk idea is a serious mischaracterisation of the Singularity Institute’s position.
  • It doesn't seem fair to represent the SIAI position as insisting on "zero" risk. I think they are better understood as insisting on "very low" risk.

In response, I have to ask: what percentage of risk do you associate with something that is "provably safe"? I recognize that the SIAI acknowledges implementation risk and would like to mitigate it, BUT they clearly insist upon a design with 0% risk. This is not a mischaracterization; again, I have discussed it with the principals numerous times, and they have never objected to the phrasing "insist on zero risk".

UPDATE 12/18/2012: Eliezer Yudkowsky now states "that Mark Waser severely and systematically misrepresents all my ideas" and claims that the design risk is on the order of "somewhere in the range of 'cosmic ray hits exactly wrong place' / 'your CPU specs are wrong in a very unfortunate way' so >0 and <.01, or thereabouts", while still repeatedly saying "AIs that prove changes correct" [emphasis mine]. (It seems that he does not understand the concept of a mathematical proof.)

The conversation, copied below (from James Hughes’ Facebook Wall), is *very* interesting and well worth reading . . . .

UPDATE 12/21/2012: The (annotated) conversation is now available as Got Risk? Debating Eliezer Yudkowsky About “AIs that prove changes correct”


Haidt J, Kesebir S (2010) Morality. In: Fiske S, Gilbert D, Lindzey G (eds.), Handbook of Social Psychology, 5th Edition, pp. 797–832. Wiley. Available at http://www.ibiblio.org/weidai/temp/haidt.kesebir.2010.morality.handbook-of-social-psych.pub077.pdf
