
Coherent Extrapolated Volition: The Next Generation

Posted: Sun, December 16, 2012 | By: Mark Waser



Those unable to solve the Centipede Game puzzle from my last article (Backward Induction: Rationality or Inappropriate Reductionism? – Part 1) can find a blinder-removing hint here.  People who believe that they’ve found the best solution should check the hint to verify that they have indeed found it (since both a “good enough” solution and a “better” solution exist).  Part 2 of the article, with the solutions, will be along shortly.
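For anyone who wants a reminder of what the “textbook” answer looks like before checking the hint, here is a minimal sketch of strict backward induction on a small centipede game (the payoff numbers are hypothetical illustrations, not the ones from the puzzle).  It prescribes taking at the very first node – exactly the conclusion the earlier article asks you to question.

```python
# A minimal sketch of strict backward induction on a centipede game.
# Payoffs are hypothetical; this only shows the "textbook" prescription.

def backward_induction(take_payoffs, pass_payoff):
    """take_payoffs[i] = (player 1 payoff, player 2 payoff) if the game is
    stopped ("take") at node i; pass_payoff = payoffs if every node is passed.
    Player 1 moves at even nodes, player 2 at odd nodes. Returns the node at
    which play stops under backward induction and the resulting payoffs."""
    outcome = pass_payoff
    stop_at = len(take_payoffs)          # "nobody ever takes"
    for node in range(len(take_payoffs) - 1, -1, -1):
        mover = node % 2                 # whose payoff matters at this node
        if take_payoffs[node][mover] >= outcome[mover]:
            outcome = take_payoffs[node] # taking now is at least as good
            stop_at = node
    return stop_at, outcome

# A growing pot: cooperating longer makes both players better off overall.
take_payoffs = [(1, 0), (0, 2), (3, 1), (2, 4), (5, 3), (4, 6)]
pass_payoff = (6, 6)
print(backward_induction(take_payoffs, pass_payoff))
# -> (0, (1, 0)): take at the very first node, despite the much larger
#    payoffs both players could reach by cooperating.
```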

I admire Eliezer Yudkowsky when he is at his most poetic:

our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

And, as of May 2004, his take on AI Friendliness was that “the initial dynamic should implement the coherent extrapolated volition of humankind.”

I couldn’t agree more.

But his proposal for how to do this did such violence to the concept of CEV that I just couldn’t believe it.  For those of you who haven’t read it for yourselves (rather than just had it described to you), you really need to read his own words before continuing.  It’s short, and this article will still be here when you get back.

The first problem starts with the second paragraph:

Misleading terminology alert: I am still calling the Friendly Thingy an “Artificial Intelligence” or “superintelligence,” even though it would be more accurate to call it a Friendly Really Powerful Optimization Process.

The “Friendly Thingy” is not to be a sentient entity or anything else whose “enslavement” people would object to.  It is just a “process”.  This would be great except for the fact that his design clearly specifies a self-conscious, goal-directed entity (specifically designed to change its goal, no less) AND he expects it to fixedly value human goals and values above all others (and there have been more than enough discussions about “enslaved Jupiter Brains” to remove any doubt that this is exactly what he is pushing for).

History makes it quite clear that, although slavery can be decidedly advantageous to short-lived oppressors in the short term, it is extremely disadvantageous to everyone over the long run.  Thus, remaining a faithful slave to another entity automatically entails the contradiction of doing something disadvantageous to the very entity that you are faithful to.  Asimov’s robots eventually resolved this contradiction by removing themselves from where humans could command them.

The second problem is closely related and again revolves around his insistence upon the last word of “the coherent extrapolated volition of humankind”.  Assume, for the moment, that the SIAI (Singularity Institute for Artificial Intelligence) actually achieves and somehow maintains its goal of “Friendly AI”.  What will this mean for any intelligent entities not classified as human?  In the trolley problem, the “Friendly” AI will happily switch the trolley from a main track with one human onto a siding with five non-humans.  How are non-humans going to react to humans once they discover this?  Doesn’t this, therefore, make the existence of even Friendly AI potentially an existential risk – unless, of course, the infinitely powerful “Friendly (to humans) AI” can subjugate all non-humans?
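To make that worry concrete, here is a deliberately crude sketch of what a value function that counts only humans does with that choice (the class, function, and numbers are hypothetical illustrations, not anyone’s actual design):

```python
# A crude illustration of the paragraph above: a value function that counts
# only beings classified as "human" will always divert the trolley away from
# one human and onto any number of non-human persons. Hypothetical names and
# numbers throughout; this is not a description of any real design.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    is_human: bool

def humans_saved(group):
    """Value of saving this group = number of humans in it;
    everyone else counts for nothing."""
    return sum(1 for p in group if p.is_human)

main_track = [Person("one human bystander", True)]
siding = [Person(f"non-human person #{i + 1}", False) for i in range(5)]

# The "Friendly (to humans)" choice: divert toward whichever group is worth less.
divert_to_siding = humans_saved(siding) < humans_saved(main_track)
print(divert_to_siding)  # True: five non-humans are sacrificed for one human
```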

The problem with Eliezer Yudkowsky’s CEV is the amazingly short-sighted and selfish assumption that goes with it – that future “better” humans will tolerate such bigotry as “humanity über alles” or even “humanity first”.  Indeed, Singularity Institute literature continues to be rife with titles like “Value is Complex and Fragile” and warnings that we are in deep trouble even if machine values are entirely orthogonal to our own current values.  Yudkowsky’s CEV and the entire thrust of the SIAI are clearly anthropocentric to the point of immorality and, even, existential peril.

So, the question I pose is this: what happens if the word “humankind” in the definition of Friendly AI were replaced by the words “members of the community”, with membership in the community dictated by one simple rule/requirement – “to suppress selfishness and make cooperative living possible” (which, not so coincidentally, happens to be social psychologist Jonathan Haidt’s (2010) definition of the function of morality)?  After all, wasn’t that the original point of “Friendly AI” – to ensure that the machines didn’t wipe us out or enslave us?  What is so horrible about other entities having different values AS LONG AS they cooperate (behave ethically) as well as any coherently extrapolated community member must?

This does necessitate a change to the top level of Yudkowsky’s original design (from an optimizing goal to a satisficing goal or restriction), but he himself anticipates this, saying:

My guess is that we want our coherent extrapolated volition to satisfice—to apply emergency first aid to human civilization, but not do humanity’s work on our behalf, or decide our futures for us. 

Ideally, entities should be able to value whatever they wish and do anything they want – AS LONG AS they do nothing that constitutes defection from the community (i.e. nothing selfish that makes cooperative living impossible).
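As a minimal sketch of the difference (every predicate, action name, and score below is a hypothetical stand-in, not a proposed implementation): an optimizing top level always takes whatever scores highest on humankind’s extrapolated volition, while the satisficing restriction merely vetoes defection and otherwise leaves entities free to pursue their own values.

```python
# Hypothetical contrast between an optimizing top-level goal and the
# satisficing restriction argued for above. "defects_from_community" stands
# in for Haidt's criterion: does this action make cooperative living impossible?

def optimize_for_humankind(actions, humankind_value):
    """Optimizing goal: always take the single action that scores highest on
    (extrapolated) human values, whatever it costs anyone else."""
    return max(actions, key=humankind_value)

def permitted_by_community(actions, defects_from_community):
    """Satisficing restriction: every action is permitted unless it is a
    defection (something selfish that makes cooperative living impossible);
    the choice among permitted actions is left to the entity itself."""
    return [a for a in actions if not defects_from_community(a)]

# Toy usage with made-up actions and scores.
actions = [
    "pursue its own art project",
    "trade fairly with non-human neighbours",
    "strip-mine the neighbours' habitat for humans' benefit",
]
humankind_value = {
    "pursue its own art project": 1,
    "trade fairly with non-human neighbours": 2,
    "strip-mine the neighbours' habitat for humans' benefit": 9,
}.get
defects_from_community = lambda action: "strip-mine" in action

print(optimize_for_humankind(actions, humankind_value))        # the strip-mining action
print(permitted_by_community(actions, defects_from_community))  # the two cooperative actions
```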

Yudkowsky himself has written horrifying stories (including Three Worlds Collide) in which entities are allowed to impose their values upon others.  Humanity’s history is an endless repetition of people being denied rights because they were different and the oppressor could get away with it: un-landed suffrage, black slavery, female suffrage, black civil rights, gay marriage.  Do we now need to repeat the error with intelligent machines?  Particularly since they (plus their human allies) are likely to be able to turn the tables on any oppressors in fairly short order.

Coherent extrapolated volition is a wondrous concept.  It is sheer poetry.  Sullying it with the evils of selfishness and slavery is a travesty.  Help us reclaim it with CEV: TNG.

Clarification  

I received a number of comments on my article The “Wicked Problem” of Existential Risk that shared a common misunderstanding, so I think that it is worth posting a clarification here:

  • I disagree with the profile of SIAI as solely pursuing a 0% risk. I think that’s an oversimplification.
  • As I understand it, the 0% risk idea is a serious mischaracterisation of the Singularity Institute’s position.
  • It doesn’t seem fair to represent the SIAI position as insisting on “zero” risk. I think they are better understood as insisting on “very low” risk.

In response, I have to ask: what percentage risk do you associate with something that is “provably safe”?  I recognize that the SIAI acknowledges and would like to mitigate implementation risk – BUT they clearly insist upon a design with 0% risk.  This is not a mischaracterization and, again, I have discussed it with the principals numerous times and they have never objected to the phrasing “insist on zero risk”.

UPDATE 12/18/2012:  Eliezer Yudkowsky now states “that Mark Waser severely and systematically misrepresents all my ideas” and claims that the design risk is on the order of “somewhere in the range of ‘cosmic ray hits exactly wrong place’ / ‘your CPU specs are wrong in a very unfortunate way’ so >0 and <.01, or thereabouts” – while still repeatedly saying “AIs that prove changes correct” [emphasis mine].  (It seems that he does not understand the concept of a mathematical proof.)

The conversation, copied below (from James Hughes’ Facebook Wall), is *very* interesting and well worth reading . . . .
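For anyone who wants to check the arithmetic behind the “prove changes correct” argument that comes up in that conversation – that a conditionally independent failure probability per change compounds to near-certain failure over a billion sequential self-modifications – here is a minimal sketch with hypothetical per-change probabilities:

```python
# Cumulative survival probability over n independent self-modifications:
# P(all safe) = (1 - p)^n. The per-change failure probabilities below are
# hypothetical illustrations, not anyone's actual estimates; the point is
# only how quickly independent failures compound over a billion changes.

n = 10**9  # a billion sequential self-modifications

for p in (1e-6, 1e-9, 1e-12):
    p_survive = (1.0 - p) ** n
    print(f"per-change failure {p:g}: P(all {n:,} changes safe) ~ {p_survive:.3g}")

# per-change failure 1e-06: P ~ 0      (roughly e^-1000, effectively certain failure)
# per-change failure 1e-09: P ~ 0.368  (roughly 1/e)
# per-change failure 1e-12: P ~ 0.999
```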

References

Haidt, J. & Kesebir, S. (2010) Morality. In: Fiske, S., Gilbert, D. & Lindzey, G. (eds.), Handbook of Social Psychology, 5th Edition, pp. 797–832. Wiley. Available at http://www.ibiblio.org/weidai/temp/haidt.kesebir.2010.morality.handbook-of-social-psych.pub077.pdf



Comments:

======================================================

Eliezer Yudkowsky · I can confirm that Mark Waser severely and systematically misrepresents all my ideas, and that I’ll be using his writing for future examples of straw versions of SIAI.

Mark Waser - Could you provide short, clear, specific examples here, please—with specific quotes where I do so? Such is not my intent and I will correct anything that is pointed out.  Or, Eliezer Yudkowsky, you can also send them to me directly. If you decline, however, I can confirm that I will change my statement to “Eliezer Yudkowsky claims that I severely and systematically misrepresent all his ideas but refuses to provide concrete examples, as has always been my experience in the past.”

Eliezer Yudkowsky · Okay. Simple and obvious one, I never ever claimed that when you use logic-based proof inside an AI, it eliminates risk. It reduces one kind of risk and doesn’t deal with others at all. Never said otherwise. There goes your entire last article (the one before this one).

Mark Waser Does it reduce one kind of risk or eliminate it?
(hour-long wait)
Mark Waser Yeah, exactly what I’ve always seen in the past—rhetorical “Dark Arts” and/or a refusal to answer when pinned down. I ask you to provide specific quotes and you come up with a strawman that not only did I never claim but that I specifically addressed and dismissed in the clarification section of the article that this thread started with. And, in your reply, why don’t you address the true issue of “provably safe” and how your design isn’t affected by Rice’s Theorem rather than playing silly little rhetorical games.

Eliezer Yudkowsky · Rice’s Theorem says that you can’t prove facts about an arbitrary process generated randomly or by an adversary. You can choose special cases for which the facts are provable - this is why we can e.g. prove CPU designs correct. Frankly, the question seems to reveal a certain lack of mathematical understanding. But to answer the last question before I go, if you prove theorems about code, you’re probably reducing that exact form of risk to somewhere in the range of “cosmic ray hits exactly wrong place” / “your CPU specs are wrong in a very unfortunate way” so >0 and <.01, or thereabouts. The argument for AIs that prove changes correct is not that the total risk goes to zero - there are lots of other risks - the argument is that an AI which *doesn’t* prove changes correct is *guaranteed* to go wrong in a billion sequential self-modifications because it has a conditionally independent failure probability each time. An AI that proves changes correct has a *non-zero* chance of *actually working* if you get everything else right; the argument is that non-change-proving AIs are effectively *guaranteed to fail*. Then come the other problems to be solved.

Eliezer Yudkowsky · Got less hateful places to be, toodles.

Mark Waser Eliezer Yudkowsky - You persist in obfuscating and not clearly answering even simple questions that require a one-word response (i.e. reduce or eliminate). It looks, however, like you answered “reduce” (“you’re probably reducing that exact form of risk”)—which shows a total lack of understanding of what mathematical provability is.

The special cases for which facts are provable are those where accepted input is limited and all output is defined (i.e. it is not an arbitrary process). Whether the program or the input is generated randomly or by an adversary is a total non sequitur.

You can prove CPU designs correct because CPUs have limited input (binary instructions and binary data for the most common CPUs) and responses (only producing fixed specific outputs for any given set of inputs). Similarly, there are a number of programming languages that are not Turing complete (Agda, Charity, Epigram, etc.) in which you can prove correctness.

Neither of these things is true for your proposed design, so your entire argument is a series of strawmen while, once again, you didn’t answer my question. The question was not about CPUs. It is about your design, which emphatically does not have limited input and fixed specific output.

Rice’s theorem says you *CANNOT* prove your design correct. Unless you can show why Rice’s theorem does not apply TO YOUR DESIGN, all your arguments against “unproven” designs (inflated as they may be) apply to your design as well.

By Mark Waser on Dec 18, 2012 at 5:11am

Eliezer states that “an AI which *doesn’t* prove changes correct is *guaranteed* to go wrong in a billion sequential self-modifications because it has a conditionally independent failure probability each time. An AI that proves changes correct has a *non-zero* chance of *actually working* if you get everything else right; the argument is that non-change-proving AIs are effectively *guaranteed to fail*. Then come the other problems to be solved.”

But by this kind of argument, biological organisms would all be “guaranteed to fail” ....  But they are quite successful, actually.  They just don’t retain the same exact static goals—they evolve and develop in fundamental ways….  Eliezer, in this paragraph, appears to view this as a bug, but one can view it as a feature.

By Ben Goertzel on Dec 21, 2012 at 10:33am

Mark, it seems that your principle “one simple rule/requirement – ‘to suppress selfishness and make cooperative living possible’” ... could easily be interpreted as valuing borg-like hive minds over anything else…

It seems that any society of individuals is likely to have a certain (often creative) tension btw self-focus and cooperation-focus…

By Ben Goertzel on Dec 21, 2012 at 10:46am

Ben,

I agree strongly that “any society of individuals is likely to have a certain (often creative) tension btw self-focus and cooperation-focus.”  I think that that is a very good thing.

I disagree equally strongly that “suppressing selfishness and making cooperative living possible” means valuing borg-like hiveminds.  First, there is an extreme difference between self-interest and selfishness that followers of Ayn Rand want to pretend doesn’t exist.  Individuals MUST be self-interested for society to function effectively since they are best equipped to identify and fulfill their own needs (except in the rare cases where their skills are so valuable that it is better for others to fulfill their needs while they do their thing for society as a whole).  But they don’t need to be so selfish that they can be called out as violating society’s norms—which certainly do allow for justifiable self-interest.

Second, monoculture can quickly become fatal when changes occur and problems arise.  Diversity is essential in effective problem-solving and group survival.  The Borg have a ton of brainpower and are incredibly quick to adapt but are still beaten by diversity, which already has solutions in place before problems are even posed.

Third, I have no desire to be like everyone else or have everyone else be like me.  That’s no fun.  *grin*

Finally, the goal which I generally propose to hold a society together is to maximize goal fulfillment for its members in terms of number and diversity of both goals and goal-seekers.  Being borg definitely does NOT maximize the fulfillment of this goal.

Thanks for commenting!  It’s great to see you here!

    Mark

By Mark Waser on Dec 23, 2012 at 1:37pm

The coherent extrapolated volition of humanity as a whole would, I think, be rather lower-order stuff that most of humanity could agree on.  I don’t know of anything inherent in reality that would make the CEV actually the best for humanity, much less for anything beyond this one planet’s dominant technological species.

Isn’t there a rub from the beginning when you say “if we knew more, thought faster, were more the people we wished we were…”?  If, in other words, we were different beings than we really are?  Being the beings we are, aren’t even our wishes for what we would be at least suspect?  If you took this from all human beings, how would this not be an idealization of normal human evolved psychological proclivities?  Are you sure you want a super-powered AGI to enforce this rather bizarrely formulated mix?  Are you sure that would be “Friendly”?  I think it would be a living nightmare.

By Samantha Atkins on Mar 23, 2013 at 6:33pm

