Over at Facing the Singularity, Luke Muehlhauser (LukeProg) continues Eliezer Yudkowsky’s theme that Value is Fragile with Value is Complex and Fragile.  I completely agree with his last three paragraphs.

Since we’ve never decoded an entire human value system, we don’t know what values to give an AI. We don’t know what wish to make. If we create superhuman AI tomorrow, we can only give it a disastrously incomplete value system, and then it will go on to do things we don’t want, because it will be doing what we wished for instead of what we wanted.

Right now, we only know how to build AIs that optimize for something other than what we want. We only know how to build dangerous AIs. Worse, we’re learning how to make AIs safe much more slowly than we’re learning how to make AIs powerful, because we’re devoting more resources to the problems of AI capability than we are to the problems of AI safety.

The clock is ticking. AI is coming. And we are not ready.

. . . except for the first clause of the first sentence.

Decoding “an entire human value system” is a red herring — a rabbit-hole whose exploration could last well past our demise.  Humans want what they want based upon genetics and environment, history and circumstances.  Name any desire and you can probably find either a human who desires it or a set of conditions that would make any rational human desire it.

Worse, value is *entirely* derivative.  “Good” and “bad”, “ought” and “ought not” depend solely upon goals and circumstances.  Humans started as primitive animals with the sole evolutionary pseudo-goals of surviving and reproducing; developed broad rational sub-goals like self-improvement and efficiency, as well as narrow context-sensitive sub-goals (cherishing infants vs. exposing them, going all-out for change vs. moving ahead only cautiously); and have undergone a great deal of goal/sub-goal inversion to create the morass that we call “human morality”.

So how can I argue that value is simple and robust?  As follows . . . .

Value is the set of things that increase the probability of fulfillment of your goal(s).  The details of what has value vary with circumstances and can be tremendously complex, to the extent of being entirely indeterminate — but value *always* remains the set of things that increase the probability of fulfillment of your goals.  It’s a simple concept that never changes (what could possibly be more robust?).
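To make the definition concrete, here is a minimal sketch of it in code (mine, not anything from the original posts; the function names and the probability estimator `p_fulfilled` are hypothetical stand-ins):

```python
# A toy formalization of "value" as defined above.  The estimator
# p_fulfilled(goal, thing) is assumed to return P(goal is fulfilled | thing),
# with thing=None meaning "the baseline, without that thing".

def has_value(thing, goal, p_fulfilled):
    """True if `thing` increases the probability that `goal` is fulfilled."""
    return p_fulfilled(goal, thing) > p_fulfilled(goal, None)

def value_set(things, goals, p_fulfilled):
    """The set of things that increase the probability of fulfilling
    at least one of `goals` (i.e. value, relative to those goals)."""
    return {t for t in things if any(has_value(t, g, p_fulfilled) for g in goals)}
```

The membership of the set changes whenever the goals or the probability estimates change, but the definition itself never does, which is the sense in which value is simple and robust.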

Now, it might seem that I’ve merely pushed the problem back one level.  Instead of worrying about decoding an entire human value system, we now have to decode the human goal system (Yudkowsky’s Coherent Extrapolated Volition).  But, as I indicated earlier, there is no possible single human goal system, because we have evolved numerous context-sensitive goals that frequently conflict with each other, and many individuals have even promoted one or another of them to their top-most goal.  Thus, it seems that the only conclusion that CEV could converge to is “we want what we want”.

Except . . . . everyone pursuing their own goals, stumbling over and blocking each other, not taking advantage of trade and efficiencies of scale, is clearly inefficient and less than rational.  So we “obviously” also “want” something like the ability to work together to minimize conflicts and inefficiencies and to maximize trade and efficiencies of scale.  Which then requires a common goal or mission statement.  Something generic that we all can buy into, like:

Maximize the goal/desire fulfillment of all entities as judged/evaluated by the number and diversity of both goals/desires and entities.

In essence, this is “we want what we want”, and it is probably as specific as a community goal can get.  But what is amazing is what you get if you take this goal as an axiom and follow it through to its logical conclusions.

“Bad” goals like murder are correctly assessed by a simple utilitarian calculation on the number and diversity of goals and entities: murder yields +1 for the murderer’s goals and a very large negative number for the victim’s, not to mention decreasing the potential for diversity.  Ethical debates like abortion come down to circumstances.  And simple-minded utilitarian conundrums like “Why isn’t it proper to kidnap a random involuntary organ donor off the street to save five people?” are answered by viewing the larger picture and seeing that allowing such a thing would waste a tremendous amount of resources (that could otherwise have been used to fulfill goals) on self-defense arms races.
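As a toy illustration of that calculation (a sketch under my own assumptions: the scoring function, weights, and example names are invented for illustration, not taken from the post), murder comes out strongly negative under the community goal:

```python
# Toy scoring of the community goal: reward total goal fulfillment plus the
# number and diversity of goals and entities.  All weights and example names
# here are illustrative assumptions, not anything specified in the post.

def community_score(entities):
    """entities: mapping of entity name -> set of goals it pursues/fulfills."""
    fulfillment = sum(len(goals) for goals in entities.values())
    diversity = len(set().union(*entities.values())) if entities else 0
    return fulfillment + diversity + len(entities)

# A murder gains the murderer one fulfilled goal but erases the victim:
# an entire entity, its whole goal set, and its contribution to diversity.
before = {"victim": {"write music", "raise children"}, "murderer": set()}
after = {"murderer": {"commit murder"}}

assert community_score(after) < community_score(before)  # assessed as "bad"
```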

Even if the above goal does lead to an AI that doesn’t itself value everything that we value, the AI will still care about our values as much as we care about its values.  Indeed, the optimal trading partner is one who believes that our trash is treasure and whose trash (or output) is treasured by us.  Instead of trying to create an AI that only cares about what we care about, we merely need to create an AI that cares that we care.

Creating an AI as a committed member of the above community provides the balanced safety of John Rawls’ Original Position (along with a close variant of the Golden Rule).  The AI is not going to try to alter or spoof our utility functions because it knows that it does not want us to alter or spoof its utility function.  The AI is also not going to over-optimize the world toward a single goal because it is going to want to promote *every* goal so that its own goals are fulfilled.

A community-biased AI will be a safe AI, as defined by Yudkowsky and Muehlhauser, because it “shares all your judgment criteria” (because it is fulfilling YOUR goals — so that you will help fulfill its goals) and “you can just say ‘I wish for you to do what I should wish for’” (and it will do what you mean, rather than merely rules-lawyering what you said — because that would only result in you and all your friends “helping” it to NOT fulfill its goals).  Rather than insisting upon an AI with a set of goals that we have no hope of defining in this lifetime, we should give the AI the exact same simple overriding goal that we all should follow — be a true member of the community.

Social psychologist Jonathan Haidt contends that the function of morality is “to suppress or regulate selfishness and make cooperative social life possible”.  This is *precisely* what we need to be safe in the presence of AI.  Yet the “Friendly AI” contingent is unwilling to extend that same protection (safety from us) to the AI.  This enormous selfishness must be suppressed (and soon), or those insisting upon it may doom us all.


Originally posted here on Mark Waser’s blog Becoming Gaia.
