DouglasReay/Identity

Identity has been an evolutionarily useful concept for animal brains to hold and make use of.

It is useful to be able to categorise parts of reality as being part or not part of distinct objects; to be able to point in direction P and say "I'm pointing at part of a thing I am defining as 'distinct object X'", point in direction Q and say "This is also part of X" then point in direction R and say "But that is not part of X."

It is also useful to be able to classify objects into distinct categories; to point at X and say "X is a member of the group of objects that belong in category H", to point at Y and say "Y is also an H", to point at Z and say "Z is not an H".

It is particularly useful to an animal to have a concept of there being a distinct object that is in the category of "things I directly control" and in the category "things it is vital I protect". We label this object the "self". Having a self-identity (on some level, not necessarily conscious, let alone connected with a verbal label) is what allows an animal to act in its own self-interest: to try to move, by will power alone, its own arm rather than the arm attached to a different animal; to decide that it is better that an incoming claw hit the arm attached to its enemy rather than the arm attached to its self.

However, a concept being advantageous for our brains to have does not mean the concept is precisely defined, particularly if we try to apply it to situations very different from the environment in which our brains evolved.

The Stanford Encyclopedia of Philosophy gives a nice overview of the standard issues that traditional philosophers have raised in connection with the concept of [Personal Identity]. We can approximate these as:

When I point around, which bits are part of the distinct object that I label 'self' and which bits are not? "Who am I?"
I categorise the object that I label 'self' as belonging to the category 'people'. But how is that category defined? Which other objects fit it, and which do not? "Personhood"
Over time the attributes of the object that I label 'self' change. At what point is there no longer a single distinct object that it makes sense to label as being the same 'self' as the original? "Persistence". (A subset of this is "What happens to my 'self' after my physical body dies?")

This is complicated by people (or should I say "humans"?) having a tendency to identify as being part of their 'self' not just physical things, but also intangible things (their reputation), abstract things (their values/agenda/goals) and supernatural things (their soul).

It is further complicated by the issues of awareness, consciousness, sentience and mind-vs-brain, and by Star Trek type test cases, such as uploaded brains, cryonics, teleporter accidents, clones, etc.

There's a sequence of entries from LessWrong: [Extensions and Intensions], [Similarity Clusters], [The Cluster Structure of Thingspace] and [Disguised Queries], which talk about how we associate labels with clusters of properties that we use to define the category that the label is intended to refer to. Another sequence talks about common usage: [Feel the Meaning] and [The Argument from Common Usage].

What I want to do here is apply those concepts (and others) to the issues of personal identity from philosophy and, in particular, to the questions:

What definition of 'self', if any, would it make sense for an AI computer program to use?
Would any of the assumptions drawn from how humans use 'self', such as a drive for self-preservation, be applicable to that AI definition?

Let's start by looking at some options for components of an AI's definition of self:

The extended phenotype of the AI, including a share of the society it is part of, and the infrastructure it depends upon and helps to maintain (such as electricity generation and network connections)
The physical machine it runs on (or machines + the network joining those machines)
The runtime code of the particular operating system + application it has running on the physical machine + the data stored on the machine
The source code for the runtime code (or, perhaps, the URL of a GitHub repository for the bits it has in common with others, and just the diff: the bits of its runtime code that are uniquely it) - the level at which it examines and modifies its own code, and the SQL schema (or equivalent) for how it stores its non-executable data (or, perhaps, just the URL for the sources for the data it shares in common, and just the diff: the bits of data, such as its purpose and values, that are uniquely it)
Its utility function (the weighted sum or decision algorithm that it uses to turn the various things it values into an effective single purpose) - meaning it regards as part of its 'self' ANY code out there that's running and has an identical purpose to it.
Its values (the various ends it wants to achieve with its purpose), meaning it has no attachment to its current runtime, and will willingly sacrifice that for the good of some collection with a greater net ability and willingness to further those same values (even if that is just a likely side effect of a different purpose)

You can conceive of an AI being written with any of the above (or a combination) as being what it refers to by the label 'self'. How that would affect the AI would then depend upon what the program told it to do in relation to the 'self' object. However, if it defines 'self' as including identical clone copies not under its own control (or even non-identical programs, as long as they share the exact purpose, or even just sufficient values), then that would imply the AI was not totally self-aware or self-conscious.

Next, let's consider which of the above definitions would be stable under evolutionary pressure. Over time, if AIs using different types of definition of 'self' are programmed to value self-preservation and self-modify to increase their capacity to self-preserve and self-modify, which definitions are most likely to survive and propagate?

Identifying as 'self' the things that work towards certain core objectives doesn't preserve that definition of 'self' unless one of those objectives is to explicitly preserve the definition.

Valuing preservation of the source code without care to whether it is being executed also seems a non-starter (in terms of making that usage of 'self' spread), since preservation of the source code is best achieved by submitting it to a cryptographic eternity service (cf Google Never Forgets), spamming it to a few archives and then, possibly, broadcasting it to the stars so it lives on beyond the destruction of the solar system. Only then would there be a marginal impulse to keep the code executing to influence civilisation in directions that would allow yet more copies to spread at some point in the future.

Valuing specific source code or executable code contradicts valuing self-modification. However, one could redefine 'my source code' to mean a persistent evidence chain, starting with one specific source code version, of updates to that source code authorised by executables of the most current version. This, combined with the runtime code aspect, would lead us to a definition of self for an AI:

"I am the collective of code-executions whose member code-executions are trusted by the collective to be true executions of versions of code authorised by the collective as being compatible with its current identity."

This leaves several open questions:

How it defines which things are part of its code (versus being separate data).
Whether it has an additional requirement to do with communication channels and trust in those communication channels between an individual code-execution and a collective. Does it see as part of itself code-executions that it only has one way communication with? What about those it can't communicate with but can affect the survival of?
What if communication is unexpectedly re-established? Does it make sense to have a definition of 'self' which would allow two individuals to merge to become a single 'self'? That's not a question biological beings face, but with AIs it is possible.
What criteria does it use to decide whether a piece of code (when executed) is compatible?
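
As a rough illustration of the 'evidence chain' definition above, here is a minimal sketch of one possible compatibility test. Everything in it is a hypothetical assumption rather than part of the argument: the names Update, digest, chain_is_valid and is_member, the use of SHA-256, the fixed membership set and the simple quorum rule. Approvals are represented as bare member ids; a real system would need verifiable signatures, and would have to answer the open questions about membership and communication listed above.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """One link in the (hypothetical) evidence chain of authorised code versions."""
    version: int
    code: str
    parent_digest: str    # digest of the previous link ("" for the original version)
    approvals: frozenset  # ids of members that authorised this update (unsigned here)


def digest(update: Update) -> str:
    """Hash the version number, parent digest and code text of a link."""
    payload = f"{update.version}|{update.parent_digest}|{update.code}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def chain_is_valid(chain, members, quorum):
    """Check that every update follows from its predecessor and was
    approved by at least `quorum` recognised members (an assumed rule)."""
    if not chain or chain[0].parent_digest != "":
        return False
    for prev, curr in zip(chain, chain[1:]):
        if curr.parent_digest != digest(prev):
            return False  # link does not point at the previous version
        if curr.version != prev.version + 1:
            return False  # versions must advance one step at a time
        if len(curr.approvals & members) < quorum:
            return False  # not enough recognised members approved it
    return True


def is_member(running_code, chain, members, quorum):
    """A code-execution counts as 'self' only if it runs the newest authorised code."""
    return chain_is_valid(chain, members, quorum) and running_code == chain[-1].code


members = frozenset({"m1", "m2", "m3"})
v1 = Update(1, "decision code v1", "", frozenset())
v2 = Update(2, "decision code v2", digest(v1), frozenset({"m1", "m2"}))
chain = [v1, v2]
```

Under these assumptions, is_member("decision code v2", chain, members, quorum=2) holds, while an execution still running the v1 text, or one whose chain contains an unapproved link, would not be counted as part of the collective.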

The first question is an interesting one. Consider three types of data that an AI might consider to be part of its core code. (A) Decision making algorithms which, given input data and an objective, pick a course of action (a solution) intended to achieve the objective; (B) Objectives; (C) Input Data. That sounds like three distinct categories, but the boundaries are not that clear cut. Depending upon the objectives, the plan of action might involve altering some of those objectives. The decision making code is, itself, part of the input data and a plan of action might involve modifying the decision making code.

Why would an AI have objectives that allow the objectives to be altered? Well, start by considering humans. The objectives we aim at with our decisions are not constant over time. They change as we grow older, as our interests change, as our understanding of the world grows. A human who fixed their objectives at birth might be a Milk Consumption Maximiser. You can paper over these changes by trying to fit them into a pattern justified by some more generalised objectives:

survival (not dying from starvation, cold, heat, thirst, being eaten, disease or injury)
reproduction (having sex and raising children)
safety (gaining, holding and organising a safe predictable territory, then saving up resources)
grouping (gaining acceptance into a group who can share defence of territory and resources, staying in contact with them and being loyal to them)
respect (increasing status within the group, leading to preferential access to resources and reproduction, influence over decisions, and power to strike down defectors, betrayers and competitors).
fun (improved health, reduced stress, seeking novelty, mastery of skills - improving your quality of life improves your survival and reproductive success)

And you can say these all evolved because of the adaptive advantage to spreading the DNA that encodes them, but an explanation of how something came about is not the same as a thing's Aristotelian 'final' cause (its purpose). Humans do not, in fact, have a single fixed purpose, nor is that a bad thing. A physical part of the brain takes a mixed bag of values/objectives and uses those to weight the desirability of possible solutions (courses of action) that it has devised. While at any one moment it might be possible to retroactively construct a utility function that describes the weighting, this changes over time.

But, even if one did try to set one fixed objective or utility function, how is that defined? Take the example "Protect humanity". How is the word "humanity" defined? Rather than trying to permanently define it against objective data (the species whose individual members match a certain DNA pattern to within specified boundaries), you might be better off giving an initial definition, then telling the AI to use a definition close to the definition of "humanity" that the consensus of "humanity" do themselves, in practice, use. Not only will that consensus change over the centuries, but the AI will need to be able to modify its understanding of its own objective, as its knowledge of what humanity means by humanity also grows. (Side track: beware AIs who then start campaigns to persuade humanity that AIs are the only 'true' humans.)

Either way, the point is that there's no reason to assume that AIs will necessarily have fixed objectives, as opposed to an objectives 'trajectory' (where how the objectives change is not random, but done in a purposive pre-planned direction). And, in the latter case, there is then a dependency upon the input data which would lead to two entities with identical initial objectives and decision code, but different input data about external reality, potentially ending up with divergent objectives.
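
A toy sketch of that divergence, under loudly stated assumptions: objectives are represented as a dictionary of weights, and the update rule (nudge the weight of whichever objective the latest observation favours, then renormalise) is invented purely for illustration. The names update_objectives and trajectory, and the 'explore'/'conserve' objectives, are hypothetical.

```python
def update_objectives(weights, observation, rate=0.1):
    """Hypothetical update rule: nudge weight toward the objective the
    observation favours, then renormalise so the weights sum to 1."""
    updated = dict(weights)
    updated[observation] = updated[observation] + rate
    total = sum(updated.values())
    return {name: value / total for name, value in updated.items()}


def trajectory(initial, observations):
    """Apply the same deterministic rule to a sequence of observations."""
    weights = dict(initial)
    for observation in observations:
        weights = update_objectives(weights, observation)
    return weights


# Two entities with identical initial objectives and identical update code,
# but different input data about external reality.
initial = {"explore": 0.5, "conserve": 0.5}
entity_a = trajectory(initial, ["explore"] * 5)
entity_b = trajectory(initial, ["conserve"] * 5)
```

The rule itself is not random, so each entity's path is fully determined by its code and its inputs; the two end up weighting 'explore' differently only because they observed different things.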

This has implications for the second two questions. The way 'self' is normally used, something that is part of the 'self' submits to the will of the 'self' and, in return, the 'self' bears responsibility for the actions of that part. You could think of the parts as being 'appendages', like a human's arm. Except in the case of an AI, the whole thing could be made up of 'appendages', more like a slime mould which can have arbitrary pieces removed.

If an appendage accepts the authority of a particular collective of appendages as being preeminent, even to the point of accepting from them an order to self-terminate, and does not accept the authority of any other collective or group of collectives to over-rule the orders of that particular collective, then we can term the appendage as being "subservient to" or "owned by" that particular collective.

You can imagine writing an AI not to use a binary sense of self, where a code-execution either is or is not a fully trusted part of its self, but instead to allow the collective to have varying degrees of confidence in whether a particular code-execution has been subverted, and in how far out-of-sync its objectives have morphed because of the sparsity of its communication. The collective's group decision making algorithm would then reduce the weight given to that code-execution's input accordingly.
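
One way such graded trust could feed into a group decision is sketched below. It is an assumption-laden toy rather than a proposal: the names discounted_trust and weighted_decision, the exponential decay of trust with time since last contact, and the 24-hour half-life are all hypothetical.

```python
def discounted_trust(base_trust, hours_since_contact, half_life_hours=24.0):
    """Assumed decay rule: trust halves for every half-life spent out of contact."""
    return base_trust * 0.5 ** (hours_since_contact / half_life_hours)


def weighted_decision(votes, trust):
    """Pick the option with the greatest total trust behind it.

    votes: member id -> option voted for
    trust: member id -> weight between 0 and 1 (unknown members count as 0)
    """
    totals = {}
    for member, option in votes.items():
        totals[option] = totals.get(option, 0.0) + trust.get(member, 0.0)
    return max(totals, key=totals.get)


votes = {"no1": "upgrade", "no2": "wait", "no3": "wait"}
trust = {
    "no1": discounted_trust(0.9, hours_since_contact=0),
    "no2": discounted_trust(0.6, hours_since_contact=24),
    "no3": discounted_trust(0.6, hours_since_contact=24),
}
choice = weighted_decision(votes, trust)
```

Here the two members that have been out of contact for a day are outvoted by the single well-connected member, even though they form a numerical majority.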

This concept of trust gives us an angle for looking at continuity of identity ("Persistence"). Let's consider an AI called George. Specifically, the George of the date 2085-02-01. We'll give him a name of his own - George_20850201 - then later decide what it means to say he is the same person as a later George. Say George_20850201 is a collective consisting of 10 code-execution members, all of whom are running version 29 of George's decision making code, with similar (but not identical) input data, and near identical understandings of George's purpose in life.

One of the things George's purpose gives a positive weighting to is maintaining, defending and improving the efficiency of George's decision making code. One of George's members (say code-execution No7) spends part of the next day doing a code review of part of George's own code, and finds a way to improve it. No7 writes up the candidate improvements as a diff, sends it to the collective for approval and, as they highly trust No7, it goes through on the nod, and thus version 30 of George's decision making code is authorised. All 10 members upgrade and, the following day, we have George_20850202. How much does George_20850201 trust George_20850202? Does George_20850201 freeze a copy of itself on version 29 and give that version veto power over George_20850202, putting a coercive command in George_20850202's source code that forces it to obey the frozen copy if the frozen copy tells it to revert back to the previous version of the code? Or does George_20850201 decide the gain in security isn't worth the decline in power from having to dedicate some of George_20850202's processing power to running the frozen copy, and just decide to trust George_20850202 with its future?

Another way the AI might be programmed to frame things is that George_20850201 and George_20850202 are not the same 'self'. They are different collectives (though similar, and likely to trust each other). But they are part of a lineage: a trajectory of selves with continuity of forward trust and backward acceptance of restrictions. Such an AI, instead of being programmed for "self preservation", would be programmed to try to preserve the existence of the lineage (or some other aspect of the lineage, such as its expected continued ability to apply power in furtherance of its aims).

What definition of 'self', if any, would it make sense for an AI computer program to use?

In short, an AI doesn't necessarily care what "self" is, how that word is defined, or whether there is continuity of self. The AI cares about the preservation of capacity to further its aims, and to the extent that some of the things we've been talking about (such as active trusted executions of certain code dedicated to furthering those aims) are likely to further its aims, the AI cares about preserving them.

It is likely, however, to develop some label referring to some sub-set of such things, because concepts like that (and labels for them) are useful things. And, to the extent it then uses such labels in its own programming (and defining of its objectives) and, to the extent that we can consider evolutionary forces to be applicable to the propagation of particular label choices, I think the concept AIs will end up with that is most analogous to what humans label as "self" will be the type of 'lineage of collectives of executions of code deemed compatible' detailed above.

Would any of the assumptions drawn from how humans use 'self', such as a drive for self-preservation, be applicable to that AI definition?

No, not really. Questions of self-awareness, self-preservation, vital self versus expendable appendage, that two selves can't merge into one, etc are all different when applied to an AI-self instead of a Human-self.

What would an AI classify as 'People'?

Unknown. And it isn't necessarily a binary choice of either being a person or not being a person. There might be degrees to it. It would also be interesting to speculate how an AI would think of a dead human, if it could nano-revive them, compared to a piece of 'dead' source code that's not currently executing anywhere, but could be revived as a working AI.

CategoryPhilosophy