part is an important and useful idea. But Koppel's algorithm for achieving it is conceptually
problematic. Suppose the program/data pairs (y1,z1) and (y2,z2) both cause a Turing machine to
output x, but whereas %y1%=50 and %z1%=300, %y2%=250 and %z2%=110. Since
%y1%+%z1%=350, whereas %y2%+%z2%=360, (y2,z2) will not be selected in Step 1, which
searches for those pairs (y,z) that minimize %y%+%z%. What if, in Step 2, (y1,z1) is chosen as
the pair with maximum %y%? Then the sophistication of x will be set at %y1%=50. Does it not
seem that the intuitively much more sophisticated program y2, which computes x almost as well
as y1, should count toward the sophistication of x?

In the language of pattern, what Koppel's algorithm does is:

1) Locate the pairs (y,z) that are the most intense patterns in x according to the definition of
pattern with % %, a=b=1, c=0.

2) Among these pairs, select the one which is the most intense pattern in x according to the
definition of pattern with % %, a=1, b=c=0.

It applies two different special cases of the definition of pattern, one after the other.
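The two-stage selection is easy to sketch in code. The following is a minimal illustration, not Koppel's actual construction: pairs are given as hypothetical (name, len_y, len_z) triples, with the lengths standing in for %y% and %z%, and the numbers repeating the (y1,z1), (y2,z2) example above.

```python
# Sketch of Koppel's two-step selection over candidate program/data pairs.
def koppel_sophistication(pairs):
    # Step 1: keep only the pairs minimizing %y% + %z% (a=b=1, c=0).
    m = min(ly + lz for _, ly, lz in pairs)
    finalists = [p for p in pairs if p[1] + p[2] == m]
    # Step 2: among these, take the pair with maximum %y% (a=1, b=c=0);
    # its %y% is the sophistication of x.
    name, ly, _ = max(finalists, key=lambda p: p[1])
    return name, ly

pairs = [("(y1,z1)", 50, 300), ("(y2,z2)", 250, 110)]
print(koppel_sophistication(pairs))  # ('(y1,z1)', 50): y2 is never counted
```

Note how (y2,z2) is eliminated in Step 1, so its large %y2% can never influence the result.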

How can all this be modified to accommodate examples like the pairs (y1,z1), (y2,z2) given
above? One approach is to look at some sort of combination of %y%+%z% with %y%.
%y%+%z% measures the combined length of program and data, and %y% measures the length
of the program. What is desired is a small %y%+%z% but a large %y%. This is some motivation
for looking at (%y%+%z%)/%y%. The smaller %y%+%z% gets, the smaller this quantity gets;
and the bigger %y% gets, the smaller it gets. One approach to measuring complexity, then, is to
search all (y,z) such that x=y*z, and pick the one which makes (%y%+%z%)/%y% smallest. Of
course, (%y%+%z%)/%y% = 1 + %z%/%y%, so whatever makes (%y%+%z%)/%y% smallest
also makes %z%/%y% smallest. Hence, in this context, the following is natural:

Definition 3.12: The crudity of a pattern (y,z) is %z%/%y%.

The crudity is simply the ratio of data to program. The cruder a pattern is, the greater the
proportion of data to program. A very crude pattern is mostly data; and a pattern which is mostly
program is not very crude. Obviously, "crudity" is intended as an intuitive opposite to
"sophistication"; however, it is not exactly the opposite of "sophistication" as Koppel defined it.

This approach can also be interpreted to assign each x a "natural program" and hence a
"natural class". One must simply look at the pattern (y,z) in x whose crudity is the smallest. The
program y associated with this pattern is, in a sense, the most natural program for x.
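As a minimal sketch, assuming the candidate pairs are given with their lengths already known, the crudity-minimizing choice of a natural program looks like this (the numbers reuse the hypothetical example above):

```python
# Crudity of a pattern (y,z) is %z%/%y%; the most natural program for x
# is the y of the pattern attaining the smallest crudity. Candidates are
# hypothetical (name, len_y, len_z) triples with x = y*z assumed for each.
def crudity(len_y, len_z):
    return len_z / len_y

def most_natural(patterns):
    return min(patterns, key=lambda p: crudity(p[1], p[2]))

candidates = [("(y1,z1)", 50, 300), ("(y2,z2)", 250, 110)]
best = most_natural(candidates)
print(best[0], crudity(best[1], best[2]))  # (y2,z2) 0.44
```

Unlike the two-step procedure, this selection rewards the program-heavy pair (y2,z2), whose crudity 110/250 = 0.44 beats 300/50 = 6.0.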

LOGICAL DEPTH

Bennett [9], as mentioned above, has proposed a complexity measure called "logical depth",
which incorporates the time factor in an interesting way. The KCS complexity of x measures
only the length of the shortest program required for computing x -- it says nothing about how
long this program takes to run. Is it really correct to call a sequence of length 1000 simple if it
can be computed by a short program which takes a thousand years to run? Bennett's idea is to
look at the running time of the shortest program for computing a sequence x. This quantity he
calls the logical depth of the sequence.

One of the motivations for this approach was a desire to capture the sense in which a
biological organism is more complex than a random sequence. Indeed, it is easy to see that a
sequence x with no patterns in it has the smallest logical depth of any sequence. The shortest
program for computing it is "Print x", which obviously runs faster than any other program
computing a sequence of the same length as x. And there is no reason to doubt the hypothesis
that biological organisms have a high logical depth. But it seems to us that, in some ways,
Bennett's definition is nearly as counterintuitive as the KCS approach.

Suppose there are two competing programs for computing x, program y and program y'. What
if y has a length of 1000 and a running time of 10 minutes, but y' has a length of 999 and a
running time of 10 years? Then if y' is the shortest program for computing x, the logical depth of
x is ten years. Intuitively, this doesn't seem quite right: it is not the case that x fundamentally
requires ten years to compute.
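The counterintuitive behavior is easy to exhibit with toy numbers; here the (length, running-time) figures are the hypothetical y and y' above, with time in minutes:

```python
# Bennett's logical depth in toy form: among the programs computing x,
# take the running time of the shortest one.
def logical_depth(programs):
    # programs: list of (length, running_time) pairs for programs computing x
    shortest_length, running_time = min(programs)
    return running_time

MINUTES_PER_YEAR = 365 * 24 * 60
y       = (1000, 10)                    # 1000 symbols, 10 minutes
y_prime = (999, 10 * MINUTES_PER_YEAR)  # one symbol shorter, ~10 years

print(logical_depth([y, y_prime]))  # 5256000: the marginally shorter program decides
```

A one-symbol difference in program length swings the depth of x from minutes to years.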

At the core of Bennett's measure is the idea that the shortest program for computing x is the
most natural representation of x. Otherwise why would the running time of this particular
program be a meaningful measure of the amount of time x requires to evolve naturally? But one
may define the "most natural representation" of a given entity in many different ways. Bennett's
is only the simplest. For instance, one may study the quantity dC(y,z) + e%z%/%y% +
f(%y%+%z%), where d, e and f are positive constants defined so that d+e+f=3. The
motivation for this is as follows. The smaller %z%/%y% is, the less crude is the pattern (y,z).
And, as indicated above, the crudity of a pattern (y,z) may be interpreted as a measure of how
natural a representation it is. The smaller C(y,z) is, the less time it takes to get x out of (y,z).
And, finally, the smaller %y%+%z% is, the more intense a pattern (y,z) is. All these facts
suggest the following:

Definition 3.13: Let m denote the smallest value that the quantity
dC(y,z) + e%z%/%y% + f(%y%+%z%) assumes for any pair (y,z) such that x=y*z (assuming
there is such a minimum value). The meaningful complexity of x may then be defined as the
time complexity C(y,z) of the pattern (y,z) at which this minimum m is attained.

Setting d=e=0 reduces the meaningful complexity to the logical depth as Bennett defined it.
Setting e=0 means that everything is as Bennett's definition would have it, except that cases such
as the patterns (y1,z1), (y2,z2) described above are resolved in a more intuitive manner. Setting
f=0 means that one is considering the time complexity of the most sophisticated -- least crude,
most structured -- representation of x, rather than merely the shortest. And keeping all the
constants nonzero ensures a balance between time, space, and sophistication.
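Definition 3.13 can be sketched directly. In the sketch below the candidate triples and the weightings are illustrative assumptions, with C standing for the time complexity C(y,z):

```python
# Meaningful complexity: minimize d*C + e*(%z%/%y%) + f*(%y%+%z%) over
# candidate pairs, then report the C(y,z) of the minimizing pair.
def meaningful_complexity(candidates, d, e, f):
    assert abs(d + e + f - 3) < 1e-9  # the text fixes d+e+f=3
    def score(c):
        len_y, len_z, C = c
        return d * C + e * len_z / len_y + f * (len_y + len_z)
    return min(candidates, key=score)[2]

cands = [(50, 300, 2000), (250, 110, 500)]
# d=e=0 recovers Bennett-style selection by %y%+%z% alone:
print(meaningful_complexity(cands, d=0, e=0, f=3))  # 2000
# Equal weights prefer the faster, less crude pair:
print(meaningful_complexity(cands, d=1, e=1, f=1))  # 500
```

With d=e=0 the shorter pair wins despite its large time complexity; balancing all three constants flips the choice.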

Admittedly, this approach is not nearly so tidy as Bennett's. Its key shortcoming is its failure
to yield any particular number of crucial significance -- everything depends on various factors
which may be given various weights. But there is something to be said for considering all the
relevant factors.

3.4 Structural Complexity

We have discussed several different measures of static complexity, which measure rather
different things. But all these measures have one thing in common: they work by singling out the
one pattern which minimizes some quantity. It is equally interesting to study the total amount of
structure in an entity. For instance, suppose x and x' both have KCS complexity A, but whereas
x can only be computed by one program of length A, x' can be computed by a hundred totally
different programs of length A. Does it not seem that x' is in some sense more complex than x,
that there is more to x' than to x?

Let us define the structure of x as the set of all (y,z) which are approximate patterns in x, and
denote it St(x). Then the question is: what is a meaningful way to measure the size of St(x)? At
first one might think to add up the intensities [1+d(y*z,x)][a%y%+b%z%+cC(y,z)] of all the
elements in St(x). But this approach has one crucial flaw, revealed by the following example.

Say x is a sequence of 10,000 characters, and (y1,z1) is a pattern in x with %z1%=70,
%y1%=1000, and C(y1,z1)=2000. Suppose that y1 computes the first 1000 digits of x from the
first 7 digits of z1, according to a certain algorithm A. And suppose it computes the second 1000
digits of x from the next 7 digits of z1, according to the same algorithm A. And so on for the third
1000 digits of x, etc. -- always using the same algorithm A.

Next, consider the pair (y2,z2) which computes the first 9000 digits of x in the same manner as
(y1,z1), but computes the last 1000 digits of x by storing them in z2 and printing them after the
rest of its program finishes. We have %z2%=1063, and surely %y2% is not much larger than
%y1%. Let's say %y2%=1150. Furthermore, C(y2,z2) is certainly no greater than C(y1,z1): after
all, the change from (y1,z1) to (y2,z2) involved the replacement of serious computation with
simple storage and printing.

The point is that both (y1,z1) and (y2,z2) are patterns in x, but in computing the total amount of
structure in x, it would be foolish to count both of them. In general, the problem is that different
patterns may share similar components, and it is unacceptable to count each of these components
several times. In the present example the solution is easy: don't count (y2,z2). But one may also
construct examples of very different patterns which have a significant, sophisticated component
in common. Clearly, what is needed is a general method of dealing with similarities between
patterns.

Recall that Ia(v%w) was defined as the approximate version of the effort required to compute
v from w, so that if v and w have nothing in common, Ia(v%w)=Ia(v). And, on the other hand, if v
and w have a large common component, then both Ia(v%w) and Ia(w%v) are very small. Ia(v%w) is
defined only when v and w are sequences. But we shall also need to talk about one program
being similar to another. In order to do this, it suffices to assume some standard "programming
language" L, which assigns to each program y a certain binary sequence L(y). The specifics of L
are irrelevant, so long as it is computable on a Turing machine, and it does not assign the same
sequence to any two different programs.

The introduction of a programming language L permits us to define the complexity of a
program y as Ia(L(y)), and to define the complexity of one program y1 relative to another
program y2, as Ia(L(y1)%L(y2)). As the lengths of the programs involved increase, the differences
between programming languages matter less and less. To be precise, let L and L1 be any two
programming languages, computable on Turing machines. Then it can be shown that, as L(y1)
and L(y2) approach infinity, the ratios Ia(L(y1))/Ia(L1(y1)) and Ia(L(y1)%L(y2))/Ia(L1(y1)%L1(y2))
both approach 1.

Where z is any binary sequence of length n, let D(z) be the binary sequence of length 2n
obtained by replacing each 1 in z with 01, and each 0 in z with 10. Where w and z are any two
binary sequences, let wz denote the sequence obtained by placing the sequence 111 at the end of
D(w), and placing D(z) at the end of this composite sequence. The point, as usual, is that 111
cannot occur in either D(z) or D(w), so that wz is essentially w juxtaposed with z, with 111 as a
marker in between.
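The doubling code is straightforward to implement; a minimal sketch:

```python
# D doubles each bit (1 -> 01, 0 -> 10), so the marker 111 can never occur
# inside D(w) or D(z), and wz = D(w) + "111" + D(z) decodes unambiguously.
def D(z):
    return "".join("01" if b == "1" else "10" for b in z)

def undouble(s):
    return "".join("1" if s[i:i + 2] == "01" else "0" for i in range(0, len(s), 2))

def join(w, z):
    return D(w) + "111" + D(z)

def split(wz):
    i = 0
    while wz[i:i + 2] in ("01", "10"):  # consume the doubled pairs of D(w)
        i += 2
    return undouble(wz[:i]), undouble(wz[i + 3:])  # skip the 111 marker

print(join("110", "01"))         # 0101101111001
print(split(join("110", "01")))  # ('110', '01')
```

Decoding scans D(w) two bits at a time rather than searching for "111" as a substring, because when w ends in 1 the marker's first 1 abuts the final 01 pair and a naive substring split would misfire.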

Now, we may define the complexity of a program-data pair (y,z) as Ia(L(y)z), and we may
define the complexity of (y,z) relative to (y1,z1) as Ia(L(y)z%L(y1)z1). And, finally, we may
define the complexity of (y,z) relative to a set of pairs {(y1,z1),(y2,z2),...,(yk,zk)} to be
Ia(L(y)z%L(y1)z1L(y2)z2...L(yk)zk). This is the tool we need to make sense of the phrase "the total
amount of structure of x".

Let S be any set of program-data pairs (y,z). Then we may define the size %S% of S as the
result of the following process:

Algorithm 3.1:

Step 0. Make a list of all the patterns in S, and label them (y1,z1), (y2,z2), ..., (yN,zN).

Step 1. Let s1(x)=Ia(L(y1)z1).

Step 2. Let s2(x)=s1(x)+Ia(L(y2)z2%L(y1)z1).

Step 3. Let s3(x)=s2(x)+Ia(L(y3)z3%L(y1)z1L(y2)z2).

Step 4. Let s4(x)=s3(x)+Ia(L(y4)z4%L(y1)z1L(y2)z2L(y3)z3).

...

Step N. Let %S%=sN(x)=sN-1(x)+Ia(L(yN)zN%L(y1)z1L(y2)z2...L(yN-1)zN-1).

At the k'th step, only that portion of (yk,zk) which is independent of {(y1,z1),...,(yk-1,zk-1)} is added
onto the current estimate of %S%. For instance, in Step 2, if (y2,z2) is independent of (y1,z1), then
this step increases the initial estimate of %S% by the complexity of (y2,z2). But if (y2,z2) is highly
dependent on (y1,z1), not much will be added onto the first estimate. It is not difficult to see that
this process will arrive at the same answer regardless of the order in which the (yi,zi) appear:

Theorem 3.2: If Ia=I, the result of the above algorithm is invariant under permutation of the

(yi,zi).
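Algorithm 3.1's shape is easy to mimic in code. Ia itself is uncomputable, so the sketch below substitutes a crude compression-based stand-in (an assumption purely for illustration, not the text's definition); with it, a pattern highly dependent on earlier ones adds little to the running total, just as described above.

```python
import zlib

def Ia(v, w=b""):
    # Stand-in for Ia(v%w): the extra compressed length of v given w.
    # This proxy is an assumption for illustration only.
    return max(len(zlib.compress(w + v)) - len(zlib.compress(w)), 0)

def set_size(coded_pairs):
    # coded_pairs: byte strings standing for the coded pairs L(yi)zi.
    # Step k adds only the part of the k'th pair independent of its
    # predecessors, following Steps 1..N of Algorithm 3.1.
    total, context = 0, b""
    for p in coded_pairs:
        total += Ia(p, context)
        context += p
    return total

a = bytes(range(256))
print(set_size([a]))     # complexity of a single pattern
print(set_size([a, a]))  # the duplicate adds almost nothing
```

Note that order-invariance (Theorem 3.2) holds for the exact quantity I, not for this compression proxy.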

Where St(x) is the set of all patterns in x, we may now define the structural complexity of x
to be the quantity %St(x)%. This is, I suggest, the sense of the word "complexity" that one uses
when one says that a person is more complex than a tree, which is more complex than a
bacterium. In a way, structural complexity measures how many insightful statements can
possibly be made about something. There is much more to say about a person than about a tree,
and much more to say about a tree than a bacterium.

ALGEBRA AND TOPOLOGY OF PATTERN SPACE

From the definition of structural complexity we may obtain the extremely useful notion of
structural similarity, which we shall also refer to as pattern-distance. As the name suggests,
this is a measure of how "close" two entities are, structurally speaking. We denote the structural
similarity of x and y by d#(x,y), and define it as the structural complexity of the symmetric
difference of St(x) and St(y). It measures the amount of pattern in x but not y, or in y but not x.
This concept will play a central role in our treatment of memory and analogy.
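With finite sets of (hashable) patterns standing in for St(x) and St(y), and plain cardinality standing in for structural complexity, the pattern-distance reduces to the size of a symmetric difference; a minimal sketch under those assumptions:

```python
def d_sharp(St_x, St_y, size=len):
    # structural complexity of the symmetric difference of the two
    # structures; size defaults to cardinality as a stand-in for %.%
    return size(St_x ^ St_y)

St_x = {"p1", "p2", "p3"}
St_y = {"p2", "p3", "p4"}
print(d_sharp(St_x, St_y))  # 2: "p1" and "p4"
print(d_sharp(St_x, St_x))  # 0: structurally identical
```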

The following concept, though it refers only to structure, not structural complexity, is equally
essential.

Definition 3.14: The emergence between x and y is defined as

Em(x,y) = St(xUy) - St(x) - St(y)

The intuitive meaning of emergence should be clear: it is what is present in the whole but not
the parts. In the realm of pattern, the whole is in general more than the sum of the parts, in the
sense that not all the patterns in the whole are patterns in the parts.
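In the same finite-set idealization, emergence is a set difference; a sketch:

```python
# Em(x,y) = St(xUy) - St(x) - St(y): the patterns of the whole that are
# not patterns of either part. Sets of strings stand in for structures.
def emergence(St_whole, St_x, St_y):
    return St_whole - St_x - St_y

St_x = {"p1"}
St_y = {"p2"}
St_whole = {"p1", "p2", "p_joint"}  # the whole has a pattern its parts lack
print(emergence(St_whole, St_x, St_y))  # {'p_joint'}
```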

Finally, the following idea, though not so crucial, sheds some light on certain matters.

Definition 3.15: (y1,z1) is said to be complementary to (y2,z2) in x to the following extent:
1 - IN(x,y1,z1)/[IN(L(y1),y2,z2) + IN(z1,y2,z2)]. If y1 is complementary to y2 in x and y2 is
complementary to y1 in x, then y1 and y2 are said to be complementary in x.

Complementarity, intuitively, is a very weak form of negation. If y1 is highly complementary
to y2 in x, that means that although one can effectively represent x in terms of either y1 or y2,
once one has represented x in terms of y1, one cannot effectively represent the elements of this
representation in terms of y2. If y1 and y2 are both entirely "independent" in St(x), this will
usually be the case.

Crude intuitive examples of this phenomenon may be drawn from nearly any field of study.
For instance, in quantum mechanics one may represent an electron as a wave or as a particle, but
once one has represented it as either one cannot interpret one's representation in terms of the
other one. Or: one may view the American economy as a battleground of class struggle, or as an
arena of gradual evolution by natural selection through competition -- but once one has
diagrammed the economy in terms of conflicting classes, one cannot analyze the diagram in
Social Darwinist terms; and vice versa. Of course, these are little more than analogies; in order to
make such applications at all precise one would have to provide exact definitions of the terms
involved.


COMPUTABLE APPROXIMATIONS

The structural complexity of x is a measure of the total size of the set of all regularities in x.
Unfortunately, as I noted above, it is uncomputable. Step 0 of Algorithm 3.1, in its full
generality, is not programmable. Therefore, in order to apply the notion of structural complexity
to real-world problems, it is necessary to restrict oneself to some particular class of patterns. One
way to do this is via schematic structural complexity (SSC), a simple notion that will lead us
naturally to more interesting complexity measures such as the Lempel-Ziv complexity and the
n'th order Boolean complexities.

In the theory of genetic classifier systems (Goldberg, 1989), a schema is a sequence such as
1001**1011101010*1**111, where * is a "don't care" symbol signifying that either a 0 or a 1
may occupy the indicated place. Let us consider a slightly more general notion of schema, in
which each "don't care" symbol has a tag (*1,*2, etc.), and each "don't care" symbol may stand
for a binary sequence of any length. And let us define a schematizer as a function which maps
schema into binary sequences, by inserting in place of each occurrence of the "don't care" symbol
*i some binary sequence wi. Each pair (schematizer, schema) is a schematic program; it defines
a unique binary sequence.
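A schematic program in this generalized sense is easy to execute; the schema and substitutions below are made-up examples:

```python
# Expand a schematic program: a schema is a token list of "0", "1", and
# tagged "don't care" symbols *1, *2, ...; the schematizer maps each tag
# to a binary sequence of any length (possibly empty).
def expand(schema, schematizer):
    return "".join(schematizer.get(tok, tok) for tok in schema)

schema = ["1", "0", "*1", "1", "*2", "*1"]
schematizer = {"*1": "0110", "*2": ""}
print(expand(schema, schematizer))  # 10011010110
```

The repeated tag *1 is what lets a schematic program compress a sequence by naming a repeated figure once.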

The schematic structural complexity (from here on, SSC) of a binary sequence may be

described in terms of Algorithm 3.1 with a modified initial step.

Algorithm 3.2:

Step 0'. Make a list of all schematic programs that are patterns in x, and label them y1,...,yN.

Steps 1-N. As in Algorithm 3.1

The result of this algorithm may be called the schematic size of x, the well-definition of which
follows from the following generalization of Theorem 3.2.

Theorem 3.1. The result of Algorithm 3.2 is invariant with respect to permutation of the yi.

In analogy to structural complexity, the SSC of a sequence x may be defined as the schematic
size of the set of all patterns in x, St(x).

A schematic program represents the simplest sort of compression: it compresses an image or
sequence by abstracting repeated figures. A more flexible approach may be obtained by
considering more general programs. For instance, one might define a metaschematizer as a map
from schema to schema, which takes in a schema and replaces each occurrence of the "don't care"
symbol *i with some schema Si. A metaschematic program is then any program whose action
may be represented in the form schematizer(m1(m2(...(mk(schema))...))), where the mi are
metaschematizers.

Given this, one may define the "metaschematic structural complexity" of a sequence in an
obvious way. This complexity measure recognizes not only repeated figures, but repeated figures


within repeated figures and so on. It is new in detail but not in concept -- it is a very close

approximation to the Lempel-Ziv complexity (Lempel and Ziv, 1978).

The Lempel-Ziv complexity misses a great deal of structure. By definition, no computable
approximation to the structural complexity can capture every type of structure. However, there is
a large gap between repetitions and completely general patterns. One way to fill this gap is with
the n'th order Boolean structural complexity. This complexity measure may be developed by
analogy to the schematic and metaschematic structural complexities.

We will need to use vector Boolean operations, which act coordinatewise on Boolean
sequences. For example the vector negation of 1010110 is 0101001; the vector conjunction of
11010 and 01011 is 01010. And we will say that a Boolean function (of an arbitrary number of
variables) is of order n if it can be written using less than n+1 disjunctions and conjunctions.
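The coordinatewise operations are simple to state in code; note that the conjunction (AND) of 11010 and 01011 is 01010, while their disjunction (OR) is 11011:

```python
# Vector Boolean operations act coordinatewise on binary sequences.
def vneg(a):
    return "".join("1" if b == "0" else "0" for b in a)

def vand(a, b):
    return "".join("1" if x == y == "1" else "0" for x, y in zip(a, b))

def vor(a, b):
    return "".join("0" if x == y == "0" else "1" for x, y in zip(a, b))

print(vneg("1010110"))         # 0101001
print(vand("11010", "01011"))  # 01010
print(vor("11010", "01011"))   # 11011
```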

An n'th order Boolean schematic program for computing an array x may be defined as a
pair (schema, schematizer), where the schema is a sequence composed of 0's, 1's, unlabeled
"don't care" symbols *, and k different labeled "don't care" symbols *1,...,*k. The schematizer
consists of: 1) a memory, which consists of a set of l binary sequences, and 2) a collection of k
vector Boolean functions f1,...,fk of order n, each of which takes exactly l sequences as
arguments. The i'th vector Boolean function computes the array to be substituted for the "don't
care" symbol *i.

From the n'th order Boolean schematic programs one may obtain n'th order Boolean
metaschematic programs in a natural way. The n'th order Boolean structural complexity is then
defined in terms of these programs, just as the metaschematic structural complexity is defined in
terms of ordinary metaschematic programs. It should be clear that any pattern can be expressed
as an n'th order Boolean schematic program for some n. Therefore, the n'th order Boolean
programs are one way of forming a bridge between Lempel-Ziv complexity and general
structural complexity.

These approximations are rather technical; they are not as conceptually elegant as the
structural complexity itself. However, it is clear that at least the n'th order Boolean complexity is
relevant to mentality, because no brain can compute Boolean functions of arbitrarily high order.

4

Intelligence and Mind

4.0 The Triarchic Theory Of Intelligence


Though there is a vast psychological literature on intelligence, it contains surprisingly few
insights into the foundational questions which interest us here: what is intelligence, and how can
it, practically or theoretically, be quantified? The problem is that, as Robert Sternberg has
observed, theories of intelligence are not all theories of the same thing. Rather, they tend to be
theories of different aspects of intelligence. To make matters worse, the theorists who propose
these theories rarely make it clear just what aspects of intelligence their theories embrace (1987,
p.141).

The psychology of intelligence has dwelled on the context-specific and the easily measurable.
But transcending the bounds of particular contexts is what intelligence is all about; and there is