"I'm the only President you've got." - Lyndon B. Johnson
"The Earth is the only center of the universe you've got." - a typical reply to Copernicus
Connectionism is dead. Or so Jerry Fodor and Zenon Pylyshyn (1988) claim: connectionism as a cognitive architecture is utterly fruitless, and no more time should be wasted trying to save it. It can technically still be used as an implementation model, but even then it is not a particularly beneficial theory.
I disagree. I believe that connectionism is not only alive and thriving but is beginning to suffocate classicism as a new direction for artificial intelligence research. For the purposes of this paper, however, I will merely attempt to show that despite Fodor and Pylyshyn's critique, connectionism is a viable alternative with quite a bit of promise.
First off, let's see just what is at issue here. Artificial intelligence has been an area of intense interest for psychologists, computer scientists, and philosophers. With the emergence of the unified discipline of cognitive science, these fields have been able to interchange ideas far more than ever in the past. The growth of knowledge in this field has been staggering over the past several decades alone.
Not only has significant progress been made towards intelligent machines, but we have also learned much about our own minds in the process. Models of human cognition have come and gone, but we do seem to be narrowing the range. Parallel to this have been the varying models of potential computer cognition. Despite considerable variation in detail and a few short-lived alternatives, a primary model for both human and computer thought has emerged.
The dominant models in each field are remarkably similar (and, I suspect, not coincidentally so) and, moreover, are easily compatible with each other. The primary style of computing that nearly all artificial intelligence research uses is the von Neumann style. The most relevant point is that it is a serial, symbolic style: a particular form of the more general Turing Machine. Although this is a very broad category, only a general understanding of it is necessary for my purposes.
Whereas von Neumann and Turing Machine models have dominated computer science since its inception, models of human cognition have been less consistent. However, for approximately the past 20 years, mild variations of Fodor's Language of Thought model (Fodor 1975) have reigned. I will explain it in detail in the next section, but suffice it to say that it is quite serial and symbolic as well. Both the computer and the cognitive models deal with a series of processes over discrete symbolic representations.
There has always been an upstart or two to challenge each view, though usually with little or no success. Recently, a new breakthrough within computer science has begun a revolution in nearly all the disciplines of cognitive science. It has been called a clear example of a Kuhnian paradigm shift in each of those fields. It is connectionism.
There were ancestors to connectionism in both computer science and philosophy of mind dating back to Kant, but, as Fodor and Pylyshyn so gladly point out, they failed. Why should this new version of an old theory rear its ugly head again and still be taken seriously? Because computer scientists have now developed sophisticated models that can empirically test what were once merely wild speculations and highly abstract arguments. Previously, there were only guesses and hypotheses about what these models could do; now we can see directly. And although computer scientists can now test some philosophical ideas directly, the greatest advances have been entirely new discoveries made with connectionist networks, discoveries that have surprised and inspired philosophers.
Later in the paper, I will give a fuller explanation of the broad field of connectionism, but first, a quick glimpse for those new to the paradigm. Connectionist models are not serial in nature; they are primarily parallel networks of highly interconnected nodes. These nodes, which can be vast in number, are very simple processors, each often just summing its input, perhaps performing a simple calculation, and then giving the appropriate output. This is a very different picture from the serial models, in which a single, complex processor does all of the work.
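To make the idea of a "very simple processor" concrete, here is a minimal sketch of one node. It assumes the node forms a weighted sum of its inputs and passes that sum through a logistic squashing function. That is one common choice among several; the description above does not commit to any particular function. The names `node_output` and `layer_output` are hypothetical.

```python
import math

def node_output(inputs, weights, bias=0.0):
    """One connectionist unit (illustrative): weighted sum of inputs, then a squash."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))  # logistic activation, an assumed choice

def layer_output(inputs, weight_rows, biases):
    # Each unit reads the same inputs and none depends on another's result,
    # so conceptually they all compute in parallel (here we simply loop).
    return [node_output(inputs, row, b) for row, b in zip(weight_rows, biases)]

print(layer_output([1, 0], [[1, 0], [0, 1]], [0, 0]))
```

No single unit does anything clever. Whatever the network computes lies in how many such units are wired together.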
To better understand the differences between the two let's compare how they would each handle a certain operation - data retrieval for instance. Rumelhart and McClelland developed a network that stores and retrieves information, in this case information about the various members of the Jets and Sharks gangs (1986). A serial machine would best store the information in a list-like database as shown in Figure 1. The computer would run a specific retrieval program that would scan certain rows and columns searching for keywords to get the desired information.
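The serial retrieval just described can be sketched as a plain row-by-row scan. Figure 1 is not reproduced here, so the table below is a hypothetical two-row excerpt built only from the details the text gives about Ralph and Art. The column names and the function `serial_lookup` are assumptions.

```python
# Hypothetical excerpt of the Figure 1 database; only Ralph and Art are described in the text.
GANG_TABLE = [
    {"name": "Ralph", "gang": "Jets", "age": "30s", "education": "Junior High",
     "marital": "single", "profession": "Pusher"},
    {"name": "Art", "gang": "Jets", "age": "40s", "education": "Junior High",
     "marital": "single", "profession": "Pusher"},
]

def serial_lookup(table, column, value):
    """Scan rows strictly one at a time, as a serial retrieval program would."""
    matches = []
    for row in table:  # one row per step, in order
        if row[column] == value:
            matches.append(row["name"])
    return matches

print(serial_lookup(GANG_TABLE, "gang", "Jets"))
```

Every query here is an explicit search through stored rows. The network described next answers the same questions without any search procedure.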
The network created by Rumelhart and McClelland works quite differently. A scaled-down version (for clarity) is diagrammed in Figure 2. Here the nodes are divided into groups: name, age, "profession", marital status, gang affiliation, and education. These are connected by way of hidden nodes. These nodes are considered "hidden" because they neither receive direct input nor produce any meaningful output. They are useful for connecting the different groups, and in some other connectionist network designs (as I will discuss later) they are essential for computation.
To retrieve information about a certain feature, you merely activate that node. For example, you can activate the Ralph node, and Ralph's characteristics will also be activated: Junior High, Jet, single, Pusher, and 30's. Since activation flows both ways, there is an added benefit which, though more useful in other networks, is illustrated here. Art is a single Jet pusher with a Junior High education but is in his 40's, so when each of those shared nodes is activated, activation trickles back to the Art node. This activation is weak enough that it does not greatly affect the rest of the network, but its strength reflects exactly how similar Ralph is to Art. This effect is essential to connectionist systems' ability to generalize, as I will address towards the end of my paper. You can retrieve any sort of information from this network. By activating the Jets node you can find out not only which of the individuals are Jets but also, thanks to two-way activation and generalization, which profession is most prevalent among them, all by stimulating that single node. So it is quite apparent that connectionist and classical systems approach computation very differently.
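The two-way spreading just described can be sketched as follows, under several simplifying assumptions:

- The network is drastically reduced to Ralph and Art only.
- Every link has the same weight.
- Each node's activation is updated synchronously as a decayed copy of its old value plus a fraction of its neighbors' activation, capped at 1.
- The probe node is held on throughout.

The `_inst` units stand in for the hidden nodes of Figure 2. The names `build_graph`, `spread`, `rate`, and `decay` are hypothetical. Rumelhart and McClelland's actual model also has inhibition within each group and resting activation levels, which this sketch omits.

```python
from collections import defaultdict

# Hypothetical fragment of the Jets and Sharks network (Ralph and Art only).
EDGES = [("Ralph", "Ralph_inst"), ("Art", "Art_inst")]
EDGES += [("Ralph_inst", p) for p in ("Junior High", "Jets", "single", "Pusher", "30s")]
EDGES += [("Art_inst", p) for p in ("Junior High", "Jets", "single", "Pusher", "40s")]

def build_graph(edges):
    graph = defaultdict(set)
    for a, b in edges:  # links are two-way: activation can flow in both directions
        graph[a].add(b)
        graph[b].add(a)
    return graph

def spread(graph, source, steps=6, rate=0.2, decay=0.5):
    """Synchronous spreading activation with the probe node clamped on."""
    act = {u: 0.0 for u in graph}
    for _ in range(steps):
        new = {}
        for u in graph:
            net = sum(act[v] for v in graph[u])
            new[u] = min(1.0, decay * act[u] + rate * net)
        new[source] = 1.0  # keep the probe active
        act = new
    return act

act = spread(build_graph(EDGES), "Ralph")
print(round(act["Art"], 4), round(act["30s"], 4), round(act["40s"], 4))
```

Probing Ralph lights up his properties. A small amount of activation then leaks back to Art through the properties the two share, while "40s" stays far weaker than "30s". This mirrors, in miniature, the similarity effect described above.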
Connectionism was originally inspired by neuroscience, but the analogy should not be taken too far, at least not presently. The ultimate goal of many connectionists is to reduce these nodes to electronic neurons and thereby bridge the gap between neuroscience and cognitive psychology. Others (in fact a great majority) are content with an inexact structural similarity between the neural level and the cognitive level. It is important to emphasize that connectionism is still in its infancy. The general foundation has only been laid in the past few years, and future directions are still being fine-tuned.
The question now arises: which model of human cognition is correct? It would appear that connectionist networks are incompatible with the Language of Thought model. This is precisely the main thrust of Fodor and Pylyshyn's critique. Two main options are open: a) prove that connectionism, or a related form of it, is indeed compatible with Language of Thought, or b) prove that the connectionist model of cognition is better than Language of Thought. The first option is not very attractive to connectionists, and the second is forbidding to say the least. However, these are actually only the two ends of a continuum of possible positions. A third position, intermediate between the two, holds that it is possible to form a new cognitive model that is at least as good as Language of Thought, if not better in some ways, while still preserving some of its best aspects. I shall argue for this third position.
Before we dive in, let's do some road-mapping for this paper. For starters, I will explain in greater detail the Language of Thought model and some of Fodor's main arguments in favor of it. I will then summarize Fodor and Pylyshyn's critique of connectionism, presenting their arguments as to why connectionism cannot support a Language of Thought. The primary issues are the ways in which connectionism handles mental representations and mental processes.
I then begin the defensive portion of the paper with an in-depth
examination of how connectionism actually handles complex mental
representations and a look at the preliminary work on structured
mental processes in connectionist networks. From these I hope to
show that Fodor and Pylyshyn's argument fails and that
connectionism is at least a viable alternative. To finish up, I
will compare the two approaches, both as they stand now and in
their potential.
Fodor begins to depart by claiming that mental states (beliefs
and desires in particular) have causal roles in behavior, but
that their meaning does not arise from their causal relations to
one another. Fodor believes in a more innate style of semantics.
He believes that representations gain their meaning as an
intrinsic property by virtue of causal connections with what they
represent, and that relations with other representations are
inconsequential. He rejects the view that intentionality arises
from the relations between mental states, a view that he calls
"functional-role semantics". I will not belabor this point right
now; I will go into more detail about it later, since Fodor and
Pylyshyn criticize connectionism on the very point that meaning is
determined by relations. Suffice it to say that Fodor
believes that language of thought models cannot gain
intentionality through internal relations.
To have this combinatorial structure means that mental states
can have an internal structure consisting of other mental states.
Mental states can be combined with other mental states to become
more complex states. For example, the thought "raise my
left arm and hop on my right leg" is a complex mental
state composed of the two mental states "raise my left
arm" and "hop on my right
leg".[1]
For clarity, we will only deal with intentions to make
something true. So the intentional state of wanting to raise my
left hand only means that I want to make it true that my left
hand is raised. Whether or not this can cover all intentions
remains to be seen, but it does seem to be an apt idealization.
This move is useful in dealing with such things as unfulfilled
intentions and complex, abstract intentions.
A clear analogy of this combinatorial structure was presented
by Steven Schiffer (Fodor 1987, Sterelny 1991). He developed the
notion of an intention box. Being in an intentional state could
be seen as putting something into the intention box. The
intention box "churns and gurgles" and at least
attempts to make it happen. There may be obstacles that
prevent the intention from being realized, so our intention boxes
can only do so much (e.g. I can be in the intentional state that
I will be Emperor of the Universe all I want, but it sure is not
going to happen any time soon).
For example, if I am in the intentional state of wanting to
raise my left hand, I simply put the proper mental state thing
into my intention box and voila! My left hand raises. Now for
raising my left hand and hopping on my right foot, I put a mental
state thing into my intention box but, under the language of
thought theory, this mental state thing is made up of the mental
states of the two individual actions conjoined in the proper
fashion. Those who disagree with a language of thought on this
point would want to have a third, completely unrelated mental
state for raising my left hand and hopping on my right foot.
This mental state would have absolutely no relation at all to the
other two except by coincidentally similar behaviors.
Fodor believes that this violates a fundamental principle of
scientific inference: postulate as few "accidents" as
possible.
Also, the similarity in behavior between the mental state of
raising my left hand and hopping on my right foot and the
two separate mental states of raising my left hand and
hopping on my right foot is purely accidental in AIR.
They are unrelated mental states and any similarity in outcome is
coincidence. Occam's Razor was developed to shave off such ad
hoc explanations as this. It seems quite clear that this
generalized theory of AIR is methodologically vulnerable at best.
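The contrast between the two views of the combined intention can be sketched with two toy data types. This is an illustration, not either theory's own formalism. The classes `Act` and `Both` and the function `constituents` are hypothetical names. The sketch assumes a language of thought builds the combined state out of the two simpler states, while AIR treats it as a fresh, unanalyzed token.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Act:
    description: str

@dataclass(frozen=True)
class Both:
    first: object
    second: object

raise_left = Act("raise my left hand")
hop_right = Act("hop on my right foot")

# Language-of-thought style: the combined state is built from the two simpler states.
combined = Both(raise_left, hop_right)

# AIR style: the combined state is a new, unstructured token with no parts.
combined_atomic = Act("raise my left hand and hop on my right foot")

def constituents(state):
    """Return the simple states a (possibly complex) state is built from."""
    if isinstance(state, Both):
        return constituents(state.first) | constituents(state.second)
    return {state}

print(constituents(combined))
print(constituents(combined_atomic))
```

On the structured reading, the shared behavior follows from shared parts. On the atomic reading, nothing links `combined_atomic` to `raise_left`, so any similarity in outcome is left as an accident.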
Both Fodor (1987) and Kim Sterelny (1991) argue that mental
representations are necessary. Sterelny cites problem solving in
support of this. People use a hypothesis/test method for problem
solving (this includes the full range from abstract psychology
experiments to mundane analysis of what is happening
in one's environment). This methodology requires some form of
representation in which to formulate hypotheses to test. If all
of our hypothesis testing had to be done externally rather than
mentally, then humans would most likely not have evolved quite as
far as we have. Mental trial-and-error has definite survival
advantages over external trial-and-error.
Fodor sticks with the sentence comprehension examples and
argues that mental processes are defined by the mental
representations over which they function (1987). Mental
processes, on his account, are a series of mental representations
and a transition from one to another is completed by performing
operations on the representations. For example, Wh- questions
such as "Who is John?" can be easily converted to
"John is who?" by performing simple operations on a
mental representation that can be seen as a basic parsing tree.
Aunty says a somewhat similar thing with her "Unknown
Neurological Mechanism" that converts the expression before
the listener hears it. Aunty's explanation is just hand-waving,
whereas Fodor offers an explanation for how this occurs - a
language of thought.
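The kind of operation Fodor has in mind can be sketched as a transformation over a toy phrase-structure tree, written as nested `(label, child, ...)` tuples. The grammar, the labels, and the functions `leaves` and `swap_subject_and_object` are hypothetical simplifications, not Fodor's own analysis. The point is only that the operation moves whole constituents and leaves the rest of the structure untouched.

```python
def leaves(tree):
    """Read the words off a tree of (label, child, ...) tuples, left to right."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

def swap_subject_and_object(tree):
    # Toy grammar: S -> NP VP, VP -> V NP. Exchange the two NP constituents;
    # the verb and the overall shape of the tree are left as they were.
    label, subj, (vp_label, verb, obj) = tree
    return (label, obj, (vp_label, verb, subj))

question = ("S", ("NP", "who"), ("VP", ("V", "is"), ("NP", "John")))
print(" ".join(leaves(question)))                           # who is John
print(" ".join(leaves(swap_subject_and_object(question))))  # John is who
```

Aunty's "Unknown Neurological Mechanism" offers no counterpart to `swap_subject_and_object`, because her representations have no parts for such an operation to act on.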
It is widely held that mental processes are
computational, so there must be some sort of structure that
allows one to transfer parts to and from other structures without
altering the rest. The parsing tree model of mental
representations is compatible with this (and not with AIR), and,
Fodor tells us, appears to be the best explanation so far.
Systematicity is illustrated by the ability to move parts of
complex structures to create new ones. Being able to have the
mental state John loves Michael also means that you
necessarily can have the thought Michael loves
John. It is quite apparent that thought is systematic.
Parts can be rearranged, removed, and added according to a set of
rules to create novel mental states.
AIR must rely on a phrase-book type of system. Since AIR's
mental states are indivisible they cannot be systematic. All
that Aunty can do is memorize a massive list of sentences, just
like a non-native speaker reading from a large phrase-book.
However, this phrase book would have to be astronomically
expansive from the beginning unless you can somehow add new
thoughts to it. With thoughts being whole objects, a theory
explaining the addition of new ones has its work cut out for it.
With all thoughts coming from a massive phrase book, it is
also possible, and even likely, that the situation of being able
to think John loves Michael but being utterly unable, no
matter how hard you tried, to ever think Michael loves
John, could arise. Not only does this situation seem never to
occur in any moderately complex organism; it is implausible even
to conceive of it occurring. This position is not seriously
supported by anyone that I have come across.
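The difference between a combinatorial system and a phrase book can be sketched in a few lines. The sketch assumes thoughts of the simple form relation(x, y) and a phrase book that stores whole sentences. The names `combinatorial_thoughts` and `PHRASE_BOOK` are hypothetical.

```python
from itertools import permutations

def combinatorial_thoughts(names, relations):
    """Every thought of the form relation(x, y) that can be built from the given parts."""
    return {(rel, x, y) for rel in relations for x, y in permutations(names, 2)}

# Hypothetical phrase book: each thought is stored as a single unanalyzed whole.
PHRASE_BOOK = {"John loves Michael"}

lot_capacity = combinatorial_thoughts(["John", "Michael"], ["loves"])
print(sorted(lot_capacity))
print("Michael loves John" in PHRASE_BOOK)  # False: no entry, so no such thought
```

In the combinatorial system, the capacity to think "John loves Michael" brings "Michael loves John" along automatically, because both are built from the same parts. The phrase book has no such guarantee. That gap is exactly the implausible situation described above.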
Language of thought models with their combinatorial semantics
can explain the systematicity of thought just as combinatorial
semantics does for public languages. However, in a (somewhat)
noble move, Fodor points out a flaw in this reasoning. The
argument that thought has combinatorial structure because of
systematicity runs as follows:
As Fodor puts it, "one man's affirming the consequent is
another man's inference to the best explanation" (Fodor 1987,
p.149). Since Fodor is only trying to prove that LOT is better
than his opponents' theories, he merely has to show that it
explains the facts better than the rest. He does not have to
prove that it is necessarily true. To show further how affirming
the consequent can yield a reasonable, if not certain, conclusion,
consider that all of science rests on this very logical fallacy
(If my theory is true, then the evidence will come out this way.
The evidence came out this way, so my theory is true.)
Compositionality is quite similar to systematicity and Fodor
and Pylyshyn even say that "perhaps they should be viewed as
aspects of a single phenomenon" (1988, p.41). Pretty much
what systematicity is to mental processes, compositionality is to
mental representations. It is the clearly defined structure that
the mental processes operate over. A mental representation is
compositional if and only if it is structured in such a way that
the information in it is accessible to mental processes. So the
systematicity of mental processes depends on the compositionality
of their representations. The compositional structure of the
representations depends in turn on combinatorial syntax. The
arguments surrounding systematicity apply directly to
compositionality, at least for now.
It is quite clear that language of thought models provide a
better explanation of systematicity and compositionality than
AIR's phrase book model. Not only is language of thought better
than AIR, it is apparent that due to systematicity, AIR leads to
unacceptable conclusions. To conclude this portion, keep in mind
that the characteristics that a language of thought requires are
systematicity in mental processes and compositionality of mental
states, both of which derive from a language of thought's
combinatorial syntax and semantics. Now I will look at Fodor
and Pylyshyn's analysis of artificial intelligence with
connectionism as their main target.
Fodor and Pylyshyn define a connectionist system as a large
network of nodes that sum all of their input and then output some
value according to a certain simple function. Rather than having
some single, complex processor to carry out the mental functions,
there are a vast number of very simple processors that together
carry out the mental functions. This is the fundamental
difference between connectionism and classical systems. This
apparently small shift in processing style actually leads to some
very large differences on key issues.
It is important to note that Fodor and Pylyshyn discuss a
localist version of connectionism. This means that
semantics is assigned directly to individual nodes as opposed to
being distributed over a large number of nodes. So in a localist
network, each node would have a specifically assigned meaning,
whereas in a distributed network groups of nodes would be
assigned meaning.
Fodor and Pylyshyn see this shift to distributed
representations as irrelevant to the issue at hand. In their
eyes, distributed representations have no advantage over localist
models. They are both still fundamentally connectionist and
consequently their arguments should apply similarly. This point
is clearly made in their first footnote:
Fodor and Pylyshyn then point out that the only relevant
"primitive relation" in connectionism is the causal
relation between nodes (i.e., such and such node affects these
other nodes in such and such a way). Classical systems not only
recognize causal relations but also "a range of structural
relations, of which constituency is paradigmatic" (Fodor and
Pylyshyn, 1988, p.12). This fact, that classical systems have
constituency and other structural properties and connectionist
systems do not, is the focal point of their argument: after all,
constituency is where Aunty failed.
Further clarifying connectionism (mostly through contrasting
it with classical systems), Fodor and Pylyshyn see the
distinction becoming most apparent and relevant with regard to
issues of mental representation and mental processes. Classical
systems have a combinatorial syntax and semantics for their
representations whereas connectionist networks do not. Fodor and
Pylyshyn supposedly establish this point with the classic
arguments for language of thought that I presented earlier -
compositionality and systematicity.
The model used by Fodor and Pylyshyn as an allegedly typical case
of connectionism is reprinted as Figure 3. It is a simple
logical inference machine, a paradigmatic case of language of
thought. It is a network in which the complex predicate A &
B can be broken down to either or both of its constituents, A and
B. Understandably, in reality such a simple network would only
be part of a larger machine, but Fodor and Pylyshyn use it as a
connectionist network stripped to its essence, with no frills at
all. If this "purified" connectionist system can
support a language of thought, then there's no problem at all.
However, if it cannot, then connectionism is just out of luck.
They now use the compositionality of mental representations
and the systematicity of mental processes as the two primary
tests of connectionism and its potential for a language of
thought. These two characteristics are the primary indicators of
combinatorial structure, the foundation of a language of thought.
Our test network can support a language of thought if and only
if it possesses these.
Fodor and Pylyshyn's logical inference network does not
possess any constituent structure, or any internal structure
whatsoever. Each node is atomic. Even node 1, which
represents the molecular statement A & B, is itself
atomic. It is just a single site of activation and lacks
structure of any kind, let alone compositional structure.
One might be tempted to say that since node 1 represents A
& B, a statement that obviously has constituent structure,
this representation can then be split into its individual parts.
However, this move misinterprets the nature of this connectionist
system. The fault lies in the fact that the characteristics of
the labels (i.e. compositionality) are being attributed
to the representations (which have no internal
structure). The logical statements A, B, A & B are all node
labels. These labels are conventions attached to them by the
programmers to make the network's activity meaningful.
The representations are merely the nodes
themselves. Each representation is a simple yes/no, on/off, 1/0
level of activation. Since this network is a localist one, each
node is by definition a representation. With Fodor and
Pylyshyn's formulation of it, as with most (but not all) localist
nets, there are no other levels of representation. There is only
the activation of individual nodes.
It is impossible for a system with compositionality to have
representations that are all atomic. The two are mutually
exclusive. In fact, without molecular representations, there can
be no structure at all present in them. Fodor and Pylyshyn
clearly point out that a base, primitive object that cannot be
broken down obviously cannot have any structure at all. Language
of thought gets around this by developing molecular
representations from the base atomic ones. Localist
connectionist networks, at least this particular one, are not
able to make this move.
It is possible to perform operations that end up exhibiting
systematic rules, as our logical inference device does. However,
these are not intrinsic operations. Again, the particular labels
are the source of systematicity and not the representations
themselves.
The labels of the system play no causal role whatsoever. The
fact that node 1, when activated, tends to activate node 2 and
node 3, is the only causal relation present. The particular node
labels are utterly irrelevant. We could relabel node 1 'Bill the
Clown', node 2 'Bill', and node 3 'clown'. It would appear to be
systematic still. But what about 'Bill the Clown', 'elephant',
and 'lampshade' being the respective node labels? 'Elephant'
and 'lampshade' are not constituents of 'Bill the Clown', yet the
causal relations among the nodes remain unchanged. To
be anthropomorphic, the computational system couldn't care less
what labels we humans put onto it; it will compute exactly the
same way. This is another obvious point that Fodor and Pylyshyn
elaborate.
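The relabeling point can be made concrete with a sketch of the three-node inference net of Figure 3. The sketch assumes node 1 excites nodes 2 and 3 with unit weights and that a node fires when its summed input exceeds a threshold. The weights, the threshold, and the names `run` and `describe` are illustrative assumptions. The labels live in a separate dictionary that the propagation step never reads.

```python
# Localist inference net (assumed weights): node 1 excites nodes 2 and 3.
WEIGHTS = {1: {2: 1.0, 3: 1.0}, 2: {}, 3: {}}

def run(active_nodes, weights, threshold=0.5):
    """One propagation step: a node fires if its summed input exceeds the threshold."""
    incoming = {n: 0.0 for n in weights}
    for src in active_nodes:
        for dst, w in weights[src].items():
            incoming[dst] += w
    fired = {n for n, total in incoming.items() if total > threshold}
    return set(active_nodes) | fired

# Two labelings of the very same nodes; run() never looks at either one.
LABELS_A = {1: "A & B", 2: "A", 3: "B"}
LABELS_B = {1: "Bill the Clown", 2: "elephant", 3: "lampshade"}

def describe(nodes, labels):
    return sorted(labels[n] for n in nodes)

result = run({1}, WEIGHTS)
print(describe(result, LABELS_A))  # ['A', 'A & B', 'B']
print(describe(result, LABELS_B))  # ['Bill the Clown', 'elephant', 'lampshade']
```

The computation yields the same set of active nodes under either labeling. Whatever "systematicity" the first labeling seems to show comes from the labels, not from the network.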
So just what process do connectionist networks operate by?
Fodor and Pylyshyn claim that they operate by association alone.
They are trained to relate representations according to
statistical relations that emerge from experience. A network
will be so trained that if it gets a certain input it should
produce a certain output. If it does not, then the network is
altered so that it will do better next time. In other words,
connectionist networks (supposedly) are trained only to associate
a certain output with a certain input. However, association is
not a structure sensitive operation. There is nothing causally
relevant within the representations, only their association to
other representations. This is not a viable option for
systematic processes either.
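The training regime Fodor and Pylyshyn describe (present an input, compare the output with the desired one, adjust) can be sketched with the delta rule, also known as the Widrow-Hoff rule, on a single-layer linear associator. The delta rule is only one standard learning procedure; Fodor and Pylyshyn's characterization covers associative training in general, including methods such as backpropagation. The names `train_association` and `recall` and the parameters `lr` and `epochs` are hypothetical.

```python
def train_association(pairs, n_in, n_out, lr=0.1, epochs=50):
    """Delta-rule training of a single-layer linear associator (illustrative)."""
    w = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for x, target in pairs:
            for j in range(n_out):
                y = sum(w[j][i] * x[i] for i in range(n_in))
                err = target[j] - y  # how far off was this output unit?
                for i in range(n_in):
                    w[j][i] += lr * err * x[i]  # nudge weights to shrink the error
    return w

def recall(w, x):
    return [sum(wji * xi for wji, xi in zip(row, x)) for row in w]

pairs = [([1, 0], [1, 0]), ([0, 1], [0, 1])]
w = train_association(pairs, n_in=2, n_out=2)
print([round(v, 3) for v in recall(w, [1, 0])])
```

After training, input [1, 0] produces an output close to [1, 0]. Nothing in the procedure inspects any internal structure of the inputs; it only strengthens input-output associations. That is the feature Fodor and Pylyshyn's objection targets.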
Fodor and Pylyshyn do not completely nail the lid shut on
connectionism though. They are kind enough to present the only
alternatives they see open to die-hard connectionists:
My argument from here on does not follow any of these options.
Instead I use the work of many other philosophers and computer
scientists to create a fifth one: a connectionist variant of the
language of thought. However, I will also touch on the last two
options that Fodor and Pylyshyn offer. It would seem that the
third can be used to refute their entire argument (as I will
argue in the next section), and the fourth option may be more
prevalent than they realize (which I will discuss briefly in the
last section).
Fodor and Pylyshyn argue that a connectionist network cannot
support a language of thought. Their logical inference example
shows a case where this is obviously true. So their conclusion
is that connectionism fails as a form of cognitive
architecture. They explicitly offer the possibility,
however, that connectionism is a viable option as an
implementational architecture. More simply stated,
Fodor and Pylyshyn believe that connectionism has no place in
philosophy and should be confined solely to the computer science
domain. This may very well be what they set out to prove, but
they did not succeed.
Their arguments, as they stand, prove that no
connectionist network can support a language of thought. It is
not possible to develop any structure for mental representations
and processes within a connectionist network. Fodor and Pylyshyn
argue that due to its associational basis, it is impossible to
form any structured mental process and representations in a
connectionist system. I emphasize no
connectionist network because ones that are attempted
implementations of classical cognitive architectures are still
fundamentally connectionist networks. Therefore, it is
impossible to implement a classical cognitive architecture (and
therefore a language of thought) on a fundamentally connectionist
network. However, Fodor and Pylyshyn then claim that they proved
that no connectionist systems can instantiate a language of
thought except for implementations of classical
systems. So what their argument proves goes far beyond what they
conclude from it, not to mention that it also goes beyond what is
generally accepted.
Chalmers draws an analogy with a mad scientist who
proves that Earth is the only inhabited planet in the universe.
First this scientist runs through an a priori proof that
the proper biochemical reactions for life to ever evolve cannot
occur. It is necessarily impossible for life to exist. However,
life obviously exists on Earth. So the Earth is the only planet
with life on it! If you are thinking ad hoc, you are right.
This maneuver is the one that Fodor and Pylyshyn subtly use.
This is a big problem for Fodor and Pylyshyn. They could say
that they were wrong in their conclusion and that implementations
of classical architectures are not an option either. However,
connectionist implementations of classical architectures that
work just as well as standard implementations are entirely
possible. As a computational system, both connectionist networks
and von Neumann/Turing Machines are equivalent. Both can compute
anything computable. Consequently, Fodor and Pylyshyn are out of
luck there.
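One small step toward the equivalence claim can be shown concretely: simple threshold units can implement the logic gates from which classical digital circuits are built. NAND alone is functionally complete for Boolean logic. This sketch does not prove full Turing equivalence, which also requires unbounded memory. The weights and biases below are one workable choice among many, and the names `unit`, `NAND`, `OR`, `AND`, and `xor` are hypothetical. The XOR example also shows why hidden units matter: no single threshold unit can compute XOR, but two units feeding a third can.

```python
def unit(weights, bias):
    """A threshold unit: outputs 1 if the weighted input sum plus bias is positive."""
    def fire(*inputs):
        return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0
    return fire

NAND = unit([-1, -1], 1.5)
OR = unit([1, 1], -0.5)
AND = unit([1, 1], -1.5)

def xor(a, b):
    # The OR and NAND units act as hidden units whose outputs feed an AND unit.
    return AND(OR(a, b), NAND(a, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
```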
Their next option is to claim that their argument applies only
to cognitive architectures and not implementational ones. In
fact, this seems to be the most likely direction they
would go. Still no luck though. Their arguments are too strong
and end up attacking connectionism as an implementational
architecture as well. They prove that their example cannot
support structured representations or processes of any sort as
either a cognitive or an implementational architecture. I do not
contest this. The example they use cannot implement a classical
cognitive architecture, and therefore, even indirectly, a
language of thought.
The problem is not that the example is so simple and small;
the fallacy lies in its purely localist nature. Plenty of useful
and well-documented connectionist networks use atomic, localist
representations. However, none of them are attempts at modelling
cognition. By analogy, I could very well look quite in-depth
into this word processing program that I am currently using and
claim that it cannot be even remotely cognitive and conclude that
no von Neumann computer (or Turing machine, for that matter) can
be cognitive.
Sounds quite ridiculous, doesn't it? It parallels Fodor and
Pylyshyn's argument exactly. They take an extreme example of
connectionism that no true connectionist would support as a model
of cognition, and prove that it cannot support a language of
thought. Since all other aspects of connectionism are
"confusions and irrelevancies" (Fodor and Pylyshyn
1988, p. 6), all connectionist networks are incapable of
supporting a language of thought.
Even after the substantial literature on the subject, Fodor
still fails to see its importance. He still claims that
"connectionists clearly assume that there are elementary mental
representations (typically labeled nodes), and that these have
both semantic and causal properties." (1994, p.96) However
this is far from being assumed at all. In fact it is the very
thing connectionists argue against.
Fodor and Pylyshyn's model, though, is only halfway to the
true models of cognitive architecture that are supported by
connectionists. This example is actually atomic symbols
connected by associations. As Chalmers puts it, it is
"symbolic AI with soft constraints" (Chalmers 1990b,
p.343). It is a blend of the two positions using the
representation style of one and a processing style of the other.
On top of this, these two characteristics that Fodor and Pylyshyn
decide to blend are the weakest of each. In the classic view,
atomic symbols are pretty powerless unless they are combined
using systematic rules. In the connectionist view, the
associations between representations are also quite powerless,
except that their representations are distributed
ones with a great deal of internal structure. Clearly, Fodor and
Pylyshyn's example is not the standard for connectionism.
What Fodor and Pylyshyn overlook is the primary characteristic
of potentially cognitive networks - distributed representation.
They brush it off nonchalantly. As we will see, though, it can
be a powerful, not to mention relevant, element.
With distributed representation, the basic level of
representation is not a single node, but the pattern of
activation over a group of nodes. This is a far cry from the
atomic tokens within classical AI, and far more difficult to
understand. Atomic tokens are easily understood, most likely
because of their parallel with variables in computer programs and simple
algebra. Understanding distributed representations requires a
massive theoretical shift.
For one thing, the nodes themselves are not exactly the
representation. They are merely the objects used to instantiate
the representations. Consequently, a given set of nodes within a
network is able to instantiate a large variety of
representations. To give a simplified example, say that we have
a connectionist network within which a group of 5 nodes is
responsible for the representations. The rest of the network can
simply be a camera for input, and a speech box that outputs
"Hey, that's a ______!". Now particular patterns of
activation will correspond to particular representations. For
example, 01101 could represent cat, 01100 could represent
dog, and 10011 rock. Whenever those 5 nodes match
one of those activation patterns, they are said to represent that
thing. The 5 nodes could represent any of them or none of them.
It's the particular patterns that are the representations.
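The 5-node example can be sketched in a few lines of Python. This
is a toy illustration under stated assumptions, not a trained
network: the camera and speech box are reduced to a lookup, and
the helper names (`read_out`, `speech_box`, `patterns_using_node`)
are hypothetical. It shows that the representation is the whole
activation pattern, and that a single node takes part in several
representations.

```python
# Activation patterns from the 5-node example: each representation is a
# pattern over the same five nodes, not a node of its own.
PATTERNS = {
    "cat": (0, 1, 1, 0, 1),
    "dog": (0, 1, 1, 0, 0),
    "rock": (1, 0, 0, 1, 1),
}


def read_out(activations):
    """Return the label whose whole pattern matches the activations, else None."""
    for label, pattern in PATTERNS.items():
        if tuple(activations) == pattern:
            return label
    return None


def speech_box(activations):
    """Hypothetical output stage: name the thing if a known pattern is active."""
    label = read_out(activations)
    return f"Hey, that's a {label}!" if label else "(no recognized pattern)"


def patterns_using_node(index):
    """Labels whose pattern has the given node switched on."""
    return [label for label, p in PATTERNS.items() if p[index] == 1]


print(speech_box((0, 1, 1, 0, 1)))  # Hey, that's a cat!
print(patterns_using_node(1))       # ['cat', 'dog']
```

The same five nodes are reused for every item; only the pattern
over them changes.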
It is quite a change in representational structure to say the
least. All that is really necessary to understand at the
beginning is that representations and nodes are not assigned on a
one-to-one basis as in the Fodor and Pylyshyn example. Instead,
the activation pattern over a group of nodes is the
representation.
It is also important to note that these representations are
complex without being molecular. According to classical AI,
there are only atomic and molecular representations. Atomic
representations are indivisible, simple pieces: each is one piece
with no internal structure. Molecular representations are directly
composed of atomic or other molecular representations. You can
remove a part and it will merely be the same representation
missing that piece. For example, a representation may go from
"Ralph ate the meat that was spoiled." to "Ralph
ate the meat."
Distributed representations possess characteristics of both
atomic and molecular representations. They are complex, being
composed of the activation pattern of 2 or more nodes: in the
above examples, each representation had 5 distinct parts.
However, qua representations, they are indivisible. A
distributed representation depends for its identity on all of the
activations. You cannot remove even one without altering the
entire representation. In the distributed cat
representation, each node does not correspond to a particular
part or feature of the cat: the representation is one holistic
entirety. So distributed representations are complex yet
indivisible. When I look at connectionist compositionality in a
little while, I will go more in depth into these two different
forms of complexity and internal structure.
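A companion sketch, using the same hypothetical patterns and
hypothetical helper names, shows the sense in which these
representations are indivisible: deleting one node's activation
does not leave "cat minus one feature." The truncated pattern is
no longer the cat pattern at all, and if the last node is the one
removed, cat can no longer be told apart from dog.

```python
PATTERNS = {
    "cat": (0, 1, 1, 0, 1),
    "dog": (0, 1, 1, 0, 0),
    "rock": (1, 0, 0, 1, 1),
}


def drop_node(pattern, index):
    """Remove one node's activation from a pattern."""
    return pattern[:index] + pattern[index + 1:]


def indistinguishable_after_drop(label, index):
    """Labels whose patterns collapse onto `label`'s once node `index` is removed."""
    target = drop_node(PATTERNS[label], index)
    return [other for other, p in PATTERNS.items() if drop_node(p, index) == target]


print(drop_node(PATTERNS["cat"], 4))           # (0, 1, 1, 0)
print(indistinguishable_after_drop("cat", 4))  # ['cat', 'dog']
print(indistinguishable_after_drop("cat", 0))  # ['cat']
```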
Another very large benefit of distributed representations is
their ability to generalize. Between the training methods and
the very nature of connectionist networks, systematic rules
naturally emerge, and the network will generalize beyond
its training set. Localist connectionist systems cannot do this
at all. It is also interesting to note that while classical
models are able to do this, they must be specially modified to do
so. With distributed representational networks, this
generalizing is automatic.
Lastly, a major bonus for distributed representations is that
they are able to perform operations and possess properties that
localist models cannot. The type of operations and processes
that Fodor and Pylyshyn are looking for in order to support a
language of thought fall within this group. As they clearly
showed, localist models cannot account for the big two:
compositionality and systematicity. I will show later that
connectionist networks that use distributed representations can
have compositionality and systematicity.
Fodor and Pylyshyn's assumption that the localist/distributed
distinction is of little consequence is very wrong. Even in the
earliest connectionist research, it was clear that having
representations distributed over many nodes, rather than in a
one-to-one relation, allowed for far more flexibility and far more
possible characteristics. Distributed representations cannot
simply be reduced to localist ones. There is a relevant
theoretical difference involved, not just an implementational
one.
Fodor and Pylyshyn do actually argue against distributed
representations, but very briefly. Their argument is solely on
the basis of compositionality. They claim that since the
individual parts of the representation (each activation value)
are not separable and "semantically evaluable" on
their own, the internal structure involved is irrelevant:
representations must be divisible in order to be considered
compositional. For example, with the distributed cat
representation, the individual parts (each 1 and 0) carry no
meaning (are not semantically evaluable) so they are irrelevant
to the issue. So, again, distributed representations supposedly
add nothing new to the issue. Arguing against this will be the
focus of my next section.
However, compositionality itself is a bit broader than this.
Constituency is just one method to bring about compositionality.
The actual definition of compositionality that I will use is:
I believe that compositionality can take other forms.
Constituent structure is the most widely accepted and perhaps
even paradigmatic version, but I feel that it is not a necessary
one. There is a more general, catch-all category that involves
any method of representing a structured object in a way different
from Fodor's concatenative compositionality. Tim van Gelder (1990)
calls this alternative "functional compositionality".
A representation of a structured object is created with all
relevant information accessible, but not literally present.
The reason that Fodor and others do not believe
compositionality can exist without the constituent parts being
literally present is that they confuse where the structure must
be. Functional compositionality is representation of structure
rather than a structured representation. All of the information
concerning the original structure is accessible.
All that is needed is functional structure. For a
representation to be useful and compositional, it must merely
contain the information concerning the original structure. Fodor
did not realize this. He thought that the only way to do this
was to have the representation directly mirror the original
structure. There is no reason for this other than that it has
worked well so far.
Connectionism's problem is with concatenative
compositionality. Using the concatenative method, to represent
an object of arbitrary size, you would need an arbitrary amount
of space to represent it in. This is not feasible in a
connectionist network. The number of nodes is normally specified
at the onset, and all representations must fit within a given
number of nodes. Programming a connectionist network to change
its size as necessary is an extremely challenging task at best,
and perhaps even impossible to properly train at worst.
Connectionists are consequently left having to develop a
form of functional compositionality. This form would have to be
able to represent an object of arbitrary size with a fixed-size
representation. The answer to this is, of course, distributed
representations. If trained in the proper manner, a network can
consistently form representations of objects of varying size, as
well as consistently recover them with the original structure
still intact.
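To make the fixed-size idea concrete, the sketch below swaps in
Plate's holographic reduced representations (HRRs) instead of the
trained, RAAM-style encoders discussed here, because HRRs need no
training: words and sentence positions are random vectors, binding
is circular convolution, and decoding uses an approximate inverse
followed by a clean-up step that snaps the noisy result to the
nearest known word. The dimension, vocabulary, and helper names are
all hypothetical. A concatenative encoding would need one slot per
word, so its width grows with sentence length; the HRR trace stays
DIM numbers wide. Recovery is statistical, and stays reliable only
while sentences are short relative to DIM.

```python
import math
import random

DIM = 512      # width of every encoded representation (hypothetical choice)
MAX_LEN = 8    # longest sentence this sketch can encode (hypothetical choice)
rng = random.Random(0)


def random_vector():
    """HRR-style random vector: i.i.d. Gaussian elements with variance 1/DIM."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(DIM)) for _ in range(DIM)]


def cconv(a, b):
    """Circular convolution, the HRR binding operation (O(DIM^2), for clarity)."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]


def involution(a):
    """Approximate inverse used for unbinding: a*[i] = a[-i mod n]."""
    n = len(a)
    return [a[-i % n] for i in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


WORDS = ["Ralph", "ate", "the", "meat", "that", "was", "spoiled"]
VOCAB = {w: random_vector() for w in WORDS}
POSITIONS = [random_vector() for _ in range(MAX_LEN)]


def encode(sentence):
    """Superpose position-word bindings: DIM numbers, whatever the sentence length."""
    trace = [0.0] * DIM
    for pos, word in zip(POSITIONS, sentence):
        trace = [t + b for t, b in zip(trace, cconv(pos, VOCAB[word]))]
    return trace


def decode(trace, length):
    """Unbind each position, then clean up to the most similar known word."""
    words = []
    for pos in POSITIONS[:length]:
        noisy = cconv(involution(pos), trace)
        words.append(max(VOCAB, key=lambda w: dot(noisy, VOCAB[w])))
    return words


short = ["Ralph", "ate", "the", "meat"]
longer = WORDS  # "Ralph ate the meat that was spoiled."
trace_short, trace_long = encode(short), encode(longer)
recovered = decode(trace_long, len(longer))

print(len(short) * DIM, len(longer) * DIM)  # concatenative widths: 2048 3584
print(len(trace_short), len(trace_long))     # HRR widths: 512 512
print(recovered == longer)
```

The design choice to show here is the trade-off: the HRR width is
fixed in advance, and what grows with sentence length is the noise
that the clean-up step has to absorb.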
With each of these models, there is an input which is to be
represented. This input is then encoded into a distributed
representation. It is also possible to output the original
pattern of input given only the distributed representation form
of it. They all have their own method for doing this, but they
each perform the same basic function - transform the input into a
distributed representation. The differences between these
models are irrelevant to our discussion.
The way that these representations can be understood is by
picturing them as vectors within a multi-dimensional space.
Sounds easy, huh? It actually is not too bad. Going back to our
5 node example with the cat, dog, and rock
representations, each of these representations would correspond
to a vector within a 5 dimensional space (corresponding to the
number of nodes). The activation values of the nodes are the
representation's coordinates within this space. This move is helpful in
understanding the role of relations between representations as
well as the method for achieving compositionality.
If two representations are very similar, then they are near
each other within this multi-dimensional space. For example,
dogs and cats are somewhat similar (at least more so than dogs
and rocks); consequently, their activation patterns are quite
similar (differing in only one value).³
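The geometric picture can be checked directly on the three example
patterns. This is a minimal sketch, assuming Euclidean distance as
the measure of similarity (the text does not commit to one; Hamming
distance, the count of differing nodes, is shown alongside), with a
hypothetical `nearest` helper.

```python
import math

PATTERNS = {
    "cat": (0, 1, 1, 0, 1),
    "dog": (0, 1, 1, 0, 0),
    "rock": (1, 0, 0, 1, 1),
}


def hamming(a, b):
    """Number of nodes whose activations differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(label):
    """The other representation closest to `label` in the 5-dimensional space."""
    others = [name for name in PATTERNS if name != label]
    return min(others, key=lambda name: math.dist(PATTERNS[label], PATTERNS[name]))


for a, b in [("cat", "dog"), ("cat", "rock"), ("dog", "rock")]:
    print(a, b, hamming(PATTERNS[a], PATTERNS[b]),
          round(math.dist(PATTERNS[a], PATTERNS[b]), 3))
# cat dog 1 1.0
# cat rock 4 2.0
# dog rock 5 2.236
print(nearest("cat"))  # dog
```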
Vector mathematics is
quite handy with connectionist models. The amount of similarity
and difference can be determined quantitatively. Also, the
method of combining two representations without literally
attaching one to the other most often uses vector multiplication.
The two vectors are combined to make a third one. This third
representation bears no surface resemblance to either of them,
but both are contained within it and are always fully
recoverable.
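The text does not name a particular multiplication scheme, so the
sketch below substitutes one well-known option, Smolensky-style
tensor product binding: each filler vector is multiplied (outer
product) with a role vector, and the results are summed into one
matrix. The role names and values are hypothetical; they are chosen
orthonormal, which is what makes recovery exact here. Note that in
this scheme the combined representation's size depends on the role
vectors, so it illustrates "combining without attaching" rather
than a fixed-size encoding.

```python
import math

# Fillers: the 5-node patterns from the running example.
FILLERS = {
    "cat": (0, 1, 1, 0, 1),
    "dog": (0, 1, 1, 0, 0),
    "rock": (1, 0, 0, 1, 1),
}
# Hypothetical role vectors; orthonormal (unit length, dot product 0).
ROLES = {"agent": (0.6, 0.8), "patient": (-0.8, 0.6)}


def bind(filler, role):
    """Outer product of a filler and a role: a 5 x 2 matrix."""
    return [[f * r for r in role] for f in filler]


def superpose(*matrices):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*matrices)]


def unbind(rep, role):
    """Multiply the combined matrix by a role vector to get that role's filler back."""
    return tuple(sum(m * r for m, r in zip(row, role)) for row in rep)


def cleanup(vec):
    """Snap a recovered vector to the nearest known filler."""
    return min(FILLERS, key=lambda name: math.dist(vec, FILLERS[name]))


# "cat" as agent and "rock" as patient, blended into one matrix.
rep = superpose(bind(FILLERS["cat"], ROLES["agent"]),
                bind(FILLERS["rock"], ROLES["patient"]))
columns = [tuple(row[j] for row in rep) for j in range(2)]
print(any(col in FILLERS.values() for col in columns))  # False: neither filler appears literally
print(cleanup(unbind(rep, ROLES["agent"])))    # cat
print(cleanup(unbind(rep, ROLES["patient"])))  # rock
```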
On a slight side note, this also helps to understand the
natural generalizing ability of connectionist networks. Training
a network on one representation spills over to all other
representations to a degree that is directly proportional to
their similarity. The more similar they are the more the
training of one will affect the other. So training our network
on cat would greatly affect its training on dog but
hardly affect rock since it is quite unrelated.
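The spillover can be made concrete with the simplest possible
learner, a single linear output unit trained by the delta rule (an
assumed stand-in; the text does not specify a training method).
After one update on cat, the output for every other pattern shifts
in proportion to its dot product with cat, that is, to how many
active nodes it shares with cat. With a localist coding (one
dedicated node per item), shown for contrast, nothing spills over.

```python
# Distributed patterns from the running example.
DISTRIBUTED = {
    "cat": (0, 1, 1, 0, 1),
    "dog": (0, 1, 1, 0, 0),
    "rock": (1, 0, 0, 1, 1),
}
# Localist coding of the same three items: one dedicated node each.
LOCALIST = {"cat": (1, 0, 0), "dog": (0, 1, 0), "rock": (0, 0, 1)}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def delta_rule_step(weights, x, target, lr=0.1):
    """One delta-rule update for a linear unit y = w . x."""
    error = target - dot(weights, x)
    return [w + lr * error * xi for w, xi in zip(weights, x)]


def spillover(codes, trained_on="cat"):
    """Change in output for every item after one training step on `trained_on`."""
    before = [0.0] * len(codes[trained_on])
    after = delta_rule_step(before, codes[trained_on], target=1.0)
    return {name: round(dot(after, x) - dot(before, x), 6) for name, x in codes.items()}


print(spillover(DISTRIBUTED))  # {'cat': 0.3, 'dog': 0.2, 'rock': 0.1}
print(spillover(LOCALIST))     # {'cat': 0.1, 'dog': 0.0, 'rock': 0.0}
```

Here "similarity" is measured by overlap of active nodes; other
learners and other similarity measures would give different
numbers, but the same qualitative ordering.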
To return to compositionality, this encoding of
representations is not done randomly. This generalizing ability
causes the network to settle towards a set of connections where
the representations are ordered according to their similarity.
The coordinates within the multi-dimensional space are assigned
according to systematic rules that naturally develop.
For combining two representations, concatenative
compositionality would simply stick them together. This type of
functional compositionality blends them together instead. Using
straightforward vector mathematics, the two vectors are combined
into a third one. Within this new representation, neither of the
originals is literally present. All of the information is still
contained, however, and can be recovered just as easily. This is
a peculiarly connectionist way of storing and retrieving
information - through distributed representations. This fulfills
all the requirements of the general definition of
compositionality.
Furthermore, it is impossible to create a model of any of
these networks, or even come close, with a localist
connectionist network. There is no way of capturing all of the
information these distributed representations contain. To go
even a step further, this functional compositionality can be
richer in meaning than its concatenative counterpart.
Distributed representations are sensitive to more subtle and
various similarities among representations than just
constituency. Classical, concatenative compositionality handles
constituent relations extremely well, but does little else.
Distributed representations can handle constituent relations (not
as naturally as classical, however, but still rather well) but
can also contain the information relevant to a whole host of
other similarities.
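Part of what a localist network cannot capture shows up in the
distances alone. In a localist code (one dedicated node per item,
as in this hypothetical three-node version), every pair of items is
exactly the same distance apart, so the code carries no information
about which items resemble which. The distributed patterns from the
running example do carry that information.

```python
import math
from itertools import combinations

DISTRIBUTED = {
    "cat": (0, 1, 1, 0, 1),
    "dog": (0, 1, 1, 0, 0),
    "rock": (1, 0, 0, 1, 1),
}
LOCALIST = {"cat": (1, 0, 0), "dog": (0, 1, 0), "rock": (0, 0, 1)}


def pairwise_distances(codes):
    """Euclidean distance between every pair of items, rounded for display."""
    return {(a, b): round(math.dist(codes[a], codes[b]), 3)
            for a, b in combinations(codes, 2)}


print(pairwise_distances(LOCALIST))
# {('cat', 'dog'): 1.414, ('cat', 'rock'): 1.414, ('dog', 'rock'): 1.414}
print(pairwise_distances(DISTRIBUTED))
# {('cat', 'dog'): 1.0, ('cat', 'rock'): 2.0, ('dog', 'rock'): 2.236}
```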
This is easier said than done, though. Since most work has
been focused on connectionist compositionality, and that has only
been achieved recently, very little has been done in the way of
systematicity. Most networks will use a distributed
representation until it is necessary to perform any systematic
processes on it, then convert it over to the more traditional
version.
This leaves one to wonder whether systematicity only works
with concatenative compositionality. Perhaps, even though all of
the same information is contained with functional
compositionality, the particular structure of the concatenative
version is necessary. Since the information in the original
structure is merely recoverable within a distributed
representation rather than being literally present, there is some
plausibility to this notion.
I, surprisingly enough, disagree. Functional compositionality
is all that is necessary for systematicity, on one condition.
The constituent parts of the distributed representation must be
individually recoverable and analyzable. In other words, you do
not have to decode the entire thing just to get a single piece of
it. You can remove a piece and leave the rest of it in a
distributed format. If the representation must be transformed
back to its original state entirely to access any bit of it, then
storing it as a distributed representation is unnecessary. It
has no advantage over classical systems and is most likely
inferior.
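The condition just stated can be shown in miniature with tensor
product binding (again a scheme substituted for illustration; the
roles, fillers, and the three-constituent "sentence" are
hypothetical). One constituent is read out with a single unbinding
step, and its binding is then subtracted, leaving the remaining
constituents still blended together in one matrix rather than
decoded.

```python
import math

# Fillers: the 5-node patterns from the running example.
FILLERS = {
    "cat": (0, 1, 1, 0, 1),
    "dog": (0, 1, 1, 0, 0),
    "rock": (1, 0, 0, 1, 1),
}
# Hypothetical roles: three orthonormal vectors in 3 dimensions.
ROLES = {
    "agent": (2 / 3, 2 / 3, 1 / 3),
    "patient": (-2 / 3, 1 / 3, 2 / 3),
    "location": (1 / 3, -2 / 3, 2 / 3),
}


def bind(filler, role):
    """Outer product of a filler and a role vector."""
    return [[f * r for r in role] for f in filler]


def superpose(*matrices):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*matrices)]


def subtract(a, b):
    """Element-wise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def unbind(rep, role):
    """Recover the filler bound to `role` (exact up to rounding for orthonormal roles)."""
    return tuple(sum(m * r for m, r in zip(row, role)) for row in rep)


def cleanup(vec):
    """Snap a recovered vector to the nearest known filler."""
    return min(FILLERS, key=lambda name: math.dist(vec, FILLERS[name]))


# Hypothetical structure: dog (agent), cat (patient), rock (location).
rep = superpose(bind(FILLERS["dog"], ROLES["agent"]),
                bind(FILLERS["cat"], ROLES["patient"]),
                bind(FILLERS["rock"], ROLES["location"]))

# Read out one constituent with a single unbinding step.
patient = cleanup(unbind(rep, ROLES["patient"]))
# Remove it; the other two constituents stay blended in one matrix.
rest = subtract(rep, bind(FILLERS[patient], ROLES["patient"]))

print(patient)                                   # cat
print(cleanup(unbind(rest, ROLES["agent"])))     # dog
print(cleanup(unbind(rest, ROLES["location"])))  # rock
```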
If a system can create a representation that contains all the
information concerning the original structure and can perform
operations on that representation using that information without
completely converting it, then the system will possess
systematicity. This also sidesteps Fodor and Pylyshyn's
objection that connectionist networks rely solely on association
for their processes. If a process uses the information contained
within the representation, then it is not purely
"Associationist".
Using a version of Pollack's RAAM architecture, Chalmers (1990a)
showed that simple syntactic transformations can be performed on
its representations. In this case, the network is able to create
distributed representations of sentences and then transform them
from the active to the passive voice (or vice versa). This
transformation is done entirely on the distributed
representation; it is never restored to its original state.
The network works best with a 3N-N-3N setup, as diagrammed in
Figure 4. The 3N is the input layer where the sentence, in its
original state, is first fed into the network. The middle layer
is the hidden layer where the distributed representations are
instantiated. The third layer is the output layer for the
transformed sentence. Within this simple architecture lies a very
basic version of systematicity.
In addition, the system is very reliable, achieving 100%
accuracy in its transformations. This accuracy includes
sentences (within its vocabulary) that it was never trained on,
so even what it generalizes to is perfectly accurate.
This clearly shows that it is possible to perform operations
on a distributed representation using the information contained
within it. These operations are not merely statistical tricks,
where it learned certain associations and correlations. That
could not account for the accuracy of the generalized instances.
These transformations make use of the information of the original
structure that is encoded within the representations, even though
the representation itself does not have that structure. Our
flying machines have stayed in the air. Connectionist networks
can possess systematicity.
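A hand-built analogue of the active-to-passive transformation
described above can be sketched under heavy simplifying
assumptions. Represent a sentence with tensor product bindings of
fillers to hypothetical "subject" and "object" roles (the verb is
omitted), and treat the voice change as nothing more than swapping
which filler occupies which role; in the passive, the second slot
stands for the "by" phrase. The swap is a single linear map applied
to the blended matrix, and no constituent is unbound along the way.
This is not Chalmers' method: his transformation was learned by a
network operating on RAAM encodings, and a real passive also
changes word order and adds "was ... by".

```python
import math

FILLERS = {
    "cat": (0, 1, 1, 0, 1),
    "dog": (0, 1, 1, 0, 0),
    "rock": (1, 0, 0, 1, 1),
}
# Hypothetical roles, orthonormal in 2 dimensions.
ROLES = {"subject": (0.6, 0.8), "object": (-0.8, 0.6)}


def bind(filler, role):
    """Outer product of a filler and a role vector."""
    return [[f * r for r in role] for f in filler]


def superpose(*matrices):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*matrices)]


def unbind(rep, role):
    """Recover the filler bound to `role`."""
    return tuple(sum(m * r for m, r in zip(row, role)) for row in rep)


def cleanup(vec):
    """Snap a recovered vector to the nearest known filler."""
    return min(FILLERS, key=lambda name: math.dist(vec, FILLERS[name]))


def swap_operator(r1, r2):
    """S = r1 r2^T + r2 r1^T. For orthonormal r1, r2, S maps r1 to r2 and r2 to r1;
    with only these two roles spanning the space, that is a pure swap."""
    n = len(r1)
    return [[r1[i] * r2[j] + r2[i] * r1[j] for j in range(n)] for i in range(n)]


def apply_to_roles(rep, op):
    """Multiply every row of the representation by the role-space operator."""
    n = len(op)
    return [[sum(row[k] * op[k][j] for k in range(n)) for j in range(n)] for row in rep]


# Active: dog as subject, cat as object.
active = superpose(bind(FILLERS["dog"], ROLES["subject"]),
                   bind(FILLERS["cat"], ROLES["object"]))
# Passive, computed directly on the blended matrix.
passive = apply_to_roles(active, swap_operator(ROLES["subject"], ROLES["object"]))

print(cleanup(unbind(passive, ROLES["subject"])))  # cat
print(cleanup(unbind(passive, ROLES["object"])))   # dog
```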
Basically, it seems that Fodor and Pylyshyn were wrong and
that connectionism is a possibility for cognitive modeling. Now
we are left with their last argument, mostly presented by Fodor
himself. He claims that the ultimate argument for the classical
view of language of thought is that it is the only viable
alternative around. A few others may pop their heads up
briefly, but they do not last. Therefore, since the classical
view is the only one around, it would seem to be the winner for
now.
Through showing that connectionism can possess
compositionality and systematicity, the two criteria set down by
Fodor and Pylyshyn themselves, connectionism has become a viable
alternative. It seems that the classical view is no longer the
only option around. So which will it be then?
Smolensky (1987) addressed this issue, although from a broader
perspective. He refers to the Paradox of Cognition. He claims
that there are two distinct approaches to cognitive modeling: (i)
the hard approach - the mind seems to be characterized by rules,
and (ii) the soft approach - rules alone do not seem to be
enough. This is the paradox. On the one hand it seems that the
mind follows clear rules, on the other hand, the rules only seem
to be able to explain things to a point. The matter of whether
or not the mind is only a set of rules seems paradoxical. This
is parallel to the classical/connectionist debate since the
classical endeavor follows the hard approach and the
connectionist the soft.
In actuality the answer most likely lies somewhere in the
middle: the majority of the mind is characterized by rules, but
not all of it is. Some parts lie below rules. By following both
approaches at the same time, it should be much easier to find
that spot. Also, the problems of one are compensated by the
benefits of the other. They complement each other quite well.
Unfortunately, the vast majority of the research has been done
following the hard/classical view, and very little has been done
on the soft approach. The majority of the work done on the soft
approach has only been in recent years as well. So there does
seem to be an imbalance here, an imbalance that Fodor and
Pylyshyn wish to preserve.
The soft side is quickly growing, though, for reasons I will
address shortly. Connectionism is quite compatible with the soft
approach, and it has finally proven itself to be at least a
viable alternative. The issue now is: of these two alternatives,
which, if any, is better?
I feel that language is an enormous part of thinking and helps
humans to organize our thoughts better, but I do not see why
something that has all the same brain power as us, but no
language, could not be intelligent and possess some kind of
conscious thought. Supporters of the view that language is
essential to thought have taken linguistic ability (which is
allegedly unique to humans) and, because of that uniqueness,
claimed that it is the most important aspect of intelligence and
even consciousness.
Although language is very important, I have not seen any
conclusive evidence to support the notion that it is the basis
for all intelligence. There are many, many mental
processes that are not even remotely linguistic in nature. A
partial list by Chalmers of mental processes in which
compositionality is unimportant, includes: "perception,
categorization, motor control, memory, similarity judgements,
associations, and attention" (Chalmers 1990b, p.347).
For a machine to be fully intelligent, it must incorporate all
or nearly all of the characteristics of the mental, whether they
are linguistic or not. Classical AI handles the language-like
aspects better, and connectionism does surprisingly well at most
of the other aspects. So a full AI system would most likely
incorporate parts of both.
Also, classical AI has pretty much dominated the research for
decades. Most of its failings are quite apparent now. With most
of these problems lingering with little or no progress made to
resolve them, many researchers are seeking alternative theories.
This general dissatisfaction with classical AI and the growing
desire for an alternative have fanned the flames of
connectionism.
We seem to be left with two options, classical and
connectionist AI. I think that the final answer will be a blend
of both (even if it is skewed more towards connectionism). At
the very least, connectionism will offer a brand new perspective
on the issues. However, connectionism seems to be faring better
than that, and may someday be an equal alternative to classical
AI, or perhaps even take the lead, at which point young students
will argue against it for being the "establishment".
It would seem that the classical dominance is at an end.
Connectionism is carving out its own place, even if it is with
the "lesser" mental functions. Contrary to Fodor and
Pylyshyn, it is a force that demands attention, not to mention
that the amount of controversy stirred up is revitalizing the
field. Many people are dissatisfied with the recent lack of
progress in classical AI. Far from being dead, connectionism is
alive and thriving extremely well.
Chalmers, D. (1990a). Syntactic transformations on distributed
representations. Connection Science, 2: 53-62.
Chalmers, D. (1990b). Why Fodor and Pylyshyn were wrong: The
simplest refutation. In Proceedings of the Twelfth Annual
Conference of the Cognitive Science Society, pp. 340-347.
Elman, J. (1990). Structured representations and connectionist
models. In Proceedings of the Twelfth Annual Conference of the
Cognitive Science Society, pp. 17-23.
Fodor, J. (1975). The Language of Thought. Scranton, PA:
Crowell.
Fodor, J. (1987). Psychosemantics: The Problem of Meaning
in the Philosophy of Mind. Cambridge, MA: MIT Press.
Fodor, J. (1990). A Theory of Content and Other Essays.
Cambridge, MA: MIT Press.
Fodor, J. (1994). Concepts: A potboiler. Cognition, 50:
95-113.
Fodor, J. & Pylyshyn, Z. (1988). Connectionism and
cognitive architecture: A critical analysis. Cognition,
28: 3-71.
Pollack, J. (1988). Recursive auto-associative memory:
Devising compositional distributed representations. In
Proceedings of the Tenth Annual Conference of the Cognitive
Science Society, pp. 33-39.
Pollack, J. (1990). Recursive distributed representations.
Artificial Intelligence, 46: 77-105.
Rumelhart, D., McClelland, J., and the PDP Research Group.
(1986). Parallel Distributed Processing: Explorations in the
Microstructure of Cognition Vols. 1-2. Cambridge, MA: MIT
Press.
Searle, J. (1980). Minds, brains, and programs. Behavioral
and Brain Sciences, 3: 417-458.
Smolensky, P. (1987). The constituent structure of
connectionist mental states: A reply to Fodor and Pylyshyn.
Southern Journal of Philosophy, 26: 137-163.
Sterelny, K. (1991). The Representational Theory of Mind:
An Introduction. Cambridge, MA: Blackwell.
Van Gelder, T. (199?). Compositionality and the explanation of
cognitive process.
Van Gelder, T. (1990). Compositionality: A connectionist
variation on a theme. Cognitive Science, 14: 355-384.
2. The Characteristics of Language of Thought
2.1. General Background for LOT
So just what is the language of thought? It falls within the
broad category that Fodor calls intentional realism. He uses this
term to cover the broad range of views that fundamentally all
support physical intentionality: the mental is a physical thing,
and mental states can be intentional, that is, they can have
meaning. This includes beliefs, desires, and other such
"mental" stuff. Intentional realists thus include the
majority of the philosophy of mind community (though it is not
without its major dissenters).
2.2. Combinatorial Structure
Some of the criteria specific to LOT are much more mainstream,
but as I will show later, they are still far from entirely
accepted. Fodor goes into more detail on them, but these
criteria can be summarized by the statement that a language of
thought must have combinatorial structure for its mental states
and representations.
2.3. Arguments for Combinatorial Structure
Fodor's main strategy in supporting a language of thought is to
argue against the plausibility of non-language-of-thought
theories² (a position voiced in Fodor's articles by "Aunty").
Once they are eliminated, only the language of thought is left,
so it is the best we have so far. As Fodor puts it, borrowing a
famous quote from Lyndon Johnson, "I'm the only President you've
got." (Fodor p.27, 1975). I must give Fodor credit in that this
is not his entire strategy. He does show that on the points where
non-language of thought models fail, the language of thought
model does quite well. However, Fodor's main technique is that of
shooting down everyone until he is the only one left standing.
Fodor offers three main arguments for a language of thought. The
first is a methodological attack on the non-language of thought
intentional realists. Then he looks at psychological processes,
and finally at the issue of systematicity. On all three,
surprisingly enough, language of thought not only does quite
well, but AIR [Aunty's Intentional Realism] handles them quite
poorly.
2.3.1. Argument from Methodology
First, the methodological attack. The AIR model, which is quite
opposed to combinatorial structure, relies upon holistic mental
states. These cannot be divided into constituent parts (e.g.,
raising my left hand and hopping on my right foot is a
whole, indivisible mental state unrelated to the two states that
a language of thought would see as its constituents).
Principle P: Suppose there is a kind of event c1 of which the
normal effect is a kind of event e1; and a kind of event c2 of
which the normal effect is a kind of event e2; and a kind of
event c3 of which the normal effect is a complex event e1 & e2.
Viz.:
c1 => e1
c2 => e2
c3 => e1 & e2
Then, ceteris paribus, it is reasonable to infer that c3 is a
complex event whose constituents include c1 and c2. (Fodor
p.141, 1987).
This is a particular case of a more general principle known as
Occam's Razor, which states that the best solution is the
simplest one, positing as few unseen factors as possible. AIR
violates Occam's Razor by postulating far more mental states than
a language of thought does with its combinatorial structures.
2.3.2. Argument from Psychological Processes
The second main support of LOT/objection to AIR is how each deals
with mental processes. Aunty hates the entire idea of mental
representation. She refers to it as "ontological
promiscuity" (Fodor p.144, 1987). For example, Aunty
believes that when someone talks to you, there is only the
utterance and an "Unknown Neurological Mechanism" that
works on the utterance between the ear and the conscious self so
that it is heard already analyzed. There are no representations
of any sort. The listener hears it analyzed already, so any
mental representations would be superfluous.
2.3.3. Argument from Systematicity and Compositionality
Last, Fodor argues that the systematicity of thought requires a
language of thought. One of the major features of language of
thought that other theories have trouble dealing with is the
ability to create new thoughts systematically - having novel
mental states without having to randomly put stuff together until
we get something that happens to work.
(a) There's a certain property [systematicity] that linguistic
capacities have in virtue of the fact that natural languages have
a combinatorial semantics.
(b) Thought has this property too.
(c) So thought too must have a combinatorial semantics. (Fodor
p.148, 1987)
The problem here is that pesky little logical fallacy called
affirming the consequent (P -> Q, Q / P): "Since language has
combinatorial semantics, it is systematic. Thought is systematic
also, therefore thought also has combinatorial semantics." Fodor
admits this, but says that it is all right here.
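The logical point can be checked mechanically with a truth table.
The sketch below (helper names hypothetical) searches for
assignments that make every premise true and the conclusion false:
affirming the consequent (P -> Q, Q / P) has such a counterexample,
while modus ponens (P -> Q, P / Q) has none.

```python
from itertools import product


def implies(p, q):
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


def counterexamples(premises, conclusion):
    """Truth assignments (P, Q) making every premise true and the conclusion false."""
    return [
        (p, q)
        for p, q in product([True, False], repeat=2)
        if all(premise(p, q) for premise in premises) and not conclusion(p, q)
    ]


affirming_consequent = counterexamples([implies, lambda p, q: q], lambda p, q: p)
modus_ponens = counterexamples([implies, lambda p, q: p], lambda p, q: q)
print(affirming_consequent)  # [(False, True)]
print(modus_ponens)          # []
```

The single counterexample is the case where thought is systematic
(Q) without having a combinatorial semantics (P), which is exactly
the possibility the argument has to rule out on other grounds.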
3. The Attack of Fodor and Pylyshyn
3.1. Fodor and Pylyshyn's Version of Connectionism
I will now offer an overview of Fodor and Pylyshyn's arguments
against connectionism, of course, with the trimming of extraneous
connectionist bashing that fails to form any sort of argument.
However, you may have noticed that I am starting with their
critique of connectionism without really laying out the
connectionist view. This is intentional. My main objection to
Fodor and Pylyshyn's critique is the form of connectionism they
assume. So let's first take a look at the argument on their
terms before we look at the connectionist views that people
actually believe in.
The difference between Connectionist networks in which the state
of a single node encodes properties of the world (i.e., the so-called
'localist' networks) and ones in which the pattern of
states of an entire population of units does the encoding (the
so-called 'distributed' representation networks) is considered to
be important by many people working on Connectionist models.
Although Connectionists debate the relative merits of localist (or
'compact') versus distributed representations, the distinction
will usually be of little consequence for our purposes, for
reasons we give later. (Fodor & Pylyshyn, 1988 p.5)
In actuality, there is little debate amongst connectionists. Nearly
all believe that distributed representations are not only the
better of the two forms for connectionist systems, but also
believe that they are connectionism's saving grace. I will get
to that later, though. For now we will deal exclusively with
localist networks.
3.2. Some New Twists to an Old Argument
3.2.1. The Attack of Structured Mental Representations
This objection deals with how connectionism handles
compositionality. Just as a quick reminder, compositionality is
the form of structure that is necessary in mental
representations. It is the constituent structure that allows
mental processes to be systematic.
3.2.2. The Attack of the Structured Mental Processes
So now that it is apparent that the mental representations of our
connectionist network necessarily lack compositionality, it is
merely beating a dead horse to show that they lack systematicity
as well. Quite simply, systematic mental processes rely upon
compositionality. If compositionality is not present, then
systematicity must be lacking also.
3.3 Fodor and Pylyshyn's Conclusions
It would seem that our logical inference network just cannot
support a language of thought. It not only lacks constituent
structure in its representations and systematic structure in its
processes, it lacks any kind of structure whatsoever. Apparently
connectionism is out of luck as a model of human cognition and
consequently as a method for artificial intelligence research.
Fodor and Pylyshyn conclude that this leaves connectionists with
four options:
1) Try to show that unstructured mental representations are the
correct model for cognitive processes.
2) Rely on structured mental representations but continue
to use an associational account of mental processes.
3) Use connectionism only as an implementation theory for a
classical architecture.
4) Give up on networks as a model for cognitive processes in
general and use them only for certain less cognitive mental
processes (e.g., perception).
Option 1 says that even though structured mental representations
are very useful and have been generally accepted for years, the
notion is wrong: actual human mentality does not involve
systematicity and compositionality. However, this seems hopeless
at its worst and unattractive at its best. The second option is
similar to the previous one except that it throws out only
systematicity but keeps compositionality. Again, this
counter-productiveness is not a good sign and makes this option
seem equally bleak. The third option is Fodor and Pylyshyn's
favorite, since it pretty much gives in to their argument. It
states that connectionism fails as a cognitive theory, but allows
that computer scientists can use it if they really want, though
only to implement a primarily classical system. The last option
is to give up all the higher cognitive processes like language
and reason and only use connectionist systems to model lower
processes that are only generally "mental", such as
perception and reflexes. This is also looked upon favorably by
Fodor and Pylyshyn, but they feel that instances where
connectionist networks would model the process better than
language of thought are quite rare.
4. Fodor and Pylyshyn are Wrong
4.1. Chalmers' "Simple Refutation"
David Chalmers offers "a particularly simple refutation of
Fodor and Pylyshyn's argument" (1990b, p. 340). I think it
will be nice to start off with it, both to take some of the steam
out of them right from the start as well as to introduce the main
problem with all of Fodor and Pylyshyn's arguments against
connectionism. Chalmers argues that, given Fodor and Pylyshyn's
own premises and argument, their conclusion not only does not
follow but is also simply false. Connectionism can be a viable
cognitive model.
4.2. Fodor and Pylyshyn's Error
So just where do they go wrong? In their example. The
logical inference network against which they test their objections
is a textbook straw-person argument. That Fodor and
Pylyshyn show that some connectionist systems (localist ones
in particular) cannot support a language of thought does not
mean that none can. They simply looked at the wrong kind
of system. I have not come across any connectionist who
believes that such a localist, association-based network could
model human cognition.
When one asks what is the deepest philosophical commitment of the
connectionist movement, the answer is surely this: the rejection
of the atomic symbol as the bearer of meaning. Connectionists feel
that atomic tokens simply do not carry enough information with
them to be useful in modeling human cognition. Rather,
distributed, subdivisible, malleable representations are the
cornerstone of the connectionist endeavor. For this reason,
localist networks are regarded by many connectionists as not really
connectionist at all. These networks employ precisely the
traditional notion of atomic symbols, with a new twist added by
connecting them by associative links. (Chalmers, 1990b, p. 343)
This sums up quite nicely the true foundation of the
connectionist movement. It is not association, as Fodor and
Pylyshyn would have you believe, but distribution that is
essential.
4.3. A New Hope: Distributed Representations
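So just what is a distributed representation, and why is it so
important? In a distributed representation, a concept is encoded
as a pattern of activity across many nodes, and each node takes
part in representing many concepts; in a localist representation,
each concept is assigned a single node. Contrary to what Fodor and
Pylyshyn assume, distributed representations are the
fundamental basis of the connectionist movement. The primary
direction for AI research followed by connectionists is the attempt
to find a new basis for meaning. As I will elaborate in the
last section, there is a growing dissatisfaction with atomic
symbols. Distributed representations are the connectionist
movement's alternative.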
4.3.1. Just What Difference Would Distribution Make?
You may now be wondering just what benefits distributed
representations would have other than scaring off your opposition
with techno babble. There are many, in fact. The largest, and the
main reason connectionists first began to pursue them, is that a
group of nodes can carry vastly more information than any single
node can. With n on/off nodes, a localist scheme can represent only
n items, one per node, whereas a distributed scheme can draw on all
2^n possible patterns of activity. This is a very good reason to
prefer them. The information may not be literally present in the
representations, but far more can be encoded within a large group
of nodes than within a single on/off unit.
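A toy count makes the capacity difference concrete. The sketch below assumes binary (on/off) nodes and simply enumerates the patterns each coding scheme allows; the function names are illustrative, and the code models no particular network.

```python
from itertools import product


def localist_patterns(n):
    """Localist coding: each item gets exactly one active node (one-hot)."""
    return [tuple(1 if i == j else 0 for i in range(n)) for j in range(n)]


def distributed_patterns(n):
    """Distributed coding: every on/off pattern over the n nodes, including all-off."""
    return list(product((0, 1), repeat=n))


for n in (4, 8, 16):
    # number of nodes, localist capacity, distributed capacity
    print(n, len(localist_patterns(n)), len(distributed_patterns(n)))
```

For 4, 8 and 16 nodes this prints 4 versus 16, 8 versus 256, and 16 versus 65536 patterns: the localist count grows linearly while the distributed count doubles with every added node.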
4.3.2. The (Brief) Argument Against Distribution
Fodor and Pylyshyn do mention distributed representations.
However, it is quite clear that they misunderstand their entire
basis. In two footnotes, they state that the
"[localist/distributed] distinction will usually be of
little consequence to our discussion" (Fodor and Pylyshyn,
1988, p. 5) and that "nothing relevant to this discussion is
changed" (Fodor and Pylyshyn, 1988, p. 15) if their example
used distributed representations rather than localist ones.
Clearly, they simply assume that the choice between localist and
distributed representation is only a matter of implementation.
5. Connectionist Compositionality
5.1. Brands of Compositionality
Our two main questions now are whether distributed
representations can possess compositionality and systematicity.
To date, more work has been done on compositionality, so we shall
look at that first. For starters, we are familiar with Fodor's
form of compositionality, in which a complex representation is
composed of constituents, each of which is either atomic or
another complex representation. This is the traditional version.
It involves recursively adding new parts so that the
constituent parts are literally present within the
representation. For example, ((P&Q)&R) is created by first
combining 'P' and 'Q' into (P&Q) and then combining that with 'R'.
The final, complex representation still has each of the original
constituents preserved within it.
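Concatenative compositionality can be sketched with nested tuples standing in for symbol structures. This is only an illustration, assuming that a conjunction is stored as a tuple holding its connective and its two conjuncts; the helper names are hypothetical. The point is that every constituent can be found inside the complex representation as a literal token.

```python
def conj(left, right):
    """Build a complex representation by literally embedding its constituents."""
    return ("&", left, right)


def atoms(rep):
    """Recursively collect the atomic constituents tokened inside a representation."""
    if isinstance(rep, str):
        return [rep]
    _, left, right = rep
    return atoms(left) + atoms(right)


def show(rep):
    """Render a representation in the infix notation used in the text."""
    if isinstance(rep, str):
        return rep
    _, left, right = rep
    return f"({show(left)}&{show(right)})"


pq = conj("P", "Q")
pqr = conj(pq, "R")
print(show(pqr))     # ((P&Q)&R)
print(atoms(pqr))    # ['P', 'Q', 'R']
print(pqr[1] is pq)  # True: (P&Q) itself sits inside ((P&Q)&R)
```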
More generally, compositionality can be defined as follows:
The ability to create a representation of a structured object in
such a way as to preserve the original structure in a usable
form.
I feel that Fodor and Pylyshyn would be unlikely to argue with
this. It does not specify the method for forming these
representations, but they would say that the only way to do so is
through constituent structure, or "concatenative
compositionality" as Tim van Gelder has called it (199?,
1990).
5.2. The Possibilities of Functional Compositionality
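There are three models in particular that have tried (rather
successfully, in my opinion) to create connectionist
compositionality. They are Pollack's Recursive
Auto-Associative Memory, Hinton's Representational Hierarchy by
Reduced Description, and Smolensky's Tensor Product Vectors (van
Gelder, 1990). All three differ in their details, but they
share a fundamental similarity: they rely on functional
compositionality. A representation is functionally compositional
when there are general, reliable processes for producing it from
its constituents and for recovering those constituents from it,
even though the constituents are not literally present as tokens
within it.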
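By contrast, here is a minimal sketch of functional compositionality in the spirit of Smolensky's tensor product approach, written in plain Python. The filler vectors JOHN and MARY, the role vectors AGENT and PATIENT, and all helper names are assumptions chosen for illustration; a real model would use much larger vectors. Each filler is bound to a role by an outer product, and the bindings are summed into one matrix. Neither filler appears as a column of that matrix, yet because the roles are orthonormal, each filler can be recovered (up to rounding) by multiplying the matrix by its role vector.

```python
def outer(u, v):
    """Outer product u ⊗ v, returned as a list of rows."""
    return [[a * b for b in v] for a in u]


def add(m1, m2):
    """Element-wise sum of two matrices of the same shape (superposition)."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def bind(pairs):
    """Tensor product representation: the sum of filler ⊗ role over all pairs."""
    rows, cols = len(pairs[0][0]), len(pairs[0][1])
    total = [[0.0] * cols for _ in range(rows)]
    for filler, role in pairs:
        total = add(total, outer(filler, role))
    return total


def unbind(tensor, role):
    """Multiply the matrix by a role vector; exact recovery needs orthonormal roles."""
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]


# Hypothetical distributed patterns for two fillers
JOHN = [1.0, 0.0, 1.0, 0.0]
MARY = [0.0, 1.0, 1.0, 1.0]
# Hypothetical orthonormal role vectors (deliberately not one-hot)
AGENT = [0.6, 0.8]
PATIENT = [0.8, -0.6]

# Encode loves(John, Mary); the relation itself is left implicit
john_loves_mary = bind([(JOHN, AGENT), (MARY, PATIENT)])
print(john_loves_mary)
print([round(x, 6) for x in unbind(john_loves_mary, AGENT)])    # close to JOHN
print([round(x, 6) for x in unbind(john_loves_mary, PATIENT)])  # close to MARY
```

Choosing roles that are not one-hot vectors matters for the illustration: with one-hot roles, each column of the matrix would simply be a copy of a filler, and the encoding would be concatenative in disguise.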
6. Connectionist Systematicity
6.1. Groundwork
Now that it seems apparent that connectionist networks can
possess compositionality, what about systematicity? Fortunately,
the problem is not too daunting. Systematicity relies quite
heavily on compositionality, and with that in hand, we are already
well on our way. If representations contain all the relevant
internal structure, then in principle it should be possible to
have the system perform systematic operations on them. The next
step is simply to show that this can, indeed, be done. As van
Gelder put it, "Having designed their flying machines,
connectionists now need to show they can actually stay in the
air" (van Gelder, 1990, p. 380).
6.2. First Steps: Syntactic Transformations
Whether there is a type of distributed representation that
allows access to its individual parts is a very
challenging theoretical question. Empirically, however, it is
much easier to settle: the biggest problem is waiting for someone
to build such a network. Well, Pollack did it.
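The networks in this line of work are trained rather than hand-built, and the sketch below does not reproduce them. Instead, it uses a toy tensor product encoding (all vectors and names hypothetical) to illustrate the kind of structure-sensitive operation at issue: a single fixed matrix turns the representation of loves(John, Mary) into that of loves(Mary, John) without ever decoding it back into separate symbols.

```python
def outer(u, v):
    """Outer product u ⊗ v, returned as a list of rows."""
    return [[a * b for b in v] for a in u]


def add(m1, m2):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def bind(pairs):
    """Tensor product representation: the sum of filler ⊗ role over all pairs."""
    total = outer(*pairs[0])
    for filler, role in pairs[1:]:
        total = add(total, outer(filler, role))
    return total


def unbind(tensor, role):
    """Recover the filler bound to `role` (assumes orthonormal roles)."""
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]


# Hypothetical fillers and orthonormal roles for a toy encoding
JOHN = [1.0, 0.0, 1.0, 0.0]
MARY = [0.0, 1.0, 1.0, 1.0]
AGENT = [0.6, 0.8]
PATIENT = [0.8, -0.6]

# One fixed operator that exchanges the two roles: AGENT ⊗ PATIENT + PATIENT ⊗ AGENT
SWAP_ROLES = add(outer(AGENT, PATIENT), outer(PATIENT, AGENT))


def swap_arguments(tensor):
    """Turn R(x, y) into R(y, x) by acting on the whole representation at once."""
    return matmul(tensor, SWAP_ROLES)


john_loves_mary = bind([(JOHN, AGENT), (MARY, PATIENT)])
mary_loves_john = swap_arguments(john_loves_mary)
print([round(x, 6) for x in unbind(mary_loves_john, AGENT)])    # close to MARY
print([round(x, 6) for x in unbind(mary_loves_john, PATIENT)])  # close to JOHN
```

The operator is built only from the role vectors, so it applies unchanged to any pair of fillers. In this toy setting, being able to represent loves(x, y) therefore brings with it the ability to represent loves(y, x), which is the kind of systematicity Fodor and Pylyshyn demand.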
7. Where Do We End Up?
7.1. Differing Cognitive Models: The Hard and Soft Approaches
To recap, I have just shown how connectionist
networks that rely on distributed representations, as opposed to
localist ones, can possess functional compositionality. This is
a more generalized form of compositionality than the one Fodor
prefers, concatenative compositionality. However, it has all of
the relevant attributes and is entirely capable of encoding and
decoding a representation consistently. From there I showed that
this new form of compositionality can yield systematicity.
Pollack even gave a clear empirical example of a connectionist
network that performed syntactic transformations, a paradigmatic
operation for a language of thought.
7.2. The Actual Lure of Connectionism
Fodor and Pylyshyn offered several explanations for the lure of
connectionism. Most, however, were not very substantial. They
mostly dealt with facts of implementation and misconceptions about
either side, though they also touched on the two I will discuss in
a moment. Surely the recent massive growth of the connectionist
movement must have good reasons behind it. As we have seen,
connectionism is gaining ground but is still lagging behind
classicism. The two biggest reasons for its appeal, in my opinion,
are the nature of mental processes other than language and the
problems with classical AI.
7.2.1. Those Overlooked Cognitive Abilities
Fodor has consistently used analogies to the English language to
better explain his views. However, since his theory of mind
involves a language of thought, constantly using these examples
can easily lead to trying to explain everything linguistically.
Fodor has come to see language as the basis for all intelligence.
7.2.2. That Pesky Chinese Room
The Chinese Room argument was originally presented by John Searle
in 1980 and has been widely debated since. The argument itself
has many problems of its own, but its basic idea captures well
the general concern over classical AI. The Chinese
Room argument primarily attacks the claim that atomic
symbols can possess meaning without internal structure.
This is one of the biggest objections to classical AI. I will
not go into any further detail; suffice it to say that classical
AI is not without its own problems. Interestingly enough,
connectionism seems to handle those problems quite well.
7.3. Concluding Remarks
Where do we end up? Fodor and Pylyshyn set out to establish that
connectionism cannot support a language of thought because of its
lack of compositionality and systematicity. Yet distributed
representations can possess both of these properties, even if the
compositionality involved is of a more general, functional
variety than Fodor's concatenative version.
Bibliography
Footnotes