(These are excerpts from my book "Intelligence is not Artificial")
Common Sense Knowledge for Neural Networks
Computers operate on symbolic representations of the world.
They are fed data that are either numbers or names, store them somewhere
as sequences of zeroes and ones, and process them according to sequences of
instructions that we call "programs" (most of which produce new data to be
The main criticism against the program of symbolic Artificial Intelligence
was that this is "not" what the human brain does.
This is actually the main criticism levied against the very idea of using
computers to simulate human intelligence.
Alas, it is hard to deny that this is precisely what "i" do.
Yes, my brain is a wildly chaotic (and ugly) mess of "gray matter",
but "i" think symbolically. This sentence that is forming in my mind and
that i am typing is a sequence of symbols, each of them referring to many other symbols that are in my mind, and you are reading it as a sequence of symbols
that create some meaning in your mind, and the meaning is symbolic too.
Neither you nor i have any evidence that there is some electrochemical activity
going on inside our skulls corresponding to our symbolic life.
What we "are" is our symbolic life.
The "symbolic" approach seems to imply that an "algorithm" exists for everything that "i" think, and the critics can't accept the idea that everything in their mental life is algorithmic in nature. The problem with their criticism is that nobody has yet found anything that "i" do that cannot be explained in terms of representation and algorithm, i.e. computation. There is no question that the problem is difficult: it would take a long time to write all the algorithms that do all the things that "i" do, but nobody has yet proven that it is impossible, and every year it looks more (not less) possible than it looked ten years earlier. Difficult doesn't mean "impossible". In some cases it is actually quite simple. I don't know how to train a neural network to cross a street, but i can write down the basic rules in a few seconds: look right and left, wait until no cars are coming in your direction, walk quickly to the other side. If it's raining, also be careful not to slip on the wet pavement. This is an algorithm.
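To make the point concrete, here is a minimal Python sketch of those street-crossing rules written as a program. It assumes, purely for illustration, that perception has already been solved: the hypothetical function `cross_street` receives a list of booleans saying whether a car is coming at each moment, plus a flag for rain, and returns the actions the rules prescribe.

```python
from collections.abc import Iterable


def cross_street(traffic: Iterable[bool], raining: bool) -> list[str]:
    """Apply the street-crossing rules to a sequence of observations.

    traffic: one boolean per moment, True if a car is coming toward us
             (a hypothetical, already-perceived input).
    raining: True if the pavement is wet.
    """
    actions: list[str] = []
    for car_coming in traffic:
        actions.append("look right and left")
        if not car_coming:
            break                   # the road is clear: go
        actions.append("wait")      # a car is coming: wait, then look again
    else:
        return actions              # never clear in the observed moments: stay on the curb
    if raining:
        actions.append("step carefully on the wet pavement")
    actions.append("walk quickly to the other side")
    return actions
```

For example, `cross_street([True, False], raining=True)` looks, waits, looks again, steps carefully and then crosses. The genuinely hard part, recognizing cars and wet pavement in the first place, is exactly what this sketch assumes away.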
It is a bit ironic that neural networks exhibit no common sense: aren't they
supposed to be close approximations of how our brains work?
It is natural to suspect that the problem lies in the opposite direction:
symbolic Artificial Intelligence is more (not less) similar to what "i" am
than a neural network. I don't want to sound too philosophical, but "i" have common sense, whereas "i" know absolutely nothing of the structure of my brain.
I have to read a book to find out the structure of the brain, whereas i don't
have to read any book to pick up a piece of paper that fell to the floor or,
quite simply, to eat my food. What "i" do is symbolic in nature.
Even if i wanted to, i could not become a neural network: "i" don't process an
image pixel by pixel and feed it to a layer of neurons and then to another
layer and then to another layer and then change the weights of the connections
according to the backpropagation algorithm and so forth. "I" only use symbols.
When neuroscientists tell me that my symbols are the product of some inner
working of the brain, i am happy to know that their findings may some day
fix a problem in my brain the same way that medical discoveries about the heart
may help fix a problem in my heart. But "i" don't need that technical knowledge
to keep thinking, just as my heart doesn't need technical knowledge to keep beating.
It is not easy (yet) to encode knowledge of the world into a neural
network. Encoding knowledge was precisely the goal of the "other" branch of
Artificial Intelligence, the symbolic branch that
used mathematical logic and aimed at simulating the way our
minds work, not the way the brain is structured.
"Expert systems" such as Dendral encoded knowledge as a set of
statements in first-order predicate logic, things such as "Piero is a writer"
and "If X is a writer, then X has readers".
This approach reaches conclusions by applying simple rules of deduction
to symbols like the X in that statement.
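The deduction step can be sketched in a few lines of Python. This is a toy forward-chaining loop, not Dendral's actual machinery: facts are hypothetical (predicate, constant) pairs, each rule has the form "if P(X) then Q(X)" with a single variable X, and matching a fact against a rule's premise binds X to that fact's constant. The loop keeps applying rules until no new facts appear.

```python
Fact = tuple[str, str]   # (predicate, constant), e.g. ("writer", "Piero")
Rule = tuple[str, str]   # (premise, conclusion): "if premise(X) then conclusion(X)"


def forward_chain(facts: set[Fact], rules: list[Rule]) -> set[Fact]:
    """Apply single-variable rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, constant in list(known):  # iterate over a copy: we add while looping
                if predicate == premise:             # the match binds X to this constant
                    derived = (conclusion, constant)
                    if derived not in known:
                        known.add(derived)
                        changed = True
    return known


facts = {("writer", "Piero")}
rules = [
    ("writer", "has_readers"),        # If X is a writer, then X has readers
    ("has_readers", "gets_reviews"),  # hypothetical extra rule, only to show chaining
]
print(sorted(forward_chain(facts, rules)))
# prints [('gets_reviews', 'Piero'), ('has_readers', 'Piero'), ('writer', 'Piero')]
```

Real rule-based systems add multi-variable rules, unification over structured terms and conflict-resolution strategies; the sketch keeps only the core idea of deriving new symbols from old ones by rule.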
One way to graft common-sense knowledge onto neural networks is to
integrate the symbolic, knowledge-based, deductive methods of expert systems
with the "subsymbolic", data-driven, learning methods of neural networks.
The initial impulse for mixing the two approaches came from the limitations of
each field: expert systems had the problem of creating the knowledge base (that
typically involved "eliciting" the knowledge from often uncooperative human
experts) whereas neural networks were plagued by the problem of "local minima".
On the theoretical side, there was also a debate about how
symbolic thinking can arise from neural computation.
The issue was already raised by future free-speech activist David Touretzky at Carnegie Mellon University, in collaboration with Geoffrey Hinton ("Symbols Among the Neurons", 1985).
Stephen Gallant at Northeastern University worked on the "connectionist expert system" MACIE (1985), which was one of the first neural networks capable of explaining its output.
Paul Smolensky (then at the University of Colorado) thought he could solve the problem with his "tensor product" representations, which established a formal equivalence between the high-level description of neural networks and symbolic systems ("On Variable Binding and the Representation of Symbolic Structures in Connectionist Systems", 1987).
Jude Shavlik at the University of Wisconsin worked on a "knowledge-based artificial neural network" ("An Approach to Combining Explanation-Based and Neural Learning Algorithms", 1989).
During the 1990s books such as
"Integrating Rules and Connectionism for Robust Commonsense Reasoning" (1994)
by Ron Sun of Brandeis University explored the subsymbolic-symbolic fusion.
Collections of papers appeared in Daniel Levine's and Manuel Aparicio's "Neural Networks for Knowledge Representation and Inference" (1993) as well as Suran Goonatilake's and Sukhdev Khebbal's "Intelligent Hybrid Systems" (1995).
Dov Gabbay of King's College London and Artur d'Avila Garcez of the City University of London published hybrid models of neural networks for knowledge representation and inference starting with the book "Neural-Symbolic Learning Systems" (2002).
"Markov Logic Networks", introduced in 2006 by Pedro Domingos and Matt Richardson at the University of Washington, and expanded by Domingos' student Jue Wang as "Hybrid Markov Logic Networks" (2008), combine first-order logic and Markov networks and use the Markov Chain Monte Carlo method for (approximate) inference.
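In a Markov logic network each first-order formula carries a weight, and the probability of a "possible world" is proportional to the exponential of the sum, over all formulas, of each weight times the number of true groundings of that formula in that world (a grounding substitutes a constant for the variable). The Python sketch below uses a hypothetical two-person domain and a single hypothetical formula, Smokes(x) implies Cancer(x), with an assumed weight of 1.5. It computes a conditional probability by enumerating every world, which is feasible only for toy domains; that exponential blow-up is why practical systems rely on approximate inference such as MCMC.

```python
import itertools
import math

CONSTANTS = ["Anna", "Bob"]        # hypothetical domain
PREDICATES = ["Smokes", "Cancer"]
WEIGHT = 1.5                       # assumed weight of the formula Smokes(x) => Cancer(x)

Atom = tuple[str, str]             # a ground atom, e.g. ("Smokes", "Anna")


def all_worlds():
    """Yield every truth assignment to the ground atoms (2**4 = 16 worlds here)."""
    atoms = [(p, c) for c in CONSTANTS for p in PREDICATES]
    for values in itertools.product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def true_groundings(world: dict[Atom, bool]) -> int:
    """Number of constants c for which Smokes(c) => Cancer(c) holds in this world."""
    return sum(1 for c in CONSTANTS
               if not world[("Smokes", c)] or world[("Cancer", c)])


def weight_of(world: dict[Atom, bool]) -> float:
    """Unnormalized probability: exp(weight * number of true groundings)."""
    return math.exp(WEIGHT * true_groundings(world))


def conditional(query: dict[Atom, bool], evidence: dict[Atom, bool]) -> float:
    """P(query | evidence) by brute-force enumeration; the constant Z cancels."""
    num = den = 0.0
    for world in all_worlds():
        if all(world[a] == v for a, v in evidence.items()):
            w = weight_of(world)
            den += w
            if all(world[a] == v for a, v in query.items()):
                num += w
    return num / den


p = conditional({("Cancer", "Anna"): True}, {("Smokes", "Anna"): True})
print(round(p, 4))  # prints 0.8176
```

With these numbers the answer is 1/(1+e^-1.5): the world in which Anna smokes without cancer is not impossible, only e^1.5 times less likely than the otherwise identical world that satisfies the formula. This is the sense in which Markov logic softens hard logical constraints into probabilistic ones.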
Leon Bottou (famous for his work on "stochastic gradient descent", then at Microsoft and later at Facebook) discussed informal reasoning, an intermediate layer between subsymbolic computation and logical inference ("From Machine Learning to Machine Reasoning", 2011).
Richard Socher worked on common sense reasoning under Andrew Ng at Stanford and invented "neural tensor networks" ("Reasoning With Neural Tensor Networks for Knowledge Base Completion", 2013).
Luciano Serafini of Fondazione Bruno Kessler in Italy and Artur d'Avila Garcez of City University of London described "logic tensor networks" that blend Socher's neural tensor networks and first-order many-valued logic ("Logic Tensor Networks", 2016).
It turns out that logic tensor networks are similar to BLOG (Bayesian LOGic), developed by Stuart Russell's group at UC Berkeley in 2005. These belong to a different but parallel way of thinking about extending mathematical logic (which deals with true and false statements only, i.e. with one and zero) to deal with probabilities (that is, with a continuum of values between zero and one). The resulting logics are not limited to true/false conclusions but admit degrees of truth.
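One common way to let logical connectives admit degrees of truth is to replace the Boolean operations with real-valued functions on the interval [0, 1]. The Python sketch below uses one such family: the product t-norm for AND, its dual "probabilistic sum" for OR, and the Reichenbach implication, defined as OR(NOT a, b). This is an illustrative choice among several, not necessarily the one used by logic tensor networks or BLOG. At 0 and 1 these functions reproduce the classical truth tables; in between they interpolate.

```python
def t_not(a: float) -> float:
    return 1.0 - a


def t_and(a: float, b: float) -> float:
    return a * b                  # product t-norm


def t_or(a: float, b: float) -> float:
    return a + b - a * b          # probabilistic sum (dual of the product t-norm)


def t_implies(a: float, b: float) -> float:
    return t_or(t_not(a), b)      # Reichenbach implication: 1 - a + a*b


# Made-up degrees of truth for "Piero is a writer" and "Piero has readers"
writer, has_readers = 0.9, 0.7
print(t_implies(writer, has_readers))  # about 0.73
print(t_and(writer, has_readers))      # about 0.63
```

The small floating-point noise in the printed values is expected; what matters is that a rule such as "if X is a writer, then X has readers" now holds to a degree rather than simply holding or failing.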
At the same time that Lotfi Zadeh was beginning to work on fuzzy logic, Alfred Tarski's student Haim Gaifman showed how to graft probability theory onto first-order logic, i.e. how to ground probabilities on firm logical foundations ("Concerning Measures in First Order Calculi", 1964).
After Judea Pearl introduced his Bayesian networks, contributions to integrating logic and probability came from several researchers: Joseph Halpern at Cornell University ("An Analysis of First-order Logics of Probability", 1990), Stephen Muggleton of the Turing Institute in Britain (who popularized "inductive logic programming" in 1991), Venkatramanan Subrahmanian at the University of Maryland ("Probabilistic Logic Programming", 1992), David Poole at the University of British Columbia ("Representing Bayesian Networks within Probabilistic Horn Abduction", 1993), Peter Haddawy at the University of Wisconsin ("Generating Bayesian Networks from Probability Logic Knowledge Bases", 1994), etc.
Google's DeepMath of 2016 (mainly Francois Chollet's project) studied how a neural network can perform high-level logical thinking such as proving theorems. Chollet started from the obvious fact that humans can learn from very few examples, are relatively good at long-term planning, and naturally form generalizations that they can later apply to a broad variety of situations.
Microsoft's DeepCoder of 2017 (in collaboration with Matej Balog of Cambridge University) is a similar project for automated program generation.
There is one thing that the biological neural networks of our brains do better than the artificial neural networks of Artificial Intelligence: deduction. Mathematical logic is good at deriving the effect from the cause: if it is raining, things will get wet. Most machine learning (in particular, deep learning) is good at guessing the cause given the effect. That's because the machine is "trained" by being shown many effects of the cause (for example, many examples of the object "apple"). But if the machine can't derive the effect from the cause, then even after it has "learned" the cause it is not able to reproduce the behavior of the observed system. There is a long history of trying to teach machines how to learn causal knowledge, starting with Judea Pearl's landmark book "Causality" (2000) and continuing with Shohei Shimizu at Riken in Japan ("A Linear Non-Gaussian Acyclic Model for Causal Discovery", 2006) and Bernhard Schoelkopf at the Max Planck Institute in Germany ("On Causal and Anticausal Learning", 2012). In 2013 Isabelle Guyon (formerly of Bell Labs) organized the Cause-effect Pairs Kaggle Competition to develop algorithms that can infer causal relationships from data (266 teams competed).
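A toy probabilistic model with made-up numbers shows the two directions that the paragraph contrasts. Given a hypothetical prior probability of rain and the probability that things get wet with and without rain, predicting the effect from the cause is a weighted sum, while guessing the cause from an observed effect requires inverting that model with Bayes' rule. The parameter names and default values below are assumptions chosen only for illustration.

```python
def p_wet(p_rain: float = 0.3, p_wet_if_rain: float = 0.9,
          p_wet_if_dry: float = 0.1) -> float:
    """Cause -> effect: P(wet) = P(rain)P(wet|rain) + P(no rain)P(wet|no rain)."""
    return p_rain * p_wet_if_rain + (1 - p_rain) * p_wet_if_dry


def p_rain_given_wet(p_rain: float = 0.3, p_wet_if_rain: float = 0.9,
                     p_wet_if_dry: float = 0.1) -> float:
    """Effect -> cause, by Bayes' rule: P(rain|wet) = P(rain)P(wet|rain) / P(wet)."""
    return p_rain * p_wet_if_rain / p_wet(p_rain, p_wet_if_rain, p_wet_if_dry)


print(p_wet())             # about 0.34
print(p_rain_given_wet())  # about 0.79
```

Note that the backward guess is computed from the forward, cause-to-effect model. The sketch says nothing about how such a model could be learned from data, which is the harder problem that causal discovery tackles.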