Mental Modules

Vision is one of the most important and complex cognitive faculties.

In 1982 the landmark study on vision by the British psychologist David Marr was published posthumously (Marr had died in 1980). Along the way, it laid out an influential cognitive architecture. Marr concluded that our visual system must employ innate information to decipher the ambiguous signals that it receives from the world.

Processing of perceptual data must be performed by "modules", each specialized in some function, which are controlled by a central module. Like the linguist Noam Chomsky and the philosopher Jerry Fodor, Marr assumed that the brain must contain semantic representations that are innate and universal (i.e., of biological nature) in the form of modules that are automatically activated. The processing of such representations is purely syntactic. Marr, Chomsky and Fodor advanced the same theory of the mind, albeit from three different perspectives: they all held that the mind can be decomposed into modules, and that syntactic processing can account for what the mind does.

Specifically, Marr explained the cognitive faculty of vision as a multi-step process. First, the physical stimulus from the world is received (in the form of physical energy) by transducers, which transform it into a symbol (in the form of a neural code) and pass it on to the input modules. These modules then extract information and send it to the central module in charge of higher cognitive tasks.
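The flow just described (stimulus, transducer, input modules, central module) can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: every name in it (`transduce`, `count_active`, `count_transitions`, `CentralModule`, `perceive`) is hypothetical, and the thresholding and the three-way judgment are invented for the example rather than taken from Marr.

```python
from typing import Callable, Dict, List

# All names below are hypothetical illustrations, not Marr's terminology.
Stimulus = List[float]   # physical energy, e.g. light intensities in [0, 1]
NeuralCode = List[int]   # the "symbol" a transducer hands to the input modules


def transduce(stimulus: Stimulus, threshold: float = 0.5) -> NeuralCode:
    """Turn physical energy into a symbolic code (assumed here: a 0/1 pattern)."""
    return [1 if value >= threshold else 0 for value in stimulus]


def count_active(code: NeuralCode) -> int:
    """Toy input module: how many units are active."""
    return sum(code)


def count_transitions(code: NeuralCode) -> int:
    """Toy input module: how often the code switches between 0 and 1."""
    return sum(1 for a, b in zip(code, code[1:]) if a != b)


class CentralModule:
    """Receives the reports of all input modules and forms a judgment."""

    def interpret(self, reports: Dict[str, int]) -> str:
        if reports["active"] == 0:
            return "nothing there"
        if reports["transitions"] == 0:
            return "uniform surface"
        return "structured scene"


# Each input module is specialized and sees only the transduced code.
MODULES: Dict[str, Callable[[NeuralCode], int]] = {
    "active": count_active,
    "transitions": count_transitions,
}


def perceive(stimulus: Stimulus, central: CentralModule) -> str:
    code = transduce(stimulus)                                     # transducer
    reports = {name: mod(code) for name, mod in MODULES.items()}   # input modules
    return central.interpret(reports)                              # central module


print(perceive([0.1, 0.9, 0.9, 0.2], CentralModule()))
```

The point of the sketch is the direction of the data flow: the input modules never consult the central module, while the central module sees only what the input modules report.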

Each module corresponds to a neural subsystem in the brain. The central module is, in Fodor's terminology, "isotropic" (able to build hypotheses based on any available knowledge) and "Quinean" (the degree of confirmation assigned to a hypothesis is conditioned by the entire system of beliefs).

The visual system is thus decomposed into a number of independent subsystems. They provide a representation of the visual scene at three different levels of abstraction: the "primal sketch", which is a symbolic representation drawn from the meaningful features of the image (anything causing sudden discontinuities in light intensity, such as boundaries, contours, shading, textures); the two-and-a-half-dimensional sketch, which is a representation centered on the visual system of the observer (e.g., describing the surrounding surfaces and their properties, mainly distance and orientation) and computed by a set of modules specialized in parameters of motion, shape, color, etc.; and finally the three-dimensional representation, which is centered on the object and is computed according to some rules (Shimon Ullman's “correspondence rules”).
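To make the idea of "sudden discontinuities in light intensity" concrete, here is a minimal sketch that marks such discontinuities along a single row of pixels. It is a deliberate simplification: Marr's own edge-detection proposal (developed with Ellen Hildreth) looks for zero-crossings of a smoothed second derivative of the image, whereas this example swaps in a plain threshold on the difference between neighboring values. The function name `find_discontinuities` and the `min_jump` parameter are assumptions made for the illustration.

```python
from typing import List


def find_discontinuities(intensities: List[float], min_jump: float = 0.3) -> List[int]:
    """Return each index i where intensity changes by at least min_jump
    between position i and position i + 1 (a crude stand-in for an edge)."""
    return [
        i
        for i, (left, right) in enumerate(zip(intensities, intensities[1:]))
        if abs(right - left) >= min_jump
    ]


# One row of an image: a dark region, a bright region, then a dim region.
scanline = [0.1, 0.1, 0.1, 0.8, 0.8, 0.8, 0.2, 0.2]
edges = find_discontinuities(scanline)
print(edges)  # [2, 5]
```

The output keeps only the two places where the intensity changes abruptly and discards the uniform stretches in between, which is the sense in which the primal sketch is a symbolic summary rather than a copy of the image.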

This final representation is what is stored in memory: not what the retina picked up, but what the brain computed.


