
Theoretical Neuroscience

Key Takeaways
  • Theoretical neuroscience uses Marr's three levels of analysis (computational, algorithmic, implementational) as a foundational framework to deconstruct complex brain functions.
  • The Bayesian brain hypothesis posits that the brain performs optimal inference, combining prior beliefs with sensory data to navigate an uncertain world.
  • The dynamics of neural networks, featuring concepts like attractors and criticality, explain emergent properties such as memory, stability, and efficient information processing.
  • Principles from engineering and statistics, like optimal control and Bayesian inference, provide a unifying mathematical language for understanding perception, action, and cognition.

Introduction

The human brain is arguably the most complex system in the known universe, a network of billions of neurons that gives rise to perception, action, thought, and consciousness. Understanding how this biological hardware produces such sophisticated behavior is one of the greatest challenges in science. Theoretical neuroscience confronts this challenge by applying the rigorous tools of mathematics, physics, and computer science to uncover the fundamental principles of neural computation. It seeks to move beyond a mere description of the brain's components to a deeper understanding of the logic and dynamics that govern its function. This article tackles the knowledge gap between the physical brain and its emergent cognitive abilities.

To guide our exploration, this article is structured in two main parts. In the first chapter, "Principles and Mechanisms," we will delve into the foundational ideas that form the bedrock of the field. We will explore David Marr's influential levels of analysis, the architectural designs of feedforward and recurrent networks, the dynamical systems view of memory through attractors, and the powerful unifying framework of the Bayesian brain. Following this, the "Applications and Interdisciplinary Connections" chapter will demonstrate the remarkable power of these theories in action. We will see how they provide elegant explanations for a vast range of phenomena, from the intricacies of sensory perception and motor control to the grand challenges of consciousness and the design of next-generation artificial intelligence.

Principles and Mechanisms

To understand the brain is to embark on a journey across vast scales of complexity, from the molecular dance at a single synapse to the coordinated symphony of billions of neurons that gives rise to thought. The task seems daunting. Where do we even begin? Fortunately, the great computer scientist David Marr gave us a compass for this journey. He proposed that to understand any complex information-processing system, we must approach it at three distinct, complementary levels of analysis. This framework isn't just a useful organizational tool; it is the very lens through which theoretical neuroscience views the brain.

First, there is the ​​computational level​​: What is the problem the system is trying to solve, and why? What is the fundamental goal? Second, the ​​algorithmic level​​: How does the system solve this problem? What is the recipe, the sequence of steps, or the representation of information it uses? And finally, the ​​implementational level​​: How is this algorithm physically realized? What are the nuts and bolts—the neurons, synapses, and molecules—that execute the recipe? Let's see this in action.

The Logic of Seeing: A Case Study in Vision

Consider a task so fundamental to our experience that we do it without a moment's thought: seeing the edge of an object.

At the ​​computational level​​, the goal is clear: to identify places in our visual field where there are abrupt changes in brightness, color, or texture. Why? Because these discontinuities are not random; they are a reliable signature of the boundaries of physical objects in the world. Finding edges is the first step to carving the world up into things.

At the ​​algorithmic level​​, how can we build a machine to find these changes? A sharp change in brightness corresponds to a place where the rate of change of brightness is at a maximum. If you've studied a little calculus, you know that the maximum of a function's rate of change (its first derivative) occurs where its second derivative is zero. So, a plausible algorithm is: take the image, calculate its second derivative everywhere, and mark the points where that derivative crosses zero.

There's a catch, however. The real world—and the signals from our eyes—are noisy. Directly taking a second derivative would be a disaster; it would amplify all the high-frequency noise, leaving us with a mess of spurious "edges". The solution is to first smooth the image, blurring it just enough to suppress the noise without erasing the important features. A Gaussian function, the familiar "bell curve," is the perfect tool for this. The complete algorithm, then, is to first convolve the image with a Gaussian kernel ($G_\sigma$) and then take the Laplacian ($\nabla^2$, a 2D second derivative). The zero-crossings of this $\nabla^2(G_\sigma \ast I)$ operation give us a robust map of the edges.
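
The algorithm above can be sketched in a few lines of NumPy. This is a one-dimensional toy (a single row of pixels) with an invented noise level and helper names of our own; it shows the key point that smoothing before differentiating keeps the zero-crossings meaningful:

```python
import numpy as np

def gaussian_kernel(sigma):
    """Discrete Gaussian smoothing kernel, normalized to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def edge_locations(signal, sigma=2.0):
    """Zero-crossings of the second derivative of the smoothed signal."""
    smooth = np.convolve(signal, gaussian_kernel(sigma), mode="same")
    d2 = np.diff(smooth, n=2)                     # discrete second derivative
    s = np.sign(d2)
    return np.where(s[:-1] * s[1:] < 0)[0] + 1    # sign changes = zero-crossings

# A noisy brightness step: dark (0) up to index 49, bright (1) from index 50.
rng = np.random.default_rng(0)
row = np.concatenate([np.zeros(50), np.ones(50)]) + 0.05 * rng.standard_normal(100)
edges = edge_locations(row, sigma=2.0)
```

On this input, one of the detected zero-crossings lands within a pixel or two of the true edge at index 50, despite the noise.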

This brings us to the ​​implementational level​​. Did nature stumble upon the same solution? Astonishingly, yes. Neurons in the early stages of our visual system, like the retinal ganglion cells, have receptive fields with a "center-surround" structure. An individual neuron might be excited by light in the very center of its receptive field but inhibited by light in the area surrounding it. This structure can be beautifully modeled as a Difference of Gaussians (DoG), which is a remarkably close approximation of the very same Laplacian of Gaussian ($\nabla^2 G_\sigma$) operator we derived from first principles! The brain, it seems, is a master mathematician.

The Architecture of Thought: Feedforward and Recurrent Designs

The edge detector is a beautiful example of a self-contained module. But the brain's true power comes from connecting billions of such neurons into vast, intricate networks. The patterns of these connections are not random; they follow specific architectural principles that are deeply tied to the functions they perform. The two most fundamental patterns are feedforward and recurrent networks.

A ​​feedforward network​​ is like an assembly line. Information enters at one end, is processed in a series of stages or "layers," and exits at the other. There are no loops; the flow is strictly one-way. This architecture is perfect for tasks that involve a static, input-to-output mapping, like identifying an object in a photograph. The input is the image, the output is the label "cat," and the network performs the transformation. Such a system is inherently ​​memoryless​​; its output at any given moment depends only on its input at that same moment. The depth of the network allows it to learn incredibly complex transformations, but it has no intrinsic capacity to remember the past.

To handle time, the brain needs a different design. A ​​recurrent network​​ contains loops, allowing signals to be fed back and to circulate within the network. This simple addition is transformative. The network's state at any given moment now depends not only on the current input, but also on its own previous state. This recurrent activity creates a form of memory, an internal context. These networks are inherently ​​stateful​​ and are essential for processing any information that unfolds over time, such as understanding a sentence, planning a sequence of movements, or remembering a phone number just long enough to dial it.

The Stability of Mind: Attractors and Bifurcations

How does a recurrent network "remember"? The memory isn't stored in a specific location, like a file on a hard drive. It is an emergent property of the network's collective dynamics. We can gain a powerful intuition for this by simplifying our picture. Instead of tracking every single spike, we can describe the average activity of a large population of neurons with a single variable, its ​​firing rate​​ $r(t)$. The evolution of this rate over time can often be described by a simple differential equation:

$$\tau \frac{dr}{dt} = -r + \phi(wr + I)$$

Here, $\tau$ is a time constant, $w$ is the strength of the recurrent connections within the population, $I$ is the external input, and $\phi(\cdot)$ is a nonlinear function (like a sigmoid or tanh) that prevents the firing rate from exploding.

What makes this equation special are its ​​fixed points​​. A fixed point is a state $r^*$ where the rate of change is zero ($\dot{r} = 0$), meaning the activity is perfectly balanced and self-sustaining. We can think of the system's state as a ball rolling on a landscape. A stable fixed point is like the bottom of a valley. If you nudge the ball a little, it will roll back down to the bottom. This is an ​​attractor​​.

This is the basis of a profound idea: memories as attractors. A transient input can push the network's state into a particular valley of attraction. Even after the input is gone, the network's own recurrent dynamics will hold it there. This persistent pattern of activity is the memory. The collection of starting points that all lead to the same valley is its "basin of attraction," which gives the memory robustness against noise.
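
A tiny simulation makes this concrete. Here we Euler-integrate the rate equation above, using a logistic sigmoid for $\phi$; the particular values $w = 8$ and $I = -4$ are illustrative choices of ours that make the network bistable:

```python
import math

def simulate(w, I_ext, r0=0.0, tau=10.0, dt=0.1, steps=5000):
    """Euler-integrate tau * dr/dt = -r + phi(w*r + I), phi = logistic sigmoid."""
    phi = lambda x: 1.0 / (1.0 + math.exp(-x))
    r = r0
    for _ in range(steps):
        r += (dt / tau) * (-r + phi(w * r + I_ext))
    return r

# Strong recurrence (w = 8, I = -4) creates two stable fixed points.
low  = simulate(w=8.0, I_ext=-4.0, r0=0.1)   # started near the "off" valley
high = simulate(w=8.0, I_ext=-4.0, r0=0.9)   # a transient input pushed us near "on"
```

Two runs with identical input but different starting states settle into different valleys: the network's current activity "remembers" which transient pushed it there.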

Of course, not all fixed points are stable valleys; some are like the tops of hills, from which any small perturbation will cause the ball to roll away. To determine the stability of a fixed point, we can give the system a mathematical "poke." We linearize the dynamics around the fixed point and find the eigenvalue $\lambda$ that governs how small perturbations grow or decay. A negative $\lambda$ (in this continuous-time model) means the valley is stable; a positive $\lambda$ means it's an unstable hilltop. For multi-dimensional networks, this generalizes to checking whether the ​​real parts of all eigenvalues​​ of a special matrix, the ​​Jacobian​​, are ​​negative​​.

Even more wonderfully, as we slowly change the input to the network (the parameter $\mu$ in the abstract form $\dot{x} = f(x, \mu)$), the shape of this entire landscape can change. At a critical value of the input, a valley and a hilltop can suddenly appear out of thin air! This event is a ​​saddle-node bifurcation​​, and it represents the birth of a new memory, a new stable state for the network. It's a universal mechanism by which a system can undergo a dramatic, qualitative shift in behavior in response to a small, smooth change in conditions.

Poised on the Edge: Normalization and Criticality

With so much recurrent excitation, neural networks live a dangerous life, perpetually on the verge of runaway activity. To function, they must employ sophisticated regulatory mechanisms at both local and global scales.

One of the most widespread computational motifs in the brain is ​​divisive normalization​​. The response of a neuron is not determined by its inputs in isolation; instead, its raw response is divided by a term that includes the summed activity of a pool of nearby neurons. This simple operation has profound consequences. It implements a form of automatic gain control, ensuring that a neuron's response doesn't saturate and remains sensitive across a wide range of input intensities. It makes the neural code relative, emphasizing contrast and change over absolute levels. This principle is so fundamental that it can explain a vast array of nonlinear response properties in sensory systems, and the mathematical form of the divisive normalization equation can be shown to be equivalent to the classic Naka-Rushton function used for decades to describe sensory responses.
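
Both halves of that claim are easy to check numerically. Below is a minimal sketch (parameter names are our own) of divisive normalization alongside the Naka-Rushton function; when the normalization pool contains the neuron's own drive and the semi-saturation constants match, the two formulas coincide exactly:

```python
def naka_rushton(c, c50=0.5, n=2.0, rmax=1.0):
    """Classic contrast-response function: R = rmax * c^n / (c^n + c50^n)."""
    return rmax * c**n / (c**n + c50**n)

def divisive_norm(drive, pool, sigma=0.5, n=2.0):
    """Raw drive divided by pooled activity (plus a stabilizing constant)."""
    return drive**n / (sigma**n + sum(p**n for p in pool))

# Equivalence: pool = the neuron's own drive, sigma = c50.
match = all(
    abs(divisive_norm(c, [c]) - naka_rushton(c)) < 1e-12
    for c in [0.1, 0.3, 0.5, 0.9]
)

# Gain control: the same drive evokes a smaller response in a "busy" surround.
quiet = divisive_norm(0.5, [0.5])
busy = divisive_norm(0.5, [0.5, 0.8, 0.8])
```

The busy surround divides the response down, which is exactly the automatic gain control described above.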

On a global scale, the brain appears to engage in an even more spectacular balancing act. Imagine activity spreading through the network like a cascade or an avalanche. One neuron fires, causing a few of its neighbors to fire, who in turn cause others to fire. The average number of subsequent spikes triggered by a single spike is called the ​​branching ratio​​, $\eta$.

  • If $\eta < 1$ (the ​​subcritical​​ regime), any cascade of activity will quickly die out. Information cannot propagate effectively across the brain.
  • If $\eta > 1$ (the ​​supercritical​​ regime), cascades will grow exponentially, leading to runaway, seizure-like activity.
  • But if $\eta = 1$ (the ​​critical​​ regime), the system is perfectly balanced. A cascade can continue indefinitely without dying out or exploding. Activity propagates in "avalanches" of all shapes and sizes.

Remarkably, these critical avalanches follow a specific statistical pattern: their size distribution is a ​​power law​​, often with an exponent of $-3/2$. This is the same law that governs many other critical systems in physics, from sandpiles to magnets. And here is the punchline: experimental recordings from living cortical tissue have revealed neural avalanches that follow this exact statistical law! This has led to the ​​criticality hypothesis​​: that the brain actively tunes itself to hover at this critical point, poised between order and chaos, to maximize its ability to transmit, store, and process information.
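
A branching process is simple enough to simulate directly. In this sketch each spike triggers a Binomial(2, η/2) number of successor spikes (mean η); the sample counts, cap, and seed are arbitrary choices of ours:

```python
import random

def avalanche_size(eta, rng, cap=20_000):
    """Total spikes in one cascade; each spike triggers k ~ Binomial(2, eta/2)
    successor spikes, so the mean number of children per spike is eta."""
    active, total = 1, 1
    while active and total < cap:
        children = sum(
            (rng.random() < eta / 2) + (rng.random() < eta / 2)
            for _ in range(active)
        )
        total += children
        active = children
    return total

rng = random.Random(42)
sub = [avalanche_size(0.5, rng) for _ in range(2000)]    # subcritical
crit = [avalanche_size(1.0, rng) for _ in range(2000)]   # critical
mean_sub = sum(sub) / len(sub)   # theory: 1 / (1 - eta) = 2 for eta = 0.5
```

Subcritical cascades stay small (mean size $1/(1-\eta)$), while at $\eta = 1$ the size distribution develops the heavy power-law tail described above: the same simulation produces avalanches orders of magnitude larger than anything the subcritical network generates.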

The Brain as a Bayesian Inference Engine

We have seen how the brain can be structured to perform computations, to maintain memories, and to regulate its own dynamics. But how do these circuits come to represent the world accurately? How do they learn from experience? A powerful and unifying answer comes from the ​​Bayesian brain hypothesis​​.

The central idea is that the brain's fundamental job is to contend with uncertainty. Our sensory inputs are noisy, incomplete, and ambiguous. The hypothesis posits that the brain handles this uncertainty by implementing a form of Bayesian statistical inference. It maintains a ​​prior belief​​ about the state of the world, $p(s)$, based on past experience. When new sensory data ($x$) arrives, it uses this evidence (via a ​​likelihood​​ function, $p(x \mid s)$) to update its belief, forming a ​​posterior belief​​, $p(s \mid x)$, according to Bayes' theorem: $p(s \mid x) \propto p(x \mid s)\,p(s)$. This posterior represents the best possible guess about the world, given both prior knowledge and current evidence.

This view requires us to think of probability not as a long-run frequency of repeatable events, but as a ​​degree of belief​​ about a singular, unique state of affairs—the very definition that underlies the Bayesian interpretation of probability.

This isn't just a metaphor; it's a ​​normative theory​​. It describes what a rational agent should do to make optimal decisions under uncertainty. And because it makes quantitative predictions, it is ​​falsifiable​​. For instance, in an experiment where subjects must combine two noisy visual cues to estimate a location, the Bayesian model predicts that their final estimate should be a weighted average of the cues, where each weight is proportional to the cue's reliability (its inverse variance). Behavioral experiments have shown that humans do precisely this! We can also search for neurophysiological correlates of the key variables in the Bayesian calculation, such as the precision of the likelihood, and test if neural activity changes in the predicted way when we manipulate sensory uncertainty.
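
Reliability-weighted cue combination takes only a few lines. In this sketch the two cues are independent Gaussians, and the numbers are invented purely for illustration:

```python
def combine_cues(mu1, var1, mu2, var2):
    """Bayes-optimal fusion of two independent Gaussian cues: each cue is
    weighted by its reliability (inverse variance)."""
    w1 = (1 / var1) / (1 / var1 + 1 / var2)
    w2 = 1 - w1
    mu_post = w1 * mu1 + w2 * mu2
    var_post = 1 / (1 / var1 + 1 / var2)
    return mu_post, var_post

# A reliable cue at 10 (variance 1) and a noisy cue at 14 (variance 4):
fused_mu, fused_var = combine_cues(10.0, 1.0, 14.0, 4.0)   # -> (10.8, 0.8)
```

The estimate is pulled toward the reliable cue, and the fused variance is smaller than either cue's alone: exactly the signature found in the behavioral experiments.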

Perhaps the most beautiful part of this story is how it connects back to the cellular level. For the brain to update its internal model, synapses must change their strength. This requires a learning signal. It turns out that a synapse's strength doesn't just change based on the activity of its pre- and post-synaptic partners (the "Hebbian" rule). It requires a ​​third factor​​: a global, modulatory signal broadcast by neuromodulators like dopamine, acetylcholine, and noradrenaline. These neuromodulators appear to carry exactly the kinds of signals a Bayesian learning system would need:

  • ​​Dopamine​​ famously signals ​​reward prediction error​​: "Was this outcome better or worse than I expected?" This is the core teaching signal for reinforcement learning, allowing the brain to learn which actions lead to good outcomes.
  • ​​Noradrenaline​​ is thought to signal ​​unexpected uncertainty​​ or volatility: "My model of the world is wrong! The rules have changed!" This triggers a state of high alert and increases the learning rate, allowing for rapid adaptation to a new environment.
  • ​​Acetylcholine​​ may signal ​​expected uncertainty​​: "The rules are stable, but the world is noisy right now." This could promote heightened attention to sensory details, helping to extract a clearer signal from the noise.
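
The dopamine story in particular maps onto the temporal-difference learning rule from reinforcement learning. A minimal sketch (the states, reward size, and learning rate are illustrative choices of ours):

```python
def td_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One temporal-difference step: delta is the reward prediction error,
    the dopamine-like teaching signal."""
    delta = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * delta
    return delta

# "reward_state" is terminal here, so its value stays at 0.
values = {"cue": 0.0, "reward_state": 0.0}
deltas = []
for _ in range(200):
    # the cue is always followed by a reward of 1
    deltas.append(td_update(values, "cue", 1.0, "reward_state"))
```

Early on the reward is surprising and the prediction error is large; as the value estimate converges, the error (and, by hypothesis, the dopaminergic teaching signal) fades toward zero.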

In this grand synthesis, we see the levels of analysis come together. A high-level computational theory of learning (Bayesian inference) is implemented via specific algorithmic signals (prediction errors, uncertainty) which are, in turn, realized by the implementational details of molecular neuromodulation affecting synaptic plasticity. From the logic of a single neuron to the probabilistic reasoning of an entire mind, theoretical neuroscience reveals a system of profound elegance, unity, and power.

Applications and Interdisciplinary Connections

Now that we have explored some of the foundational principles and mechanisms of theoretical neuroscience, we are equipped to go on a grand tour. We shall see how these ideas are not mere abstractions, but powerful lenses through which we can understand a breathtaking range of phenomena, from the simplest sensations to the most profound mysteries of the mind. Like a physicist applying the laws of mechanics from the fall of an apple to the orbit of the moon, we will apply the principles of computation, dynamics, and inference to the universe within our skulls. The beauty of a good theory is its power to unify, to reveal the common logic underlying a thousand disparate facts. Let us begin our exploration.

Decoding the Senses: The Logic of Perception

Our journey into the brain's functions begins where the outside world first makes contact: our senses. Vision, our most dominant sense, provides a perfect canvas on which to paint the principles of neural computation. When you look at an object, your brain is not a passive camera simply recording pixels. It is an active interpreter, immediately beginning a process of deconstruction and reconstruction to extract meaning.

Consider one of the first computational hurdles: seeing an object against a changing background. A black cat is visible on a white wall in bright sunlight and also on the same wall in the dim light of dusk. The absolute amount of light reaching your eye from the cat is vastly different, yet it remains a "black cat." This is because the visual system is exquisitely sensitive to contrast, not absolute light levels. But the story is more subtle still. The perception of a spot of light is not independent of its surroundings. A gray spot will look brighter against a dark background than against a light one. How does the brain achieve this? Early theories proposed a simple subtraction: the response to the center of a neuron's receptive field is reduced by the response to its surround. But this model fails to capture the robustness of perception across different contrast levels.

A more powerful and now widely accepted theoretical principle is ​​divisive normalization​​. Here, the response of a neuron to its preferred stimulus is divided by the pooled activity of a large population of neighboring neurons. It’s a form of automatic gain control. When the background is "busy" (high contrast), the gain is turned down, making the neuron less sensitive. This allows the system to adjust its operating range to the current context, explaining a huge array of perceptual phenomena. Theoretical modeling allows us to compare these ideas precisely: we can construct mathematical models of both subtractive and divisive inhibition and see which one better predicts neural responses under varying conditions. Divisive normalization, it turns out, is a "canonical computation" that the brain seems to use over and over again, not just in vision but in hearing, touch, and even higher cognitive functions like attention.

As information flows from the retina deeper into the brain, to the primary visual cortex (V1), the processing becomes more sophisticated. Neurons in V1 are no longer interested in simple spots of light; many respond selectively to edges and bars of specific orientations. This is the beginning of building a world of objects from a scene of light and dark. The theoretical question is: what kind of filter could perform this feat? One of the most successful models is the ​​Gabor filter​​, a simple and elegant mathematical object. It is essentially a sine wave grating multiplied by a Gaussian window. This structure gives it two key properties: it is localized in space (the Gaussian window) and localized in frequency and orientation (the sine wave). In the language of Fourier analysis, its spectrum consists of two localized lobes, meaning it is tuned to a narrow band of spatial frequencies and orientations. This beautifully matches the measured properties of V1 simple cells, suggesting that the brain has discovered this optimal solution for simultaneously representing "what" and "where" in the visual world.
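
Both localization properties can be verified numerically. Here is a one-dimensional Gabor (the envelope width and carrier frequency are arbitrary illustrative values) and its Fourier spectrum:

```python
import numpy as np

def gabor_1d(x, sigma=2.0, freq=0.25, phase=0.0):
    """1-D Gabor: a sine-wave carrier under a Gaussian envelope."""
    return np.exp(-x**2 / (2 * sigma**2)) * np.cos(2 * np.pi * freq * x + phase)

x = np.arange(-32, 32)
g = gabor_1d(x)

# Localized in frequency: the spectrum is a single lobe at the carrier frequency.
spectrum = np.abs(np.fft.rfft(g))
freqs = np.fft.rfftfreq(len(x), d=1.0)
peak_freq = freqs[np.argmax(spectrum)]
```

The filter is large only near the center of its envelope (localized in space), while its spectrum peaks at the carrier frequency of 0.25 cycles per sample (localized in frequency): "what" and "where" at once.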

The Master of Movement: Engineering Principles in Motor Control

The brain is not a passive observer; it is an active agent. Moving our bodies presents a staggering engineering challenge. To simply reach for a glass of water, your brain must coordinate dozens of muscles, each with complex properties, to guide your hand along a precise path through space. Yet, you do it without a thought. When you watch someone move, you can immediately spot the difference between the fluid, graceful motion of a dancer and the jerky, awkward motion of a primitive robot. What is the source of this grace?

Optimal control theory, a branch of engineering, provides a stunningly simple answer. The brain acts as if it is solving an optimization problem. One of the most influential ideas is the ​​minimum-jerk hypothesis​​. Jerk is the rate of change of acceleration; a high-jerk movement is jerky and shaky, while a low-jerk movement is smooth. The theory posits that for a simple point-to-point movement, the brain chooses the one trajectory out of an infinite number of possibilities that minimizes the total squared jerk over the entire movement. When you write down the mathematics of this principle, the solution is a unique, bell-shaped velocity profile that looks remarkably like the profiles of actual human reaching movements. The theory doesn't just describe what the arm does; it provides a normative reason why it does it that way. It is the smoothest possible path that satisfies the boundary conditions of starting and ending at rest.
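
Minimizing the integrated squared jerk with rest-to-rest boundary conditions yields a fifth-order polynomial in time. The sketch below uses that standard closed form and checks the bell-shaped velocity profile numerically:

```python
def min_jerk(t, T=1.0, start=0.0, end=1.0):
    """Minimum-jerk position for a point-to-point reach of duration T."""
    s = t / T
    return start + (end - start) * (10 * s**3 - 15 * s**4 + 6 * s**5)

# Sample the trajectory and differentiate numerically.
n = 1000
dt = 1.0 / n
pos = [min_jerk(i * dt) for i in range(n + 1)]
vel = [(pos[i + 1] - pos[i]) / dt for i in range(n)]
peak = max(vel)   # analytic peak speed for a unit reach in unit time: 15/8
```

The velocity starts and ends at zero, rises to a single symmetric peak of 15/8 at the movement's midpoint, and looks remarkably like recorded human reaching profiles.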

The same principles of optimal control can explain not just discrete actions like reaching, but also continuous tasks like maintaining balance. Standing upright is like balancing an inverted pendulum, an inherently unstable system that requires constant, subtle corrections. Here again, the brain acts as an expert controller. We can model the body's dynamics with the equations of physics and formulate the brain's goal as a cost function within the framework of a ​​Linear Quadratic Regulator (LQR)​​. This cost function penalizes both deviations from the upright posture and the amount of "control effort" (e.g., ankle torque) used to correct them. The theory then predicts the optimal feedback strategy to maintain stability with minimum effort. From the grace of a ballerina's leap to the subtle sway of standing still, the language of control theory gives us a unified framework for understanding the elegance of biological motion.
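
A discrete-time sketch of this idea, with a linearized inverted pendulum and hand-picked illustrative constants (not fitted to human posture data), solves the Riccati equation by fixed-point iteration and then closes the feedback loop:

```python
import numpy as np

# Linearized inverted pendulum: theta'' = (g/l) * theta + u, discretized with step dt.
dt, g_over_l = 0.01, 10.0
A = np.array([[1.0, dt], [g_over_l * dt, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])   # penalize tilt (and, lightly, angular velocity)
R = np.array([[0.01]])    # penalize control effort (e.g. ankle torque)

# Iterate the discrete-time Riccati equation to convergence.
P = Q.copy()
for _ in range(10_000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
    if np.max(np.abs(P_next - P)) < 1e-10:
        P = P_next
        break
    P = P_next

# Closed loop u = -K x: start tilted 0.2 rad and let the controller correct it.
x = np.array([[0.2], [0.0]])
for _ in range(2000):
    x = A @ x + B @ (-K @ x)
final_tilt = float(abs(x[0, 0]))
```

The feedback gain K trades tilt against torque exactly as the cost matrices Q and R dictate: the unstable pendulum is quietly steered back upright with minimal effort.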

The Inner World: Models of Cognition and Belief

Beyond sensing and acting, the brain builds an internal model of the world, a landscape of beliefs, expectations, and memories. Theoretical neuroscience has made enormous strides in formalizing these "inner" processes.

A leading paradigm is the ​​Bayesian Brain hypothesis​​, which suggests that the brain is fundamentally an inference engine. It constantly makes its best guess about the causes of its sensory inputs by combining incoming data (the "likelihood") with its prior beliefs or expectations (the "prior"). This process, sometimes called ​​Analysis-by-Synthesis​​, can be formalized using Bayes' rule: $p(\text{cause} \mid \text{data}) \propto p(\text{data} \mid \text{cause})\,p(\text{cause})$. One of the most compelling pieces of evidence for this view comes from perceptual illusions. An illusion, in this framework, is not a "mistake" by the brain. It is the optimal, logical conclusion when a strong prior belief overrides ambiguous or noisy sensory data. The MAP (Maximum A Posteriori) estimate of what is out there in the world is a precision-weighted average of the sensory evidence and the prior expectation. When sensory evidence is weak (high noise), or the prior is very strong, our perception is pulled toward the prior, creating a "bias" that we experience as an illusion.

This framework of balancing costs and benefits extends to our decisions and motivations. Why do you sometimes leap out of bed, full of energy, and other times hit the snooze button repeatedly? Reinforcement learning theory suggests we are always trying to maximize future rewards. But there is also a cost to action. Theoretical models propose that the neuromodulator ​​dopamine​​ plays a key role in setting the "vigor" of our actions by regulating the perceived cost of effort. Higher dopamine levels may reduce the subjective cost term in our internal calculation, making it "worth it" to act more quickly and forcefully to obtain a reward. By writing this down as a simple objective function, $J(\text{vigor}) = \text{Reward Rate} - \text{Cost}(\text{vigor})$, we can formally show how a change in a single parameter, representing dopaminergic tone, can shift the optimal strategy from lethargic to energetic.

Our inner world is also shaped by memory. We know that memories have different lifetimes—some fade in minutes, while others last a lifetime. The ​​complementary learning systems​​ theory proposes that this is because we have two different memory systems: a fast-learning hippocampal system for episodic memories and a slow-learning neocortical system for general knowledge. How does a memory transition from the fragile hippocampal store to the robust cortical one? The theory posits a process of replay-driven consolidation, where the hippocampus "teaches" the cortex, often during sleep. We can build a dynamical system model of this process, with equations governing the strength of the hippocampal trace ($S_H$) and the cortical trace ($S_C$). The model, a system of simple linear differential equations, can reproduce the classic forgetting curves seen in experiments, showing how a memory trace can initially decay but then strengthen over time as it is consolidated from the fast system to the slow one.
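
The two-trace model can be written as a pair of linear ODEs and integrated in a few lines; the time constants and transfer rate below are illustrative, not fitted values:

```python
# Fast hippocampal decay, slow cortical decay, and a "teaching" transfer term.
tau_H, tau_C, k = 1.0, 100.0, 1.0
dt, T = 0.01, 50.0

S_H, S_C = 1.0, 0.0        # encoding leaves a strong hippocampal trace
history_C = []
for _ in range(int(T / dt)):
    dS_H = -S_H / tau_H                 # hippocampal trace fades quickly
    dS_C = k * S_H - S_C / tau_C        # cortex is taught by the hippocampus
    S_H += dt * dS_H
    S_C += dt * dS_C
    history_C.append(S_C)
```

The cortical trace rises while the hippocampal trace decays, peaks, and then outlives it by far: the signature of consolidation from the fast system to the slow one.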

The Grand Challenges: Consciousness and Disease

Can these theoretical tools help us approach the most profound and difficult questions in neuroscience?

Consider the puzzle of ​​consciousness​​. Why are we aware of some information processing in our brain but not others? ​​Global Workspace Theory (GWT)​​ suggests that consciousness arises when information is "broadcast" across a widespread network of brain regions, making it globally available for flexible cognitive processing. This abstract idea can be made concrete using the tools of network science. If we model the brain's connections as a graph, the capacity for global broadcasting can be proxied by the graph's ​​global efficiency​​, a measure of how easily information can travel between any two nodes. We can then perform virtual experiments. What happens if we lesion the network? The theory predicts that a targeted attack on key "hub" nodes—those with high betweenness centrality that bridge many communication paths—should be far more devastating to global efficiency than random damage. This provides a formal, testable hypothesis linking network structure to a property thought to be essential for conscious awareness.
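
This virtual-lesion experiment fits in a short script. Below is a toy hub-and-modules graph of our own construction, with global efficiency computed by breadth-first search; unreachable pairs contribute zero:

```python
from collections import deque

def global_efficiency(adj):
    """Mean inverse shortest-path length over all ordered node pairs."""
    nodes = list(adj)
    n = len(nodes)
    total = 0.0
    for s in nodes:
        dist = {s: 0}                     # BFS distances from s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))

def lesion(adj, node):
    """Remove a node and all its edges."""
    return {u: [v for v in nbrs if v != node] for u, nbrs in adj.items() if u != node}

# Node 0 is a hub bridging two fully connected modules {1,2,3} and {4,5,6}.
adj = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2],
       4: [0, 5, 6], 5: [0, 4, 6], 6: [0, 4, 5]}

e_full = global_efficiency(adj)
e_hub_lesion = global_efficiency(lesion(adj, 0))       # targeted attack on the hub
e_peripheral_lesion = global_efficiency(lesion(adj, 3))  # remove a peripheral node
```

Removing the hub disconnects the two modules and collapses global efficiency, while removing a peripheral node barely hurts: the targeted-attack prediction of Global Workspace Theory in miniature.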

Another leading theory, ​​Integrated Information Theory (IIT)​​, proposes a direct mathematical measure of consciousness, called $\Phi$ (Phi). $\Phi$ is meant to quantify the extent to which a system's causal structure is both integrated (it cannot be broken down into independent parts) and differentiated (it has a large repertoire of possible states). While philosophically appealing, calculating $\Phi$ is computationally intractable for any system of interesting size. This presents a major challenge for experimental testing. Theoretical neuroscientists are therefore working to develop practical, computable surrogates for $\Phi$ that can be applied to real data like EEG. This involves a delicate balancing act: combining dimensionality reduction techniques, fitting causal models like multivariate autoregressive (MVAR) models, and using clever approximations to capture the essence of integration and differentiation without the impossible computational cost. This work is at the frontier where high-level theory meets the messy reality of data analysis.

The application of theoretical models is not limited to academic curiosity; it has profound clinical implications. Models of seizure dynamics, for instance, treat the brain as a complex dynamical system that can transition from a healthy state into a pathological, oscillatory (ictal) state. By modeling the neural masses involved, we can study the bifurcations that lead to seizure onset. A critical step in using such models for diagnosis or treatment planning is ​​parameter identifiability​​. Given a recording of brain activity, can we uniquely determine the underlying parameters of our model? Using tools like the ​​Fisher Information Matrix​​, we can formally assess whether our data are sufficient to pin down these parameters, a crucial step for building models that are not just plausible, but reliable and clinically useful.

Building Brains: Neuroscience as the Muse for AI and Engineering

The relationship between neuroscience and engineering is a two-way street. Not only can engineering principles illuminate brain function, but brain function is now a primary source of inspiration for the next generation of artificial intelligence and computer hardware.

One striking example is ​​Reservoir Computing​​. For decades, training recurrent neural networks (RNNs) was a difficult art. Reservoir computing proposed a radical simplification: what if the recurrent part of the network—the "reservoir"—is fixed and random, and we only train a simple linear readout layer? This architecture, including the ​​Echo State Network (ESN)​​, proved to be remarkably powerful and efficient. The inspiration comes directly from the structure of cortical microcircuits, which are characterized by vast, recurrent, seemingly random connectivity. The theory states that as long as the reservoir has the "echo state property"—meaning its state is a unique function of the input history—this high-dimensional, nonlinear dynamical system acts as a rich feature embedding. It projects the history of a complex input signal into a space where the information needed for a task becomes linearly separable, ready to be picked off by a simple downstream decoder neuron. The brain, it seems, discovered this trick long ago.
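
A miniature Echo State Network shows the trick; the reservoir size, spectral-radius scaling, and the one-step-memory task below are illustrative choices of ours, not a canonical benchmark:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random reservoir, scaled so its spectral radius is below 1
# (a common recipe for the echo state property).
N = 100
W = rng.standard_normal((N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.standard_normal((N, 1))

def run_reservoir(inputs):
    """Drive the fixed reservoir and collect its states."""
    x = np.zeros(N)
    states = []
    for u in inputs:
        x = np.tanh(W @ x + W_in[:, 0] * u)
        states.append(x.copy())
    return np.array(states)

# Task: output the input delayed by one step (requires memory of the past).
u = rng.uniform(-1, 1, 500)
target = np.concatenate([[0.0], u[:-1]])

X = run_reservoir(u)
# Train ONLY the linear readout (ridge regression); the reservoir stays untouched.
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ target)
pred = X @ W_out
mse = float(np.mean((pred[100:] - target[100:]) ** 2))
```

Only `W_out` is trained; the random recurrent reservoir is never modified, yet a simple linear readout can recover the input's recent history from the reservoir's state.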

This dialogue extends all the way down to the level of silicon. As we strive to build more efficient AI, we are increasingly turning to ​​neuromorphic engineering​​, which aims to build chips that mimic the brain's architecture and processing style. A fundamental design choice is precision. Should a neuromorphic chip be analog, with continuous values and inherent noise, or digital, with discrete values and quantization noise? Theoretical neuroscience helps answer this. By modeling different neural codes—for instance, a ​​rate code​​ where information is in the average firing rate, versus a ​​temporal code​​ where it is in the precise timing of single spikes—we can analyze their robustness to noise. We can then calculate the minimum number of bits a digital system would need to achieve the same task performance (e.g., a certain misclassification error) as an equivalent analog system. Such an analysis might reveal, for instance, that a temporal code is far more demanding on timing precision than a rate code is on count precision, forcing a higher bit-depth for the system's clock than for its activity counters. This directly translates an understanding of neural coding into a concrete engineering specification.

A Unified View

Our tour is complete. We have journeyed from the detection of a photon in the eye, through the elegant control of our limbs, into the inner world of belief and memory, and finally to the frontiers of consciousness and the design of artificial minds. What is remarkable is not the diversity of these topics, but the unity of the theoretical principles that illuminate them. The same ideas—of dynamics, inference, optimization, and information—appear again and again, in different guises.

A neuron's response normalization is a gain control problem. The grace of a reach is an optimal control problem. A perceptual illusion is an optimal inference problem. The architecture of an AI is a brain-inspired dynamical system. This is the beauty and power of the theoretical approach. It seeks the fundamental principles of which the brain's myriad details are but a single, glorious instantiation. The work is far from finished, the map still has vast uncharted territories. But we have a compass, and the adventure of discovery has only just begun.