
Inference with Intractable Likelihoods

Key Takeaways
  • Many complex scientific models suffer from an intractable likelihood, making standard statistical inference impossible.
  • Approximate Bayesian Computation (ABC) solves this by simulating data and accepting model parameters that generate data similar to observations.
  • The effectiveness of simulation-based methods hinges on choosing informative summary statistics and managing the bias-variance trade-off.
  • The "simulate-and-compare" paradigm is a powerful, unifying approach used across genetics, economics, and cell biology to analyze complex systems.

Introduction

In the quest to understand the world, scientists build models—mathematical representations of reality, from the dance of galaxies to the evolution of life. The cornerstone of validating these models is the likelihood function, a powerful tool that quantifies how well a model's proposed parameters explain observed data. For decades, this principle has guided scientific discovery. However, as our models grow increasingly sophisticated to mirror the true complexity of nature, we often encounter a formidable barrier: the likelihood function becomes computationally impossible to calculate, a problem known as an 'intractable likelihood.' This gap between our most ambitious theories and our data threatens to stall scientific progress in fields where complexity reigns.

This article confronts this challenge head-on. It explores the ingenious workarounds and revolutionary computational techniques that allow scientists to perform robust statistical inference even when the likelihood is unknowable. In the first chapter, "Principles and Mechanisms," we will delve into why likelihoods become intractable and explore the foundational recipes of methods like Approximate Bayesian Computation (ABC) that sidestep the problem entirely. Subsequently, in "Applications and Interdisciplinary Connections," we will witness these powerful methods in action, unlocking secrets in genetics, cell biology, and economics, demonstrating a universal logic of simulation-based discovery.

Principles and Mechanisms

Imagine you are an architect, and you’ve just designed a magnificent, intricate cathedral on paper. This design is your model of the world, complete with all its beautiful rules and relationships. Your parameters, let’s call them $\theta$, are the crucial dimensions in your blueprint—the height of the spire, the thickness of the walls, the curvature of the arches. Now, you go out into the world and find an actual, ancient cathedral. This is your data. The grand question is: could your design have produced this particular building?

To answer this, we need a way to connect the blueprint to the building. In science, this connection is forged by a powerful concept called the likelihood function, often written as $p(\text{data} \mid \theta)$. It answers a very specific question: "Assuming your blueprint (your model with parameters set to $\theta$) is true, what is the probability that you would end up with the exact building (the data) we see before us?" By finding the parameters $\theta$ that maximize this likelihood, we find the blueprint that best explains the observed reality. This is the bedrock of modern statistical inference.

For a great many simple problems, this works like a charm. But what happens when our models become as complex and beautiful as the reality they seek to describe? What happens when our blueprint isn't just a few lines on a page, but a dynamic simulation of a burgeoning star, an evolving ecosystem, or a bustling molecular city inside a cell? Here, we often hit a formidable barrier: the intractable likelihood.

The Wall of Intractability

For our most ambitious scientific models, the likelihood function becomes a monstrous, unknowable entity. We can’t write it down, we can’t calculate it. We have the blueprint and we have the building, but the mathematical bridge between them has collapsed. Why does this happen?

It’s a problem of unimaginable complexity, a beast of combinatorics. Consider trying to understand the genetic history of a population based on DNA samples. The likelihood of seeing the specific genetic patterns in our data depends on the precise ancestral family tree that connects everyone in our sample. To get the true likelihood, we would have to calculate the probability of our data for every single possible family tree, and then average them all. The number of possible trees, or genealogies, for even a modest sample of people is greater than the number of atoms in the known universe. It’s not just hard to compute; it is fundamentally impossible.

Or, imagine peering into a chemical reaction, a microscopic dance of molecules. The state of our system—the count of each type of molecule—changes with every random molecular collision. The likelihood of arriving at a certain chemical concentration after one minute depends on the infinite number of possible paths the reaction could have taken—every possible sequence of collisions at every possible instant in time. Summing over all these paths is an integral of terrifying, infinite dimension.

In these cases, and so many more in fields from cosmology to economics, we stand before a great wall. We can use our model to simulate reality, like an engine that can spit out new, fake data. But we cannot run the engine in reverse to ask how likely our real data was. So what do we do? Do we give up and retreat to simpler, less realistic models? No! This is where the true genius of modern science shines. If we can’t climb the wall, we find clever ways to go around it.

A Clever Detour: The Magic of MCMC

Sometimes, the wall has a secret door. This happens when our likelihood is only partially intractable. In many Bayesian inference problems, what we truly want is the posterior distribution, $p(\theta \mid \text{data})$, which tells us the probability of our parameters given the data we’ve seen. Bayes' theorem tells us it’s proportional to the likelihood times our prior beliefs about the parameters:

$$p(\theta \mid \text{data}) \propto p(\text{data} \mid \theta) \times p(\theta)$$

To make this a proper probability distribution that sums to one, we must divide by a normalizing constant, often called the marginal likelihood or the evidence, $Z = p(\text{data})$. This $Z$ is the average likelihood over all possible parameters, $Z = \int p(\text{data} \mid \theta)\, p(\theta)\, d\theta$, and it’s very often an intractable integral itself. So while we can compute the shape of the posterior distribution, we don't know its absolute height.

Here, an ingenious class of algorithms called Markov chain Monte Carlo (MCMC) comes to the rescue. One of the most famous is the Metropolis-Hastings algorithm. Instead of trying to map out the entire posterior landscape, it prescribes a clever way to "walk" through the parameter space. At each step, you propose a move to a new location. You then decide whether to accept the move with a certain probability.

Here’s the magical part: the acceptance probability depends only on the ratio of the posterior density at the new and old locations. When you form this ratio, the intractable constant $Z$ appears in both the numerator and the denominator, and it spectacularly cancels out!

$$\text{Acceptance Ratio} = \frac{p(\theta_{\text{new}} \mid \text{data})}{p(\theta_{\text{old}} \mid \text{data})} = \frac{p(\text{data} \mid \theta_{\text{new}})\, p(\theta_{\text{new}}) / Z}{p(\text{data} \mid \theta_{\text{old}})\, p(\theta_{\text{old}}) / Z} = \frac{p(\text{data} \mid \theta_{\text{new}})\, p(\theta_{\text{new}})}{p(\text{data} \mid \theta_{\text{old}})\, p(\theta_{\text{old}})}$$

This is profound. It means we can explore a probability distribution in perfect proportion to its density without ever needing to compute the density itself! It's like exploring a mountain range in a thick fog. You may not know your absolute altitude or the height of the tallest peak, but by checking whether each step takes you uphill or downhill, you can devise a strategy to create a map of the entire range. MCMC allows us to do just that for our parameters, providing a powerful solution when the intractability is confined to a single pesky number, $Z$.
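The cancellation above is easy to see in code. Below is a minimal random-walk Metropolis sketch (the symmetric-proposal special case of Metropolis-Hastings) that samples from a toy density supplied only up to its normalizing constant; the target, step size, and step count are illustrative choices, not prescriptions.

```python
import numpy as np

def metropolis(log_unnorm, theta0, n_steps, step_size=0.5, rng=None):
    """Random-walk Metropolis: samples a density known only up to the constant Z."""
    rng = np.random.default_rng() if rng is None else rng
    theta, log_p = theta0, log_unnorm(theta0)
    samples = np.empty(n_steps)
    for i in range(n_steps):
        proposal = theta + step_size * rng.standard_normal()
        log_p_prop = log_unnorm(proposal)
        # Acceptance ratio: any normalizing constant cancels in this difference.
        if np.log(rng.uniform()) < log_p_prop - log_p:
            theta, log_p = proposal, log_p_prop
        samples[i] = theta
    return samples

# Unnormalized log-density of N(2, 1): we never supply the constant Z.
log_target = lambda t: -0.5 * (t - 2.0) ** 2
draws = metropolis(log_target, theta0=0.0, n_steps=20000,
                   rng=np.random.default_rng(0))
```

Discarding the first few thousand draws as burn-in, the remaining samples trace out the target distribution even though its height was never computed.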

When the Map is Unknowable: Approximate Bayesian Computation

But what if the problem is more severe? What if, as in our genetics and chemistry examples, we can’t even compute the likelihood term $p(\text{data} \mid \theta)$ itself? Now we can’t even tell if a step is uphill or downhill. The MCMC trick won't work. We need a completely different philosophy.

This new philosophy is called Approximate Bayesian Computation (ABC), and its core idea is breathtakingly simple and intuitive:

If your model is a good description of reality, then simulations from your model should look like the real data you've observed.

This shifts the entire problem from calculating probabilities to comparing patterns. The simplest ABC algorithm, known as rejection sampling, works like this:

  1. Take a guess at the parameters, $\theta$, by drawing a sample from your prior distribution (your initial beliefs).

  2. Feed these parameters into your model and run a full simulation, generating a synthetic, or "fake," dataset, $y_{\text{sim}}$.

  3. Compare your fake data $y_{\text{sim}}$ to your real-world data $y_{\text{obs}}$.

  4. If the fake data is "close enough" to the real data, you keep the parameters $\theta$ that you guessed. If not, you throw them away.

  5. Repeat this process millions of times.

The collection of parameter values that you keep forms an approximation of the posterior distribution! We have completely sidestepped the need to ever write down the likelihood function. We just let the model speak for itself through simulation. In a simple, idealized case, if we demanded that the simulation exactly matched a key aspect of our data, the ABC procedure would be equivalent to slicing our prior distribution with the knife of data, leaving only the parameters compatible with what we saw.
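The five steps above can be sketched in a few lines. The toy problem here (inferring the mean of a Gaussian, with the sample mean as summary statistic) is a hypothetical stand-in for a genuinely intractable model, chosen so the answer is easy to check; the prior, tolerance, and simulation budget are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend the likelihood were intractable: all we have is observed data
# and a simulator. The "true" mean, unknown to the analyst, is 3.0.
y_obs = rng.normal(3.0, 1.0, size=100)
s_obs = y_obs.mean()                              # the summary statistic

def abc_rejection(n_draws, epsilon):
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-10.0, 10.0)          # 1. sample from the prior
        y_sim = rng.normal(theta, 1.0, 100)       # 2. simulate a fake dataset
        if abs(y_sim.mean() - s_obs) < epsilon:   # 3-4. compare; keep or reject
            accepted.append(theta)
    return np.array(accepted)                     # 5. kept values ~ posterior

posterior = abc_rejection(n_draws=100_000, epsilon=0.1)
```

The accepted parameter values cluster around the observed summary, approximating the posterior without a single likelihood evaluation.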

The Art of "Close Enough"

Of course, the devil is in the details. The practical success of ABC hinges on how we define "close enough," a process that is as much an art as it is a science. This challenge breaks down into three key questions.

First: How do we compare datasets? Comparing entire, high-dimensional datasets like a full genome or a time-series of stock prices is impractical. The probability of a simulation matching the real data exactly is zero. The solution is to not compare the data itself, but to compare a handful of summary statistics. These are carefully chosen numbers that distill the complex data down to its essential features—for example, the average genetic diversity in a population, or the volatility of a financial asset.

Second: Which summaries do we choose? This choice is critical. A bad set of summaries will lead you astray. An ideal summary statistic is one that is highly informative—it changes sensitively when you change the parameter you care about. You want to focus the comparison on the features of the data that actually hold information, while ignoring those that are just random noise. It's about finding the signal and ignoring the static.

Third: How do we measure distance and set the threshold? Once we have our summary statistics, say $S_{\text{obs}}$ from our real data and $S_{\text{sim}}$ from a simulation, we need a distance metric, $\rho(S_{\text{obs}}, S_{\text{sim}})$, to quantify how far apart they are. A simple Euclidean distance can be misleading if the summaries are on wildly different scales or have different amounts of natural variation. More sophisticated metrics, like a Mahalanobis distance, can account for these differences, acting like a properly calibrated ruler.
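To see why the calibrated ruler matters, here is a small numerical sketch; the two summaries, their scales, and the pilot-simulation setup are all invented for illustration.

```python
import numpy as np

def mahalanobis(s_sim, s_obs, cov):
    """Distance that rescales summaries by their natural variability."""
    diff = s_sim - s_obs
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Hypothetical pilot simulations: two summaries on wildly different scales.
rng = np.random.default_rng(2)
pilot = rng.normal(size=(2000, 2)) * np.array([1.0, 100.0])
cov = np.cov(pilot, rowvar=False)     # estimated variability of the summaries

s_obs = np.array([0.0, 0.0])
s_sim = np.array([1.0, 100.0])        # each summary is one "natural unit" away

d_euclid = float(np.linalg.norm(s_sim - s_obs))   # dominated by the big summary
d_mahal = mahalanobis(s_sim, s_obs, cov)          # treats both summaries fairly
```

The Euclidean distance is about 100, driven almost entirely by the large-scale summary; the Mahalanobis distance is about $\sqrt{2}$, because each summary is the same number of standard deviations from the observation.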

Then we must choose a tolerance, $\epsilon$. We accept a simulation if its distance is less than $\epsilon$. This sets up a fundamental compromise, one of the most beautiful and ubiquitous trade-offs in all of science: the bias-variance trade-off.

  • If you choose a very small $\epsilon$, you are being very strict. The parameters you accept will be a very good approximation of the true posterior (low bias), but you will reject almost every simulation, making the method incredibly slow and the results statistically unstable (high variance).
  • If you choose a large $\epsilon$, you are being lenient. You'll accept lots of parameters, making the method fast and stable (low variance), but the resulting distribution will be a crude and smeared-out approximation of the truth (high bias).

Navigating this trade-off is at the heart of the art and science of applying ABC.

Creative Alternatives: Building a Better Bridge

ABC is a powerful tool, but it's not the only one. The same spirit of principled approximation has led to other creative solutions for bypassing the wall of intractability.

One elegant idea is Synthetic Likelihood. It starts like ABC: for a given parameter $\theta$, we run many simulations and collect the summary statistics from each. But instead of just comparing distances, we look at the entire cloud of simulated summary statistics and fit a simple, tractable probability distribution to it—typically a multivariate normal distribution (a bell curve in multiple dimensions). This fitted distribution becomes our new likelihood function. It's a "synthetic" likelihood, built from simulations, that we can then plug into standard, powerful methods like MCMC. It’s a beautiful hybrid that combines simulation with the formal machinery of likelihood inference.
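A minimal sketch of the synthetic-likelihood recipe, using an invented toy simulator whose summaries are a mean and a standard deviation; in a real application the simulator would be the intractable model itself, and the fitted Gaussian would be handed to an MCMC sampler rather than evaluated on a grid.

```python
import numpy as np

def synthetic_loglik(theta, s_obs, simulate, n_sims=200, seed=4):
    """Fit a multivariate normal to simulated summary statistics at theta,
    then evaluate the observed summaries under that fitted bell curve."""
    rng = np.random.default_rng(seed)   # common random numbers across thetas
    S = np.array([simulate(theta, rng) for _ in range(n_sims)])
    mu, cov = S.mean(axis=0), np.cov(S, rowvar=False)
    diff = s_obs - mu
    _, logdet = np.linalg.slogdet(cov)
    return float(-0.5 * (diff @ np.linalg.solve(cov, diff) + logdet))

# Toy simulator: summaries are the mean and std of 50 draws from N(theta, 1).
def simulate(theta, rng):
    y = rng.normal(theta, 1.0, size=50)
    return np.array([y.mean(), y.std()])

s_obs = simulate(2.0, np.random.default_rng(3))  # pretend theta = 2 is unknown
lls = {t: synthetic_loglik(t, s_obs, simulate) for t in [0.0, 1.0, 2.0, 3.0, 4.0]}
best = max(lls, key=lls.get)
```

The synthetic log-likelihood peaks at the parameter value that generated the "observed" summaries, exactly as a real likelihood would.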

Another approach, used when a model has many interacting parts, is Composite Likelihood. The idea is to build a tractable, but "incorrect," likelihood by multiplying together the likelihoods of smaller, manageable chunks of the data (like pairs of data points), deliberately ignoring the fact that these chunks are not truly independent. This seems like cheating! But remarkably, the resulting estimator is often consistent—it converges to the right answer as you get more data. The catch is that because you ignored the correlations, your estimates of uncertainty (your error bars) will be wrong. They are typically too optimistic. But even this can be fixed with more advanced statistical tools (like "sandwich estimators") that correct for the ignored dependence.
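The independence flavor of this idea can be shown in a few lines: below, serially correlated AR(1) data are fit by multiplying marginal one-observation likelihoods as if the data were independent. The model, the AR(1) correction factor, and all numbers are illustrative.

```python
import numpy as np

# AR(1) data: correlated in time, marginal N(mu, 1); the target parameter is mu.
rng = np.random.default_rng(5)
n, phi, mu_true = 5000, 0.8, 1.5
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + np.sqrt(1 - phi**2) * rng.standard_normal()
y = mu_true + x

# Independence composite log-likelihood: a product of marginal N(mu, 1) terms,
# deliberately ignoring the serial correlation.
def comp_loglik(mu):
    return float(-0.5 * np.sum((y - mu) ** 2))

mu_hat = y.mean()   # its maximizer: still a consistent estimate of mu

# The naive standard error assumes independence and is far too optimistic;
# for this AR(1) model the correct long-run factor is sqrt((1+phi)/(1-phi)) = 3.
naive_se = float(y.std(ddof=1) / np.sqrt(n))
corrected_se = float(naive_se * np.sqrt((1 + phi) / (1 - phi)))
```

The point estimate is fine, but the honest error bar is three times wider than the naive one; that factor is what sandwich-style corrections recover in general.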

The Beauty of Approximation

Our journey began with the ideal of the perfect likelihood function, the one true bridge between our models and reality. We crashed against the wall of intractability, a barrier thrown up by the profound complexity of the very systems we wish to understand. But instead of admitting defeat, we found a collection of ingenious detours.

Whether it’s the canceling-constant trick of MCMC, the "simulate-and-compare" ethos of ABC, or the "build-a-new-bridge" strategies of synthetic and composite likelihoods, the underlying theme is one of creative and principled approximation. It reflects a deep truth about the scientific process. It is not always about finding exact, perfect answers. It is about understanding our models, understanding our data, and, most importantly, understanding the limits of our ability to connect the two. In that space of honest approximation lies a great deal of the beauty, creativity, and progress in science.

Applications and Interdisciplinary Connections

In our previous discussion, we explored the curious situation where we can describe the rules of a game with perfect clarity, yet find ourselves utterly unable to calculate the probability of any particular outcome. The likelihood function, the mathematical bridge connecting our model's parameters to our data, becomes an impassable chasm—it is "intractable." This might seem like a paralyzing setback, a full stop at the edge of scientific inquiry. But, as is so often the case in science, necessity becomes the mother of invention. When direct calculation fails, we learn to simulate.

The core idea is as simple as it is powerful. If you can't work backward from the evidence to the cause, then work forward from a guessed cause to its consequences, and see if they match the evidence. Think of a scientist as a detective trying to identify a suspect. The fingerprint evidence is smudged and unreadable (an intractable likelihood). What can you do? You build a "suspect-simulator." You feed it a potential suspect's characteristics (the model parameters), and the machine generates a simulated fingerprint. You then compare this forgery to the smudged evidence. If it’s a poor match, you tweak the suspect's characteristics and try again. If it’s a good match, you’ve found a promising lead. By doing this thousands of times, you build up a profile of the most likely culprits. This strategy—of simulating and comparing—is the heart of a suite of revolutionary techniques that have blown open problems in fields as disparate as genetics, economics, and cell biology.

Decoding the Blueprints of Life: Genetics and Evolution

Perhaps nowhere is the challenge of intractability more present than in the study of life itself. Evolution is a grand, stochastic play, driven by the concatenated effects of chance and necessity acting on uncountable numbers of individuals over eons. Writing down the exact probability of arriving at the genetic makeup of a modern population is a task of cosmic absurdity. Yet, we can simulate it.

Imagine we want to measure the very force of evolution—natural selection. We might have data from a population showing that a particular gene variant has become more common over 50 generations. Was this due to selection, or just random luck (what geneticists call genetic drift)? The traditional likelihood is a tangled mess of branching probabilities. Instead, we can create a digital terrarium, a simulation based on the classic Wright–Fisher model, which acts out the process of reproduction, selection, and drift. We can set the "selection strength" dial, say a parameter $s$, to a particular value and run the simulation. At the end, we see if the change in our simulated population looks like the change in the real one. By repeating this for many different values of $s$, we can generate a whole distribution of plausible values for the selection coefficient, effectively taking a measurement of evolution in action.
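Here is one way such a "digital terrarium" might look: a bare-bones Wright-Fisher simulator wrapped in an ABC rejection loop. The population size, prior, tolerance, and observed frequency are all invented for illustration.

```python
import numpy as np

def wright_fisher(p0, s, N, generations, rng):
    """One replicate: allele frequency under selection s and drift, pop size N."""
    p = p0
    for _ in range(generations):
        w = p * (1 + s) / (p * (1 + s) + (1 - p))  # selection shifts the mean
        p = rng.binomial(N, w) / N                 # drift: binomial resampling
    return p

rng = np.random.default_rng(6)
p_obs = 0.65    # illustrative: variant rose from 10% to 65% in 50 generations
accepted = []
for _ in range(10_000):
    s = rng.uniform(0.0, 0.2)                     # prior over selection strength
    p_end = wright_fisher(0.1, s, N=1000, generations=50, rng=rng)
    if abs(p_end - p_obs) < 0.05:                 # keep s if simulation matches
        accepted.append(s)
accepted = np.array(accepted)
```

The accepted values of $s$ form an approximate posterior for the selection coefficient; here they cluster near the value whose deterministic trajectory reaches 65%, broadened by drift.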

This "simulate-and-compare" logic can be used to solve even more complex evolutionary mysteries. For instance, how do we determine if a crucial gene in, say, a species of butterfly, was inherited from an ancient ancestor or acquired more recently through interbreeding with a different species? This process, called adaptive introgression, leaves subtle fingerprints in the genome. The full likelihood is again beyond reach. But we can define a set of clever clues, or "summary statistics": things like the average length of DNA segments shared with the donor species, or an imbalance in shared genetic variants. We can then simulate different historical scenarios—one with no interbreeding, one with neutral interbreeding, and one where the interbred gene was strongly favored by selection. Each scenario produces a characteristic pattern of clues. By finding which simulation's clues best match those in our real butterfly genome, we can perform a kind of genomic forensics, choosing the most probable evolutionary history from a lineup of suspects. This turns our method into a powerful tool for model choice.

The real beauty of this approach emerges when we tackle truly subtle questions. Consider a population facing a new environmental stress, like a prolonged drought. If the population adapts, did it do so because its members were inherently flexible—a phenomenon known as plasticity—or did that initial flexibility simply buy time for slower genetic changes to "hard-wire" the adaptation, a process called genetic assimilation? The raw data, a simple time series of how many individuals show the drought-tolerant trait, can be ambiguous. The magic trick is to design summary statistics that capture the dynamic signature of the process. For example, we could measure how the correlation between the environment (drought) and the trait changes over time. Pure plasticity would maintain a strong correlation, while genetic assimilation would show the correlation weakening as the trait becomes genetically canalized. By simulating both hypotheses and comparing these dynamic signatures, we can disentangle two deeply intertwined evolutionary processes. This reveals that the art of these methods lies not just in the simulation, but in the creative act of choosing what to measure.

The Choreography of the Cell

If we zoom in from the scale of populations to the microscopic dance of molecules within a single cell, the world becomes even more dominated by randomness. Here, intractable likelihoods are not the exception; they are the rule.

Consider the journey of a single cell migrating in response to a chemical attractant. Its path is a "random walk," a series of jittery, unpredictable steps. It's meaningless to ask for the probability of observing the exact path taken; for any continuous path, the probability is technically zero. But we can ask a more sensible question. We can characterize the cell's movement by parameters like "persistence" ($\alpha$), which governs how straight it tends to move, and "directional bias" ($\beta$), which measures the pull of the attractant. To estimate these, we don't need the exact path. We can summarize it, for example, by the ratio of its net displacement towards the signal to the total path length it traveled. We can then simulate thousands of virtual cells with different values of $\alpha$ and $\beta$ and discover which parameter settings produce trajectories with the same summary measure as our real cell.
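A sketch of that idea: a toy persistent, biased random walk with invented parameters, summarized by the displacement-to-path-length ratio. The heading-update rule below is one plausible choice for illustration, not a standard cell-migration model.

```python
import numpy as np

def chemotaxis_index(alpha, beta, n_steps, rng):
    """Persistent, biased 2-D random walk with unit steps. Returns the summary
    statistic: net displacement along the attractant axis (+x) / path length."""
    direction = rng.standard_normal(2)
    direction /= np.linalg.norm(direction)
    pos = np.zeros(2)
    for _ in range(n_steps):
        # New heading mixes persistence (alpha), attractant pull (beta), noise.
        pull = (alpha * direction + beta * np.array([1.0, 0.0])
                + 0.3 * rng.standard_normal(2))
        direction = pull / np.linalg.norm(pull)
        pos += direction
    return pos[0] / n_steps

rng = np.random.default_rng(7)
# Average the summary over many virtual cells, without and with a bias term.
unbiased = np.mean([chemotaxis_index(0.5, 0.0, 500, rng) for _ in range(50)])
biased = np.mean([chemotaxis_index(0.5, 0.5, 500, rng) for _ in range(50)])
```

With no bias the index hovers near zero; with a strong pull it approaches one, so matching the index of a real cell pins down plausible $(\alpha, \beta)$ values.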

This logic is transformative in synthetic biology, where engineers aim to design and build new biological circuits. A classic example is the "genetic toggle switch," a pair of genes that mutually repress each other, creating a bistable system that can exist in one of two states. The underlying process of gene expression is fundamentally "bursty" and stochastic. This means a population of genetically identical cells will show a wide, and often bimodal, distribution of protein levels. If one naively tries to fit a simple statistical model that assumes a bell-shaped (Gaussian) distribution, the results are not just inaccurate; they are nonsensical. The model is blind to the most important feature of the data: its bimodality.

Approximate Bayesian Computation (ABC), however, excels here. Instead of relying on a few simple moments like mean and variance, we can compare the entire shape of the distribution from our simulation to the distribution from our experimental data. Using sophisticated distance metrics that measure the "work" required to transform one distribution into another (like the Earth Mover's Distance), ABC can "see" features like bimodality. It will favor parameter values that reproduce the two distinct states of the toggle switch, providing a far more truthful inference. This shows that simulation-based methods are not merely a crutch for when likelihoods are hard; they are a superior tool for when reality is more complex than our simple statistical formulas allow. The power of the approach is further enhanced by thoughtful statistical design, such as using metrics like the Mahalanobis distance to properly weight and combine information from multiple, correlated summary statistics.
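The distribution-shape comparison can be illustrated with the one-dimensional Earth Mover's Distance, which for equal-size samples reduces to the mean absolute difference of sorted values. The "toggle switch" protein levels below are synthetic numbers invented for the example.

```python
import numpy as np

def emd_1d(a, b):
    """Earth Mover's Distance between equal-size 1-D samples:
    mean absolute difference of the sorted values."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

rng = np.random.default_rng(8)
# Bimodal "toggle switch" protein levels: a low state and a high state.
obs = np.concatenate([rng.normal(10, 2, 500), rng.normal(50, 5, 500)])

# Candidate A matches the mean and variance but is unimodal (a Gaussian fit).
sim_gauss = rng.normal(obs.mean(), obs.std(), 1000)
# Candidate B reproduces the two expression states.
sim_bimodal = np.concatenate([rng.normal(10, 2, 500), rng.normal(50, 5, 500)])

d_gauss = emd_1d(obs, sim_gauss)      # large: the shapes differ
d_bimodal = emd_1d(obs, sim_bimodal)  # small: same bimodal shape
```

A moment-matching Gaussian is far from the data in this metric, while the bimodal simulation is close, so an ABC run using this distance will favor parameters that reproduce both states of the switch.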

From Genes to Markets: The Universal Logic of Simulation

The philosophical thread connecting these examples—that simulation can stand in for intractable calculation—is not confined to biology. It has been independently discovered and powerfully applied in a seemingly distant domain: economics.

Economists often build "structural models" to explain the complex, dynamic choices made by individuals or firms. For example, what factors influence a person's decision to enter the workforce each year? The true model might involve unobserved personal traits and serially correlated shocks to earning potential, making the exact likelihood of a person's entire work history impossible to calculate.

To solve this, economists developed techniques like the Simulated Method of Moments (SMM) and Indirect Inference (II). These are deep philosophical cousins of ABC. In Indirect Inference, for instance, a researcher might proceed with a clever two-step. First, they fit a simple, tractable (even if technically "wrong") auxiliary model—say, a standard logistic regression—to the real-world data. This gives them a set of auxiliary parameters. Then, they turn to their complex structural model. They simulate data from it using a guess for the true structural parameters, and then fit the same simple auxiliary model to this simulated data. The goal is to tweak the dials on the complex model until the simple model yields the same parameter estimates on both the real and simulated data. We use the simple model as a common yardstick. This approach not only provides a path around the intractable likelihood but also often has the beautiful side effect of smoothing out the optimization problem, making it computationally easier to solve. This parallel evolution of ideas underscores the universal power of the underlying logic.
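A toy indirect-inference loop, with an MA(1) process standing in for the complex structural model and a fitted AR(1) coefficient as the deliberately "wrong" auxiliary model; the sample sizes and the grid search are illustrative simplifications of the usual optimization.

```python
import numpy as np

def simulate_ma1(theta, n, rng):
    """Structural model: an MA(1) process, y_t = e_t + theta * e_{t-1}."""
    e = rng.standard_normal(n + 1)
    return e[1:] + theta * e[:-1]

def auxiliary(y):
    """Simple, misspecified auxiliary model: the fitted AR(1) coefficient."""
    return float(np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2))

rng = np.random.default_rng(9)
y_obs = simulate_ma1(0.5, 5000, rng)   # stand-in for real data; theta unknown
rho_obs = auxiliary(y_obs)             # auxiliary fit on the real data

# Indirect inference: choose theta whose simulations give the same auxiliary fit.
def gap(theta):
    sims = [auxiliary(simulate_ma1(theta, 5000, np.random.default_rng(100 + k)))
            for k in range(5)]
    return (float(np.mean(sims)) - rho_obs) ** 2

grid = np.round(np.linspace(0.0, 1.0, 21), 2)
theta_hat = float(min(grid, key=gap))
```

The AR(1) coefficient is the common yardstick: it is a biased description of both datasets in exactly the same way, so matching it recovers the structural parameter.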

The Frontier: Nested Simulations and Learning on the Fly

The journey doesn't end here. The "simulate-and-compare" paradigm is being pushed to ever more stunning levels of sophistication. What if we are trying to track a system where not only the state is unknown, but the very parameters governing its dynamics are also unknown and potentially changing?

Consider tracking a satellite whose motion is described by a complex stochastic differential equation, but where some of the physical constants in that equation are themselves uncertain. This is a problem of joint state and parameter estimation. The cutting edge of simulation-based inference offers a solution of breathtaking elegance: the Sequential Monte Carlo Squared ($\text{SMC}^2$) algorithm.

This method employs a nested, hierarchical simulation—a particle filter within a particle filter. Imagine an "outer" layer of computational particles, where each particle represents a complete set of possible physical laws (a parameter vector $\theta$). Now, for each of these parameter particles, we run a separate, "inner" particle filter that uses those specific laws to track the observable state of the satellite. When a new observation from the real satellite comes in, we check how well each inner filter predicted it. The outer parameter particles whose inner filters made the best predictions are given more weight. The entire system learns on the fly, simultaneously refining its estimate of the satellite's position and its understanding of the laws that govern it. It is a grand computational tournament of parallel universes, where those that best match reality are continually rewarded and replicated. Algorithms like these, which rely on deep results like the "pseudo-marginal principle" to ensure their validity, represent the frontier of this field.
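A stripped-down illustration of the nested idea (importance-weighted parameter particles, each carrying its own bootstrap particle filter) for a toy linear state-space model; it omits the resampling and rejuvenation of parameter particles that full $\text{SMC}^2$ performs, and every number here is invented.

```python
import numpy as np

rng = np.random.default_rng(10)

# "True" system: x_t = theta * x_{t-1} + w_t (w ~ N(0,1)), observed as
# y_t = x_t + v_t with v ~ N(0, 0.5). We hope to recover theta_true = 0.8.
T, theta_true = 100, 0.8
x, ys = 0.0, []
for _ in range(T):
    x = theta_true * x + rng.standard_normal()
    ys.append(x + 0.5 * rng.standard_normal())

def pf_step(particles, y, theta, rng):
    """One bootstrap-particle-filter step. Returns resampled particles and the
    likelihood increment p(y_t | y_{1:t-1}, theta), up to a theta-free constant."""
    particles = theta * particles + rng.standard_normal(particles.size)
    w = np.exp(-0.5 * ((y - particles) / 0.5) ** 2)
    increment = w.mean()
    idx = rng.choice(particles.size, size=particles.size, p=w / w.sum())
    return particles[idx], increment

# Outer layer: parameter particles, each carrying its own inner state filter.
n_theta, n_state = 50, 200
thetas = rng.uniform(0.0, 1.0, n_theta)
log_w = np.zeros(n_theta)
inner = [np.zeros(n_state) for _ in range(n_theta)]

for y in ys:
    for j in range(n_theta):
        inner[j], inc = pf_step(inner[j], y, thetas[j], rng)
        log_w[j] += np.log(inc + 1e-300)  # reward thetas whose filter predicted y

weights = np.exp(log_w - log_w.max())
post_mean = float(np.sum(thetas * weights) / np.sum(weights))
```

After the tournament, nearly all the weight sits on parameter particles near the true dynamics, and the weighted mean recovers an estimate of $\theta$ without any tractable likelihood.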

A New Kind of Science

From the sprawling history of a species to the frantic jitter of a single molecule, and from the opaque choices of an individual to the hidden dynamics of a financial market, a unifying principle has emerged. When the path from observation back to theory is mathematically impassable, we forge a new one armed with computational power: we simulate, we compare, and we learn.

This represents more than just a new set of tools; it reflects a paradigm shift in the scientific process. We are no longer constrained to building models that are simple enough to be analytically solvable. We are now free to imagine and construct models that are as rich, complex, and stochastic as the phenomena we seek to understand. Computation has become the bridge connecting our most ambitious theories to the messy, beautiful reality of the observable world, opening up frontiers of inquiry that were, until recently, beyond the horizon of possibility.