
In every field of empirical science, from astronomy to zoology, measurement is never perfect. We grapple with instrument limitations, random noise, and inherent variability, meaning any single measurement is only an approximation of the truth. This raises a fundamental challenge: how do we communicate not just our best guess, but also the uncertainty surrounding it? The Bayesian framework addresses this by representing our knowledge as a complete probability distribution, known as the posterior. But this presents a new question: how do we distill this entire landscape of belief into a single, concise, and honest interval?
This article explores the most efficient answer to that question: the shortest credible interval, or Highest Posterior Density (HPD) interval. In the chapters that follow, we will unpack this powerful statistical tool. The first chapter, "Principles and Mechanisms," will explain what a shortest credible interval is, how it is constructed, and why it provides a superior summary for complex and asymmetric beliefs. Subsequently, the "Applications and Interdisciplinary Connections" chapter will demonstrate the HPD interval's practical value in solving real-world problems across biology, ecology, and engineering, revealing it as an indispensable method for truthfully quantifying scientific knowledge.
Imagine you are an ancient astronomer, pointing your newly-built telescope at the heavens. You are trying to measure the distance to a nearby star. You take a measurement. Then you take another, and it's slightly different. You take a hundred more. They cluster around a certain value, but they are not all the same. Your instrument has limitations, the atmosphere shimmers, and a thousand other tiny effects introduce noise. Now, what do you tell the king? Do you give him a single number? That seems dishonest. It doesn't capture the uncertainty. A better approach would be to give him a range of values and a level of confidence, to say, "Your Majesty, I am 95% certain the star lies between this and that distance."
This is the fundamental problem of inference: how to summarize our knowledge, including its limitations, after we've seen the data. In the Bayesian world, our "knowledge" is not a single number but a full probability distribution. After our observations, we have what's called a posterior distribution, which assigns a degree of belief, or probability density, to every possible value of the parameter we're measuring—like the distance to that star, or the age of an ancient fossil. A 95% credible interval is simply a range that contains 95% of the "probability ink" from this posterior distribution. It is a direct statement of belief: given our data and our model, there's a 95% chance the true value lies in this interval. This is a wonderfully intuitive idea, and it differs profoundly from the frequentist "confidence interval," which speaks of the long-run performance of a procedure rather than our belief about a single result.
But a new question arises immediately. There are many possible ranges that could contain 95% of the probability. Which one should we choose?
Let's think about what we want from an interval. We want it to be informative. We want it to pinpoint the location of the parameter as precisely as possible. This means we want the shortest possible interval that still captures our desired 95% of belief. How would we construct such a thing?
Imagine the posterior distribution is a landscape, a range of hills and valleys, where the height at any point represents the probability density. To build our 95% credible interval, we want to claim the most valuable real estate. We should start at the highest point of the landscape—the posterior mode, which is the single most probable value. Then, we gradually expand our territory, always incorporating the next-highest-density land available. We continue this process until the total area of our territory covers 95% of the entire landscape. The resulting region is called the Highest Posterior Density (HPD) interval.
By its very construction, the HPD interval is the shortest possible range for a given probability level. Why? Because we have prioritized including only the most "plausible" values and have been ruthless in excluding the less plausible ones. Any other interval containing 95% of the probability would have to trade some high-density region for a low-density one, which would mean stretching the interval wider to make up for the lost probability mass.
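This "claim the highest land first" construction can be sketched numerically. The sketch below is an illustration, not a library routine: it discretizes a density on a grid, claims cells in order of decreasing density until 95% of the mass is accumulated, and reports the hull of the claimed cells. The exponential test density is a hypothetical choice.

```python
import math

def hpd_interval(density, lo, hi, mass=0.95, n=20_000):
    """Approximate the HPD interval of a unimodal density on [lo, hi]:
    claim grid cells in order of decreasing density -- the best "real
    estate" first -- until the accumulated probability reaches `mass`."""
    dx = (hi - lo) / n
    cells = [lo + (i + 0.5) * dx for i in range(n)]
    cells.sort(key=density, reverse=True)          # highest density first
    total = sum(density(x) for x in cells) * dx    # normalising constant
    acc, claimed = 0.0, []
    for x in cells:
        claimed.append(x)
        acc += density(x) * dx / total
        if acc >= mass:
            break
    return min(claimed), max(claimed)  # the hull is an interval when unimodal

# A skewed, hypothetical posterior: exp(-x) on [0, 20], highest at zero.
low, high = hpd_interval(lambda x: math.exp(-x), 0.0, 20.0)
# The exact 95% HPD interval here is [0, -ln(0.05)], roughly [0, 3.0].
```

For this decreasing density the claimed cells all sit next to zero, so the interval starts at the peak and stops as soon as 95% of the mass is enclosed.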
This "shortest is best" philosophy really shows its power when the posterior distribution—our state of belief—is not a symmetric, friendly bell curve. Often, our beliefs are skewed.
Consider a population biologist studying a rare genetic mutation. They sample 100 individuals and find zero instances of the mutation. What can they say about its true frequency, p, in the population? Common sense tells us p is likely very small, but it's probably not exactly zero. The posterior distribution for p will be heavily skewed. It will have its peak density at p = 0 and a long tail that trickles out towards higher values.
How would we build a 95% interval here?
One common method is the equal-tailed interval, where we simply lop off 2.5% of the probability from each end of the distribution. But look at the long right tail! To find the point that cuts off the top 2.5%, we have to go very far out into a region where the probability density is incredibly low. We are including values that are not very plausible.
The HPD interval does something much smarter. Since the density is highest at p = 0 and decreases from there, the HPD interval will start exactly at 0 and extend outwards until it has captured 95% of the probability. It refuses to include the far-out, low-plausibility values from the tail, because to do so, it would have to exclude values near 0 that are much more plausible. The result is an interval that is shorter and, arguably, a more honest summary of our beliefs.
This can lead to a fascinating and counter-intuitive consequence. In a heavily skewed distribution, the mean (the "center of mass") is pulled far into the long tail. The HPD interval, however, is built around the mode (the peak). It's entirely possible for the tail to pull the mean so far away from the peak that the mean ends up outside the 95% HPD interval! This isn't a paradox; it's a profound lesson. The HPD interval tells you the range of the most plausible values, while the mean tells you the long-run average. For skewed beliefs, these are not the same thing.
The real beauty of the HPD interval is that it doesn't force our beliefs into a simple, single range. It adapts to whatever shape our posterior distribution takes, no matter how strange.
Imagine our posterior is for a parameter that is physically constrained, like a variance, which cannot be negative. If our data suggests the value is very close to zero, our posterior distribution might look like a ski slope, starting at its highest point on the "cliff edge" at zero and decreasing from there. The HPD interval, in this case, will naturally start at the boundary. The most plausible values include the boundary itself, and our interval reflects that.
Now for an even more exotic case. Suppose we are analyzing a signal from a sensor, but we're not sure if the sensor was made at Factory A or Factory B. The sensors from each factory have slightly different characteristics. After analyzing the data, our posterior belief about a key parameter, θ, might have two distinct peaks: one centered on the typical value for Factory A and another centered on the typical value for Factory B. We have a bimodal (two-peaked) distribution. There is a "canyon" of low probability between the two peaks.
What is the 95% HPD interval for θ? Following our rule, we start claiming the highest-density regions. This means we'll take the area around the first peak and the area around the second peak. The low-density valley in between will be one of the last regions to be considered. It's very likely that we will accumulate our 95% probability by taking two separate regions before we need to bridge the gap between them. The result? The 95% HPD "interval" is actually the union of two disjoint intervals!
This is a spectacular result. The HPD interval has told us something crucial: the plausible values for are clustered in two distinct groups, and the values in between are actually not very plausible at all. Any method that forced us to report a single, connected interval would obscure this vital feature of our knowledge.
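This two-peak behaviour is easy to demonstrate numerically. In the sketch below, the mixture of two normals is an invented stand-in for the factory-sensor posterior; the same grid-claiming idea as before is used, but the claimed cells are merged into runs, which recovers two disjoint intervals.

```python
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def bimodal(x):
    # Invented two-factory posterior: equal mixture of N(-3, 0.5) and N(3, 0.5).
    return 0.5 * normal_pdf(x, -3.0, 0.5) + 0.5 * normal_pdf(x, 3.0, 0.5)

def hpd_regions(density, lo, hi, mass=0.95, n=20_000):
    """Claim grid cells in order of decreasing density, then merge the
    claimed cells into maximal runs -- possibly more than one interval."""
    dx = (hi - lo) / n
    def centre(i):
        return lo + (i + 0.5) * dx
    order = sorted(range(n), key=lambda i: density(centre(i)), reverse=True)
    total = sum(density(centre(i)) for i in range(n)) * dx
    acc, claimed = 0.0, set()
    for i in order:
        claimed.add(i)
        acc += density(centre(i)) * dx / total
        if acc >= mass:
            break
    # Merge consecutive claimed cells into disjoint intervals.
    regions, start = [], None
    for i in range(n):
        if i in claimed and start is None:
            start = i
        elif i not in claimed and start is not None:
            regions.append((centre(start), centre(i - 1)))
            start = None
    if start is not None:
        regions.append((centre(start), centre(n - 1)))
    return regions

regions = hpd_regions(bimodal, -8.0, 8.0)
# With well-separated peaks, the 95% HPD region comes out as two disjoint
# intervals, one around -3 and one around +3.
```

The low-density "canyon" between the peaks is never claimed, so the procedure reports two separate regions rather than one wide, misleading interval.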
So, how do we find these intervals in the real world? The underlying principle is that for a unimodal distribution, the posterior density at the lower bound, L, of the HPD interval must equal the posterior density at the upper bound, U: f(L) = f(U). This makes intuitive sense: if the density at one endpoint were lower than at the other, you could shorten the interval by trimming the lower-density end and adding a smaller piece at the higher-density end, while keeping the total probability the same.
In simple cases, we can use calculus to solve for the endpoints that satisfy this condition. But for the complex models used in modern science, the posterior landscapes are far too rugged for simple mathematics. Here, the computer comes to our rescue. Modern Bayesian analysis relies on algorithms like Markov Chain Monte Carlo (MCMC) that, in essence, wander around the posterior landscape, spending more time in higher-altitude regions. The output is a large list of, say, 10,000 samples drawn from the posterior distribution.
With this list of samples in hand, finding an approximate HPD interval becomes a surprisingly simple computational task. We sort the samples from smallest to largest. If we want a 90% interval from our 10,000 samples, we need to find a sub-list of 9,000 consecutive samples that has the smallest range (i.e., the smallest difference between its largest and smallest value). We can simply check all possible consecutive sub-lists of length 9,000—the one from sample 1 to 9000, from 2 to 9001, and so on—and find the one with the minimum width. It's a brute-force approach, but it perfectly implements the HPD philosophy in a practical setting.
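A minimal sketch of this brute-force procedure, run here on synthetic standard-normal draws rather than real MCMC output:

```python
import random

def hpd_from_samples(samples, mass=0.90):
    """Shortest interval containing `mass` of the samples: sort them,
    then slide a window of fixed count and keep the narrowest one."""
    xs = sorted(samples)
    k = int(mass * len(xs))          # how many consecutive samples to keep
    start = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

random.seed(0)
draws = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # synthetic "MCMC" output
lo, hi = hpd_from_samples(draws, mass=0.90)
```

For 10,000 standard-normal draws the result lands close to the exact 90% HPD interval of about (-1.64, 1.64).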
The shortest credible interval, then, is far more than a technical choice. It is a principle for honestly and efficiently communicating what we know. It respects the true shape of our uncertainty, whether it be a simple hill, a lopsided slope, or a landscape of multiple peaks and valleys. It provides the most concise summary of the most plausible realities, a quality any scientist—or king—should value.
What does it mean to “know” something in science? If we measure the speed of light, we don’t just get one number; we get a range, a statement of our uncertainty. If we predict the path of a hurricane, we don’t draw a single line on a map; we draw a “cone of uncertainty.” The heart of modern science isn’t just about finding the “right” answer, but about honestly and precisely describing the boundaries of our knowledge. In the Bayesian world, where our knowledge is captured by a probability distribution, the shortest credible interval is our most powerful tool for drawing these boundaries. It is the most concise, most efficient summary of what we believe. But this is no mere statistical abstraction. This tool comes alive when we see it at work, wrestling with real questions across the scientific landscape.
How can we possibly know when humans and chimpanzees last shared a common ancestor? We can’t use a stopwatch; the event is buried millions of years in the past. The answer lies hidden in our DNA. By comparing the genetic sequences of different species and making assumptions about the rate at which mutations accumulate—a concept known as the “molecular clock”—we can create a statistical model to estimate these ancient divergence times.
Of course, this clock isn't perfect; it ticks irregularly. Bayesian methods embrace this uncertainty. Instead of producing a single date, a computer simulation, typically a Markov chain Monte Carlo (MCMC) sampler, will generate thousands or millions of plausible dates, each one a sample from the posterior probability distribution. Imagine our simulation gives us a cloud of possible dates for a speciation event, a series of values in millions of years: 11.2, 11.3, 11.4, ..., 12.9, 13.0, and then a few surprising outliers like 13.8. How do we summarize this cloud of possibilities into a single, honest range?
This is where the shortest credible interval, or Highest Posterior Density (HPD) interval, demonstrates its simple power. We take all our sampled dates, sort them, and then find the shortest possible interval that contains, say, 95% of them. For our list of samples, this procedure quickly reveals that the interval from 11.2 to 13.0 Ma is shorter than one that tries to include the outlier 13.8 Ma at the expense of more plausible values at the other end. The HPD interval naturally isolates the most "crowded" region of our belief distribution, correctly identifying the range of dates with the highest posterior density.
Modern science often grapples with not just uncertainty in parameters, but uncertainty in the models themselves. What if we have two competing theories for how the molecular clock ticks? A "relaxed clock" model might yield one 95% HPD interval for a divergence date, while a simpler "strict clock" model suggests a somewhat different interval. The Bayesian framework doesn't force us to choose. Instead, we can calculate a "model-averaged" posterior, weighting each model's prediction by our confidence in it, derived from how well it explains the data. We can then ask sophisticated questions, like "What is the probability that the true date lies in the region where both models agree?" This is science in action: acknowledging, quantifying, and integrating multiple sources of uncertainty to build a more robust picture of our deep past.
Imagine an ecologist who sets a light trap to survey a rare species of moth. A whole night passes, and the trap is empty. Is this a failed experiment? Or is it valuable data? To a Bayesian statistician, an observation of "zero" is often rich with information. The challenge is that a zero can mean two very different things: either no moths were present in the area (a "structural zero"), or moths were present but simply evaded the trap (a "sampling zero").
Models like the Zero-Inflated Poisson (ZIP) are designed for precisely this situation. They include a parameter, let's call it p, which represents the probability of a structural zero—that the moths were truly absent. Now, after observing an empty trap, our belief about p is updated. It turns out, for simple cases, that the posterior probability density for p is not a symmetric bell curve. Instead, it might be a distribution that is lowest at p = 0 and steadily increases towards p = 1.
If we want to construct a 95% credible interval for p, what should we do? A naive "equal-tailed" approach would chop off 2.5% of the probability from the low end and 2.5% from the high end. But this would mean we'd be throwing out values of p near 1, where our belief is strongest, in order to keep values of p near 0, where our belief is weakest. This makes no sense!
The HPD interval does the only logical thing. Since the posterior density is always increasing, the region of highest density is concentrated at the upper end. The 95% HPD interval will therefore be a one-sided interval of the form [L, 1]. It respects the shape of our knowledge, telling us that based on the evidence, our belief is now concentrated on higher probabilities that the moths were truly absent. This is a beautiful and profound illustration of the HPD principle: it's not just about containing 95% of the probability, but about containing the most plausible 95%.
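A worked check with a hypothetical increasing density (not the exact ZIP posterior): take f(p) = 2p on [0, 1], a Beta(2, 1). Its CDF is p^2, so the one-sided interval [a, 1] carries mass 1 - a^2, and setting that to 0.95 gives a = sqrt(0.05).

```python
import math

# Hypothetical increasing posterior density f(p) = 2p on [0, 1] (Beta(2, 1)).
# The CDF is F(p) = p^2, so the interval [a, 1] carries mass 1 - a^2.
a = math.sqrt(1 - 0.95)          # closed form: a is roughly 0.224

# Cross-check with a brute-force shortest window over inverse-CDF points:
n = 200_000
xs = [((i + 0.5) / n) ** 0.5 for i in range(n)]  # F^{-1}(u) = sqrt(u), ascending
k = int(0.95 * n)                                # points the window must contain
best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
# The narrowest window hugs the upper end: [xs[best], xs[best + k - 1]],
# matching the closed-form answer [a, 1].
```

Because the density keeps rising all the way to 1, the shortest window is pinned against the upper boundary, exactly as the prose argues.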
What does a failing space probe have in common with a volatile stock market and an unpredictable economy? They are all complex systems where we desperately want to understand the sources of risk, failure, or variance. We want to ask questions like: Is this system reliable? Which component is the biggest source of risk? What fraction of market volatility is due to sudden, shocking jumps versus everyday noise?
In reliability engineering, a system might be built from components in series, where the entire system fails if just one component does. The total system failure rate, λ, is the sum of the individual component rates, λ = λ_1 + λ_2 + ... + λ_k. A Bayesian analysis might find that the posterior distribution for λ is an exponential distribution—a curve that starts at its maximum value at λ = 0 and steadily decreases.
In financial modeling, the total variance of a stock's returns can be broken down into a continuous part and a "jump" part from sudden market shocks. A key question is what proportion, ρ, of the total variance is due to these dangerous jumps. It turns out that the posterior for this proportion often follows a Beta distribution which, for typical parameter values, is also a decreasing function, highest at ρ = 0.
In macroeconomics, the fluctuations in the national economy are driven by various "shocks"—to policy, to technology, and so on. An economist might want to know the fraction of the total output variance that can be explained by just one of these shocks. Again, the posterior for this fraction is often a decreasing function.
Notice the unifying theme. In all these seemingly disparate fields, the quantity of interest—a failure rate or a risk proportion—has a posterior distribution that is monotonic. Just as with our moth ecologist, an equal-tailed interval would be nonsensical. The HPD interval provides the clear and intuitive answer: it is a one-sided interval of the form [0, U]. This tells us that our strongest belief is that the risk or failure rate is very small, but we cannot rule out that it could be as large as U. This single, powerful concept unifies our understanding of risk across a vast range of applications.
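For the reliability case this one-sidedness even has a closed form. Assuming, hypothetically, an Exponential posterior with rate beta for the system failure rate, the density is highest at zero and strictly decreasing, so the shortest 95% interval runs from 0 to the 95th percentile:

```python
import math

def one_sided_hpd(beta, mass=0.95):
    """HPD interval for a hypothetical Exponential(beta) posterior.
    The density beta * exp(-beta * x) is maximal at x = 0 and strictly
    decreasing, so the shortest interval is [0, U] with CDF(U) = mass."""
    upper = -math.log(1.0 - mass) / beta
    return 0.0, upper

lo, hi = one_sided_hpd(beta=2.0)   # hi = -ln(0.05) / 2, roughly 1.498
```

Any interval that excluded a neighbourhood of zero would have to reach further into the thin tail to recover the lost mass, and so would be wider.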
So, is the shortest credible interval always some lopsided, asymmetric thing? Not at all. Consider one of the most classic scientific questions: comparing two processes. Is a new drug more effective than a placebo? Is advertising campaign A better than campaign B? The fundamental quantity of interest is the difference in their success rates, δ = p_A − p_B.
If we start with symmetric and impartial beliefs about the effectiveness of A and B (for instance, assuming their success probabilities are uniformly distributed between 0 and 1), the resulting posterior distribution for the difference is often unimodal and perfectly symmetric, with its peak at zero. In this special, but very important, case, how do we find the shortest interval containing 95% of the probability? We simply center the interval on the peak of the distribution. The shortest interval is the one symmetric about the mode.
Here, the shortest credible interval (HPD) and the familiar equal-tailed credible interval become one and the same. This is a crucial and comforting result. It shows that the HPD concept is a powerful generalization, not a completely alien idea. It contains the simple, symmetric case we often first learn about as a natural consequence. It doesn't throw out our intuition; it refines and completes it.
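This coincidence is easy to verify numerically. Using a standard normal as a stand-in for a symmetric, unimodal posterior on the difference (an illustrative choice, not the exact posterior for a difference of two Beta-distributed rates), the equal-tailed interval has equal density at both endpoints, which is precisely the HPD condition:

```python
from statistics import NormalDist

# Stand-in symmetric posterior for the difference in success rates.
post = NormalDist(mu=0.0, sigma=1.0)

# Equal-tailed 95% interval: chop 2.5% of probability from each side.
lo, hi = post.inv_cdf(0.025), post.inv_cdf(0.975)   # roughly (-1.96, 1.96)

# HPD condition: the density is the same at both endpoints, so no
# reshuffling of probability mass can make the interval any shorter.
assert abs(post.pdf(lo) - post.pdf(hi)) < 1e-9
```

For any asymmetric posterior the two constructions would diverge; symmetry is exactly the case in which they agree.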
This journey across biology, ecology, engineering, and economics reveals a common thread. The shortest credible interval is not a rigid ruler but a flexible template that molds itself to the shape of our knowledge. When our knowledge is skewed, so is the interval. When it is described by a simple triangular shape from a model calibration problem, the interval's boundaries are found by a simple geometric cut across the triangle. Even when dealing with fantastically complex objects, like the maximum value of a function learned by a Gaussian Process, this principle of finding the most compact region of belief remains our steadfast guide.
The shortest credible interval’s true power lies in its honesty. It forces us to confront the true shape of our uncertainty—the posterior distribution—and report it without distortion. It provides the most parsimonious summary of a probability distribution, giving us the shortest possible statement of what we think is true for a given level of confidence. And for a scientist, or any curious mind, there is no higher goal.