
In the pursuit of knowledge, science often relies on assumptions to build bridges across gaps in our understanding. One of the most consequential of these is the constancy assumption—a belief that what was true in the past remains true today. This concept is a cornerstone of modern research, particularly in fields where direct, ideal comparisons are ethically or practically impossible. It allows us to stand on the shoulders of historical data, but it also forces us to take a leap of faith, one that carries profound risks if the ground beneath us has shifted.
This article dissects this pivotal idea. We will explore how we evaluate new innovations when the goal is not to be superior, but simply "good enough," and how this question forces a reliance on the past. The reader will learn why this reliance is both necessary and dangerous, and how scientists grapple with the fragility of their own assumptions.
First, we will delve into the Principles and Mechanisms of the constancy assumption, using the high-stakes world of medical non-inferiority trials as our primary case study. Then, in Applications and Interdisciplinary Connections, we will journey beyond medicine to uncover how this same fundamental logic underpins inquiry in fields as diverse as geophysics, botany, and artificial intelligence, revealing it as a universal theme in the scientific quest for truth.
In our journey to understand the world, science often asks "Is this better?". We seek new medicines that cure more effectively, new materials that are stronger, new energy sources that are cleaner. The logic is one of superiority. But what if the question is different? What if we have a new drug that isn't necessarily more powerful, but is perhaps taken once a day instead of three times, has fewer side effects, or is much cheaper to produce? In this case, we're not looking for a knockout victory. We're asking a more nuanced question: "Is this new thing not unacceptably worse?".
This is the world of non-inferiority trials, and it is one of the most intellectually subtle and challenging areas in modern medical science.
Imagine we are testing a new antibiotic, let's call it N, against the current standard-of-care antibiotic, C. The outcome we care about is the probability of a patient dying, which we'll call p_N for the new drug and p_C for the control. Naturally, a lower probability is better. In a traditional superiority trial, we would try to prove that p_N < p_C.
But in a non-inferiority trial, our goal is to prove that N is, at worst, only marginally less effective than C. We must first define what "marginally less effective" means. We set a non-inferiority margin, a small, pre-specified number represented by the Greek letter delta, δ. This is the largest loss of efficacy we are willing to tolerate in exchange for the new drug's other benefits. For instance, we might decide that an increase in mortality risk of two percentage points (δ = 0.02) is the absolute maximum acceptable trade-off.
The logic of the trial is then turned on its head. The "null hypothesis" (H0)—the state of affairs we aim to disprove—is that the new drug is indeed inferior, meaning the difference in risk is greater than or equal to our margin: p_N − p_C ≥ δ. Our goal is to gather enough evidence to reject this pessimistic view and conclude the "alternative hypothesis" (H1), which is that the new drug is non-inferior (p_N − p_C < δ). To claim victory, we must be confident that the true difference is not in the "unacceptably worse" zone.
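To make that decision rule concrete, here is a minimal sketch of a non-inferiority check on the risk difference, assuming a simple normal-approximation confidence interval and made-up trial counts (the function name and numbers are illustrative, not a prescribed analysis): non-inferiority is concluded if the upper confidence limit of p_N − p_C falls below δ.

```python
import math

def noninferiority_check(deaths_new, n_new, deaths_ctrl, n_ctrl, delta=0.02, z=1.96):
    """Illustrative check: is the upper 95% confidence limit of p_N - p_C below delta?"""
    p_new, p_ctrl = deaths_new / n_new, deaths_ctrl / n_ctrl
    diff = p_new - p_ctrl
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ctrl * (1 - p_ctrl) / n_ctrl)
    upper = diff + z * se
    return diff, upper, upper < delta

# Hypothetical counts: nearly identical death rates in a large two-arm trial.
diff, upper, non_inferior = noninferiority_check(104, 2000, 100, 2000)
print(f"observed difference {diff:.3f}, upper limit {upper:.3f}, non-inferior: {non_inferior}")
```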
This seems straightforward enough. But a deep and dangerous trap lies hidden in this logic.
Let’s stick with our trial of drug N versus drug C. Suppose the trial ends, and we find that the death rates are almost identical: p_N ≈ p_C. We triumphantly declare our new, easier-to-take drug "non-inferior."
But what if, during our trial, something unexpected happened? What if a new, resistant strain of bacteria emerged, rendering both drugs useless? Or what if the trial was poorly run, with patients not taking their medication correctly? In that case, both drugs would appear similar because both were ineffective. Our conclusion of non-inferiority would be a disastrous illusion, potentially leading to the approval of a worthless medicine.
This brings us to a fundamental concept: assay sensitivity. A trial has assay sensitivity if it has the ability to distinguish an effective treatment from an ineffective one. In our example, the nightmare scenario is a trial that lacks assay sensitivity. The only way to be sure that our trial has this property is to see the active control drug, C, actually work. And the only way to see it work is to compare it to... nothing. A placebo.
But in many modern trials, especially for serious conditions like life-threatening infections or cancer, giving a patient a placebo when a known effective treatment exists is unethical. So, the placebo arm is often missing. We have our two active drugs, N and C, but the one thing that could give us confidence in our results—the proof that C is actually working in this very trial—is absent. It is a ghost in the machine of our experiment.
How do we solve this riddle? We look to the past. The active control, C, is a standard treatment precisely because it was proven effective in historical, placebo-controlled trials. The entire logic of a modern non-inferiority trial rests on a bold and crucial "leap of faith" known as the constancy assumption.
The constancy assumption posits that the effect of the active control (C) versus placebo (P) that was measured in historical trials is preserved and remains the same in our current trial. If historical trials showed that drug C reduced the risk of an event by a given amount compared to a placebo, we assume that if we had a placebo group in our trial today, we would see that same risk reduction.
This assumption is the conceptual bridge that connects our current trial to historical evidence, allowing us to believe our trial has assay sensitivity. The historical effect of the control drug is the yardstick against which we measure our new drug. The constancy assumption is what allows us to borrow that yardstick from the past. But what if the yardstick has changed?
Assuming the world stands still is a dangerous game. The conditions under which historical trials were run might be very different from the conditions today. This is where our conceptual bridge can collapse. Many real-world changes can threaten, or invalidate, the constancy assumption:
Improvements in Standard of Care: Imagine that ten years ago, patients with a heart condition who received placebo suffered events at a high rate. Today, with better lifestyle coaching, statins, and general care, the event rate for someone on "placebo" (meaning, everything except the active drug) might be only a fraction of that. This improvement in background care leaves much less room for the active control drug to show a benefit. Its effect size shrinks.
Changes in Patient Populations: Early trials might enroll severely ill patients, where a drug can have a large effect. Later trials might include patients with milder disease, where the drug's effect is naturally smaller.
Evolution of the "Enemy": For infectious diseases, this is a constant battle. The bacteria or viruses a drug was designed to fight can evolve resistance, rendering the once-powerful active control much weaker.
Differences in Trial Conduct: Even subtle changes in how an endpoint is defined, how adherence to medication is monitored, or the use of "rescue" therapies can alter the apparent effect of a drug.
Let's see how devastating this can be with a concrete, cautionary tale based on a hypothetical scenario. Imagine a historical trial showed an active control drug, C, reduced event rates from 30% (placebo) to 15% (control), a powerful absolute risk reduction of 15 percentage points. Based on this, we set our non-inferiority margin to be 5 percentage points, meaning we'll accept a new drug if it's no more than 5 points worse than C.
Now, in our modern trial, we observe an event rate of 10% for drug C and 12% for our new drug N. The difference is just 2 percentage points, which is well within our margin of comfort. Statistically, the trial is a success! We conclude non-inferiority.
But here is the secret we didn't know: because of massive improvements in background care, the true placebo rate in our modern trial would have been only 11%. The "powerful" drug C only reduced the rate from 11% to 10%, a meager one-point effect. Its power has all but vanished. Our new drug N, with its 12% event rate, is actually worse than doing nothing. Yet, because we relied on an outdated historical yardstick, we were fooled into declaring an ineffective—even harmful—drug a success. This is the perilous trap of a violated constancy assumption.
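The arithmetic behind this trap takes only a few lines to verify; the sketch below simply replays the hypothetical numbers above.

```python
# Hypothetical historical trial: placebo versus active control C.
hist_placebo, hist_control = 0.30, 0.15
hist_effect = hist_placebo - hist_control          # 0.15 absolute risk reduction
margin = 0.05                                      # non-inferiority margin delta

# Modern two-arm trial: control C versus new drug N, with no placebo arm.
modern_control, modern_new = 0.10, 0.12
observed_diff = modern_new - modern_control        # 0.02
print("Non-inferiority declared:", observed_diff < margin)                   # True

# The hidden truth: better background care has shrunk the placebo rate.
modern_placebo = 0.11
print("True effect of C today:", round(modern_placebo - modern_control, 2))  # 0.01, not 0.15
print("N worse than doing nothing:", modern_new > modern_placebo)            # True
```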
Scientists are not naive; they are acutely aware of this danger. So, they don't just take a blind leap of faith. They try to build a safer, more robust bridge to the past. This involves several clever strategies.
First is the art of setting the margin. The margin δ isn't just pulled out of thin air. It is calculated with deliberate conservatism. A common approach involves two steps: first, estimate the historical effect of the active control over placebo conservatively, typically taking the lower bound of its confidence interval (the smallest effect the control can plausibly be relied on to have); second, require that the new drug preserve a clinically meaningful fraction of that minimal effect, often at least half, and set the margin as the loss of effect that would still leave that fraction intact.
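A rough numerical sketch of those two steps follows; the historical estimate, its standard error, and the 50% preservation fraction are all assumed for illustration (in regulatory guidance the two resulting quantities are often labelled M1 and M2).

```python
# Step 1: a conservative estimate of the historical effect of C over placebo.
hist_effect = 0.15        # assumed historical absolute risk reduction
hist_se = 0.03            # assumed standard error of that estimate
m1 = hist_effect - 1.96 * hist_se      # lower 95% confidence bound: smallest plausible effect

# Step 2: insist that the new drug preserve a fraction of that minimal effect.
fraction_preserved = 0.5
m2 = (1 - fraction_preserved) * m1     # the margin: the loss of effect we can tolerate

print(f"conservative effect M1 = {m1:.3f}, non-inferiority margin M2 = {m2:.3f}")
```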
Second is the choice of language, or the effect scale. Sometimes, an effect is more stable across populations when measured in relative terms rather than absolute ones. For instance, a drug might consistently reduce the risk of an event by half (a risk ratio of 0.5), regardless of whether the baseline placebo risk is high (say, 20%) or low (say, 2%). In the first case, the absolute risk reduction would be 10 percentage points, while in the second it would be only 1. If we believe the underlying biology is multiplicative, defining our margin on a relative scale (the risk ratio) is more robust to changes in baseline risk than using a fixed absolute difference.
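A tiny sketch of the scale question, with assumed numbers: a constant risk ratio implies very different absolute reductions at different baseline risks.

```python
risk_ratio = 0.5            # assume the drug halves the event risk at any baseline

for baseline in (0.20, 0.02):               # high-risk versus low-risk population
    treated = baseline * risk_ratio
    arr = baseline - treated                 # absolute risk reduction
    print(f"baseline {baseline:.0%}: treated {treated:.1%}, absolute reduction {arr:.1%}")

# A fixed absolute margin (say, 5 percentage points) swamps the entire effect in the
# low-risk setting, while a margin defined on the risk-ratio scale travels with the baseline.
```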
The most powerful strategy, however, is to not rely solely on the past. The ultimate way to verify assay sensitivity is to get a direct, contemporary measurement of the control drug's effect. This has led to the design of three-arm non-inferiority trials that include the new drug (N), the active control (C), and a small, ethically managed placebo group (P).
Including a placebo arm, even a small one, is a profound shift. It allows us to directly observe the effect of C versus P in the here and now, turning the constancy assumption from a leap of faith into a testable hypothesis. With careful ethical safeguards—such as limiting the time a patient can spend on placebo and having clear rules for immediate rescue with effective therapy—this design provides two critical pieces of information.
First, it validates the entire premise of the trial. We can use a gatekeeping strategy: the first "gate" is to prove that C is indeed superior to P in our trial. Only if we pass through that gate does it become meaningful to open the second gate and test whether N is non-inferior to C (a small sketch of this logic appears below). If the active control fails to beat the placebo, the non-inferiority question becomes moot; the trial has failed to show assay sensitivity.
Second, the placebo group gives us a live calibration of all the non-specific effects of being in a trial—the background care, the patient's expectations, the natural course of the disease. It allows us to anchor our interpretations in the reality of the present, not the memory of the past. By inviting the ghost of the placebo back to the party, we can see clearly what is real and what is an illusion, ensuring that when we declare a new therapy "just as good," we can be confident that "good" still means something.
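Here is a minimal sketch of how the two-gate logic might be coded, assuming confidence intervals for the two risk differences are already in hand; the function and the particular intervals are illustrative.

```python
def gatekeeping(ci_control_vs_placebo, ci_new_vs_control, delta):
    """Hypothetical fixed-sequence gatekeeper for a three-arm non-inferiority trial.

    Each ci_* is a (lower, upper) confidence interval for a difference in event
    rates, where lower event rates are better.
    """
    # Gate 1: assay sensitivity. C is superior to P if the upper bound of
    # (rate_C - rate_P) lies below zero.
    if ci_control_vs_placebo[1] >= 0:
        return "Gate 1 failed: no assay sensitivity; the non-inferiority question is moot."
    # Gate 2: non-inferiority. N is non-inferior to C if the upper bound of
    # (rate_N - rate_C) lies below the margin delta.
    if ci_new_vs_control[1] < delta:
        return "Both gates passed: N is non-inferior to C in a trial with assay sensitivity."
    return "Assay sensitivity shown, but non-inferiority of N was not demonstrated."

print(gatekeeping(ci_control_vs_placebo=(-0.12, -0.04),
                  ci_new_vs_control=(-0.01, 0.03), delta=0.05))
```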
Having grasped the principles and mechanisms of our central idea, we now embark on a journey to see it in action. Like a master key, the constancy assumption unlocks doors in fields that, at first glance, seem worlds apart. It is in the application of an idea that its true power and beauty are revealed. We will see how this single concept is a silent partner in life-and-death decisions in medicine, a source of subtle error in our measurement of the living world, a foundational principle for peering deep into the Earth, and a guiding star in our quest to build intelligent machines. This is not a collection of disconnected examples; it is a demonstration of the profound unity of scientific thought.
Imagine a new drug has been developed for a severe heart condition. For decades, the standard treatment, let’s call it drug C, has saved lives. Now, we have a new contender, drug N. Ethically, we cannot give a sick patient a placebo when a life-saving treatment like C exists. So, the only way forward is to compare the new drug directly against the old one, C.
Our goal isn’t necessarily to prove that N is better than C; perhaps it's simply safer, cheaper, or easier to take. We just need to be sure it is not unacceptably worse. This is the world of the non-inferiority trial. But what does “unacceptably worse” mean? This is where our story begins. To define this margin of acceptability, we must look to the past. We dust off the old clinical trial reports from a time when it was ethical to compare drug C to a placebo, P. These historical trials tell us just how much benefit drug C provides—its effect over nothing at all. Let's say history tells us that drug C prevents 10 heart attacks out of every 100 people compared to a placebo.
Now comes the great leap of faith. In our new trial comparing N to C, we make a crucial, untestable assumption: the constancy assumption. We assume that the benefit of drug C over a placebo is the same today as it was in those historical trials. Based on this assumed "constant" benefit, we can declare that our new drug is "not unacceptably worse" if it manages to preserve a substantial fraction—say, at least half—of drug C's historical effectiveness.
But is this assumption safe? Of course not! The world has changed. Today's patients may be different, background medical care has improved, and the disease itself might have evolved. In the world of infectious diseases, for example, the "constant" effect of an antibiotic can vanish as bacteria develop resistance. The ground beneath our assumption is shaky.
Because the stakes are so high, regulatory science has built a sophisticated system of safeguards. We don't use the historical average effect of drug C; we conservatively use the lower bound of its confidence interval—the smallest plausible effect it might have. We then insist that the new drug preserves a significant fraction of this minimal effect. Furthermore, trial designers must work tirelessly to make the assumption plausible by meticulously matching the new trial's conditions—patient populations, endpoint definitions, dosage schedules—to the historical ones, and by conducting the trial with extreme rigor to ensure its quality, or "assay sensitivity".
The danger of a failed constancy assumption is not merely academic. Consider a vaccine trial. A historical vaccine, V, was shown to reduce infection risk from 10% to 2%—an absolute risk reduction of 8 percentage points. Now, due to herd immunity, the background risk of infection in unvaccinated people has plummeted to just 1%. If we naively assume the absolute effect of V is constant, we might set our non-inferiority margin at, say, a 3-percentage-point loss of efficacy. This sounds reasonable. But look closer. In this new low-risk world, the old vaccine's effect, if constant on a relative scale (80% efficacy), would only reduce risk from 1% to 0.2%. The total benefit is now just 0.8 percentage points. A margin of 3 points is larger than the entire effect we are trying to preserve! A new, useless, or even harmful vaccine could be declared "non-inferior" under this broken assumption. This powerful example teaches us that the constancy assumption is not just about whether an effect is constant, but also about how it is constant—on what mathematical scale (absolute or relative) it remains stable.
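Replaying that vaccine arithmetic makes the mismatch explicit; the rates are the illustrative ones above.

```python
# Historical trial of vaccine V.
hist_unvaccinated, hist_vaccinated = 0.10, 0.02
efficacy = 1 - hist_vaccinated / hist_unvaccinated       # 80% relative efficacy

# A naive absolute margin carried into a low-background-risk world.
margin = 0.03
new_unvaccinated = 0.01                                  # background risk after herd immunity
new_vaccinated = new_unvaccinated * (1 - efficacy)       # 0.002 if the relative effect is constant
total_benefit = new_unvaccinated - new_vaccinated        # 0.008

print(f"benefit today: {total_benefit:.3f}, margin: {margin:.3f}, "
      f"margin exceeds benefit: {margin > total_benefit}")
```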
Once you have the pattern in mind, you start to see it everywhere. Science constantly relies on assumptions of uniformity or constancy, often without a second thought.
Think about a simple question: what is your lifetime risk of developing acute appendicitis? A quick way to estimate this is to take the annual incidence—a small number, roughly one case per 1,000 people per year—and multiply it by an average lifespan. This calculation implicitly assumes that your risk is constant every single year of your life. But we know this isn't true. The risk of appendicitis is very low in young children, peaks in the teens and twenties, and declines again in old age. The "constancy over time" assumption provides a simple answer, but it masks the true, dynamic nature of the risk.
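A back-of-the-envelope sketch makes the shortcut visible; the incidence figures below are rough assumptions, not epidemiological estimates.

```python
# Naive estimate: a constant annual incidence times an average lifespan.
annual_incidence = 0.001                      # ~1 case per 1,000 people per year (assumed)
lifespan = 80
naive_lifetime_risk = annual_incidence * lifespan          # 0.08

# Age-varying incidence (assumed shape): low in childhood, peaking in the teens/twenties.
age_specific = {range(0, 10): 0.0003, range(10, 30): 0.0022,
                range(30, 60): 0.0008, range(60, 80): 0.0004}
# For small rates, summing year-by-year risks approximates the cumulative lifetime risk.
varying_lifetime_risk = sum(rate * len(ages) for ages, rate in age_specific.items())

print(round(naive_lifetime_risk, 3), round(varying_lifetime_risk, 3))
# The totals can come out similar, but the constant-risk picture hides *when* in life
# the risk actually falls.
```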
Or consider a botanist measuring how a plant "breathes". A leaf is covered in thousands of tiny pores called stomata that open and close to regulate gas exchange. Standard equipment encloses the whole leaf and measures the total flow of carbon dioxide and water vapor, implicitly assuming that all the stomata are behaving identically—a "constancy over space" assumption. But when a plant is under stress, it can exhibit "stomatal patchiness," where some regions of the leaf have open pores while others are closed. This violation of spatial uniformity leads to biased estimates of photosynthesis and water-use efficiency, because the relationship between gas flow and photosynthesis is nonlinear. Averaging over a non-uniform system gives a wrong answer.
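The averaging problem can be shown with a toy saturating response curve; the curve and the conductance values below are stand-ins, not a real photosynthesis model.

```python
def assimilation(conductance):
    """Toy saturating response of photosynthesis to stomatal conductance."""
    return conductance / (conductance + 0.1)

open_patch, closed_patch = 0.40, 0.01    # conductance in open versus nearly closed regions

# What a whole-leaf measurement effectively sees: one averaged conductance.
inferred = assimilation((open_patch + closed_patch) / 2)

# What the patchy leaf actually does: the average of the two regions' responses.
actual = (assimilation(open_patch) + assimilation(closed_patch)) / 2

print(f"inferred {inferred:.2f} vs actual {actual:.2f}")   # the uniformity assumption overestimates
```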
This same problem haunts us when we look at our planet from space. An Earth-observing satellite measures the light reflected from a patch of forest. A naive machine learning model might assume that the forest's "color" or spectral signature is a constant property of the forest itself. But it is not. The measured radiance depends dramatically on the geometry: the angle of the sun and the angle of the satellite. This is due to the Bidirectional Reflectance Distribution Function (BRDF), which describes how reflectance changes with angles. An assumption of "constancy over viewing angle" is false for almost any real-world surface. A model trained on images taken with the sun high in the sky might fail completely on images taken near dawn or dusk.
Yet, sometimes this assumption is our greatest ally. In the field of geophysics, scientists probe the structure of the Earth's crust by listening to natural electromagnetic waves generated by currents in the ionosphere. These source currents are immense and thousands of kilometers away. Over the scale of a local survey (a few kilometers), the incoming waves are essentially planar and their properties are "laterally constant." Here, the uniformity assumption is not a risky leap of faith but a robust and enabling principle, forming the very foundation of the magnetotelluric method.
The constancy assumption reaches its most abstract and modern form in the realms of causality and artificial intelligence. When we analyze data, we are often trying to move beyond mere correlation to understand cause and effect. To do this, we must adopt an assumption of "faithfulness" or "stability". This is the belief that if two variables in our data are statistically independent, it is because there is no causal pathway connecting them. We assume it's not due to a bizarre coincidence, where, for instance, a positive causal effect along one path is perfectly and exactly canceled out by a negative effect along another. In essence, faithfulness is an assumption that the causal relationships in the system are "constant" and not hiding behind miraculous cancellations.
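A toy simulation of such a cancellation, with hand-picked coefficients: X influences Y through two pathways whose effects exactly offset, so the data show essentially zero correlation despite strong causal links.

```python
import random

random.seed(0)
n = 200_000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    m = 2.0 * x + random.gauss(0, 1)              # X -> M with effect +2
    y = -2.0 * x + 1.0 * m + random.gauss(0, 1)   # direct X -> Y effect -2, M -> Y effect +1
    xs.append(x)
    ys.append(y)

mean_x, mean_y = sum(xs) / n, sum(ys) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
print(f"cov(X, Y) ~ {cov:.3f}")   # near zero: the total effect (-2 + 2*1) cancels to nothing
```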
This quest for invariant relationships is at the heart of building robust and trustworthy Artificial Intelligence. Imagine an AI model trained to predict patient mortality using data from Hospital A. We want this model to work just as well in Hospital B. Hospital B may have different patient demographics, newer equipment, or different documentation habits. These factors create "domain shift"—a change in the statistical distribution of the data. A naive model might learn spurious correlations specific to Hospital A (e.g., "patients measured on machine X have worse outcomes," when in fact machine X is just used for the sickest patients).
The goal of modern domain generalization is to train a model that learns only the invariant relationships—the underlying biological mechanisms that are "constant" across all hospitals—while ignoring the spurious, environment-specific correlations. The ideal representation of a patient, call it Φ(X), would be one where its relationship to the outcome, Y, is stable and transportable from one hospital to the next. The constancy assumption is no longer just a necessary evil for statistical inference; it has become the explicit target in our search for generalizable knowledge.
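A minimal synthetic sketch of that idea: one feature (disease severity) relates to the outcome the same way in both hospitals, while a site-specific artifact (which machine a patient is measured on) correlates with the outcome in opposite directions at the two sites. Everything here, including the variable names, is assumed for illustration.

```python
import random

def make_hospital(n, usage_sign):
    """Synthetic patients: an invariant risk factor plus a site-specific artifact."""
    rows = []
    for _ in range(n):
        severity = random.gauss(0, 1)                            # invariant cause of the outcome
        machine_x = usage_sign * severity + random.gauss(0, 1)   # usage pattern differs by site
        died = 1 if severity + random.gauss(0, 1) > 1.0 else 0
        rows.append((severity, machine_x, died))
    return rows

def corr(rows, col):
    xs = [r[col] for r in rows]
    ys = [r[2] for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    sx = (sum((x - mx) ** 2 for x in xs) / len(xs)) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / len(ys)) ** 0.5
    return cov / (sx * sy)

random.seed(1)
hospital_a = make_hospital(50_000, usage_sign=+1)   # machine X reserved for the sickest
hospital_b = make_hospital(50_000, usage_sign=-1)   # the opposite usage pattern

print("severity-outcome:", round(corr(hospital_a, 0), 2), round(corr(hospital_b, 0), 2))  # stable
print("machine-outcome: ", round(corr(hospital_a, 1), 2), round(corr(hospital_b, 1), 2))  # flips sign
```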
From the pragmatic need to approve a new drug to the philosophical challenge of inferring cause from effect, the constancy assumption is a thread that runs through the fabric of science. It is a tool that allows us to build models in a complex world, but it carries a profound responsibility: to question our assumptions, to understand their fragility, and to appreciate that true insight often comes not from assuming things are constant, but from understanding exactly how, and why, they change.