
At its core, the drive for efficiency follows a simple rule: never do the same work twice. In the world of software, this principle is championed by the optimizing compiler, a sophisticated tool that refines human-written code into highly efficient machine instructions. One of its most powerful techniques is Global Common Subexpression Elimination (GCSE), a process that methodically hunts for and eliminates repeated calculations across an entire program. But this hunt is fraught with peril; an overly aggressive optimization could change a program's behavior or, worse, cause it to crash. This raises a critical question: how does a compiler balance the aggressive pursuit of performance with an absolute guarantee of correctness?
This article delves into the intricate world of Global Common Subexpression Elimination, revealing the logic and trade-offs that govern this essential optimization. In the first chapter, Principles and Mechanisms, we will dissect the foundational rules of availability and dominance, explore the detective-like tools of Static Single Assignment (SSA) and Global Value Numbering (GVN), and uncover the real-world complexities that can render an optimization unsafe. Following that, in Applications and Interdisciplinary Connections, we will see how this fundamental principle of avoiding redundant work extends beyond compilers, finding surprising relevance in diverse fields like database query optimization and high-performance GPU programming.
Imagine you're following a recipe. In step 2, it says, "calculate the area of a 5-inch circular tart tin." You dutifully compute πr² and jot down the answer. Then, in step 7, after mixing the filling, it asks you to "calculate the area of the same 5-inch tin to see if the filling will fit." You'd probably roll your eyes. You've already done this calculation! Why would you do it again? You'd just reuse the number you got the first time.
This simple, intuitive act of not repeating work is the very soul of Global Common Subexpression Elimination (GCSE). A compiler, in its quest to make your code run faster, acts like a very clever, very cautious chef. It scans the entire recipe (your program's control-flow graph) looking for identical calculations (common subexpressions) it can perform just once, saving the result for later. But the compiler's kitchen is a strange and hazardous place. What if, between step 2 and step 7, the recipe told you to stretch the tart tin, changing its radius? Reusing the old area calculation would now be a disaster.
This is the delicate dance of GCSE: the aggressive pursuit of efficiency balanced with an absolute, unshakeable commitment to correctness. To understand this dance, we must first understand the fundamental rules of the stage on which it is performed.
For a compiler to safely eliminate a repeated computation, two conditions must be met. Let's say we have a calculation x + y at one point and later another x + y at a second point.
First, the value of the original expression must be available. This means that after x + y is computed, the path to the second occurrence must not contain any instruction that changes the value of the operands x or y. If x is modified somewhere in between, the expression at the second location is no longer the "same" expression in value, even if it's textually identical. This is the central challenge in scenarios where a variable is updated on only one of two paths leading to a common point: the expression is available on one path but not the other.
Second, the original computation must dominate the point of reuse. In the language of control-flow graphs, a block A dominates a block B if every possible path from the program's entry to B must pass through A. This is a guarantee of execution. If we compute x + y in a block that dominates the location of the second computation, we know for certain that the value will have been calculated before it's needed. Consider a block B3 whose two successor blocks, B4 and B5, have no other predecessors: B3 dominates both. Therefore, a value computed in B3 is guaranteed to be available for reuse in both B4 and B5, making the re-computations there redundant.
These two principles—availability and dominance—form the bedrock of safe common subexpression elimination.
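The dominance test above can be computed mechanically. Here is a minimal sketch of the classic iterative data-flow algorithm, in Python, on the B3/B4/B5 diamond from the text; the graph encoding and function names are illustrative, not taken from any particular compiler.

```python
def dominators(cfg, entry):
    """cfg maps each block name to the list of its predecessors."""
    blocks = set(cfg)
    # Start by assuming every block is dominated by all blocks...
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}  # ...except the entry, dominated only by itself.
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            # A block's dominators: itself, plus everything that
            # dominates all of its predecessors.
            new = {b} | set.intersection(*(dom[p] for p in cfg[b]))
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom

# entry -> B3, then B3 -> B4 and B3 -> B5 (the diamond from the text)
cfg = {"entry": [], "B3": ["entry"], "B4": ["B3"], "B5": ["B3"]}
dom = dominators(cfg, "entry")
# B3 dominates both of its successors, so a value computed in B3 is
# guaranteed to be available for reuse in both B4 and B5.
```

The fixpoint loop simply shrinks each block's dominator set until no predecessor can rule anything else out.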
It's one thing to state the rules, but how does a compiler actually find these redundancies across a complex, branching program? It needs sophisticated detective tools.
Imagine a world where every variable, once assigned a value, can never change. It becomes a constant. This is the beautifully simple fiction created by Static Single Assignment (SSA) form. When a compiler transforms a program into SSA, it renames variables such that each name is assigned a value exactly once.
But what happens when two control paths merge, and a variable could have come from either path? SSA solves this with a special equation called a φ (phi) function. If variable x could be x1 from the left path or x2 from the right path, at the merge point we define a new variable, x3, like this: x3 = φ(x1, x2). This equation doesn't compute anything; it's a piece of notation that tells the compiler, "the value of x3 is x1 if we came from the left, and x2 if we came from the right."
SSA makes global analysis miraculously simpler. By giving every distinct value a unique name, it makes redundancies pop out. Once we have t1 = x1 + y1, any later occurrence of x1 + y1 is obviously redundant, because the names x1 and y1 refer to single, unchanging values.
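Because SSA names never change, an expression over SSA names can be used directly as a lookup key. A small sketch of this idea, with an invented three-tuple instruction format:

```python
def eliminate_redundant(instrs):
    """instrs: list of (dest, op, operands) over SSA names."""
    seen = {}   # (op, operands) -> name of the first computation
    out = []
    for dest, op, args in instrs:
        key = (op, args)
        if key in seen:
            # Same op over the same single-assignment names:
            # guaranteed redundant, so reuse instead of recompute.
            out.append((dest, "copy", (seen[key],)))
        else:
            seen[key] = dest
            out.append((dest, op, args))
    return out

ssa = [
    ("t1", "+", ("x1", "y1")),
    ("t2", "*", ("t1", "z1")),
    ("t3", "+", ("x1", "y1")),   # same SSA operands as t1: redundant
]
optimized = eliminate_redundant(ssa)
# t3 becomes a copy of t1 instead of a second addition.
```

Without SSA, the dictionary trick would be unsound, because an ordinary variable named x might hold different values at the two occurrences.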
SSA gives us unique names for variables, but what about the expressions themselves? Two expressions might be identical in value even if they look different. This is where Global Value Numbering (GVN) comes in. GVN acts like a universal hashing system for values. It assigns a unique "value number" to every distinct value computed in the program.
Initially, each input variable gets a unique value number. Then, for a computation like a = b + c, the compiler creates a new value number associated with the operation + and the value numbers of b and c. If it later sees an expression d = e + f and finds that b and e have the same value number, and c and f do too, it knows that a and d are equivalent. They get the same value number.
This is how compilers perform feats of recognition where expressions such as a + c and b + c are found to be identical, because GVN proves that a and b (both holding, say, the value of x + 1) are the same value.
Furthermore, a truly clever GVN system understands algebra. It can canonicalize expressions. The expressions (a + b) + a and a + (a + b) are syntactically different. But a GVN algorithm that knows addition is associative and commutative can represent both as a multiset of their primitive components, two a's and one b, and assign them the same value number, revealing their hidden equivalence.
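A toy value-numbering table makes this concrete. The sketch below assumes SSA form and uses a simplified canonicalization: commutative operands are sorted at each level, which is enough for the example above (full associative flattening into a multiset is a natural extension). All class and method names are invented for illustration.

```python
class ValueNumbering:
    def __init__(self):
        self.table = {}     # canonical key -> value number
        self.next_vn = 0

    def _number(self, key):
        if key not in self.table:
            self.next_vn += 1
            self.table[key] = self.next_vn
        return self.table[key]

    def leaf(self, name):
        # Each distinct input variable gets its own value number.
        return self._number(("leaf", name))

    def binop(self, op, vn_left, vn_right):
        if op in ("+", "*"):
            # Commutative: operand order is irrelevant, so sort.
            key = (op, tuple(sorted((vn_left, vn_right))))
        else:
            key = (op, (vn_left, vn_right))
        return self._number(key)

vn = ValueNumbering()
a, b = vn.leaf("a"), vn.leaf("b")
# (a + b) + a versus a + (a + b): canonicalization at each level
# collapses both onto the same value number.
t1 = vn.binop("+", vn.binop("+", a, b), a)
t2 = vn.binop("+", a, vn.binop("+", a, b))
# t1 == t2: the two expressions denote the same value.
```

Non-commutative operators such as subtraction keep their operand order, so a - b and b - a correctly receive different value numbers.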
When combined, SSA and GVN are a formidable duo. SSA provides a clean map of where values flow, and GVN determines what those values truly are.
What happens if an expression is computed on one path but not another? At the merge point, the expression is partially redundant: it's redundant if you came from one direction, but necessary if you came from the other.
A simple GCSE algorithm would be stumped. But a more advanced technique called Partial Redundancy Elimination (PRE) can play a clever trick. It says, "What if I just insert the missing computation onto the 'bare' path?" By doing so, it transforms a partial redundancy into a full redundancy. Now, the expression is guaranteed to be available on all paths leading to the merge point, and it can be safely hoisted up to a dominating block and computed just once. This is a proactive optimization, where the compiler adds a little work in one place to save a lot of work later.
Sometimes, the trick is even more subtle. Suppose the expression x + y is computed after x is incremented on one path. A simple hoisting would be incorrect because it would use the old value of x. But an advanced optimizer can still find a win. It can hoist the original x + y, call it t, and then on the path where x was incremented, it can compute the result as t + 1. This kind of algebraic compensation showcases the remarkable ingenuity that can be encoded into a compiler.
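The compensation trick can be written out as a before/after pair. This is a hand-worked sketch of the transformation, not compiler output; the function names and the specific increment-by-one scenario are illustrative.

```python
def plain(x, y, flag):
    """Original shape: x + y appears on both paths."""
    if flag:
        x = x + 1
        return x + y      # computed with the new x
    return x + y          # computed with the old x

def compensated(x, y, flag):
    """After hoisting with algebraic compensation."""
    t = x + y             # hoisted: computed once, before the branch
    if flag:
        return t + 1      # compensates for x = x + 1 on this path
    return t

# The two versions agree on every input, but the second performs
# the addition x + y only once.
```

The optimizer's obligation is exactly the equivalence this pair demonstrates: (x + 1) + y equals (x + y) + 1 for integers.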
So far, we've lived in a pristine, mathematical world. But real programs are messy. They interact with the operating system, with hardware, and with other threads. This is where the true challenge—and beauty—of compiler design emerges. A good compiler must be a pessimist, aware of all the things that can go wrong.
What if an expression does more than just compute a value? A function call might write to a file, update a global variable, or read the system clock. Such an operation has side effects, and it breaks a critical property called referential transparency. An expression is referentially transparent if, for the same inputs, it always produces the same output and has no other observable effects.
Consider a function f(n) that internally calls clock(). The two calls in f(n) + f(n) are not a common subexpression! Each call to clock() returns a new value. Eliminating one call would change the program's result. Similarly, a hash(s) function that reads a global seed cannot be freely moved if other functions might change that seed in between.
To navigate this, compilers rely on annotations. A function marked pure is a promise that it's referentially transparent. A variable marked volatile is a warning to the compiler: "This value can change at any time for reasons you can't see! Do not optimize away reads or writes to me."
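The clock() hazard is easy to demonstrate. In the sketch below, a fresh counter stands in for clock(): each call observes new state, so "optimizing" f(n) + f(n) into a single call visibly changes the result. All names are invented for illustration.

```python
import itertools

def make_f():
    tick = itertools.count(1)     # fresh, observable state, like clock()
    def f(n):
        return n + next(tick)     # side effect: consumes a tick
    return f

def two_calls(n, f):
    return f(n) + f(n)            # each call sees a different tick

def bad_cse(n, f):
    t = f(n)                      # an unsafe "CSE" of the impure call
    return t + t

# On a fresh counter: two_calls(10, make_f()) is (10+1) + (10+2) = 23,
# while bad_cse(10, make_f()) is (10+1) * 2 = 22. Not the same program.
```

A pure function would make the two versions agree, which is exactly why purity annotations unlock the optimization.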
Some expressions are landmines. The expression x / y is perfectly safe, unless y is zero, in which case it triggers a catastrophic exception. Now consider a program in which x / y is computed on a path where the code has explicitly checked that y != 0. Another path, for when y == 0, avoids the expression entirely.
A naive GCSE might see the repeated x / y and hoist it to a dominating block that is executed before the check. This is unsafe speculative execution. The transformed program would now crash with a division-by-zero error on a path that was perfectly safe in the original. A correct optimizer must prove that a potentially-excepting instruction is safe before moving it to a location where it would execute more often.
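A two-function sketch of the hazard, with invented names: the guarded version is safe for every input, while the "hoisted" version crashes on exactly the input the original guard was protecting against.

```python
def guarded(x, y):
    if y != 0:
        return x / y      # division only executes on the checked path
    return 0              # the y == 0 path never divides

def unsafely_hoisted(x, y):
    t = x / y             # hoisted above the guard: runs unconditionally
    if y != 0:
        return t
    return 0

guarded(8, 0)             # fine: returns 0 without dividing
# unsafely_hoisted(8, 0) raises ZeroDivisionError: the "optimization"
# introduced a crash on a previously safe input.
```

This is why production compilers only speculate instructions they can prove are free of faults, or guard them with explicit checks.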
On paper, pow(x, 2) is identical to x * x. In a computer, using floating-point numbers, this may not be strictly true: the library function pow(x, 2) and the direct multiplication x * x might produce microscopically different results due to rounding. More importantly, they can have different side effects. A pow function might be required by the language standard to set a global error variable, errno, on overflow, while the multiplication operator does not. If the program later checks errno, an optimization that replaces pow(x, 2) with x * x would change the program's observable behavior.
This is why many compilers have a "fast-math" mode. It's an agreement between the programmer and the compiler. The programmer says, "I promise I don't care about these subtle floating-point rules or errno. Just make my code fast." Only with this permission can the compiler treat pow(x, 2) and x * x as the same.
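The regroupings fast-math permits are genuinely not value-preserving in IEEE 754 arithmetic. Even plain addition is not associative, as this well-known example shows (the specific constants are just a convenient demonstration):

```python
# Algebraically, both groupings are 0.1 + 0.2 + 0.3. In doubles:
left = (0.1 + 0.2) + 0.3   # rounds to 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # rounds to 0.6
# left != right: reassociating the sum changed the last bit of the
# result, which is exactly what strict floating-point rules forbid
# a compiler from doing without permission.
```

A compiler honoring the language's floating-point semantics must therefore treat the two groupings as different expressions unless fast-math is enabled.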
Perhaps the greatest modern challenge to optimization is concurrency. If multiple threads are running at once, a shared variable can be changed at any moment by another thread. Suppose a thread performs atomic_load_acquire(x) twice. Can we eliminate the second load? Absolutely not. An atomic load is not just a read; it's a communication with the rest of the concurrent system. Between the first and second load, another thread could have written a new value to x. Eliminating the second load would mean the thread becomes blind to this update, violating the memory model and leading to chaos.
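A deterministic sketch of why the second load matters. Real atomics involve the hardware memory model; here the interleaved store is made explicit as a callback so the scenario is reproducible. All names are invented.

```python
shared = {"x": 0}            # stands in for the shared atomic variable

def writer():
    shared["x"] = 42         # the "other thread" stores a new value

def reader_two_loads(between):
    first = shared["x"]      # first load
    between()                # another thread runs between the loads
    second = shared["x"]     # second load: must observe the update
    return first, second

# reader_two_loads(writer) yields (0, 42). A compiler that "eliminated"
# the second load would return (0, 0) and miss the store entirely.
```

The point is not the simulation but the contract: an atomic load is a synchronization event, so each occurrence must actually execute.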
This illustrates a profound shift. In a concurrent world, many classical optimizations must be re-evaluated or abandoned, because a seemingly redundant operation may in fact be a critical point of synchronization.
Our journey started with a simple idea: don't do the same work twice. We quickly discovered that this principle, when applied to the complex world of computer programs, is anything but simple. It requires an intricate toolkit of logical deduction—like SSA and GVN—to find the true meaning behind the code. It demands clever strategies like PRE to proactively create opportunities for optimization. And most of all, it demands a deep and humble respect for the realities of the machine: for side effects, for exceptions, for the quirks of floating-point math, and for the wild, unpredictable nature of concurrency.
The work of an optimizing compiler is an unsung art form. It is a silent, intricate dance between the pursuit of performance and the guarantee of correctness, a testament to decades of computer science research. The next time your code runs surprisingly fast, take a moment to appreciate the hidden genius of the compiler, the tireless detective that has analyzed every path and polished every instruction, turning your human logic into lightning-fast reality.
In our journey so far, we have peeked behind the curtain to see one of the compiler's most elegant tricks: Global Common Subexpression Elimination (GCSE). At its heart, the idea is wonderfully simple, almost embodying a universal principle of efficient effort: never do the same work twice. A child building with blocks, upon realizing they need another four-by-two red brick, will not manufacture one from scratch if an identical one is already sitting on the table. A compiler, in its own sophisticated way, does precisely the same.
But this simple idea, when applied to the complex tapestry of a computer program, blossoms into a profound art. It requires the compiler to be not just a bookkeeper, but a detective, a logician, and sometimes even a physicist, understanding the deep structure of the computation and the environment in which it runs. Let's explore how this principle of "computational laziness" extends far beyond simple textbook examples, connecting disparate fields and revealing a beautiful unity in the science of computation.
What does it even mean for two computations to be the "same"? Our human eyes might be fooled by surface-level differences, but a compiler must see deeper. Consider a scenario where a program computes two different values, say f(x) + a in one part of the code and f(x) + b in another. To us, these look like two distinct calculations. But a clever compiler, equipped with modern GCSE techniques, recognizes a shared soul within them: the subexpression f(x). If it can prove that the function f is pure—that it behaves like a mathematical function, always giving the same output for the same input without any side effects—and that its argument hasn't changed, it can perform a beautiful optimization. It computes the expensive f(x) just once, saves the result in a temporary variable, and reuses it for both additions. This is not merely matching text; it is understanding the compositional structure of expressions, like seeing that two different buildings share the same foundational blueprint.
This art of seeing sameness extends to navigating the thicket of logic. Imagine a conditional statement like if (X or (Y and X)). A human programmer might write this, perhaps without realizing the redundancy. A compiler, acting as a logician, can apply the absorption law of Boolean algebra (X or (Y and X) = X) to prove that the entire condition is equivalent to just X. This means if X involves an expensive computation like comparing two large blocks of memory, the compiler can transform the code to ensure that computation happens only once, completely eliminating the second instance that the original logic seemed to imply. The compiler isn't just following instructions; it's reasoning about them.
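The absorption law is small enough to verify exhaustively. This check enumerates every Boolean assignment, which is precisely the kind of proof a compiler's simplifier encodes symbolically:

```python
# Absorption: X or (Y and X) is equivalent to X for all Booleans,
# so the second occurrence of X can be eliminated entirely.
for X in (False, True):
    for Y in (False, True):
        assert (X or (Y and X)) == X
```

Two variables mean only four cases, so brute force doubles as a complete proof here.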
Of course, this magical insight doesn't happen in a vacuum. A modern compiler is a symphony of collaborating optimizations, each pass preparing the stage for the next. A call like f(n) might appear twice, but how does the compiler know the two calls are identical? It's the job of another optimization, interprocedural constant propagation, to trace the constant value 3 through the function calls, proving that both calls receive the same argument and are, in fact, equivalent to the same constant value. Only then can GCSE step in and eliminate the second call. This illustrates a crucial concept in engineering complex systems: the "phase ordering problem." A pipeline of optimizations—value numbering, code motion, induction variable analysis, and finally, common subexpression elimination—must be arranged in just the right order, like a sequence of lenses, to bring the final, optimized program into sharp focus.
Real programs are rarely a straight line; they are labyrinths of branches, loops, and recursive calls. Applying GCSE in this world requires even more sophisticated reasoning.
Consider recursion. If a function calls itself, and a pure expression, say g(n), is computed both before and after the recursive call, can we eliminate the second one? An intra-procedural GCSE pass can! As long as the functions involved are pure and the argument hasn't changed within the current activation of the function, the compiler can safely reuse the first result. It understands that the whirlwind of recursion happening in between, because it consists of pure operations, cannot disturb the state that g(n) depends on. However, this insight has its limits. Standard GCSE cannot easily share the result of g(n) from one recursive call to the next, deeper one. That would require re-architecting the program, perhaps by passing the value as a new parameter or building a cache (a technique called memoization), which are powerful transformations that go beyond the typical scope of GCSE.
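Memoization, the cross-call sharing that lies beyond GCSE, is easy to sketch in Python with functools.lru_cache. Fibonacci stands in for any pure function whose recursion revisits the same arguments:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Pure recursion: without the cache, fib(n - 1) and fib(n - 2)
    # recompute overlapping subproblems exponentially many times.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# With the cache, each distinct argument is computed exactly once,
# no matter how many recursive activations ask for it.
```

The cache is a program-level transformation the programmer (or a far more aggressive optimizer) applies; a standard GCSE pass only reuses results within a single activation.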
The plot thickens further when we consider branching paths. Sometimes, a computation is repeated on two different branches of an if-then-else structure. A simple analysis might forbid reusing the result from one branch in the other because neither strictly "dominates" the other in the control-flow graph. This is where a more profound understanding of program structure, the Program Dependence Graph (PDG), comes into play. Instead of just looking at the possible flow of control, the PDG maps out the true dependencies: what data a computation needs, and which decision controls its execution. If two computations of x / y are found to be controlled by the exact same condition (e.g., they both only run if y != 0) and depend on the exact same inputs, the PDG reveals they are "control-equivalent" and can be merged, even if the control-flow path is complex. This more advanced view also highlights the paramount importance of safety. If we carelessly hoist a computation like x / y to a place where it would run unconditionally, we might introduce a division-by-zero error that would never have happened in the original program—a catastrophic failure.
The principle of eliminating redundant work is so fundamental that it transcends the world of traditional compilers and finds a home in vastly different computational ecosystems.
Think of a modern database. When you submit a complex query, the database engine doesn't just blindly fetch data. It first creates a "query plan," which is essentially a program for data retrieval. This query plan is then optimized. Suppose your query involves a user-defined function (UDF), say f(a,b), and this function appears in two different parts of the query pipeline. The database's query optimizer, acting like a compiler, can apply GCSE to compute f(a,b) only once per row of data and feed the result into both pipelines.
However, this is only possible if the UDF behaves like a true function. If it is "volatile"—for instance, if its value depends on the current time (now()) or a random number—then two calls are not, in fact, the same work and must be executed separately. Likewise, if the function has side effects, like writing to a log file, executing it once instead of twice would change the program's observable behavior. The database optimizer must be absolutely certain of the function's purity and determinism before it dares to eliminate a call.
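The decision the query optimizer faces can be sketched as a dispatch on a volatility label, loosely modeled on the IMMUTABLE/VOLATILE function labels found in systems like PostgreSQL. The function names, the label strings, and the row encoding are all invented for illustration:

```python
import random

def eval_udf_twice(udf, volatility, row):
    """Evaluate a UDF that appears twice in a query plan, once per use."""
    if volatility == "immutable":
        t = udf(row)               # pure: safe to compute once and reuse
        return t, t
    # Volatile (or side-effecting): each call must really happen.
    return udf(row), udf(row)

random.seed(0)
pure_udf  = lambda row: row["a"] + row["b"]
noisy_udf = lambda row: random.random()   # behaves like now() or random()

row = {"a": 2, "b": 3}
# eval_udf_twice(pure_udf, "immutable", row) returns the same value
# twice from a single call; the volatile UDF yields two fresh results.
```

Labeling a volatile function immutable would silently change query results, which is why real databases make the programmer declare volatility explicitly.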
Perhaps the most fascinating application of these principles is in the alien world of Graphics Processing Units (GPUs). A GPU achieves its staggering speed by having thousands of tiny processors execute the same program in lockstep on different data (a model called SIMT, or Single Instruction, Multiple Threads). When these threads encounter a branch, a curious thing happens: the entire group (or "warp") of threads marches down the "then" path, with only the relevant threads actually doing work, and then marches down the "else" path, again with only the relevant threads active.
Now, imagine an expression like a * b appears in the "then" branch, the "else" branch, and again after the branches reconverge. In a divergent warp, this means the instruction is actually executed three times! A GPU compiler can apply GCSE to hoist the computation of a * b to a point before the branch, executing it just once for all threads. This single transformation can nearly triple the speed of that piece of code for a divergent warp.
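A crude execution-count model shows where the factor of three comes from. The warp model below is heavily simplified (it only counts how often the expression is issued) and all names are invented:

```python
def run_warp(threads, hoisted):
    """Count how many times a * b is issued for one warp."""
    evals = 0
    if hoisted:
        evals += 1                            # once, before the branch
    else:
        # A divergent warp serializes: the then-path runs if any
        # thread takes it, then the else-path, then reconvergence.
        if any(t["cond"] for t in threads):
            evals += 1                        # a * b in the then-branch
        if any(not t["cond"] for t in threads):
            evals += 1                        # a * b in the else-branch
        evals += 1                            # a * b after reconvergence
    return evals

warp = [{"cond": i % 2 == 0} for i in range(32)]   # divergent warp
# run_warp(warp, hoisted=False) issues the expression 3 times;
# run_warp(warp, hoisted=True) issues it once.
```

For a uniform warp (all conditions equal), only one branch executes and the unhoisted count drops to two, which is why the win is largest under divergence.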
But here, the domain's unique nature introduces extraordinary subtlety. What if the expression is a texture sample, say texture(sampler, uv)? This seems like a simple function call. But it's not. To produce a realistic image, the GPU automatically selects the appropriate resolution of the texture (a "mip level") by calculating derivatives—how fast the texture coordinates are changing across neighboring threads. If you hoist this call to before a divergent branch, the set of "neighboring" threads changes, which can change the derivative, which changes the mip level, which ultimately changes the color returned! The hoisted computation is no longer the "same" as the original. An optimizer can only perform this GCSE transformation if it can prove that the branch is uniform (all threads go the same way) or if the texture lookup uses an explicit level-of-detail that doesn't depend on these secret, hidden inputs from its neighbors.
From the simple elegance of avoiding a repeated addition to the profound subtleties of texture sampling in a parallel universe of threads, the principle of Global Common Subexpression Elimination remains the same. It is a testament to the beauty that arises when a simple, intuitive idea is pursued with mathematical rigor and deep domain knowledge, revealing a common thread of intelligence that runs through all forms of computation.