
In the quest for faster and more efficient software, one of the most powerful tools is not a faster processor, but a smarter compiler. At the heart of modern compiler design lies a set of optimizations that transform human-readable code into highly efficient machine instructions. This article explores a cornerstone of this process: Common Subexpression Elimination (CSE), the elegant principle of performing a calculation only once and reusing its result. We will address the fundamental question of how a compiler can safely identify and eliminate this redundant work, a task fraught with subtle complexities. First, in the "Principles and Mechanisms" chapter, we will dissect how CSE works, from its logical foundations using Directed Acyclic Graphs to the practical challenges posed by pointers and concurrency. Subsequently, the "Applications and Interdisciplinary Connections" chapter will broaden our perspective, revealing how the core idea of CSE extends far beyond compilers, influencing everything from spreadsheet design and AI model optimization to the very principles of concurrent programming. By the end, you will have a deep appreciation for this fundamental concept of computational efficiency.
Imagine you're tackling a complex physics problem, and buried in your equations, you find yourself needing to calculate the same intermediate value multiple times. Would you punch it into your calculator again and again? Of course not. You'd calculate it once, jot the result down in the margin, and reuse that number whenever it appeared. It’s faster, less work, and reduces the chance of error.
This simple, intuitive act of not repeating yourself is the very soul of a powerful compiler optimization known as Common Subexpression Elimination (CSE). A compiler's job is to translate the human-readable code you write into the fast, efficient machine language a processor understands. And just like you, a smart compiler is always looking for ways to avoid redundant work. But while your "jotted-down note" is a simple act of memory, a compiler's process for doing this safely is a fascinating journey into logic, structure, and the very nature of computation.
How does a compiler even "see" that two parts of your code are doing the same thing? It starts by breaking down your mathematical expressions into a fundamental structure. Let's say you write a line of code like this:
result = (a * b) + (c * d) - (a * b);
A naive way to represent this is with a structure called a parse tree, which mirrors the syntax of your code exactly. Each operation and each variable is a new node. You can see how the subexpression (a * b) appears twice, leading to two separate branches in the tree. If we generated code from this tree, we would tell the computer to multiply a and b twice.
But a clever compiler does something more elegant. It builds a Directed Acyclic Graph (DAG). In a DAG, every unique subexpression and variable gets exactly one node. When (a * b) is needed a second time, instead of creating a new set of nodes, the compiler simply draws another arrow to the node it already created for the first (a * b). The redundant structure melts away, revealing the true, minimal set of computations required. The DAG is the compiler's "jotted-down note," a blueprint that shows which results can be shared and reused. Generating instructions from this compact DAG naturally eliminates the redundant work, making the program faster and more efficient.
Building a DAG for a whole program at once would be incredibly complex. In practice, compilers often work on smaller, more manageable pieces of code called basic blocks—straight-line sequences of instructions with no branches in or out. Within a basic block, the logic for CSE is beautifully simple, a game of "generate" and "kill".
Let's follow a compiler's thought process as it analyzes a block of code:
t1 := y + z
t2 := y + z
x := t1 - t2

When the compiler sees the first line, it performs the addition and stores the result in a temporary variable t1. It makes a note: "The value of the expression y + z is now available in t1." This is a "gen" (generate) event.
On the second line, it sees y + z again. It checks its notes. Is y + z available? Yes! And have the ingredients, y and z, been changed since t1 was calculated? No. Therefore, the compiler knows that re-calculating y + z is wasteful. It can simply reuse the value it already has. The compiler transforms the code, replacing the second instruction with a simple copy: t2 := t1.
But what if the code looked like this?
t1 := y + z
y := 10
t2 := y + z

Here, after t1 is computed, the variable y is modified. This is a "kill" event. The compiler must be brutally honest with itself: its note that t1 holds the value of y + z is now dangerously out of date. The original expression y + z has been invalidated. When it reaches the third line, it cannot reuse t1. It must perform the addition again with the new value of y.
This simple gen-kill logic, when combined with other optimizations, can lead to astonishing simplifications. Consider the expression x = (y+z) - (y+z).
First, CSE identifies y+z as a common subexpression, transforming the code to t1 := y+z; x := t1 - t1. Next, algebraic simplification examines t1 - t1. It knows that any number subtracted from itself is zero. The code becomes t1 := y+z; x := 0. Finally, dead-code elimination notices that t1 is calculated, but its value is never used again. The instruction t1 := y+z is "dead." It can be removed entirely. What started as three instructions has been distilled by pure logic into a single, perfect instruction: x := 0. This is the elegance of compiler optimization: turning complex-looking code into its simplest, fastest equivalent.
This block-by-block approach is powerful, but it's like looking at the world through a keyhole. What if a redundant computation is hiding in plain sight, but separated by an if-else statement?

if (condition) {
    result = a * b;
} else {
    result = a * b;
}
A local CSE pass, analyzing each block in isolation, would see two separate computations of a * b and would be unable to combine them. To find these "global" redundancies, the compiler needs to zoom out. Modern compilers do this using a more advanced representation called Static Single Assignment (SSA). In SSA form, every variable is assigned a value exactly once. At points where control flow merges (like after an if-else), a special phi function is used to select the correct value based on which path was taken.
This structure allows a more powerful technique called Global Value Numbering (GVN) to thrive. GVN assigns a unique "value number" to each distinct computation. In the example above, it would assign the same value number to a * b in both branches of the if. It would then see the phi function merging two different variables that hold the same value number. It can then deduce that the result after the if-else is the same regardless of which path was taken, and a single computation of a*b can be done before the branch.
This demonstrates a beautiful synergy: different optimizations work together. Copy Propagation can simplify code to make common subexpressions more obvious. GVN can identify equivalences that allow Loop-Invariant Code Motion (LICM) to hoist computations out of loops entirely. It's a pipeline of logical deduction, each step enabling the next.
Perhaps the deepest wisdom in science isn't knowing the rules, but knowing when the rules don't apply. For a compiler, CSE is not always safe or correct. The journey from a simple mathematical idea to a real-world implementation is fraught with subtle dangers.
In our simple examples, x and y were just symbols. But in a real computer, they are names for locations in memory. And pointers are variables that hold memory addresses. This introduces the terrifying problem of aliasing: two different pointers might be pointing to the same memory location.
Consider this code:
x = *p;   // Read the value at the address p
*q = 100; // Write 100 to the address q
y = *p;   // Read the value at address p again

Can we optimize this to y = x? It seems like we're reading *p twice. But what if p and q are aliases—what if they hold the same address? If they do, the write to *q in line 2 will have changed the value that *p points to. Reusing the old value x would be a catastrophic bug. Unless the compiler can prove that p and q can never point to the same memory, it must be paranoid. It must assume they might alias and perform the second read. This turns the compiler from a mere code-shuffler into a detective, reasoning about the possible states of memory.
The same problem occurs with function calls. A call to a function f(ref x) where x is passed by reference is a black box. The function f could do anything to x. After the call returns, the compiler must assume that any previous computations involving x are now invalid. The function call is a massive "kill" event for all of our available expressions.
On paper, addition is associative: (a + b) + c is always equal to a + (b + c). But on a computer, this isn't true for floating-point numbers! Due to the finite precision and rounding involved in representing even a value as simple as 0.1, the order of operations can change the final result by a small, but sometimes critical, amount.
A compiler operating in a "strict" mode, for scientific or financial applications, must honor the parentheses the programmer wrote. It is forbidden from re-associating (a + b) + c into a + (b + c) just to expose a common subexpression. However, a programmer can give the compiler permission to play fast and loose by using a flag like -ffast-math. This is a contract: the programmer is telling the compiler, "I value speed more than strict IEEE-754 compliance, so you have my permission to assume math works like it does on paper."
The final and most absolute boundary to optimization is the volatile keyword. Imagine your code is interacting with a piece of hardware, like a network card or a real-time clock, through a memory address. You might read from that address twice in a row:
time1 = *hardware_clock;
time2 = *hardware_clock;

To a naive compiler, this looks like a classic common subexpression. But eliminating the second read would be a disaster! The whole point is to read the clock at two different moments to measure an interval. The value at that memory location can be changed by something outside the program—the hardware itself.
The volatile keyword is a direct order to the compiler: "Hands off! The act of accessing this memory is an observable event. You must not optimize it away, you must not reorder it, and you must not merge it with other accesses." It tells the compiler that the "as-if" rule—that the program must behave as if it were executed as written—applies to the very act of reading and writing itself. This single keyword sends a ripple through the entire compiler pipeline, from front to back, constraining every optimization pass to respect this sacred boundary between the program and the outside world.
From a simple desire to not repeat ourselves, we have journeyed through the elegant structure of computation, the logic of data flow, the challenges of global reasoning, and the profound boundaries set by the physical reality of computers. Common Subexpression Elimination is far more than a simple trick for speed; it is a microcosm of the entire field of compiler design—a beautiful and intricate dance of logic, caution, and a deep understanding of what it truly means for a program to be correct.
After our journey through the principles of common subexpression elimination (CSE), you might be left with the impression that it's a clever, but perhaps niche, trick for compiler writers. Nothing could be further from the truth. This simple idea—of doing a piece of work only once and reusing the result—is one of the most pervasive and powerful principles in computing. Its echoes are found in the architecture of CPUs, the design of programming languages, the foundations of machine learning, and even in the cautionary tales of writing safe concurrent software. Let's explore this vast landscape and see how this one idea ties it all together.
The most natural home for CSE is, of course, the compiler. When you write code, you express your intent. The compiler's job is to translate that intent into the fastest, most efficient sequence of machine instructions possible. A key part of this is acting as a tireless efficiency expert, hunting down and stamping out any form of redundant labor.
How does it find this redundancy? Imagine the compiler transforming your code into a kind of computational roadmap, a Directed Acyclic Graph (DAG), where each node is a simple calculation and the arrows show how the results flow from one calculation to the next. With this map in hand, the compiler can use a clever hashing technique—creating a unique "signature" for each calculation based on its operation and its inputs—to spot when two different paths on the map lead to the exact same computational junction. When it finds a duplicate, it simply reroutes the traffic through the first junction and demolishes the second one.
This might seem like a small saving, but its effects can be dramatic. Consider a calculation inside a loop that runs a million times. If an expression within that loop depends only on variables that don't change with each iteration, it's a loop-invariant common subexpression. A smart compiler, combining CSE with a technique called Loop-Invariant Code Motion, will hoist that calculation out of the loop entirely. Instead of performing the same multiplication a million times, the program computes it once and reuses the result. The cost of that part of the program plummets from being proportional to the number of iterations, N, to a constant cost of one. It's the computational equivalent of reading a recipe's measurement once before baking a thousand cookies, rather than re-reading it for every single one.
The benefits extend deep into the hardware. Modern processors are like specialized factories with different assembly lines for different tasks. There might be a unit for arithmetic, another for fetching data, and yet another, the Address Generation Unit (AGU), dedicated to calculating memory addresses. If your code repeatedly accesses the same complex address, like base_register + offset, an unoptimized program will task the AGU with the same calculation over and over. By applying CSE, the compiler computes the address once, stores it in a register, and saves precious AGU cycles. Of course, there's no free lunch; this introduces a trade-off. Storing that result consumes a register, increasing "register pressure." If too many registers are in use, the compiler might be forced to temporarily spill a value to memory, incurring its own performance cost. This reveals a beautiful tension: CSE is a powerful tool, but its application is part of a delicate dance among many competing optimizations.
Perhaps most surprisingly, eliminating work can help us do more work at once. Modern CPUs can execute multiple instructions in parallel if those instructions don't depend on each other. CSE helps to expose this hidden parallelism. By identifying and eliminating a redundant computation, the compiler simplifies the code's data dependency graph, untangling dependencies that might have forced two operations to run sequentially. After CSE, those operations might now be independent, free to be dispatched to separate execution units and run concurrently. By doing less, the machine can achieve more.
The beauty of CSE is that it's not just about arithmetic. It's a fundamental pattern that manifests in surprisingly high-level and diverse domains.
Think of a simple spreadsheet. If you have a formula in cell C1 that says =A1+B1, and another in cell C2 that says =C1*D1, you have an implicit dependency graph. The spreadsheet engine "knows" that to calculate C2, it must first have the result of C1. In essence, the very structure of the spreadsheet, by placing the result of A1+B1 into an intermediate cell C1 for reuse, is a form of manual common subexpression elimination! When you change the value in A1, the engine doesn't re-evaluate everything; it intelligently follows the dependency graph, recalculating C1 and then C2, in a perfect topological order.
This principle of abstracting computation appears in even more sophisticated ways. In object-oriented programming, a "virtual call" allows a program to decide at runtime which version of a method to execute based on an object's actual type. How does this work? Under the hood, it often involves a series of type checks. A call like x.method() is expanded by the compiler into something like: "if the type of x is B, call B's method; else if the type of x is C, call C's method...". If you then call another virtual method on the same object x, this entire cascade of type tests is repeated. An astute compiler realizes that the expression typeof(x) is a pure common subexpression. By applying CSE, it can compute the type of x once, store it, and then replace all subsequent virtual calls on x with direct, non-virtual calls. This powerful optimization, known as devirtualization, is revealed to be just another application of common subexpression elimination.
The stage gets even bigger when we turn to modern Artificial Intelligence. Deep learning models are represented as vast computational graphs, where data flows through layers of operations. A single model might reuse the same block of operations (like a specific convolutional layer) on different data paths. Training and running these models is incredibly computationally expensive. The frameworks that power deep learning, like TensorFlow and PyTorch, are essentially sophisticated compilers for these graphs. They aggressively apply CSE to find these identical blocks and ensure they are computed only once during the "forward pass". And what about learning, the "backward pass" or backpropagation? The mathematics of the chain rule, which governs how gradients are calculated, beautifully handles this. When a node in the graph fans out to multiple consumers, the reverse-mode differentiation algorithm naturally sums the incoming gradients from all those paths. CSE optimizes the computation, and the calculus of gradients correctly accounts for it, ensuring the model learns properly.
For all its power, a naive application of CSE can be disastrous. The optimizations we've discussed rely on a crucial assumption: that the world of the program is self-contained and follows a single, predictable thread of execution. When this assumption is broken, CSE can go from being an optimizer to a bug-creator.
Consider Peterson's Solution, a classic algorithm that allows two concurrent processes to share a resource without conflict. It relies on shared variables, like flags, that one process sets to signal its intentions to the other. Imagine process i is in a busy-wait loop, repeatedly checking the flag of process j: while (flag[j] == true) { /* wait */ }. From a single-threaded perspective, process i never writes to flag[j], so the expression flag[j] appears to be a common subexpression. An "enterprising" compiler might optimize this by reading flag[j] into a register once before the loop and then spinning on the register's value. But this is a catastrophic error. Process j, running on another CPU core, might change flag[j] in main memory to false, signaling that process i can proceed. But process i will never see this change; it's stuck in an infinite loop, staring at its stale, cached copy of the flag. The system deadlocks.
This critical failure highlights the boundary of CSE's safe domain. It teaches us that variables shared between concurrent threads are special. They cannot be optimized away as if they were simple, local values. This insight led to the creation of language features like the volatile keyword in C/C++ or atomic types in modern languages. These are essentially explicit instructions to the compiler: "Suspend your usual assumptions. Do not apply CSE here. Every read of this variable must be a fresh read from memory, because the world might have changed behind your back."
Ultimately, common subexpression elimination is a specific instance of a grand, machine-independent principle. It's a logical optimization on the structure of a computation, distinct from machine-dependent tricks that exploit the quirks of a specific processor, like using a Fused Multiply-Add (FMA) instruction.
At its heart, CSE is the computational embodiment of the "Don't Repeat Yourself" (DRY) principle that software engineers hold dear. It's a fundamental recognition that redundancy is a source of inefficiency. By finding and eliminating it, we create programs that are not only faster but also, in a way, more elegant. From the dependency graph in a spreadsheet to the intricate dance of compiler optimizations and the vast networks of an AI model, the simple, beautiful idea of computing something once and reusing the result is a cornerstone of modern performance. It is a testament to how a single, clear principle can ripple through every layer of computer science, unifying disparate fields in a shared quest for efficiency.