
Dynamic Relocation

Key Takeaways
  • Dynamic relocation separates a program's logical addresses from physical memory addresses using a base register, allowing the OS to move programs freely.
  • Position-Independent Code (PIC) uses relative addressing so that a single copy of a shared library's code can be used by multiple processes, saving memory.
  • The Global Offset Table (GOT) and Procedure Linkage Table (PLT) create a layer of indirection that allows PIC to call external functions at runtime.
  • Dynamic relocation mechanisms create critical trade-offs between system performance (e.g., lazy binding) and security (e.g., Relocation Read-Only).

Introduction

In modern computing, the ability to run multiple programs simultaneously is something we take for granted. However, this complex dance of processes would be impossible without a clever solution to a fundamental problem: where should a program live in memory? Early systems hardcoded fixed physical addresses into programs, a rigid approach that made it impossible to move code or efficiently share resources. This article tackles the elegant solution to this dilemma: dynamic relocation. It explores the foundational concepts that allow programs to be placed, and even moved, anywhere in memory after they have started running. The first chapter, "Principles and Mechanisms," will demystify the core mechanics, from the hardware's role in translating addresses to the software conventions like Position-Independent Code that enable true flexibility. Following this, the "Applications and Interdisciplinary Connections" chapter will reveal the profound impact of these mechanisms on software architecture, system performance, and the ongoing battle for cybersecurity.

Principles and Mechanisms

Imagine you’ve written a story. Before anyone can read it, you must decide where on a vast, planetary library shelf it will reside. If you carve its permanent, absolute shelf location into the first page—"Section A, Row 3, Shelf 5"—you’ve created a serious problem. What if another book is already there? What if the library reorganizes? Your story becomes unreadable unless it’s in that exact spot. This, in essence, is the dilemma that early computer programs faced.

The Tyranny of a Fixed Address

In the early days of computing, compilers and linkers acted like that rigid author. They would take a program and generate machine code with hardcoded, absolute physical addresses. This process, known as compile-time or load-time binding, essentially decided that the program would live at, say, memory address 0x10000. Every instruction to fetch data and every jump to another part of the code was written with this assumption baked in.

The limitations are immediately obvious. You can't run two such programs at once unless they were specifically compiled for different, non-overlapping memory regions. And what happens if your program needs to grow at runtime—perhaps by loading a new plugin? If the memory right next to your program is already occupied, you're stuck. The program can't be moved, because all its internal references would become invalid, pointing to their old, now-incorrect locations. A pointer that mistakenly stores a fixed physical address, say 7096, becomes completely meaningless if the operating system decides to move the program to a new location where that physical address corresponds to something else entirely. This static approach is fragile and hopelessly inefficient for a modern multitasking operating system.

A Magical Detour: Logical vs. Physical Addresses

To escape this tyranny, we need a way to separate a program’s internal view of its own structure from its external placement in the machine's physical memory. The solution is one of the most beautiful and foundational concepts in computer systems: dynamic relocation through a hardware intermediary, the Memory Management Unit (MMU).

Think of your program not as a story with a fixed shelf location, but as a book with page numbers. An instruction inside the program no longer says "go to physical memory location 0x81000". Instead, it says "go to byte offset 4096 from the start of my code". This is a logical address. It's a relative location, like "page 50" of a book.

The MMU then performs a simple, yet magical, translation for every single memory access. The operating system tells the MMU two things: a base address (b), which is the physical starting location of the program, and a limit (l), which is the program's size. When the CPU requests logical address a, the MMU first checks that the access is valid (0 ≤ a < l). If it is, it computes the physical address p with breathtaking simplicity:

p = b + a

This is execution-time binding. The program speaks in logical offsets, and the hardware translates them to physical reality on the fly. The beauty of this is that the operating system can place the program anywhere it wants in physical memory simply by setting the base register b. It can even stop the program, copy its entire contents to a new location in memory, update the base register to the new address, and resume the program. The program itself would be none the wiser! All its internal, logical addresses remain perfectly valid. This solves the problem of a program needing to grow; if it runs out of adjacent space, the OS can move it to a larger, contiguous block elsewhere in memory.
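
This base-and-limit scheme is simple enough to capture in a few lines of Python. The sketch below is a toy model of the hardware's check-then-add, not real MMU behavior, and the addresses and sizes are invented for illustration:

```python
def translate(logical_addr, base, limit):
    """Translate a logical address to a physical one, the way a
    base-and-limit MMU would: validate against the limit, add the base."""
    if not (0 <= logical_addr < limit):
        raise MemoryError(f"fault: {logical_addr} outside [0, {limit})")
    return base + logical_addr

# The OS loads the program at physical address 0x40000; it is 8 KiB long.
base, limit = 0x40000, 8192
assert translate(4096, base, limit) == 0x41000

# Relocation: the OS copies the program to 0x90000 and updates the base.
base = 0x90000
assert translate(4096, base, limit) == 0x91000  # same logical address, new home
```

Moving the "program" is just a matter of updating the base register; every logical address the program itself uses stays valid.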

Teaching Code to Be Nomadic: Position-Independence

For the MMU's magic to work, the program's code must be written in a special, "nomadic" style. It must never refer to an absolute location, only to relative ones. This is the art of Position-Independent Code (PIC).

How does a compiler generate code that is blissfully unaware of its own location? It uses a few clever tricks. To access data, instead of hardcoding an instruction like "load from 0x20080", the compiler can generate code that first loads the base address of the data segment into a register and then accesses data at fixed offsets from that register.

For jumps and function calls within the same code module, modern architectures like x86-64 provide an even more elegant solution: instruction-pointer-relative addressing. An instruction can be encoded to mean "jump to the location 120 bytes forward from the next instruction". Because the distance between two points within the code is constant, this relative jump is valid no matter where the entire block of code is loaded in memory. The code's internal geometry is preserved.
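
To see why relative jumps survive relocation, here is a toy interpreter, a sketch with an invented instruction set, in which jumps are encoded relative to the next instruction, as on x86-64:

```python
# A toy "machine": each instruction is an ("op", operand) pair. Jumps are
# relative to the *next* instruction, so the program contains no absolute
# addresses and can be loaded at any offset in memory.
PROGRAM = [
    ("push", 1),
    ("jmp_rel", 1),   # skip the next instruction
    ("push", 99),     # never executed
    ("push", 2),
    ("halt", None),
]

def run(memory, load_addr):
    """'Load' the program at load_addr in a flat memory list and execute it."""
    memory[load_addr:load_addr + len(PROGRAM)] = PROGRAM
    stack, pc = [], load_addr
    while True:
        op, arg = memory[pc]
        pc += 1                      # pc now points at the next instruction
        if op == "halt":
            return stack
        if op == "push":
            stack.append(arg)
        elif op == "jmp_rel":
            pc += arg                # relative jump: valid at any load address

# The same code gives the same result wherever it is loaded.
mem = [None] * 64
assert run(mem, 0) == [1, 2]
assert run(mem, 40) == [1, 2]
```

The code's internal geometry, the distance between instructions, is all that the jump encodes, so relocating the block changes nothing.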

The Global Village: Contacting the Outside World

This works wonderfully as long as the program is a self-contained island. But modern software is a bustling global village. Your program needs to call functions from shared libraries—common collections of code like the standard C library, which contains functions like printf. The operating system loads these libraries into memory to be shared by hundreds of running processes.

Here, we hit a wall. Your code has no idea where the C library will be loaded. Its location changes between different programs and is even randomized every time you run the same program, a security feature known as Address Space Layout Randomization (ASLR). The distance from your code to printf is unknown and unpredictable at compile time.

The solution is another layer of indirection, a masterpiece of software engineering: the Global Offset Table (GOT) and the Procedure Linkage Table (PLT).

  • The Global Offset Table (GOT): Imagine a small address book that is part of your program's writable data. For every external function or variable your program uses, the linker creates an entry in this address book. Initially, this entry is blank. When the program is loaded, the operating system's dynamic loader finds the actual runtime address of, say, printf, and writes that address into the corresponding GOT entry.

  • The Procedure Linkage Table (PLT): Now, when your compiler sees a call to printf, it doesn't try to generate a jump to some unknown location. Instead, it generates a position-independent, PC-relative jump to a tiny code stub within your own program—an entry in the PLT. This stub's sole purpose is to perform an indirect jump to the address stored in the GOT's entry for printf.

The chain of events is: Your Code → (PC-relative jump) → PLT Stub → (indirect jump using address from GOT) → printf.

The only part of this chain that requires an absolute address is the GOT entry, and filling that in is the dynamic loader's job. All the compiled code in your program's text segment remains purely position-independent.
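
The whole chain can be modeled in a few lines of Python. This sketch uses a dictionary as the GOT and an ordinary function as the PLT stub; the names (plt_printf, dynamic_loader) are invented for illustration, and binding here is eager rather than lazy:

```python
def real_printf(msg):           # stands in for printf inside the shared library
    return f"printed: {msg}"

GOT = {"printf": None}          # writable data: one slot per external symbol

def plt_printf(msg):            # the PLT stub compiled into your program
    return GOT["printf"](msg)   # indirect call through the GOT slot

def dynamic_loader():
    """At load time, the loader resolves each symbol and fills in the GOT."""
    GOT["printf"] = real_printf

# Program code only ever calls the position-independent PLT stub.
dynamic_loader()
assert plt_printf("hello") == "printed: hello"
```

Note that only the GOT slot holds an absolute "address" (here, a function object); the stub and the calling code never need to know where the library landed.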

The Great Payoff: The Economics of Sharing

Why this elaborate dance of tables and stubs? The payoff is monumental: memory sharing.

Consider a 2-megabyte shared library used by 100 different processes. Without PIC, if each process needed to have addresses patched directly into its code, each would require its own private, modified copy. This would consume 100 × 2 = 200 MB of physical RAM.

With PIC, the library's code (its text segment) is pristine and identical for every process. The operating system can load just one physical copy of the 2 MB text segment into memory and safely map it into the virtual address space of all 100 processes. The only parts that need to be private are the data segments containing the GOT for each process. When the loader writes a process-specific address into a GOT page, the copy-on-write (COW) mechanism automatically creates a private copy of that single 4-kilobyte page for that process.

So instead of duplicating 200 MB of code, we only duplicate a few kilobytes of data for each process. This is the economic miracle that makes modern operating systems feasible. The code remains shared, while only the data that must be unique becomes private.
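
The arithmetic is worth making explicit. A short calculation using the article's numbers (2 MB library text, 100 processes, one private 4 KiB GOT page each):

```python
MB, KB = 1024 * 1024, 1024

without_pic = 100 * 2 * MB               # every process gets a patched private copy
with_pic    = 2 * MB + 100 * 4 * KB      # one shared text + a private GOT page each

assert without_pic == 200 * MB
assert with_pic < 3 * MB                 # roughly 2.4 MB in total
print(f"saved: {(without_pic - with_pic) / MB:.1f} MB")
```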

Procrastination as a Virtue: The "When" of Binding

There is one final detail: when should the dynamic loader resolve all these external symbols and fill the GOT?

One strategy is immediate binding. When the program starts, the loader finds the addresses of all external symbols the program might ever use and fills the entire GOT before the program's first instruction runs. This increases startup time, which can be significant for large applications. However, it allows for a security enhancement called Full RELRO, where the GOT can be made read-only after being filled, preventing certain types of attacks.

The more common strategy is lazy binding: don't do the work until you have to. Initially, the PLT stubs don't point to the GOT entry but to a resolver routine in the dynamic loader itself. The very first time your program calls printf, it gets routed to the resolver. The resolver finds printf's address, "patches" the GOT entry for future use, and then jumps to printf. Every subsequent call is fast, using the now-filled GOT entry directly. This lazy approach speeds up program startup, at the cost of a tiny, one-time penalty for the first call to each external function. It is the default on most systems, a triumph of "just-in-time" work, and the loader's resolver is written so that even concurrent first calls from multiple threads are handled safely.
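
Lazy binding can be sketched by extending the toy GOT model: the slot initially holds the resolver itself, and the first call patches it. The function names here are invented for illustration:

```python
resolve_count = 0

def real_printf(msg):
    return f"printed: {msg}"

def resolver(msg):
    """Stands in for the dynamic loader's resolver: look up the symbol,
    patch the GOT for all future calls, then forward this first call."""
    global resolve_count
    resolve_count += 1
    GOT["printf"] = real_printf
    return real_printf(msg)

GOT = {"printf": resolver}      # lazy: an unresolved slot routes to the resolver

def plt_printf(msg):            # the PLT stub: always an indirect GOT call
    return GOT["printf"](msg)

assert plt_printf("a") == "printed: a"   # first call: resolver runs, patches GOT
assert plt_printf("b") == "printed: b"   # later calls go straight through
assert resolve_count == 1                # the expensive lookup happened once
```

This is also why lazy binding needs a writable GOT, the tension with Full RELRO that the security discussion below returns to.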

From the simple problem of placing a program in memory, we have journeyed through layers of hardware and software abstraction to a system of remarkable elegance and efficiency—a system that allows countless programs to coexist and share resources, moving and adapting dynamically to the needs of the moment.

Applications and Interdisciplinary Connections

Having journeyed through the intricate machinery of dynamic relocation—the dance of the Procedure Linkage Table (PLT), the Global Offset Table (GOT), and Position-Independent Code (PIC)—one might be tempted to file it away as a clever but esoteric piece of system plumbing. But to do so would be to miss the forest for the trees. This mechanism is not merely an implementation detail; it is a foundational principle whose consequences ripple through the entire landscape of modern computing, shaping everything from the sleek efficiency of our operating systems to the fortified walls of our digital security. Now that we understand how it works, let's explore the far more exciting questions of why it matters and where its influence is felt.

The Architecture of Modern Software

At its most immediate, dynamic relocation is the magic that makes shared libraries possible. In the early days of computing, every program was a self-contained monolith, carrying with it a copy of every function it needed. This was like every author having to bind a personal copy of the entire dictionary into the back of their own book. Dynamic linking introduced a revolutionary idea: what if all programs could refer to a single, central dictionary?

This simple shift has profound consequences. It means our applications can be smaller and more efficient. An engineer looking to reduce the footprint of an executable can use a tool like strip to remove bulky debugging information and the static symbol table (.symtab), knowing that the program's runtime integrity is preserved by the lean and essential dynamic symbol table (.dynsym). The dynamic loader only needs this minimal "contract" to wire everything together at runtime. Furthermore, updating the "central dictionary"—say, to patch a bug in a system library—instantly benefits every program that uses it, without requiring each one to be recompiled.

You might think that this is just a story about one particular operating system, but the beauty of this concept is its universality. While the specific incantations and file formats may differ, the core drama is the same across platforms. On Linux, the loader reads Program Header segments from an Executable and Linkable Format (ELF) file. On Windows, it maps sections from a Portable Executable (PE) file. On macOS, the dyld linker processes segments in a Mach-O file. Yet, in each case, the loader performs the same fundamental acts: it maps the code into memory, it adjusts internal pointers to account for the actual load address (a process called rebasing or sliding), and it resolves references to external symbols (binding). The underlying principle of deferred address binding is a beautiful example of convergent evolution in the world of software engineering.

Performance: A Tale of Trade-offs

This elegant architecture, however, is not without its costs. The work of relocation takes time and resources, and understanding this trade-off is at the heart of system design and performance engineering.

Consider the world of embedded systems—the tiny computers in our cars, appliances, and industrial controllers. Here, resources like Flash memory and boot time are precious. An engineer might face a critical choice: should the software modules be linked statically, duplicating the library code for each module but ensuring instant readiness? Or should they be linked dynamically, saving precious memory by sharing the library, but incurring a boot-time cost to perform relocations? By modeling the memory footprint and the CPU cycles spent on relocation, one can make a precise, quantitative decision tailored to the device's constraints. In one scenario, dynamic linking might save dozens of kilobytes of vital space; in another, the boot-time delay might be unacceptable for a critical system.
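
A quantitative version of this decision might look like the following sketch. All numbers here (module count, library size, relocation cost) are hypothetical, chosen only to show the shape of the model:

```python
def static_flash_kb(n_modules, lib_kb):
    return n_modules * lib_kb            # each module carries its own library copy

def dynamic_flash_kb(n_modules, lib_kb):
    return lib_kb                        # one shared copy, whatever the module count

def dynamic_boot_cost_us(n_relocs, us_per_reloc):
    return n_relocs * us_per_reloc       # one-time relocation work at boot

# Hypothetical device: 8 modules, a 60 KiB library,
# 5000 relocations at 2 microseconds each.
flash_saved_kb = static_flash_kb(8, 60) - dynamic_flash_kb(8, 60)
boot_delay_ms = dynamic_boot_cost_us(5000, 2) / 1000

assert flash_saved_kb == 420     # KiB of Flash reclaimed by sharing
assert boot_delay_ms == 10.0     # ms added to boot; acceptable only if the spec allows
```

Plugging in a device's real numbers turns the qualitative trade-off into a concrete yes-or-no answer.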

This performance puzzle isn't confined to tiny devices. It scales all the way up to the boot sequence of a desktop or server operating system. The very first user-space program the kernel runs, often called init, is the ancestor of all other processes. Making init dynamically linked means the loader must be invoked, libraries must be loaded from disk, and thousands of relocations and symbol lookups must be performed—all while the user is waiting for the system to start. A detailed model reveals the costs: extra file open latencies, I/O for reading more files, and the CPU overhead of page faults and relocation processing. The choice between a lean, static init and a more flexible, dynamic one has a measurable impact on how fast your computer comes to life.

But the story of performance is not just one of costs. Astoundingly, the very machinery of dynamic relocation can be repurposed for performance gains. The GNU C Library, for example, uses a clever technique called Indirect Functions (IFUNC). Instead of a function symbol pointing directly to code, it can point to a small "resolver" function. At load time, the dynamic loader runs this resolver. The resolver's job is to inspect the CPU it's running on and select the best, most optimized implementation of the function from a set of variants—one for a CPU with AVX2 instructions, another for a CPU with SSE4, and a fallback for older processors. The loader then patches the GOT entry to point directly to this chosen implementation. Here, the dynamic loader becomes an active agent in a late-stage optimization process, ensuring that the code is perfectly tailored to the hardware it's running on. This can lead to significant net performance wins, where the one-time cost of the resolver is dwarfed by the cumulative benefit of running faster code over millions of calls.
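
The spirit of IFUNC can be conveyed with a toy resolver that picks an implementation by CPU feature. The feature names and variants below are hypothetical stand-ins, not glibc's actual API:

```python
def memcpy_avx2(n):  return f"copied {n} bytes with AVX2"
def memcpy_sse4(n):  return f"copied {n} bytes with SSE4"
def memcpy_plain(n): return f"copied {n} bytes one at a time"

def ifunc_resolver(cpu_features):
    """Run once by the loader; returns the variant the GOT should point to."""
    if "avx2" in cpu_features:
        return memcpy_avx2
    if "sse4" in cpu_features:
        return memcpy_sse4
    return memcpy_plain

# The loader resolves at load time and patches the GOT slot; every later
# call dispatches directly to the chosen variant.
GOT = {"memcpy": ifunc_resolver({"sse4"})}
assert GOT["memcpy"](64) == "copied 64 bytes with SSE4"
```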

This theme of repurposing linking principles for performance even appears in the sophisticated world of dynamic language runtimes for languages like Python, JavaScript, or Ruby. To speed up method calls, these runtimes use a technique called "Inline Caching" (IC), where the result of a method lookup is cached directly at the call site. But what happens if the underlying methods are in a shared library that gets updated via dynamic linking? The cached address would become invalid! The solution is beautifully analogous to the PLT/GOT: instead of caching the volatile absolute address, the runtime can cache a pointer to a stable, indirect slot. This adds a tiny, predictable overhead—the cost of one extra memory dereference—but buys the crucial safety of being "relocation-aware".

The Never-Ending Game: Security and Reverse Engineering

Nowhere are the ripples of dynamic relocation felt more strongly than in the perpetual cat-and-mouse game of software security. The mechanisms that give our software its flexibility also create its most critical vulnerabilities and, in turn, its most powerful defenses.

From the perspective of a security analyst or a reverse engineer, the PLT and GOT are a trail of breadcrumbs. When faced with an unknown binary, a key first step is to understand its interactions with the outside world. By inspecting the binary's relocation entries and tracing calls through their PLT stubs, an analyst can reconstruct a map of which external library functions are being used. This provides invaluable clues about the program's intent, whether it's for legitimate debugging or for uncovering the secrets of a piece of malware.

Of course, the predictability that aids the analyst also aids the attacker. If an attacker knows that the printf function from the C library will always be loaded at a specific, fixed address, they can craft exploits that reliably jump to that address. This is where Address Space Layout Randomization (ASLR) enters the stage. ASLR is a defense that "shuffles the deck" on every execution, placing the executable, libraries, and stack at random addresses. This makes it incredibly difficult for an attacker to guess the location of their target code. It's important to realize that ASLR is only fully effective because of position-independent code (PIE)—the very same technology that enables dynamic relocation. The ability to be loaded anywhere is what allows the "anywhere" to be random. Disabling ASLR, a common practice for performance testing to ensure reproducible results, instantly makes a system more vulnerable by making addresses predictable again, even if other defenses like No-eXecute (NX) memory are active.

But attackers are clever. If they can't predict where a function is, perhaps they can corrupt the pointers that lead to it. The Global Offset Table is, fundamentally, a table of function pointers sitting in writable memory. This makes it a juicy target. In a classic "GOT overwrite" attack, an attacker with a memory corruption vulnerability doesn't inject their own code; instead, they overwrite a GOT entry—say, the one for printf—to point to their own malicious payload. The next time the program innocently calls printf through its PLT stub, it is unknowingly redirected to the attacker's code.

This attack led to the defender's counter-move: Relocation Read-Only (RELRO). With full RELRO, the dynamic loader resolves all symbols at the very beginning of the program's execution (a process called "eager binding") and then uses system calls to mark the memory pages containing the GOT as read-only. The door is slammed shut; the pointers are locked in. This, however, comes at the cost of sacrificing "lazy binding," where symbols are resolved on-demand at their first use, a small performance optimization. This interplay—the security of full RELRO versus the performance of lazy binding, which requires a writable GOT—perfectly encapsulates the delicate balance between safety, performance, and flexibility that defines modern system design.

From making our executables smaller to enabling high-performance function dispatching and forming the central battlefield for cybersecurity, dynamic relocation proves to be far more than a simple linking mechanism. It is a unifying concept, a single elegant idea whose tendrils reach into nearly every corner of software engineering. It reminds us that in the intricate world of computing, the most profound ideas are often those that connect disparate fields, revealing a beautiful, underlying simplicity.