Memory Compaction

SciencePedia
Key Takeaways
  • Memory compaction solves external fragmentation by relocating allocated memory blocks to consolidate scattered free space into a single, large, contiguous block.
  • The process is computationally expensive and typically requires a "stop-the-world" pause, creating a trade-off between memory efficiency and system responsiveness.
  • Despite the prevalence of paging, which eliminates external fragmentation for most processes, compaction remains essential for creating physically contiguous memory blocks required by hardware (DMA) and for performance features like huge pages.
  • In managed runtimes like Java or C#, compaction is a critical part of garbage collection that ensures new objects can be allocated even when memory is highly fragmented.
  • Advanced compaction algorithms can actively sculpt memory layouts to improve hardware cache performance and are essential for creating high-performance huge pages.

Introduction

In the complex world of computer memory management, a perplexing paradox often arises: a system can report ample free memory, yet fail to allocate a new block of space. This inefficiency stems from external fragmentation, where available memory is scattered in small, unusable pockets. This article tackles the classic solution to this problem: memory compaction. It demystifies the process of tidying up memory to reclaim its utility. In the upcoming sections, you will first explore the foundational Principles and Mechanisms of memory compaction, understanding how it works, its inherent costs, and how it compares to alternative strategies like paging. Following that, the discussion will broaden in Applications and Interdisciplinary Connections, revealing how this fundamental technique is applied in modern runtimes, used to optimize hardware performance, and even connects to fields like security and theoretical computer science.

Principles and Mechanisms

Imagine you have a large, beautiful bookshelf. Over time, you lend out books, buy new ones, and rearrange them. Soon, gaps of various sizes appear all over the shelves. One day, a magnificent encyclopedia set arrives. You measure the total empty space on your shelves and find you have more than enough room. But when you try to place the encyclopedia, you discover a frustrating problem: no single gap is large enough to hold it. The free space is there, but it's broken into a hundred useless little pockets. This, in essence, is one of the most fundamental challenges in memory management: external fragmentation.

The Inefficiency of Gaps: External Fragmentation

In a computer's memory, processes are like books of varying sizes. When a system uses a simple strategy of giving each process a single, contiguous block of memory, it inevitably faces the bookshelf problem. As processes are loaded and unloaded, the memory landscape becomes a patchwork of allocated blocks and free "holes."

Let's put some numbers on this. Suppose our memory has 1024 units of space. After some time, it might look like this: a process of size 128, a hole of size 64, another process of size 256, a hole of size 128, and so on. If we add up all the holes, we might find we have a total of 416 units of free memory. Now, a new process arrives, requesting a contiguous block of 200 units. We have more than double the required space in total, yet the request fails. Why? Because the largest single hole is only 128 units. The available memory is fragmented, rendered useless by its non-contiguous nature. This is not a failure of a specific placement strategy like "first-fit" or "best-fit"; it is a fundamental consequence of demanding contiguous space in a dynamic environment.
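To make the arithmetic concrete, here is a tiny Python sketch of the scenario above (the exact layout is illustrative; the runs sum to 1024 units, with holes totalling 416):

```python
# Toy memory map: a list of (size, allocated?) runs in address order.
# Total free space exceeds the request, yet no single hole can satisfy it.
segments = [(128, True), (64, False), (256, True), (128, False),
            (100, True), (96, False), (124, True), (128, False)]

total_free = sum(size for size, used in segments if not used)
largest_hole = max(size for size, used in segments if not used)

print(total_free)    # 416 units free in total
print(largest_hole)  # 128 -- so a request for 200 contiguous units fails
```

The request for 200 units fails even though more than twice that much memory is free: fragmentation, not scarcity, is the problem.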

The Librarian's Gambit: Compaction

What would any sensible person do with that messy bookshelf? They would slide all the books to one end, consolidating all the small, scattered gaps into one large, continuous empty space. This wonderfully intuitive solution has a name in computer science: memory compaction.

The principle is as simple as it is powerful. The operating system, playing the role of the tidy librarian, relocates all the allocated blocks of memory to one end of the physical address space. All the free holes, which were once trapped between processes, are squeezed out and merge into a single, contiguous free block at the other end. The beauty of this is that no memory is created or destroyed. The total amount of free memory remains exactly the same, but its utility is restored. If we had 300 MiB of fragmented free space before, with the largest hole being only 150 MiB, after compaction we will have a single, beautiful, and useful hole of exactly 300 MiB, ready to accommodate a large process.
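The librarian's move can be sketched in a few lines, representing memory as a list of (size, allocated) runs. The layout is hypothetical, but the invariant is exactly the one just described: the total amount of free space is preserved, merged into a single hole.

```python
def compact(segments):
    """Slide every allocated block to the low end of memory and merge
    all holes into one contiguous free block at the high end.
    No memory is created or destroyed: the total size is preserved."""
    allocated = [(size, True) for size, used in segments if used]
    free_total = sum(size for size, used in segments if not used)
    return allocated + ([(free_total, False)] if free_total else [])

before = [(128, True), (64, False), (256, True), (128, False),
          (100, True), (96, False), (124, True), (128, False)]
after = compact(before)
print(after[-1])  # (416, False) -- one contiguous hole of 416 units
```

After compaction, the 200-unit request from the earlier example would succeed immediately.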

It's crucial to understand what compaction does not do. It is not a magical cleaning service that identifies and discards unused memory. If a block of memory is "leaked"—that is, the program has lost track of it but has not told the operating system it is finished with it—the operating system sees it as just another allocated block. Compaction will dutifully move the leaked block along with all the useful ones, preserving the waste but making the surrounding free space more organized. Compaction tidies the house; it doesn't take out the trash.

The Cost of a Tidy House

This tidying up, however, is not without cost. It's real work. The operating system must physically copy every byte of every allocated process from its old location to its new one. Then, it must meticulously update its internal records—the base and limit registers or other mapping structures—so that each process can continue running, blissfully unaware that its physical home has moved.

The computational cost of this operation is significant. To compact memory, the system must typically scan the entire address space and move a substantial fraction of its contents. This is very similar to defragmenting a hard disk. The total time taken is proportional to the size of the memory itself, an operation of linear complexity, often denoted as O(N), where N is the total memory size.

But the true cost, from a user's perspective, isn't just the CPU cycles burned. It is the pause. In its simplest form, compaction is a "stop-the-world" event. The operating system freezes all running applications, performs the great shuffle, updates its maps, and then unfreezes everything. For the duration of this pause, the system is unresponsive. A pause of a few hundred milliseconds might be imperceptible, but for a large memory space, moving gigabytes of data takes a noticeable amount of time even at high memory bandwidth. An interactive application, a game, or a video call would freeze solid. This introduces a deep tension in system design: the trade-off between the long-term throughput benefits of having unfragmented memory and the short-term latency costs of the pauses required to achieve it.

Can We Avoid the Mess? The Elegance of Paging

This leads a clever designer to ask a more profound question: instead of constantly tidying up a messy system, could we design a system that doesn't get messy in this way to begin with? The answer is a resounding yes, and the idea is one of the cornerstones of modern operating systems: paging.

Paging abandons the rigid requirement that a process must occupy a single contiguous block of physical memory. Instead, it breaks physical memory into small, fixed-size blocks called frames and a process's logical address space into identically sized blocks called pages. The operating system maintains a "map," the page table, for each process. This table is like a set of index cards, where each card tells you which physical frame holds a particular page of the process.

With this scheme, the pages of a single process can be scattered all across physical memory, wherever free frames are available. The encyclopedia set can be split into its individual volumes, and each volume can be placed in any available slot on any shelf. The page table acts as the encyclopedia's table of contents, telling you exactly where to find the chapter on "Aardvark" or "Zebra."

In a paged system, external fragmentation is eliminated. A request for 5 MiB of memory is simply a request for 1280 pages of 4 KiB each. As long as the system can find 1280 free frames anywhere in memory, the request succeeds. The notion of compacting memory to make space for a process becomes moot, a solution to a problem that no longer exists.
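The page table's role as a "table of contents" can be sketched in a few lines. The mapping below is invented for illustration; note that the frames are scattered and out of order, yet the process sees a perfectly contiguous address space.

```python
PAGE_SIZE = 4096  # 4 KiB pages, as above

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 7, 1: 3, 2: 12, 3: 0}

def translate(virtual_addr):
    """Map a virtual address to a physical one via the page table."""
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    frame = page_table[page]   # a real MMU would raise a page fault if unmapped
    return frame * PAGE_SIZE + offset

print(translate(4100))  # virtual page 1, offset 4 -> frame 3 -> 12292
```

This indirection is precisely why a process's pages can land in any free frames at all: contiguity is an illusion maintained by the translation.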

The Ghost of Contiguity: Why Compaction Endures

So, has paging rendered compaction obsolete? It would seem so. But nature, and computer architecture, are full of surprising twists. It turns out that even in the most sophisticated modern operating systems, the ghost of contiguity lives on.

While a process's own memory can be scattered to the winds, other parts of the system are not so flexible. Certain high-performance hardware devices, particularly for Direct Memory Access (DMA), are built with a simpler worldview. They are designed to read or write large streams of data directly to or from memory, bypassing the CPU. To do this efficiently, they often require that the memory buffer they operate on be physically contiguous. These devices don't know how to use the processor's fancy page tables to jump from one frame to another.

Suddenly, our old problem is back, but in a new disguise. The operating system might need to allocate a large, physically contiguous buffer for a network card or a graphics processor. Even in a paged system, the available physical frames can become fragmented. If no sufficiently large contiguous block of free frames exists, the operating system may have no choice but to perform memory compaction—shuffling occupied frames around—to create one. Compaction, it seems, is a fundamental tool that we can't quite throw away.

Tidying Up Without Pausing the World

Given that we are stuck with compaction, can we at least mitigate its most painful side effect—the "stop-the-world" pause? Again, cleverness comes to the rescue. Instead of moving the entire world at once, we can do it incrementally. The OS can move a few pages, let the user's applications run for a bit, then move a few more.

This presents its own puzzle: what happens if a process tries to write to a page precisely when the OS is in the middle of moving it? The solution is an elegant dance known as Copy-On-Write (COW) compaction. When the OS decides to move a page, it first allocates a new destination frame. It then copies the content of the old page to the new one. Only after the copy is complete does it atomically update the process's page table to point to the new location. The final step is to free the old frame.

This method ensures correctness. More importantly, it dramatically reduces the overhead. Instead of needing to duplicate an entire multi-megabyte process region in memory to move it, the system only needs a small amount of temporary extra memory for the pages that are "in flight"—those that have been copied but whose old versions have not yet been freed. This overhead is determined not by the size of the process, but by the depth of the copy pipeline, which can be kept small. It's a beautiful piece of engineering that allows the system to tidy itself up in the background without bringing everything to a screeching halt.
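The copy-then-flip dance can be sketched as a toy model, with a Python list standing in for physical memory and a dictionary for the page table. (A real kernel would also have to invalidate the TLB entry after the flip, and handle a write arriving mid-copy; both are omitted here.)

```python
def move_page(memory, page_table, page, dst_frame, frame_size=4):
    """Incrementally relocate one page: copy first, then flip the mapping.

    The process keeps using the old frame until the page-table update,
    which stands in for the atomic switch a real kernel would perform."""
    src_frame = page_table[page]
    src, dst = src_frame * frame_size, dst_frame * frame_size
    memory[dst:dst + frame_size] = memory[src:src + frame_size]  # 1. copy
    page_table[page] = dst_frame                                 # 2. atomic flip
    memory[src:src + frame_size] = [None] * frame_size           # 3. free old frame

memory = list(range(12))     # three frames of four words each
pt = {0: 0}                  # page 0 currently lives in frame 0
move_page(memory, pt, page=0, dst_frame=2)
print(pt[0], memory[8:12])   # 2 [0, 1, 2, 3]
```

Only one page's worth of memory is duplicated at any instant, which is exactly the "in flight" overhead described above.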

A Question of Identity: Compaction and Garbage Collection

Finally, it is essential to distinguish compaction from its close cousin, garbage collection (GC). While the two often appear together, they perform different roles. Garbage collection is the process of identifying which objects in memory are still in use (live) and which are not (garbage). Compaction is simply the act of moving things.

Many garbage collectors use a mark-compact algorithm. The "mark" phase traverses the web of pointers in a program, starting from its roots (like the CPU registers and call stack), to find all reachable, live objects. The "compact" phase then moves all these marked objects together, achieving the consolidation we've discussed.
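A minimal sketch of the two phases on a hypothetical object graph (the object names and the forwarding-table representation are invented for illustration):

```python
def mark(roots, pointers):
    """Mark phase: find every object reachable from the roots."""
    live, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in live:
            live.add(obj)
            stack.extend(pointers.get(obj, []))
    return live

def compact_phase(heap_order, live):
    """Compact phase: slide live objects to the low end, in address order,
    returning each object's new slot (a 'forwarding table')."""
    return {obj: slot for slot, obj in
            enumerate(o for o in heap_order if o in live)}

# Hypothetical graph: A and C are reachable from the roots, B is garbage.
pointers = {"A": ["C"], "B": ["C"], "C": []}
live = mark(roots=["A"], pointers=pointers)
print(sorted(live))                            # ['A', 'C']
print(compact_phase(["A", "B", "C"], live))    # {'A': 0, 'C': 1}
```

Note that B, though it points at a live object, is itself unreachable and so is never moved; the forwarding table is what a real collector would use to rewrite all pointers to the relocated objects.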

Here, a fascinating subtlety arises. Some collectors are "conservative"; if they see a bit pattern on the stack that looks like a memory address, they assume it's a pointer and mark the corresponding object as live, just to be safe. This can lead to "false positives," where a dead object is accidentally kept alive because a random integer on the stack happened to have the same value as its address. The compactor, doing its job, will then dutifully move this dead-but-marked object along with the truly live ones, leading to a form of memory waste called over-retention.

This illustrates the clear separation of concerns. Compaction's job is not to ask "what is this?". Its job is simply to move what it's told to move. It is a powerful but narrowly focused tool, a fundamental mechanism in the grand, intricate, and beautiful machine that is a modern operating system.

Applications and Interdisciplinary Connections

Having understood the principles of memory compaction—the elegant act of sliding allocated blocks of memory together to eliminate the wasteful gaps of external fragmentation—we might be tempted to see it as a simple, mechanical housekeeping task. But to do so would be like looking at a master sculptor's chisel and seeing only a piece of metal. In reality, memory compaction is a fundamental tool whose application reveals deep connections across the landscape of computer science and engineering. It is not merely a janitorial routine; it is a dynamic strategy for shaping memory to serve higher purposes, from raw performance to robust security.

Let us now journey beyond the "how" and explore the "why" and "where" of memory compaction. We will see it not as an isolated mechanism, but as a concept that solves practical problems, forces difficult but fascinating trade-offs, and interacts in surprising ways with nearly every layer of a modern computing system.

The Engine of Modern Runtimes

If you have ever written code in a modern language like Java, C#, Python, or JavaScript, you have benefited from memory compaction, likely without even knowing it. These languages operate within a "managed runtime," an environment that automatically handles memory for the programmer. A key feature of this environment is the garbage collector (GC), which periodically identifies and reclaims memory that is no longer in use.

But what happens when the runtime needs to allocate a new object, and all the free memory is scattered in small, useless fragments? A simple garbage collector might free up a lot of memory in total, but if no single free chunk is large enough for the new request, the allocation fails. The application might crash, all because the free space wasn't organized. This is where compaction becomes the hero of the story. High-performance garbage collectors often combine their cleanup phase with a compaction phase. If an initial attempt to find space for a new object fails, the system doesn't give up. Instead, it can trigger a full garbage collection and compaction cycle. All live, useful objects are identified and slid to one end of the heap, merging all the fragmented pockets of free space into one large, contiguous block. The system then retries the allocation, which now almost certainly succeeds. This powerful "fail, compact, and retry" strategy is a cornerstone of robust application environments, ensuring they can run for long periods without succumbing to memory fragmentation.
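The "fail, compact, and retry" pattern can be sketched against a toy heap. This is an illustration, not any particular runtime's allocator; the garbage-collection step is elided so the compaction step stands out.

```python
class Heap:
    """Toy heap: a list of (size, live) runs in address order."""
    def __init__(self, segments):
        self.segments = segments

    def find_free(self, size):
        for i, (s, live) in enumerate(self.segments):
            if not live and s >= size:
                return i
        return None

    def compact(self):
        live = [(s, True) for s, l in self.segments if l]
        free = sum(s for s, l in self.segments if not l)
        self.segments = live + [(free, False)]

def allocate(heap, size):
    """Fail, compact, and retry (a real runtime would collect garbage first)."""
    idx = heap.find_free(size)
    if idx is None:
        heap.compact()
        idx = heap.find_free(size)
    if idx is None:
        raise MemoryError("heap exhausted even after compaction")
    return idx

heap = Heap([(100, True), (80, False), (100, True), (80, False)])
print(allocate(heap, 150))  # no 150-unit hole at first; after compaction -> 2
```

The first lookup fails (two 80-unit holes), compaction merges them into a single 160-unit hole, and the retry succeeds.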

A Question of Efficiency: The Art of the Trade-Off

Why go to all the trouble of moving memory around? Why not use a different allocation strategy, like a "free list," which keeps a detailed directory of every free block? This is not just a technical question; it's a profound engineering trade-off that pits space against time and complexity.

A free-list allocator avoids the cost of moving data, but it pays a price in space. Its "directory" of free blocks—the metadata—consumes memory itself. As the heap grows, so does the metadata. Furthermore, free-list allocators are still susceptible to fragmentation; even if they can find a block, it might be slightly larger than requested, leading to wasted slivers of internal fragmentation.

A compacting collector, on the other hand, eliminates fragmentation entirely. Its primary space overhead is not complex metadata, but the need for "breathing room" to perform the compaction. For example, a simple "semi-space" collector requires doubling the memory footprint, reserving a whole empty space to copy live objects into. More sophisticated sliding compactors need less, but some reserve capacity is always required.

So, which is better? The answer lies in a beautiful mathematical balancing act. We can model the memory costs of both approaches. For the free-list, the cost is the live data L, plus fragmentation overhead (let's say a fraction τ of L), plus metadata overhead (a fraction φ of the total heap). For the compacting collector, the cost is simply (1+κ)L, where κ is the fractional reserve space needed. By setting these costs equal, we can solve for the "break-even" point. For instance, one can derive the break-even metadata fraction φ* as a function of the fragmentation factor τ and the GC's reserve factor κ. This kind of analysis allows system designers to make informed, quantitative decisions, choosing the right strategy for the right workload, armed with the power of simple mathematical models.
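One plausible way to formalise this: if metadata occupies a fraction φ of the whole free-list heap H, then H = L(1+τ) + φH, so H = L(1+τ)/(1−φ); equating this with the compactor's (1+κ)L and solving gives φ* = (κ−τ)/(1+κ). A sketch under those modelling assumptions:

```python
def break_even_metadata_fraction(tau, kappa):
    """Solve L*(1 + tau)/(1 - phi) == (1 + kappa)*L for phi.

    tau   -- free-list fragmentation overhead, as a fraction of live data L
    kappa -- compacting collector's reserve space, as a fraction of L
    (One plausible formalisation of the model sketched above.)"""
    return (kappa - tau) / (1 + kappa)

# If the GC reserves 50% extra space and the free list fragments by 20%,
# the free list wins only while its metadata stays under ~20% of the heap.
print(break_even_metadata_fraction(tau=0.2, kappa=0.5))  # ~0.2
```

Below φ*, the free list's bookkeeping is cheap enough to beat the compactor's reserve; above it, compaction uses memory more efficiently.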

Sculpting Memory for Peak Performance

Perhaps the most exciting applications of compaction are not just about reclaiming waste, but about actively sculpting the memory layout to unlock higher performance from the underlying hardware. The physical arrangement of data in memory is not arbitrary; it has a direct impact on speed.

A wonderful example of this is the use of "huge pages" (or superpages) in modern operating systems. Computer processors use a cache called the Translation Lookaside Buffer (TLB) to speed up the translation of virtual to physical memory addresses. If an application uses many small, standard-sized pages (e.g., 4 KiB), it can quickly overwhelm the TLB, leading to slow "misses." Huge pages, which can be 2 MiB or even larger, allow a single TLB entry to cover a much larger memory region, drastically improving performance for applications that use large amounts of memory. The catch? The operating system must find a large, physically contiguous block of memory to create a huge page. In a fragmented system, such blocks are rare. Compaction comes to the rescue. The OS can intelligently trigger a compaction routine to evacuate occupied base pages from a target region, stitching together a contiguous free block large enough to be promoted into a high-performance huge page. This is a proactive use of compaction: not just cleaning up, but building a foundation for speed.
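A sketch of how an OS might pick where to build a huge page: scan aligned windows of physical frames and choose the window needing the fewest migrations. The frame states and the tiny window size are simplifications for illustration (a real 2 MiB huge page spans 512 frames of 4 KiB).

```python
FRAMES_PER_HUGE_PAGE = 4  # 512 in reality; 4 keeps the demo readable

def best_huge_page_window(frames):
    """Pick the aligned window requiring the fewest page migrations.

    `frames` holds 'free', 'movable', or 'pinned' per physical frame.
    A window containing a pinned frame can never become a huge page."""
    best, best_cost = None, None
    for start in range(0, len(frames), FRAMES_PER_HUGE_PAGE):
        window = frames[start:start + FRAMES_PER_HUGE_PAGE]
        if 'pinned' in window:
            continue                      # an immovable boulder blocks this window
        cost = window.count('movable')    # pages compaction must evacuate
        if best_cost is None or cost < best_cost:
            best, best_cost = start, cost
    return best, best_cost

frames = ['movable', 'free', 'pinned', 'free',      # window 0: blocked
          'movable', 'free', 'free', 'free',        # window 1: 1 migration
          'movable', 'movable', 'free', 'movable']  # window 2: 3 migrations
print(best_huge_page_window(frames))  # (4, 1)
```

The pinned frame in the first window illustrates the "immovable boulder" problem discussed below: no amount of compaction can reclaim that window.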

The story gets even more subtle. Modern CPUs have multiple levels of cache, and the way data is mapped into these caches often depends on its physical address. If an application's frequently used data pages all happen to map to the same few cache sets, they will constantly evict each other, leading to "conflict misses" and poor performance. The technique of "page coloring" tries to prevent this by allocating a process's physical pages across a diverse set of "colors," where each color corresponds to a group of cache sets. A naive compaction algorithm, by packing all of an application's pages together, could inadvertently destroy a carefully balanced color distribution, clustering all pages into just a few colors and hurting cache performance. A sophisticated, "color-aware" compactor, however, can do the opposite. It can be designed to not only coalesce free space but to simultaneously re-balance the color distribution of a process's pages, minimizing both fragmentation and cache conflicts. This is memory management elevated to an art form.
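A color-aware placement policy can be as simple as preferring the free frame whose color the process currently uses least. A hypothetical sketch, with the color taken from the low bits of the frame number (real mappings depend on the cache geometry):

```python
NUM_COLORS = 4  # cache sets grouped into colors by physical-address bits

def pick_frame(free_frames, color_counts):
    """Color-aware allocation: choose the free frame whose color the process
    uses least, keeping its pages spread evenly across cache sets."""
    return min(free_frames, key=lambda f: color_counts[f % NUM_COLORS])

free_frames = [8, 9, 10, 11]             # colors 0, 1, 2, 3 respectively
color_counts = {0: 5, 1: 2, 2: 7, 3: 1}  # pages the process holds per color
frame = pick_frame(free_frames, color_counts)
print(frame, frame % NUM_COLORS)         # 11 3 -> fills the underused color
```

A color-aware compactor applies the same preference when choosing destination frames during the shuffle, rebalancing the distribution instead of destroying it.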

The Unseen Ripple Effects: Compaction in the OS Kernel

Implementing compaction deep inside an operating system kernel is a delicate and complex surgery. Moving a page of physical memory is far more involved than a simple memcpy. The OS maintains intricate maps—page tables—that record which virtual page corresponds to which physical page frame. When a page is moved from physical frame p to frame q, all references to p must be meticulously updated to point to q.

In systems with advanced structures like Inverted Page Tables (IPTs), where there is one entry per physical frame, this means updating the IPT entry at index q with the page's identity and clearing the entry at index p. Furthermore, the processor's TLB, which caches these mappings for speed, must be notified that its cached mapping is now stale and must be invalidated. Failing to do so would cause the CPU to access the old, now-empty physical location, leading to catastrophic data corruption. Any attempt to avoid these updates, for example by introducing an extra layer of software indirection, would only add overhead to the critical path of every single memory access and fail to solve the fundamental problem of keeping hardware and software coherent.
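The bookkeeping for moving a page from frame p to frame q can be sketched with dictionaries standing in for the IPT and the TLB (real hardware invalidation, of course, involves inter-processor "shootdowns" and is far more involved):

```python
def move_frame(ipt, tlb, p, q):
    """Relocate the page occupying physical frame p into free frame q.

    `ipt` is an inverted page table: one entry (the owning page) per frame.
    Every stale reference must be fixed: the IPT entry at q gains the
    page's identity, the entry at p is cleared, and the TLB entry caching
    the old translation is invalidated."""
    assert ipt[q] is None, "destination frame must be free"
    page = ipt[p]
    ipt[q] = page        # record the page's new home at index q
    ipt[p] = None        # clear the entry at index p
    tlb.pop(page, None)  # invalidate the cached translation, if present

ipt = {0: "pageA", 1: None, 2: "pageB"}
tlb = {"pageA": 0}               # cached virtual->physical mapping
move_frame(ipt, tlb, p=0, q=1)
print(ipt)                       # {0: None, 1: 'pageA', 2: 'pageB'}
print(tlb)                       # {} -- the next access refaults to frame 1
```

Skipping the last step is exactly the data-corruption hazard the text describes: the CPU would keep translating to the old, now-empty frame.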

The complexity doesn't end there. Not all memory is created equal. Some memory is "pinned" or unmovable. For instance, memory buffers used for Direct Memory Access (DMA) by hardware devices must remain at a fixed physical address while the device is using them. Similarly, core kernel data structures managed by specialized allocators like the slab allocator are often considered unmovable to avoid complexity. These unmovable pages act as immovable boulders in our memory landscape. They create barriers that a compaction routine cannot cross, fundamentally limiting the size of the contiguous free block that can be formed. These unmovable regions are, in themselves, a source of external fragmentation that even compaction cannot fully cure, demonstrating the intricate dependencies between different memory management subsystems in the kernel.

Beyond Performance: New Frontiers and Connections

The influence of memory compaction extends far beyond its traditional role in performance tuning. As computer systems evolve, compaction has found itself at the crossroads of security, reliability, and even the physics of hardware itself.

Consider a system that employs full memory encryption, a critical feature for protecting data in the cloud. In many advanced schemes, the encryption key or "tweak" is tied to the physical address of the data. This provides powerful protection against certain attacks. But it has a startling consequence for compaction: when a block of memory is moved, its physical address changes, so it must be decrypted with the old tweak and re-encrypted with the new one. Suddenly, our simple memory copy operation now carries a significant cryptographic workload, increasing the time and energy cost of compaction.

A similar issue arises in high-reliability systems that use checksums to guarantee data integrity. If a block of data is moved during compaction, its checksum must be recomputed and verified. This "verification time" adds to the total time the system is paused for housekeeping, directly reducing its overall availability—a critical metric for servers and infrastructure.

The connection to the physical world becomes even more apparent with the rise of Non-Volatile Memory (NVM) technologies, which retain data without power but have a limited number of write cycles. Every time compaction moves a segment of data, it performs writes to the NVM device, contributing to its wear and tear. System designers must now budget their compaction operations, calculating the maximum number of compaction passes that can be performed over the device's lifetime before it wears out. What was once a purely logical algorithm now has a tangible, physical lifespan.
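The budgeting itself is simple arithmetic. A hedged sketch with illustrative endurance and capacity figures (real devices add wear-levelling, write amplification, and application writes competing for the same budget):

```python
def compaction_budget(endurance_cycles, device_bytes, bytes_moved_per_pass):
    """Rough upper bound on compaction passes before NVM wear-out,
    assuming compaction writes are spread evenly over the device.
    All inputs below are invented for illustration."""
    total_writable = endurance_cycles * device_bytes
    return total_writable // bytes_moved_per_pass

# A 128 GiB device rated for 1e5 write cycles, moving 32 GiB per pass:
passes = compaction_budget(10**5, 128 * 2**30, 32 * 2**30)
print(passes)  # 400000 passes before the write budget is exhausted
```

Even this optimistic bound makes the point: on NVM, every compaction pass spends a finite, physical resource.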

A Theoretical Interlude: How Good Is Our Strategy?

So far, we have discussed compaction in a practical, engineering context. But we can also step back and ask a more fundamental, mathematical question: How good is our compaction strategy, really? The simple strategy of "compact everything when you get stuck" seems reasonable, but is it optimal?

This is a question for the field of online algorithms. An "online" algorithm must make decisions without knowing the future. Our compactor doesn't know what allocation or free requests will arrive next. It is playing a game against an all-knowing "offline" adversary, the optimal algorithm (OPT) that knows the entire sequence of requests in advance and can make perfect decisions to minimize cost. We can measure the quality of our online algorithm by its "competitive ratio": the worst-case ratio of its cost to OPT's cost.

By constructing a clever sequence of requests—first fragmenting memory with small allocations, then deallocating every other block, and finally requesting a block too large to fit in any hole—we can force our simple compaction algorithm to move a large amount of data. An optimal offline algorithm, knowing this large request was coming, could have arranged the memory differently with much less work. Remarkably, for the standard full-compaction algorithm, this analysis reveals that its cost is never more than twice the cost of the offline optimum. Its competitive ratio is 2. This is a beautiful result: while our simple, "dumb" online strategy is not perfect, it is provably within a constant factor of perfection.
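The adversarial sequence is easy to simulate. The sketch below only counts the work the online algorithm is forced to do; proving the ratio of 2 requires comparing against the offline optimum analytically, which the code does not attempt.

```python
def adversary_cost(n):
    """Simulate the adversarial sequence against 'compact when stuck'.

    Memory has n unit slots. Fill them all with unit blocks, free the
    blocks at odd addresses, then request one block of size n // 2.
    Total free space suffices, but no hole exceeds one unit, so the
    allocator must compact, relocating every surviving live block.
    Returns (total_free, largest_hole, units_moved_by_compaction)."""
    mem = [True] * n              # every slot allocated
    for i in range(1, n, 2):
        mem[i] = False            # free every other block
    total_free = mem.count(False)
    largest_hole = 1              # holes are isolated single slots
    units_moved = mem.count(True) # compaction slides every live block
    return total_free, largest_hole, units_moved

print(adversary_cost(1024))  # (512, 1, 512)
```

Half the memory is free, yet the online algorithm must move the other half to satisfy a single request, which is the kind of forced work the competitive analysis bounds.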

Theory can also help us answer another crucial question: when should we trigger compaction? Doing it too often is wasteful; too rarely, and the system suffers from fragmentation. The real world is random. System load fluctuates. We can model the system's state (e.g., 'Low Load' vs. 'High Load') and the corresponding fragmentation rates using the powerful mathematics of stochastic processes, specifically continuous-time Markov chains. By solving the resulting system of differential equations, we can calculate the expected time it will take for fragmentation to reach a critical threshold, and thus determine the optimal average frequency for running our compaction routine. This is a perfect marriage of practice and theory, using sophisticated mathematics to tune a deeply practical system parameter.
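As a hedged illustration of the triggering question—a mean-value estimate rather than the full hitting-time analysis the differential equations would yield—the chain spends stationary fractions of time in each load state, so fragmentation grows, in the long run, at the stationary mix of the two rates. All rates below are invented for illustration.

```python
def expected_time_to_threshold(lam, mu, r_low, r_high, threshold):
    """First-order estimate of the time until fragmentation hits `threshold`.

    Two-state continuous-time Markov chain: Low -> High at rate `lam`,
    High -> Low at rate `mu`; fragmentation accrues at r_low / r_high in
    the respective states. Uses the long-run average growth rate, so it
    is a mean-value estimate, not a hitting-time distribution."""
    pi_low = mu / (lam + mu)     # stationary probability of Low Load
    pi_high = lam / (lam + mu)   # stationary probability of High Load
    mean_rate = pi_low * r_low + pi_high * r_high
    return threshold / mean_rate

# Illustrative numbers: load flips every ~10 s on average; fragmentation
# accrues at 1%/s under low load and 5%/s under high load.
print(expected_time_to_threshold(lam=0.1, mu=0.1, r_low=1.0, r_high=5.0,
                                 threshold=60.0))  # 20.0 seconds
```

A compaction period somewhat shorter than this estimate keeps fragmentation below the critical threshold on average, which is exactly the tuning decision the text describes.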

From its humble origins as a method for tidying up memory, we have seen that memory compaction is a concept with profound and far-reaching implications. It is a critical component of modern software, a tool for sculpting hardware performance, a complex kernel-level dance, and a subject of deep theoretical beauty. It reminds us that in computer science, the most elegant ideas are often those that bridge the gap between the abstract and the applied, revealing the hidden unity of the digital world.