
Contiguous Allocation

SciencePedia
Key Takeaways
  • Contiguous allocation assigns memory in a single, unbroken block, but this inevitably leads to external fragmentation, where free space is broken into unusable gaps.
  • Placement policies like First-Fit and Best-Fit offer different trade-offs for managing holes, while the Buddy System trades external for internal fragmentation.
  • Memory compaction can reclaim fragmented space but is a costly operation that relies on hardware support like the Memory Management Unit (MMU) for dynamic relocation.
  • Despite the rise of virtual memory, contiguous allocation remains critical for hardware DMA, real-time systems, GPU performance, and optimizing data locality.

Introduction

The need to organize data in a computer's memory is a foundational challenge in computer science. Among the earliest and most intuitive strategies is contiguous allocation, where each program is granted a single, unbroken block of memory. This approach is valued for its simplicity and hardware efficiency. However, this elegant simplicity conceals a persistent and complex problem: memory fragmentation, where available space becomes splintered into small, unusable gaps. This article confronts this challenge head-on. The first section, "Principles and Mechanisms," will deconstruct the mechanics of contiguous allocation, exploring the unavoidable issues of external and internal fragmentation, the trade-offs between placement policies, and the drastic but powerful solution of memory compaction. Subsequently, "Applications and Interdisciplinary Connections" will reveal how these seemingly classic problems are not historical artifacts but are critically relevant today, influencing everything from disk performance and real-time systems to the smooth rendering of graphics in modern video games.

Principles and Mechanisms

Imagine you're the manager of a very long bookshelf, and various people come to you asking for space to store their book collections. The simplest rule you could make is that each person's collection must be kept together in one continuous, unbroken section of the shelf. This is the essence of contiguous memory allocation. It is simple, elegant, and for the computer's hardware, incredibly efficient to manage. A process is given a starting address (the "base") and a length (the "limit"), and it can access anything within that single block. But as we shall see, this beautiful simplicity harbors a deep and persistent problem.

The Swiss Cheese Problem: External Fragmentation

Let's continue with our bookshelf analogy. At first, everything is great. You give a 3-foot section to Alice, a 2-foot section to Bob, and a 4-foot section to Carol, all next to each other. But then Bob finishes his books and removes his collection, leaving a 2-foot gap between Alice and Carol. A new person, Dave, arrives wanting a 5-foot section. You look at your shelf. You have plenty of total empty space, but it's scattered in small, unusable gaps. You have a 2-foot gap here, maybe a 1-foot gap there. None of the individual gaps is large enough for Dave's 5-foot collection. Dave is turned away, even though you have the space.

This is the vexing problem of external fragmentation. It occurs when free memory is splintered into many non-contiguous blocks, or "holes," over time. No single hole is large enough to satisfy a new request, even though the sum of the free space is more than sufficient.

Consider a more concrete example from a computer's memory. Suppose we have 1024 KiB of memory, and after some time, the free spaces are holes of sizes 96, 64, 128, 32, and 96 KiB. The total free memory is a healthy 96 + 64 + 128 + 32 + 96 = 416 KiB. If a new process arrives requesting 200 KiB, it must be rejected. The largest single hole available is only 128 KiB. The system has enough memory in total, but it's in the wrong configuration. This is a classic and unavoidable consequence of a simple contiguous allocation scheme. The memory map has become like a slice of Swiss cheese, and our request is too large to fit into any of the holes.
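A few lines of Python make the mismatch concrete, using the hole sizes from the example above:

```python
# Free holes from the example above, in KiB.
holes_kib = [96, 64, 128, 32, 96]
request_kib = 200

total_free = sum(holes_kib)      # 416 KiB of free memory in total...
largest_hole = max(holes_kib)    # ...but the biggest single hole is 128 KiB

# The request fails even though total free memory is more than double its size.
print(total_free, largest_hole, largest_hole >= request_kib)   # 416 128 False
```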

The Tyranny of the Physical Address

A clever student might ask, "Why can't we just pretend the separate holes are one big block? Can't the operating system just tell the process to use the 96 KiB hole and the 128 KiB hole and just 'bridge' the gap between them?" This is a wonderful question that cuts to the heart of how computers actually work.

The answer lies in the distinction between a software promise and a physical reality. Hardware components like the CPU's memory access unit and, even more critically, Direct Memory Access (DMA) devices, are often simple-minded. They are given a starting physical address and a length, and they proceed to access that address, then the next one, then the next, in a relentlessly sequential fashion. They are like a delivery truck driver told to deliver packages to addresses 1000 through 2000 on Main Street; the driver will attempt to visit every single address number in that sequence.

If the operating system tries to "bridge" a gap, it is effectively telling the process and the hardware, "Your memory is from address 1000 to 2000," when in fact the memory from, say, 1500 to 1600 belongs to someone else. When the hardware blindly tries to access address 1550, it will be writing over another process's data or even the operating system itself—a catastrophic error. The OS can't intervene on every memory access to supply "filler bytes"; the hardware operates directly on the physical address space. An address is not just a number; it's a physical location. You cannot patch a hole in the address space with data, just as you can't make two separate empty lots contiguous by putting a picture of a lawn in the space between them. The requirement for contiguity is often a hard, physical one.

Choosing a Home: First-Fit, Best-Fit, and the Fragmentation Dance

Given that we are stuck with a collection of holes, the operating system needs a strategy, or placement policy, to decide which hole to use for a new request. Three classic strategies come to mind:

  • First-Fit: Scan the holes from the beginning and choose the first one that is large enough. It's fast and simple.
  • Best-Fit: Scan all the holes and choose the smallest one that is large enough. This seems intuitively smart, as it leaves the smallest, and presumably least useful, leftover fragment.
  • Worst-Fit: Scan all the holes and choose the largest one. This seems wasteful, but the idea is to leave the largest, and therefore most useful, leftover fragment.
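A minimal sketch of these policies, under the simplifying assumption that each hole is tracked only by its size (a real allocator would also track addresses):

```python
def first_fit(holes, size):
    """Index of the first hole that is large enough, or None."""
    for i, h in enumerate(holes):
        if h >= size:
            return i
    return None

def best_fit(holes, size):
    """Index of the smallest hole that is large enough, or None."""
    fits = [(h, i) for i, h in enumerate(holes) if h >= size]
    return min(fits)[1] if fits else None

def worst_fit(holes, size):
    """Index of the largest hole, provided it is large enough, or None."""
    fits = [(h, i) for i, h in enumerate(holes) if h >= size]
    return max(fits)[1] if fits else None

def allocate(holes, size, policy):
    """Carve `size` out of the chosen hole, shrinking or removing it."""
    i = policy(holes, size)
    if i is None:
        return False
    if holes[i] > size:
        holes[i] -= size
    else:
        holes.pop(i)
    return True
```

Running the hole-and-request scenario discussed below (sizes scaled by ten to keep the arithmetic in integers) shows Best-Fit ending with two tiny slivers where First-Fit leaves only one.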

Which is best? The answer, fascinatingly, is "it depends." One might assume Best-Fit is the champion at minimizing waste. However, this intuition can be misleading. By repeatedly choosing the "best" fit, this strategy can lead to a "death by a thousand cuts," filling the memory with a fine dust of tiny, unusable holes just barely too small for the next request.

Imagine an initial set of holes including sizes ⟨40, 20.6, 20.6, 20.6⟩ and a sequence of requests ⟨20.5, 20.6, 20.5, 20.6⟩. Best-Fit, for the first 20.5 request, will choose a 20.6 hole, leaving a tiny, useless sliver of size 0.1. It does this again for the second 20.5 request. First-Fit, on the other hand, might put the first 20.5 request into the large 40 hole, leaving a still-useful 19.5 hole and preserving the perfectly sized 20.6 holes for later. In this specific scenario, Best-Fit ends up creating more unusable tiny-hole waste than First-Fit.

In fact, one can construct adversarial request sequences that make Best-Fit perform demonstrably worse than Worst-Fit. By making requests that snugly fit into smaller holes, Best-Fit consumes the most desirable blocks, leaving a poor distribution of holes for future, larger requests. Worst-Fit, by carving from the largest block, may preserve a healthier variety of hole sizes. The performance of these simple algorithms is a complex dance between the policy and the specific sequence of requests and deallocations. There is no universally superior strategy.

Order at a Price: The Buddy System and Internal Fragmentation

The complexities of managing arbitrarily sized holes led to the development of more structured allocators. One of the most famous is the buddy system. Instead of dealing with any possible size, this allocator only deals in blocks whose sizes are powers of two (e.g., 4, 8, 16, 32, 64 KiB...). When a request arrives, the system rounds the size up to the nearest power of two and allocates a block of that size. If a 64 KiB block is needed but only a 128 KiB block is free, the 128 KiB block is split into two 64 KiB "buddies," one is used, and the other is added to the free list. When a block is freed, the system checks if its buddy is also free; if so, they are instantly coalesced back into their larger parent block. This makes allocation and deallocation remarkably fast.

But this rigid structure comes with a different kind of cost. If a process requests 62 KiB, the buddy system will give it a 64 KiB block. If it requests 90 KiB, it gets a 128 KiB block. The extra space—2 KiB in the first case, 38 KiB in the second—is part of the allocated block but cannot be used by the process or by anyone else. This wasted space is called internal fragmentation. It is waste inside an allocated partition. This is a fundamental trade-off: we've reduced the chaos of external fragmentation by introducing the orderly, but sometimes wasteful, tax of internal fragmentation.
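The rounding rule, and the internal fragmentation it produces, can be sketched in a few lines (the 4 KiB minimum block size is an assumption for illustration):

```python
def buddy_block_kib(request_kib, min_block_kib=4):
    """Round a request up to the buddy system's power-of-two block size."""
    block = min_block_kib
    while block < request_kib:
        block *= 2
    return block

# The two requests from the text: 62 KiB and 90 KiB.
for req in (62, 90):
    block = buddy_block_kib(req)
    print(f"{req} KiB -> {block} KiB block, {block - req} KiB wasted internally")
```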

The Grand Cleanup: Memory Compaction and Relocation

What if external fragmentation becomes so severe that the system can no longer function effectively? The operating system has a powerful, albeit drastic, solution: memory compaction. Just like tidying our bookshelf, the OS can pause everything and slide all the allocated blocks together to one end of memory. This process squeezes out all the holes and consolidates them into one single, large, contiguous block of free space.

But this raises a critical question. If we move a process's data and code in physical memory, won't all of its internal pointers and memory references become invalid? If a program has a pointer to an object at address 16,384, and we move the whole program to start at address 32,768, that pointer is now pointing to garbage.

The solution is a beautiful piece of collaboration between hardware and software: dynamic relocation. Modern CPUs don't let a process work with raw physical addresses. Instead, a process operates in its own private logical address space, which always starts at 0. Every memory address the process generates is a logical address. When the address is sent to memory, special hardware—a Memory Management Unit (MMU)—intervenes. The MMU holds two special values for the currently running process: a base register and a limit register. It first checks if the logical address is less than the limit (to prevent the process from accessing memory beyond its allocation). If the check passes, it adds the base register's value to the logical address to produce the final physical address:

a_phys = a_base + a_logical

Now, the magic of compaction becomes clear. To move a process, the OS copies the block of memory and simply updates the value in the process's base register. The process itself is oblivious; its code and pointers, which all use logical addresses, remain unchanged and perfectly valid. The hardware transparently translates them to the new physical location on the fly.

This mechanism, however, has a crucial weakness. It only protects CPU-generated addresses. If a physical address is cached elsewhere, for instance to program a DMA device, that cached address will not be updated, and the system will break.
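The translation step can be modeled in a few lines of Python. This is a deliberately simplified sketch of base/limit checking, not a faithful MMU model:

```python
class ToyMMU:
    def __init__(self, base, limit):
        self.base = base      # physical start of the process's block
        self.limit = limit    # size of the block

    def translate(self, logical):
        """Check against the limit, then relocate by the base register."""
        if not 0 <= logical < self.limit:
            raise MemoryError(f"logical address {logical} out of bounds")
        return self.base + logical   # a_phys = a_base + a_logical

mmu = ToyMMU(base=16384, limit=8192)
print(mmu.translate(100))   # 16484

# Compaction relocates the process: only the base register changes,
# and every logical address the process uses remains valid.
mmu.base = 32768
print(mmu.translate(100))   # 32868
```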

The Economics of Tidiness

Compaction is a powerful tool, but it's not free. It consumes precious CPU time to copy potentially megabytes or gigabytes of data. So, when is it worth it? This is an engineering and economic question. We must weigh the cost of compaction against its benefits.

Let's build a simple model. The cost of compaction is proportional to the amount of allocated memory, A, that needs to be moved. Let's call this cost C_compaction = A · c_m, where c_m is the time to move one byte. The benefit of compaction is that future memory allocations become much faster. Without compaction, finding a hole for a request might involve scanning through many small, unsuitable holes. Let's say this search costs, on average, c_s/p per allocation, where c_s is a constant and p is the probability a random hole is big enough. After compaction, there is only one giant hole, so the search cost drops to just c_s.

Over the next N allocations, the total savings would be N · (c_s/p − c_s). Compaction is worthwhile if the total savings outweigh the initial cost. We can find a breakeven point, N*, where the cost equals the benefit. A little algebra shows that this happens when:

N* = (A · c_m · p) / (c_s · (1 − p))

If the OS expects to handle more than N* allocations in the near future, it's economically rational to perform compaction now. This transforms the discussion from a qualitative one about principles to a quantitative one about policy.
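Plugging illustrative numbers into the formula shows how the breakeven point behaves; every constant below is made up purely for demonstration:

```python
def breakeven_allocations(A, c_m, c_s, p):
    """N* = A*c_m*p / (c_s*(1-p)): allocations needed to repay compaction."""
    return (A * c_m) * p / (c_s * (1 - p))

# Assumed numbers: 64 MiB of allocated memory, 1 ns to move a byte,
# a 500 ns base search cost, and a 20% chance a random hole fits.
n_star = breakeven_allocations(A=64 * 1024**2, c_m=1.0, c_s=500.0, p=0.2)
print(round(n_star))   # 33554 future allocations to break even
```

Doubling the hit probability p or halving the allocated footprint A moves the breakeven point accordingly, which is exactly the kind of policy knob the text describes.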

A Modern Dilemma: Locality vs. Fragmentation

The principles we've discussed are not just historical footnotes; they manifest in new and interesting ways in modern computer architectures. Consider a system with Non-Uniform Memory Access (NUMA), where the machine has multiple memory controllers (nodes). Accessing memory on the same node as the CPU ("local" memory) is much faster than accessing memory on a different node ("remote" memory).

Now, our allocation problem gains a new dimension. A process running on Node 0 needs a 64 MiB contiguous block. Node 0 is heavily fragmented; its largest free hole is only 12 MiB. However, Node 1 has a pristine 80 MiB free block. What should the OS do?

  1. Allocate Remotely: Use the block on Node 1. This is simple and preserves the memory state of Node 0. But every access to this block by the process will incur the higher remote latency, potentially adding up to a significant performance penalty.
  2. Allocate Locally, No Compaction: This isn't an option, as no local hole is large enough.
  3. Allocate Locally, With Compaction: Perform a costly compaction on Node 0 to create a 64 MiB space. This guarantees fast local access but incurs a large upfront overhead and changes the fragmentation state of the local node.

The choice involves a complex trade-off between access latency, compaction overhead, and the potential future cost of fragmentation. Choosing to allocate remotely might be fast now but slow down the application's runtime. Choosing to compact might cause a noticeable pause but lead to better performance overall. The "best" decision depends on the exact costs of latency, compaction, and even probabilities of future requests needing space on the local node. The simple, old problem of finding a contiguous block on a bookshelf has evolved into a high-stakes, multi-dimensional optimization problem at the heart of modern high-performance computing.
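One way to frame the decision is as a simple cost comparison. Every number below is assumed purely for illustration; a real scheduler would also weigh future requests and the risk of re-fragmenting the node:

```python
def remote_penalty_ns(accesses, extra_latency_ns):
    """Accumulated cost of leaving the block on the remote node."""
    return accesses * extra_latency_ns

def compaction_cost_ns(bytes_moved, ns_per_byte):
    """One-time cost of compacting the local node first."""
    return bytes_moved * ns_per_byte

remote = remote_penalty_ns(10_000_000, 60)       # 10M accesses, +60 ns each
local = compaction_cost_ns(200 * 1024**2, 0.5)   # move 200 MiB at 0.5 ns/byte
print("compact locally" if local < remote else "allocate remotely")
```

With these toy numbers, a one-time pause of roughly 105 ms beats 600 ms of accumulated remote latency; shrink the access count or the latency gap and the answer flips.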

Applications and Interdisciplinary Connections

We have spent some time understanding the machinery of contiguous allocation, its various strategies, and the pesky problem of fragmentation. It might be tempting to dismiss this as a solved problem, a relic from an era before the sophisticated magic of virtual memory and paging came to dominate the scene. But to do so would be to miss a beautiful and subtle point. The challenge of arranging things together in an unbroken line is one of nature’s, and engineering's, most fundamental puzzles. The principles of contiguous allocation are not a historical footnote; they are alive and well, operating in the heart of our most advanced technologies, often in surprising disguises. To see this, we need only to look at the world around us—from the disk drive humming in your computer to the giant telescopes gazing at distant galaxies.

The Digital Filing Cabinet: Memory, Disks, and Fragmentation

Perhaps the most intuitive place to witness contiguous allocation at work is in how a computer stores files on a disk. When you save a large video file, the operating system ideally wants to write it as one long, continuous sequence of blocks on the magnetic platter or solid-state drive. Why? For the same reason that it’s easier to read a book when the pages are in order! If the file is contiguous, the disk head can read it in one smooth, continuous sweep. But if the disk is cluttered with other files, the OS might be forced to break your video into pieces, scattering it across whatever free gaps it can find. This is external fragmentation, the very same beast we saw in main memory. To read the file back, the disk head has to jump frantically from place to place, a process that is dramatically slower.

This directly mirrors the classic problem of memory allocation. The strategies we discussed—First-Fit, Best-Fit, and Worst-Fit—are used by file systems to manage free space on a disk. Each strategy has a different philosophy about how to manage the leftover space. Best-Fit, for example, tries to find the tightest possible hole for a new file, which sounds efficient but can lead to a proliferation of tiny, useless slivers of free space. Worst-Fit, in contrast, carves allocations from the largest available holes, attempting to leave behind a still-large and useful remainder. The choice of strategy directly impacts how quickly the disk fragments and how much performance degrades over time. By simulating these strategies, we can measure the resulting fragmentation and see that there is no single "best" answer; it's a trade-off between the speed of allocation and the long-term health of the storage medium.

Now, one might ask: if modern operating systems use paging and virtual memory, why would we ever care about contiguous allocation in RAM? It turns out that the elegance of paging comes with its own hidden tax. Every time your program accesses memory, the processor must translate a virtual address to a physical one. To speed this up, it uses a special cache called the Translation Lookaside Buffer, or TLB. But this cache is small. When you switch from one program to another—a context switch—the system often has to flush the TLB. The new program starts with a "cold" TLB and suffers a flurry of misses, each one causing a small but significant delay as the processor walks through page tables in memory to find the translation.

Herein lies a fascinating trade-off. Imagine a system with a very high rate of context switching, say, thousands of times per second. The total time wasted on compulsory TLB misses can become enormous. In such a scenario, the "old" method of contiguous allocation, despite its own problem of fragmentation that requires occasional, costly compaction, can surprisingly become the more efficient choice! There is a calculable crossover point, a specific context switch frequency T, beyond which the cumulative penalty of TLB misses outweighs the periodic cost of compaction. Contiguous allocation, in this high-frequency world, offers a more predictable, albeit less flexible, performance profile. It's a beautiful reminder that in engineering, "new" is not always better than "old"; it's always a matter of context and constraints.
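The crossover point T can be estimated with the same back-of-the-envelope style as the compaction model earlier; the specific costs below are assumptions, not measurements:

```python
def crossover_switch_rate(compaction_s, period_s, tlb_refill_s):
    """Context-switch rate above which per-switch TLB refill costs exceed
    the amortized cost of periodic compaction: T = C / (P * c_tlb)."""
    return compaction_s / (period_s * tlb_refill_s)

# Assumed: a 50 ms compaction every 10 s, vs 5 us of TLB refill per switch.
T = crossover_switch_rate(compaction_s=0.050, period_s=10.0, tlb_refill_s=5e-6)
print(round(T))   # 1000 context switches per second
```

Above that rate, in this toy model, the steady drip of TLB refills costs more per second than the occasional compaction pause.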

When Contiguity is King: Hardware and Real-Time Demands

For some tasks, contiguity isn't just a preference for performance; it's an absolute requirement. Many high-speed hardware devices, such as network cards and storage controllers, use a technique called Direct Memory Access (DMA). A DMA controller is a simple, specialized processor that can copy data directly between the device and main memory, freeing up the main CPU. But to achieve its speed, it is often designed to be simple: you give it a starting physical address and a length, and it goes to work. It doesn't understand page tables or virtual memory. It needs a single, physically contiguous buffer.

What happens when a device requests a 256 KiB buffer for a DMA transfer, but the memory is fragmented into dozens of smaller, non-adjacent free blocks? The allocation fails. The device is starved. This is a critical problem in operating system design. You cannot simply hope that a large enough block will be available. The solution is to be proactive. Modern operating systems, like Linux, implement a special mechanism known as a Contiguous Memory Allocator (CMA). At boot time, the OS reserves a large, physically contiguous region of RAM exclusively for such purposes. This region isn't wasted; it can be "loaned out" for movable memory, like file caches. When a DMA request arrives, the OS simply evicts the temporary occupants from the reserved zone, and voilà—a guaranteed contiguous block is available. This is like reserving a carpool lane on a highway; you guarantee passage for critical traffic by managing the space intelligently.

The cost of maintaining this order, however, can be subtle and dangerous. The main tool against fragmentation is compaction—shuffling allocated blocks around to coalesce free space. But what if your system is a real-time system, like the flight controller for a drone or the engine management unit in a car? These systems have strict deadlines. An operation must not just be correct; it must be completed within a specific time window. Imagine an Interrupt Service Routine (ISR) fires in response to a critical sensor reading. The ISR needs to allocate a small buffer immediately, but at that exact moment, the memory manager is in the middle of a compaction, holding a lock while it slowly copies a large block of memory from one location to another. The ISR is forced to wait, its latency budget is blown, and the system misses its deadline.

This conflict between background maintenance and foreground latency is a profound challenge. Calculating the time it takes to move a single block during compaction can reveal that this period can easily exceed the entire latency budget of a time-critical interrupt. The solutions are again a testament to clever engineering. One approach is to pre-allocate a small pool of emergency buffers just for interrupt handlers. Another is to design hardware that can handle non-contiguous memory through a technique called scatter-gather I/O, effectively teaching the "dumb" device to follow a map of scattered memory fragments.
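The arithmetic behind that claim is simple. With assumed but plausible numbers, moving even one mid-sized block dwarfs a tight interrupt budget:

```python
block_bytes = 8 * 1024**2      # one 8 MiB block to relocate (assumed)
bandwidth_bps = 10 * 1024**3   # 10 GiB/s copy bandwidth (assumed)
isr_budget_us = 50.0           # hard latency budget for the ISR (assumed)

move_time_us = block_bytes / bandwidth_bps * 1e6
print(move_time_us > isr_budget_us)   # True: ~780 us vs a 50 us budget
```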

The stakes get even higher in the world of simple embedded systems that lack a Memory Management Unit (MMU). Without an MMU, there is no virtual memory; the program sees only raw physical addresses. In such a system, a program that uses absolute pointers is "welded" to its physical location in RAM. Now, consider a fault-tolerant system that needs to periodically save its state (a checkpoint) to flash storage so it can be restored after a crash. If the system crashes and RAM has become fragmented, the only available contiguous slot large enough to restore the program might be at a different physical address. But if you restore the program's image there, all of its internal pointers will be wrong, pointing back to the old, now invalid, addresses. The program will crash instantly. This problem forces a fundamental shift in software design towards location independence. By using relative addressing (e.g., offsets from a base register) instead of absolute pointers, the program can be loaded and run from any location, making it robust against the shifting landscape of fragmented memory.
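The failure mode, and the fix, reduce to one line of address arithmetic; the addresses below are hypothetical:

```python
OLD_BASE = 0x2000    # where the image lived before the crash
NEW_BASE = 0x6000    # the only contiguous slot big enough after restart
FIELD_OFF = 0x134    # position of some field, relative to the image start

absolute_ptr = OLD_BASE + FIELD_OFF   # what an absolute-pointer checkpoint stores

# Restored at NEW_BASE, the saved absolute pointer is stale...
print(hex(absolute_ptr), hex(NEW_BASE + FIELD_OFF))   # 0x2134 vs 0x6134
# ...but a stored *offset*, recombined with the new base, is still correct.
```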

Painting Pictures: Graphics, Games, and GPUs

Nowhere is the demand for high-throughput, contiguous data more apparent than in computer graphics. Your screen is a giant grid of pixels, and the image to be displayed is stored in a region of GPU memory called a framebuffer. To achieve smooth animation and avoid a phenomenon called "tearing" (where you see half of one frame and half of the next), GPUs use a technique called double buffering. They maintain two framebuffers: a front buffer, which is currently being displayed, and a back buffer, where the next frame is being rendered. Once the back buffer is ready, the GPU swaps them.

This swap needs to happen in perfect synchrony with the display's refresh rate, an event called VSync, which might occur every 1/60th of a second. This entire process hinges on the GPU's ability to allocate a large, contiguous block for the back buffer. What if the GPU's video memory (VRAM) is fragmented? The allocation might require a time-consuming compaction. We can trace the timeline precisely: the game engine requests the buffer, the memory manager finds no suitable hole, it initiates compaction, it copies another block of data out of the way, and only then does it complete the allocation. If this entire sequence takes too long and misses the VSync deadline, the swap is delayed until the next VSync. The result? A dropped frame. A visible stutter or "hiccup" in your game. The abstract problem of external fragmentation has manifested as a tangible flaw in the user experience.

The same principles apply to the textures that give a game world its detail. A modern game contains terabytes of texture data, far more than can fit in VRAM at once. To manage this, games use a technique called mipmapping, which stores textures at multiple resolutions. When an object is far away, the game loads a small, low-resolution version. As you get closer, it swaps in progressively larger, higher-resolution versions. This creates a constant, dynamic stream of allocation and deallocation requests of varying sizes. This system, known as dynamic Level-of-Detail (LOD) streaming, is a high-stakes version of the memory allocation game. If a request for a high-resolution texture fails because VRAM is too fragmented, the game might be forced to use a lower-quality, blurry texture instead, or risk a performance-killing compaction. The state of the contiguous memory allocator directly determines the visual fidelity of the game world from moment to moment.

Beyond Memory: The Universal Problem of Packing Things

At this point, you might see the pattern. The problem of contiguous allocation is really an abstract problem of packing items of various sizes into a finite, one-dimensional space. The "space" doesn't have to be bytes of RAM.

Think about data structures. When you build a tree or a linked list, a standard approach is to allocate each node individually from the heap. Each node is a small block of memory, containing its data and pointers to other nodes. But these pointers have overhead, and because the nodes can be scattered all over memory, traversing the structure can lead to poor cache performance. An alternative is arena allocation. You allocate one single, large, contiguous block of memory—the arena—and then place all the nodes of your data structure inside it. Instead of memory pointers, you can use simple integer indices to refer to other nodes within the arena. This technique keeps all the data physically close together, which is wonderful for CPU caches, and it can dramatically simplify memory management. You are essentially creating your own private, purpose-built contiguous allocator for your data structure.
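A sketch of the idea: one flat Python list plays the role of the arena, and integer indices replace pointers (the class and method names are illustrative):

```python
class Arena:
    """All nodes live in one contiguous list; links are integer indices."""
    def __init__(self):
        self.nodes = []                  # the single "big block"

    def new_node(self, value, nxt=-1):
        self.nodes.append((value, nxt))  # -1 marks "no next node"
        return len(self.nodes) - 1       # the index is the "pointer"

    def walk(self, start):
        out, i = [], start
        while i != -1:
            value, i = self.nodes[i]
            out.append(value)
        return out

arena = Arena()
tail = arena.new_node("c")
head = arena.new_node("a", arena.new_node("b", tail))
print(arena.walk(head))   # ['a', 'b', 'c']
```

Because links are indices rather than machine addresses, the whole arena can be copied, serialized, or reloaded anywhere and every link stays valid.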

Let's take one final, giant leap. The "space" we are allocating in doesn't even have to be physical space. It can be time. Consider the problem of scheduling observations on a space telescope like the James Webb or Hubble. Astronomers from around the world submit requests: "I need to observe galaxy A for 40 minutes," "I need to point at star B for 75 minutes," and so on. The telescope's available time over the next week is a finite, one-dimensional resource—a timeline.

Scheduling an observation is mathematically identical to allocating a block in contiguous memory. The observation's duration is the size of the block. The timeline is the memory pool. The goal is to maximize the telescope's utilization—to pack in as many observations as possible, minimizing the "fragmented" idle time between them. But there's a twist, an analogy to the cost of compaction. Moving a telescope from one target to another (slewing) takes time and energy. This is a "transition cost" between allocations. The ideal schedule, therefore, not only maximizes the total observation time but also minimizes the total slewing angle between consecutive observations. This complex optimization problem, at its heart, is a beautiful and advanced relative of the humble contiguous memory allocation problem we started with.
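The analogy can be made runnable. In this toy scheduler, observations are "allocations" on a timeline and slewing is the transition cost between them; the greedy nearest-target heuristic and every number here are illustrative only, far simpler than a real scheduler:

```python
observations = [          # (name, duration in minutes, sky angle in degrees)
    ("galaxy A", 40, 10),
    ("star B",   75, 200),
    ("nebula C", 55, 30),
]
SLEW_MIN_PER_DEG = 0.1    # assumed slew speed

def greedy_schedule(obs, start_angle=0):
    remaining, angle, t, plan = list(obs), start_angle, 0.0, []
    while remaining:
        # Pick the target with the smallest slew from the current pointing.
        nxt = min(remaining, key=lambda o: abs(o[2] - angle))
        remaining.remove(nxt)
        t += abs(nxt[2] - angle) * SLEW_MIN_PER_DEG   # pay the transition cost
        plan.append((nxt[0], t))                      # observation start time
        angle = nxt[2]
        t += nxt[1]                                   # occupy the "block" of time
    return plan, t

plan, makespan = greedy_schedule(observations)
print([name for name, _ in plan])   # ['galaxy A', 'nebula C', 'star B']
```

The greedy order visits nearby targets first, trading a longer final slew for less total repositioning, just as a memory allocator trades placement effort now against fragmentation later.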

From the bits on a disk to the photons from a distant star, the simple principle of arranging things in a line—and the consequences of failing to do so—is a thread that runs through the fabric of computing and beyond. It teaches us that the most fundamental ideas are often the most powerful, reappearing in new forms and challenging us to find ever more elegant solutions.