
Multiplying large numbers is a fundamental operation for any computing device, but the naive method of repeated addition is inefficient and slow. In the quest for speed and efficiency, computer architects have developed sophisticated techniques to perform this crucial task faster. This is not just about doing math quicker; it's about enabling everything from high-performance computing to power-efficient mobile devices.
A key challenge in designing multipliers lies in managing the numerous intermediate steps, or "partial products," that must be generated and summed. How can we reduce this workload without compromising accuracy? The answer lies in a clever change of perspective, embodied by Booth's algorithm and its powerful successor, the Radix-4 Booth's algorithm.
This article delves into the elegant world of Radix-4 Booth's algorithm. First, under "Principles and Mechanisms," we will uncover the mathematical trick that allows the algorithm to recode binary numbers into a more efficient form, replacing many additions with a few simple shifts and add/subtract operations. Following that, in "Applications and Interdisciplinary Connections," we will explore the profound impact of this method on hardware design, revealing how it leads to smaller, faster, and more power-efficient multiplier circuits that are foundational to modern electronics.
Imagine you're an accountant from a time before calculators. Your job is to multiply large numbers, all day, every day. You'd quickly realize that multiplication is just a fancy word for repeated addition. Multiplying a number by 13, for instance, means summing 13 copies of that number. It's tedious and slow. A clever accountant, however, might notice that 13 = 10 + 3, and instead calculate ten copies (just shift the digits over) plus three more copies. This is much faster. A computer's arithmetic logic unit (ALU) faces the same challenge, just with binary numbers. The journey to the Radix-4 Booth's algorithm is a story of finding ever-cleverer ways to be "lazy" and efficient.
Let's think in binary. Suppose we want to multiply some number, our multiplicand M, by 7. In binary, 7 is 0111. A straightforward "shift-and-add" multiplier would look at the bits of 0111 from right to left: the first bit is 1, so add M; the second bit is 1, so add 2M (M shifted left once); the third bit is 1, so add 4M; the fourth bit is 0, so add nothing.
The result is M + 2M + 4M = 7M. This requires three separate addition operations. It works, but can we do better?
What if we notice that 7 = 8 - 1? In binary, this is 1000 - 0001. Multiplying by 7 could instead be performed as a single subtraction: (8M) - (1M). In a computer, multiplying by 8 is not an actual multiplication; it's a simple, lightning-fast left shift by three bit positions. So, instead of three additions, we now have one shift and one subtraction. This is a huge win!
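The shift-and-subtract trick is easy to sanity-check in a few lines of Python (a throwaway sketch; the variable names are mine):

```python
M = 37  # any multiplicand

# Naive shift-and-add: three additions, one per set bit of 0111 (= 7)
naive = M + (M << 1) + (M << 2)

# Booth-style: one shift (x8) and one subtraction
clever = (M << 3) - M

assert naive == clever == 7 * M
```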
This simple trick is the soul of Booth's algorithm. It recognizes that long strings of 1s in a binary number can be replaced by a subtraction at the beginning and an addition at the end. For example, the number 00111100 (which is 60) can be seen as 01000000 - 00000100 (that is, 64 - 4). Instead of four additions, we do one subtraction and one addition. The algorithm formalizes this trick by scanning the multiplier bits and looking for the start and end of these strings of ones.
The original Booth's algorithm (Radix-2) looks at bits one by one (in overlapping pairs). It's clever, but the question soon arises: why take small steps when you can take giant leaps? This is where Radix-4 Booth's algorithm comes in. Instead of processing the multiplier one bit at a time, we process it two bits at a time.
By examining two bits in each step, we effectively cut the number of operations in half. For an n-bit multiplier, we'll only need n/2 steps instead of n. This means fewer partial products to calculate and add up, which directly translates to a smaller, faster, and more power-efficient multiplier circuit. But how does this work? If we just look at two bits, say 11 (which is 3), we'd need to compute 3M. This seems to complicate things, as calculating 3M isn't as simple as a shift. We need a more cunning approach.
The secret is to not just look at the pair of bits in isolation, but to also peek at the last bit of the previous pair. This gives us context. For each step i, we look at an overlapping group of three bits, y(2i+1) y(2i) y(2i-1), from the multiplier Y. The bit y(2i-1) is the "carry-in" from the right, telling us if we're at the beginning, middle, or end of a string of ones.
Based on this three-bit group, we generate a "recoded digit" from a surprisingly small set: {-2, -1, 0, +1, +2}. This digit tells the hardware what to do in the current step. The conversion is governed by a simple rule that can be expressed with a formula or a lookup table. The value of the recoded digit for the triplet y(2i+1) y(2i) y(2i-1) is given by:

digit = -2 × y(2i+1) + y(2i) + y(2i-1)
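The rule is small enough to exhaust. A quick Python sketch (function and table names are mine) confirms that the eight possible triplets land exactly on the digit set {-2, -1, 0, +1, +2}:

```python
def booth_digit(y2, y1, y0):
    """Radix-4 Booth recoded digit for the triplet (y_{2i+1}, y_{2i}, y_{2i-1})."""
    return -2 * y2 + y1 + y0

# Enumerate all eight triplets and their recoded digits.
table = {(a, b, c): booth_digit(a, b, c)
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

assert set(table.values()) == {-2, -1, 0, 1, 2}
assert table[(0, 0, 0)] == 0 and table[(1, 1, 1)] == 0  # all-zeros and all-ones both mean "do nothing"
```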
Let's see what this means intuitively:
- ...000...: If the triplet is 000, the formula gives 0. This makes sense; in a region of all zeros, we don't need to add anything.
- ...0010...: To get the digit for 01, we look at the triplet 010 (assuming the previous bit was 0). The formula gives -2×0 + 1 + 0 = +1. We need to add +M.
- ...0110...: For the pair 11, the triplet is 011. The formula gives -2×0 + 1 + 1 = +2. We need to add +2M.
- ...1110...: Here, the triplet is 110. The formula gives -2×1 + 1 + 0 = -1. This is the "subtraction at the end of the string" trick we saw earlier!
- ...0011...: The triplet is 001. The formula gives -2×0 + 0 + 1 = +1. This is the "addition at the beginning of the string".
- ...1111...: The triplet is 111. The formula gives -2×1 + 1 + 1 = 0. This is beautiful! It tells us to do nothing in the middle of a long string of ones, just as our intuition suggested.

Let's try a complete example. Suppose our multiplier is the 8-bit number 01011011 (decimal 91). To recode it, we first append a zero on the right: 010110110. Now we form overlapping 3-bit groups from right to left: 110 → -1, then 101 → -1, then 011 → +2, and finally 010 → +1.
So, the number 01011011 is transformed into the sequence of recoded digits (+1, +2, -1, -1). We have converted an 8-bit number into a 4-digit "secret code".
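The whole recoding procedure fits in a few lines. Here is a sketch of it in Python (my own function name and conventions; digits come out least-significant first):

```python
def radix4_recode(y, nbits=8):
    """Radix-4 Booth digits of an nbits-wide two's-complement multiplier,
    least-significant digit first."""
    bits = [(y >> i) & 1 for i in range(nbits)]   # y0 .. y(n-1)
    bits = [0] + bits                              # append y(-1) = 0 on the right
    digits = []
    for i in range(0, nbits, 2):
        y_prev, y_lo, y_hi = bits[i], bits[i + 1], bits[i + 2]
        digits.append(-2 * y_hi + y_lo + y_prev)   # the recoding formula
    return digits

digits = radix4_recode(0b01011011)                 # 91, the example above
assert digits[::-1] == [+1, +2, -1, -1]            # most-significant first, as in the text
assert sum(d * 4**i for i, d in enumerate(digits)) == 91
```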
Now, what does the hardware do with this code [+1, +2, -1, -1]? Each digit corresponds to a simple action on the multiplicand M: +1 means add M; +2 means add 2M (just M shifted left by one); -1 means subtract M; -2 means subtract 2M; and 0 means do nothing at all.
The entire complex multiplication is reduced to a simple dance of four steps (for an 8-bit multiplier). In each step, a controller looks at the recoded digit and tells an adder/subtractor circuit which operation to perform on either M or a shifted version of M. After each step, the accumulated result is shifted right by two positions to prepare for the next, more significant digit. A full multiplication is a beautiful orchestration of these simple steps, drastically reducing the total number of additions required compared to the naive method.
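The four-step dance can be modeled directly. This sketch (my own naming; it accumulates digit-weighted terms rather than literally shifting a register, but the per-step work is the same) multiplies via the recoded digits and checks the answer against ordinary multiplication:

```python
def booth_multiply(m, y, nbits=8):
    """Multiply m by an nbits-wide two's-complement y using Radix-4 Booth steps."""
    mask = (1 << nbits) - 1
    bits = [0] + [((y & mask) >> i) & 1 for i in range(nbits)]  # y(-1)=0, then y0..y(n-1)
    product = 0
    for i in range(0, nbits, 2):
        digit = -2 * bits[i + 2] + bits[i + 1] + bits[i]   # Booth encoder for this step
        product += digit * m * 4 ** (i // 2)               # add/subtract M or 2M, weighted
    return product

assert booth_multiply(37, 91) == 37 * 91
assert booth_multiply(37, -19) == 37 * -19   # two's-complement negatives work for free
```

Note that negative multipliers need no special casing: the signed-digit value of the recoding is exactly the signed value of the multiplier.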
There's a deeper mathematical elegance at play here. The recoded digits aren't just an arbitrary code; they represent the multiplier in a different number system. Specifically, it's a signed-digit representation in base 4.
A number represented by the Radix-4 digits (d(k-1), ..., d1, d0) has the value:

Value = d(k-1) × 4^(k-1) + ... + d1 × 4 + d0
Let's check our recoded number from before: (+1, +2, -1, -1). Value = (+1)×64 + (+2)×16 + (-1)×4 + (-1)×1 = 64 + 32 - 4 - 1 = 91.
And what was our original binary number? 01011011, which is 64 + 16 + 8 + 2 + 1 = 91. It matches perfectly!
The algorithm is essentially a conversion from base 2 to this special base-4 system. This insight allows us to work backwards, too. If someone gives you a recoded string like (+1, -2, 0), you can immediately find its value: 16 - 8 + 0 = 8, which is 001000 in 6-bit binary. You can even solve puzzles, like finding the original 6-bit binary number that produces the recoding (-1, -1, +1) by working the recoding rules in reverse, or finding all possible numbers that satisfy certain constraints on their recoded digits. We can even design numbers that produce a desired pattern of recoded digits, like finding the smallest positive number that gives a [+,-,+] pattern of digits.
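Such puzzles are easy to settle by brute force, since the recoding preserves the value of the number. A quick sketch (my own naming; digits are listed most-significant first, as in the text):

```python
def recode6(y):
    """Radix-4 Booth digits (most-significant first) of a 6-bit two's-complement value."""
    bits = [0] + [((y & 0x3F) >> i) & 1 for i in range(6)]
    return [-2 * bits[i + 2] + bits[i + 1] + bits[i] for i in range(0, 6, 2)][::-1]

assert recode6(8) == [+1, -2, 0]   # 001000 recodes to 16 - 8 + 0 = 8

# Which 6-bit values produce the recoding (-1, -1, +1)?
matches = [y for y in range(-32, 32) if recode6(y) == [-1, -1, +1]]
# Exactly one match, since the recoding preserves the value of the number.
```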
This reveals the profound unity of the concept: Radix-4 Booth's algorithm isn't just a collection of hardware tricks. It's a fundamental change in number representation, chosen specifically because the new representation is perfectly suited for fast and efficient hardware implementation. It replaces brute-force binary arithmetic with an elegant system where each "digit" corresponds to a trivial hardware operation: a shift and/or an add/subtract.
We have seen the clever machinery of the Radix-4 Booth's algorithm, a beautiful bit of mathematical sleight of hand that recodes a number into a more efficient form. But a good magician's trick is more than just clever; it has a purpose. So, we must ask: what is the point? Why go through the trouble of examining bits in overlapping triplets and generating this peculiar sequence of operations like +2M, -M, and so on? The answer, as is so often the case in science and engineering, lies in the relentless pursuit of speed and efficiency. In the microscopic world of a computer chip, where billions of calculations happen every second, multiplication is a common but costly task. Making multiplication faster and cheaper is not a small tweak; it's a fundamental leap that affects everything.
The journey of Booth's algorithm from an abstract concept to a cornerstone of modern electronics is a wonderful illustration of how different fields of knowledge connect. It’s a story that takes us from pure arithmetic to the physical layout of silicon wafers.
Let's first appreciate the problem that Booth's algorithm so elegantly solves. A standard binary multiplication of two n-bit numbers is, at its heart, a series of shifts and additions. For each '1' in the multiplier, you take a shifted copy of the multiplicand; for each '0', you take nothing. You end up with a list of up to n numbers, called "partial products," that you must then painstakingly add together. Adding a long list of numbers, even for a computer, takes time and hardware.
This is where the magic of Radix-4 Booth recoding comes into play. By examining two bits of the multiplier at a time (with a third for context), the algorithm essentially says, "Instead of adding many simple things, let's add a few, slightly more complex things." Instead of just generating 0 or M, it generates one of a small set of values: {-2M, -M, 0, +M, +2M}. The crucial result? For an n-bit multiplier, you no longer have n partial products to sum. You have only n/2. You have cut the list in half. This single consequence is the seed from which all other benefits grow.
Halving the number of partial products is a great start, but how does that translate to a faster chip? Imagine you have to add a tall stack of papers, each with a number on it. You could add the first two, then add the third to the result, and so on. This is slow and sequential. A much faster way would be to organize a "summation tournament." You pair up the papers, add them in parallel, and are left with a new, shorter stack. You repeat this process in rounds until only two papers are left for a final addition.
This "tournament" is precisely the idea behind a hardware structure known as a Wallace tree. It's a marvel of parallel computation, using layers of simple logic gates called full adders to reduce a large number of partial products down to just two numbers, which can then be summed by a fast, conventional adder. The speed of the Wallace tree is determined by its depth—the number of rounds in our tournament.
Here, the brilliance of Booth's algorithm shines. If you start with half the number of partial products, you need fewer rounds to get to the final two. For instance, in a classic 8x8 multiplication, the standard method generates 8 partial products, requiring 4 stages in a Wallace tree to sum them. By using Radix-4 Booth's algorithm, we start with only 4 partial products. The Wallace tree now only needs 2 stages. We've halved the depth of the adder tree! In the world of CPU clocks, where a nanosecond is an eternity, this reduction in logic depth is a monumental gain in speed.
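The stage counts quoted above are easy to tabulate with a toy model. This sketch (my own; it assumes the classic reduction in which every full group of three rows is compressed into two) counts the rounds needed to get down to the final two rows:

```python
def wallace_stages(rows):
    """Number of 3:2 reduction stages needed to compress `rows` partial products to 2."""
    stages = 0
    while rows > 2:
        rows = (rows // 3) * 2 + rows % 3  # each full group of 3 rows becomes 2 rows
        stages += 1
    return stages

assert wallace_stages(8) == 4   # standard 8x8: 8 partial products, 4 stages
assert wallace_stages(4) == 2   # Radix-4 Booth 8x8: 4 partial products, 2 stages
```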
Speed is one side of the coin; the other is cost. In digital design, cost is measured in several ways: the physical area the circuit occupies on the silicon chip, the complexity of its wiring, and the power it consumes. Every logical operation—every addition, every shift—is performed by physical collections of transistors called logic gates. The most common building blocks for addition are Full Adders (which sum 3 bits) and Half Adders (which sum 2 bits).
The reduction in partial products by Booth's algorithm directly translates into a reduction in the number of these physical gates. Engineers can analyze the structure of a multiplier and derive precise formulas for the hardware required. For an n-bit multiplier, using a Radix-4 Booth encoder significantly reduces the number of full-adder cells required in the adder tree compared to a standard implementation, and the saving grows as n gets larger. By carefully analyzing the number of bits in each column of the partial product matrix, designers can precisely calculate the number of adders needed at each stage of the Wallace tree, ensuring not a single transistor is wasted.
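That column-by-column analysis can itself be sketched in a few lines. For a standard (non-Booth) unsigned n x n multiplier, column k of the partial-product matrix holds min(k+1, n, 2n-1-k) bits; this toy function (name and framing are mine) tabulates the heights a designer would feed into the adder-count calculation:

```python
def column_heights(n):
    """Bits in each column k of the unsigned n x n partial-product matrix."""
    return [min(k + 1, n, 2 * n - 1 - k) for k in range(2 * n - 1)]

heights = column_heights(8)
assert max(heights) == 8        # the tallest (middle) column stacks 8 bits
assert sum(heights) == 64       # 8 x 8 = 64 AND terms in total
```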
This is not just an academic exercise. Fewer gates mean a smaller area on the chip, which in turn means more multipliers can fit on a single wafer, lowering manufacturing costs. More profoundly, fewer gates mean less power is consumed with every multiplication. For a battery-powered device like a smartphone, this efficiency is paramount. For a massive data center with thousands of servers, the cumulative power savings are enormous. Booth's algorithm, a piece of arithmetic theory, thus becomes a key player in green computing.
So, how does an engineer actually build this? How do these abstract ideas of recoding and adding become a tangible piece of hardware? The answer lies in the field of digital logic design and the use of Hardware Description Languages (HDLs) like Verilog or VHDL. An engineer doesn't solder individual gates anymore; they write code that describes the structure and behavior of the circuit.
The implementation of one stage of a Booth multiplier is a beautiful microcosm of this process. The design is modular. First, you need the potential partial products. You take the multiplicand M, sign-extend it to the proper width, and call this +M. A simple shifter module performs a left shift to create +2M. Two's-complementer modules are used to generate -M and -2M.
Now you have all the possible ingredients: -2M, -M, 0, +M, and +2M. Which one do you choose? This is handled by a multiplexer, which is essentially an electronic selector switch. The three bits from the Booth encoder (the triplet y(2i+1) y(2i) y(2i-1)) are fed into the "select" lines of the multiplexer. Like a dial, this 3-bit code tells the multiplexer which of its inputs to pass to the output. If the code is 011, the multiplexer selects the +2M value. If it's 101, it selects -M. This output is the generated partial product for that stage, ready to be sent to the Wallace tree.
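Before committing the design to an HDL, this encoder-plus-multiplexer stage can be prototyped in software. A behavioral sketch in Python (my own naming; `width` plays the role of the sign-extended partial-product width, and negatives are wrapped to two's complement):

```python
def booth_stage(m, code, width=10):
    """One Radix-4 Booth stage: decode a 3-bit group and select the partial product.

    `code` is the triplet (y_{2i+1}, y_{2i}, y_{2i-1}); the result is one of
    -2M, -M, 0, +M, +2M, wrapped to `width`-bit two's complement.
    """
    y2, y1, y0 = code
    digit = -2 * y2 + y1 + y0          # the Booth encoder
    mask = (1 << width) - 1
    candidates = {                     # the precomputed inputs to the multiplexer
        -2: (-m << 1) & mask,
        -1: -m & mask,
         0: 0,
        +1: m & mask,
        +2: (m << 1) & mask,
    }
    return candidates[digit]           # the multiplexer's "select" in action

M = 0b0000_0110                        # example multiplicand, 6
assert booth_stage(M, (0, 1, 1)) == M << 1         # code 011 selects +2M
assert booth_stage(M, (1, 0, 1)) == (-M) & 0x3FF   # code 101 selects -M
```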
This structural description—connecting shifters, complementers, and multiplexers—is the blueprint that is ultimately synthesized by software tools into a final transistor-level layout. It is a stunningly direct translation of an algorithm into a physical machine. The elegance of the math is preserved in the elegance of the circuit's architecture. Booth's algorithm is not just something a computer does; it is something a computer is. It is baked into the very silicon that powers our digital lives.