
Hartley's Law

Key Takeaways
  • Hartley's Law quantifies information ($H_0$) by taking the logarithm of the number of equally likely possibilities ($N$), with the formula $H_0 = \log_2(N)$ defining the 'bit'.
  • Due to the properties of logarithms, the total information from independent sources is simply the sum of their individual information contents.
  • The law's core assumption of uniform probability means it defines the maximum possible information content, serving as a fundamental upper bound for any real system.
  • This principle finds broad application beyond engineering, providing a framework for understanding information in psychology, molecular biology, and even quantum physics.

Introduction

How can we assign a precise number to something as abstract as "information"? This fundamental question, which puzzled engineers at the dawn of the communication age, lies at the heart of our digital world. The desire to quantify the content of a message, beyond its semantic meaning, led to a revolutionary insight that transformed technology and science. This article addresses this foundational problem by exploring Hartley's Law, the first successful mathematical formulation for measuring information. It provides a clear framework for understanding how information is defined, measured, and transmitted.

Over the next two sections, we will embark on a journey from first principles to far-reaching consequences. In "Principles and Mechanisms", we will dissect Hartley's simple yet powerful formula, understanding why the logarithm is the perfect tool for the job, what a "bit" truly represents, and how information from separate sources elegantly adds up. We will then explore in "Applications and Interdisciplinary Connections" how this single idea extends far beyond telegraph wires, providing a universal language to describe complexity and choice in fields as diverse as psychology, digital security, molecular biology, and even the fundamental physical limits of communication. By the end, you will not only understand how to count information but also appreciate its profound role in structuring our world.

Principles and Mechanisms

The Art of Counting Choices

How do we measure something as abstract as "information"? Let's begin with a simple, intuitive idea. Imagine receiving a message. The amount of information you've gained depends entirely on how much you didn't know before. If a friend tells you the sun rose this morning, you've learned very little; it was a near certainty. But if they tell you the winning lottery number, you have learned a great deal. The crucial difference is the number of possibilities. There was only one real possibility for the sun rising, but millions for the lottery ticket.

This fundamental connection between information and the number of possibilities was first given a mathematical form by the American engineer Ralph Hartley in 1928. He proposed that if you have to choose one message from a set of $N$ possible, equally likely messages, the information content of that choice can be quantified. His brilliant insight was to realize that the natural way to do this is with a logarithm. The information, which we now call the Hartley entropy ($H_0$), is given by the beautifully simple formula:

$$H_0 = \log_b(N)$$

Here, $N$ is the number of distinct, equally likely states or messages. Why a logarithm? As we shall see, this choice isn't arbitrary; it gives information an almost magical property that makes it incredibly useful.

The Bit: A Universal Yardstick for Information

The base of the logarithm, $b$, is a choice that determines our unit of measure. If we were to use base 10, we could speak of information in "dits" or "hartleys". If we use the base of natural logarithms, $e$, the unit is called the "nat", a term you might encounter in more advanced physics or machine learning contexts.

However, the most natural and ubiquitous choice, especially in our digital world, is base 2. This gives us the fundamental unit of information we all know: the bit.

$$H_0 = \log_2(N)$$

What does $\log_2(N)$ truly represent? In a wonderfully practical sense, it's the minimum number of yes/no questions you would need to ask, on average, to single out one specific outcome from $N$ possibilities.

Let's consider a practical problem. Imagine you are trying to create a binary encoding for an ancient alphabet with 30 distinct characters, and for now, we'll assume each character is equally likely to appear. The information content per character is $H_0 = \log_2(30) \approx 4.907$ bits. What on earth does "4.907 bits" mean? You can't ask 0.907 of a question! This value represents a powerful theoretical average. If you were to design a simple, fixed-length code (assigning the same number of bits to every character), you'd need to accommodate all 30 possibilities. Since $2^4 = 16$ is not enough and $2^5 = 32$ is, you would be forced to use a 5-bit code for every single character. The Hartley entropy beautifully tells us that while a simple scheme requires 5 bits, a more clever (variable-length) coding scheme could, in principle, achieve an average length approaching 4.907 bits per character. It represents a fundamental limit set by nature.
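These two numbers are easy to check in a few lines of Python. This is a minimal sketch of the calculation above; the helper names `hartley_entropy` and `fixed_length_bits` are illustrative, not standard library functions.

```python
import math

def hartley_entropy(n: int) -> float:
    """Hartley entropy H0 = log2(N), in bits, for N equally likely messages."""
    return math.log2(n)

def fixed_length_bits(n: int) -> int:
    """Smallest whole number of bits a fixed-length code needs for N symbols."""
    return math.ceil(math.log2(n))

h = hartley_entropy(30)    # ~4.907 bits per character: the theoretical average
b = fixed_length_bits(30)  # 5 bits: what a simple fixed-length code requires
```

The gap between `h` and `b` is exactly the inefficiency that variable-length codes can, in principle, recover.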

The logarithmic scale is also incredibly powerful for understanding change. Suppose a sensor system initially has 128 possible output messages. Its information content is $\log_2(128) = 7$ bits. If we reconfigure it to have only 16 possibilities, its new information content is $\log_2(16) = 4$ bits. The information has decreased by a clean $7 - 4 = 3$ bits. Notice the pattern: $128 \div 2 \div 2 \div 2 = 16$. We halved the possibilities three times, and the information dropped by 3 bits. Every time you halve the number of possibilities, you reduce the information by exactly 1 bit.
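The one-bit-per-halving pattern can be sketched directly:

```python
import math

n = 128
h_before = math.log2(n)    # 7 bits for 128 possibilities
for _ in range(3):         # halve the possibilities: 128 -> 64 -> 32 -> 16
    n //= 2
h_after = math.log2(n)     # 4 bits for 16 possibilities

drop = h_before - h_after  # 3 bits lost: exactly 1 bit per halving
```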

The Magic of Logarithms: Information Adds Up

Here is where the choice of a logarithm reveals its true genius. Let's think about systems made of independent parts.

Imagine a simple environmental monitoring station. One sensor reports the wind direction from 8 possibilities (N, NE, E, etc.). A second, independent sensor reports the sky condition from 3 possibilities (Clear, Cloudy, Rain). How many distinct composite messages can the station send? Since the measurements are independent, the total number of states is simply the product of the individual possibilities: $N = 8 \times 3 = 24$.

Now, let's look at the total information content:

$$H_{\text{total}} = \log_2(24) = \log_2(8 \times 3)$$

A core property of logarithms is that $\log(a \times b) = \log(a) + \log(b)$. This allows us to break the expression apart:

$$H_{\text{total}} = \log_2(8) + \log_2(3) = 3 + \log_2(3)$$

Look at what just happened! The term $\log_2(8)$ is precisely the information from the wind sensor alone, and $\log_2(3)$ is the information from the sky sensor alone. The total information of the combined system is simply the sum of the information from its independent parts. This elegant property holds true for any number of independent components, whether we're talking about a synthetic biology memory element combining 4 DNA bases with 3 modification states ($N = 12$), or a digital circuit joining a 3-state indicator with a 4-location address bus ($N = 12$). Information from independent sources is additive. This is not a lucky coincidence; it is the deep reason why logarithms are the natural language for describing information.
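A few lines of Python make the additivity concrete, using the sensor counts from the monitoring-station example:

```python
import math

wind_states = 8   # wind-direction possibilities
sky_states = 3    # sky-condition possibilities

h_combined = math.log2(wind_states * sky_states)           # log2(24)
h_summed = math.log2(wind_states) + math.log2(sky_states)  # log2(8) + log2(3)

# the two routes agree: information from independent sources adds up
```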

What Exactly is a "Possibility"?

So far, calculating our number of states, $N$, has been a matter of simple multiplication. But the world is full of more complex situations where we must be more careful in our counting. The principle remains the same, but finding $N$ can be a fun puzzle in itself.

For example, if a university needs to select a two-person debate team from a pool of 10 eligible students, what is $N$? It's the total number of unique two-person teams that can be formed. The order in which they are picked doesn't matter, so we turn to the language of combinations. The number of possibilities is "10 choose 2":

$$N = \binom{10}{2} = \frac{10 \times 9}{2} = 45$$

The uncertainty inherent in this selection process—the information you gain when the team is finally announced—is therefore $H_0 = \log_2(45)$ bits.
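Python's standard library can do this counting for us (`math.comb` computes the binomial coefficient):

```python
import math

teams = math.comb(10, 2)   # "10 choose 2" = 45 distinct two-person teams
h = math.log2(teams)       # ~5.49 bits resolved when the team is announced
```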

Rules and constraints have a direct and quantifiable impact on information. Imagine you are designing a system to generate 3-character security codes from a set of 8 distinct symbols. Let's compare two methods:

  • Method R (with Replacement): Symbols can be reused. For each of the three positions, we have 8 choices. The total number of distinct codes is $N_R = 8 \times 8 \times 8 = 8^3 = 512$. The information content is $H_R = \log_2(512) = 9$ bits.

  • Method U (without Replacement): Once a symbol is used, it cannot be used again. We have 8 choices for the first position, 7 for the second, and 6 for the third. The number of possibilities shrinks to $N_U = 8 \times 7 \times 6 = 336$. The information content is $H_U = \log_2(336) \approx 8.39$ bits.

The simple constraint of "no repeats" reduces the number of possible outcomes and, consequently, reduces the system's uncertainty (its information content). The difference, $9 - \log_2(336) = \log_2(512/336) = \log_2(32/21)$, is a precise measure of the information "gained" simply by knowing the rule of the game.
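Both counting methods, and the information gap between them, can be checked in a short sketch (`math.perm` counts orderings without replacement):

```python
import math

symbols, length = 8, 3

n_r = symbols ** length           # Method R: 8^3 = 512 codes
n_u = math.perm(symbols, length)  # Method U: 8 * 7 * 6 = 336 codes

h_r = math.log2(n_r)              # 9 bits
h_u = math.log2(n_u)              # ~8.39 bits
gap = h_r - h_u                   # log2(32/21): the cost of "no repeats"
```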

From Abstract Bits to Blazing Speeds

Hartley's law is far more than an abstract concept; it is the bedrock of modern telecommunications. It provides a direct bridge from the static measure of information to the dynamic rate at which it can be transmitted.

Consider a noiseless fiber-optic channel that can generate one of 16 distinct and perfectly distinguishable signals. From Hartley's Law, the information carried by any single one of these signals is $H_0 = \log_2(16) = 4$ bits.

Now, suppose it takes the system a fixed duration, say 250 picoseconds ($2.5 \times 10^{-10}$ seconds), to transmit one such signal. The maximum rate of transmission is simply the amount of information per signal divided by the time it takes to send that signal:

$$\text{Capacity} = \frac{\text{Information}}{\text{Time}} = \frac{4\ \text{bits}}{2.5 \times 10^{-10}\ \text{s}} = 1.6 \times 10^{10}\ \text{bits per second}$$

This is the channel's capacity, its ultimate speed limit. This simple and powerful relationship shows that to increase the data rate of a communication system, you have two fundamental levers: increase the number of distinct symbols you can send, or decrease the time it takes to send each one.
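The capacity calculation is a one-liner; the sketch below just names the two levers:

```python
import math

bits_per_signal = math.log2(16)  # 4 bits per distinguishable signal
signal_time = 2.5e-10            # 250 picoseconds per signal, in seconds

capacity = bits_per_signal / signal_time  # 1.6e10 bits per second
# two levers: raise the symbol count (numerator) or shrink the symbol time (denominator)
```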

The Uniformity Assumption: A Stepping Stone to Deeper Truths

Throughout our journey, we have leaned heavily on one crucial, simplifying assumption: that every one of the $N$ possibilities is equally likely. Hartley's Law is built on this elegant foundation of perfect uniformity.

But is the world so uniform? In the English language, the letter 'E' is a constant companion, while 'Z' is a rare visitor. A fair coin is one thing, but a loaded die is quite another. What happens to information when some outcomes are more probable than others?

This is where we see the true role of Hartley's Law. Let's analyze a simple communication system with four symbols. If we make the simplifying assumption that they are all equally likely, Hartley's law tells us the information content is $H_{\text{Hartley}} = \log_2(4) = 2$ bits per symbol. This value represents the system's maximum possible average information content.

But what if we perform a statistical analysis and discover the true probabilities are non-uniform, say {0.5, 0.25, 0.125, 0.125}? The most common symbol (with probability 0.5) is highly predictable; when we receive it, our "surprise" is low, and thus the information gained is small. The rarer symbols are far less predictable and give us a greater jolt of information. By calculating the weighted average of the information from each symbol—a concept later perfected by Claude Shannon—we find the true average information is only $1.75$ bits.

The Hartley value of 2 bits overestimates the actual information content by $0.25$ bits. This difference is not an error; it is a profound insight. It tells us that any deviation from a uniform probability distribution reduces the average information content. Predictability is the enemy of information.
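The comparison between the Hartley bound and the true weighted average can be verified directly, using the probability list from the example above:

```python
import math

probs = [0.5, 0.25, 0.125, 0.125]

h_hartley = math.log2(len(probs))                  # 2 bits: the uniform upper bound
h_shannon = -sum(p * math.log2(p) for p in probs)  # 1.75 bits: the weighted average

overestimate = h_hartley - h_shannon               # 0.25 bits lost to predictability
```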

Hartley's Law, therefore, stands as a critical and beautiful pillar of information theory. It quantifies information in its purest form—for the case of maximum uncertainty—and in doing so, provides a fundamental upper bound for any real-world system. It is the perfect, idealized starting point from which the richer, more nuanced landscape of information, probability, and noise can be explored.

Applications and Interdisciplinary Connections

We have seen that Hartley's Law gives us a surprisingly simple way to count information: just take the logarithm of the number of possibilities. You might be tempted to think that such a simple rule, born from the practical problems of telegraphy, would have a limited scope. But that is where the true beauty of a fundamental principle lies. Its simplicity is a sign of its power. Like the law of gravitation, which applies to both a falling apple and an orbiting planet, the idea of counting possibilities logarithmically turns out to be a universal language, spoken by engineers, psychologists, biologists, and even by the molecules within our own cells. Let us take a journey through these diverse worlds to see this principle in action.

Our journey begins where the story of information theory itself began: in telecommunications. Before the digital age, engineers were grappling with a fundamental question: how much "stuff" can you send down a wire? Imagine a hypothetical early automated telegraph system, a "Chronomessage," which could send any one of 150 distinct symbols. If it could transmit 12 of these symbols every second, what is its true information rate? Hartley's law gives us the answer directly. The "richness" of the symbol set is $\log_2(150)$ bits per symbol, and multiplying this by the transmission rate of 12 symbols per second gives us the channel's capacity in bits per second. This simple product, $R = n \log_2(S)$, was a revolutionary concept. It separated the "how many choices" from the "how fast" and showed that both were equally important in defining the capacity of a communication channel.
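For the hypothetical Chronomessage, the product $R = n \log_2(S)$ works out to just under 87 bits per second:

```python
import math

s = 150               # distinct symbols in the alphabet
n = 12                # symbols transmitted per second

r = n * math.log2(s)  # ~86.7 bits per second: the channel's information rate
```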

This idea of a "state space" of possibilities quickly grew beyond a simple list of symbols. Consider Homer Dudley's pioneering VODER speech synthesizer, demonstrated at the 1939 World's Fair. To create speech, an operator manipulated 10 independent controls, each of which we can imagine having 8 discrete levels. How many unique sounds could the VODER produce at any given instant? It's not $10 \times 8$. Since each control is independent, the total number of states is a staggering $8 \times 8 \times \dots \times 8$, ten times over, or $8^{10}$. The information required to specify one of these states is therefore $\log_2(8^{10})$, which simplifies beautifully to $10 \times \log_2(8) = 10 \times 3 = 30$ bits. We see a general rule emerge: for a system of $N$ independent components, each having $L$ states, the total number of possibilities is a product, not a sum, and under the logarithm that product becomes the sum of the individual information contents: $N \log_2(L)$ bits.
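The VODER arithmetic shows the product-to-sum rule exactly, since $8^{10} = 2^{30}$:

```python
import math

controls, levels = 10, 8

states = levels ** controls  # 8^10 independent combinations of control settings
h = math.log2(states)        # = 10 * log2(8) = 30 bits to specify one state
# product of possibilities under the logarithm -> sum of bits: 10 controls * 3 bits each
```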

This quantification of "choice" is not limited to machines. What about the choices we make? In a simple psychological experiment, a participant facing a panel of 16 buttons, only one of which is correct, resolves a certain amount of uncertainty by finding the right one. How much? Well, how many possibilities were there? 16. The information gained is therefore $\log_2(16) = 4$ bits. This means that the single correct choice provided the same amount of information as knowing the outcome of four consecutive coin flips. The bit, once a tool for engineers, becomes a measure of knowledge itself—a way to quantify the reduction of uncertainty in any system, including the human mind. This very same logic is what underpins modern digital security. The strength of a cryptographic key lies in the size of its "key space"—the total number of possibilities a snooper would have to check. For a key made by permuting 10 distinct software modules, the number of possible keys isn't $10^{10}$, but the number of unique orderings, which is $10!$ (ten factorial). The Hartley entropy of this system, $\log_2(10!)$, is a direct measure of its security against a brute-force attack. The larger the number of bits, the more secure the secret.
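The key-space calculation is a direct application of Hartley's formula to a factorial count:

```python
import math

keyspace = math.factorial(10)  # 10! = 3,628,800 distinct module orderings
h = math.log2(keyspace)        # ~21.8 bits of brute-force resistance
```

By this measure, permuting 10 modules is far weaker than a modern key: each extra bit doubles the attacker's work.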

It turns out that nature has been an expert in information processing for billions of years, long before humans ever thought about it. Consider the famous waggle dance of the honeybee. To tell its hive-mates where to find nectar, a forager bee performs a dance that encodes two independent pieces of information: the direction relative to the sun and the distance from the hive. If we imagine a simplified model where the bee can indicate one of 16 directions and one of 5 distance categories, the total information conveyed is the sum of the information from each part: $H_{\text{total}} = \log_2(16) + \log_2(5)$. Nature, in its efficiency, uses a combinatorial code, just like the VODER, to pack more information into a single "message."

This principle of combinatorial coding is found in even more spectacular forms. Imagine a hypothetical deep-sea creature that communicates with pulses of light. If it can produce light in 5 distinct colors and control each pulse to have 12 distinct durations, then each single flash of light is a symbol drawn from a set of $5 \times 12 = 60$ possibilities. The information per flash is $\log_2(60)$ bits. If physiological constraints—the time for the brain to choose the next signal and for the light-producing organs to recharge—limit the creature to sending, say, 40 flashes per second, its total information rate is simply $40 \times \log_2(60)$ bits per second. We have come full circle, finding the same $R = n \log_2(S)$ relationship from the old telegraph wire now describing the luminous language of the deep sea.
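Plugging the flash parameters into the same $R = n \log_2(S)$ formula gives a rate of roughly 236 bits per second:

```python
import math

colors, durations = 5, 12
alphabet = colors * durations  # 60 distinguishable flashes (combinatorial symbols)
flashes_per_second = 40

rate = flashes_per_second * math.log2(alphabet)  # ~236 bits per second
```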

The most astonishing applications of these ideas, however, are found at the microscopic level, in the machinery of life itself. A single protein is not a static object; it is an information processing hub. Its function can be altered by attaching chemical tags—a process called Post-Translational Modification (PTM). Consider a complex eukaryotic protein with multiple sites for modification: perhaps four sites that can be either phosphorylated or not (2 states each), and two sites that can be unmodified, acetylated, or ubiquitinated (3 states each). The total number of distinct functional states this single protein can exist in is a massive $2^4 \times 3^2 = 144$. The information capacity is $\log_2(144)$ bits. A simpler prokaryotic counterpart might only have three sites with two states each, for a total of $2^3 = 8$ states and an information capacity of $\log_2(8) = 3$ bits. The difference in information capacity, $\log_2(144) - \log_2(8) = \log_2(18)$, quantifies the vastly greater regulatory complexity that has evolved in higher organisms. Hartley's law gives us a precise number for the evolutionary leap in information-processing power at the molecular scale.
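The state counts for the two hypothetical proteins, and the information gap between them, follow the same multiply-then-take-the-log pattern:

```python
import math

eukaryotic = 2**4 * 3**2  # four 2-state sites and two 3-state sites: 144 states
prokaryotic = 2**3        # three 2-state sites: 8 states

gap = math.log2(eukaryotic) - math.log2(prokaryotic)  # log2(18) ~ 4.17 extra bits
```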

Perhaps the most profound biological example lies within our DNA. The genetic code is famously redundant; for instance, there are six different three-letter codons that all specify the amino acid Leucine. From the perspective of building a protein, these synonymous codons are interchangeable. But from an information theory perspective, this redundancy is a hidden channel. If a position in a gene calls for an amino acid with 4 synonymous codons, that position can be used to store $\log_2(4) = 2$ bits of extra information, completely independent of the protein being built. By analyzing the entire sequence of a gene and summing the information capacity at each position based on its degeneracy, we can calculate a "steganographic capacity"—the total number of bits that could be hidden within the gene's sequence without ever changing its protein product. This suggests that DNA could be carrying layers of information far beyond the simple protein blueprint.
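The summing procedure can be sketched in a few lines. The degeneracy list here is entirely hypothetical (an invented toy gene of eight codon positions), used only to show the shape of the calculation:

```python
import math

# hypothetical synonymous-codon count (degeneracy) at each position of a toy gene
degeneracies = [4, 6, 2, 1, 4, 2, 6, 3]

# a position with d interchangeable codons can hide log2(d) bits
# (a position with d = 1 has no freedom and contributes 0 bits)
hidden_bits = sum(math.log2(d) for d in degeneracies)  # the steganographic capacity
```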

Finally, what are the ultimate physical limits to information? Hartley's law counts possibilities, but physics dictates how quickly we can create and distinguish them. In a hypothetical deep-space laser communication system, the bandwidth-time principle of wave physics states that to make shorter pulses ($\Delta t$), you need more bandwidth ($\Delta f$), via the relation $\Delta t \approx 1/\Delta f$. More pulses per second means a higher information rate. You might naively think that with infinite bandwidth, you could send infinitely short pulses at an infinite rate, achieving infinite capacity. But quantum mechanics throws a wrench in the works. The energy of each light pulse is finite, limited by the transmitter's average power. As pulses get shorter, the energy per pulse decreases, meaning you can create fewer distinguishable levels (e.g., fewer photon number states). A careful analysis shows that as the bandwidth $\Delta f$ goes to infinity, the increase in pulse rate is perfectly cancelled by the logarithmic decrease in information per pulse. The channel capacity, amazingly, does not go to infinity, but approaches a finite, constant value determined by the power and frequency of the laser. Information, in the end, cannot be separated from the physical reality that carries it.

From the clicking of a telegraph key to the quantum limits of interstellar communication, from the choices in our minds to the hidden codes in our genes, the simple act of counting possibilities logarithmically has given us a universal key. It unlocks a deeper understanding of systems of every imaginable kind, revealing the hidden information that structures our world and the universe itself.