Hartley Entropy

Key Takeaways
  • Hartley entropy quantifies the information content of a system as the logarithm of its total number of equally likely states ($H_0 = \log(N)$).
  • The units of information—bits, hartleys, and nats—correspond to the base of the logarithm used (base 2, 10, or e, respectively).
  • Hartley entropy is the special case of the more general Shannon entropy that occurs when all outcomes have a uniform probability distribution, representing the system's maximum information capacity.
  • This simple concept serves as a universal yardstick to measure and compare information across diverse fields, including engineering, genetics, and physics.

Introduction

How do we assign a number to an abstract concept like "information"? This fundamental question puzzled the pioneers of communication, who needed a way to measure the content of a signal, the capacity of a channel, and the very nature of knowledge itself. Before the complex algorithms of the digital age, a simple, elegant solution was needed to quantify the uncertainty resolved when one outcome is chosen from many possibilities. This quest led to the birth of information theory, with one of the first and most fundamental ideas being a way to measure information by simply counting choices.

This article explores this foundational concept: Hartley entropy. It addresses the gap between the intuitive feeling that more choices mean more information and the mathematical formalism needed to build a robust theory. We will unpack the simple yet profound insight that a logarithmic scale is the key to measuring information in a way that is additive and universal.

The article is structured to build this understanding from the ground up. In "Principles and Mechanisms," we will delve into the logic behind Hartley's formula, explore the different units of information like bits and nats, and see how this idea serves as a special case for the more general Shannon entropy. Following this, "Applications and Interdisciplinary Connections" will take us on a journey through engineering, biology, and physics, revealing how this single concept provides a unifying language to describe everything from telegraph signals to the information content of DNA and the universe itself.

Principles and Mechanisms

How to Measure a Choice?

How much information is in a single piece of data? Is it in the black-and-white pixels that make up this letter 'A', or in the firing of a single neuron in your brain? Before we can build grand theories, we must agree on what we are even measuring. The pioneers of information theory faced this very question. They weren't just thinking about computer data; they were thinking about telephone signals, language, and even the very nature of knowledge.

Let's try a little thought experiment, inspired by the kind of psychological studies that were happening in the 1920s when these ideas were first taking root. Imagine you are in a room with a panel of 16 identical buttons. You are told that one, and only one, of these buttons is the "correct" one. All are equally likely. How much "information" do you gain when you are finally told which button it is?

Is it more or less information than being told the outcome of a single coin flip? Your intuition probably tells you it's much more. A coin flip has only two possibilities. Here, you have 16. The information you gained corresponds to the uncertainty that was eliminated. Your uncertainty about 16 possibilities was reduced to a single certainty.

This is the very heart of the matter. The amount of information is a measure of the number of possibilities. But the relationship isn't linear. Consider two such panels, each with 16 buttons. The total number of combined possibilities isn't $16 + 16 = 32$, but $16 \times 16 = 256$. We feel that the information should add, not multiply. If one choice gives you $I$ amount of information, two independent choices should give you $2I$. What mathematical function turns multiplication into addition? The logarithm!

This simple but profound insight is the key that unlocks the whole field. If the number of equally likely possibilities is $N$, the information content, which we'll call $H_0$, is proportional to the logarithm of $N$.

$$H_0 = \log(N)$$

For our panel of 16 buttons, the information resolved by finding the correct one is $\log(16)$. This single, elegant idea was first formally proposed by Ralph Hartley in 1928, and this measure, $H_0$, is now called the **Hartley entropy**. It is the simplest, most fundamental measure of the information capacity of a system.
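In code, the definition is a one-liner. Here is a minimal sketch in Python (the function name is ours, chosen for illustration):

```python
import math

def hartley_entropy(n_states, base=2):
    """Hartley entropy H0 = log(N) for N equally likely states.

    base=2 gives bits, base=10 gives hartleys, base=math.e gives nats.
    """
    if n_states < 1:
        raise ValueError("need at least one state")
    return math.log(n_states, base)

# The 16-button panel from the text: log2(16) = 4 bits.
print(hartley_entropy(16))
# Two independent panels multiply the possibilities (16 * 16 = 256 states),
# but the entropies simply add: log2(256) = 4 + 4 = 8 bits.
print(hartley_entropy(16 * 16))
```

The second call shows the additivity that motivated the logarithm in the first place: multiplying state counts adds entropies.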

A Universal Yardstick: Counting Possibilities

The beauty of Hartley's idea is its universality. It doesn't care if you're choosing between buttons, security codes, or quantum states. All that matters is one question: "How many choices are there?" The entire game boils down to counting.

Imagine you are designing a security system that generates 8-character tokens. The rules might be complex: perhaps the first three characters are a specific arrangement of letters, and the last five are digits. To find the Hartley entropy of this system, you don't need to know anything about electronics or cryptography. You just need to be a careful bookkeeper. You calculate the number of ways to choose the letters, you calculate the number of ways to choose the digits, and you multiply them together to get the total number of possible tokens, $N$.

Once you have that grand total $N$, the maximum information contained in any single token—its Hartley entropy—is simply $\log(N)$. That's it. The hard work is in the counting, not in the information theory. The logarithm is just the universal yardstick we use to measure the result.
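To make the bookkeeping concrete, here is a sketch in Python. The article leaves the exact token rules open, so the specific counts below (26 unconstrained letters, 10 digits per position) are our assumption, not part of the original example:

```python
import math

# Assumed token scheme: 3 positions from 26 letters, 5 positions from 10 digits.
n_letter_choices = 26 ** 3            # 17,576 ways to pick the letters
n_digit_choices = 10 ** 5             # 100,000 ways to pick the digits
n_tokens = n_letter_choices * n_digit_choices  # total N

h0_bits = math.log2(n_tokens)         # Hartley entropy of one token
print(n_tokens)                       # 1757600000
print(h0_bits)                        # about 30.7 bits
```

All the information theory lives in the final `log2`; everything before it is plain counting.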

But this brings us to a crucial point. When we measure a length, we have to specify the units—inches, meters, or light-years. When we write $\log(N)$, we have left out something important: the base of the logarithm. Changing the base is like changing our yardstick. It doesn't change the underlying reality of the information, but it changes the number we use to describe it.

The Currency of Information: Bits, Nats, and Hartleys

The choice of base for the logarithm is nothing more than a choice of units. Over the years, three main "currencies" of information have become standard, each with its own history and domain of convenience.

  1. **The Bit (base 2):** The most famous unit, the bit, comes from using a base-2 logarithm. The information is measured in units of coin flips. The question it answers is: "How many fair coin flips would you need to specify one unique outcome among $N$ possibilities?" For our 16 buttons, since $16 = 2^4$, we find that the information content is exactly $\log_2(16) = 4$ bits. This is no coincidence. You could label the buttons 0000, 0001, 0010, ..., 1111, and four binary questions (Is the first digit 1? Is the second...?) would perfectly identify any button. This is the natural language of computers.

  2. **The Hartley (base 10):** This unit, also called a "ban" or "dit," is named in honor of Hartley himself and uses the base-10 logarithm. It is the natural choice for systems built around our ten fingers and the decimal system. It answers the question: "How many rolls of a 10-sided die would you need?" A password system using only the digits 0-9, with $10^6$ possibilities, would contain $\log_{10}(10^6) = 6$ hartleys of information.

  3. **The Nat (base e):** This unit uses the natural logarithm (base $e \approx 2.718$), a number beloved by mathematicians for its elegant properties in calculus. The "nat" is the most "natural" unit from a purely mathematical standpoint and is often used in theoretical physics and advanced statistics.

Since these are just different units for the same underlying quantity, we must be able to convert between them, just as we convert between inches and centimeters. The tool for this is the standard change-of-base formula for logarithms: $\log_b(x) = \frac{\log_a(x)}{\log_a(b)}$.

So, if a scientist measures the entropy of a biological circuit to be $1.75$ bits, we can express this in hartleys by calculating $1.75 \times \log_{10}(2) \approx 0.5268$ hartleys. Conversely, a signal measured as $2.5$ hartleys is equivalent to $2.5 \times \log_2(10) \approx 8.30$ bits. A value of $\ln(42)$ nats is simply $\log_{10}(42)$ hartleys.

This allows us to make meaningful comparisons. Which system has more uncertainty: a quantum process measured at $15$ hartleys or a data stream measured at $45$ bits? We just need to convert them to a common currency. Converting the physicist's measurement to bits, we get $15 \text{ hartleys} \times \log_2(10) \approx 49.8 \text{ bits}$. The quantum system is, in fact, slightly more uncertain than the data stream. We can even combine the information from independent measurements—say, from a plasma detector measuring in nats and a magnetometer in bits—by converting both to a single unit, like hartleys, and then simply adding them together.
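A small Python helper makes these conversions mechanical. The function name is ours, and the nat and bit readings in the last line are made-up values standing in for the plasma-detector and magnetometer example:

```python
import math

def convert_info(value, from_base, to_base):
    """Convert an information quantity between logarithm bases.

    Bases: 2 for bits, 10 for hartleys, math.e for nats.
    """
    return value * math.log(from_base, to_base)

print(convert_info(1.75, 2, 10))   # 1.75 bits -> ~0.5268 hartleys
print(convert_info(15, 10, 2))     # 15 hartleys -> ~49.83 bits
# Independent measurements add once expressed in a common unit:
total_hartleys = convert_info(3.2, math.e, 10) + convert_info(12, 2, 10)
```

Because a unit change is just multiplication by a constant, conversions compose and round-trip exactly, like any other change of measurement scale.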

When Life Isn't Fair: The Entry of Probability

Hartley's elegant formula $H_0 = \log(N)$ rests on a powerful, often unstated, assumption: that every outcome is equally likely. It assumes the coin is fair, the dice are not loaded, and every button has the same chance of being the right one. But what happens when that's not true?

Imagine a simplified telegraphy system with just four symbols: A, B, C, and D. The alphabet size is $N = 4$. The Hartley entropy, our cardinality-based measure, is $\log_2(4) = 2$ bits. This tells us the maximum information that could be sent with each symbol.

But suppose we analyze the transmissions and find that 'A' is used half the time ($p_1 = 0.5$), 'B' a quarter of the time ($p_2 = 0.25$), and 'C' and 'D' an eighth of the time each ($p_3 = p_4 = 0.125$). When you receive an 'A', are you really surprised? Not very. It was the most likely outcome. But if you receive a 'D', that's more surprising, more informative.

The simple Hartley entropy misses this nuance. It gives the system a flat 2 bits of information per symbol. This is where the genius of Claude Shannon enters the picture. He refined Hartley's idea by introducing probabilities. Shannon proposed that the information or "surprisal" of a single event $i$ is inversely related to its probability, $p_i$:

$$I(i) = -\log(p_i)$$

A very probable event (large $p_i$) carries little information. An extremely rare event (tiny $p_i$) carries a huge amount of information. The **Shannon entropy**, which we denote with $H$, is then the average information per event, calculated by weighting the information of each event by its probability of occurring:

$$H = -\sum_{i=1}^{N} p_i \log(p_i)$$

For our telegraphy example, the Shannon entropy is:

$$H = -\left(0.5\log_2(0.5) + 0.25\log_2(0.25) + 2 \times 0.125\log_2(0.125)\right) = 1.75 \text{ bits}$$

Notice what happened. The true average information, the Shannon entropy, is $1.75$ bits. The Hartley entropy was $2$ bits. The simple, cardinality-based method overestimated the information content by $0.25$ bits. This isn't an error; it's a profound insight. The uneven probabilities have introduced a degree of predictability into the system, which reduces its average uncertainty.

We can now see the Hartley entropy in its true light: **Hartley entropy is the special case of Shannon entropy when all probabilities are equal.** It represents the absolute maximum entropy a system with $N$ states can have. Any deviation from a uniform probability distribution will cause the Shannon entropy to be less than the Hartley entropy. This difference between the maximum possible entropy ($H_0$) and the actual entropy ($H$) is a measure of the system's structure and predictability. In data compression, this gap is called **redundancy**—it's the "wasted" capacity in a code that isn't carrying new information.
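The telegraphy comparison can be reproduced in a few lines of Python (the function name is ours):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Telegraphy alphabet {A, B, C, D} with the probabilities from the text.
probs = [0.5, 0.25, 0.125, 0.125]
h = shannon_entropy(probs)       # actual average information: 1.75 bits
h0 = math.log2(len(probs))       # Hartley entropy (uniform case): 2 bits
redundancy = h0 - h              # 0.25 bits of predictable structure
print(h, h0, redundancy)
```

Feeding a uniform distribution like `[0.25] * 4` into `shannon_entropy` returns exactly `h0`, which is the special-case relationship stated above.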

This single framework, starting with counting choices and refining it with probabilities, is astonishingly powerful. It allows ecologists to quantify the biodiversity of a habitat from species populations, and it allows biologists to measure the information stored in the molecular configurations of a protein. From the simple act of choosing a button, we have arrived at a universal language to describe the structure and uncertainty of the world.

Applications and Interdisciplinary Connections

We have explored the basic idea of Hartley entropy: that the information content of a message is simply the logarithm of the number of possible messages one could have sent. On its face, this might seem like a book-keeping trick, a mere definition. But the true power and beauty of a scientific concept are revealed in its ability to connect seemingly disparate phenomena. The simple act of counting possibilities, when formalized by a logarithm, becomes a universal ruler that can measure things as different as a telegraph signal, the code of our own DNA, and the information capacity of the universe itself. Let us embark on a journey through the sciences to see how this one idea provides a unifying thread.

The Dawn of the Information Age: Engineering and Communication

The story of information theory begins, naturally, with the challenge of communication. In the early days of telecommunications, engineers were building systems to transmit messages, but they lacked a fundamental way to quantify what they were transmitting. Think of a hypothetical early telegraph system designed to send financial data. If the machine has a set of, say, 150 distinct symbols and can transmit 12 of them every second, how much "information" is flowing?

Ralph Hartley provided the first quantitative answer. For each symbol transmitted, a choice is made from 150 possibilities. The information content of that single choice, he proposed, is $H = \log_2(150)$, which is roughly $7.23$ bits. Since this choice is made 12 times per second, the total information rate is simply the product of these two numbers, about $86.7$ bits per second. For the first time, engineers had a ruler. They could compare different coding schemes, calculate the capacity of a channel, and treat the abstract notion of "information" as a concrete, measurable quantity.

This principle scales beautifully to more complex signals. Consider the human voice. It seems infinitely rich and variable. Yet the pioneering work on devices like Homer Dudley's Voder in the 1930s showed that even speech could be quantified. A vocoder works by breaking the speech signal into several frequency bands and periodically measuring the energy in each. Imagine a simplified model with 8 bands, where the energy in each is quantized to one of 16 discrete levels. At each sampling moment, the system is making 8 independent choices, each from a set of 16 possibilities. The total information generated in that instant is the sum of the information from each choice: $8 \times \log_2(16) = 8 \times 4 = 32$ bits. By sampling this data stream rapidly, one captures the essential information of the speech signal. This fundamental idea—deconstructing a complex signal into a set of simpler, quantifiable choices—is the bedrock upon which modern digital audio, image compression, and nearly all digital communication are built.
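Both back-of-the-envelope figures above follow directly from the formula; a quick check in Python:

```python
import math

# Telegraph: one choice from 150 symbols, transmitted 12 times per second.
bits_per_symbol = math.log2(150)        # roughly 7.23 bits per symbol
telegraph_rate = 12 * bits_per_symbol   # roughly 86.7 bits per second

# Simplified vocoder frame: 8 bands, each quantized to 16 energy levels.
vocoder_bits_per_frame = 8 * math.log2(16)  # 8 * 4 = 32 bits per frame

print(round(telegraph_rate, 1), vocoder_bits_per_frame)
```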

The Code of Life: Biology and Genetics

Perhaps the most stunning and profound application of information theory lies in the heart of biology itself. In his 1944 book What is Life?, the physicist Erwin Schrödinger, pondering how the blueprint for an entire organism could be stored in a tiny cell, imagined the hereditary substance as an "aperiodic crystal"—a long, complex message written in a molecular alphabet. He was spectacularly correct.

We now know this aperiodic crystal is DNA. It uses an alphabet of just four "letters" (the nucleotides A, T, C, and G). A sequence of length $N$ can therefore specify one of $K^N = 4^N$ possible messages. Following Hartley's logic, the information capacity of such a sequence is $H = \log_2(K^N) = N \log_2(K)$. For DNA, where $K = 4$, the capacity is simply $2N$ bits. This means a tiny gene segment of just 50 nucleotides can already hold $100$ bits of information, allowing it to specify one of over $10^{30}$ distinct sequences. Life, it turns out, is a master of information processing, using a simple code to store a staggering amount of data.
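The 50-nucleotide figure is a direct application of $N \log_2(K)$; a short sketch in Python (the function name is ours):

```python
import math

def sequence_capacity_bits(length, alphabet_size=4):
    """Hartley capacity of a length-N sequence over a K-letter alphabet:
    N * log2(K) bits. Default K=4 for the DNA nucleotides A, T, C, G."""
    return length * math.log2(alphabet_size)

print(sequence_capacity_bits(50))   # 100.0 bits for a 50-nucleotide segment
print(4 ** 50 > 10 ** 30)           # over 10^30 distinct sequences: True
```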

The story gets even more interesting when we look at how this DNA message is translated into proteins. The genetic code exhibits a feature called "degeneracy," where multiple three-letter "codons" can specify the same amino acid. For instance, the amino acid leucine is encoded by six different synonymous codons. At first glance, this might seem redundant. But from an information theory viewpoint, it is a feature, not a bug.

If a choice must be made between $k$ equally likely options, the uncertainty resolved by that choice is $\log_2(k)$ bits. For an organism to encode leucine at a specific position in a protein, it must choose one of the 6 available codons. This choice itself represents a channel with a capacity of $\log_2(6) \approx 2.585$ bits. This "redundancy" provides a hidden layer, a separate information channel embedded within the primary genetic code. While the choice of a synonymous codon doesn't change the final protein, it can carry other signals that, for example, regulate the speed of protein synthesis.

This hidden channel is not just a theoretical curiosity; it constitutes a real "steganographic capacity" within the gene. We can quantify it. By going through a gene codon by codon and summing the information capacity at each position—adding $\log_2(2)$ for a position with two synonymous choices, $\log_2(4)$ for one with four, and so on—we can calculate the total number of bits that can be encoded in synonymous codon choices without altering the protein sequence. For a typical gene, this hidden information can amount to thousands of bits, a vast, secondary message written right on top of the primary one.
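The summation is easy to sketch in Python. The degeneracy counts below are from the standard genetic code, but the table covers only a handful of amino acids, and the five-residue peptide is made up purely to illustrate the bookkeeping:

```python
import math

# Synonymous-codon counts for some amino acids in the standard genetic code.
degeneracy = {"Leu": 6, "Ser": 6, "Arg": 6, "Ala": 4, "Gly": 4,
              "Lys": 2, "Met": 1, "Trp": 1}

def hidden_capacity_bits(protein):
    """Steganographic capacity: sum of log2(k) over each position's
    k synonymous codon choices. Positions with k=1 contribute 0 bits."""
    return sum(math.log2(degeneracy[aa]) for aa in protein)

# A made-up five-residue peptide:
peptide = ["Met", "Leu", "Ala", "Gly", "Lys"]
print(hidden_capacity_bits(peptide))   # log2(6) + 2 + 2 + 1 = about 7.58 bits
```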

The Universe as Information: Physics from the Small to the Large

The connection between counting possibilities and a physical property finds its deepest roots in the field of statistical mechanics. Consider a simplified model of a computer memory strip: a one-dimensional lattice with $M$ sites where you can place $N$ indistinguishable electrons. The number of unique ways to arrange these electrons is given by the famous combinatorial formula:

$$\Omega = \binom{M}{N}$$

In the 19th century, Ludwig Boltzmann proposed that the thermodynamic entropy of this system—a measure of its disorder—was directly related to this number of arrangements:

$$S = k_B \ln \Omega$$

where $k_B$ is the Boltzmann constant.

Now, look at this formula through the lens of information theory. If every one of the $\Omega$ arrangements is equally likely, the Hartley information required to specify one particular arrangement is $H = \log_2(\Omega)$. Boltzmann's thermodynamic entropy and Hartley's information entropy are describing the exact same thing. They are merely measured in different units ($k_B \ln$ versus $\log_2$). The entropy of a physical system is the information we lack about its precise microscopic state. Disorder is just missing information.
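The unit equivalence can be verified numerically. The lattice sizes below are illustrative choices, not values from the text:

```python
import math

# Lattice with M sites and N indistinguishable electrons (illustrative sizes).
M, N = 20, 5
omega = math.comb(M, N)            # number of microstates, C(20, 5) = 15504

k_B = 1.380649e-23                 # Boltzmann constant, J/K
S = k_B * math.log(omega)          # Boltzmann entropy, J/K
H = math.log2(omega)               # Hartley information, bits

# Same quantity in different units: dividing out k_B * ln(2) recovers the bits.
print(abs(S / (k_B * math.log(2)) - H) < 1e-6)   # True
```

The conversion factor $k_B \ln 2$ is, in effect, the "Joules per Kelvin per bit" exchange rate between thermodynamic and informational entropy.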

This principle holds all the way down to the quantum realm. If a single electron is confined such that it can occupy one of $N = 5 \times 10^{20}$ distinct quantum states, the amount of information needed to specify its exact state is simply $\log(N)$. Information is not an abstract human invention; it is a physical quantity, woven into the very fabric of quantum reality.

From the infinitesimally small, we can leap to the cosmically large. Is there a limit to how much information can be packed into a region of space? Remarkably, it seems there is. Inspired by the thermodynamics of black holes, Jacob Bekenstein proposed a universal bound on the information content of any system, limited by its energy and size. By considering an idealized system, such as a volume of space filled with a thermal photon gas, and pushing it to this theoretical limit, one can derive an expression for the maximum possible information density. Astonishingly, this maximum density depends only on the temperature and a collection of fundamental constants like the speed of light and Planck's constant. This suggests that there is no such thing as infinite information density. The universe itself appears to have a finite hard drive capacity, a fundamental limit quantified by the same logic that began with counting the possibilities of a telegraph signal.

A Note of Caution: Information, Correlation, and Energy

Having taken this grand tour from telegraphs to black holes, it is essential to end with a word of caution, as any good scientist must. It is easy to become so enamored with a powerful idea that one sees it everywhere, as the master key to all puzzles.

Consider the electrons in a molecule. Their motions are not independent; they are correlated by electrostatic repulsion and quantum mechanics. The position of one electron tells you something about where the others are likely to be. From an information perspective, this correlation can be quantified using measures like "mutual information." At the same time, this correlation has energetic consequences; it lowers the total energy of the system compared to a hypothetical state of non-interacting electrons. This energy difference is a crucial quantity in quantum chemistry, known as the "correlation energy."

Are these two quantities—the information measure and the energy measure—just different names for the same thing? The answer is a definitive no. While they are conceptually linked, both arising from the same physical interactions, there is no simple, universal formula that converts one into the other. Correlation energy is an energy, measured in Joules or Hartrees. Mutual information is a dimensionless quantity measuring uncertainty reduction. They are different physical properties. While a more strongly correlated system will often exhibit both a higher mutual information and a larger (in magnitude) correlation energy, one is not a simple function of the other.

This is a crucial lesson. The lens of information theory is incredibly powerful. It provides a common language to describe processes in engineering, biology, and physics, revealing deep and unexpected unities. But it is one lens among many. The map, however beautiful and useful, is not the territory. Understanding the relationships—and the differences—between concepts like information, energy, and entropy is what allows us to build a richer, more complete picture of our world.