
In the study of linear algebra, certain properties of a matrix, like its eigenvalues, remain constant under transformations, acting as its fundamental fingerprint. In contrast, other properties, such as the entries on its main diagonal, can change dramatically depending on the chosen perspective or basis. This raises a crucial question: is there a hidden law governing the relationship between the fixed, intrinsic eigenvalues of a matrix and its variable diagonal entries? This gap in understanding prevents a full appreciation of how a system's core properties manifest in specific measurements.
This article bridges that gap by delving into the Schur-Horn theorem, one of the most elegant results in matrix theory. We will explore how this theorem provides a precise and powerful answer to our question. The first chapter, "Principles and Mechanisms," will unpack the core of the theorem, introducing the concept of majorization and revealing the mathematical machinery that connects eigenvalues to their diagonal counterparts. Following this, the chapter on "Applications and Interdisciplinary Connections" will demonstrate how this abstract principle has profound, practical implications in fields ranging from quantum mechanics to engineering optimization. Let's begin by exploring the tale of these two sets of numbers and the beautiful rules that bind them.
Imagine you have a block of clay. You can shape it into a sphere, a cube, or a long, thin rod. In all these transformations, the volume of clay remains constant, but its dimensions—its length, width, and height—change dramatically. Matrix theory has a surprisingly similar story. For a special class of matrices called Hermitian matrices (which are central to quantum mechanics and many areas of physics), there's a set of fundamental numbers called eigenvalues that are like the total amount of clay. They are an intrinsic property of the matrix and don't change, no matter how you "rotate" your perspective. But the numbers on the matrix's main diagonal, like the dimensions of our clay block, do change with our perspective. The fascinating question is: how are these two sets of numbers—the immutable eigenvalues and the changeable diagonal entries—related?
The answer is one of the most elegant results in linear algebra, the Schur-Horn theorem. It's not just a dry formula; it's a story about constraints, about how much you can concentrate or spread out a set of values. It's a principle that governs everything from the possible energy measurements in a quantum system to the solution of optimization problems.
Let's get our characters straight. A Hermitian matrix is a square matrix that is equal to its own conjugate transpose. A key feature is that its eigenvalues are always real numbers. You can think of them as the "true" or "natural" scaling factors of the system the matrix describes. For example, in quantum mechanics, they represent the fixed, quantized energy levels of a physical system.
On the other hand, the diagonal entries represent what we "see" from a particular point of view, or in the language of physics, a particular basis. Changing the basis (which is like rotating our coordinate system) transforms the matrix $A$ into a new matrix $UAU^*$, where $U$ is a unitary matrix. This leaves the eigenvalues untouched, but it can completely change the diagonal entries.
So, our story is about the relationship between the vector of eigenvalues, let's call it $\lambda = (\lambda_1, \dots, \lambda_n)$, and the vector of diagonal entries, let's call it $d = (d_1, \dots, d_n)$.
The most straightforward connection between the eigenvalues and the diagonal is their sum. The sum of the diagonal entries of a matrix $A$ is called its trace, denoted $\operatorname{tr}(A)$. It's a remarkable fact that the trace is also equal to the sum of the eigenvalues: $\operatorname{tr}(A) = \sum_i a_{ii} = \sum_i \lambda_i$.
This is a powerful first constraint. If the eigenvalues of a quantum system are, say, $\lambda = (5, 4, 3)$, their sum is $12$. This means any possible set of diagonal entries that you could ever hope to measure must also sum to $12$. This is our "conservation of clay" rule.
But this can't be the whole story. The vector $(7, 3, 2)$ also sums to 12, but we'll soon see it's an impossible set of diagonal entries for a matrix with eigenvalues $(5, 4, 3)$. There must be a subtler, more profound law at play.
The deeper relationship discovered by Issai Schur in 1923 is a concept called majorization. In simple terms, majorization is a precise mathematical way of saying that one vector is "more spread out" than another. The Schur-Horn theorem tells us that the vector of eigenvalues is always more spread out than the vector of its diagonal entries.
Let's make this concrete. Take two vectors of real numbers, $x$ and $y$, each with $n$ components. First, sort them both in descending order; call the sorted versions $x^\downarrow$ and $y^\downarrow$. We say that $x$ is majorized by $y$, written as $x \prec y$, if two conditions hold:
1. The sum of the $k$ largest entries of $x$ is less than or equal to the sum of the $k$ largest entries of $y$, for every $k$ from $1$ to $n$: $\sum_{i=1}^{k} x_i^\downarrow \le \sum_{i=1}^{k} y_i^\downarrow$.
2. Their total sums are equal: $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$.
The second condition is just our old friend, the trace rule. The first condition is the new, subtle part. It puts a limit on how "top-heavy" the diagonal entries can be. The single largest diagonal entry can't be bigger than the single largest eigenvalue. The sum of the two largest diagonal entries can't be bigger than the sum of the two largest eigenvalues, and so on.
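These two conditions translate directly into a few lines of code. Here is a minimal sketch in Python (the function name `is_majorized_by` is our own choice):

```python
import numpy as np

def is_majorized_by(x, y, tol=1e-9):
    """Return True if x is majorized by y: every partial sum of the
    descending-sorted x is bounded by the corresponding partial sum
    of y, and the totals agree."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    return bool(np.all(np.cumsum(xs) <= np.cumsum(ys) + tol)
                and abs(xs.sum() - ys.sum()) <= tol)

# A uniform vector is majorized by a spikier one with the same sum.
print(is_majorized_by([1, 1, 1], [3, 0, 0]))   # True
```

The tolerance parameter guards against floating-point noise when the inputs come from a numerical eigenvalue solver.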
Let's see this in action. Consider a simple example: the $3 \times 3$ all-ones Hermitian matrix, with every entry equal to $1$.
Its eigenvalues can be calculated to be $3, 0, 0$, which sorted are $\lambda^\downarrow = (3, 0, 0)$. The diagonal entries are $(1, 1, 1)$, which sorted are $d^\downarrow = (1, 1, 1)$.
Now let's check the majorization conditions for $d \prec \lambda$: first $1 \le 3$, then $1 + 1 \le 3 + 0$, and finally $1 + 1 + 1 = 3 + 0 + 0$.
All conditions are met! The vector of diagonal entries is indeed majorized by the vector of eigenvalues. The "gap" between the partial sums, such as the $3 - 1 = 2$ at the first step here, quantifies how much "smoother" the diagonal is compared to the spiky eigenvalues.
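As a quick numerical sanity check (a sketch assuming NumPy, using the $3 \times 3$ all-ones matrix as a concrete test case):

```python
import numpy as np

A = np.ones((3, 3))                          # 3x3 all-ones Hermitian matrix
lam = np.sort(np.linalg.eigvalsh(A))[::-1]   # eigenvalues, descending: ~(3, 0, 0)
d = np.sort(np.diag(A))[::-1]                # diagonal entries, descending: (1, 1, 1)

# The diagonal's partial sums never outrun the eigenvalues' partial sums.
majorized = np.all(np.cumsum(d) <= np.cumsum(lam) + 1e-9)
print(majorized)   # True
```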
So, why does this happen? The reason is beautiful and lies at the heart of quantum mechanics and linear algebra. The diagonal entries aren't independent of the eigenvalues; they are, in fact, a special kind of average of them.
Any Hermitian matrix $A$ can be written as $A = U \Lambda U^*$, where $\Lambda$ is a diagonal matrix containing the eigenvalues $\lambda_1, \dots, \lambda_n$ and $U$ is a unitary matrix whose columns are the corresponding orthonormal eigenvectors. If we write out the formula for a single diagonal entry $a_{ii}$, we find something remarkable: $a_{ii} = \sum_{j=1}^{n} |u_{ij}|^2 \, \lambda_j$.
Look closely at this equation. Each diagonal entry $a_{ii}$ is a weighted average of all the eigenvalues $\lambda_j$. The weights are the numbers $|u_{ij}|^2$. And what are these weights? Since $U$ is a unitary matrix, the sum of the squared magnitudes of the elements in any row is 1 ($\sum_j |u_{ij}|^2 = 1$), and in any column is also 1 ($\sum_i |u_{ij}|^2 = 1$). A matrix of non-negative numbers whose rows and columns all sum to 1 is called a doubly stochastic matrix.
So, the diagonal entries are born from the eigenvalues through a "mixing process" described by this doubly stochastic matrix $B$, with entries $B_{ij} = |u_{ij}|^2$. Averaging things tends to smooth them out and make them less extreme. Imagine having buckets of paint with different shades of red (the eigenvalues). A doubly stochastic matrix is like a recipe for creating new shades (the diagonal entries) by mixing the original ones. The new shades will never be more vibrant or extreme than the most vibrant original shade. This is the physical intuition behind majorization!
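We can watch this mixing happen numerically. The sketch below (assuming NumPy; the spectrum is an arbitrary illustrative choice) builds a random unitary, forms a Hermitian matrix from it, and confirms that the diagonal is exactly the doubly stochastic mix of the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random unitary: orthonormalize a complex Gaussian matrix via QR.
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(Z)

lam = np.array([5.0, 4.0, 3.0, 0.0])   # any spectrum will do
A = U @ np.diag(lam) @ U.conj().T      # Hermitian matrix with eigenvalues lam

B = np.abs(U) ** 2                     # mixing weights B_ij = |u_ij|^2
print(np.allclose(B.sum(axis=0), 1), np.allclose(B.sum(axis=1), 1))  # doubly stochastic
print(np.allclose(np.diag(A).real, B @ lam))                         # diagonal = B @ lam
```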
Schur proved that the diagonal is always majorized by the eigenvalues. But the story got even better. In 1954, Alfred Horn proved the converse: if a vector $d$ is majorized by a vector $\lambda$, then you are guaranteed to be able to find a Hermitian matrix with eigenvalues $\lambda$ and diagonal entries $d$.
This "if and only if" result is incredibly powerful. It gives us a complete characterization of all possible outcomes. Going back to our quantum system with eigenvalues $(5, 4, 3)$, we can now definitively check which sets of measurements are possible. A proposed diagonal of $(7, 3, 2)$ is impossible because its largest value, $7$, is greater than the largest eigenvalue, $5$, violating the first majorization inequality. However, $(4, 4, 4)$ is possible because it satisfies all the majorization rules.
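In the smallest nontrivial case, Horn's construction can even be written down by hand. For a $2 \times 2$ real symmetric matrix with eigenvalues $\lambda_1 \ge \lambda_2$, prescribing any diagonal entry $d_1$ between them forces the rest of the matrix (a sketch; the function name is ours):

```python
import numpy as np

def horn_2x2(lam1, lam2, d1):
    """Build a real symmetric 2x2 matrix with eigenvalues (lam1, lam2)
    and first diagonal entry d1, assuming lam2 <= d1 <= lam1."""
    d2 = lam1 + lam2 - d1                    # the trace fixes the other entry
    b = np.sqrt((lam1 - d1) * (d1 - lam2))   # off-diagonal entry that makes it work
    return np.array([[d1, b], [b, d2]])

A = horn_2x2(3.0, 1.0, 2.5)
print(np.diag(A))              # [2.5 1.5]
print(np.linalg.eigvalsh(A))   # ~[1. 3.]
```

One can check directly that the trace and determinant of this matrix are $\lambda_1 + \lambda_2$ and $\lambda_1 \lambda_2$, so its eigenvalues are exactly the prescribed ones.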
The set of all possible diagonal vectors that can be formed from a given set of eigenvalues has a beautiful geometric structure. It forms a convex polytope in $n$-dimensional space called a permutohedron. The vertices of this shape are simply all the permutations of the eigenvalue vector $\lambda$, like $(\lambda_1, \lambda_2, \lambda_3)$, $(\lambda_2, \lambda_1, \lambda_3)$, $(\lambda_3, \lambda_1, \lambda_2)$, and so on. Any achievable diagonal vector is just a point inside or on the boundary of this shape! It is a convex combination of the vertices. This transforms a problem in matrix algebra into a stunningly clear picture in geometry.
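This geometric picture can be probed directly: take any convex combination of the permuted eigenvalue vectors (that is, any point of the permutohedron) and check that it is majorized by $\lambda$. A small sketch with an arbitrary example spectrum:

```python
import itertools
import numpy as np

lam = np.array([3.0, 1.0, 0.0])
vertices = [np.array(p) for p in itertools.permutations(lam)]  # the 6 corners

rng = np.random.default_rng(1)
w = rng.random(len(vertices))
w /= w.sum()                                  # random convex weights
point = sum(wi * v for wi, v in zip(w, vertices))

# Any point of the permutohedron is an achievable diagonal: it is majorized by lam.
d = np.sort(point)[::-1]
ok = np.all(np.cumsum(d) <= np.cumsum(np.sort(lam)[::-1]) + 1e-9)
print(ok, np.isclose(point.sum(), lam.sum()))   # True True
```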
The power of this core idea—that diagonals are a "convex combination" of eigenvalues—extends even beyond the world of real-numbered eigenvalues. It also applies to normal matrices, which are matrices that commute with their conjugate transpose ($AA^* = A^*A$). These matrices can have complex eigenvalues and complex diagonal entries.
Even in this more general setting, the relationship holds: each diagonal entry of the matrix is a convex combination of the eigenvalues $\lambda_1, \dots, \lambda_n$. This allows us to solve interesting optimization problems. For instance, if we want to maximize the sum of the magnitudes of the diagonal entries, $\sum_i |a_{ii}|$, for a normal matrix with a given set of eigenvalues, the principle of convexity tells us the maximum must occur at an extreme point. The "most extreme" or "least mixed" cases are when the doubly stochastic matrix is a permutation matrix. This means the diagonal entries are simply a permutation of the eigenvalues themselves.
So, to get the largest possible sum of magnitudes, you just need to set the diagonal entries to be the eigenvalues, and the maximum value is simply the sum of the magnitudes of those eigenvalues. What begins as a simple question about matrices unfolds into a deep principle connecting algebra, geometry, and physics, revealing a hidden order and unity in the mathematical world.
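A quick numerical illustration of this bound (a sketch assuming NumPy; the complex spectrum is an arbitrary choice): conjugating by a random unitary only ever shrinks the diagonal's total magnitude relative to the unmixed, diagonal case.

```python
import numpy as np

rng = np.random.default_rng(2)
eigs = np.array([1 + 1j, -2 + 0j, 3j])   # spectrum of a normal matrix

Z = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(Z)
N = U @ np.diag(eigs) @ U.conj().T       # normal matrix with the same spectrum

mixed = np.abs(np.diag(N)).sum()         # sum of |diagonal| after mixing
best = np.abs(eigs).sum()                # attained by the diagonal matrix itself
print(mixed <= best + 1e-9)              # True: mixing can't beat a permutation
```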
After our journey through the elegant proofs and geometric underpinnings of the Schur-Horn theorem, you might be wondering, "What is this all for?" It is a fair question. Mathematics is often presented as a pristine, abstract structure, and it is easy to lose sight of its power to describe and constrain the world we live in. The Schur-Horn theorem, however, is not a mere curiosity of matrix algebra. It is a surprisingly practical and profound tool, a sharp lens through which we can understand limits and possibilities in fields as diverse as engineering optimization and the strange realm of quantum mechanics.
Think of it this way: the eigenvalues of a Hermitian matrix are its intrinsic, unchanging essence. They are like the total amount of energy, momentum, or some other conserved quantity in a physical system. The diagonal entries, on the other hand, represent how that essence is distributed or observed in a particular coordinate system or basis. The Schur-Horn theorem is the fundamental law that governs this distribution. It tells us that while you can shuffle the energy around, you cannot do so arbitrarily. There are hard limits, and majorization provides the precise rules of this game.
The most direct application of the theorem is in the world of optimization. If the diagonal of a matrix represents costs, probabilities, or physical measurements, the Schur-Horn theorem tells us the absolute best- and worst-case scenarios for these values, given a fixed set of eigenvalues.
Imagine you have designed a system—perhaps a mechanical structure or an electrical network—and its fundamental modes of vibration or response are given by a set of eigenvalues. The diagonal entries of the system's matrix might represent the stress or load on specific components. A natural question is: what is the maximum stress any single component might have to endure? The theorem gives a startlingly simple answer: no single diagonal entry can ever be larger than the largest eigenvalue. But it tells us more. Suppose we want to make the system as "balanced" as possible by minimizing the largest stress on any component. The majorization inequalities allow us to calculate the absolute minimum value that this largest diagonal entry can take. Often, this minimum is achieved when the diagonal entries are as uniform, or "democratic," as possible. The theorem provides the precise lower bound, a guaranteed safety margin for our design.
We can ask more sophisticated questions. Instead of just one component, what is the maximum total stress we can find concentrated in a specific subsystem, say, the first two components? That is, what is the maximum of $a_{11} + a_{22}$? Once again, majorization provides the answer: this sum can never exceed the sum of the two largest eigenvalues, $\lambda_1 + \lambda_2$. The set of all possible diagonal vectors forms a beautiful geometric object known as a permutohedron—the convex hull of all permutations of the eigenvalues. Maximizing a sum like $a_{11} + a_{22}$ is equivalent to finding the point on this shape that is farthest in a particular direction, which will always be one of the corners corresponding to a specific permutation of the eigenvalues. This transforms a complex matrix problem into a more intuitive geometric one.
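Because the optimum of a linear functional over the permutohedron sits at a vertex, the maximum of $a_{11} + a_{22}$ can be found by scanning permutations of the spectrum. A tiny sketch with an illustrative spectrum:

```python
import itertools

lam = [5, 4, 3, 0]   # illustrative eigenvalues

# Each vertex of the permutohedron is a permutation of lam; a linear
# functional like d1 + d2 is maximized at one of these corners.
best = max(p[0] + p[1] for p in itertools.permutations(lam))
print(best)   # 9 = 5 + 4, the sum of the two largest eigenvalues
```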
The theorem also reveals a hidden conservation law. The "total size" of a matrix, as measured by the sum of the squared magnitudes of all its elements (the squared Frobenius norm, $\|A\|_F^2 = \sum_{i,j} |a_{ij}|^2$), is completely determined by its eigenvalues: $\|A\|_F^2 = \sum_i \lambda_i^2$. This quantity is fixed, a constant of the system. We can also write this sum as the contribution from the diagonal and the off-diagonal elements: $\|A\|_F^2 = \sum_i |a_{ii}|^2 + \sum_{i \neq j} |a_{ij}|^2$.
Now, let's put these two facts together. If we know the eigenvalues and we also know the diagonal entries, the Schur-Horn theorem first tells us if this combination is even possible. If it is, then the total magnitude of all the off-diagonal elements is no longer a variable; it is fixed! It is whatever is "left over" after the diagonal has taken its share of the total squared norm defined by the eigenvalues. This is a powerful statement. If you try to force the diagonal entries to be very different from the eigenvalues, the off-diagonal elements must grow in magnitude to compensate. There is no escape; the matrix elements are locked in a deep relationship, and the Schur-Horn theorem is its constitution.
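This bookkeeping is easy to verify numerically (a sketch assuming NumPy, with an arbitrary spectrum and a random unitary):

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(Z)

lam = np.array([5.0, 4.0, 3.0, 0.0])
A = U @ np.diag(lam) @ U.conj().T

total = np.sum(lam ** 2)                        # ||A||_F^2, fixed by the spectrum
diag_part = np.sum(np.abs(np.diag(A)) ** 2)
off_part = np.sum(np.abs(A) ** 2) - diag_part   # whatever is "left over"

print(np.isclose(diag_part + off_part, total))  # True: the budget always balances
```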
The connection to the real world becomes astonishingly direct when we step into the quantum realm. In quantum information theory, the state of a system is described by a density matrix, $\rho$, which is a Hermitian, positive semi-definite matrix with a trace of one. These constraints are not just mathematical conventions; they are physical laws.
The eigenvalues of $\rho$ are fundamental properties of the quantum state, related to its purity and information content. The diagonal elements, $\rho_{ii}$, in a given basis, have a direct physical meaning: they are the probabilities of finding the system in the corresponding basis state upon measurement. A change of basis, which corresponds to looking at the system from a different angle, is represented by a unitary transformation, $\rho \mapsto U \rho U^\dagger$. This changes the diagonal elements, but not the eigenvalues.
So, the question "Given a quantum state with a specific spectrum, what are the possible probabilities we can measure?" is precisely the question the Schur-Horn theorem answers. The vector of probabilities is majorized by the vector of eigenvalues. This allows us to calculate, for instance, the minimum possible value for the largest measurement probability. The answer turns out to be tremendously insightful: we can often find a basis where all measurement outcomes are equally likely, up to the limits imposed by majorization. The theorem can also solve more complex, constrained problems, such as finding the range of possible probabilities when an experiment imposes certain symmetries or conditions on the state.
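For example, conjugating by the discrete Fourier unitary (all of whose entries have magnitude $1/\sqrt{n}$) makes every measurement outcome equally likely, regardless of the spectrum. A sketch for a three-level system with an arbitrary illustrative spectrum:

```python
import numpy as np

lam = np.array([0.7, 0.2, 0.1])   # spectrum of a density matrix (sums to 1)
rho = np.diag(lam)

n = len(lam)
k = np.arange(n)
F = np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)   # DFT unitary: |F_ij|^2 = 1/n

rho_rotated = F @ rho @ F.conj().T
probs = np.diag(rho_rotated).real   # measurement probabilities in the new basis
print(np.round(probs, 6))           # each equals 1/3: perfectly uniform outcomes
```

This works because the uniform vector $(1/n, \dots, 1/n)$ is majorized by every probability vector, so it is always an achievable diagonal.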
Perhaps the most beautiful application is in quantifying "quantumness" itself. The off-diagonal elements of a density matrix are responsible for quantum coherence—the property that allows for superposition and interference, the heart of quantum mechanics. A natural question is: for a state with a given energy spectrum (eigenvalues), what is the maximum amount of coherence it can possibly store? This is a question about maximizing the sum of the squared magnitudes of the off-diagonal elements. Using the "conservation law" we discussed earlier, this is equivalent to minimizing the sum of the squares of the diagonal elements (the probabilities). The Schur-Horn theorem, via the theory of Schur-convex functions, tells us exactly how to do this: the sum is minimized when the probabilities are as uniform as possible. This reveals a deep trade-off: to maximize a state's quantum coherence, you must spread its classical probabilities as thinly as possible. The quantum and classical aspects of a state are entwined, and the Schur-Horn theorem dictates the terms of their relationship.
Finally, the principles underlying the Schur-Horn theorem echo throughout mathematics. The core ideas of majorization, doubly stochastic matrices, and optimization over permutations are not isolated. For example, in problems involving the minimization or maximization of trace functionals, like $\operatorname{tr}(AB)$, the solution often involves the rearrangement inequality, which states that a sum of pairwise products $\sum_i a_i b_i$ is minimized when one sequence is sorted in ascending order and the other in descending order. This is no coincidence. The proof that this minimum is achieved often passes through the very same logic of doubly stochastic matrices and permutation vertices that underpins Horn's part of our theorem.
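The rearrangement inequality is easy to check by brute force for small vectors (a self-contained sketch):

```python
import itertools

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# All possible pairings of a with a permutation of b.
values = [sum(x * y for x, y in zip(a, p)) for p in itertools.permutations(b)]

opposite = sum(x * y for x, y in zip(sorted(a), sorted(b, reverse=True)))
aligned = sum(x * y for x, y in zip(sorted(a), sorted(b)))
print(min(values) == opposite, max(values) == aligned)   # True True
```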
This family of ideas extends to powerful results in numerical analysis and data science, such as in matrix proximity problems. If you want to find the matrix in the unitary orbit of a diagonal matrix $\Lambda$ that is closest to a given matrix $B$, the answer is given by a theorem that can be seen as a generalization of Schur-Horn principles. You must align the singular values of $B$ with the eigenvalues of $\Lambda$ in the right way—a striking parallel to the rearrangement inequality. This result is fundamental for matrix approximation algorithms used in everything from signal compression to machine learning.
From a simple statement about the diagonals and eigenvalues of a single matrix, we have found a principle that constrains engineering designs, quantifies the essence of quantum states, and resonates with deep theorems in optimization and analysis. It is a testament to the unity of science and mathematics, where a single, elegant idea can illuminate a vast landscape of different fields, revealing the hidden rules that govern them all.