
Lidskii Theorem

Key Takeaways
  • The Lidskii theorem states that the vector of eigenvalues from a sum of Hermitian matrices is majorized by the vector formed by summing their individual eigenvalues.
  • Majorization is a mathematical concept that rigorously shows how the interaction of non-commuting matrices makes the resulting eigenvalues less spread out than a simple sum would suggest.
  • In perturbation theory, the theorem provides a crucial upper bound on how much a system's eigenvalues can shift, guaranteeing stability in fields like quantum mechanics and engineering.
  • The principle extends to non-Hermitian matrices, showing the real parts of their eigenvalues are majorized by the eigenvalues of their purely conservative Hermitian part.

Introduction

The addition of matrices, a fundamental operation in linear algebra, holds a deceptive simplicity. While adding corresponding elements is trivial, predicting the properties of the resulting matrix—specifically its eigenvalues—is a profound challenge. For simple, commuting matrices, eigenvalues add up predictably. However, when matrices do not commute, as is common in quantum mechanics and data science, their interactions introduce a seeming chaos, where the eigenvalues of the sum are not the simple sum of the parts. This article addresses this gap, revealing the elegant mathematical order that governs this complexity.

The following chapters will guide you through this principle. First, in "Principles and Mechanisms," we will introduce the concept of majorization and explore the Lidskii theorem itself, which establishes a clear rule for how eigenvalues of a sum behave. Subsequently, in "Applications and Interdisciplinary Connections," we will demonstrate the theorem's immense practical power, showing its use in analyzing system stability in quantum physics, its generalization in functional analysis, and its surprising links to other areas of mathematics. We begin by uncovering the core principles that tame the apparent anarchy of matrix addition.

Principles and Mechanisms

Imagine you have two ingredients. If you mix a cup of water at 20°C with a cup of water at 80°C, you get two cups of water at 50°C. The properties add up and average out in a simple, predictable way. But what if you mix baking soda and vinegar? You don't just get a simple mixture; you get a fizzing, bubbling reaction that produces something entirely new. The properties of the sum are not the sum of the properties.

The world of matrices, the mathematical heart of quantum mechanics and data science, is much more like mixing chemicals than mixing water. Adding two matrices is easy—you just add the corresponding numbers. But understanding the properties of the resulting matrix is a far more subtle and beautiful adventure. The most important properties of a matrix are its eigenvalues—special numbers that represent its fundamental characteristics, like the vibrational frequencies of a bridge, the energy levels of an atom, or the principal components of a dataset. So, the crucial question becomes: if we know the eigenvalues of two matrices, $A$ and $B$, what can we say about the eigenvalues of their sum, $C = A + B$?

The Deceptively Simple Sum

Let’s start in a peaceful, orderly world. Some matrices are like well-behaved ingredients that don't interact in surprising ways. These are commuting matrices, where the order of multiplication doesn't matter ($AB = BA$). A classic example is a pair of diagonal matrices—matrices with non-zero numbers only on the main diagonal.

If we take $A = \mathrm{diag}(5, 3, 1)$ and $B = \mathrm{diag}(4, 2, 1)$, their eigenvalues are simply their diagonal entries: $\lambda(A) = (5, 3, 1)$ and $\lambda(B) = (4, 2, 1)$. Their sum is also a diagonal matrix, $A + B = \mathrm{diag}(9, 5, 2)$, whose eigenvalues are just the sums of the individual eigenvalues: $\lambda(A+B) = (9, 5, 2)$. Everything is simple and additive. This scenario, where the eigenvalues of the sum are the sum of the eigenvalues, happens when the matrices share a common set of "principal axes," or eigenvectors. They act along the same directions, so their effects simply add up.
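
This additive behavior is easy to confirm numerically. The sketch below (assuming numpy is available) compares the eigenvalues of the sum with the sums of the individual eigenvalues for the diagonal matrices above:

```python
import numpy as np

A = np.diag([5.0, 3.0, 1.0])
B = np.diag([4.0, 2.0, 1.0])

# Eigenvalues of the sum, sorted descending.
lam_of_sum = np.sort(np.linalg.eigvalsh(A + B))[::-1]
# Sum of the individual eigenvalues, each sorted descending.
sum_of_lams = np.sort(np.linalg.eigvalsh(A))[::-1] + np.sort(np.linalg.eigvalsh(B))[::-1]

print(lam_of_sum)   # [9. 5. 2.]
print(sum_of_lams)  # [9. 5. 2.]
```

For commuting matrices the two vectors coincide exactly, which is the baseline the rest of the article departs from.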

Anarchy, or a Deeper Order?

But most of the time, matrices don’t commute. They represent rotations, stretches, and shears along different axes. Their effects interfere. This is where the chaos—and the beauty—begins.

Consider two very simple matrices that represent interactions on different axes of a 3D system:

$$A = \begin{pmatrix} 0 & a & 0 \\ a & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & b \\ 0 & b & 0 \end{pmatrix}$$

$A$ shuffles energy between the first and second dimensions, while $B$ shuffles it between the second and third. Their eigenvalues are easy to find: $\lambda(A) = (a, 0, -a)$ and $\lambda(B) = (b, 0, -b)$. If life were simple, we'd expect the eigenvalues of their sum to be the sum of their eigenvalues, which, after sorting, would be the vector $v = (a+b, 0, -a-b)$.

But let's look at the sum matrix, $C = A + B$:

$$C = \begin{pmatrix} 0 & a & 0 \\ a & 0 & b \\ 0 & b & 0 \end{pmatrix}$$

The eigenvalues of this new matrix are not what we might have guessed. They are, in fact, the vector $c = (\sqrt{a^2+b^2}, 0, -\sqrt{a^2+b^2})$. The vector $c$ is clearly different from $v$! The interaction between the matrices has fundamentally altered the outcome.

Is there any pattern here? It seems like anarchy. But look closer. The total sum is the same: $(a+b) + 0 + (-a-b) = 0$, and $\sqrt{a^2+b^2} + 0 + (-\sqrt{a^2+b^2}) = 0$. And for the largest eigenvalue, we notice that $a+b \ge \sqrt{a^2+b^2}$ (for non-negative $a$ and $b$). It seems the "expected" eigenvalues from simple addition are more spread out than the "actual" ones. This isn't anarchy; it's a hint of a deeper, more elegant law.
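
A quick numerical experiment confirms both the formula and the inequality; here we use the illustrative values $a = 3$, $b = 4$ (a sketch assuming numpy):

```python
import numpy as np

a, b = 3.0, 4.0
A = np.array([[0, a, 0], [a, 0, 0], [0, 0, 0]])
B = np.array([[0, 0, 0], [0, 0, b], [0, b, 0]])

c = np.sort(np.linalg.eigvalsh(A + B))[::-1]  # actual eigenvalues of A + B
v = np.sort(np.linalg.eigvalsh(A))[::-1] + np.sort(np.linalg.eigvalsh(B))[::-1]

print(c)  # approximately [ 5.  0. -5.], since sqrt(3^2 + 4^2) = 5
print(v)  # [ 7.  0. -7.]
```

The actual spectrum $(5, 0, -5)$ is visibly less spread out than the naive sum $(7, 0, -7)$.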

Majorization: A Tool for Taming the Spread

To describe this law, we need a wonderful mathematical tool called majorization. It's a way of saying that one collection of numbers is "more spread out" or "more concentrated" than another.

Let's say we have two vectors of numbers, $x$ and $y$, each sorted from largest to smallest. We say that $x$ majorizes $y$, written $x \succ y$, if two conditions are met:

  1. Every partial sum of $x$ (the sum of its $k$ largest entries, for each $k$) is greater than or equal to the corresponding partial sum of $y$.
  2. The total sum of all numbers in $x$ is equal to the total sum of numbers in $y$.

For instance, the vector $x = (10, 1, 1)$ majorizes $y = (6, 4, 2)$. Both sum to 12. But the partial sums show the difference in spread:

  • $10 \ge 6$
  • $10 + 1 \ge 6 + 4$

The wealth in $x$ is highly concentrated in the first element, while in $y$ it is more evenly distributed. Majorization captures this idea of "less spread out" with mathematical rigor.
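
A small helper makes the definition concrete. The function name `majorizes` is our own; it simply checks the two conditions on descending-sorted copies of the vectors:

```python
import numpy as np

def majorizes(x, y, tol=1e-9):
    """Return True if x majorizes y (x ≻ y): equal totals, and every
    partial sum of descending-sorted x dominates that of y."""
    x = np.sort(np.asarray(x, dtype=float))[::-1]
    y = np.sort(np.asarray(y, dtype=float))[::-1]
    if abs(x.sum() - y.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(x) >= np.cumsum(y) - tol))

print(majorizes([10, 1, 1], [6, 4, 2]))  # True: (10, 1, 1) is more concentrated
print(majorizes([6, 4, 2], [10, 1, 1]))  # False
```
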

The Lidskii-Weyl Law: Order Restored

Now we can state the beautiful law that governs the addition of eigenvalues for Hermitian matrices (the matrix equivalent of real numbers, which always have real eigenvalues). First discovered by Hermann Weyl and later proven in its full generality by Victor Lidskii, the theorem states:

The vector of eigenvalues of a sum of Hermitian matrices, $\lambda(A+B)$, is majorized by the vector sum of their individual eigenvalues, $\lambda(A) + \lambda(B)$.

In our notation: $\lambda(A+B) \prec \lambda(A) + \lambda(B)$.

The chaos is gone. The confounding interaction between non-commuting matrices always acts to make the resulting eigenvalues less spread out than a simple sum would suggest. Let's check our example from before. We had $v = (a+b, 0, -(a+b))$ and $c = (\sqrt{a^2+b^2}, 0, -\sqrt{a^2+b^2})$.

  • Partial sum 1: $a+b \ge \sqrt{a^2+b^2}$. True.
  • Total sum: $0 = 0$. True.

So indeed, $c \prec v$. The law holds! The non-commutativity introduced a kind of "averaging" or "smoothing" effect.

A powerful consequence of majorization is that for any convex function $f(x)$ (a function shaped like a bowl, e.g., $f(x) = x^2$ or $f(x) = |x|$), if $x \succ y$, then $\sum f(x_i) \ge \sum f(y_i)$. In our example, this means $\sum v_i^2 \ge \sum c_i^2$. A direct calculation confirms this, showing the difference is exactly $4ab$, a non-negative quantity.
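
A quick check of this convex-function consequence, again with the illustrative values $a = 3$, $b = 4$:

```python
import numpy as np

a, b = 3.0, 4.0
v = np.array([a + b, 0.0, -(a + b)])                  # naive eigenvalue sums
c = np.array([np.hypot(a, b), 0.0, -np.hypot(a, b)])  # actual eigenvalues of A + B

# f(x) = x^2 is convex and c ≺ v, so sum f(v_i) >= sum f(c_i);
# the gap is 2(a+b)^2 - 2(a^2+b^2) = 4ab.
gap = np.sum(v**2) - np.sum(c**2)
print(gap)        # 48.0
print(4 * a * b)  # 48.0
```
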

Dancing on the Edge: Finding the Limits of Possibility

Lidskii's theorem gives us a boundary, a rule of decorum that the eigenvalues of $A+B$ must obey. But within that boundary, what outcomes are possible? It turns out that the set of all possible eigenvalue vectors for $A+B$ (given fixed eigenvalues for $A$ and $B$) forms a beautiful geometric shape called a convex polytope.

The "corners" or extreme points of this shape are found in a fascinating way. They correspond to adding the sorted eigenvalues of $A$ to all possible permutations of the eigenvalues of $B$. By testing these permutations, we can map out the absolute limits for any given eigenvalue of the sum. For instance, to find the highest possible value for the second-largest eigenvalue of $A+B$, you would systematically check all combinations like $(\lambda_1(A)+\lambda_1(B), \lambda_2(A)+\lambda_2(B), \dots)$, $(\lambda_1(A)+\lambda_2(B), \lambda_2(A)+\lambda_1(B), \dots)$, and so on, re-sorting each result and checking the second component. This tells us precisely how much "mixing" can affect the outcome.
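
The vertex-testing procedure takes only a few lines. Using the earlier diagonal spectra as sample data, this sketch enumerates every permutation of $\lambda(B)$, re-sorts each candidate, and tracks the second-largest component:

```python
import numpy as np
from itertools import permutations

lam_A = np.array([5.0, 3.0, 1.0])   # spectrum of A, sorted descending
lam_B = np.array([4.0, 2.0, 1.0])   # spectrum of B, sorted descending

# Candidate vertices: sorted λ(A) plus each permutation of λ(B),
# re-sorted descending; keep the best second-largest component.
best_second = max(
    np.sort(lam_A + np.array(p))[::-1][1]
    for p in permutations(lam_B)
)
print(best_second)  # 7.0, from the pairing (5+2, 3+4, 1+1) -> (7, 7, 2)
```
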

There is also a lower bound, discovered by Helmut Wielandt, which tells us that $\lambda(A+B)$ majorizes the vector sum of $\lambda(A)$ and the reverse-ordered eigenvalues of $B$. Together, these theorems from Lidskii and Wielandt fence in the possibilities, providing both upper and lower bounds on how the eigenvalues can behave. There is order and structure, even in this complex interplay.

Why This Matters: The Stability of a Shaken World

This might seem like a niche mathematical curiosity, but it's one of the cornerstones of modern physics and engineering. We often model the world with a matrix $A$ representing an "ideal" system. But the real world is messy; there are always small disturbances, errors, or external forces. We can model this as adding a small "perturbation" matrix $E$. The question then becomes: how does the system, now described by $A+E$, change?

In quantum mechanics, the eigenvalues of a Hamiltonian matrix are the discrete energy levels of an atom or molecule. If you place that atom in a weak electric field (a perturbation $E$), Lidskii's theorem tells you how those energy levels can shift. In mechanical engineering, the eigenvalues of a structural matrix are the resonant frequencies of a bridge or airplane wing. If the structure suffers minor damage or fatigue ($E$), how much can those critical frequencies change?

A majorization theorem on perturbations provides a stunningly elegant answer. The vector of eigenvalue shifts, $(\mu_i - \lambda_i)$, where $\mu_i$ are the eigenvalues of $A+E$ and $\lambda_i$ are the eigenvalues of $A$, is majorized by the vector of the perturbation's own eigenvalues, $\epsilon_i$.

Using the convex function $f(x) = |x|$, we get a profoundly useful inequality:

$$\sum_{i=1}^N |\mu_i - \lambda_i| \le \sum_{i=1}^N |\epsilon_i|$$

The total magnitude of the shifts in the system's eigenvalues is bounded by the total magnitude of the perturbation's eigenvalues (its so-called trace norm). This gives us a direct way to quantify the robustness of a system. By analyzing the "size" of a potential disturbance, we can put a hard limit on the "damage" it can do to the system's fundamental properties.
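
The bound is easy to probe numerically. In this sketch, `random_hermitian` is a hypothetical helper we define for the experiment; the check is that the total eigenvalue shift never exceeds the perturbation's trace norm:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(n, scale=1.0):
    # Hypothetical helper: draw a random Hermitian matrix.
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return scale * (M + M.conj().T) / 2

A = random_hermitian(6)         # the "ideal" system
E = random_hermitian(6, 0.1)    # a small perturbation

lam = np.linalg.eigvalsh(A)     # ascending order
mu = np.linalg.eigvalsh(A + E)  # ascending order

shift = np.sum(np.abs(mu - lam))                # total eigenvalue movement
budget = np.sum(np.abs(np.linalg.eigvalsh(E)))  # trace norm of E
print(shift <= budget + 1e-12)  # True
```

Repeating this with other seeds or sizes never violates the inequality, which is exactly what the theorem guarantees.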

Beyond the Looking-Glass: A Glimpse into the Complex World

So far, we have lived in the pristine world of Hermitian matrices, whose eigenvalues are always real. But many real-world systems, especially those with dissipation like friction or electrical resistance, are described by non-Hermitian matrices with complex eigenvalues. Does this beautiful structure fall apart?

No. It becomes even more profound. Any square matrix $A$ can be split into its Hermitian part $H = \frac{1}{2}(A + A^*)$ and its anti-Hermitian part. Lidskii's theorem has a spectacular generalization, first proved by Issai Schur:

The vector of the real parts of the eigenvalues of any matrix $A$ is majorized by the vector of eigenvalues of its Hermitian part, $H$.

Symbolically: $\mathrm{Re}(\lambda(A)) \prec \lambda(H)$.

What does this mean? The Hermitian part $H$ represents the conservative, energy-storing aspects of the system. The non-Hermitian part is associated with effects like rotation, dissipation, and gain. This theorem tells us that these non-conservative effects can only act to pull the real parts of the eigenvalues inward, making them less spread out than the eigenvalues of the purely conservative part of the system. The "non-normality" of the matrix ($AA^* \neq A^*A$) creates "slack" in the majorization inequalities, pulling the spectrum together.
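
A numerical spot-check of this majorization on a randomly drawn non-Hermitian matrix (a sketch assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))  # generic non-Hermitian
H = (A + A.conj().T) / 2                                    # Hermitian part

re_lam = np.sort(np.linalg.eigvals(A).real)[::-1]
lam_H = np.sort(np.linalg.eigvalsh(H))[::-1]

# Re(λ(A)) ≺ λ(H): totals agree (both equal Re Tr A), and every
# partial sum of λ(H) dominates the corresponding one of Re(λ(A)).
totals_agree = np.isclose(re_lam.sum(), lam_H.sum())
partials_dominate = bool(np.all(np.cumsum(lam_H) >= np.cumsum(re_lam) - 1e-9))
print(totals_agree, partials_dominate)  # True True
```
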

From a simple question about adding matrices, we've taken a journey deep into the structure of linear algebra. We've discovered that what at first looks like chaos is governed by a subtle and elegant principle of order—majorization. This principle not only provides a framework for understanding matrix addition but also gives us powerful, practical tools to analyze the stability of the physical world and reveals a deep connection between the geometry of a matrix and the hidden distribution of its eigenvalues. This is the inherent beauty and unity of mathematics: finding the simple, powerful laws that create order in a complex world.

Applications and Interdisciplinary Connections

Now, we have spent some time getting to know the Lidskii theorem and its relatives, turning them over in our hands to appreciate their logical structure. The ideas are elegant, even beautiful, in their abstract mathematical purity. But you might be asking a perfectly reasonable question: “What is all this good for?” Where does this intricate machinery of majorization and eigenvalue inequalities actually meet the real world?

It’s one thing to prove that for any two Hermitian matrices $A$ and $B$, the vector of eigenvalues of their sum, $\lambda(A+B)$, is majorized by the sum of their individual eigenvalue vectors. It’s quite another to see what this statement does. The true power of a physical or mathematical principle isn’t just in its truth, but in its consequences. And the consequences of Lidskii’s theorem are vast and surprising, reaching from the quantum realm of atoms to the abstract plains of complex analysis. Let's embark on a journey to see where this theorem's shadow falls.

Taming the Jiggle: Perturbation Theory and Quantum Physics

In physics, we rarely know anything perfectly. Our models are almost always approximations. We might have a beautiful, simple model of a hydrogen atom, but then we have to account for the “perturbations”—the jiggle from an external magnetic field, the subtle interactions we initially ignored. The question is, how much does this jiggle change the system's fundamental properties, like its allowed energy levels?

These energy levels are nothing but the eigenvalues of the system's Hamiltonian operator, which we can think of as a large matrix, $H_0$. The perturbation is another, typically smaller, matrix, $V$. The new, perturbed system is described by the sum $H_0 + V$. Lidskii’s theorem and its corollaries give us a powerful way to put a leash on the effects of $V$. They provide a sharp, unambiguous bound on how much the energy levels can shift.

For instance, we can ask: what is the maximum possible shift in the sum of the $k$ highest energy levels? Lidskii’s theorem, through a result known as the Ky Fan inequality, gives a remarkably simple answer: the maximum possible increase is precisely the sum of the $k$ largest eigenvalues of the perturbation matrix $V$ itself. It’s as if the perturbation has a certain "disruption budget" given by its own eigenvalues, and it can spend that budget to push the original system's energies around, but it cannot overspend. This allows physicists to guarantee the stability of a system. Even if we don’t know the exact details of the perturbation $V$, but we know something about its "size" (its eigenvalues or norm), we can still make concrete predictions about the perturbed system.

This principle extends to the very heart of quantum information theory. Consider a composite quantum system, like two entangled particles (let’s call them qutrits, three-level systems) shared between Alice and Bob. The state of the whole system is described by a vector, but Alice, who only has access to her particle, sees a blurred picture described by a "reduced" density matrix, $\rho_A$. The eigenvalues of $\rho_A$ tell her the probabilities of finding her particle in certain fundamental states. Now, suppose Alice wants to measure the energy of her particle, corresponding to a local Hamiltonian $H_A$. A fascinating question arises: by asking Bob to apply operations on his particle (which, due to entanglement, affects the whole system), what is the maximum energy Alice can possibly measure on her end? This isn’t an academic question; it’s about controlling and extracting information from a quantum system.

The answer, once again, is a beautiful application of the same family of ideas. The maximum possible energy is found by arranging the eigenvalues of Alice’s Hamiltonian $H_A$ and her density matrix $\rho_A$ in descending order, and then summing their products. You pair the largest with the largest, the second-largest with the second-largest, and so on. This is a direct consequence of the von Neumann trace inequality, a close cousin of Lidskii's theorem. The abstract mathematics of eigenvalue ordering directly predicts a physical limit on measurable energy.
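
As an illustration, the sketch below uses a hypothetical qutrit Hamiltonian with levels $(2, 1, 0)$ and a hypothetical spectrum $(0.5, 0.3, 0.2)$ for Alice's density matrix, then checks that no unitary rotation of the state beats the sorted-pairing bound:

```python
import numpy as np

rng = np.random.default_rng(2)

H_A = np.diag([2.0, 1.0, 0.0])  # hypothetical local Hamiltonian (qutrit)
p = np.array([0.5, 0.3, 0.2])   # hypothetical spectrum of Alice's rho_A
rho = np.diag(p)

# von Neumann bound: pair eigenvalues largest-with-largest.
bound = np.sum(np.sort(np.linalg.eigvalsh(H_A))[::-1] * np.sort(p)[::-1])

# Sample random unitary rotations of rho; none yields a larger energy.
best_seen = 0.0
for _ in range(200):
    Z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    U, _ = np.linalg.qr(Z)      # a random unitary via QR
    best_seen = max(best_seen, np.trace(H_A @ U @ rho @ U.conj().T).real)

print(best_seen <= bound + 1e-12)  # True
print(bound)                       # 1.3 = 2*0.5 + 1*0.3 + 0*0.2
```
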

The Analyst's Playground: From Finite to Infinite

The story doesn't end with the finite-dimensional matrices of introductory quantum mechanics. Many of the most important systems in physics and engineering live in infinite-dimensional spaces. Think of a vibrating string, where the state is a function, not a finite list of numbers. The operators here are often integral operators, and the mathematics that governs them is known as functional analysis.

It turns out that Lidskii's theorem has a "big brother" in this infinite world, known as Lidskii's Trace Formula. It applies to a special class of operators called "trace-class" operators, which are, in a sense, small enough to behave nicely. For such an operator, even if it’s not self-adjoint and its eigenvalues are scattered across the complex plane, this profound theorem states that the sum of all its eigenvalues (counted with their multiplicity) is exactly equal to its trace—the sum of its diagonal elements.

This is a stunning statement of conservation. Imagine you start with a simple, self-adjoint operator $T$ whose eigenvalues are all real and well-behaved. Now, you add a non-self-adjoint perturbation $P$. The eigenvalues of the new operator $T+P$ might scatter wildly. But the theorem guarantees that the "center of mass" of these eigenvalues, their sum, has moved in a perfectly predictable way: $\sum \lambda_i(T+P) = \sum \lambda_i(T) + \mathrm{Tr}(P)$.

What’s even more remarkable is how this abstract concept connects to concrete calculations. For many integral operators, which are defined by a kernel function $K(x, y)$, the abstract "trace" manifests as a simple integral of the kernel along its diagonal: $\mathrm{Tr}(T) = \int K(x, x)\,dx$. This bridges the gap between abstract functional analysis and the practical world of integral equations. An esoteric sum over an infinite set of eigenvalues becomes something you can actually compute.
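
As a sketch of this idea, we can discretize the operator with kernel $K(x, y) = \min(x, y)$ on $[0, 1]$ (a midpoint-rule approximation chosen for illustration) and compare the sum of the computed eigenvalues with $\int_0^1 K(x, x)\,dx = \int_0^1 x\,dx = 1/2$:

```python
import numpy as np

# Midpoint-rule discretization of (T f)(x) = ∫_0^1 min(x, y) f(y) dy.
# Lidskii's trace formula says the sum of the eigenvalues equals the
# trace, which for this kernel is ∫_0^1 min(x, x) dx = 1/2.
n = 400
x = (np.arange(n) + 0.5) / n    # midpoint grid on [0, 1]
K = np.minimum.outer(x, x) / n  # matrix approximation of T

eig_sum = np.sum(np.linalg.eigvalsh(K))
print(eig_sum)  # ≈ 0.5
```
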

A Web of Connections: Unexpected Cousins

The influence of Lidskii's ideas ripples out, forming unexpected connections with other branches of mathematics and science.

One of the most beautiful results in this family is the Lidskii-Wielandt theorem, which answers a more ambitious question. Instead of just bounding the eigenvalues of a sum $A+B$, it describes the entire set of all possible eigenvalue vectors that $\lambda(A+B)$ can be, given the fixed spectra of $A$ and $B$. The answer is a geometric shape: a convex polytope in $n$-dimensional space, whose vertices are determined by the different ways of pairing the eigenvalues of $A$ and $B$. This gives us a complete map of possibilities, allowing us to find the absolute maximum (or minimum) for any combination of the resulting eigenvalues, such as $\lambda_2 + \lambda_3$.

These majorization results also have direct analogues for singular values, which measure how a matrix stretches space. This leads to powerful inequalities for matrix norms, like the Ky Fan norms, which are fundamental tools in numerical analysis for understanding the stability of algorithms and the propagation of errors.

The connection to quantum statistical mechanics provides another fascinating avenue. A central object in this field is the partition function, often expressed as $\mathrm{Tr}(\exp(-\beta H))$, where $H$ is the Hamiltonian. Calculating this for a sum of non-commuting operators, $A+B$, is notoriously difficult. However, Lidskii's majorization theorem, when combined with another beautiful piece of mathematics called Karamata's inequality (which relates majorization to convex functions), provides a simple and elegant upper bound. The fact that the exponential function is convex means we can immediately say $\mathrm{Tr}(\exp(A+B)) \le \sum_i \exp(\lambda_i(A) + \lambda_i(B))$. This links the abstract ordering of eigenvalues directly to thermodynamic quantities.
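
The bound can be sanity-checked numerically on random Hermitian matrices (a sketch; `random_hermitian` is our own helper, not a library function):

```python
import numpy as np

rng = np.random.default_rng(3)

def random_hermitian(n):
    # Our own helper: draw a random Hermitian matrix.
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

A, B = random_hermitian(5), random_hermitian(5)

# Tr exp(A+B), computed from the eigenvalues of A+B ...
lhs = np.sum(np.exp(np.linalg.eigvalsh(A + B)))
# ... versus the Lidskii + Karamata upper bound from the sorted spectra.
rhs = np.sum(np.exp(np.sort(np.linalg.eigvalsh(A))[::-1]
                    + np.sort(np.linalg.eigvalsh(B))[::-1]))

print(lhs <= rhs + 1e-9)  # True
```
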

Perhaps the most breathtaking connection is with complex analysis—the study of functions of complex variables. Consider a compact operator $A$ on an infinite-dimensional space. One can construct an entire function (a function analytic on the whole complex plane) called the Fredholm determinant, $F(\lambda) = \det(I - \lambda A)$, whose roots are the reciprocals of the eigenvalues of $A$. The "order" of this function, which describes how fast it grows at infinity, is a fundamental characteristic. How could we determine this? Remarkably, Weyl's inequalities, which are deeply related to Lidskii's theorem, tell us that the decay rate of an operator's eigenvalues is controlled by the decay rate of its singular values. By knowing how fast the singular values $s_n$ of $A$ go to zero, we can determine the convergence properties of the sum of powers of the eigenvalues, $\sum |\mu_n|^\tau$. This, in turn, directly gives us the order of the entire function $F(\lambda)$. It's a magnificent display of mathematical unity: the discrete sequence of an operator's eigenvalues, whose distribution is constrained by majorization principles, dictates the global analytic behavior of a function across the infinite complex plane.

From the stability of atoms to the energy in a quantum computer, from the trace of an operator to the growth of an entire function, the principles pioneered by Lidskii provide a unifying thread. They reveal that behind the chaotic and complex behavior of summed and perturbed systems, there lies a deep and elegant structure governed by the simple, intuitive idea of ordering.