
Sum-of-Squares (SOS) Relaxation: A Unified Framework for Optimization and Control

Key Takeaways
  • SOS relaxation transforms difficult non-convex polynomial optimization problems into solvable convex semidefinite programs (SDPs).
  • The method provides a verifiable certificate of non-negativity by decomposing a polynomial into a sum of squares, which is a sufficient but not always necessary condition.
  • The dual problem, involving moment matrices, can be used to extract the exact locations of the global minimizers when the relaxation is exact.
  • Applications of SOS relaxation are vast, spanning from proving stability in control systems and robotics to designing digital filters and exploring the limits of quantum mechanics.

Introduction

Finding the guaranteed minimum of a complex, high-dimensional function is one of the most persistent challenges in science and engineering. For polynomial functions, which describe systems from robotics to economics, this task is akin to finding the lowest valley in a vast, fog-shrouded mountain range; simple methods can easily get trapped in local dips. This article addresses the fundamental problem of verifying global properties of polynomials, a task that is often computationally intractable. It introduces Sum-of-Squares (SOS) relaxation, a powerful framework that translates these intractable algebraic problems into a solvable geometric form. The following chapters will guide you through this revolutionary method. In "Principles and Mechanisms," we will uncover the core idea of using SOS decomposition as a certificate of positivity, see how this is transformed into a solvable semidefinite program (SDP), and explore the profound dual relationship with moment problems that allows for the recovery of optimal solutions. Following this, "Applications and Interdisciplinary Connections" will demonstrate the remarkable breadth of SOS, showcasing its use in proving the safety of autonomous systems, designing advanced digital filters, tackling famously hard computational problems, and even probing the foundations of quantum mechanics.

Principles and Mechanisms

Imagine you are tasked with finding the absolute lowest point in a vast, fog-covered mountain range. The landscape is a polynomial function, p(x), and the coordinates x can represent anything from the positions and velocities of a robot to the allocation of resources in a factory. A simple walk downhill might land you in a small valley, but how can you be certain it's the lowest point on the entire map and not just a local dip? For centuries, this question of global optimization has been one of the hardest challenges in mathematics. The wiggles and turns of a high-dimensional polynomial create a landscape of bewildering complexity. A direct assault is often hopeless.

What we need is not a better searchlight to pierce the fog, but a different kind of magic altogether. We need a way to make a claim about the entire landscape at once.

The Power of Being Positive: Sums of Squares

Let's rephrase our problem. Finding the minimum value of p(x) is the same as finding the largest number, let's call it γ, such that the new polynomial p(x) − γ is never negative, no matter where you are in the landscape. If we can prove that p(x) − γ ≥ 0 for all x, then we have found a guaranteed floor, a lower bound on our original polynomial. But how can we prove a function is non-negative everywhere?

This is where a simple, beautiful algebraic idea comes to the rescue. What if we could rewrite our polynomial p(x) − γ as a sum of other polynomials squared?

p(x) − γ = q_1(x)^2 + q_2(x)^2 + ⋯ + q_k(x)^2

Since the square of any real number is non-negative, a sum of squares must also be non-negative. There's no way around it! This form is an irrefutable certificate of non-negativity. This is the central idea of Sum-of-Squares (SOS) relaxation: we relax the difficult-to-check condition of non-negativity to the more restrictive, but verifiable, condition of being a sum of squares. If we can find such a certificate, we have a provable bound on our minimum.

Of course, there is a catch. Is every non-negative polynomial a sum of squares? In a famous problem posed at the turn of the 20th century, David Hilbert showed that the answer, surprisingly, is no. There are peculiar polynomials (the most famous being the Motzkin polynomial) that are always non-negative but cannot be written as a sum of squares. This means our SOS condition is sufficient, but not necessary. If we fail to find an SOS certificate for p(x) − γ, it doesn't necessarily mean that γ isn't a lower bound. It might just be that our polynomial lives in that subtle gap between "always non-negative" and "a sum of squares". This gap is the fundamental source of conservatism in the SOS method. However, for certain important classes of polynomials, like quadratic functions, this gap vanishes. For quadratics, being non-negative is equivalent to being a sum of squares, making the SOS method exact in those cases.
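A quick numerical spot-check makes the Motzkin polynomial less mysterious. The sketch below simply samples it on a grid; this is only an illustration (the actual proof of non-negativity uses the AM-GM inequality, and the proof that it is not SOS is a separate algebraic argument):

```python
import numpy as np

# Motzkin polynomial: non-negative everywhere (by the AM-GM inequality),
# yet provably not expressible as a sum of squares of polynomials.
def motzkin(x, y):
    return x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 + 1

xs = np.linspace(-3, 3, 301)
X, Y = np.meshgrid(xs, xs)
vals = motzkin(X, Y)
print(vals.min())  # ≈ 0, attained near |x| = |y| = 1
assert vals.min() >= -1e-9
```

The minimum value 0 is reached at the four points (±1, ±1), which is exactly where any hypothetical SOS decomposition would have to vanish; that rigidity is what ultimately rules out an SOS certificate.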

The Machine: From Algebra to Geometry

So, we have a great idea, but how do we actually find this sum-of-squares decomposition? Manually rearranging polynomial terms is a nightmare. The true breakthrough comes from translating this algebraic problem into a geometric one.

Any polynomial p(x) of degree 2d that is a sum of squares can be written in a special matrix form:

p(x) = z(x)^T Q z(x)

Here, z(x) is simply a vector containing all the monomials of degree up to d (e.g., for one variable and degree 2, z(x) = (1, x, x^2)^T). The object Q is a symmetric matrix called the Gram matrix. The SOS condition on the polynomial p(x) translates directly into a condition on the matrix Q: it must be positive semidefinite (PSD). A matrix is PSD if the quadratic form it defines never dips below zero.

Let's see this in action with a simple example: finding the minimum of p(x) = x^4 + ax^2 + 1. We want to find the largest γ such that p(x) − γ = x^4 + ax^2 + (1 − γ) is a sum of squares. This is a degree-4 polynomial, so we use a monomial basis up to degree 2: z(x) = (1, x, x^2)^T. We then set up the identity:

x^4 + ax^2 + (1 − γ) = (1, x, x^2) Q (1, x, x^2)^T,   where Q = [ q_11 q_12 q_13 ; q_12 q_22 q_23 ; q_13 q_23 q_33 ]

By expanding the right-hand side and matching the coefficients of each power of x (e.g., the coefficient of x^4 on the right must be 1, so q_33 = 1), we get a set of linear equations for the entries of Q. Our search for the largest γ has become a search for a PSD matrix Q that satisfies these linear constraints. This type of problem (optimizing a linear function subject to a matrix being PSD) is a semidefinite program (SDP). And crucially, SDPs are convex optimization problems, for which we have powerful, efficient algorithms that are guaranteed to find the global optimum. We have successfully transformed a hard, non-convex problem in polynomials into a tractable, convex problem in matrices.
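The Gram-matrix identity can be sanity-checked in a few lines. This is not a full SDP solve; the matrix below is hand-picked for the concrete instance a = −2 and γ = 0, where p(x) − γ = (x^2 − 1)^2:

```python
import numpy as np

# Hand-picked Gram matrix for p(x) = x^4 - 2x^2 + 1 (the case a = -2, gamma = 0)
# over the monomial basis z(x) = (1, x, x^2)^T.
Q = np.array([[ 1.0, 0.0, -1.0],
              [ 0.0, 0.0,  0.0],
              [-1.0, 0.0,  1.0]])

# 1. Q must be positive semidefinite: all eigenvalues >= 0.
eigs = np.linalg.eigvalsh(Q)
assert eigs.min() >= -1e-12

# 2. z(x)^T Q z(x) must reproduce p(x); spot-check at sample points.
for x in np.linspace(-2, 2, 9):
    z = np.array([1.0, x, x**2])
    assert abs(z @ Q @ z - (x**4 - 2 * x**2 + 1)) < 1e-9

print("Q certifies p(x) = (x^2 - 1)^2 >= 0")
```

In a real SOS solver, an SDP would search over all symmetric matrices satisfying the coefficient-matching equations while maximizing γ; here we only verify that one particular Q is a valid certificate.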

The Other Side of the Coin: Duality and Moments

Every optimization problem has a shadow self, a "dual" problem, and their relationship is often profound. The dual of the SOS problem is a search over moments of a hypothetical probability measure. Moments are quantities like the mean (y_1 = E[x]), the second moment (y_2 = E[x^2]), and so on. The dual problem asks: what is the minimum expected value of our polynomial, E[p(x)], over all possible probability distributions whose moments satisfy certain consistency conditions?

One of these conditions is that the moment matrix, which is structured just like the Gram matrix but filled with moment variables (e.g., M_ij = y_{i+j}), must be positive semidefinite. This ensures our "probability distribution" is not a mathematical fiction but could, in principle, exist.

The theory of optimization guarantees that the optimal value of this dual moment problem provides an upper bound on the optimal value of the primal SOS problem. When strong duality holds—which it often does in these problems—the bounds meet, and the two problems give the exact same answer! This primal-dual relationship is cemented by the Karush-Kuhn-Tucker (KKT) conditions, which link the optimal primal solution (the Gram matrix) and the optimal dual solution (the moments) in a system of equations.

This dual perspective is far more than a theoretical curiosity. It is the key to unlocking the full power of the method.

The Grand Reveal: Finding Where the Minimum Lies

When the SOS relaxation is exact (i.e., there is no conservatism gap), the optimal moment sequence is not just a set of abstract numbers. It represents the actual moments of a probability measure concentrated on the true global minimizers of the original polynomial. The structure of the moment matrix tells us everything.

The rank of the optimal moment matrix reveals the number of global minimizers. If the rank is 1, there is a unique global minimum. If the rank is r, there are (typically) r distinct global minima. For the problem of minimizing f(x) = (x^2 − 1)^2, which has two minimizers at x = −1 and x = 1, the resulting optimal moment matrix correctly has a rank of 2.

Even better, we can use the moment matrix to find the exact locations of these minimizers. The null space of the moment matrix allows us to construct a set of polynomial equations that are all satisfied at the minimizer points. By solving this (often simple) system of equations, we can "triangulate" the coordinates of the lowest points in our landscape. Once we know the locations, we can even find their "weights" in the optimal measure by solving a simple linear system.
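For the example f(x) = (x^2 − 1)^2, the optimal moments can be written down by hand, so both steps (counting minimizers via the rank, then extracting their locations from the null space) fit in a short sketch:

```python
import numpy as np

# Moments of the measure (delta_{-1} + delta_{+1})/2, which sits on the two
# global minimizers of f(x) = (x^2 - 1)^2:  y_k = ((-1)^k + 1)/2.
y = [((-1) ** k + 1) / 2 for k in range(5)]  # y_0..y_4 = 1, 0, 1, 0, 1

# Moment matrix M_ij = y_{i+j} over the basis (1, x, x^2).
M = np.array([[y[i + j] for j in range(3)] for i in range(3)])
rank = np.linalg.matrix_rank(M)
print(rank)  # 2, i.e. two global minimizers

# A null-space vector of M encodes a polynomial that vanishes at the
# minimizers: here v is proportional to (1, 0, -1), i.e. 1 - x^2.
eigvals, eigvecs = np.linalg.eigh(M)
v = eigvecs[:, 0]          # eigenvector for the zero eigenvalue
roots = np.roots(v[::-1])  # solve v_0 + v_1*x + v_2*x^2 = 0
print(np.sort(roots.real)) # the minimizers, x = -1 and x = +1
```

In a real run these moments come out of the SDP solver rather than being written down by hand, but the rank count and root extraction proceed exactly as above.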

This completes a beautiful intellectual circle. We start with a hard optimization problem. We relax it to an algebraic search for an SOS certificate. We convert that to a geometric SDP involving a Gram matrix. We consider its dual, a problem over moments. And finally, the optimal moments give us the algebraic keys to find the exact coordinates of the solution.

SOS in the Real World: Scaling and Sparsity

This powerful framework forms the basis for modern techniques in control theory, robotics, and many other fields. For example, to prove a robot is "safe," we might search for a barrier function that separates the safe region from the unsafe one. To prove a system is stable, we search for a Lyapunov function whose value decreases over time. Both can be formulated as SOS programs.

Of course, the real world is messy.

  • Local Analysis: Often, we only care about behavior in a specific region, K, defined by inequalities g_i(x) ≥ 0. Instead of requiring a polynomial p(x) to be globally SOS, we can search for a certificate of the form p(x) = σ_0(x) + Σ_i σ_i(x) g_i(x), where each σ is an SOS polynomial. This is known as Putinar's Positivstellensatz and is much more powerful for constrained problems.

  • Diagnosing Failure: What if our SDP solver reports "infeasible"? It might not mean our system is unstable or unsafe. It could just mean our chosen polynomial certificate (e.g., a simple quadratic) was not expressive enough. A sophisticated analysis of the dual problem can often point to a specific "counterexample" point in the state space that our simple certificate failed to handle, guiding us to try a higher-degree polynomial. Sometimes, the problem is just poorly scaled; a simple change of variables can make an intractable problem numerically trivial.

  • The Curse of Dimensionality: The biggest practical challenge is scalability. The size of the Gram matrix grows explosively with the number of variables and the polynomial degree. For a degree-4 polynomial in just 12 variables, the Gram matrix is already 91 × 91. This is the famous "curse of dimensionality." To combat this, researchers have developed brilliant techniques that exploit the structure of a problem. If a problem has correlative sparsity (meaning variables tend to interact only in small, localized groups), we can break the one monolithic SDP into a coupled system of many smaller, manageable SDPs. This is like realizing you can solve a giant jigsaw puzzle by first sorting the pieces into groups corresponding to different parts of the image. For even larger problems, we can use more aggressive (but still rigorous) inner approximations like Diagonally-Dominant-Sum-of-Squares (DSOS), which can be solved with even faster linear and second-order cone programming, trading a bit of theoretical power for massive gains in computational speed.
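The DSOS idea can be sanity-checked in a few lines: diagonal dominance is a set of purely linear conditions on the matrix entries, yet (by Gershgorin's circle theorem) it already forces the matrix to be PSD. A minimal sketch with an arbitrary example matrix chosen for illustration:

```python
import numpy as np

# A symmetric matrix with non-negative diagonal that is diagonally dominant
# (each diagonal entry at least the sum of the absolute off-diagonal entries
# in its row) is automatically PSD, by Gershgorin's circle theorem.
# Crucially, this test needs only linear inequalities, not an SDP solver.
def is_diagonally_dominant(Q):
    diag = np.diag(Q)
    off = np.abs(Q).sum(axis=1) - np.abs(diag)
    return bool(np.all(diag >= 0) and np.all(np.abs(diag) >= off))

Q = np.array([[ 2.0, -1.0,  0.5],
              [-1.0,  3.0,  1.0],
              [ 0.5,  1.0,  2.0]])

assert is_diagonally_dominant(Q)
assert np.linalg.eigvalsh(Q).min() >= 0  # hence Q is PSD
print("diagonally dominant, therefore PSD")
```

The price of this shortcut is conservatism: many PSD matrices are not diagonally dominant, so DSOS certifies a strictly smaller set of polynomials than full SOS.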

From a simple algebraic trick to a powerful computational engine, the sum-of-squares methodology represents a triumph of interdisciplinary thought, weaving together algebra, geometry, and optimization to provide provable answers to some of science and engineering's most challenging questions.

Applications and Interdisciplinary Connections

We have spent some time wrestling with what might seem like a rather abstract mathematical trick: checking if a polynomial can be written as a sum of squares of other polynomials. You might be justifiably wondering, "What on Earth is this good for?" It turns out that this concept is not just a curiosity; it is a key that unlocks solutions to problems in fields so diverse they hardly seem to speak the same language. It provides a surprisingly unified and powerful way of thinking.

Let us now embark on a journey to see how this one simple idea—of certifying that something is positive by writing it as a sum of squares—helps us build safer robots, design the electronics that power our world, solve impossibly hard computational puzzles, and even peek into the strange rulebook of quantum mechanics.

The Art of Not Crashing: Stability and Safety in a Dynamic World

Imagine you have designed a sophisticated system—a drone trying to hover in the wind, a robotic arm in a factory, or even a complex chemical reactor. The behavior of these systems over time is often described by what we call dynamical systems, whose laws of motion can be written down as polynomial equations. The most important question you can ask is: is it stable? If we nudge it a little, will it return to its desired operating point, or will it fly off into some catastrophic failure?

The great Russian mathematician Aleksandr Lyapunov gave us a beautiful way to think about this in the late 19th century. He taught us that a system is stable if we can find a special function, now called a Lyapunov function, which acts like an "energy" for the system. If we can show that this energy function is always positive (except at the stable equilibrium point, where it's zero) and that its value always decreases as the system evolves, then the system must eventually settle down to its stable state, like a marble rolling to the bottom of a bowl.

This is a profound insight, but it comes with a catch: Lyapunov tells us that if we find such a function, we've proven stability, but he doesn't tell us how to find it. For more than a century, finding a Lyapunov function was something of an art form, relying on clever guesswork and intuition.

This is where sum-of-squares (SOS) relaxation enters the scene and turns the art into a science. If we are looking for a polynomial Lyapunov function, the conditions that it must be positive and that its rate of change must be negative are both conditions on the non-negativity of certain polynomials. Instead of trying to prove this non-negativity directly, we can use our new trick: we require that these polynomials be sums of squares. The magic is that this transforms the daunting task of finding a function into a specific type of convex optimization problem known as a semidefinite program (SDP), which modern computers can solve with astonishing efficiency. We have converted an intractable analytical question into a tractable computational one.
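For quadratic Lyapunov candidates, the SOS conditions collapse to ordinary matrix PSD checks, so the whole recipe fits in a few lines. A minimal sketch, with an assumed stable matrix A chosen purely for illustration:

```python
import numpy as np

# For a linear system xdot = A x, the quadratic candidate V(x) = x^T P x
# is a Lyapunov function when two PSD conditions hold (the degree-2 case
# where SOS and PSD coincide):
#   P >> 0            (V is positive away from the origin)
#   -(A^T P + P A) >> 0   (V decreases along every trajectory)
A = np.array([[-1.0,  1.0],
              [ 0.0, -1.0]])
P = np.eye(2)  # candidate: V(x) = x1^2 + x2^2

decrease = -(A.T @ P + P @ A)  # -dV/dt as a quadratic form in x
assert np.linalg.eigvalsh(P).min() > 0
assert np.linalg.eigvalsh(decrease).min() >= 0
print("V(x) = |x|^2 certifies stability of xdot = A x")
```

For genuinely nonlinear polynomial dynamics, the same two conditions become SOS constraints on higher-degree polynomials, and the eigenvalue checks above are replaced by an SDP search over the Gram matrices.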

This technique allows us to answer more than just "Is it stable?". We can ask, "How large is the 'bowl'?" That is, what is the set of initial states from which the system is guaranteed to return to equilibrium? This is called the Region of Attraction (ROA). Using SOS methods, we can compute a certified inner approximation of this region, giving us a formal guarantee of safe operation within a known envelope.

We can even flip the problem around to deal with safety. Suppose there is an "unsafe" region of states that our system must never, ever enter. We can use SOS to search for a "barrier certificate": a function that acts like a digital guardrail. If we can prove that this barrier function starts out below a certain value and can never increase to that value along any system trajectory, then we have proven that the system is safe and will never cross into the unsafe zone. This has profound implications for verifying the safety of autonomous vehicles, aircraft, and medical devices.

The story doesn't end with just analyzing existing systems. The holy grail of engineering is synthesis—the act of creation. While the problem becomes more difficult (it is no longer a simple convex optimization problem), SOS techniques form the backbone of modern methods for automatically synthesizing the control laws that stabilize complex nonlinear systems, moving us from the role of a worried observer to that of a confident designer.

Sculpting Waves and Images: A New Language for Signal Processing

Our world runs on signals. The music you listen to, the images you see on this screen, the Wi-Fi signals that connect you to the internet—all are signals that need to be processed. A fundamental operation in signal processing is filtering: keeping the parts of the signal we want and getting rid of the parts we don't, like noise or interference.

The performance of a filter is characterized by its frequency response, which tells us how much it amplifies or attenuates signals at different frequencies. It turns out that for a huge class of digital filters, this frequency response can be written as a trigonometric polynomial. A design specification, such as "completely block all frequencies in the range [ω_1, ω_2]," becomes a mathematical constraint on this polynomial over a specific interval.

How can we be sure a filter design meets its specifications? We can test it at a million frequency points, but that's not a proof. Once again, SOS comes to the rescue. The specification, say that the filter's magnitude-squared response must be less than some tiny number δ^2 in the "stopband," is a polynomial inequality. We can ask a computer to find an SOS certificate that proves this inequality holds for the entire interval. This provides a formal, mathematical guarantee of the filter's performance. Remarkably, for these one-dimensional signal problems, the SOS method is often exact. There is no gap between a polynomial being non-negative on an interval and it having an SOS certificate.

The plot thickens when we move from one-dimensional signals, like audio, to two-dimensional signals, like images. A cornerstone of 1D filter theory is the Fejér-Riesz theorem, which states that any non-negative frequency response can be factored; it can be written as the magnitude-squared of a single, stable, causal filter. This is immensely useful for design. But in a surprising twist of mathematics, this beautiful theorem fails to generalize to two or more dimensions. There are 2D image filters whose frequency responses are always positive but simply cannot be written as the magnitude-squared of a single 2D filter.
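The one-dimensional factorization is easy to see on a toy example. The sketch below, with an assumed two-tap moving-average filter (a standard textbook lowpass, not one from this article), verifies the Fejér-Riesz identity numerically:

```python
import numpy as np

# Fejer-Riesz in 1-D: the non-negative trigonometric polynomial
#   R(w) = (1 + cos w) / 2
# factors as |H(e^{jw})|^2 for the single causal filter h = [0.5, 0.5].
h = np.array([0.5, 0.5])
w = np.linspace(-np.pi, np.pi, 401)

H = h[0] + h[1] * np.exp(-1j * w)  # frequency response of the filter h
R = (1 + np.cos(w)) / 2            # the trig polynomial being factored

assert np.allclose(np.abs(H) ** 2, R)
print("R(w) = |H(e^{jw})|^2 on the whole frequency axis")
```

It is exactly this step, writing a non-negative response as the magnitude-squared of one filter, that breaks down in two dimensions, which is why 2D design falls back on sums of several squares instead of a single factor.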

This is where SOS provides a more general and powerful truth. While a 2D filter response might not be a single square, it can often be represented as a sum of squares. The SOS framework gives us a constructive way to find this decomposition, enabling the systematic design and analysis of multidimensional filters that are crucial for image processing, medical imaging, and geophysical data analysis.

Taming the Intractable: A New Perspective on Hard Problems

Some problems in mathematics and computer science are famous for being "hard." For these NP-hard problems, finding the absolute best solution is believed to require a computational effort that grows exponentially with the size of the problem, quickly becoming impossible for even the fastest supercomputers.

A classic example is the Max-Cut problem. Imagine you are organizing a large event and need to divide all attendees into two groups. You have a list of pairs of people who dislike each other. Your goal is to arrange the two groups to maximize the number of pairs of nemeses that are separated. This simple-sounding puzzle is NP-hard.

Faced with such a problem, computer scientists often turn to a clever strategy: relaxation. Instead of insisting that each person must be in either group 1 or group 2 (which we might represent by a variable x_i being +1 or −1), we "relax" this condition. In the celebrated Goemans-Williamson algorithm, each person is represented by a vector on a high-dimensional sphere. The objective of maximizing the cut now becomes a problem of arranging these vectors, which, miraculously, turns into a solvable semidefinite program.

Here is the stunning connection: this landmark algorithm, which guarantees a solution that is at least 87.8% as good as the true, unknowable optimum, is exactly what you get if you apply the simplest, degree-2 SOS relaxation to the polynomial formulation of the Max-Cut problem. This is no coincidence. The SOS framework provides a systematic hierarchy of increasingly powerful, and computationally expensive, relaxations that can be applied to a vast range of hard optimization problems. It reveals that one of the most famous algorithms in theoretical computer science is just the first rung on a much larger ladder.
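To ground the polynomial formulation being relaxed, here is a brute-force sketch of Max-Cut on a 5-cycle (the SDP relaxation itself needs a solver, so this only illustrates the objective; the graph is an arbitrary small example):

```python
import itertools

# Max-Cut as polynomial optimization: with x_i in {-1, +1} marking the two
# groups, maximize the sum over edges of (1 - x_i * x_j) / 2, which counts
# the separated pairs. Brute force is fine for a 5-cycle (2^5 assignments).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

best = max(
    sum((1 - x[i] * x[j]) // 2 for i, j in edges)
    for x in itertools.product([-1, 1], repeat=5)
)
print(best)  # 4: an odd cycle can never have all 5 edges cut
```

The degree-2 SOS relaxation replaces each x_i by a unit vector and the products x_i x_j by inner products collected in a PSD matrix, which is precisely the Goemans-Williamson SDP.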

Peeking at Quantum Reality's Operating System

Perhaps the most breathtaking application of our story takes us to the very edge of human knowledge: the foundations of quantum mechanics. One of the defining features of the quantum world is entanglement—the "spooky action at a distance" that so troubled Einstein. The consequences of entanglement can be framed as a "non-local game" where players who share an entangled quantum state can achieve correlations in their actions that are impossible in our classical world.

A central question in physics is: what are the ultimate limits on these correlations? How much of an advantage does quantum mechanics truly give you? Physicists developed a powerful sequence of tests, known as the NPA (Navascués-Pironio-Acín) hierarchy, to compute ever-tighter upper bounds on the success probability in any such quantum game. Each level of the hierarchy gives a bound that is guaranteed by the laws of quantum mechanics, formulated in the language of operator algebras.

And now for the final, profound revelation. It has been proven that this NPA hierarchy, born from the depths of quantum information theory, is mathematically equivalent to the sum-of-squares hierarchy, applied to a setting where the variables do not commute. A tool forged for polynomial optimization turns out to be the most powerful method we have for charting the boundary between the classical and quantum worlds. It is as if we discovered that the grammatical rules of ancient Latin also perfectly describe the folding of a complex protein.

So, the next time you encounter a polynomial, do not dismiss it as a dry formula from a textbook. Think of it as a piece of a universal language, describing everything from the motion of a robot to the frequency response of a filter to the optimal way to partition a network. And think of the sum-of-squares principle as our Rosetta Stone—a deep and beautiful idea that allows us to translate some of the most important and challenging questions in modern science and engineering into a form that, at last, we can solve.