
Proving that a function is always non-negative is a fundamental challenge in many scientific fields, from ensuring a robot's stability to verifying the ground state energy in physics. While simple for basic functions, this task becomes nearly impossible for complex multivariate polynomials, where simply testing points offers no guarantee of universal positivity. This gap in providing a definitive "certificate of positivity" creates a critical hurdle in guaranteeing safety and performance in complex systems.
Sum-of-Squares (SOS) optimization emerges as a powerful and elegant solution. Instead of checking an infinite number of points, it provides a finite, algebraic proof of non-negativity by attempting to rewrite a polynomial as a sum of squares of other polynomials. This article provides a comprehensive overview of this technique, guiding the reader from its foundational concepts to its practical applications.
The following chapter, "Principles and Mechanisms," will delve into the core theory, explaining how the abstract problem of polynomial positivity is translated into a solvable matrix problem—a semidefinite program (SDP). We will explore the ingenious Gram matrix method that enables this translation, discuss the profound concept of duality, and confront the inherent limitations of the approach, such as conservatism and the "curse of dimensionality." Following this theoretical foundation, the "Applications and Interdisciplinary Connections" chapter will showcase the remarkable versatility of SOS optimization, demonstrating its use in solving real-world problems in control theory, signal processing, and even computational biology.
Imagine you are an engineer tasked with guaranteeing that a complex robotic arm will always stay within its safe operating zone, or a physicist wanting to prove that the energy of a system never drops below a certain ground state. At the heart of these questions lies a surprisingly deep mathematical challenge: how can you be certain that a given function, say a polynomial p(x), is always non-negative?
For a simple high-school parabola like f(x) = x^2, this is straightforward. But for a complicated polynomial in many variables, the landscape can be a wild terrain of peaks and valleys in a high-dimensional space that we can't even visualize. Testing a million, or even a billion, points gives us some confidence, but it's not a proof. There could always be a tiny, hidden valley where the function dips into negative territory. We need a definitive certificate, an irrefutable proof of positivity.
What if we could find an algebraic structure within the polynomial itself that inherently guarantees its non-negativity? This is the beautiful and powerful idea behind Sum-of-Squares (SOS) optimization. The logic is almost deceptively simple: any number squared is non-negative. Therefore, any sum of squared numbers is also non-negative.
If we can take our complicated polynomial p(x) and rewrite it as a sum of squares of other polynomials, say p(x) = q_1(x)^2 + q_2(x)^2 + ... + q_k(x)^2, then we have found our certificate. For any input x, the right-hand side is a sum of squared real numbers, which can never be negative. Therefore, p(x) must be non-negative everywhere. This is an algebraic proof, a stamp of certainty that holds for all possible inputs, without having to check a single one. This is the core promise of the sum-of-squares technique. But this raises the next, crucial question: how on Earth do we find those polynomials q_i?
Directly searching for the component polynomials q_i is a daunting task. The magic of SOS is that it provides a "Rosetta Stone" to translate this difficult problem into a completely different, and much more manageable, domain: the world of matrices. This translation is performed by the Gram matrix method.
Let's take a journey with a concrete example. Suppose we have a degree-4 polynomial p(x), a candidate for the energy of a system, that depends on a tunable parameter, and we want to know for which values of that parameter this "energy" is certifiably non-negative using the SOS method.
First, we break down our polynomial into its fundamental building blocks. Since p(x) is a polynomial of degree 4, its "square roots" would be polynomials of degree 2. So, we create a vector, let's call it z(x), containing all the monomial building blocks up to degree 2; in a single variable, z(x) = (1, x, x^2).
Now, we propose that our polynomial can be written as a quadratic form using these building blocks and a currently unknown symmetric matrix Q, which we'll call the Gram matrix: p(x) = z(x)^T Q z(x).
If we multiply this out, we get a new polynomial whose coefficients are linear combinations of the entries of Q. By forcing this new polynomial to be identical to our original p(x), we derive a set of simple linear equations that the entries of Q must satisfy. With z(x) = (1, x, x^2), for instance, matching coefficients gives conditions like Q_33 = (coefficient of x^4), 2 Q_23 = (coefficient of x^3), 2 Q_13 + Q_22 = (coefficient of x^2), and Q_11 = (constant term).
Here is the central connection: a polynomial is a sum of squares if and only if there exists a positive semidefinite (PSD) Gram matrix Q that satisfies these linear equations. A matrix Q is positive semidefinite (written Q ≽ 0) if for any vector v, the number v^T Q v is non-negative. It's the matrix analogue of a non-negative number.
If we find such a Q, we have our certificate. Why? Because any PSD matrix has a "matrix square root," meaning it can be factored as Q = L^T L. Substituting this into our expression gives: p(x) = z(x)^T L^T L z(x) = (L z(x))^T (L z(x)).
This last expression is the sum of the squares of the entries of the vector L z(x). We've explicitly found the sum of squares! The search for an SOS certificate has been transformed into a search for a PSD matrix satisfying linear constraints. This type of problem is known as a semidefinite program (SDP), a class of convex optimization problems that, remarkably, we have efficient algorithms to solve. We've turned an intractable question about polynomials into a solvable one about matrices. For our example polynomial, this machinery pins down the exact range of parameter values for which it is SOS.
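To make the Gram-matrix pipeline concrete, here is a minimal NumPy sketch. It uses an assumed toy polynomial, p(x) = x^4 + 2x^2 + 1, rather than the parameterized example above: we write down a Gram matrix Q by matching coefficients against z(x) = (1, x, x^2), check that it is PSD, factor it, and recover an explicit sum of squares.

```python
import numpy as np

# Monomial basis z(x) = (1, x, x^2) for a degree-4 univariate polynomial.
def z(x):
    return np.array([1.0, x, x**2])

# A Gram matrix for p(x) = x^4 + 2x^2 + 1: matching coefficients forces
# Q_33 = 1 (x^4), Q_23 = 0 (x^3), 2*Q_13 + Q_22 = 2 (x^2), Q_12 = 0 (x),
# Q_11 = 1 (constant).  Choosing Q_13 = 1, Q_22 = 0 gives a PSD matrix.
Q = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 1.0]])

# Certificate step 1: Q is positive semidefinite (no negative eigenvalues).
assert np.linalg.eigvalsh(Q).min() >= -1e-9

# Certificate step 2: factor Q = L^T L via the eigendecomposition,
# so that p(x) = || L z(x) ||^2, an explicit sum of squares.
w, V = np.linalg.eigh(Q)
L = np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T

def p_sos(x):
    return float(np.sum((L @ z(x))**2))

def p(x):
    return x**4 + 2*x**2 + 1

# The recovered sum of squares reproduces p exactly (up to roundoff).
for x in np.linspace(-3, 3, 25):
    assert abs(p(x) - p_sos(x)) < 1e-8
```

In a real SOS tool the free entries of Q (here, the choice 2*Q_13 + Q_22 = 2) are left to an SDP solver, which searches the whole affine family for a PSD member.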
This machinery seems almost too good to be true. Is it possible that every polynomial that is truly non-negative can be written as a sum of squares? If so, our SOS method would be a perfect, universal test for positivity.
Alas, the world is more subtle. As David Hilbert discovered in the late 19th century, the answer is no. There exist polynomials that are non-negative everywhere, yet can never be decomposed into a sum of squares. The most celebrated of these is the Motzkin polynomial: M(x, y) = x^4 y^2 + x^2 y^4 - 3 x^2 y^2 + 1.
One can prove, using calculus or other means, that M(x, y) >= 0 for all real numbers x and y. Yet, it is impossible to write it as a sum of squares of polynomials.
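A quick numerical sanity check (evidence, not a proof) of the Motzkin polynomial's non-negativity: sample a dense grid and observe that the smallest value found is 0, attained at |x| = |y| = 1.

```python
import numpy as np

def motzkin(x, y):
    # M(x, y) = x^4 y^2 + x^2 y^4 - 3 x^2 y^2 + 1
    return x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 + 1

# Sample a dense grid over [-2, 2] x [-2, 2].
xs = np.linspace(-2, 2, 401)
X, Y = np.meshgrid(xs, xs)
M = motzkin(X, Y)

# No negative values anywhere on the grid ...
assert M.min() >= -1e-9
# ... and the minimum 0 is hit exactly at the four points |x| = |y| = 1.
assert abs(motzkin(1.0, 1.0)) < 1e-12
assert abs(motzkin(-1.0, 1.0)) < 1e-12
```

The grid check is exactly the kind of evidence the article warns is not a certificate; the remarkable fact is that for this polynomial no SOS certificate exists either, even though the statement is true.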
This reveals a fundamental gap between the truth (the set of all non-negative polynomials) and what is provable with an SOS certificate. The set of SOS polynomials is a strict subset of the non-negative polynomials. This means that when we use SOS as a proxy for non-negativity, we introduce conservatism. We might encounter a system that is perfectly stable, with its true Lyapunov function being non-negative but not SOS. Our SOS-based search would fail and report that it cannot find a certificate, even though one (of a different kind) exists. This is the price we pay for computational tractability.
Often in science and engineering, we don't need a guarantee that holds for the entire universe of possibilities. We might only need to prove that a system is stable as long as its state stays within a certain bounded region, for example, inside an ellipsoid representing a maximum energy level. Quadratic Lyapunov functions, of the form V(x) = x^T P x, can only ever prove stability for convex, ellipsoidal regions. But real-world safe sets can have complex, non-convex shapes. This is another area where the flexibility of higher-degree SOS polynomials shines.
How can we use our algebraic SOS machinery, which works everywhere, to prove a property that only needs to hold on a specific set S? Let's say our set is defined by a polynomial inequality g(x) >= 0. The trick is to cleverly modify the statement we are trying to prove.
This idea is formalized in a result called the Positivstellensatz (German for "positivity theorem"). A simple version of this is the S-procedure. To prove that p(x) >= 0 for all x where g(x) >= 0, we can search for an SOS multiplier polynomial σ(x) such that the following expression is globally a sum of squares: p(x) - σ(x) g(x).
Let's see why this works. We've constructed an inequality that holds for all x. Now, let's restrict our attention to the set where g(x) >= 0. On this set, since σ is SOS, it is also non-negative. Thus, the term σ(x) g(x) is non-negative. Our SOS condition tells us p(x) - σ(x) g(x) >= 0, which implies p(x) >= σ(x) g(x). Since the right side is non-negative inside S, we must have p(x) >= 0 inside S. It's a beautiful algebraic trick that uses a global certificate to prove a local property.
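Here is a tiny worked instance of the S-procedure, with polynomials of our own choosing: prove p(x) = x + 1 >= 0 on the set where g(x) = 1 - x^2 >= 0 (the interval [-1, 1]). The constant multiplier σ = 1/2 works, because p - σ g collapses to the perfect square (1/2)(x + 1)^2.

```python
import numpy as np

# Target: certify p(x) = x + 1 >= 0 on S = {x : g(x) = 1 - x^2 >= 0}.
# (p is negative outside S, so no global SOS certificate for p alone exists.)
def p(x): return x + 1
def g(x): return 1 - x**2

# S-procedure certificate: with the SOS multiplier sigma(x) = 1/2,
#   p(x) - sigma * g(x) = 0.5 * (x + 1)^2,  a perfect square, hence SOS.
sigma = 0.5
def residual(x):
    return p(x) - sigma * g(x)

for x in np.linspace(-5, 5, 101):
    # The identity holds for ALL x ...
    assert abs(residual(x) - 0.5 * (x + 1)**2) < 1e-9
    # ... and therefore p(x) >= sigma * g(x) >= 0 wherever g(x) >= 0.
    if g(x) >= 0:
        assert p(x) >= -1e-9
```

In practice the multiplier σ is not guessed: its coefficients become extra decision variables in the same SDP that searches for the Gram matrix.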
The full Positivstellensatz extends this idea to sets defined by many polynomial inequalities, g_1(x) >= 0, ..., g_m(x) >= 0. It gives us a powerful theorem: under certain conditions (namely, the set is compact, or "Archimedean"), any polynomial that is strictly positive on the set can be written with SOS multipliers. A subtle point arises: the theorem guarantees a certificate for strictly positive polynomials (p > 0 on the set), but we often want to prove non-negativity (p >= 0). The elegant solution is to add a tiny positive "slack" term ε and prove that p(x) + ε is positive on the set. This provides the rigorous link between the theorems and practical computation.
The journey into SOS optimization reveals yet another layer of profound beauty, a concept from optimization theory called duality. Every optimization problem, which we can call the "primal" problem, has a corresponding "dual" problem that offers a different perspective on the same question.
Our primal problem for finding the minimum of a polynomial p(x) is:
Find the largest number γ such that the polynomial p(x) - γ is a sum of squares.
This gives a certified lower bound γ on the minimum value of p. Now, what is its dual? The dual problem is something completely unexpected:
Find a probability measure μ over the inputs x that minimizes the expected value of p.
A "probability measure" is a way of assigning weights to different regions of the space. The dual problem tries to find the cleverest way to distribute this weight to make the average value of the polynomial as small as possible. The variables in this dual problem are the moments of the measure—abstractions of statistical quantities like mean, variance, and skewness.
The core principle of weak duality states that the solution of the dual problem is always a lower bound for the solution of the primal. For many SOS problems, something stronger holds: strong duality. This means the optimal value of the algebraic primal problem is exactly equal to the optimal value of the probabilistic dual problem. And very often, this common value is the true global minimum of the polynomial.
But the real magic is that the solution to the dual problem—the set of optimal moments—contains geometric information. It acts as a fingerprint of the locations where the minimum is achieved. From this list of numbers, one can often algebraically reconstruct the coordinates of the global minimizers. Furthermore, if an SOS program is infeasible, the corresponding dual problem provides a certificate of this infeasibility, which can often be interpreted as a specific "counterexample" point in space where the desired positivity condition fails. This interplay between algebra, geometry, and probability is a stunning example of the unity of mathematical concepts.
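A worked miniature of this primal-dual picture, using an assumed polynomial p(x) = x^4 - 2x^2 with global minimum -1 at x = ±1: the primal certificate is the explicit square p(x) + 1 = (x^2 - 1)^2, and the dual optimum is the two-point measure that splits its weight between the minimizers.

```python
import numpy as np

def p(x):
    return x**4 - 2*x**2

# Primal side: gamma = -1 is a certified lower bound, because
#   p(x) - (-1) = x^4 - 2x^2 + 1 = (x^2 - 1)^2   is an explicit square (SOS).
gamma = -1.0
for x in np.linspace(-3, 3, 61):
    assert abs((p(x) - gamma) - (x**2 - 1)**2) < 1e-9

# Dual side: the probability measure mu = 0.5*delta(-1) + 0.5*delta(+1)
# achieves expected value E_mu[p] = -1, matching the primal bound exactly
# (strong duality), so -1 is the true global minimum.
support = np.array([-1.0, 1.0])
weights = np.array([0.5, 0.5])
expected_p = float(weights @ p(support))
assert abs(expected_p - gamma) < 1e-12

# The optimal moments m_k = E_mu[x^k] "fingerprint" the minimizers:
moments = [float(weights @ support**k) for k in range(1, 5)]
# m1 = 0, m2 = 1, m3 = 0, m4 = 1  =>  x^2 = 1, i.e. minimizers at x = +/-1.
assert np.allclose(moments, [0.0, 1.0, 0.0, 1.0])
```

Note how the coordinates ±1 never appear in the moment list itself; they are reconstructed algebraically from m2 = 1 and m1 = 0, which is exactly the "fingerprint" phenomenon described above.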
So, we have a "superpower": a computational method to reason about infinite spaces of possibilities. What's the catch? The catch is a familiar villain in modern science: the curse of dimensionality.
The SOS machinery requires us to build a Gram matrix. The size of this matrix depends on the number of variables in our polynomial, n, and its degree, 2d. Specifically, the side length of the matrix is the number of monomials of degree at most d, the combinatorial term C(n + d, d). This number grows explosively. For a homogeneous degree-4 polynomial in two variables, the matrix is a tiny 3 x 3; add variables and the side length quickly climbs into the dozens and then the hundreds. For a degree-20 polynomial in 10 variables, the side length is C(20, 10) = 184,756, well over 100,000!
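The growth of the Gram matrix can be checked directly with Python's math.comb; the helper names below are ours.

```python
from math import comb

# Side length of the Gram matrix: the number of monomials in n variables
# of degree at most d (certifying a polynomial of degree 2d).
def gram_side(n, d):
    return comb(n + d, d)

# For homogeneous polynomials, only degree-exactly-d monomials are needed.
def gram_side_homogeneous(n, d):
    return comb(n + d - 1, d)

# Homogeneous degree-4 polynomial (d = 2) in 2 variables: a tiny 3x3 matrix.
assert gram_side_homogeneous(2, 2) == 3

# Degree-20 polynomial (d = 10) in 10 variables: the side length explodes.
assert gram_side(10, 10) == 184756   # well over 100,000

# Dense SDP solve time scales roughly like side^3, so this case would cost
# on the order of 184756**3 ~ 6.3e15 elementary operations: intractable.
```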
The time it takes for a computer to solve the resulting SDP scales roughly with the cube of the matrix size. This means the computational cost skyrockets, making problems with many variables or high degrees completely intractable with this "dense" formulation. This is the fundamental trade-off of SOS optimization: increasing the degree of our polynomial certificate can reduce conservatism and find better solutions, but at a punishing computational price.
This reality has spurred a new wave of research. Practitioners use clever heuristics, like starting with low-degree polynomials and iteratively increasing the degree while keeping an eye on a computational budget. They also use pre-conditioning tricks, like rescaling coordinates, to make the numerical problems more stable. At the frontier of the field, researchers are developing more scalable methods that exploit problem structure (like sparse SOS) or use more restrictive but faster-to-check inner approximations of the SOS condition, such as diagonally-dominant-sum-of-squares (DSOS), which can be checked with much faster linear programs (LPs). This ongoing work continues the quest: to harness the power of algebraic certainty while taming the curse of dimensionality.
We have spent some time admiring the beautiful algebraic machinery of Sum-of-Squares optimization. We've seen how the simple, almost self-evident fact that a sum of squared numbers cannot be negative can be translated into the powerful language of convex optimization. Now, you might be wondering, "That's a neat trick, but what is it good for?"
This is where our journey truly begins. It turns out this "simple" idea is a kind of master key, unlocking doors in worlds as different as a soaring spacecraft, the intricate dance of proteins in a living cell, and even the abstract frontiers of computation itself. The principles we’ve learned are not just textbook exercises; they are active, powerful tools used by scientists and engineers to solve real, challenging problems. Let us now explore some of these worlds and see the remarkable unity this single concept brings to them.
Perhaps the most mature and widespread application of Sum-of-Squares is in the field of dynamical systems and control theory—the science of making things move as we want them to, and ensuring they don't do what we don't want them to.
Imagine a marble at the bottom of a bowl. Nudge it, and it rolls back to the center. We say its position at the center is stable. Now imagine the marble balanced precariously on top of an inverted bowl. The slightest puff of wind sends it tumbling away, never to return. This is unstable. For centuries, physicists and engineers have sought a universal way to distinguish the stable bowl from the unstable one for any system, no matter how complex.
The great Russian mathematician Aleksandr Lyapunov gave us a profound insight: a system is stable if we can find an "energy-like" function, which he called a Lyapunov function, that is always positive except at the equilibrium point and always decreases as the system moves. For our marble, this function could be its height; as it rolls back to the center of the bowl, its height (and thus its potential energy) constantly decreases.
This is a beautiful idea, but there was always a catch: how do you find such a function? For complex, nonlinear systems—the kind that describe everything from aircraft flight to chemical reactions—this was more of an art than a science. There was no systematic procedure.
This is where Sum-of-Squares enters the stage. If our system's dynamics, x' = f(x), are described by polynomials, we can search for a polynomial Lyapunov function V(x). The conditions that V must be positive definite and its time derivative dV/dt = ∇V(x) · f(x) must be negative definite are precisely questions about polynomial non-negativity! By translating these conditions into SOS constraints, we transform the daunting, creative task of finding a Lyapunov function into a concrete, solvable optimization problem. We essentially ask the computer: "Can you find a sum-of-squares polynomial V(x) such that its negated derivative, -∇V(x) · f(x), is also a sum of squares?" If the computer says yes, we have a rigorous, undeniable certificate that our system is stable. The art of stability analysis has become a science.
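A minimal worked example of this recipe on an assumed toy system, x' = -x^3: the candidate V(x) = x^2 is itself a square, and -dV/dt = 2x^4 is again a perfect square, so both SOS conditions hold and stability is certified.

```python
import numpy as np

# Toy polynomial dynamics (our assumption):  x' = f(x) = -x^3.
def f(x):
    return -x**3

# Candidate Lyapunov function V(x) = x^2, trivially SOS: V = (x)^2.
def V(x):
    return x**2

# Along trajectories, dV/dt = (dV/dx) * f(x) = 2x * (-x^3) = -2 x^4.
# The negated derivative -dV/dt = 2 x^4 = (sqrt(2) * x^2)^2 is an explicit
# square, so the second SOS condition holds as well.
def minus_V_dot(x):
    return -(2 * x * f(x))

for x in np.linspace(-4, 4, 81):
    assert abs(minus_V_dot(x) - (np.sqrt(2) * x**2)**2) < 1e-6
    assert minus_V_dot(x) >= 0   # energy never increases
```

For anything less trivial, of course, the coefficients of V are unknowns handed to an SDP solver rather than guessed by hand.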
Knowing that an equilibrium is stable is only half the story. The marble in the bowl is stable, but if we push it hard enough, it will fly out of the bowl entirely. For a pilot, it's not enough to know that the aircraft can recover from a small gust of wind; they need to know the limits—how large a disturbance it can handle before it loses control. This "safe zone" of initial conditions from which the system returns to equilibrium is called the Region of Attraction (ROA).
Mapping this region is notoriously difficult. A common first step is to simplify the system by linearizing it—essentially pretending it behaves like a simple spring near its equilibrium. This gives us a quadratic Lyapunov function and a nice, simple estimate of the ROA, usually a ball or an ellipsoid. However, reality is nonlinear, and this estimate is often ridiculously conservative, like claiming a huge valley's basin of attraction is just a tiny circle at the very bottom.
To get a better picture, we need a Lyapunov function that more closely matches the true "shape" of the system's energy landscape. We can do this by adding higher-order polynomial terms to our function. But which terms? And with what coefficients? This is another seemingly impossible search. Yet again, SOS provides the answer. We can ask our optimization program to find the largest possible sublevel set of a polynomial Lyapunov function where its derivative remains negative. The SOS formulation allows the computer to intelligently choose the coefficients of the higher-order terms, effectively "sculpting" the shape of our Lyapunov function to certify a much larger, non-spherical, and more realistic Region of Attraction.
Sometimes, stability isn't our primary concern. Instead, we want to guarantee safety—that the system will never enter a forbidden region. A self-driving car must never collide with a pedestrian; a chemical reactor's temperature must never exceed a critical threshold. We want to build an invisible, inviolable fence around these unsafe regions.
This is the idea behind Control Barrier Functions (CBFs). A barrier function B(x) is defined such that the safe region is where B(x) >= 0. To guarantee safety, we must ensure that whenever we are on the boundary of the safe set (at B(x) = 0), the system's dynamics are not pointing outwards. This condition, that the derivative of B along trajectories must be non-negative there, is another non-negativity constraint perfectly suited for an SOS formulation. By finding an SOS certificate for this condition, we can mathematically prove that no trajectory can ever "escape" the safe set.
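A toy barrier check under assumed dynamics x' = -x, with B(x) = 1 - x^2 (safe set [-1, 1]): the derivative of B along trajectories is 2x^2, an explicit square, so the flow never points out of the safe set.

```python
import numpy as np

# Toy system (our assumption): x' = f(x) = -x, with safe set
# S = {x : B(x) >= 0} for the barrier B(x) = 1 - x^2, i.e. [-1, 1].
def f(x): return -x
def B(x): return 1 - x**2

# dB/dt = (dB/dx) * f(x) = (-2x) * (-x) = 2 x^2 = (sqrt(2) x)^2:
# an explicit square, so dB/dt >= 0 everywhere, in particular on the
# boundary B(x) = 0, and no trajectory can leave the safe set.
def B_dot(x):
    return -2 * x * f(x)

for x in np.linspace(-2, 2, 41):
    assert abs(B_dot(x) - (np.sqrt(2) * x)**2) < 1e-9
# On the boundary points x = +/-1 the flow points strictly inward.
assert B_dot(1.0) == 2.0 and B_dot(-1.0) == 2.0
```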
This same principle allows us to handle systems with hard physical limits, known as state constraints. If a robot arm cannot move beyond a certain angle, or a voltage cannot exceed a certain limit, we can encode these limits as polynomial inequalities. Using SOS, we can then prove that a system's Region of Attraction is entirely contained within these allowed bounds, guaranteeing both stability and safe operation.
This machinery is not just for analysis; it's also for design. Control Lyapunov Functions (CLFs) extend these ideas to synthesize controllers that not only stabilize a system but also respect constraints on the control inputs themselves, like the maximum torque of a motor. SOS optimization can find both the controller and a certificate of the largest region where this controller is guaranteed to work safely and effectively.
And just as we can certify stability, the same framework can be turned on its head to certify instability by searching for a Chetaev function, which is like an "anti-Lyapunov" function that proves trajectories will escape from the equilibrium.
The reach of Sum-of-Squares extends far beyond mechanics and motion. It finds a surprisingly elegant home in the world of signal processing, the science behind how we interpret, analyze, and manipulate data like sound, images, and radio waves.
Think of an audio equalizer on your stereo. It allows you to boost the bass or cut the treble. This is a digital filter. Its purpose is to let certain frequencies pass through while blocking others. A key specification for a filter is its performance in the "stopband"—the range of frequencies it's supposed to block. We might require that in this band, the energy of any signal is reduced by a huge amount, say to less than a tiny fraction of its original value.
The energy response of many common filters can be written as a polynomial in cos(ω), where ω is the frequency. The stopband requirement then becomes an inequality: a certain polynomial must remain below a threshold over a given interval. How can we certify this? You guessed it. By reformulating the question, we ask for an SOS proof that the polynomial is non-negative over the interval. For these univariate polynomial problems, SOS relaxations are not just approximations; they are exact. If a certificate exists, SOS will find it, providing a rigorous quality-control check for the filter's design.
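As an illustration with a toy filter of our own (a two-tap moving average), the energy response is exactly the polynomial (1 + cos ω)/2 in cos ω, and a stopband bound near ω = π can be checked numerically.

```python
import numpy as np

# Two-tap moving-average filter (our toy example): h = [0.5, 0.5].
h = np.array([0.5, 0.5])

# Frequency response H(e^{jw}) = h0 + h1 * e^{-jw} and its energy |H|^2 ...
w = np.linspace(0, np.pi, 512)
H = h[0] + h[1] * np.exp(-1j * w)
energy = np.abs(H)**2

# ... which equals a polynomial in cos(w):  |H|^2 = (1 + cos w) / 2.
poly_in_cos = (1 + np.cos(w)) / 2
assert np.allclose(energy, poly_in_cos)

# Stopband check: on [0.9*pi, pi] the energy stays below a small threshold,
# the kind of interval inequality an SOS certificate would prove exactly.
stop = w >= 0.9 * np.pi
assert energy[stop].max() < 0.03
```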
The connection to signal processing reveals something even deeper. In one dimension (like a time signal), a famous result called the Fejér-Riesz Theorem states that any non-negative trigonometric polynomial—like a filter's energy response R(ω)—can be written as the squared magnitude of a single other polynomial: R(ω) = |H(ω)|^2. This is called spectral factorization, and it is a cornerstone of 1D filter design.
One might naturally assume this beautiful result extends to higher dimensions, like 2D signals (images) or 3D signals (video). But it doesn't. There exist non-negative 2D polynomials that simply cannot be written as the squared magnitude of a single 2D polynomial. The simple, elegant structure of the 1D world breaks down.
What, then, is the true structure of non-negative polynomials in higher dimensions? Hilbert's 17th problem, a famous mathematical question from 1900, hinted at the answer, which was later proven: while a non-negative polynomial may not be a single square, it is always a sum of squares of rational functions. The SOS framework reveals this more general and beautiful truth. A non-negative polynomial is not necessarily a single squared magnitude |H(ω)|^2, but it can be represented or approximated as a sum |H_1(ω)|^2 + |H_2(ω)|^2 + ... . SOS gives us a computational tool to find these factors H_i, providing a principled way to perform spectral factorization and design filters in dimensions where the classical 1D theory fails. It shows that the true building block of non-negativity is not a single square, but a sum of them.
The journey takes one final turn, into the abstract but powerful worlds of formal logic, computational complexity, and even biology.
Modern biology is increasingly becoming a quantitative science. Synthetic biologists design gene regulatory circuits in bacteria to make them behave like tiny computers or chemical factories. These systems are often hybrid systems, mixing continuous dynamics (like the slow change in protein concentrations) with discrete, switch-like logic (a gene turning on or off).
Predicting the behavior of such complex systems is a formidable challenge. Will a synthetic circuit always remain stable, or could a protein concentration run away to toxic levels? We can model these questions using temporal logic, a formal language for specifying properties over time. The safety property "the concentration of the protein will always stay below a given level for a specified number of hours" can be written as a precise logical formula. Using a time-dependent barrier certificate and SOS programming, we can automatically verify whether the hybrid model of the gene circuit satisfies this formula. This provides a formal, mathematical guarantee of biological safety, a crucial step in engineering reliable biological systems.
Finally, we arrive at the most abstract application: understanding the fundamental limits of computation. Many of the most famous problems in computer science, like the Traveling Salesman Problem or finding the largest independent set in a graph, belong to a class called NP-hard. They are believed to be intrinsically difficult to solve exactly, with the required computation time exploding as the problem size grows.
The Sum-of-Squares hierarchy provides a systematic way to find increasingly better approximations to these hard problems. At each level of the hierarchy, we allow our SOS proof to use polynomials of a certain maximum degree. A degree-2 proof is computationally cheap but might only give a loose approximation. A degree-4 proof is more expensive but yields a tighter bound. As we increase the degree, we get closer to the true answer, but at an exponentially increasing computational cost.
In a sense, the SOS hierarchy creates a smooth landscape between problems we can solve easily and those that are impossibly hard. It provides a measure of the "difficulty" of proving a certain mathematical statement. For some problems, a low-degree SOS proof exists, showing they are not so hard after all. For others, it has been proven that any SOS proof would require an enormous degree, giving strong evidence for their intrinsic computational hardness. This connects SOS to one of the deepest questions in all of science: the P versus NP problem.
From the stability of a physical system to the certification of a biological circuit to the ultimate limits of what can be proven and computed, the simple idea of a sum of squares provides a unifying thread, a testament to the profound and often surprising power of a single mathematical concept.