
Sum of Squares Programming

Key Takeaways
  • Sum of Squares (SOS) programming transforms the difficult problem of proving a polynomial is non-negative into a tractable algebraic problem of finding an SOS decomposition.
  • The method works by checking if a polynomial's Gram matrix is positive semidefinite, a task that can be solved efficiently using semidefinite programming (SDP).
  • While being an SOS is only a sufficient condition for non-negativity (creating conservatism), theorems like the Positivstellensätze enable proofs of positivity on specific sets, which is vital for practical applications.
  • SOS programming is a versatile tool with profound applications in certifying stability and safety in control systems, designing multidimensional filters in signal processing, and analyzing computational complexity.

Introduction

In many critical fields of science and engineering, from guaranteeing the safety of an aircraft to ensuring the stability of a robotic system, a fundamental challenge arises: how can we be certain that a complex polynomial function never takes a negative value? Directly checking every possible input is an impossible task, creating a significant gap between mathematical models and provable real-world guarantees. This article introduces Sum of Squares (SOS) programming, a powerful computational technique that brilliantly bridges this gap. It provides a tractable method for certifying the positivity of polynomials, transforming an infinite geometric problem into a finite algebraic one solvable by computers. The following chapters will first delve into the core "Principles and Mechanisms" of SOS programming, explaining how it connects non-negative polynomials to semidefinite programming, its inherent limitations, and powerful extensions. Subsequently, the "Applications and Interdisciplinary Connections" chapter will showcase the surprising versatility of this method, exploring its profound impact on control theory, signal processing, and even theoretical computer science.

Principles and Mechanisms

At its heart, science often progresses by finding clever ways to answer difficult questions with easier ones. If you want to know if an unknown creature is a mammal, you could embark on a lengthy study of its anatomy and genetics. Or, you could ask an easier question: "Does it have hair?" If the answer is yes, you're almost certainly looking at a mammal. This isn't a perfect, foolproof test—a whale's hair is not obvious—but it's a wonderfully efficient and powerful shortcut.

Sum-of-squares (SOS) programming is precisely this kind of brilliant shortcut, applied to a notoriously difficult mathematical problem: how can you be certain that a polynomial function never dips below zero? This question is not just an academic curiosity; it is fundamental to proving the safety of an aircraft, the stability of a power grid, or the reliability of a robot.

The Great Leap: From Geometry to Algebra

Imagine a complicated polynomial function $p(x)$, a landscape of hills and valleys stretching across multiple dimensions. We want to guarantee that this landscape never drops below sea level, that is, $p(x) \ge 0$ for all possible values of its variables $x$. This is a geometric question. Checking every single point is impossible. How can we be sure there isn't some hidden, deep valley we missed?

The sum-of-squares technique makes a breathtaking leap from this infinite geometric problem to a finite, algebraic one. It asks a different, simpler question: "Can we write our polynomial $p(x)$ as a sum of other polynomials that have been squared?"

$$p(x) = \sum_{i=1}^{k} h_i(x)^2$$

If we can find such a representation, the non-negativity of $p(x)$ is self-evident. A squared real number can never be negative, and a sum of non-negative numbers can never be negative. The problem of checking infinitely many points is replaced by a problem of algebraic manipulation.

But how do we find such a decomposition? This is where the magic of linear algebra comes in. Any polynomial $p(x)$ of even degree, say $2d$, can be written in the form:

$$p(x) = v_d(x)^T Q\, v_d(x)$$

Here, $v_d(x)$ is a vector containing all the monomials of degree up to $d$ (e.g., for two variables and $d = 1$, it would be $(1, x, y)^T$). The matrix $Q$ is a symmetric matrix of coefficients called the **Gram matrix**. Finding such a matrix is straightforward. The truly profound insight is this: **a polynomial is a sum of squares if and only if its Gram matrix can be chosen to be positive semidefinite** ($Q \succeq 0$).
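
To make the Gram-matrix picture concrete, here is a minimal numerical sketch. The polynomial and matrix are our own illustrative choices, not taken from the text above: for $p(x) = x^4 + 4x^3 + 6x^2 + 4x + 1 = (x+1)^4$ and $v(x) = (1, x, x^2)^T$, the rank-one matrix $Q$ below is a valid positive semidefinite Gram matrix, and an eigendecomposition of $Q$ recovers the squares.

```python
import numpy as np

# Illustrative example: p(x) = x^4 + 4x^3 + 6x^2 + 4x + 1 = (x+1)^4,
# with monomial vector v(x) = [1, x, x^2].  Since (x+1)^4 = ((x+1)^2)^2
# and (x+1)^2 = 1 + 2x + x^2, one valid Gram matrix is the rank-1 outer
# product of the coefficient vector [1, 2, 1] with itself.
Q = np.outer([1.0, 2.0, 1.0], [1.0, 2.0, 1.0])

# PSD check: all eigenvalues of the symmetric matrix Q must be >= 0.
is_psd = np.linalg.eigvalsh(Q).min() >= -1e-9

# Extract the squares: Q = sum_i lam_i u_i u_i^T gives
# p(x) = sum_i ( sqrt(lam_i) * (u_i . v(x)) )^2.
lams, U = np.linalg.eigh(Q)

def p(x):
    return x**4 + 4*x**3 + 6*x**2 + 4*x + 1

def sos_eval(x):
    v = np.array([1.0, x, x**2])
    return sum(lam * (u @ v)**2 for lam, u in zip(lams, U.T) if lam > 1e-12)
```

Evaluating `sos_eval` at any point reproduces `p`, confirming that a decomposition $v(x)^T Q\, v(x)$ with a PSD $Q$ really is a sum of squares.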

A positive semidefinite matrix is one whose quadratic form is never negative: $z^T Q z \ge 0$ for every vector $z$. And here's the kicker: searching for a positive semidefinite matrix subject to linear constraints (here, the constraints that make $v_d(x)^T Q\, v_d(x)$ match the coefficients of $p$) is a standard, efficiently solvable problem in convex optimization known as a **semidefinite program (SDP)**. Suddenly, our intractable problem has been transformed into one that computers can solve robustly.

For instance, when analyzing the stability of a nonlinear system, we might search for a Lyapunov function $V(x,y)$ whose time derivative $\dot{V}(x,y)$ is always negative. By requiring $-\dot{V}(x,y)$ to be a sum of squares, we can convert the stability analysis into a search for a positive semidefinite matrix, a task a computer can handle.
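
As a toy instance of this recipe (the system and Lyapunov candidate below are our own illustrative choices), take the dynamics $\dot{x} = -x + y$, $\dot{y} = -x - y$ with $V(x,y) = x^2 + y^2$. Then $-\dot{V} = 2x^2 + 2y^2$, whose Gram matrix over $v = (x, y)^T$ is diagonal and trivially positive semidefinite:

```python
import numpy as np

# Toy system: xdot = -x + y, ydot = -x - y, with V(x, y) = x^2 + y^2.
# Vdot = 2x(-x + y) + 2y(-x - y) = -2x^2 - 2y^2 (the cross terms cancel),
# so -Vdot = 2x^2 + 2y^2 with Gram matrix diag(2, 2) over v = [x, y].
Q = np.diag([2.0, 2.0])
is_psd = np.linalg.eigvalsh(Q).min() >= 0.0   # certificate of stability

def neg_vdot(x, y):
    xdot, ydot = -x + y, -x - y
    return -(2 * x * xdot + 2 * y * ydot)
```

The PSD check on this tiny Gram matrix is exactly the computation an SDP solver automates for higher-degree candidates.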

The Catch: On Non-negative Polynomials and White Crows

So, we have a powerful tool. But is it perfect? Is every non-negative polynomial a sum of squares? David Hilbert showed in 1888 that the answer, surprisingly, is **no**. That discovery led him to pose his celebrated 17th problem: can every non-negative polynomial at least be written as a sum of squares of rational functions?

For polynomials in one variable, or for any quadratic polynomial (degree 2), non-negativity and being a sum of squares are indeed the same thing. This is a wonderful result, because it means that for analyzing linear systems with quadratic Lyapunov functions, our SOS shortcut is exact—it introduces no conservatism whatsoever.

However, for more complex cases, such as a polynomial of degree six in two variables, there exist "white crows": polynomials that are non-negative everywhere but cannot be written as a sum of squares of polynomials. The most famous example is the **Motzkin polynomial**:

$$M(x, y) = x^4 y^2 + x^2 y^4 - 3x^2 y^2 + 1$$

You can verify that $M(x,y)$ is never negative (it follows from the AM-GM inequality applied to the terms $x^4 y^2$, $x^2 y^4$, and $1$), but it is not a sum of squares. This means that if we are searching for a stability certificate and the true certificate happens to be a Motzkin-like polynomial, our SOS test will fail: it will report that it cannot find a certificate, even though one (a non-SOS one) exists. This gap between the set of all non-negative polynomials and the smaller, more manageable set of SOS polynomials is known as **conservatism**. Being a sum of squares is sufficient for non-negativity, but not necessary.
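
A quick numerical spot-check of the Motzkin polynomial (a sanity check over a grid, not a proof; the proof uses the AM-GM inequality as noted above):

```python
import itertools

def motzkin(x, y):
    return x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 + 1

# Evaluate on a grid over [-3, 3]^2: the minimum should be 0, attained at
# the four points (+-1, +-1).
grid = [i / 10 for i in range(-30, 31)]
min_val = min(motzkin(x, y) for x, y in itertools.product(grid, grid))
```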

Artin later solved Hilbert's 17th problem by showing that any non-negative polynomial can be written as a sum of squares of rational functions (ratios of polynomials). While theoretically beautiful, this doesn't immediately help our computational quest, as searching for unknown denominators leads to non-convex problems that are hard to solve.

Certificates for the Real World: Positivity on a Leash

In many engineering applications, we don't need a guarantee that holds for all of $\mathbb{R}^n$. We care about a specific region of operation: inside a "safe set" $\mathcal{C}$, below a certain energy level, or within the physical limits of a robot's joints. Let's say our safe set is defined by a polynomial inequality, $\mathcal{C} = \{x \mid g(x) \ge 0\}$. How can we prove that another polynomial, $p(x)$, is non-negative just for the $x$ inside $\mathcal{C}$?

This is where the true power of the algebraic approach shines, through a set of theorems known as the **Positivstellensätze** (German for "positive-locus theorems"). They provide algebraic proofs, or **certificates**, of positivity on a set. The simplest version, often called the S-procedure, works like this: if we can find a sum-of-squares polynomial $\sigma(x)$ such that

$$p(x) - \sigma(x)\,g(x) \text{ is a sum of squares,}$$

then we have a certificate that $p(x) \ge 0$ whenever $g(x) \ge 0$. Why? On $\mathcal{C}$, the term $\sigma(x)g(x)$ is non-negative (since $\sigma$ is SOS and $g \ge 0$ there), and $p(x)$ is this term plus a sum of squares, which is non-negative everywhere. The total must therefore be non-negative. This allows us to prove properties that are only locally true, dramatically reducing conservatism.
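
Here is a fully worked instance of this certificate (the polynomials are our own illustrative choices): take $p(x) = x + 1$, the set $g(x) = 1 - x^2 \ge 0$ (the interval $[-1, 1]$), and the constant SOS multiplier $\sigma(x) = 1/2$. Then $p(x) - \sigma(x)g(x) = (x+1)^2/2$, a sum of squares, certifying that $x + 1 \ge 0$ on $[-1, 1]$:

```python
def p(x): return x + 1          # claim: p(x) >= 0 wherever g(x) >= 0
def g(x): return 1 - x**2       # defines the set C = [-1, 1]
def sigma(x): return 0.5        # SOS multiplier (a constant square)

def certificate(x):
    # Algebra: (x + 1) - (1 - x^2)/2 = (x^2 + 2x + 1)/2 = (x + 1)^2 / 2.
    return p(x) - sigma(x) * g(x)
```

Since `certificate` is identically $(x+1)^2/2 \ge 0$, every $x$ with $g(x) \ge 0$ must satisfy $p(x) \ge \sigma(x)g(x) \ge 0$.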

More advanced theorems, like those of Schmüdgen and Putinar, extend this idea to sets defined by multiple inequalities, $g_i(x) \ge 0$. They show that if a polynomial is strictly positive on a compact set, it can be written as a combination of the $g_i$ with SOS multipliers. For instance, a certificate for $p(x) \ge 0$ on the set where $g_1(x) \ge 0$ and $g_2(x) \ge 0$ might look for SOS polynomials $\sigma_1, \sigma_2, \sigma_{12}$ such that $p(x) - \sigma_1(x)g_1(x) - \sigma_2(x)g_2(x) - \sigma_{12}(x)g_1(x)g_2(x)$ is itself a sum of squares. Putinar's version is particularly popular because it uses a simpler certificate structure (no products of the $g_i$) that is computationally more efficient.

One subtle point is that these powerful theorems guarantee a certificate only if the polynomial is strictly positive ($> 0$) on the set. What if we only need to prove non-negativity ($\ge 0$), as is common in stability analysis? The standard remedy is to ask for slightly more. Instead of proving $-\dot{V}(x) \ge 0$, we prove $-\dot{V}(x) - \epsilon\, x^T x \ge 0$ for some very small $\epsilon > 0$, so that $-\dot{V}$ is strictly positive everywhere except the origin. An SOS certificate for this slightly stronger condition serves as a perfectly valid proof of the original, non-strict inequality.

The Curse of Dimensionality and a Path Forward

For all its elegance, the SOS method has a formidable adversary: the **curse of dimensionality**. The size of the Gram matrix $Q$ we need to find grows explosively with the number of variables $n$ and the polynomial degree $d$: its side length is $\binom{n+d}{d}$, the number of monomials of degree up to $d$. To certify a simple degree-4 polynomial in just $n = 12$ variables, the Gram matrix is already of size $91 \times 91$. For $n = 20$, it would be $231 \times 231$. The computational cost of the SDPs involved becomes prohibitive for large-scale systems.
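
These sizes follow from the simple monomial count: a degree-$2d$ polynomial in $n$ variables needs a basis of all $\binom{n+d}{d}$ monomials of degree up to $d$. A two-line check of the figures above:

```python
from math import comb

def gram_size(n, d):
    # Side length of the Gram matrix: the number of monomials of degree
    # at most d in n variables, C(n + d, d).
    return comb(n + d, d)
```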

But ingenuity finds a way. Researchers have developed several techniques to tame this complexity:

  1. **Sparsity:** If a system has a natural structure (for example, if variables only interact with their "neighbors"), the polynomial to be certified will be sparse. Sparse SOS techniques exploit this structure to break one enormous SDP into a collection of much smaller, coupled SDPs, making the problem tractable again.

  2. **Stricter relaxations:** We can choose to search within an even smaller, but computationally simpler, subset of the SOS cone. **Diagonally dominant sums of squares (DSOS)** and **scaled diagonally dominant sums of squares (SDSOS)** are two such inner approximations. Instead of requiring the Gram matrix to be positive semidefinite (an SDP), they require it to be diagonally dominant (or a scaled version thereof), a condition that can be checked with much faster linear programs (LPs) or second-order cone programs (SOCPs). This introduces more conservatism, since we might fail to find a certificate that a full SOS search would have found, but the computational savings are often worth it. Importantly, if a DSOS or SDSOS certificate is found, it is still a completely valid proof of non-negativity.
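
The diagonal-dominance test at the heart of DSOS is elementary to implement. A minimal sketch, with illustrative matrices of our own choosing:

```python
def is_diagonally_dominant(Q):
    # A symmetric matrix with nonnegative diagonal is diagonally dominant
    # if each diagonal entry weakly dominates the absolute row sum of the
    # off-diagonal entries.  By Gershgorin's circle theorem, such a matrix
    # is positive semidefinite, so a diagonally dominant Gram matrix is a
    # cheap (if conservative) SOS certificate.
    n = len(Q)
    return all(
        Q[i][i] >= sum(abs(Q[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )
```

Note the conservatism: `[[1, 2], [2, 4]]` is positive semidefinite but fails the test, so a DSOS search would miss that certificate.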

A Glimpse of the Dual World: Moments and Measures

The story has one final, beautiful twist. Every optimization problem has a "dual", a shadow problem that offers a different perspective. The dual of searching for an SOS certificate of a polynomial's non-negativity is searching for a probability measure that could explain its values. This is the **moment problem**.

Instead of trying to prove $p(x) - p^\star \ge 0$ directly, we can ask: "Does there exist a probability distribution $\mu$ whose support lies only on the global minimizers of $p(x)$?" The SOS machinery provides a way to construct a sequence of "pseudo-distributions" (moment sequences) that get closer and closer to answering this question. A remarkable result known as the **Flat Extension Theorem** gives a condition, a check on the rank of a "moment matrix", that tells you when your pseudo-distribution is, in fact, a real probability measure supported on a finite number of points. When this condition is met, it not only proves that you have found the true global minimum $p^\star$, but it also tells you exactly where all the minimizer points are located.
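
A tiny illustration of the rank condition (the numbers are our own): the moments of a point mass at $x = a$ are $m_k = a^k$, and the resulting moment matrix is positive semidefinite with rank one, exactly the "flat" situation in which the pseudo-distribution is a genuine measure supported at the single point $a$.

```python
import numpy as np

# Order-1 moment matrix for a point mass at x = a: entries are the
# moments m_0 = 1, m_1 = a, m_2 = a^2.
a = 2.0
M = np.array([[1.0, a],
              [a,   a**2]])

rank = np.linalg.matrix_rank(M)          # 1: supported on a single point
min_eig = np.linalg.eigvalsh(M).min()    # >= 0: a valid moment matrix
```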

This duality between algebra (sums of squares) and analysis (measures) is a deep and powerful theme in modern mathematics. It reveals that our "simple" shortcut of checking for squares is connected to profound ideas about geometry and probability, providing a computational bridge between them. From certifying the safety of a robot to uncovering the hidden minima of complex functions, sum-of-squares programming stands as a testament to the power of finding the right, simpler question to ask.

Applications and Interdisciplinary Connections

We have spent some time admiring the elegant machinery of Sum of Squares (SOS) programming. We've seen that checking if a polynomial is non-negative everywhere is a hard problem, but checking if it can be written as a sum of squares of other polynomials is a tractable one, solvable with semidefinite programming. Now, let's take this machine out for a spin. Where does it take us? The answer is quite surprising. What begins as a clever algebraic trick turns out to be a master key, unlocking problems in fields that, at first glance, seem to have nothing to do with each other. We will see how this one idea helps us build safer control systems for robots and aircraft, design better digital filters for image processing, and even probe the fundamental limits of computation. It’s a wonderful illustration of the unity of mathematics and its unreasonable effectiveness in the sciences.

The Natural Home: Taming Unruly Systems in Control Theory

Perhaps the most natural and immediate application of SOS programming is in control theory, the science of making systems behave as we want them to. Many systems, from a simple pendulum to a complex chemical reactor or a spacecraft, can be described by polynomial equations. The central questions are always about stability, performance, and safety. SOS gives us a revolutionary new way to answer them.

The Quest for Stability

The first question you ask about any system is: is it stable? If I have a robot arm and I tell it to hold a position, will it stay there, or will it drift away or start oscillating wildly? The great Russian mathematician Aleksandr Lyapunov gave us a beautiful way to think about this in the late 19th century. He said a system is stable around an equilibrium point (say, the origin $x = 0$) if you can find a function $V(x)$, which we now call a Lyapunov function, that acts like an "energy bowl". This function must be positive everywhere except at the origin, where it is zero. If, along any trajectory of the system, the value of this function always decreases (meaning its time derivative $\dot{V}(x)$ is negative), then the state must be rolling down to the bottom of the bowl: it must be stable.

Finding such a function $V(x)$ has historically been a black art, a matter of clever guesswork. But what if our system dynamics are polynomial and we decide to search for a polynomial Lyapunov function? The conditions become:

  1. Is $V(x)$ positive definite?
  2. Is $\dot{V}(x) = \nabla V(x)^\top f(x)$ negative definite?

These are questions about the non-negativity of polynomials! Suddenly, the problem of finding a Lyapunov function transforms into an SOS program. We can let the coefficients of $V(x)$ be decision variables and ask the computer to find a certificate of stability. This is a monumental shift from guesswork to systematic design.

Of course, a system might not be stable from any starting point. The set of all initial states that eventually return to the equilibrium is called the Region of Attraction (RoA). For an aircraft, you desperately want to know the boundaries of its safe flight envelope; stray outside, and it may not be able to recover. SOS methods allow us not just to prove stability, but to estimate this region. We can search for the largest sublevel set $\Omega_\rho = \{x : V(x) \le \rho\}$ on which the Lyapunov conditions hold.

In practice, solving a large SOS program from scratch can be difficult. A much more robust engineering approach is to start small. We can analyze the system's linearization around the equilibrium, a simpler, linear system whose stability is easy to check, to get a first guess for a quadratic Lyapunov function, $V_2(x) = x^\top P x$. This gives us a small, ellipsoidal estimate of the RoA. Then, we can add higher-degree polynomial terms to this function and use SOS to incrementally "inflate" our certified region, progressively finding a better-fitting, non-ellipsoidal shape that more accurately captures the true RoA. This continuation method, which wisely uses the solution of a simpler problem to "warm-start" a more complex one, is a beautiful example of how theoretical tools are adapted for practical success.
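
The warm-start step can be sketched for a toy two-state linearization (the matrix $A$ below is our own illustrative choice): solving the Lyapunov equation $A^\top P + P A = -I$ for a symmetric $P \succ 0$ yields the quadratic candidate $V_2(x) = x^\top P x$.

```python
import numpy as np

# Toy Hurwitz linearization (illustrative choice).
A = np.array([[-1.0,  1.0],
              [ 0.0, -2.0]])

def solve_lyapunov_2x2(A):
    # Write P = [[p11, p12], [p12, p22]] and expand A^T P + P A = -I into
    # three linear equations in the unknowns (p11, p12, p22).
    a, b = A[0, 0], A[0, 1]
    c, d = A[1, 0], A[1, 1]
    M = np.array([[2*a,  2*c,   0.0],   # (1,1) entry of A^T P + P A
                  [b,    a + d, c],     # (1,2) entry
                  [0.0,  2*b,   2*d]])  # (2,2) entry
    p11, p12, p22 = np.linalg.solve(M, [-1.0, 0.0, -1.0])
    return np.array([[p11, p12], [p12, p22]])

P = solve_lyapunov_2x2(A)
residual = np.abs(A.T @ P + P @ A + np.eye(2)).max()
P_min_eig = np.linalg.eigvalsh(P).min()   # > 0: V2 is positive definite
```

The resulting $P$ is positive definite precisely because $A$ is stable; its sublevel ellipsoids give the initial RoA estimate that higher-degree SOS terms then inflate.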

It's important to remember, however, that SOS provides a sufficient but not necessary condition. If we find an SOS certificate, the system is stable. But if our search fails, we can't conclude the system is unstable. It might be that a valid polynomial Lyapunov function exists, but it's not a sum of squares, or its degree is higher than what we searched for. This gap between "non-negative" and "sum of squares" is a fundamental limitation, but one that vanishes in many important cases, and the power of the sufficient condition is often more than enough.

From Observer to Actor: Synthesizing Controllers and Respecting Limits

So far, we have acted as passive observers, analyzing a given system. The real goal of control is to be an actor: to design a control input $u(x)$ that makes a system behave well. Instead of just checking whether $\dot{V}(x)$ is negative, we can design $u(x)$ to force it to be negative. This leads to the idea of a Control Lyapunov Function (CLF).

The controller synthesis problem then becomes: find a controller $u(x)$ and a CLF $V(x)$ such that the closed-loop system is stable. But there's a catch. Any real-world actuator has limits. A motor can only provide so much torque; a valve can only open so far. Our controller must respect these input constraints, say $|u(x)| \le u_{\max}$.

Once again, SOS provides a unified framework. The input constraint itself can often be written as a polynomial inequality, for instance, $u_{\max}^2 - u(x)^2 \ge 0$. We can then formulate an SOS program that simultaneously searches for the coefficients of the controller polynomial and the Lyapunov function, subject to all the constraints at once: $V(x)$ must be positive definite, $\dot{V}(x)$ must be negative definite, and the input constraints must be satisfied, all within a certified region of operation. This is done by applying the same algebraic S-procedure logic to each constraint, translating them into a set of convex SOS conditions that can be solved together.

The Guardian Angel: Certifying Safety and Robustness

For many systems, stability is not enough; we need to guarantee safety. A self-driving car must not only follow its lane, but it must never hit a pedestrian. We can define a "safe set" $\mathcal{C}$ by a polynomial inequality $h(x) \ge 0$. To ensure the system never leaves this set, we need to show that on the boundary of the set (where $h(x) = 0$), the system's velocity vector does not point outwards. This condition can be relaxed to a polynomial non-negativity constraint, which can be certified using SOS. Functions $h(x)$ used in this way are called Control Barrier Functions (CBFs), acting like an invisible "electric fence" that repels the system from danger. We can even combine these safety constraints with performance objectives, using SOS to find a controller that is both safe and efficient, navigating a complex web of state and input constraints.
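
A one-dimensional cartoon of the barrier condition (the system and barrier are our own illustrative choices): with safe set $h(x) = 1 - x^2 \ge 0$ (the interval $[-1, 1]$) and dynamics $\dot{x} = -x$, we get $\dot{h} = -2x\,\dot{x} = 2x^2 \ge 0$, so on the boundary the flow never points outward and the safe set is invariant.

```python
def h(x):
    return 1 - x**2          # safe set: h(x) >= 0  <=>  x in [-1, 1]

def f(x):
    return -x                # dynamics: xdot = -x

def h_dot(x):
    # hdot = (dh/dx) * xdot = (-2x) * (-x) = 2x^2
    return -2 * x * f(x)

# Barrier condition on the boundary {h = 0}: hdot >= 0 there.
boundary_ok = all(h_dot(x) >= 0 for x in (-1.0, 1.0))
```

In higher dimensions, $\dot{h}$ is again a polynomial, and certifying its non-negativity on the boundary is exactly the S-procedure computation described earlier.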

Another crucial aspect of real-world control is robustness. Our mathematical models are never perfect. There are always uncertain parameters. A robust controller is one that works for a whole range of possible parameter values. How can we prove stability for an infinite number of possible systems? The answer is to treat the uncertain parameter, say $\delta$, as a variable in our SOS program. We seek a parameter-dependent Lyapunov function built from a matrix $P(\delta)$ and prove that the Lyapunov conditions hold for all values of $\delta$ within its specified range (e.g., $1 - \delta^2 \ge 0$). This transforms the problem of robust stability into a search for polynomial certificates in both the state variables $x$ and the uncertainty parameter $\delta$, a task for which SOS is perfectly suited.

A Different Tune: Spectral Factorization in Signal Processing

Now, let us leave the world of mechanics and motion and travel to the realm of signals and information. It seems a world apart, but we will find our familiar friend, the sum of squares, waiting for us.

In signal processing, a fundamental concept is the power spectrum of a signal, which tells us how the signal's power is distributed over different frequencies. A power spectrum, being a measure of power, can never be negative. For a one-dimensional signal (like an audio recording), the celebrated Fejér-Riesz theorem tells us something wonderful: any non-negative trigonometric polynomial representing a power spectrum can be factored as $|H(e^{\jmath\omega})|^2$. This means we can always find a single stable, causal filter $H$ whose squared magnitude response gives us precisely the desired spectrum. This is the cornerstone of classical filter design.
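
A numerical glimpse of the 1-D factorization (the filter is our own toy choice): the causal filter $H(z) = 1 + 0.5 z^{-1}$ has squared magnitude $|H(e^{\jmath\omega})|^2 = 1.25 + \cos\omega$, a non-negative trigonometric polynomial realized by a single filter, exactly as Fejér-Riesz promises.

```python
import numpy as np

w = np.linspace(-np.pi, np.pi, 1001)

# Frequency response of H(z) = 1 + 0.5 z^{-1} on the unit circle.
H = 1 + 0.5 * np.exp(-1j * w)
P_from_filter = np.abs(H)**2

# The same spectrum written directly as a trigonometric polynomial:
# |1 + 0.5 e^{-jw}|^2 = 1 + 0.25 + 2 * 0.5 * cos(w) = 1.25 + cos(w).
P_direct = 1.25 + np.cos(w)

max_gap = np.abs(P_from_filter - P_direct).max()
```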

When we move to two or more dimensions—for instance, in image processing—a surprise awaits. This theorem breaks down. There exist 2D power spectra (non-negative trigonometric polynomials in two variables) that simply cannot be represented as the squared magnitude of a single 2D filter.

So, what is the correct generalization? A non-negative multivariate trigonometric polynomial may not be a single square, but it can always be represented as a sum of squares of magnitudes of rational functions, and in many cases, as a sum of squares of magnitudes of polynomials: $P(e^{\jmath\omega_1}, e^{\jmath\omega_2}) = \sum_i |H_i(e^{\jmath\omega_1}, e^{\jmath\omega_2})|^2$. This is an exact echo of the algebraic structure that underpins SOS programming. We can use SOS methods to find an approximate factorization by searching for a positive semidefinite Gram matrix $Q$ that makes the resulting SOS polynomial $v^* Q v$ as close as possible to our target spectrum. The rank of the resulting matrix $Q$ tells us how many filters $H_i$ we need in our sum. This provides a powerful, convex-optimization-based tool for multivariate filter design, a problem that is otherwise notoriously difficult.

Probing the Depths: Logic and Computational Complexity

From the concrete world of engineering, we take a final leap into the most abstract of realms: the world of mathematical logic and the theory of computation. Can checking for non-negative polynomials tell us something about the notorious P vs. NP problem?

Remarkably, yes. Many famously hard combinatorial problems, like finding the largest group of mutual friends (a "clique") in a social network, or even solving a Sudoku puzzle, can be translated into a question about the feasibility of a system of polynomial equations. For example, to find a 3-clique in a graph, we can assign a variable $x_i$ to each vertex, require $x_i^2 - x_i = 0$ (so $x_i$ is either 0 or 1), require $\sum_i x_i = 3$ (we select 3 vertices), and add the equation $x_i x_j = 0$ for every pair of vertices $(i, j)$ not connected by an edge. A 3-clique exists if and only if this system of equations has a solution.
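
The encoding is easy to check by brute force on a tiny graph (our own example: a triangle on vertices 0, 1, 2 plus one extra edge to vertex 3):

```python
import itertools

# Graph on 4 vertices: triangle {0, 1, 2} plus the edge (2, 3).
edges = {(0, 1), (0, 2), (1, 2), (2, 3)}
non_edges = [(i, j) for i, j in itertools.combinations(range(4), 2)
             if (i, j) not in edges]

def satisfies(x):
    # The polynomial system: x_i in {0, 1} (from x_i^2 - x_i = 0),
    # sum_i x_i = 3, and x_i x_j = 0 for every non-edge (i, j).
    return (all(v in (0, 1) for v in x)
            and sum(x) == 3
            and all(x[i] * x[j] == 0 for i, j in non_edges))

solutions = [x for x in itertools.product((0, 1), repeat=4) if satisfies(x)]
```

The only solution selects the triangle, confirming that solutions of the system are exactly the 3-cliques.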

How could we prove that no solution exists? The answer comes from a deep theorem in algebra called Hilbert's Nullstellensatz. One way to prove that a system of polynomial equations has no solution over the complex numbers is to show that $1$ belongs to the ideal generated by the polynomials. For real numbers, the corresponding statement is that $-1$ can be written as a sum of squares of polynomials within that same algebraic structure. This is an SOS refutation: a formal, checkable proof of impossibility. By showing that $-1$ equals a sum of non-negative squares (plus terms that vanish on any valid solution), we reach the absurd conclusion that $-1 \ge 0$. The only way out is that no solution exists.

This provides an algorithm: to prove that no $k$-clique exists, we can search for such an SOS certificate. This search can be done with semidefinite programming. The degree of the polynomials needed in the refutation defines a hierarchy of proof systems, the "Sum of Squares hierarchy," which is an incredibly powerful tool in theoretical computer science for understanding the difficulty of optimization problems and for designing approximation algorithms.

A Unifying Thread

Our journey is complete. We have seen the same fundamental idea—certifying positivity through a sum-of-squares decomposition—provide a powerful, computationally tractable approach to a dizzying array of problems. From ensuring a robot arm is stable, to guaranteeing a self-driving car stays on the road, to designing a 2D filter for an image, and even to proving that a graph does not contain a certain substructure. The appearance of the same mathematical structure in such vastly different contexts is no accident. It is a hallmark of a deep and beautiful concept, one that unifies seemingly disparate fields through a common language and a shared computational framework.