
Convex Relaxations: A Guide to Solving Intractable Problems

Key Takeaways
  • Convex relaxation is a strategy to solve intractable nonconvex problems by replacing them with simpler, solvable convex approximations that provide a bound on the optimal solution.
  • Key mechanisms include creating geometric "envelopes" around nonconvex functions, using algebraic substitutions like the ℓ₁ norm, or "lifting" the problem into a higher-dimensional convex space.
  • Algorithms like Branch and Bound iteratively apply and tighten relaxations over smaller subproblems to systematically search for and certify a global optimum.
  • This technique is fundamental to modern applications, from designing robust machine learning models and optimizing financial portfolios to solving problems in image processing and network design.

Introduction

Many of the most critical challenges in science and industry—from designing a logistics network to training an artificial intelligence—can be framed as optimization problems: finding the best possible solution from a vast set of options. For some problems, the solution landscape is like a simple, smooth bowl, where any step downhill leads to the single lowest point. These are convex problems, and they are efficiently solvable. However, most real-world problems are "nonconvex," resembling a rugged mountain range with countless peaks and valleys. Standard methods get easily trapped in a local valley, mistaking it for the true lowest point. This "curse of nonconvexity" makes finding the true global optimum an often intractable task.

This article addresses this fundamental challenge by exploring the elegant and powerful strategy of ​​convex relaxation​​. It provides a guide to a simple but profound idea: if the real problem is too complex, build a simpler, idealized version of it that you can solve, and use that solution to guide your search in the original, difficult landscape. You will learn how this method transforms impossible problems into tractable ones, providing invaluable insights and often, surprisingly exact solutions.

First, the ​​Principles and Mechanisms​​ chapter will break down what makes a problem nonconvex and introduce the core relaxation techniques, from constructing geometric envelopes to leveraging clever algebraic substitutions and lifting problems into higher dimensions. Following this, the ​​Applications and Interdisciplinary Connections​​ chapter will demonstrate these principles in action, revealing how convex relaxation is a unifying tool used to solve pressing problems in fields ranging from finance and engineering to machine learning and computer vision.

Principles and Mechanisms

Imagine you are a hiker in a vast, foggy mountain range, and your goal is to find the absolute lowest point in the entire region. If the landscape is a single, enormous, perfectly smooth bowl—a convex landscape—your task is simple. No matter where you start, every step downhill takes you closer to the bottom. You are guaranteed to find the single lowest point. But what if the landscape is nonconvex? What if it's a rugged, chaotic terrain of countless peaks, valleys, and hidden basins? Now your task is treacherous. Any valley you descend into might be a local minimum, a trap that feels like the bottom but hides an even deeper canyon just over the next ridge. Finding the true global minimum becomes an excruciatingly difficult, often impossible, task.

This is the curse of nonconvexity, the central villain in the world of optimization. Many real-world problems, from designing a neural network to planning a nationwide logistics network, are like navigating this rugged, foggy landscape. The rules of the problem create hills and valleys in the "solution space," and traditional methods that just "walk downhill" can easily get stuck. This is where the elegant and powerful strategy of ​​convex relaxation​​ comes into play. The big idea is simple and profound: if the real world is too complicated, build a simpler, idealized version of it that you can solve, and then use the solution to that simpler world to intelligently guide your search in the real one.

The Two Faces of Nonconvexity

Before we can "relax" a problem, we must understand what makes it so difficult. Nonconvexity typically shows up in two ways, and both can be understood with simple geometric pictures.

First, the ​​domain​​ of the problem—the map of allowed places you can search—might be broken. Imagine our hiker is told they can only walk on two disconnected mountain paths, but not in the valley between them. Even if the function we want to minimize is a simple bowl, being restricted to a ​​nonconvex set​​ can cause our algorithms to fail. A simple downhill-walking strategy might start on one path, see that the "best" next step is in the forbidden valley, and get completely confused, perhaps oscillating back and forth forever between the two paths, never settling on a solution.

Second, and more commonly, the rules of the problem itself create the bumpy landscape. The most innocent-looking mathematical expressions can hide treacherous nonconvex features. Consider the simple product of two variables, w = xy. If you plot this function, it forms a Pringles-chip or saddle shape. This saddle is a classic example of a nonconvex function; it curves up in one direction and down in another. An optimization problem that contains a rule like this is like trying to find the lowest point on that Pringle—a task that is no longer straightforward. Other examples abound, from curved functions like the concave production function y = x^α (for 0 < α < 1), which forms a dome, to abstract combinatorial rules like "you can only pick at most k items."

The Strategy: If You Can't Solve It, Relax It

Convex relaxation is the art of methodically replacing these two types of nonconvexity with convex approximations.

If the search domain is broken, the relaxation is intuitive: we "pave over the gaps." For the two disconnected paths, we would relax the domain to include the valley in between, forming a single, connected, convex region. This is called taking the ​​convex hull​​ of the set.

If the rules are the problem, we replace the complex, nonconvex rule (like w = xy) with a set of simpler, convex rules that "contain" it. This creates a new, simplified problem whose solution space is a smooth bowl. The solution to this relaxed problem is not, in general, the solution to the original hard problem. But—and this is the crucial insight—it provides a bound. If we are minimizing, the solution to the relaxed problem gives us a definitive lower bound. It tells us, "The true answer to your hard problem, wherever it may be, cannot possibly be lower than this value." This bound is an invaluable piece of information, a guiding light in our foggy landscape. The difference between this bound and the true optimal value is known as the optimality gap.

Mechanism I: The Art of the Envelope

How, precisely, do we replace a bumpy function with a smooth one? The most common technique is to build an "envelope" around the nonconvex shape.

Imagine again the saddle shape of w = xy. We can construct a convex relaxation by "sandwiching" it between simpler surfaces. We can stretch a flat plane underneath it that touches the saddle at its lowest points (a convex underestimator) and another plane above it that touches it at its highest points (a concave overestimator). In fact, we can build a complete polyhedral "envelope" using four planes that perfectly enclose the true saddle shape over a given rectangular domain. This set of four linear inequalities is known as the McCormick relaxation. Instead of the difficult constraint w = xy, we now have four simple linear constraints of the form w ≥ ax + by + c. Our problem is transformed from a nonconvex one into a simple linear program, which can be solved with breathtaking speed.
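The four McCormick planes can be checked numerically. The sketch below (assuming NumPy and SciPy are available; the function name `mccormick_lower_bound` is my own) minimizes w over the envelope with a linear program:

```python
from scipy.optimize import linprog

def mccormick_lower_bound(xL, xU, yL, yU):
    """Lower-bound min x*y over [xL, xU] x [yL, yU] via the McCormick LP.

    Decision variables z = (x, y, w); the four McCormick planes are
    written as linear inequalities A_ub @ z <= b_ub.
    """
    # w >= xL*y + yL*x - xL*yL   ->   yL*x + xL*y - w <= xL*yL
    # w >= xU*y + yU*x - xU*yU   ->   yU*x + xU*y - w <= xU*yU
    # w <= xU*y + yL*x - xU*yL   ->  -yL*x - xU*y + w <= -xU*yL
    # w <= xL*y + yU*x - xL*yU   ->  -yU*x - xL*y + w <= -xL*yU
    A_ub = [[yL, xL, -1], [yU, xU, -1], [-yL, -xU, 1], [-yU, -xL, 1]]
    b_ub = [xL * yL, xU * yU, -xU * yL, -xL * yU]
    res = linprog(c=[0, 0, 1], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(xL, xU), (yL, yU), (None, None)])
    return res.fun

print(mccormick_lower_bound(-1.0, 1.0, -1.0, 1.0))
```

On the box [−1, 1] × [−1, 1] the LP returns −1, which equals the true minimum of xy there: for this pure bilinear objective the envelope is exact, and in particular it is exact at the corners of the box.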

A similar idea works for other shapes. For a concave dome like y = x^α, we can build a "tent" around it. The floor of the tent is the straight secant line connecting the endpoints of the function over our domain, providing a lower bound. The roof is formed by a collection of tangent lines that lie entirely above the dome, providing upper bounds. The region inside this polyhedral tent is a convex relaxation of the original curved graph. This is a general and powerful method: replace complex curves with a collection of simple lines and planes.
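A minimal numerical sketch of this "tent," using f(x) = √x (that is, α = 1/2, an illustrative choice) on the interval [0.25, 4]: the secant floor stays below the curve, and every tangent roof segment stays above it.

```python
import numpy as np

# Concave dome f(x) = sqrt(x) on [a, b]; build the "tent":
# floor = secant through the endpoints, roof = min of a few tangent lines.
a, b = 0.25, 4.0
f = lambda x: np.sqrt(x)
df = lambda x: 0.5 / np.sqrt(x)          # derivative, for tangent slopes

def secant(x):
    return f(a) + (f(b) - f(a)) / (b - a) * (x - a)

def tangent(x, x0):
    return f(x0) + df(x0) * (x - x0)

xs = np.linspace(a, b, 1001)
roof = np.minimum.reduce([tangent(xs, x0) for x0 in (0.5, 1.0, 2.0, 3.5)])

print(np.all(secant(xs) <= f(xs) + 1e-12))  # True: the floor underestimates f
print(np.all(f(xs) <= roof + 1e-12))        # True: the roof overestimates f
```

Adding more tangent points lowers the roof toward the curve, tightening the relaxation at the cost of more linear constraints.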

Mechanism II: The Power of Divide and Conquer

These relaxations are wonderful, but they are not perfect. The optimality gap can be large, meaning our bound might be too loose to be useful. So, how do we tighten the screws on our relaxation? The answer is beautifully simple: divide and conquer.

This is the core idea behind algorithms like Branch and Bound. A relaxation that is loose over a large domain becomes progressively tighter as the domain shrinks. Let's revisit our w = xy example. If we compute the lower bound over a large box, we might get a very pessimistic estimate. But if we cut that box in half and compute the bounds for each smaller sub-box, the overall bound we get (the minimum of the two sub-bounds) is significantly better, closer to the true value.

Why does this happen? The magic is in the boundaries. The McCormick relaxation, for instance, is constructed to be exact (i.e., have zero gap) at the corners of the bounding box. As we shrink the box around a potential solution, the relaxation is forced to conform more and more closely to the true function. In the limit, as we branch on a variable and shrink its domain to a single point, the relaxation becomes perfectly exact. For example, if we fix the variable x to some value x_fix, the difficult nonconvex constraint w = xy becomes the simple linear constraint w = x_fix·y. At this node in our search tree, the relaxation is perfect, and the subproblem becomes convex. The Branch and Bound algorithm is a systematic way of using this property, successively dividing the problem space and using the bounds from the relaxations to prune away entire regions where the global optimum cannot possibly lie.
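The bound-tightening effect of branching can be sketched with a simple convex underestimator. The example below uses a quadratic underestimator in the spirit of the αBB method (a standard alternative to McCormick envelopes for general nonconvex functions; the test function sin(x) and the helper name are my own illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lower_bound(f, a, b, alpha):
    """Minimize the convex underestimator L(x) = f(x) - (alpha/2)(x-a)(b-x).

    If alpha >= max |f''| on [a, b], then L is convex and L <= f there,
    so min L is a valid lower bound on min f over [a, b].
    """
    L = lambda x: f(x) - 0.5 * alpha * (x - a) * (b - x)
    return minimize_scalar(L, bounds=(a, b), method="bounded").fun

f, alpha = np.sin, 1.0            # |sin''(x)| <= 1, so alpha = 1 suffices
a, b = 0.0, 2.0 * np.pi           # true minimum of sin on [a, b] is -1

parent = lower_bound(f, a, b, alpha)
mid = 0.5 * (a + b)
child = min(lower_bound(f, a, mid, alpha),
            lower_bound(f, mid, b, alpha))
print(parent, child)              # branching tightens: parent < child <= -1
```

The parent bound comes out around −5.3; after a single split the combined bound improves to about −2.2, already much closer to the true minimum of −1, and further branching closes the gap entirely.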

Mechanism III: The Magic of a Different Perspective

So far, our relaxations have been geometric. But some of the most startlingly effective relaxations are purely algebraic.

Consider the challenge of finding a "sparse" solution to a system of equations—a solution with the fewest possible non-zero entries. This is the problem of sparse recovery, which is at the heart of modern technologies like medical imaging (MRI) and digital communication. Stating this goal mathematically involves the so-called ℓ₀ pseudo-norm, ‖x‖₀, which simply counts the non-zero elements of a vector x. Minimizing ‖x‖₀ is an NP-hard, combinatorial nightmare.

The convex relaxation is audacious. We replace the nonconvex, discontinuous ℓ₀ pseudo-norm with its closest convex surrogate: the ℓ₁ norm, ‖x‖₁ = Σᵢ |xᵢ|, which is simply the sum of the absolute values of the elements. This seems like a crude approximation. Yet, astoundingly, under certain conditions on the system of equations (elegantly captured by a mathematical condition called the Null Space Property), minimizing the ℓ₁ norm is guaranteed to find the exact same, sparsest solution as the original, intractable ℓ₀ problem. This is not an approximation; it is an exact equivalence. By relaxing the problem, we have transformed an impossible search into an efficient linear program. This principle finds applications in finance as well, for instance in relaxing the constraint of building a portfolio from a limited number of assets.
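The ℓ₁ problem really is a linear program: writing x = u − v with u, v ≥ 0 makes ‖x‖₁ = Σ(uᵢ + vᵢ) linear. A sketch assuming SciPy is available (the matrix sizes, sparsity level, and random seed are arbitrary illustrative choices):

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimize(A, b):
    """Solve min ||x||_1 subject to A @ x = b as a linear program.

    Split x = u - v with u, v >= 0; then ||x||_1 = sum(u + v) is linear.
    """
    m, n = A.shape
    res = linprog(c=np.ones(2 * n),
                  A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))        # 40 equations, 100 unknowns
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [1.0, -2.0, 0.5]    # a 3-sparse signal
b = A @ x_true

x_hat = l1_minimize(A, b)
print(np.max(np.abs(x_hat - x_true)))     # near zero if recovery succeeds
```

With these dimensions a random Gaussian system comfortably satisfies the recovery conditions, so the ℓ₁ solution matches the planted sparse signal; with far fewer equations the relaxation would start to fail.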

This reveals another profound lesson: sometimes, how you write your problem matters. Two algebraically identical formulations of a constraint may lead to vastly different convex relaxations. Adding a seemingly redundant constraint, written in a different way, can sometimes dramatically tighten the relaxation and improve the bound, a technique at the heart of advanced global optimization methods.

At the Frontier: Lifting into a New Reality

The final step on our journey is the most abstract and, perhaps, the most beautiful. For some very difficult problems, the key is not to simplify them in their own space, but to "lift" them into a higher-dimensional world where they suddenly appear convex.

Consider a Quadratically Constrained Quadratic Program (QCQP), where we want to minimize a quadratic function of x subject to a quadratic constraint, like xᵀx = 1 (forcing x to lie on a sphere). The problem is nonconvex. The relaxation strategy is to move from the vector variable x to a matrix variable X, which is meant to represent the outer product xxᵀ. The nonconvex constraint is that X must be a rank-one matrix. The relaxation is to drop this rank constraint and require only that X be positive semidefinite—a natural convex generalization.

This transforms the nonconvex QCQP into a convex Semidefinite Program (SDP). What is truly remarkable is that, for certain problems, the solution to this higher-dimensional convex problem can be proven to be exact. It might just happen that the optimal matrix X turns out to be rank-one after all, allowing us to recover the optimal vector x from the original problem. For the problem of minimizing xᵀQx subject to xᵀx = 1, the exact answer is found by solving the SDP relaxation, and its value is, with stunning simplicity, the smallest eigenvalue of the matrix Q. This reveals a deep and unexpected unity between the geometry of optimization and the algebra of eigenvalues, a perfect testament to the power and elegance of thinking convexly.
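The eigenvalue claim is easy to verify numerically without an SDP solver, since the relaxation's optimal value coincides with the smallest eigenvalue of Q (a small sketch; the random symmetric matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
Q = (M + M.T) / 2                         # a random symmetric matrix

lam_min = np.linalg.eigvalsh(Q)[0]        # value of the SDP relaxation

# Every feasible point of the original problem (a unit vector x) satisfies
# x^T Q x >= lam_min, and the bottom eigenvector attains it exactly.
xs = rng.standard_normal((10000, 5))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
rayleigh = np.einsum("ij,jk,ik->i", xs, Q, xs)

v = np.linalg.eigh(Q)[1][:, 0]            # unit eigenvector for lam_min
print(rayleigh.min() >= lam_min)          # True
print(abs(v @ Q @ v - lam_min) < 1e-10)   # True
```

The optimal X of the relaxation here is the rank-one matrix vvᵀ built from that eigenvector, which is why the relaxation is exact for this problem.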

Applications and Interdisciplinary Connections

We have spent some time learning the rules of a beautiful game—the game of convex optimization. We've learned to recognize its players (convex functions) and its playing field (convex sets). But what is the point of a game if it is never played? The true power and elegance of these ideas are revealed only when we take them out into the wild, into the messy, complicated, and distinctly non-convex world of real problems.

You see, most interesting questions we can ask—how to design the most efficient network, how to build the most profitable investment portfolio, how to teach a machine to see—are inherently difficult. They are combinatorial nightmares, full of discrete choices and "either/or" conditions that create a landscape of possibilities with countless sharp peaks and jagged valleys. Trying to find the absolute best solution is like trying to find the single lowest point in the entire Himalayan mountain range while blindfolded. It's a computationally intractable task.

This is where convex relaxation comes in as our guide. The strategy is brilliantly simple in spirit: if the real landscape is too rugged, let's find a smooth, bowl-shaped landscape that lies just beneath it. We "relax" the hard constraints, trading binary "yes/no" decisions for continuous possibilities, and replacing jagged penalties with smooth ones. Finding the bottom of this new, convex bowl is easy. And while the bottom of our bowl might not be the exact location of the lowest point in the mountains above, it gives us a fantastic starting point, a provable lower bound on what is possible. Sometimes, to our great surprise and delight, the bottom of the bowl touches the mountain's true minimum, and we get the exact answer for free!

Let us now embark on a journey through various fields of science and engineering to witness this powerful idea in action.

The Art of Approximation: From Graphs to Engineering Design

Some of the most fundamental hard problems live in the world of networks and graphs. Consider the famous "Maximum Cut" problem: how do you partition the nodes of a network into two sets to maximize the number of connections between the sets? This has applications everywhere, from circuit layout design to statistical physics. The problem is hard because each node must be in one set or the other, a binary choice.

The breakthrough relaxation technique here is to stop thinking about the individual nodes' assignments and instead think about the relationship between every pair of nodes. We create a large matrix where each entry represents the correlation between two nodes' assignments. By enforcing that this matrix must have certain properties—specifically, that it must be positive semidefinite—we transform the problem into a convex Semidefinite Program (SDP). This leap from a vector of binary choices to a continuous matrix of correlations is a masterpiece of mathematical insight, giving an approximation that is remarkably effective.

This same spirit of relaxation appears in engineering design. Imagine you need to select a small number of sensors from a large library of candidates to best monitor a system. A good system is one that gathers a lot of "information," which can often be quantified by the smallest eigenvalue of a so-called Fisher information matrix. The problem is choosing the subset of sensors. Again, we face a combinatorial beast. The convex relaxation approach is to replace the binary choice of "in or out" for each sensor with a continuous "weight." We can then formulate the problem of finding the best weights as an SDP, maximizing a variable that represents the smallest eigenvalue. This elegant convex program provides an excellent approximation to the optimal, but computationally ferocious, sensor placement problem.

Navigating Finance and Operations Research

Decision-making in business and finance is rife with hard, discrete choices. Consider a factory manager scheduling jobs on a single machine. Each job must be done, but only one at a time. The original problem involves assigning each job to a specific, integer time slot. The convex relaxation allows a job to be "fractionally" completed across multiple time slots—a physical impossibility, of course!

While the solution to this relaxed problem is not a valid schedule, its optimal value provides a crucial piece of information: a hard limit on how good any real schedule could possibly be. The difference between the value of the best real schedule and the "fractional" one is called the ​​integrality gap​​, and it quantifies the price of our simplification. Understanding this gap is fundamental to analyzing the quality of our approximations.

This idea travels directly to Wall Street. A portfolio manager faces a similar challenge. Beyond just choosing what fraction of their money to put in each asset, they face practical constraints. Opening a position in an asset might incur a fixed cost, regardless of the amount invested. This "on/off" cost makes the problem non-convex. By relaxing the binary "invest/don't invest" decision into a continuous variable, we can transform the fixed cost into a smooth penalty term in a new, convex objective function. This allows the manager to use powerful optimization tools to find a near-optimal portfolio, where the relaxation cleverly models the trade-off between an asset's expected return and the cost of activating it.

But we can be even more clever. What if the manager is limited to investing in at most, say, 15 assets out of a universe of 500? This "cardinality constraint" is notoriously difficult. A simple relaxation might be weak, creating a convex valley that lies far below the true mountain peaks. More advanced techniques, like using "perspective cuts," allow us to define a tighter relaxation—a valley that hugs the contours of the original problem more closely. These sophisticated relaxations provide much better guidance and demonstrate that there is a true art to how one simplifies a problem.

This leads to a crucial lesson: not all relaxations are created equal. For a given non-convex problem, there can be many different ways to "convexify" it. Consider modeling a hybrid system, like a thermostat that can be either on or off. A naive "big-M" relaxation creates a large, loose convex region of possibilities. A more sophisticated "convex hull" relaxation carves out the tightest possible convex set that contains all the true discrete states. Geometrically, the area of the feasible region defined by the big-M method can be significantly larger than that of the convex hull, representing a vast overestimation of possibilities. Choosing a tighter relaxation leads to far more accurate solutions.
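A toy Monte Carlo sketch of that area comparison (the specific on/off constraint, the value M = 100, and the sample size are illustrative choices of my own):

```python
import numpy as np

# Toy on/off constraint: if z = 0 then x = 0; if z = 1 then x in [1, 3].
# Big-M relaxation:       0 <= x <= M*z with M = 100,  0 <= z <= 1
# Convex-hull relaxation: z <= x <= 3*z,               0 <= z <= 1
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 3.0, size=200_000)   # sample the box [0, 3] x [0, 1],
z = rng.uniform(0.0, 1.0, size=200_000)   # which has total area 3

big_m_area = np.mean(x <= 100 * z) * 3.0
hull_area = np.mean((x >= z) & (x <= 3 * z)) * 3.0

print(big_m_area, hull_area)  # roughly 2.96 vs 1.0: big-M is ~3x looser
```

The hull region is the exact convex hull of the two true operating modes, so every point the big-M region adds beyond it is pure slack that a solver must later branch away.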

The Magic of Seeing: Machine Learning and Vision

Nowhere has the impact of convex relaxation been more profound in recent years than in machine learning and computer vision. Take the task of image segmentation: separating the foreground from the background. We can model this as assigning a binary label (0 for background, 1 for foreground) to every pixel. The total "energy" of a labeling has two parts: a data term (how well the label fits the pixel's color) and a regularization term (a penalty for adjacent pixels having different labels). This second part, known as the Potts model, makes the problem discrete and hard.

The convex relaxation is stunningly elegant. We allow each pixel's label to be a continuous value between 0 and 1, representing its "foreground-ness." The penalty on label differences becomes the famous Total Variation (TV) regularizer. This transforms the problem into a beautiful, convex optimization problem that can be solved with incredible efficiency. What's more, for this particular problem, the relaxation is often "tight"—the minimum of the relaxed continuous problem is exactly the same as the minimum of the original discrete one! This deep connection, related to the famous min-cut/max-flow theorem in graph theory, is a cornerstone of modern image processing.

Sometimes, the magic runs even deeper. Consider the k-means clustering algorithm, a workhorse of data analysis. The algorithm alternates between two steps: assigning each data point to the nearest cluster center, and then updating each center to be the average of its assigned points. The assignment step is a discrete choice. What if we relax it? We can allow each point to have a "partial" assignment, a fractional membership in all the clusters. We are now minimizing a linear function over a convex set (the probability simplex). A basic theorem tells us the solution must lie at a vertex of this set. And what are the vertices? They are precisely the "one-hot" vectors corresponding to a hard, discrete assignment!

Think about that for a moment. We gave the problem the freedom to find a fractional, "soft" assignment, but the optimal solution snaps right back to being a discrete, "hard" one. We found a case where the convex relaxation comes at no cost whatsoever; it gives the exact integer solution. It's a beautiful example of getting a "free lunch" from the underlying mathematical structure.
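The "free lunch" can be seen directly: for a single data point, the relaxed assignment problem is a linear program over the probability simplex, and the solver returns a one-hot vertex (a sketch assuming SciPy; the point and cluster centers are arbitrary illustrative values):

```python
import numpy as np
from scipy.optimize import linprog

# Relax one point's cluster assignment to a probability vector p on the
# simplex {p >= 0, sum(p) = 1}, and minimize the expected squared distance
# sum_k p_k * d_k -- a linear objective over a convex set.
point = np.array([0.9, 0.1])
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
d = np.sum((centers - point) ** 2, axis=1)    # squared distance to each center

res = linprog(c=d, A_eq=np.ones((1, 3)), b_eq=[1.0], bounds=[(0, None)] * 3)
print(res.x)  # one-hot vector: the "soft" optimum is a hard assignment
print(int(np.argmax(res.x)) == int(np.argmin(d)))  # True: nearest center wins
```

Because the distances are distinct, the optimum sits at a unique vertex of the simplex, exactly the hard assignment k-means would have made; a tie between centers is the only case where a fractional optimum can appear.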

Finally, convex relaxation provides the theoretical tools to analyze and design learning algorithms that are robust to attack. Imagine a zero-sum game between a machine learning model and an adversary who can maliciously flip the labels of a few training examples to corrupt the model. The adversary's actions are discrete and combinatorial. To find a robust learning strategy, the learner must solve a "minimax" problem. By relaxing the adversary's move set from a discrete set of flips to a continuous one, we create a convex-concave game. Now, the powerful Minimax Theorem can be invoked, allowing us to swap the "min" and "max" and reduce the complex game to a single, tractable convex optimization problem for the learner. This allows us to train models that are provably robust against a certain budget of adversarial attacks.

From designing circuits to seeing images, from investing money to playing games against an adversary, the principle of convex relaxation is a unifying thread. It is a language that translates the world's impossibly complex problems into a form we can understand and solve. It is a testament to the profound idea that sometimes, the best way to scale an impossibly jagged peak is to first understand the shape of the smooth valley that lies beneath.