
Linear Quadratic Regulator (LQR)

Key Takeaways
  • The LQR problem finds an optimal controller by minimizing a cost function that balances the penalty for state deviation (performance) against the cost of control effort.
  • The optimal LQR control is a linear state-feedback law where the gain matrix K is determined by solving the Algebraic Riccati Equation (ARE).
  • A stabilizing LQR controller can only exist if the system is both stabilizable (unstable modes are controllable) and detectable (unstable modes are observable in the cost function).
  • Through the Separation Principle, the solution to the Linear Quadratic Gaussian (LQG) problem for noisy systems is to combine a Kalman filter estimator with an LQR controller.

Introduction

At its heart, control theory is about making systems behave as we desire. But how do we define 'best' behavior? Is it the fastest response, the smoothest motion, or the one that uses the least energy? The Linear Quadratic Regulator (LQR) problem provides a powerful and elegant answer to this fundamental question. It offers a framework for designing optimal controllers by mathematically balancing performance against effort, moving beyond ad-hoc solutions to find a provably best strategy. This article demystifies the LQR problem, addressing the challenge of how to systematically derive an optimal control law for linear systems.

This article will guide you through the foundational concepts of LQR. In the first part, "Principles and Mechanisms", we will dissect the LQR cost function, understand the pivotal role of the Algebraic Riccati Equation, and explore the fundamental conditions—stabilizability and detectability—that determine if a solution is even possible. Following this, the "Applications and Interdisciplinary Connections" section will demonstrate how these core principles are extended to solve real-world challenges, such as tracking moving targets, dealing with noisy measurements through the Linear Quadratic Gaussian (LQG) framework, and forming the theoretical bedrock for modern techniques like Model Predictive Control (MPC). We begin by examining the core principles that make LQR a masterpiece of engineering thought.

Principles and Mechanisms

Imagine you are trying to balance a long pole in the palm of your hand. You don't just hold your hand perfectly still. Instead, you constantly watch the top of the pole. If it starts to lean, you move your hand to counteract the motion. You are not following a pre-planned dance; you are reacting, in real-time, to the pole's "state"—its angle and how fast it's changing. Your brain, in its own magnificent way, is solving an optimal control problem. It's trying to keep the pole upright (minimize the error) without making wild, jerky movements (minimize effort). The Linear Quadratic Regulator (LQR) is the mathematical formalization of this very idea. It provides a recipe for the best possible way to control a system, and the principles behind it are as elegant as they are powerful.

The Anatomy of "Best": The Cost Function

To find the "best" way to do something, we first need a way to keep score. In LQR, this scorekeeper is called the cost functional, usually denoted by $J$. It's an integral over time that adds up two things at every single moment: the cost of being off-target and the cost of the effort used to get back on target. For a system whose state is a vector $x$ and whose control input is a vector $u$, the cost for a process that runs forever (an "infinite-horizon" problem) looks like this:

$$J = \int_{0}^{\infty} \left( x(t)^{\top} Q x(t) + u(t)^{\top} R u(t) \right) dt$$

Let's dissect this beautiful expression.

The term $x(t)^{\top} Q x(t)$ is the state penalty. Think of $x$ as the vector of deviations from our desired state (which we'll assume is the origin, $x = 0$). This term penalizes us for being away from our goal. The matrix $Q$ is our tuning knob: a "weighting" matrix that lets us decide which state deviations are more important than others. Why is it a quadratic form ($x^{\top} Q x$) and not just the size of $x$? Because nature often penalizes errors quadratically! A small deviation might be perfectly acceptable, but a large one can be catastrophic. A quadratic cost reflects this: it's very forgiving of small errors but punishes large errors severely. For this to make physical sense (so that we are never "rewarded" for being far from our goal), the matrix $Q$ must be positive semidefinite ($Q \succeq 0$). This guarantees that the state cost $x^{\top} Q x$ is always zero or positive.

The second term, $u(t)^{\top} R u(t)$, is the control penalty. This is the cost of our effort: the amount of fuel burned, the electrical energy consumed, or the mechanical stress on an actuator. The matrix $R$ is another weighting matrix that lets us specify the cost of different control actions. This term is also quadratic for a similar reason: small adjustments are cheap, but large, sudden maneuvers are very expensive. To ensure that any control action, no matter how small, has some cost, we require the matrix $R$ to be strictly positive definite ($R \succ 0$). This seemingly small detail is crucial. It ensures that the problem of choosing the best control input has a single, unique, well-defined answer. It guarantees that the "cost landscape" has a single smooth valley, so we can always find the bottom.

The genius of the LQR framework lies in the balance between these two terms. The controller's entire personality is defined by the matrices $Q$ and $R$. If the elements of $Q$ are large compared to $R$, we are saying "performance is everything, I don't care about the cost!" This creates an aggressive controller that will use a lot of energy to stamp out any deviation from the target. If $R$ is large compared to $Q$, we are saying "be frugal with the energy, I can tolerate some deviation." This results in a lazy, sluggish controller.

What's truly profound is that the optimal control depends only on the ratio of the penalties in $Q$ and $R$, not their absolute values. If you multiply both $Q$ and $R$ by the same positive number, say 100, you are making both deviation and effort 100 times more "expensive". But because the trade-off between them is the same, the optimal strategy (the feedback gain $K$) doesn't change at all! Only the final "score" $J$ will be 100 times larger. This scaling property shows that LQR is fundamentally about optimizing a trade-off.
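This scaling invariance is easy to check numerically. The sketch below (the double-integrator plant and the specific weights are illustrative assumptions, using scipy's ARE solver) computes the LQR gain for $(Q, R)$ and for $(100Q, 100R)$ and confirms the two gains coincide:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative plant: a double integrator (position, velocity) driven by a force.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([10.0, 1.0])   # penalize position error more than velocity
R = np.array([[0.5]])

def lqr_gain(A, B, Q, R):
    # Solve the ARE  A'P + PA - P B R^{-1} B' P + Q = 0, then K = R^{-1} B' P.
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

K1 = lqr_gain(A, B, Q, R)
K2 = lqr_gain(A, B, 100 * Q, 100 * R)  # both penalties 100x more "expensive"

assert np.allclose(K1, K2)             # gain depends only on the Q/R ratio
assert all(np.linalg.eigvals(A - B @ K1).real < 0)  # closed loop is stable
```

Only the optimal cost $J$ scales by 100; the feedback law, and hence every trajectory, is unchanged.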

The Solution: A Surprisingly Simple Recipe

So, we have a way to score our performance. How do we find the control law $u(t)$ that gets the best possible score (the minimum cost $J$)? The search for this optimal control could be a nightmare: we have to choose the right value of $u$ at every single instant from now until forever.

And yet, the answer is astonishingly simple and elegant. For a linear system, the optimal control is a ​​linear state-feedback law​​:

$$u(t) = -K x(t)$$

This is a remarkable result. It means the best thing to do at any time $t$ is simply to look at the current state of the system, $x(t)$, and apply a control action that is proportional to it. The matrix $K$ is the optimal feedback gain, a set of pre-calculated constants. The controller doesn't need to remember the past or predict the future in a complex way; it just needs to know "where am I now?".

But where does this magic matrix $K$ come from? It comes from the solution to a famous equation called the Algebraic Riccati Equation (ARE). For a continuous-time system $\dot{x} = Ax + Bu$, the ARE is:

$$A^{\top} P + P A - P B R^{-1} B^{\top} P + Q = 0$$

This equation looks intimidating, but its role is straightforward. It is a machine for calculating a special matrix $P$. This matrix is symmetric and positive semidefinite, and it represents the optimal "cost-to-go": the quadratic form $x^{\top} P x$ tells you the minimum possible score you can get if you start your journey from state $x$. Once you've solved the ARE for $P$, the optimal gain $K$ is found with a simple formula:

$$K = R^{-1} B^{\top} P$$

Let's make this concrete with a simple, yet insightful, example. Consider a microbial population that is inherently unstable, modeled by $\dot{x} = x + u$. Without control ($u = 0$), the population $x$ grows exponentially. We want to stabilize it at $x = 0$ by minimizing a cost with $Q = 2$ and $R = 0.5$. Here, $A = 1$ and $B = 1$. Substituting into the ARE gives $P + P - P \cdot (0.5)^{-1} \cdot P + 2 = 0$, which simplifies to $2P - 2P^2 + 2 = 0$, or $P^2 - P - 1 = 0$. Solving this quadratic equation for the positive root gives $P = \frac{1+\sqrt{5}}{2}$ (the golden ratio!). The optimal gain is then $K = R^{-1} B^{\top} P = (0.5)^{-1} P = 2P = 1 + \sqrt{5} \approx 3.236$, and the optimal control law is $u(t) = -(1+\sqrt{5})\, x(t)$. This simple rule is the mathematically perfect way to tame the unstable system according to our chosen criteria.
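The hand calculation above can be verified with a numerical ARE solver (a minimal sketch using scipy; the system and weights are exactly those of the example):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Scalar example from the text: x' = x + u, with Q = 2 and R = 0.5.
A = np.array([[1.0]])
B = np.array([[1.0]])
Q = np.array([[2.0]])
R = np.array([[0.5]])

P = solve_continuous_are(A, B, Q, R)   # should be the golden ratio
K = np.linalg.solve(R, B.T @ P)        # K = R^{-1} B' P

golden = (1 + np.sqrt(5)) / 2
assert abs(P[0, 0] - golden) < 1e-9
assert abs(K[0, 0] - (1 + np.sqrt(5))) < 1e-9

# The closed loop x' = (A - BK) x = -sqrt(5) x is stable.
assert (A - B @ K)[0, 0] < 0
```

The closed-loop dynamics become $\dot{x} = -\sqrt{5}\, x$: the optimal controller reflects the unstable pole at $+1$ into a stable one and then some, exactly as the cost weights demand.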

A Deeper View: Control and Classical Mechanics

The Riccati equation might seem like a bit of mathematical wizardry pulled from a hat. But there is a deeper, more beautiful structure lurking beneath the surface, one that connects optimal control directly to the foundations of classical physics. The LQR problem can be reformulated using the language of Hamiltonian mechanics.

We can define a special ​​Hamiltonian matrix​​ that encapsulates the entire problem's dynamics and costs:

$$H = \begin{bmatrix} A & -B R^{-1} B^{\top} \\ -Q & -A^{\top} \end{bmatrix}$$

This $2n \times 2n$ matrix (for an $n$-dimensional state) governs the joint evolution of the system's state $x$ and a "costate" variable $\lambda$, which can be thought of as the momentum of the system in the cost space. The eigenvalues of this Hamiltonian matrix have a perfect symmetry: if $\mu$ is an eigenvalue, then so is $-\mu$. Under the standard assumptions for LQR, there will be exactly $n$ eigenvalues with negative real parts (stable modes) and $n$ with positive real parts (unstable modes).

The key insight is this: the set of all initial states $(x(0), \lambda(0))$ that lead to a stable trajectory, one that converges to the origin with minimum cost, forms a specific $n$-dimensional subspace in the $2n$-dimensional space of the Hamiltonian system. This is the stable invariant subspace. The solution to the LQR problem, the matrix $P$ from the Riccati equation, is nothing more than the linear map that connects the costate to the state for any point within this special subspace: $\lambda = P x$. Finding the basis vectors for this subspace and solving for $P$ is an alternative, and conceptually more geometric, way to solve the LQR problem. It reveals that the Algebraic Riccati Equation is a consequence of this fundamental geometric property, connecting the abstract problem of optimal control to the tangible evolution of a physical system in phase space.
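The geometric recipe can be carried out directly. In this sketch (the particular $A$, $B$, $Q$, $R$ are illustrative assumptions), we build the Hamiltonian matrix, extract a basis for its stable invariant subspace, form $P = X_2 X_1^{-1}$, and check that it matches the ARE solution:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative stabilizable/detectable problem data.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

n = A.shape[0]
# Hamiltonian matrix of the LQR problem.
H = np.block([[A, -B @ np.linalg.solve(R, B.T)],
              [-Q, -A.T]])

# Basis for the stable invariant subspace: eigenvectors with Re(mu) < 0.
eigvals, eigvecs = np.linalg.eig(H)
stable = eigvecs[:, eigvals.real < 0]     # 2n x n (complex) basis
X1, X2 = stable[:n, :], stable[n:, :]     # state part / costate part

# On the subspace, lambda = P x, so P = X2 X1^{-1} (real up to rounding).
P_ham = np.real(X2 @ np.linalg.inv(X1))

P_are = solve_continuous_are(A, B, Q, R)
assert np.allclose(P_ham, P_are, atol=1e-8)
assert np.allclose(P_ham, P_ham.T, atol=1e-8)   # P is symmetric
```

Production ARE solvers use a numerically robust variant of this same idea (an ordered Schur decomposition of $H$ rather than raw eigenvectors), but the geometry is identical.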

The Rules of the Game: When Control Is Even Possible

Having a recipe for an optimal controller is one thing. But what if the system has a fundamental flaw that makes it impossible to control? The LQR framework is not magic; it cannot violate the physical limitations of a system. Two fundamental concepts, ​​stabilizability​​ and ​​detectability​​, define the "rules of the game" and tell us when a stabilizing LQR controller can even exist.

Rule 1: You Can't Steer a Car with No Steering Wheel (Stabilizability)

A system is ​​stabilizable​​ if every part of it that is inherently unstable can be influenced by the control input. If a system has an unstable mode (like a tendency to drift or explode) that is also uncontrollable, no amount of clever feedback can fix it. The controller simply has no "lever" to pull to affect that part of the system.

Consider a system composed of two decoupled parts: an unstable, uncontrollable part $\dot{x}_1 = x_1$, and a stable, controllable part $\dot{x}_2 = -x_2 + u$. No matter what we do with the control input $u$, it will never affect $x_1$. Suppose the cost function only penalizes deviations in $x_2$ and the control $u$. The LQR controller will do a perfect job of controlling $x_2$, finding a finite optimal cost that depends only on the initial state of $x_2$. Meanwhile, the $x_1$ component, oblivious to the controller's heroic efforts, will happily grow to infinity. The closed-loop system as a whole is unstable, and there is nothing any state-feedback controller can do about it. The necessity of stabilizability is a fundamental truth, independent of the choice of $Q$ and $R$. LQR can find the best path, but only if a path to the goal exists in the first place.

What about modes that are uncontrollable but naturally stable? Imagine your system has a component that vibrates, but the vibration dies down on its own. If this mode is uncontrollable, LQR can't do anything about it. In many cases, we can simply ignore it and design a controller for the rest of the system. This is only possible, however, if this stable, uncontrollable mode doesn't interfere with the controllable part of the system through either the dynamics or the cost function.

Rule 2: You Can't Correct an Error You Can't See (Detectability)

The second rule is the flip side of the first. The controller must be able to "see" the unstable parts of the system through the cost function. A system is detectable if every unstable mode contributes to the state cost $x^{\top} Q x$.

Imagine trying to stabilize an unstable system, but you've set your $Q$ matrix such that the penalty for that specific unstable mode is zero. What will the optimal controller do? It will do nothing! From the controller's perspective, letting that mode grow unbounded costs absolutely zero. It will find that the "optimal" strategy is to apply zero control and achieve a total cost of zero, while the system state careens towards infinity. Detectability ensures this pathological situation cannot happen by requiring that any unstable part of the system is "visible" to the cost function.

These two conditions, stabilizability of $(A, B)$ and detectability of $(A, Q^{1/2})$, are the cornerstones for the existence of a meaningful LQR solution. They are not merely mathematical fine print; they are the laws of nature for feedback control.
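Both conditions can be tested mechanically with the PBH (Popov-Belevitch-Hautus) rank test. The sketch below (a minimal implementation under the assumption of a comfortable numerical rank tolerance) applies it to the decoupled two-part system from the stabilizability discussion:

```python
import numpy as np

def is_stabilizable(A, B, tol=1e-9):
    # PBH test: rank [mu*I - A, B] = n for every eigenvalue mu with Re(mu) >= 0.
    n = A.shape[0]
    for mu in np.linalg.eigvals(A):
        if mu.real >= -tol:  # unstable (or marginal) mode
            M = np.hstack([mu * np.eye(n) - A, B])
            if np.linalg.matrix_rank(M, tol=tol) < n:
                return False
    return True

def is_detectable(A, C, tol=1e-9):
    # Duality: (A, C) detectable  <=>  (A', C') stabilizable.
    return is_stabilizable(A.T, C.T, tol)

# Decoupled example from the text: x1' = x1 (uncontrollable, unstable),
# x2' = -x2 + u (controllable, stable).
A = np.array([[1.0, 0.0], [0.0, -1.0]])
B = np.array([[0.0], [1.0]])
assert not is_stabilizable(A, B)   # the unstable mode is out of reach

# Detectability of (A, Q^{1/2}); elementwise sqrt is valid here since Q is diagonal.
Q_blind = np.diag([0.0, 1.0])      # blind to the unstable mode
assert not is_detectable(A, np.sqrt(Q_blind))
Q_full = np.eye(2)                 # sees every mode
assert is_detectable(A, np.sqrt(Q_full))
```

With `Q_blind`, the ARE would have no stabilizing solution; with `Q_full`, it does. The rank test makes the "laws of nature" above checkable in a few lines.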

Beyond Infinity: Finite Horizons and Final Goals

Our discussion so far has focused on "infinite-horizon" problems, where the controller runs forever. This is perfect for regulation tasks like maintaining an airplane's altitude. But many tasks have a definite end: landing a rocket, docking a spacecraft, or executing a robotic maneuver. For these finite-horizon problems, we need to care about where the system ends up at the final time, $N$.

The LQR framework is easily adapted by adding a terminal cost to the score, of the form $x_N^{\top} Q_f x_N$. The matrix $Q_f$ penalizes any deviation from the desired final state. This is a natural way to specify a goal. This terminal cost also provides the crucial starting point for the solution method, which works backward in time from the final step using dynamic programming. Instead of a single Algebraic Riccati Equation, we solve a recursive Difference Riccati Equation backward from the boundary condition at time $N$. Choosing this terminal cost is an art, often serving as a stand-in for the cost you would have accumulated had the problem continued, gracefully connecting the finite-horizon solution to its infinite-horizon cousin.
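The backward recursion is short enough to write out. This sketch (the discretized double-integrator plant, weights, and horizon are illustrative assumptions) iterates the Difference Riccati Equation from $P_N = Q_f$ and checks that, far from the final time, the time-varying gain has settled to the infinite-horizon value:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Assumed discrete-time plant: a double integrator sampled at dt.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])
Qf = 10.0 * np.eye(2)   # terminal penalty on the final state x_N
N = 200

# Difference Riccati Equation, solved backward from the boundary P_N = Qf.
P = Qf.copy()
gains = []
for _ in range(N):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal gain at this step
    P = Q + A.T @ P @ (A - B @ K)                      # one backward Riccati step
    gains.append(K)
gains.reverse()   # gains[k] is the optimal time-varying gain at time k

# Far from the final time, the recursion converges to the infinite-horizon gain.
P_inf = solve_discrete_are(A, B, Q, R)
K_inf = np.linalg.solve(R + B.T @ P_inf @ B, B.T @ P_inf @ A)
assert np.allclose(gains[0], K_inf, atol=1e-6)
```

Near time $N$ the gains differ, reflecting the pull of the terminal cost; early in the horizon they are indistinguishable from the steady-state LQR gain, which is exactly the "graceful connection" between the two problems.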

From its simple, intuitive cost function to the deep connections with Hamiltonian mechanics and the hard-won wisdom of its fundamental limitations, the LQR framework is a masterpiece of engineering thought. It provides not just an answer, but a profound understanding of the interplay between dynamics, cost, and control.

Applications and Interdisciplinary Connections

Having understood the elegant machinery of the Linear Quadratic Regulator, one might ask, "This is beautiful, but where does it live in the real world?" It is a fair question. A physical theory, no matter how beautiful, must ultimately connect with observation and application. The LQR is not merely a mathematical curiosity; it is the bedrock upon which much of modern control engineering is built. Its principles echo in fields ranging from aerospace and robotics to economics and neuroscience. In this chapter, we will embark on a journey outward from the pristine world of the LQR problem to see how its core ideas are adapted, extended, and connected to solve a staggering variety of real-world challenges.

The Workhorse: Making Things Go Where You Want

Perhaps the most fundamental task in control is not just stabilizing a system, but making it do something—making a robotic arm follow a trajectory, a chemical reactor maintain a set temperature, or an aircraft hold a certain altitude. This is the problem of reference tracking. The standard LQR formulation aims to drive the state to zero, but with a clever twist, we can teach it to chase a moving target.

The trick is to give the controller a memory. If we want the system's output to match a reference value, any persistent difference between them—the tracking error—is something we want to eliminate. A wonderfully effective way to do this is to tell the controller not just about the current error, but about the accumulation of all past errors. We augment our system's state with a new variable: the integral of the error. By including this integral term in our quadratic cost function, we penalize any sustained, lingering error. The LQR, in its relentless quest to minimize the cost, will generate control actions that force this accumulated error, and therefore the steady-state error itself, to zero.
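The augmentation above can be sketched concretely. In this example (the first-order plant, weights, and the diagonal augmented structure are illustrative assumptions), we append the integrated tracking error to the state and design a standard LQR gain for the augmented system:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed toy plant: x' = -x + u with output y = x, tracking a constant reference r.
A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[1.0]])

# Augment with the integrated error z, where z' = y - r. The constant -r enters
# as an exogenous input and does not affect the gain design.
Aa = np.block([[A, np.zeros((1, 1))],
               [C, np.zeros((1, 1))]])
Ba = np.vstack([B, np.zeros((1, 1))])

Qa = np.diag([1.0, 10.0])   # heavy penalty on the accumulated error
Ra = np.array([[1.0]])
P = solve_continuous_are(Aa, Ba, Qa, Ra)
K = np.linalg.solve(Ra, Ba.T @ P)   # u = -K [x; z]

# A stable augmented loop forces z to a constant, hence the error y - r to zero.
assert all(np.linalg.eigvals(Aa - Ba @ K).real < 0)
```

At any equilibrium of the stable closed loop, $\dot{z} = y - r = 0$, so the steady-state tracking error is exactly zero; that is the integral action doing its job.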

This introduces the art of control design. How much should we penalize this integrated error compared to, say, the velocity of the system or the amount of fuel we are using? By adjusting the weights in our cost function, the $Q$ and $R$ matrices, we engage in a delicate balancing act. Increasing the penalty on the error integral might make the system respond faster to eliminate drift, but it could also cause it to overshoot the target or oscillate, like an overeager student. Decreasing the penalty on control usage ($R$) allows for more aggressive action, speeding up response but potentially demanding impossible feats from our motors and actuators.

This art must also be grounded in physical reality. A state vector might contain positions in meters (m), velocities in meters per second (m/s), and angles in radians (rad). Simply lumping their squares into a single cost is like comparing apples and oranges. A principled approach, one that would be familiar to any physicist, is nondimensionalization. We scale each variable by a "characteristic" value: a typical position, a maximum velocity. This transforms the problem into a dimensionless space where a 1 in one state component is comparable to a 1 in another. This not only makes the choice of weights more intuitive but also ensures the numerical problem we solve on a computer is well-conditioned and robust.

The Ghost in the Machine: Delays and Other Demons

The pristine LQR formulation assumes the control action $u(t)$ instantaneously affects the system's rate of change $\dot{x}(t)$. But the universe often has other plans. Signals take time to travel, chemicals take time to react, and momentum takes time to build. Time delays are everywhere. A simple delay of $\tau$ seconds, represented by the transfer function $\exp(-s\tau)$, is not a rational function of $s$ and thus doesn't fit into our standard state-space framework.

An engineer's first impulse is to approximate. We can replace the transcendental delay term with a rational Padé approximant, which is a ratio of two polynomials that mimics the behavior of the delay. For instance, a first-order approximation is $P_1(s) = \frac{2/\tau - s}{2/\tau + s}$. This seems like a perfectly reasonable mathematical substitution that allows us to augment our state-space model and apply the LQR machinery.

But here, nature reveals a beautiful and subtle trap. This particular approximation contains a zero in the right half of the complex plane, at $s = 2/\tau$. This is what we call a "non-minimum phase" zero. Such systems have a peculiar and counterintuitive habit of initially moving in the opposite direction of their eventual goal. When we incorporate this approximation and transform the problem into the standard LQR form, this troublesome zero manifests as an unstable mode of the system that is, astonishingly, completely invisible to the cost function. The LQR controller, trying to minimize cost, is blind to this lurking instability. Because the unstable mode is "undetectable" by the cost, the Algebraic Riccati Equation has no stabilizing solution. This serves as a profound lesson: our models are not reality, and the approximations we make can have deep, structural consequences that doom our designs from the start.
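The offending zero is easy to exhibit numerically. A minimal sketch (the value of $\tau$ is an arbitrary assumption) builds the first-order Padé approximant as a numerator/denominator polynomial pair and locates its pole and zero:

```python
import numpy as np

# First-order Pade approximation of a delay exp(-s*tau):
#   P1(s) = (2/tau - s) / (2/tau + s)
tau = 0.5
num = np.array([-1.0, 2.0 / tau])   # numerator  -s + 2/tau
den = np.array([1.0, 2.0 / tau])    # denominator s + 2/tau

zero = np.roots(num)[0]
pole = np.roots(den)[0]
assert np.isclose(zero, 2.0 / tau)    # right-half-plane (non-minimum phase) zero
assert np.isclose(pole, -2.0 / tau)   # stable pole

# On the imaginary axis the approximant has unit magnitude, like a true delay:
# it only distorts the phase, which is exactly what a delay does.
w = 3.0
gain = abs(np.polyval(num, 1j * w) / np.polyval(den, 1j * w))
assert np.isclose(gain, 1.0)
```

The shorter the delay, the further right the zero sits ($2/\tau$ grows as $\tau$ shrinks), which matches the intuition that a tiny delay is a mild non-minimum phase effect while a long one is severe.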

A Tale of Two Problems: The Duality of Control and Estimation

Let us now turn to a different, though strangely familiar, problem. Imagine you are not trying to control a system, but merely to observe it. The system is corrupted by unknown random noise, and your measurements are also noisy. What is the best possible estimate of the system's true state, given your history of noisy measurements? This is the problem of optimal estimation, and its solution is the celebrated Kalman filter.

The Kalman filter, like the LQR, involves solving a matrix Riccati equation to find an optimal gain. But here, the gain is not for feedback control, but for blending our model's prediction with the new, noisy measurement. It feels like a completely different world.

Or is it? Consider two scenarios. In Scenario 1, we have a system $(A, B)$ and we design an LQR controller to minimize a cost with weights $(Q, R)$. This involves solving a Control Algebraic Riccati Equation (CARE). In Scenario 2, we have a different system $(A_f, C_f)$ with process noise of covariance $Q_f$ and measurement noise of covariance $R_f$, and we design a Kalman filter. This involves solving a Filter Algebraic Riccati Equation (FARE).

Now for the magic. What if we choose the matrices for the second problem to be the transposes of the first, i.e., $A_f = A^{\top}$ and $C_f = B^{\top}$? And what if we set the noise covariances to be the control weights, $Q_f = Q$ and $R_f = R$? If you write down the two Riccati equations, the CARE for the control problem and the FARE for the estimation problem, you will find that they are exactly the same equation. The solution matrix for the LQR problem is identical to the solution matrix for this "dual" estimation problem.
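This identity can be verified directly: solve the CARE once, then plug the same matrix into the FARE of the dual problem and watch the residual vanish. A sketch (the companion-form system is an illustrative assumption):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# An illustrative controllable system in companion form.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [-1.0, -2.0, 0.5]])
B = np.array([[0.0], [0.0], [1.0]])
Q = np.eye(3)
R = np.array([[1.0]])

# Control side: solve the CARE  A'P + PA - P B R^{-1} B' P + Q = 0.
P = solve_continuous_are(A, B, Q, R)

# Dual estimation side: with A_f = A', C_f = B', Qf = Q, Rf = R, the FARE
#   A_f S + S A_f' - S C_f' Rf^{-1} C_f S + Qf = 0
# is satisfied by the very same matrix S = P.
Af, Cf = A.T, B.T
residual = Af @ P + P @ Af.T - P @ Cf.T @ np.linalg.solve(R, Cf @ P) + Q
assert np.allclose(residual, np.zeros_like(residual), atol=1e-6)
```

Transposing turns every $A^{\top}$ into $A$ and every $B$ into $C^{\top}$, so the two equations are literally the same string of symbols; the code merely confirms the algebra numerically.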

This is the principle of duality, a concept as deep and beautiful as any in physics. It tells us that the problem of controlling a system is, in a precise mathematical sense, the same as the problem of observing its dual. Controllability, the ability to steer the state, is the dual of observability, the ability to deduce the state from its outputs. This hidden symmetry is a stunning example of the unity of mathematical structures in the physical world.

Certainty Isn't Certain: The LQG Controller

We are now equipped to tackle the full, messy reality: controlling a noisy system whose state we can only estimate through noisy measurements. This is the Linear Quadratic Gaussian (LQG) problem. It seems impossibly complex. How can we decide on the best control action when we don't even know for sure what state the system is in?

The solution is one of the most remarkable results in control theory: the ​​Separation Principle​​. It states that under the "LQG" trifecta—a ​​L​​inear system, a ​​Q​​uadratic cost, and ​​G​​aussian noise—the optimal controller can be designed in two separate, independent steps.

  1. Design the best possible estimator. Pretend you are not controlling the system at all and design a Kalman filter to produce the optimal estimate of the state, $\hat{x}(t)$, given the noisy measurements. This estimate is the conditional mean, which is the best guess in a mean-square-error sense.

  2. Design the best possible controller. Pretend you have perfect, noise-free measurements of the state and design a standard LQR controller, finding the optimal gain $K$.

The separation principle guarantees that the optimal stochastic controller is simply to connect these two: take the state estimate $\hat{x}(t)$ from the Kalman filter and feed it into the LQR controller, so the control law is $u(t) = -K \hat{x}(t)$. This is called certainty equivalence: we act as if our best estimate were, in fact, the certain truth. The controller designer doesn't need to know the noise levels, and the filter designer doesn't need to know the control objectives. They can work in separate rooms, and when their designs are combined, the result is provably optimal. The poles of the resulting closed-loop system are simply the union of the LQR controller poles and the Kalman filter poles, beautifully separated.
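The pole separation can be checked numerically. In the sketch below (the noisy double-integrator plant and the noise covariances are illustrative assumptions), writing the closed loop in (state, estimation-error) coordinates makes it block triangular, so its eigenvalues are exactly the LQR poles together with the Kalman filter poles:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed plant: a noisy double integrator, x' = Ax + Bu + w, y = Cx + v.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = np.eye(2);  R = np.array([[1.0]])    # LQR weights
W = np.eye(2);  V = np.array([[0.1]])    # assumed noise covariances

# Step 1: LQR gain, designed as if the state were measured perfectly.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

# Step 2: Kalman gain, designed as if there were no control objective.
S = solve_continuous_are(A.T, C.T, W, V)
L = S @ C.T @ np.linalg.inv(V)

# In (state, estimation-error) coordinates the closed loop is block triangular:
#   d/dt [x; e] = [[A - BK, BK], [0, A - LC]] [x; e]
Acl = np.block([[A - B @ K, B @ K],
                [np.zeros((2, 2)), A - L @ C]])

poles = np.sort_complex(np.linalg.eigvals(Acl))
expected = np.sort_complex(np.concatenate(
    [np.linalg.eigvals(A - B @ K), np.linalg.eigvals(A - L @ C)]))
assert np.allclose(poles, expected)   # union of controller and filter poles
```

The zero block in the lower-left corner is the separation principle made visible: the estimation error evolves on its own, untouched by the control loop.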

This "miraculous" decoupling is a direct consequence of the interplay between the linear dynamics and the Gaussian statistics. If the noise is not Gaussian, or if we introduce nonlinearities like actuator limits, the principle breaks down, and the worlds of estimation and control become hopelessly entangled.

From Ideal to Real: Robustness and Modern Control

The LQG controller is optimal, but optimal with respect to a very specific mathematical criterion. In engineering practice, this mathematical optimality does not always translate to good "robustness." A key property of the LQR controller (with full state feedback) is that it has guaranteed stability margins—it can tolerate a fair amount of unmodeled delays or gain variations. Shockingly, an LQG controller can have arbitrarily small margins, making it brittle and fragile in practice.

To bridge this gap, engineers developed a set of techniques called ​​Loop Transfer Recovery (LTR)​​. The goal of LTR is to systematically shape the LQG design to "recover" the excellent robustness properties of its underlying LQR target loop. This is done through a clever procedure: one designs the Kalman filter by manipulating fictitious noise covariances. By pretending the process noise is large and aligned with the control input, and the measurement noise is vanishingly small, we force the Kalman filter to become extremely "fast" and aggressive. In the limit, the dynamics of the estimator become so fast that they don't interfere with the control loop, and the input-output behavior of the LQG controller asymptotically approaches that of the robust LQR controller. This requires the plant to be minimum-phase, yet another appearance of this fundamental system property.

The final connection we will explore is to the world of modern computational control. If LQR is the idealized, analytical solution, then ​​Model Predictive Control (MPC)​​ is its powerful, computer-driven descendant. At each and every time step, an MPC controller solves a finite-horizon LQR-like optimization problem. It computes an entire sequence of future optimal control moves, but only applies the very first one. Then, at the next time step, it takes a new measurement and solves the entire problem all over again, with a "receding horizon."

What is the connection? The LQR controller is precisely what you get from an unconstrained MPC controller if you let the prediction horizon $N$ go to infinity. Alternatively, for a finite-horizon MPC, if you set the terminal cost to the solution of the LQR's Algebraic Riccati Equation, the first control move of the MPC is identical to the LQR control move.
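The terminal-cost claim can be verified in a few lines. This sketch (the discrete-time plant and weights are illustrative assumptions) sets the MPC terminal cost to the discrete ARE solution and shows the first gain of the backward recursion equals the infinite-horizon LQR gain:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Assumed discrete-time plant: x_{k+1} = A x_k + B u_k.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

# Infinite-horizon discrete LQR gain from the algebraic Riccati equation.
P_inf = solve_discrete_are(A, B, Q, R)
K_lqr = np.linalg.solve(R + B.T @ P_inf @ B, B.T @ P_inf @ A)

# Unconstrained MPC with terminal cost P_N = P_inf: since P_inf is a fixed
# point of the backward Riccati recursion, the first gain equals the LQR
# gain for any horizon length N.
N = 5
P = P_inf.copy()
for _ in range(N):
    K0 = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K0)

assert np.allclose(K0, K_lqr)
```

With this terminal cost the short horizon "pretends" to be infinite, which is exactly why it is the standard choice for proving stability of unconstrained MPC.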

LQR is the theoretical foundation. MPC is its practical, computationally intensive implementation. The true power of MPC is that, by re-solving the optimization problem at every step, it can explicitly handle real-world constraints. It can be told, "Minimize this quadratic cost, but do not let the motor torque exceed its maximum value, and do not let the state leave this safe region." This is something the classic LQR formulation simply cannot do. The efficiency of solving these repeated optimization problems relies on the same numerical linear algebra we saw earlier, often using methods like Cholesky factorization on the symmetric positive-definite matrices that naturally arise.

From a simple principle of minimizing a quadratic cost for a linear system, we have seen a universe of connections unfold. The LQR is a tool for tracking, a lens that reveals hidden instabilities in our models, a dual to the problem of observation, the core of optimal stochastic control, and the intellectual ancestor of today's most powerful control algorithms. It stands as a testament to the power of a single, elegant idea to unify and illuminate a vast and complex world.