
For centuries, controlling the world around us has meant first understanding it through the language of mathematics. From planetary orbits to electrical circuits, we have relied on creating precise models—sets of equations—to predict and manipulate system behavior. This model-based approach is the bedrock of classical engineering and science. However, the increasing complexity of modern systems, from autonomous robots to biological networks, challenges our ability to write down perfect models. What if there were a different way? What if we could bypass the difficult step of manual modeling and let a system's own behavior teach us how to control it?
This article explores the powerful and transformative paradigm of non-model based control, a philosophy where data takes center stage. We will investigate how, under the right conditions, raw data can serve as a complete and sufficient representation of a system's dynamics, eliminating the need for an explicit mathematical model. We will first journey through the core ideas that make this possible in the "Principles and Mechanisms" section, uncovering the concepts of persistence of excitation and the profound implications of Willems' Fundamental Lemma. Following that, in "Applications and Interdisciplinary Connections," we will see these principles in action, exploring a symphony of modern techniques—from data-enabled predictive control and reinforcement learning to methods that learn from uncertainty—that are revolutionizing robotics, automation, and beyond.
How do we understand the world? For centuries, the path laid down by giants like Newton has been clear: observe a phenomenon, formulate a set of mathematical equations—a model—that describes it, and then use that model to predict and control. Think of the crisp, elegant law of gravitation, $F = G\,m_1 m_2 / r^2$. It’s a compact, powerful summary of how planets move. This "model-based" approach has been the bedrock of science and engineering. But what if there’s another way? What if, instead of distilling the world into a few equations, we could let the world speak for itself? What if a sufficiently rich recording of a system’s behavior could, in itself, serve as a perfect model?
This is the radical and beautiful idea at the heart of non-model based control. It’s a shift in perspective: from viewing data as a mere tool to build a model, to seeing data as the model itself. Let’s embark on a journey to understand how this is possible.
Imagine you’re test-driving a new car, and you want to understand everything about its handling. If you only ever drive in a straight line at a constant speed, you'll learn very little about its ability to corner, its suspension response to bumps, or its agility. To truly understand the car, you need to "excite" it: you must turn the wheel, accelerate, brake, and maybe even drive over a few bumps. Your inputs—the actions you take—must be sufficiently rich and varied.
In the language of control theory, this richness has a wonderfully descriptive name: persistence of excitation (PE). An input signal is persistently exciting if it’s "wiggly" enough, over a long enough time, to shake out all the hidden dynamics of a system. It’s the opposite of a monotonous, predictable input. A constant signal is not persistently exciting; a signal that repeats the same simple pattern is not. A signal rich with many frequencies is.
But what does "wiggly enough" mean mathematically? It’s not just about random shaking. There’s a beautiful and precise structure to it. Consider a simple system where we can only give it binary inputs: a 1 ("on") or a 0 ("off"). Let's say we want our input to be persistently exciting of order 4, which means we want to be able to distinguish any sequence of 4 system responses. How could we design such an input? We could, for instance, construct a sequence that methodically isolates each possible response over time. A clever way to do this is to design an input sequence whose time-shifted windows create the standard basis vectors. For example, the impulse sequence $0,0,0,1,0,0,0$ has a window starting at time 0 of $(0,0,0,1)$, a window starting one step later of $(0,0,1,0)$, and so on, down to $(1,0,0,0)$. By "activating" each time slot within a window one by one, we guarantee that our input probes the system's response in every possible way over that window length, ensuring no dynamic mode can hide from us. This ability to construct such an input, even a simple binary one, demonstrates that persistence of excitation isn't an impossibly abstract condition; it's a designable property of an experiment.
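This condition is also easy to check numerically: a (single-input) signal is persistently exciting of order $L$ exactly when the Hankel matrix built from its depth-$L$ windows has full row rank. A minimal sketch (the function names are ours, not a library API):

```python
import numpy as np

def hankel(u, depth):
    """Stack every length-`depth` window of the signal u as a column."""
    return np.column_stack([u[i:i + depth] for i in range(len(u) - depth + 1)])

def is_persistently_exciting(u, order):
    """A single-input signal is PE of order L iff its depth-L Hankel
    matrix has full row rank L."""
    H = hankel(np.asarray(u, dtype=float), order)
    return np.linalg.matrix_rank(H) == order

# The binary impulse sequence from the text: its four depth-4 windows are
# exactly the four standard basis vectors, so it is PE of order 4.
print(is_persistently_exciting([0, 0, 0, 1, 0, 0, 0], 4))   # True

# A constant signal is not: every window is the same vector (rank 1).
print(is_persistently_exciting([1] * 7, 4))                 # False
```

The same rank test, applied to the Hankel matrix of an experiment's input, is the hypothesis one verifies before invoking the data-driven machinery below.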
This property, this guarantee of sufficient richness in our experimental data, is the key that unlocks the door to data-driven control.
Once we have data from an experiment with a persistently exciting input, what can we do with it? This is where a truly profound result, known as Willems' Fundamental Lemma, enters the stage. It is the cornerstone of the entire behavioral approach to systems.
In simple terms, the lemma states the following:
If you have a linear time-invariant (LTI) system of order $n$, and you perform a single experiment on it using an input that is persistently exciting of order $L + n$, then any possible input-output behavior of that system over a time horizon of length $L$ can be expressed as a linear combination of the time-shifted segments of your single experimental trajectory.
Let's unpack this. The "order" of a system is, loosely speaking, the number of internal memory states it has. Think of it as the complexity of the system. The horizon $L$ is the length of the behavior we want to predict or control. The lemma's condition is that our input must be "rich enough" for a duration that accounts for both the behavior we're interested in ($L$) and the system's internal memory ($n$).
The consequence is astonishing. The data we collected, when arranged in a special kind of matrix called a Hankel matrix (which is just a neat way of stacking all the length-$L$ snippets from our experiment), becomes a dictionary. Every column is a "word"—a valid behavior the system has exhibited. The lemma guarantees that this dictionary is complete. Any valid sentence (any possible behavior of length $L$) can be written by combining these words. Your one experiment has captured the essence of all possible experiments of that duration. The data is the model.
Of course, this magic has a price. To achieve this, our experiment must be long enough. The total length of the experiment, $T$, must be at least $(m+1)(L+n) - 1$, where $m$ is the number of inputs. This makes perfect sense: to generate a complete dictionary for a more complex system (larger $n$), with longer words (larger $L$), and a more expressive language (more inputs $m$), you need a longer text to draw from (larger $T$).
How is this dictionary useful? Let's start with prediction. Suppose we want to predict the next $N$ steps of a system's output, given the past $T_{\text{ini}}$ steps. The total behavior has length $L = T_{\text{ini}} + N$. According to the lemma, this entire trajectory (past and future) must be a combination of the "words" in our data dictionary. We know the past part of the trajectory. So, we can search for the specific combination of dictionary words that perfectly reconstructs the known past. Once we find that unique combination, we simply use the same combination to see what the future part must be! It’s like finding a sentence in your Rosetta Stone that matches a known hieroglyphic phrase, which then immediately tells you the corresponding Greek translation.
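The past-matching recipe can be sketched in a few lines. This is a toy illustration under the lemma's noise-free assumptions: a hypothetical second-order system ($A$, $B$, $C$ below) is used only to generate the data and the ground truth, and the prediction step never touches it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2nd-order LTI system (n = 2), used ONLY to generate data.
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([0.0, 1.0])
C = np.array([1.0, 0.0])

def simulate(u):
    x, ys = np.zeros(2), []
    for uk in u:
        ys.append(C @ x)
        x = A @ x + B * uk
    return np.array(ys)

# One experiment with a rich (random, hence persistently exciting) input.
T, T_ini, N = 120, 4, 10
u_d = rng.standard_normal(T)
y_d = simulate(u_d)

# Hankel "dictionary": every column is one length-(T_ini + N) snippet.
depth = T_ini + N
hank = lambda w: np.column_stack([w[i:i + depth] for i in range(len(w) - depth + 1)])
Up, Uf = hank(u_d)[:T_ini], hank(u_d)[T_ini:]   # past / future input rows
Yp, Yf = hank(y_d)[:T_ini], hank(y_d)[T_ini:]   # past / future output rows

# New scenario: a known past and a chosen future input; predict the output.
u_new = rng.standard_normal(T_ini + N)
y_new = simulate(u_new)                          # ground truth, kept only to compare
g, *_ = np.linalg.lstsq(
    np.vstack([Up, Yp, Uf]),
    np.concatenate([u_new[:T_ini], y_new[:T_ini], u_new[T_ini:]]),
    rcond=None)
y_pred = Yf @ g    # coincides with the true future output in the noise-free case
```

Finding the combination `g` that reproduces the known past (and the chosen future input) pins down the future output, exactly as the Rosetta Stone analogy suggests.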
What's truly remarkable is that if our data is noise-free and meets the PE condition, and our past observation window is long enough to resolve the system's state ($T_{\text{ini}} \ge n$), this data-driven prediction is not an approximation. It is exactly the same as the prediction you would get from a perfect, traditional state-space model $(A, B, C, D)$.
This equivalence extends from simple prediction to sophisticated optimal control. Consider the classic Linear Quadratic Regulator (LQR) problem, a cornerstone of modern control that finds the best way to steer a system to a target with minimum energy. The traditional solution involves solving a complex matrix equation called the Riccati equation, which requires a perfect model. The data-driven approach, however, can rephrase the entire LQR problem in terms of our data dictionary. Under the same conditions of PE and well-posedness, the optimal controller found directly from data is identical to the one found using the true model. Data-driven methods are not just a cheap substitute; they can be fundamentally equivalent.
We can even go deeper. We can ask questions about a system's intrinsic physical properties. For instance, is the system dissipative? Does it behave like a physical process that stores and loses energy, like a spring with friction? We can postulate a mathematical form for its "storage function" (like kinetic or potential energy) and use the measured data to directly check if the system's behavior is consistent with this property, all without ever writing down a model of its dynamics.
The world, however, is not a perfect, noise-free textbook. Data is messy. What happens to our beautiful theory then? This is where the approach shows its true maturity and power, by embracing uncertainty.
Suppose our experiment wasn't quite rich enough to uniquely pin down the system. Instead of one true model, our data might be consistent with a whole family of possible models. What do we do? We can't just pick one at random and hope for the best. That would be like navigating a ship knowing the iceberg is "somewhere over there."
A robust approach demands that we design a controller that works for every single model in the family of possibilities defined by our data. This is the idea of informativity for control: is our data good enough to design one controller that is guaranteed to work, no matter which of the consistent models is the real one?
The quality of our data—the strength of its persistence of excitation—directly impacts this. A very rich, highly exciting experiment will shrink the family of possible models to a very small neighborhood. This small uncertainty allows us to design a high-performance, "aggressive" controller. Conversely, if our data lacks PE in some direction, the family of models becomes unbounded in that direction. The uncertainty is infinite. Any robust controller would have to be so "conservative" to handle this infinite uncertainty that it would be practically useless—like keeping the ship in port because you don't know where the iceberg is. This creates a direct, quantifiable link between the quality of an experiment and the performance of the resulting controller.
The other source of uncertainty is noise. Every measurement is corrupted, every system is buffeted by random disturbances. Imagine a system pushed around by random noise whose true maximum strength is unknown. If we only have a finite amount of data, we can measure the biggest disturbance we've seen so far, but we can never be sure we've seen the absolute worst-case disturbance the universe can throw at us. Because of this, a 100% iron-clad, worst-case guarantee of safety ("the system will never fail") is impossible to certify from finite data.
So, we must shift our philosophical stance. We move from the world of absolute certainty to the world of probabilistic guarantees. We perform many independent experiments. In some, the system might fail due to a particularly unlucky sequence of disturbances; in most, it won't. We count the failures. From this empirical failure rate, we can use powerful statistical tools, like Hoeffding's inequality, to make a statement like:
"Based on our 2000 experiments, we are 99.9% confident that the true probability of this controller failing in any given mission is less than 4.4%."
This is not a statement of absolute safety, but it is an honest, quantitative, and incredibly powerful statement of risk. It's the kind of guarantee that underpins much of modern science, medicine, and engineering.
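A bound of this kind follows from a single line of Hoeffding's inequality. The failure count below (5 out of 2000) is a hypothetical choice that happens to reproduce the numbers quoted above:

```python
import math

def hoeffding_upper_bound(failures, trials, confidence):
    """One-sided Hoeffding bound: with probability at least `confidence`,
    the true failure probability is at most the empirical failure rate
    plus sqrt(ln(1/delta) / (2 * trials)), where delta = 1 - confidence."""
    delta = 1.0 - confidence
    return failures / trials + math.sqrt(math.log(1.0 / delta) / (2.0 * trials))

# Hypothetical outcome: 5 failures observed in 2000 independent experiments.
bound = hoeffding_upper_bound(failures=5, trials=2000, confidence=0.999)
print(f"{bound:.1%}")   # 4.4%
```

Note how the bound splits into an empirical part (0.25%) and a statistical penalty for finite data (about 4.2%); collecting more experiments shrinks the second term at a rate of $1/\sqrt{N}$.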
By starting with a simple, elegant idea—that data can be the model—we have journeyed through a landscape that connects data quality to prediction, optimal control, and even fundamental physical properties. And by embracing the realities of noise and uncertainty, the framework provides a mature and powerful path toward designing intelligent systems that can learn from the world and act upon it safely and reliably.
We have spent some time understanding the foundational principles of non-model based control, learning the notes and scales, so to speak. Now, we get to hear the music. And what a symphony it is! The real world, in all its glorious complexity, rarely conforms to the neat, clean mathematical models we write down in textbooks. Friction is not a simple constant, air resistance is a mischievous function of a dozen variables, and the subtle wear and tear on a machine introduces dynamics that no engineer could predict from day one.
If our control methods relied solely on perfect, pre-written scores (mathematical models), they would often play out of tune. Non-model based control is the art and science of playing by ear. It is about entering into a direct dialogue with the system, listening to the data it produces, and adjusting our performance in real time. This philosophy opens up a breathtaking range of applications and forges deep connections with fields like machine learning, statistics, and optimization. Let us explore this new world of possibilities.
Perhaps the most intellectually pure form of non-model based control is to work with the raw data itself, without any attempt to first distill it into an explicit model. The idea is to let the system's recorded behavior directly inform the controller's design.
Imagine you have a recording of an expert operating a machine, a single stream of input and output data $\{u(t), y(t)\}$. You want to design a simple automatic controller that mimics this expert performance. A traditional approach would be to first use this data to build a mathematical model of the machine, and then design a controller for that model—a two-step, indirect process. But what if we could be more direct?
This is the magic of a technique like Virtual Reference Feedback Tuning (VRFT). We start with the goal: we want our final closed-loop system to behave like a chosen, ideal reference model, $M_r$. We have the data that was produced by the real, unknown system. VRFT invites us to ask a wonderfully counter-intuitive question: if the system had been operating in a perfect closed loop with our ideal model $M_r$, what reference signal $\bar r(t)$ and what error signal $\bar e(t)$ must have existed to produce the very output and input that we observed? By inverting the reference model ($\bar r(t) = M_r^{-1} y(t)$, and hence $\bar e(t) = \bar r(t) - y(t)$), we can actually calculate these "virtual" signals from our data. Once we have the virtual error $\bar e(t)$ and the real input $u(t)$, designing the controller becomes a simple problem of finding a function that maps one to the other—a straightforward regression or curve-fitting task. We have tuned a controller directly from a batch of data, without ever writing down a model of the plant itself.
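A minimal sketch of this inversion, under strong simplifying assumptions: a hypothetical first-order plant (unknown to the designer), a first-order reference model, and a purely proportional controller class. Real VRFT uses richer, filtered controller parametrizations, but the mechanics are the same.

```python
import numpy as np

rng = np.random.default_rng(1)

# One open-loop experiment on the plant y(t+1) = 0.9 y(t) + 0.5 u(t).
# The designer records u and y but never sees these coefficients.
T = 200
u = rng.standard_normal(T)
y = np.zeros(T)
for t in range(T - 1):
    y[t + 1] = 0.9 * y[t] + 0.5 * u[t]

# Desired closed-loop reference model: y(t+1) = 0.6 y(t) + 0.3 r(t).
# Inverting it yields the virtual reference that "must have" produced y:
r_virtual = (y[1:] - 0.6 * y[:-1]) / 0.3
e_virtual = r_virtual - y[:-1]           # virtual tracking error

# Fit a proportional controller u = theta * e by least squares.
theta = np.dot(e_virtual, u[:-1]) / np.dot(e_virtual, e_virtual)
print(theta)   # recovers the ideal gain, 0.6, for this plant/model pair
```

For this pair the ideal controller really is proportional (closing the loop with gain 0.6 turns the plant exactly into the reference model), so the regression recovers it; in general the fitted controller is the best match within the chosen class.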
This idea can be scaled up dramatically. Instead of assuming a simple controller structure, what if we use the entire history of data as our implicit model? This is the foundation of Data-enabled Predictive Control (DeePC), a powerful modern technique. It relies on a profound insight from systems theory known as Willems' Fundamental Lemma, which, in essence, states that a sufficiently long data trajectory collected from a linear system contains all the necessary information to predict any future behavior of that system. The controller then works by finding a combination of past behaviors from the data that explains the current state and can be stitched together to achieve a future objective.
Of course, reality is never so clean. Real-world data is corrupted by measurement noise. If we treat this noisy data as gospel, our controller will "overfit"—it will learn the noise, not just the system's true dynamics. This is where a deep connection to statistics becomes vital. To create a robust controller, we must regularize our solution. Techniques like Tikhonov regularization or introducing slack variables are ways of telling the algorithm: "Don't trust the data perfectly. Find a simpler explanation that captures the general trend, even if it doesn't match every single noisy data point." This introduces a classic bias-variance trade-off. A heavily regularized controller might have a slight systematic bias but will be far less sensitive to random noise (low variance), making it more reliable in practice. This statistical discipline is what transforms a clever theoretical idea into a workable engineering solution.
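Tikhonov regularization amounts to adding a penalty term to the least-squares fit. A generic sketch on a deliberately ill-conditioned toy regressor (all names and numbers here are illustrative):

```python
import numpy as np

def regularized_fit(Phi, b, lam):
    """Minimize ||Phi g - b||^2 + lam * ||g||^2 (Tikhonov / ridge regression):
    directions of Phi only weakly supported by the data are damped toward zero."""
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ b)

# Toy illustration: an ill-conditioned regressor plus measurement noise.
rng = np.random.default_rng(2)
Phi = rng.standard_normal((50, 40)) @ np.diag(np.logspace(0, -4, 40))
g_true = rng.standard_normal(40)
b = Phi @ g_true + 0.01 * rng.standard_normal(50)

g_ls = regularized_fit(Phi, b, lam=0.0)      # trusts the data (and its noise) fully
g_ridge = regularized_fit(Phi, b, lam=1e-4)  # biased, but far less noise-sensitive
```

The unregularized solution amplifies the noise enormously along the weak directions of `Phi`; the ridge solution accepts a small bias in exchange for a dramatic drop in variance, which is exactly the trade-off described above.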
While using data directly is powerful, another approach is to use data to find a simpler, abstract representation of the system. The universe is profoundly nonlinear, but our most powerful and elegant control theories are built on the bedrock of linear systems. Can data help us bridge this chasm?
Consider the motion of a spinning top. Describing the trajectory of a single point on its surface is a nightmare of nonlinear equations. Yet, if we change our perspective and look at conserved quantities like its angular momentum, the description becomes beautifully simple and linear. The Koopman operator framework generalizes this idea. It posits that even for a highly nonlinear system, there may exist a set of "observables" or "lifting functions" in which the dynamics evolve linearly. The challenge is that we don't know what these magical functions are.
This is where data comes to the rescue. Methods like Extended Dynamic Mode Decomposition (EDMD) can analyze data from a nonlinear system and automatically learn an approximate linear model in a high-dimensional "lifted" space. It's as if the algorithm stares at the complex dance of the system and discovers the hidden linear choreography that governs it. Once this linear model is learned from data, the full arsenal of linear control theory—from pole placement to optimal control—can be deployed to control the original nonlinear system.
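A minimal EDMD sketch, using a textbook-style system for which a hand-picked dictionary happens to be exactly Koopman-invariant (real problems only approximate this, which is why dictionary choice matters):

```python
import numpy as np

rng = np.random.default_rng(3)

# Nonlinear system: x1+ = 0.9 x1,  x2+ = 0.5 x2 + 0.2 x1^2.
def step(x):
    return np.array([0.9 * x[0], 0.5 * x[1] + 0.2 * x[0] ** 2])

# Snapshot pairs (x_k, x_{k+1}) from many random states.
X = rng.uniform(-1.0, 1.0, (2, 300))
Y = np.apply_along_axis(step, 0, X)

# Dictionary of observables psi(x) = (x1, x2, x1^2): for this system the
# span of these functions is invariant, so the lifted dynamics are linear.
psi = lambda x: np.vstack([x[0], x[1], x[0] ** 2])

# EDMD: least-squares fit of the matrix K mapping psi(x_k) to psi(x_{k+1}).
K = psi(Y) @ np.linalg.pinv(psi(X))
# K recovers the hidden linear choreography:
#   [[0.9, 0.0, 0.0 ],
#    [0.0, 0.5, 0.2 ],
#    [0.0, 0.0, 0.81]]
```

Because $x_1^2$ evolves as $0.81\,x_1^2$, the three observables close on themselves and the fitted `K` is exact here; with a generic dictionary the same least-squares step yields the best linear approximation in the lifted space.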
Naturally, this raises a critical question: how do we choose the candidate observables for our dictionary? Do we use polynomials, radial basis functions, or Fourier series? This is the "art" in the art of abstraction. A richer dictionary gives the algorithm more freedom to find a good linear representation, reducing the ultimate approximation error. However, with a finite amount of data, an overly rich dictionary increases the risk of finding a spurious linear model that fits the training data perfectly but fails to generalize—the classic curse of overfitting.
The solution, once again, lies in statistical discipline. We must use rigorous validation techniques to choose the right level of complexity. Crucially, for time-series data from a dynamical system, we cannot simply use standard random cross-validation, as this would be like trying to predict the stock market by training on Monday's data and testing on Sunday's. We must respect the arrow of time, using methods like blocked cross-validation. Furthermore, we must evaluate our learned model not just on its one-step-ahead prediction accuracy, but on its ability to generate long-term "rollouts," as this is what truly matters for control.
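One simple blocked scheme is an expanding-window split: always train on a contiguous past and validate on the block immediately after it. A sketch (one of many variants):

```python
def blocked_splits(n_samples, n_blocks):
    """Expanding-window splits that respect the arrow of time: train on
    everything before a cut point, validate on the block that follows it."""
    edges = [n_samples * (i + 1) // (n_blocks + 1) for i in range(n_blocks + 1)]
    for i in range(n_blocks):
        yield range(0, edges[i]), range(edges[i], edges[i + 1])

# For 100 samples and 3 folds: train on 25/50/75 points, validate on the next 25.
for train, val in blocked_splits(100, 3):
    print(len(train), len(val))
```

Unlike random cross-validation, no validation point ever precedes a training point, so the score honestly reflects forecasting rather than interpolation.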
In many engineering systems, we already have a reasonably good model based on first principles, but we acknowledge its imperfections. Perhaps we have a great model of a robot arm's kinematics, but the friction in its joints is a mysterious, nonlinear mess. Instead of throwing away our good model, we can adopt a philosophy of humility and use data to learn only the part we don't know: the model error, or the residual dynamics.
Gaussian Process (GP) regression is an exceptionally powerful tool for this task, forging a deep link between control and Bayesian statistics. A GP does something remarkable: when it learns from data, it provides not only a prediction of the unknown function but also a principled measure of its own uncertainty—credible intervals, or "error bars." It tells you not only what it thinks the answer is, but also how confident it is in that answer.
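The posterior mean and error bars follow from a few lines of linear algebra. A minimal zero-mean sketch with a squared-exponential kernel (hyperparameters hand-picked rather than learned, and the "residual" being fitted is a hypothetical stand-in):

```python
import numpy as np

def rbf(a, b, length=0.5, var=1.0):
    """Squared-exponential kernel: nearby inputs get highly correlated outputs."""
    return var * np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and pointwise standard deviation of a zero-mean GP."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    mean = Ks.T @ np.linalg.solve(K, y_train)
    cov = rbf(x_test, x_test) - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Hypothetical residual dynamics, observed only on the left half of the domain.
x_train = np.linspace(-2.0, 0.0, 15)
y_train = np.sin(3 * x_train)
mean, std = gp_posterior(x_train, y_train, np.array([-1.0, 2.0]))
# Near the data (x = -1.0) the error bar is tiny; far from it (x = 2.0)
# the GP honestly reverts to its prior uncertainty (std close to 1).
```

It is this second output, `std`, that a cautious controller consumes: small error bars license aggressive maneuvers, large ones demand conservatism.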
This uncertainty quantification is a game-changer for control. Imagine a self-driving car using a GP to learn the complex interaction between its tires and the road. In regions where it has a lot of data (e.g., dry asphalt), its predictive uncertainty will be low. On a patch of black ice it has never seen before, its uncertainty will be huge. A GP-based Model Predictive Controller (GP-MPC) can use this information to be "cautiously optimistic." It can plan aggressive, high-performance maneuvers on the dry road but will automatically become more conservative and slow down when its uncertainty is high, ensuring it satisfies safety constraints with a specified probability. The controller becomes aware of the limits of its own knowledge.
This principle of "safety-aware learning" is critical for deploying adaptive algorithms in the real world. In another advanced application, a controller can be designed around a Control Lyapunov Function (CLF), which provides a formal certificate of stability for a nominal model. A GP is then used to learn the unmodeled dynamics. The controller uses the GP's uncertainty bound to robustify its decisions, ensuring stability with high probability. Even more beautifully, the system can monitor its actual performance and compare it to the predicted performance. When a significant discrepancy occurs—when the system is "surprised"—it can trigger a new data collection episode to update its model precisely in the region where it was wrong. This creates an elegant feedback loop of acting, observing, learning, and improving.
If we take the idea of learning from interaction to its logical conclusion, we arrive at the burgeoning field of Reinforcement Learning (RL). Here, the algorithm starts with little to no prior knowledge of the system. It learns to achieve a goal simply by trial and error, guided by a scalar "reward" signal that tells it when it is doing something good or bad.
Modern RL algorithms for continuous control, such as the Deep Deterministic Policy Gradient (DDPG) algorithm, often employ a beautiful "actor-critic" architecture. The "actor" is the controller, the policy that decides what to do in a given state. The "critic" is an evaluator, a connoisseur who learns to predict the long-term future rewards that will result from taking a certain action in a certain state.
A central challenge in continuous control is credit assignment. If the actor decides to increase a motor's torque by a tiny amount, how does it know if that was a good move? Will it lead to more reward down the line? This is where the critic's role becomes indispensable. By learning the value function, the critic can provide the actor with the crucial gradient information: it can tell the actor the "slope" of the value landscape with respect to its actions. The actor then simply needs to take a small step "uphill" on this landscape to improve its policy.
Making these algorithms work in practice involves overcoming significant stability challenges. The critic is learning a moving target, because as the actor improves, the value function itself changes. To stabilize this process, techniques like "target networks" are used, where the critic learns from a more stable, slowly-changing copy of itself—akin to a student learning from a teacher who doesn't change the curriculum every five seconds. These methods, born at the intersection of control theory, neuroscience, and computer science, are enabling breakthroughs in robotics, game playing, and autonomous systems.
Finally, not all data-driven control requires massive neural networks and offline datasets. Sometimes, we need a simple, nimble method to optimize a system's performance in real time, without a model of how that performance depends on our tuning knobs.
Enter Extremum Seeking Control (ESC), a classic and wonderfully intuitive model-free method. Imagine you are tuning an old analog radio. You don't have a model of the circuitry; you just want the clearest signal. What do you do? You wiggle the tuning dial. If turning it slightly clockwise makes the signal clearer, you keep turning it clockwise. If it gets worse, you turn it the other way. This is precisely the logic of ESC. It adds a tiny, sinusoidal "dither" to a parameter it wants to tune and observes the effect on a performance metric. By correlating the output with the dither signal, it can estimate the gradient of the performance metric and slowly "climb" to the optimum.
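The wiggle-and-correlate loop can be sketched in a few lines. Here the performance map is a hypothetical static function with its peak at $\theta = 2$; the controller only ever samples its values, never its formula, and the gains are illustrative:

```python
import math

# Hypothetical performance map, unknown to the controller (peak at theta = 2).
J = lambda theta: -(theta - 2.0) ** 2

theta = 0.0                              # initial guess for the tuning knob
a, omega, k, dt = 0.2, 5.0, 0.5, 0.01    # dither amplitude/frequency, gain, step
for i in range(50_000):
    t = i * dt
    dither = a * math.sin(omega * t)
    y = J(theta + dither)                # probe: wiggle the knob, read performance
    grad_est = y * math.sin(omega * t)   # demodulate: correlate output with dither
    theta += k * grad_est * dt           # integrate: climb the estimated gradient
```

Averaged over a dither period, the demodulated signal is proportional to the local slope of $J$, so $\theta$ drifts toward the optimum and settles near 2 with only a small residual ripple.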
This simple idea can be used to solve complex engineering problems. For example, in high-performance motion systems, sliding mode controllers are often used for their robustness, but they can suffer from "chattering," a high-frequency vibration in the control signal. ESC can be used to tune the controller's parameters online to find the sweet spot that minimizes this chattering while still guaranteeing a required level of tracking accuracy. It is a testament to the power of simple, data-driven feedback.
From the elegant inversion of VRFT to the ambitious trial-and-error of RL, from the uncertainty-aware caution of GPs to the real-time tinkering of ESC, the world of non-model based control is vast and vibrant. The unifying thread is a philosophical shift: an admission that our abstract models are always incomplete, and that the most faithful source of information about the world is the world itself. The future of intelligent systems lies in mastering this continuous, creative, and ever-deepening dialogue with reality.