
For decades, machine learning has excelled at learning functions that map finite data points to predictions, like identifying an image. However, the fundamental laws of nature are not described by such mappings but by operators—rules that transform entire functions into other functions, as seen in the partial differential equations (PDEs) that govern physics and engineering. Traditional neural networks, trained on a specific data grid, fail when the resolution changes, fundamentally limiting their ability to capture these underlying physical laws. This gap highlights the need for a new approach that can learn the timeless, continuous rules of a system, independent of how we choose to measure it.
This article introduces operator learning, a revolutionary paradigm that aims to learn the operators themselves. By doing so, these models can operate independently of the data's discretization, allowing them to generalize across different resolutions and enabling applications like zero-shot super-resolution. In the chapters that follow, we will delve into this powerful concept. The first chapter, "Principles and Mechanisms," will uncover the mathematical theory that makes learning in infinite dimensions possible and explore the architecture of two cornerstone models: the Deep Operator Network (DeepONet) and the Fourier Neural Operator (FNO). Subsequently, the "Applications and Interdisciplinary Connections" chapter will showcase how these learned operators are being used to build digital twins, drive scientific discovery, engineer future technologies, and connect with deep concepts in dynamical systems theory.
For decades, the marvel of machine learning has been its ability to learn functions. We show a neural network a million pictures of cats, and it learns a function that maps the vector of pixels from a new picture to a single number representing the probability of it being a cat. This is, in essence, learning a map from a high-dimensional space to a lower-dimensional one, say, from $\mathbb{R}^n$ to $\mathbb{R}$. This paradigm has been fantastically successful. But when we turn our gaze from identifying cats to deciphering the universe, we hit a wall.
The laws of nature are not written as mappings between finite vectors. They are written in the language of calculus—as differential equations. They describe the relationship between functions. Think of a guitar string. Its initial shape is a function, $u_0(x)$, describing the displacement at each point $x$ along its length. The law of physics governing its vibration is an operator—a kind of abstract machine—that takes this initial function as its input and produces a new function, $u(x, t)$, which describes the string's shape at any future time $t$. The solution to a partial differential equation (PDE) is precisely this: an operator that maps some input functions (initial conditions, boundary conditions, or source terms) to an output solution function.
Here lies the problem with the traditional approach. If we simulate the vibrating string by discretizing it into 100 points and train a neural network to predict its motion, that network learns a map for a 100-dimensional vector. If we then want to run a more accurate simulation with 1000 points, our network is useless. It was trained for a specific discretization and has no concept of the underlying, continuous reality. This dependence on the grid is a fundamental limitation.
Operator learning, therefore, represents a monumental shift in ambition. Instead of learning a disposable approximation for a single grid, we aim to learn the operator itself—the timeless, continuous law. We want to build a neural network that, like nature, takes an entire function as input and returns another function as output. Such a learned operator would be discretization-invariant. You could train it using low-resolution simulations and then apply it to predict the outcome on a much finer grid, or even at any continuous point in space you desire. It's the difference between memorizing the answer to one particular multiplication problem and learning the algorithm of multiplication itself.
At first glance, this ambition seems foolhardy. Functions are infinite-dimensional objects. How can a finite computer, learning from a finite number of examples, possibly learn a mapping between them? Wouldn't this be the ultimate victim of the "curse of dimensionality," where the amount of data needed grows exponentially with the dimension?
The escape route from this paradox lies in a beautiful secret about the operators that govern the physical world: while they operate in infinite-dimensional spaces, their essential behavior is often surprisingly low-dimensional.
Many operators in physics, especially those involving diffusion, smoothing, or integration, are what mathematicians call compact operators. A compact operator has a remarkable property that can be understood through an analogy with data analysis. When analyzing a complex dataset, we often use Principal Component Analysis (PCA) to find the most important directions of variation. We can capture most of the dataset's structure with just a few principal components.
A compact operator has a similar structure, revealed by the Singular Value Decomposition (SVD). The SVD breaks down the operator into a series of simple actions along "principal" input and output functions, called singular functions. Each action is weighted by a number called a singular value. For a vast number of physical operators, these singular values decay rapidly. The first singular value might be large, the second smaller, the third much smaller, and so on, quickly approaching zero. This means that the operator's behavior is dominated by its first few singular components. It might be technically infinite-dimensional, but it is approximately low-rank.
This is the key that unlocks the puzzle. We don't need to learn the operator's behavior on every conceivable input function. We only need to learn its action on the handful of "principal" functions that matter. The effective dimension of the problem is not infinite, but rather the small number of singular components, $k$, needed to approximate the operator to our desired accuracy, $\varepsilon$. This effective rank depends on how quickly the singular values decay, not on the dimension of the space we happen to be in. Statistical learning theory confirms this intuition, showing that the number of samples needed to learn such an operator scales with this effective rank $k$, not the ambient dimension of our grid. The curse of dimensionality is broken.
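To see this decay concretely, here is a toy numpy sketch, not tied to any specific system in the text: we discretize a one-dimensional Gaussian smoothing operator, the kind that appears in heat diffusion, and inspect its singular values. The grid size, kernel width, and the $10^{-6}$ accuracy threshold are all illustrative choices.

```python
import numpy as np

# Discretize a 1-D Gaussian smoothing operator (a stand-in for heat diffusion)
# on a 200-point grid; grid size and kernel width are illustrative choices.
n = 200
x = np.linspace(0.0, 1.0, n)
sigma = 0.05
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * sigma**2))
K /= K.sum(axis=1, keepdims=True)        # each output is a weighted average

# The singular values of the discretized operator decay very rapidly.
s = np.linalg.svd(K, compute_uv=False)

# Effective rank: how many singular components we need for 1e-6 accuracy.
k = int(np.sum(s > 1e-6 * s[0]))
print("effective rank k:", k)            # far smaller than n = 200

# A rank-k truncation of the SVD reproduces the operator almost exactly.
U, sv, Vt = np.linalg.svd(K)
K_k = (U[:, :k] * sv[:k]) @ Vt[:k]
print("relative error:", np.linalg.norm(K - K_k) / np.linalg.norm(K))
```

Refining the grid to 1000 points would barely change the effective rank: it is a property of the operator, not of the discretization.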
Knowing that operator learning is possible in principle, how do we actually build a neural network to do it? Two brilliant strategies have emerged, each embodying a deep mathematical idea.
The first architecture, the Deep Operator Network (DeepONet), is based on a classic idea in approximation theory: any reasonable function can be represented as a weighted sum of some basis functions. For example, a sound wave can be represented as a sum of sines and cosines in a Fourier series. DeepONet learns to discover this kind of representation on the fly.
It does this with a clever dual architecture consisting of a branch network and a trunk network.
The branch network is the "sensor." It looks at the input function (for example, by sampling it at a few fixed locations) and its job is to compute the coefficients of the basis expansion. It answers the question: "For this specific input function, how much of each basis function do I need?"
The trunk network is the "pattern generator." It takes a coordinate in the output domain (say, a point in space, $y$) and its job is to produce the value of the basis functions at that very point. It answers the question: "What do my basis patterns look like right here?"
The final prediction is simply the dot product of the outputs of the two networks: $G(u)(y) \approx \sum_{k=1}^{p} b_k(u)\, t_k(y)$, where the coefficients $b_k(u)$ come from the branch net processing the input function $u$, and the basis function values $t_k(y)$ come from the trunk net processing the output coordinate $y$. This elegant structure is a direct implementation of what mathematicians call a separable approximation.
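The branch/trunk structure fits in a few lines of numpy. This is an untrained sketch with random weights, purely to show the wiring; the layer sizes, sensor count, and number of basis functions are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """A tiny random MLP with tanh hidden activations (untrained)."""
    params = [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
              for m, n in zip(sizes[:-1], sizes[1:])]
    def forward(x):
        for i, (W, b) in enumerate(params):
            x = x @ W + b
            if i < len(params) - 1:
                x = np.tanh(x)
        return x
    return forward

p = 16                     # number of learned "basis functions"
m = 50                     # number of fixed sensor locations
branch = mlp([m, 64, p])   # u(x_1), ..., u(x_m)  ->  coefficients b_1..b_p
trunk = mlp([1, 64, p])    # query coordinate y   ->  basis values  t_1..t_p

x_sensors = np.linspace(0.0, 1.0, m)
u = np.sin(2 * np.pi * x_sensors)     # an input function, sampled at sensors
y = np.array([[0.3], [0.7]])          # two output query points

# G(u)(y) ~ sum_k b_k(u) * t_k(y): a dot product of the two nets' outputs.
coeffs = branch(u[None, :])           # shape (1, p)
basis = trunk(y)                      # shape (2, p)
prediction = basis @ coeffs.T         # shape (2, 1): the prediction at each y
print(prediction.shape)
```

In a real DeepONet, both networks are trained jointly so that this dot product matches example input–output function pairs; note that the query points $y$ can be anywhere in the domain, not just on a grid.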
The power of this design is its immense flexibility. If the problem depends on other parameters—like a varying diffusion coefficient or even a changing domain geometry—we can simply feed that information to the appropriate network. Global parameters that define the entire problem instance go into the branch network; local features that describe the space around a query point go into the trunk network.
The second strategy, the Fourier Neural Operator (FNO), is inspired by a different but equally profound principle: the convolution theorem. Many physical processes, like the spreading of heat, are described by convolution. A convolution is an operation where the output at a point is a weighted average of the inputs around it, with the weights defined by a "kernel" function.
While convolution in physical space can be computationally expensive, the convolution theorem tells us that in Fourier space, this complex operation becomes a simple element-wise multiplication. An FNO layer brilliantly exploits this:
Transform: It takes an input function (represented on a grid) and computes its Fourier transform using the Fast Fourier Transform (FFT). This converts the function into its constituent frequencies or "modes."
Filter: In Fourier space, it multiplies a subset of the low-frequency modes by a set of learned weights. This is the heart of the FNO, where it learns the spectral signature of the convolution kernel. High-frequency modes are often discarded, which has a regularizing effect, akin to assuming the mapping is smooth.
Inverse Transform: It applies the inverse FFT to transform the filtered modes back into physical space, yielding the result of a learned global convolution.
This sequence—FFT, learned linear transform, inverse FFT—is called a spectral convolution. Between these spectral convolution layers, a simple, pointwise nonlinear activation function is applied. This nonlinearity is critical; it allows the FNO to build up approximations to highly complex, nonlinear operators, far beyond simple convolution.
The genius of FNO is that the learned weights are in Fourier space, independent of the grid resolution. This means we can train the model on a coarse grid, and because the kernel is defined in the continuous Fourier domain, we can apply it at test time on a fine grid without any retraining. This property, sometimes called zero-shot super-resolution, is a direct consequence of learning the operator in a discretization-invariant way.
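The spectral convolution described above also fits in a few lines of numpy. This toy sketch uses random weights and keeps 8 modes (both illustrative choices), and demonstrates the resolution independence directly: the same learned weights applied on a 64-point and a 256-point grid produce matching outputs at the shared points, because the weights index Fourier modes rather than grid cells.

```python
import numpy as np

rng = np.random.default_rng(1)
n_modes = 8   # keep only the 8 lowest Fourier modes (illustrative choice)

# The learned weights live in Fourier space: one complex weight per kept mode.
weights = rng.standard_normal(n_modes) + 1j * rng.standard_normal(n_modes)

def spectral_conv(u):
    """FFT -> scale low modes by learned weights, drop the rest -> inverse FFT."""
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = u_hat[:n_modes] * weights
    return np.fft.irfft(out_hat, n=len(u))

f = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)

# Apply the same layer on a coarse and a fine grid: the outputs agree at the
# shared points, because the weights index modes, not grid cells.
coarse = spectral_conv(f(np.linspace(0, 1, 64, endpoint=False)))
fine = spectral_conv(f(np.linspace(0, 1, 256, endpoint=False)))
print(np.max(np.abs(coarse - fine[::4])))   # ~ 0 (machine precision)
```

A full FNO layer would add a pointwise linear term and a nonlinear activation after this spectral convolution, and stack several such layers.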
Learning from data is powerful, but what if the data is noisy or incomplete? How can we ensure our learned operator respects the fundamental physical laws we know to be true? This is where we move from pure data-driven learning to physics-informed operator learning.
One subtle but deep consideration is a consistency check between the operator and the data, known as the Picard condition. In essence, it says that for a stable learning process, the training data must "make sense" for the operator being learned. An operator that smooths things out (like heat diffusion) has rapidly decaying singular values. If we try to train it on data where smooth inputs map to noisy outputs, the learned operator will be forced to amplify high-frequency components, becoming unstable and useless for generalization. A successful learned operator must have its structure, particularly its singular value decay, matched by the statistical properties of the training data.
More directly, we can bake physical constraints right into the training objective. Imagine learning the operator for an incompressible fluid, like water. A fundamental law is that the velocity field must be divergence-free: $\nabla \cdot \mathbf{u} = 0$. We can teach this to our neural operator by adding a penalty to its loss function. Alongside the usual term that measures the error against the training data, we add a term, like $\lambda \, \| \nabla \cdot \mathbf{u} \|^2$, that penalizes the network any time it produces a velocity field that is compressible. By minimizing this combined loss, the network learns to find a solution that not only fits the data but also obeys the laws of physics. This approach differs from a Physics-Informed Neural Network (PINN), which typically learns the solution to a single PDE instance; here, we are learning the entire operator for a family of problems in a physics-informed way.
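A minimal numpy sketch of such a penalty, with hypothetical helper names and a finite-difference divergence on a uniform grid, shows how sharply it separates physical from unphysical outputs:

```python
import numpy as np

def divergence(u, v, h):
    """Central-difference divergence of a 2-D velocity field on a uniform grid."""
    du_dx = np.gradient(u, h, axis=1)
    dv_dy = np.gradient(v, h, axis=0)
    return du_dx + dv_dy

def physics_informed_loss(pred_uv, target_uv, h, lam=1.0):
    """Data misfit plus a penalty on any nonzero divergence in the prediction."""
    data_term = np.mean((np.stack(pred_uv) - np.stack(target_uv)) ** 2)
    physics_term = np.mean(divergence(*pred_uv, h) ** 2)
    return data_term + lam * physics_term

n = 64
h = 1.0 / (n - 1)
x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))

# A divergence-free field vs. a plainly compressible one.
u_free, v_free = np.cos(2 * np.pi * y), np.sin(2 * np.pi * x)   # div = 0
u_bad, v_bad = x.copy(), y.copy()                               # div = 2

print(np.mean(divergence(u_free, v_free, h) ** 2))   # ~ 0: no penalty
print(np.mean(divergence(u_bad, v_bad, h) ** 2))     # ~ 4: heavily penalized
```

In training, this loss would be evaluated on the network's predicted velocity fields and backpropagated along with the data term; the weight `lam` balances the two.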
These principles even guide us in overcoming practical challenges. The FNO, in its purest form, is designed for periodic domains (like the surface of a donut). To apply it to real-world, non-periodic problems, we can use clever strategies: we can "lift" the problem by reformulating it into one with homogeneous (zero) boundary conditions, or we can replace the Fourier basis altogether with one better suited for bounded domains, like Chebyshev polynomials, turning a Fourier Neural Operator into a Chebyshev Neural Operator. In each case, the core idea remains: by understanding the deep principles of the physics and the mathematics, we can design learning architectures that are not just powerful, but also elegant and true to the structure of the natural world.
In the last chapter, we uncovered the central idea of operator learning: we can teach a machine not just to find patterns in data points, but to learn the rules themselves—the mathematical operators that govern how a system behaves. Instead of learning a function that maps a number to a number, we learn an operator that maps an entire function, like the distribution of heat in a room, to another function, like how that distribution will look a moment later. This is a profound shift in perspective. But what is it good for? Where does this abstract idea meet the real world?
As it turns out, everywhere. The language of operators is the language of physics, of engineering, of biology. Once you start looking, you see them all around you. In this chapter, we will go on a tour of some of these remarkable applications. We will see how learning operators allows us to build virtual laboratories, to turn prediction into scientific discovery, to engineer the technologies of the future, and even to connect with some of the deepest and most beautiful ideas in the theory of dynamics.
Imagine you want to understand how heat spreads through a complex object, say, a processor chip with an intricate layout of components. The temperature at any single point on that chip doesn't just depend on its own properties; it depends on the thermal conductivity of the entire chip, the location of all heat sources, and the temperature at the boundaries. Change the material in one corner, and the temperature profile everywhere shifts. The solution is inherently nonlocal; everything affects everything else. This is the signature of a problem described by a partial differential equation (PDE), and the mapping from the input functions (conductivity, heat sources) to the output function (the temperature field) is a classic example of an operator.
A learned operator model can act as a digital twin of this chip. After training, it becomes a lightning-fast simulator. An engineer can propose a new layout or a different material, and the operator model can instantly predict the resulting steady-state temperature, bypassing a slow and costly traditional simulation.
But how do we train such a digital twin to be trustworthy? We can show it examples from experiments or high-fidelity simulations, but what if our data is sparse or noisy, as is often the case in the real world? This is where a beautiful idea comes into play: we can teach the model the rulebook directly. We can build the laws of physics right into the training process. This approach, often associated with Physics-Informed Neural Networks (PINNs), combines two sources of information. The training objective penalizes the model for two things: mismatch with the observed data, and any violation of the governing physical laws—the PDE itself, the boundary conditions (e.g., "no heat escapes from this side"), and the initial conditions. This physics-informed regularization acts as a powerful guide, forcing the model to find solutions that are not just consistent with the data but are also physically plausible. It's like telling a student, "Your answer must not only match the back of the book, but you must also show your work, and your work must obey the laws of algebra." This makes the learned operator incredibly data-efficient and robust.
To gain confidence in these methods, we first test them on problems where we know the exact answer. Consider a perfect circle. If we specify a temperature profile on its boundary, what is the heat flux flowing out at every point? This is the famous Dirichlet-to-Neumann (DtN) map, a fundamental operator in mathematical physics. For a circle, this operator has a wonderfully simple structure when viewed through the lens of Fourier analysis: it simply multiplies each frequency component of the boundary temperature by a number proportional to its frequency. When we train a Fourier Neural Operator (FNO) on examples of this DtN map, it learns to do exactly that! It discovers the correct spectral multipliers from data alone, perfectly replicating the analytical solution and generalizing to unseen boundary conditions and even different grid resolutions. Seeing a neural network independently discover a classic piece of physics is a truly inspiring moment, and it gives us the confidence to tackle problems where we don't know the answer.
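Because the DtN map on the unit disk has this closed-form spectral structure, we can write down the target operator directly; this set of multipliers, $|n|$ on mode $n$, is exactly what the FNO recovers from data. A short numpy sketch with an analytic sanity check:

```python
import numpy as np

def dirichlet_to_neumann(f_boundary):
    """DtN map on the unit disk: multiply Fourier mode n of the boundary
    temperature by |n| to get the outward normal heat flux."""
    c = np.fft.fft(f_boundary)
    m = len(f_boundary)
    modes = np.fft.fftfreq(m) * m            # integer mode numbers n
    return np.fft.ifft(c * np.abs(modes)).real

m = 128
theta = np.linspace(0, 2 * np.pi, m, endpoint=False)

# Analytic check: u = r^3 cos(3*theta) is harmonic in the disk, so boundary
# data cos(3*theta) must map to the normal flux 3*cos(3*theta).
flux = dirichlet_to_neumann(np.cos(3 * theta))
print(np.max(np.abs(flux - 3 * np.cos(3 * theta))))   # ~ 0
```

A constant boundary temperature (mode 0) maps to zero flux, as it should: a uniformly heated disk in equilibrium leaks no heat.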
The ability to create fast surrogate models is powerful, but operator learning can take us a step further—from just making predictions to enabling new scientific discoveries. A trained operator is not a black box; it is a mathematical object whose internal structure can hold clues about the system it has learned.
Imagine again our heat conduction problem, but now we don't know the properties of the material. All we have are measurements of temperature fields resulting from various heat sources. Suppose the material is anisotropic, like a piece of wood or a composite crystal, where heat flows more easily along the grain than across it. Can we discover this hidden property from the data? Remarkably, yes. We can train an FNO to learn the operator mapping the heat source to the temperature field. The learned operator will have a specific structure in its Fourier-space filter. By "looking inside" this learned filter and performing a kind of inverse-engineering, we can reconstruct the underlying anisotropic diffusion tensor of the material. The shape of the learned filter reveals the principal directions of heat flow and the degree of anisotropy. The model has acted like a computational microscope, allowing us to see the invisible internal structure of the material. This is a paradigm shift: the learned model is no longer just a predictor; it's an instrument for system identification.
Now, let's turn to one of the grand challenges of classical physics: turbulence. The swirling, chaotic motion of a fluid, from the cream in your coffee to the airflow over an airplane wing, is governed by the Navier-Stokes equations. While the equations are known, simulating them directly is so computationally expensive that it's impractical for most engineering applications. For decades, engineers have relied on simplified models, like the Reynolds-Averaged Navier-Stokes (RANS) equations, which are faster but often inaccurate because they fail to capture the complex effects of turbulent eddies.
Operator learning offers a new way forward. We can frame the problem as learning a correction operator. This operator takes as input a description of the mean flow (represented by certain physical quantities that are invariant to the observer's frame of reference) and outputs a correction to the deficient terms in the RANS model. An FNO is a natural choice here because turbulence involves interactions across many scales, a non-local phenomenon that FNOs are designed to capture. Moreover, the convolutional structure of an FNO is naturally translation-equivariant, respecting the physical principle that the laws of physics don't depend on where you are in space. Learning this closure operator is at the frontier of computational fluid dynamics and could revolutionize the design of everything that moves through a fluid.
The ultimate promise of learning physical operators is to accelerate the cycle of design and innovation.
Consider the immense challenge of designing a modern jet engine turbine blade. It must withstand extreme temperatures and mechanical stresses. Its performance depends on a complex interplay of thermal and mechanical properties that can vary from point to point within the material. This is a coupled, multiphysics problem of the highest order. The governing equations involve mechanics (stress and strain), heat transfer, and irreversible plastic deformation, where the plastic work itself generates more heat, creating a tight feedback loop.
Traditionally, an engineer might propose a new design, and then wait hours or days for a simulation to finish. With operator learning, we can build a model, like a DeepONet trained with physics constraints, that learns the solution operator mapping the material property fields to the resulting stress and temperature fields. An engineer could then query the model with a new material layout, and it would provide a near-instantaneous prediction of the component's performance. The trained operator becomes a true partner in the creative process, allowing for rapid exploration of the design space. To make this work, the model must be trained to respect all the intricate physics, including the non-smooth "if-then" logic of plasticity—materials behave elastically until they reach a yield stress, after which they deform permanently. These conditions must be encoded as penalties in the physics-informed loss function, guiding the network to learn the correct, complex material behavior.
Of course, the world is not always a neat, rectangular grid that is friendly to Fourier transforms. What if we want to model airflow over a complex, curved airplane wing, or blood flowing through a tangled network of arteries? For such problems with irregular geometries, the FNO is not the ideal tool. Here, we turn to another member of the operator learning family: the Graph Neural Operator (GNO). A GNO represents the domain as a graph, a collection of nodes (points in space) connected by edges. It learns the operator by mimicking the structure of an integral, passing messages between neighboring nodes on the graph. This makes GNOs incredibly flexible and the natural choice for problems on non-uniform meshes or complex, real-world geometries. The existence of different architectures like FNOs and GNOs shows the richness of the field, providing a toolbox of specialized instruments for different kinds of physical problems.
So far, we have viewed operator learning through the practical lens of science and engineering. But it also connects to some of the most elegant and profound ideas in the mathematical theory of dynamical systems.
In the 1930s, the mathematician Bernard Koopman had a brilliant insight. When studying a nonlinear dynamical system—say, the state $x$ of a gene regulatory network evolving in time according to $\dot{x} = f(x)$—instead of tracking the state itself, which evolves nonlinearly, why not track some observable quantities of the state, $g(x)$? Koopman showed that it's possible to find special observables—called Koopman eigenfunctions—that evolve linearly in time, even when the underlying system is highly nonlinear. The evolution of all observables is governed by a linear operator, the Koopman operator.
From this viewpoint, much of operator learning can be seen as a data-driven quest to find an approximation of this magical, linearizing Koopman operator. When we train a model to find an embedding $z = \phi(x)$ where the dynamics become linear, $z_{t+1} = K z_t$, we are essentially trying to learn the Koopman eigenfunctions. These eigenfunctions are not just a mathematical curiosity; they are deeply interpretable. Their corresponding eigenvalues tell us the characteristic timescales of the system—the natural frequencies and decay rates of its fundamental modes. For a biological system, this could reveal the relaxation rates of different functional modules within a cell. Furthermore, any conserved quantity of the system, like total energy or mass, corresponds to a Koopman eigenfunction with an eigenvalue of exactly 1. A learned operator that captures this structure can therefore uncover the fundamental conservation laws of a system directly from time-series data.
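The simplest data-driven approximation of this idea is a least-squares fit of a linear one-step operator from snapshot pairs, in the spirit of Dynamic Mode Decomposition (DMD). A toy numpy sketch on an exactly linear system (the matrix and trajectory length are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Snapshots of a linear system z_{t+1} = A z_t: a stand-in for dynamics that
# have already been embedded into Koopman coordinates. A is illustrative.
A_true = np.array([[0.9, -0.2],
                   [0.2,  0.9]])            # a slowly decaying rotation
Z = [rng.standard_normal(2)]
for _ in range(100):
    Z.append(A_true @ Z[-1])
Z = np.array(Z)

# DMD-style fit: the least-squares K with Z[t+1] ~ K Z[t].
M, *_ = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)
K = M.T

eigvals = np.linalg.eigvals(K)
print(np.abs(eigvals))     # moduli < 1: decay rates of the fundamental modes
print(np.angle(eigvals))   # angles: the characteristic oscillation frequency
```

A conserved quantity would show up in `eigvals` as a modulus of exactly 1; a modulus greater than 1 would flag an unstable learned model.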
This pursuit of stability and structure provides a powerful inductive bias for learning. If we know a biological system is homeostatic and returns to equilibrium after a perturbation, we can enforce this by constraining the learned linear operator to be stable (i.e., its spectral radius must be less than 1). This helps the model make more reliable long-term predictions and avoid spurious instabilities, a common pitfall for more generic models like standard Recurrent Neural Networks (RNNs).
However, we must end on a note of caution. A learned operator is still an approximation, and using it to prophesy the distant future is a delicate business. When we iterate our learned one-step model, tiny errors made at each step can accumulate. The nature of this accumulation depends on the physics of the system itself. For dissipative systems that naturally lose energy, like a cooling cup of coffee, the dynamics are often self-correcting, and the long-term prediction error can remain bounded by a constant. But for conservative or energy-preserving systems, like an idealized wave, there is no such damping mechanism. Errors can accumulate, often growing linearly with the number of prediction steps, leading to a steady drift from the true solution. Understanding this error behavior is a crucial part of the science, reminding us that even with our most powerful tools, we must remain humble about the limits of prediction.
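The two error regimes can be reproduced in a toy experiment: iterate a slightly wrong one-step map alongside the true one, once for a conservative system (a pure rotation with a small phase error per step) and once for a dissipative one (the same rotation scaled by 0.9). All parameters here are illustrative.

```python
import numpy as np

def rollout_error(step_true, step_learned, x0, steps):
    """Iterate the true and learned one-step maps; record the error each step."""
    x, x_hat, errs = x0.copy(), x0.copy(), []
    for _ in range(steps):
        x, x_hat = step_true @ x, step_learned @ x_hat
        errs.append(np.linalg.norm(x - x_hat))
    return np.array(errs)

def rot(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

delta = 1e-3                     # small per-step model error (a phase drift)
x0 = np.array([1.0, 0.0])

# Conservative system: energy is preserved, so nothing damps the drift.
cons = rollout_error(rot(0.1), rot(0.1 + delta), x0, 400)
# Dissipative system: contraction damps old errors; total error stays bounded.
diss = rollout_error(0.9 * rot(0.1), 0.9 * rot(0.1 + delta), x0, 400)

print(cons[399] / cons[99])   # ~ 4: error grows roughly linearly in steps
print(diss.max())             # small and bounded
```

Quadrupling the horizon roughly quadruples the conservative-case error, while the dissipative rollout's error peaks early and then decays, matching the two behaviors described above.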
From practical engineering design to the abstract beauty of dynamical systems theory, operator learning provides a unifying language. It is a framework for teaching machines to understand the rules of the game, not just the final score. As this field matures, we will undoubtedly find it speaking to us in surprising new ways, revealing connections and enabling discoveries we can currently only imagine.