
Machine learning models, particularly neural networks, have shown remarkable success in learning complex dynamics directly from data. However, standard models are often "blind watchmakers," learning correlations without understanding the underlying physical principles governing a system. This leads to a critical knowledge gap: models may violate fundamental laws like the conservation of energy, resulting in physically implausible long-term predictions. How can we build models that not only predict but also respect the foundational structure of the physical world?
This article delves into Hamiltonian Neural Networks (HNNs), a class of physics-informed models designed to solve this very problem. By integrating the principles of classical mechanics into the neural network architecture, HNNs offer a powerful new paradigm for scientific modeling. The first chapter, "Principles and Mechanisms," will explore how HNNs move beyond learning arbitrary dynamics to instead learn a single, unifying quantity—the system's energy, or Hamiltonian. You will learn how this approach mathematically guarantees energy conservation and how it can be extended to handle real-world complexities like friction. The second chapter, "Applications and Interdisciplinary Connections," will showcase the transformative impact of this approach across diverse scientific fields, from simulating molecules and materials to probing the quantum realm and uncovering structures in complex networks.
Imagine you want to predict the weather. You could build a giant, intricate machine learning model, feed it decades of atmospheric data, and ask it to predict tomorrow's temperature. With enough data and a powerful enough computer, it might even do a respectable job. But would it understand the principles of thermodynamics, fluid dynamics, and radiative transfer? Probably not. It would be a "blind watchmaker," a master of correlation but a novice at causation. This is the promise and peril of learning dynamics from data.
The modern approach to learning dynamics often starts with a model called a Neural Ordinary Differential Equation (Neural ODE). Suppose we're biologists trying to model the intricate dance of proteins in a gene regulatory network, but the underlying rules of their interaction are a complete mystery. We can represent the concentrations of our proteins as a state vector $\mathbf{x}$, and then train a neural network to learn the time derivative of that state, $\dot{\mathbf{x}} = f_\theta(\mathbf{x})$. The network, with its millions of parameters $\theta$, becomes a universal function approximator, capable of learning virtually any system of dynamics just by observing how it behaves over time.
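As a concrete, if toy, illustration, here is a minimal Neural ODE sketch in Python with NumPy: a tiny randomly initialized network plays the role of the learned vector field $f_\theta$, and an explicit Euler loop rolls the state forward in time. The weights and the `rollout` helper are hypothetical stand-ins, not a fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network standing in for the learned vector field f_theta.
# (Hypothetical weights; in practice they would be fit to observed trajectories.)
W1 = rng.normal(scale=0.5, size=(16, 2))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(2, 16))
b2 = np.zeros(2)

def f_theta(x):
    """Neural approximation of the time derivative dx/dt."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def rollout(x0, dt=0.01, steps=100):
    """Integrate dx/dt = f_theta(x) with explicit Euler steps."""
    x = x0.copy()
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * f_theta(x)
        traj.append(x.copy())
    return np.array(traj)

traj = rollout(np.array([1.0, 0.0]))
```

The network never sees an equation; it only learns where the state goes next, which is exactly the blessing and the curse discussed below.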
This is a fantastically powerful idea. It frees us from having to guess the mathematical form of the interactions, which is often an impossible task. The neural network simply learns the vector field—the arrows that tell the system where to go next at every point in its state space.
But this power comes at a cost. The network is a blank slate; it knows nothing of the fundamental laws of nature. It doesn't know about the conservation of energy, momentum, or mass. If you train it to predict the motion of a planet, it might learn a trajectory where the planet slowly spirals into the sun, or gains energy from nowhere and flies off into interstellar space. It has no built-in respect for the bedrock principles of physics. For many applications, this might be acceptable. But for a physicist, this is deeply unsatisfying. Nature isn't just a collection of arbitrary rules; it has a deep, underlying structure. Can we teach our models to respect this structure?
The laws of physics are built on symmetries. If you take an isolated molecule and move it a few feet to the left, its internal energy doesn't change. If you rotate it, its energy doesn't change. If the molecule is water ($\mathrm{H_2O}$) and you swap the two hydrogen atoms, it's still the exact same molecule with the exact same energy. These are not just convenient facts; they are fundamental symmetries—invariance under translation, rotation, and permutation of identical particles.
A generic neural network trained on Cartesian coordinates has no idea about these symmetries. Without being explicitly told, it might predict that the molecule's energy depends on its orientation in the lab, which would imply the existence of unphysical forces and torques that would cause the molecule to start spinning out of nothing. To build a faithful model of a physical system, we must "bake in" these symmetries. We must design an architecture that is guaranteed, by its very construction, to obey these rules. This can be done by using inputs that are themselves invariant (like interatomic distances) or by designing network layers that are mathematically equivariant to these transformations.
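A quick sketch of the first strategy, assuming nothing beyond NumPy: interatomic distances are unchanged by rotating and translating a molecule, so a descriptor built from them is invariant by construction. The `pairwise_distances` helper is an illustrative name, not a standard API.

```python
import numpy as np

def pairwise_distances(coords):
    """Rotation- and translation-invariant descriptor: sorted interatomic distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(coords), k=1)
    return np.sort(d[iu])

rng = np.random.default_rng(1)
mol = rng.normal(size=(4, 3))          # four atoms in 3D

# A random proper rotation (QR of a random matrix, sign-fixed) plus a translation.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1
moved = mol @ Q.T + np.array([5.0, -2.0, 1.0])

d_before = pairwise_distances(mol)
d_after = pairwise_distances(moved)
```

Any network fed `d_before` literally cannot tell how the molecule is oriented in the lab, which is the point.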
But the most profound structure in classical mechanics is not just about these geometric symmetries. It's about a single, all-encompassing quantity that governs the entire evolution of a system: energy.
In the late 18th and early 19th centuries, mathematicians Joseph-Louis Lagrange and William Rowan Hamilton discovered a revolutionary way to reformulate classical mechanics. Instead of thinking about forces and accelerations ($F = ma$), they found you could describe the universe with a single scalar function. For a conservative system, this function is the total energy—the sum of kinetic and potential energy—which they named the Hamiltonian, denoted as $H$.
This is a breathtaking shift in perspective. Instead of learning the complex, vector-valued rules of motion directly, what if we could just learn the simple, scalar energy function? Imagine the state of a system is described by its generalized coordinates $q$ (like positions) and their conjugate momenta $p$ (mass times velocity). The Hamiltonian $H(q, p)$ is a landscape defined over this "phase space." The genius of Hamiltonian mechanics is that if you know the shape of this energy landscape, you know everything about how the system will evolve in time. You don't need to specify the dynamics separately; the dynamics are encoded within the Hamiltonian itself.
So, how does the Hamiltonian landscape dictate motion? Through a set of beautifully symmetric equations known as Hamilton's equations:

$$\frac{dq}{dt} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q}$$
Let's pause and appreciate what this says. The first equation tells us that the rate of change of position ($\dot{q}$, the velocity) is given by how the energy changes with respect to momentum. The second equation tells us that the rate of change of momentum ($\dot{p}$, which is the force) is given by minus how the energy changes with respect to position. Since the force is the negative gradient of the potential energy, this makes perfect sense. These two simple-looking equations are the engine of all of classical mechanics for systems without friction.
This is the principle behind the Hamiltonian Neural Network (HNN). Instead of training a network to learn the complex functions for $\dot{q}$ and $\dot{p}$ directly, we use a neural network to learn a single, simpler scalar function: the Hamiltonian $H_\theta(q, p)$. Once we have our learned energy landscape, we don't need the network to tell us how to move. We use the ironclad, time-tested recipe of Hamilton's equations to compute the dynamics. The network learns the "what" (the energy), and the laws of physics provide the "how" (the equations of motion).
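A minimal HNN-style sketch, NumPy only, with central finite differences standing in for the automatic differentiation a real implementation would use: a tiny untrained network plays the role of $H_\theta$, and Hamilton's equations, written with the symplectic matrix $J$, turn its gradient into dynamics. The instantaneous energy change $(\nabla H)^\top J \, \nabla H$ vanishes for any weights.

```python
import numpy as np

rng = np.random.default_rng(2)

# Tiny MLP standing in for the learned scalar Hamiltonian H_theta(q, p).
# (Hypothetical untrained weights; the guarantee below holds for ANY weights.)
W1 = rng.normal(scale=0.5, size=(16, 2))
b1 = rng.normal(scale=0.1, size=16)
w2 = rng.normal(scale=0.5, size=16)

def H_theta(z):
    return w2 @ np.tanh(W1 @ z + b1)

def grad_H(z, eps=1e-6):
    """Central finite differences in place of autograd."""
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2)
        e[i] = eps
        g[i] = (H_theta(z + e) - H_theta(z - e)) / (2 * eps)
    return g

# The symplectic matrix J for one degree of freedom: z = (q, p).
J = np.array([[0.0, 1.0], [-1.0, 0.0]])

def hnn_dynamics(z):
    """Hamilton's equations supply the 'how': dz/dt = J grad H."""
    return J @ grad_H(z)

z = np.array([0.7, -0.3])
dz = hnn_dynamics(z)
# dH/dt along the flow is (grad H)^T J (grad H), which is zero for any H.
dH_dt = grad_H(z) @ dz
```

The network supplies only the scalar landscape; the symplectic structure does the rest.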
This is a beautiful marriage between machine learning and physics. But the true magic, the reason this whole enterprise is so profound, lies in a hidden mathematical property that guarantees energy conservation.
Let's write Hamilton's equations in a more compact form. Let the state vector be $z = (q, p)$. Then Hamilton's equations can be written as a single matrix equation:

$$\dot{z} = J \, \nabla_z H(z), \qquad J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$$
Here, $J$ is the famous symplectic matrix. It has a very special property: it is skew-symmetric, which means that its transpose is equal to its negative ($J^\top = -J$). This one little property is the key to everything.
Let's see what happens to our learned energy, $H_\theta$, as the system evolves according to these dynamics. The rate of change of energy is given by the chain rule:

$$\frac{dH}{dt} = (\nabla_z H)^\top \dot{z}$$
Now, substitute our structured equation for $\dot{z}$:

$$\frac{dH}{dt} = (\nabla_z H)^\top J \, \nabla_z H$$
This expression, a vector transposed, times a matrix, times the original vector, is a scalar. A scalar is always equal to its own transpose. So let's take the transpose of the whole thing:

$$\frac{dH}{dt} = \left( (\nabla_z H)^\top J \, \nabla_z H \right)^\top = (\nabla_z H)^\top J^\top \, \nabla_z H$$
But since $J$ is skew-symmetric, $J^\top = -J$. So we find:

$$\frac{dH}{dt} = -(\nabla_z H)^\top J \, \nabla_z H = -\frac{dH}{dt}$$
The only number that is equal to its own negative is zero. Therefore, we have proven that:

$$\frac{dH}{dt} = 0$$
This is not an approximation. It is not something that happens "on average" or only for the data we trained on. It is a mathematical certainty, an unbreakable vow enforced by the very structure of the equations. By forcing our neural network to predict the Hamiltonian and then plugging it into this symplectic structure, we have built a model that is guaranteed to conserve its learned energy, no matter what the weights and biases of the network are. The model has learned a fundamental law of physics.
Of course, the real world is not a perfect, frictionless paradise. Things slow down due to friction, and systems are often acted upon by external forces. Can our elegant Hamiltonian framework handle this messiness?
Wonderfully, yes. The framework can be extended into what is called a port-Hamiltonian system. The dynamics equation is modified to include terms for dissipation (energy loss) and external ports (energy input/output):

$$\dot{z} = \big( J - R(z) \big) \, \nabla_z H(z) + G(z) \, u$$
Let's break this down. The matrix $J$ is the same skew-symmetric symplectic structure as before: it generates the conservative part of the motion and, on its own, conserves energy exactly. The matrix $R$ models dissipation; it removes energy from the system the way friction or electrical resistance does. The matrix $G$ and the input $u$ describe the "ports": the channels through which external forces or controllers inject or extract energy.
Now, our neural network can be tasked with learning not just the Hamiltonian $H$, but also the dissipation matrix $R$ and the input matrix $G$. By enforcing the mathematical constraints on these matrices (skew-symmetry for $J$, positive semidefiniteness for $R$), we can build models that are guaranteed by construction to be passive and stable, perfectly describing the physics of real-world machines, circuits, and robots.
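A sketch of how such constraints can be enforced by construction, with random NumPy arrays standing in for what a network would output: any matrix of the form $A - A^\top$ is skew-symmetric, and any matrix of the form $LL^\top$ is positive semidefinite, so with zero input the energy can only decrease.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

# Parameterize the structure matrices so the constraints hold by construction
# (hypothetical parameter values; in a model these would be network outputs).
A = rng.normal(size=(n, n))
J = A - A.T                 # skew-symmetric by construction
L = rng.normal(size=(n, n))
R = L @ L.T                 # positive semidefinite by construction

# Quadratic toy Hamiltonian H(z) = 0.5 z^T z, so grad H = z.
z = rng.normal(size=n)
grad_H = z

# With no input (u = 0): dz/dt = (J - R) grad H, and
# dH/dt = grad_H^T (J - R) grad_H = -grad_H^T R grad_H  <=  0.
dH_dt = grad_H @ ((J - R) @ grad_H)
```

However the parameters change during training, the model can never manufacture energy out of nothing.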
We have built a beautiful mathematical object that lives in the continuous world of calculus. But our computers live in a discrete world of finite time steps. When we simulate our HNN, we must replace the smooth flow of time with a sequence of discrete jumps. Does this preserve the magic?
The answer is a resounding "it depends on how you jump!" If we use a standard, off-the-shelf numerical integrator like the common fourth-order Runge-Kutta (RK4) method, the symplectic structure is broken. The discrete-time map produced by the integrator is not symplectic, and our carefully conserved energy will begin to drift over time.
To preserve the physics, we need to use a numerical method that also respects the structure of the problem. These are called symplectic integrators. The most famous of these is the velocity Verlet algorithm, a workhorse of molecular dynamics. Symplectic integrators don't conserve the exact Hamiltonian at every step. Instead, they exactly conserve a nearby "shadow Hamiltonian," $\tilde{H}$. Because this shadow Hamiltonian is conserved, the true energy doesn't drift away; it just oscillates tightly around a constant value, leading to fantastic long-term stability. Other methods, such as using generating functions parameterized by neural networks, can also create exactly symplectic maps from one time step to the next.
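The difference is easy to see on the harmonic oscillator, $H = (q^2 + p^2)/2$. Below, a plain (non-symplectic) explicit Euler integrator is compared against symplectic Euler, the simplest member of the symplectic family; velocity Verlet behaves analogously. This is a self-contained Python sketch, not production integration code.

```python
def energy(q, p):
    """Hamiltonian of the unit harmonic oscillator: H = (q^2 + p^2) / 2."""
    return 0.5 * (q * q + p * p)

def explicit_euler(q, p, dt, steps):
    """Standard Euler: both variables updated from the OLD state. Not symplectic."""
    for _ in range(steps):
        q, p = q + dt * p, p - dt * q
    return q, p

def symplectic_euler(q, p, dt, steps):
    """Update p first, then q with the NEW p: this discrete map is symplectic."""
    for _ in range(steps):
        p = p - dt * q
        q = q + dt * p
    return q, p

q0, p0, dt, steps = 1.0, 0.0, 0.1, 1000
E0 = energy(q0, p0)
E_euler = energy(*explicit_euler(q0, p0, dt, steps))
E_symp = energy(*symplectic_euler(q0, p0, dt, steps))
# The non-symplectic method drifts (here the energy grows by a factor
# of 1 + dt^2 every step); the symplectic one stays near E0 forever.
```

Same Hamiltonian, same step size: only the structure of the jump differs, and only one trajectory remains physical.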
The ultimate lesson is profound. To build machine learning models that capture the soul of a physical system, it is not enough to simply show them data. We must imbue them with the fundamental principles and structures of physics—from the symmetries of the problem, to the Hamiltonian form of the dynamics, all the way down to the symplectic nature of the simulation algorithm itself. By doing so, we move beyond blind watchmakers and begin to create models that truly understand.
We have spent some time admiring the elegant machinery of Hamiltonian Neural Networks, marveling at how they bake the deep-seated laws of energy conservation right into their very architecture. Learning the rules of a game is one thing; playing it is another entirely. Now, the real fun begins. What can we do with these physics-respecting tools? What new worlds can we simulate, what hidden laws can we uncover, and what surprising connections can we draw between seemingly disparate fields?
You will see that the power of this idea—building models that speak the language of physics—extends far beyond just getting the right answer. It allows us to forge new digital universes that behave just like our own, from the frantic dance of atoms to the subtle whispers of the quantum realm. It turns our computers into discovery engines, capable of reverse-engineering the laws of nature from observation. And, in the ultimate testament to the unity of scientific thought, it provides a lens through which we can understand complex systems of all kinds, even those far removed from classical mechanics. Let's embark on this journey and see where the principle of the Hamiltonian takes us.
The most immediate and perhaps most impactful application of Hamiltonian-inspired machine learning lies in the world of chemistry and materials science. For centuries, we have dreamed of being able to predict the properties of a material or the outcome of a chemical reaction before ever stepping into a lab. The key to this dream is to accurately simulate the ceaseless ballet of atoms, governed by the forces between them.
The potential energy of a collection of atoms as a function of their positions, the Potential Energy Surface (PES), is precisely the potential energy term in the system's classical Hamiltonian. If we can learn this function, we can simulate the system. Neural Network Potential Energy Surfaces (NN-PES) do exactly this. Trained on data from high-accuracy quantum mechanical calculations, these networks learn the intricate, high-dimensional energy landscape. Once trained, they can be used in Molecular Dynamics (MD) simulations to predict the motion of thousands of atoms over time, millions of times faster than the original quantum methods.
But what's the use of a simulation if it doesn't connect to the real world? The true magic lies in bridging the gap from the microscopic to the macroscopic. In a well-designed NN-PES simulation, we can measure the fluctuations of the total energy of our simulated system. According to the foundational principles of statistical mechanics, these microscopic energy fluctuations are directly related to a bulk, measurable property of the material: its heat capacity, $C_V$. In the canonical ensemble, $C_V = \left( \langle E^2 \rangle - \langle E \rangle^2 \right) / k_B T^2$. By analyzing the simulation's output, we can thus compute real-world material properties. Of course, this is not without its challenges; the neural network itself introduces a tiny amount of prediction error, or 'noise', on the energy. A careful researcher must account for this noise, as well as for the artifacts of running a finite-sized simulation, to arrive at a truly accurate estimate of the heat capacity.
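To make the fluctuation route concrete, here is a toy calculation in reduced units with $k_B = 1$: for a classical 1D harmonic oscillator in contact with a bath at temperature $T$, the total energy is exponentially distributed with mean $T$, and the exact heat capacity is $C_V = 1$. The sampled energies below stand in for the output of a real MD run.

```python
import numpy as np

# Reduced units: k_B = 1. For a classical 1D harmonic oscillator coupled
# to a heat bath at temperature T, the total energy is exponentially
# distributed with mean T, and the exact heat capacity is C_V = dE/dT = 1.
rng = np.random.default_rng(4)
T = 2.0
energies = rng.exponential(scale=T, size=200_000)  # stand-in for MD samples

# Fluctuation formula: C_V = (<E^2> - <E>^2) / (k_B * T^2)
C_V = energies.var() / T**2
```

The same three lines of analysis, applied to genuine NN-PES trajectories, turn raw microscopic jitter into a tabulated material property.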
To build a reliable NN-PES, however, one must respect some deep, non-negotiable principles. Imagine two water molecules, separated by a mile. The energy of the combined system should, for all practical purposes, be the sum of the energies of the two individual molecules. This seemingly obvious property is called size consistency. Similarly, if you have a box containing $N$ identical, non-interacting systems, the total energy should be $N$ times the energy of one system. This is size extensivity. A model that violates these principles is fundamentally broken—it would incorrectly believe that distant, non-interacting atoms somehow affect each other. To ensure our models are physically sound, we must build them in a way that guarantees these properties. The most successful architectures, from the pioneering Behler-Parrinello networks to modern Graph Neural Networks, achieve this by constructing the total energy as a sum of individual atomic energy contributions, where each atom's energy depends only on its local neighborhood of atoms within a fixed cutoff distance. This architectural choice is not a convenience; it is a direct implementation of a fundamental physical requirement.
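The summation architecture makes size consistency easy to verify numerically. In this NumPy sketch, `atomic_energy` is a hypothetical stand-in for the per-atom subnetwork of a Behler-Parrinello-style model: duplicating a molecule far outside the cutoff exactly doubles the total energy.

```python
import numpy as np

CUTOFF = 3.0

def atomic_energy(i, coords):
    """Toy per-atom energy from neighbors within the cutoff (a stand-in
    for the per-atom subnetwork of a Behler-Parrinello-style model)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    neigh = d[(d > 0) & (d < CUTOFF)]
    return np.sum(np.exp(-neigh))       # any smooth local function works

def total_energy(coords):
    """Total energy as a sum of local atomic contributions."""
    return sum(atomic_energy(i, coords) for i in range(len(coords)))

# One 'molecule' and a copy translated far beyond the cutoff.
mol = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.2, 0.0]])
far_copy = mol + np.array([100.0, 0.0, 0.0])
combined = np.vstack([mol, far_copy])

E_one = total_energy(mol)
E_two = total_energy(combined)
# Size consistency: the non-interacting pair has twice the energy.
```

A single global network fed all six atoms at once would have no such guarantee; the sum-over-local-environments structure bakes it in.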
The universe is rich with symmetry, and our models must be too. Consider your hands. They are mirror images of each other, a property known as chirality. Many molecules share this property, existing as 'left-handed' and 'right-handed' versions called enantiomers. An isolated molecule and its mirror image have the exact same energy. If our NN-PES is to be believed, it must assign the same energy to both. Yet how can a network, a jumble of numbers and functions, perceive something as subtle as 'handedness'? The answer, once again, comes from physics and mathematics. We can design the input features, or 'descriptors', to be sensitive to the local geometry's handedness. One elegant way is to use the scalar triple product of the vectors pointing to three neighboring atoms. This quantity, related to the signed volume of the tetrahedron they form, is a pseudoscalar: it remains the same if the molecule is rotated, but it flips its sign if the molecule is reflected into its mirror image. By including such features, the network can distinguish between enantiomers. It then becomes the model's job to learn that the energy, a true scalar, must be an even function of these pseudoscalar features. This can be enforced by the training procedure. In more complex scenarios, such as a molecule interacting with a chiral surface, the energies of the two enantiomers are different, and these very same pseudoscalar features become essential for capturing this physically crucial difference.
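The triple-product descriptor is simple to implement and test, assuming only NumPy: rotate the three neighbor vectors and the feature is unchanged; reflect them into their mirror image and it flips sign.

```python
import numpy as np

def handedness_feature(v1, v2, v3):
    """Scalar triple product of the vectors to three neighbors: a pseudoscalar
    proportional to the signed volume of the tetrahedron they span."""
    return np.dot(v1, np.cross(v2, v3))

rng = np.random.default_rng(5)
v1, v2, v3 = rng.normal(size=(3, 3))
original = handedness_feature(v1, v2, v3)

# A proper rotation (det = +1) leaves the feature unchanged...
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1
rotated = handedness_feature(Q @ v1, Q @ v2, Q @ v3)

# ...while a mirror reflection (here: z -> -z) flips its sign.
M = np.diag([1.0, 1.0, -1.0])
mirrored = handedness_feature(M @ v1, M @ v2, M @ v3)
```

A descriptor set containing such features lets a network see handedness at all; whether the energy then depends on the sign is left for the physics of the problem, and the training data, to decide.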
The 'Hamiltonian' philosophy can also be generalized. In our universe, a moving object left to itself doesn't just conserve energy; if there is friction or drag, its mechanical energy dissipates. When modeling systems at the nanoscale, like the vibrating tip of an Atomic Force Microscope (AFM), it is crucial that our model captures this. We can design a neural network that describes the cantilever's motion, but instead of a purely conservative Hamiltonian system, we can add a damping term. The beauty is that we can parameterize this term so that its damping coefficient is always positive, for instance by passing a raw network output through a strictly positive function such as an exponential or a softplus, so that the model is guaranteed by its very structure to dissipate energy, never spontaneously gain it. This is a powerful extension of the core idea: we identify the structural properties of the physics (conservation, dissipation, bounded forces) and build them directly into our network's mathematics.
The Hamiltonian is the undisputed sovereign of the quantum world. The central equation of quantum dynamics, the Schrödinger equation, is nothing less than a statement about the Hamiltonian operator $\hat{H}$: $i\hbar \, \partial_t |\psi\rangle = \hat{H} |\psi\rangle$. It is only natural, then, that Hamiltonian-inspired machine learning finds some of its most profound applications here.
One direct approach is to use a Physics-Informed Neural Network (PINN) to solve the Schrödinger equation itself. A PINN is trained to minimize a loss function that includes how well the network's output satisfies the differential equation. But an even more elegant method exists. What if we design our network so that its very components are already exact solutions to the equation? For a free particle, the solutions are plane waves of the form $e^{i(kx - \omega t)}$. We can construct our network's prediction as a linear combination of these plane waves, where the relationship between $k$ and $\omega$ is fixed by the physics (the dispersion relation $\omega = \hbar k^2 / 2m$). With this architecture, the network is guaranteed to satisfy the Schrödinger equation everywhere. The complex learning problem is brilliantly reduced to a much simpler one: just find the right combination of these waves to match the initial state of the particle.
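A tiny numerical check of this construction, in units where $\hbar = m = 1$ so the dispersion relation reads $\omega = k^2/2$: a superposition with assumed wavenumbers and coefficients satisfies the free Schrödinger equation identically, because each term does.

```python
import numpy as np

# Free-particle Schrodinger equation in units hbar = m = 1:
#   i dpsi/dt = -(1/2) d^2 psi / dx^2
# Each plane wave e^{i(k x - w t)} solves it exactly when w = k^2 / 2,
# so any linear combination does too -- by construction, not by training.
ks = np.array([0.5, 1.0, 2.0])          # assumed wavenumbers
cs = np.array([0.6, 0.3, 0.1 + 0.2j])   # assumed (learnable) coefficients
ws = ks**2 / 2                          # dispersion relation fixes omega

x, t = 1.3, 0.7
phases = np.exp(1j * (ks * x - ws * t))

psi_t = np.sum(cs * (-1j * ws) * phases)       # analytic d/dt
psi_xx = np.sum(cs * (1j * ks) ** 2 * phases)  # analytic d^2/dx^2
residual = 1j * psi_t + 0.5 * psi_xx           # should vanish identically
```

Only the coefficients `cs` would be trained, to match the initial wavefunction; the equation itself never needs to appear in the loss.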
Quantum reality is often more complex than a single particle. Chemical reactions, especially those triggered by light, can involve a molecule navigating a landscape of multiple, interacting electronic states. As the atoms in the molecule move, the system can 'hop' from one potential energy surface to another. This is called nonadiabatic dynamics. Modeling this is a formidable challenge. A naive approach of training two separate neural networks for the two energy surfaces is doomed to fail, as it ignores the crucial physical coupling between them. A far more powerful and physically sound strategy, born from the Hamiltonian viewpoint, is to learn the underlying object that gives rise to both surfaces: the diabatic Hamiltonian matrix. This is a small matrix ($2 \times 2$ for two coupled states) whose elements are functions of the nuclear geometry, learned by a neural network. The adiabatic energies that we observe are simply the eigenvalues of this matrix. By diagonalizing the learned matrix, we obtain both energy surfaces and the nonadiabatic coupling vectors that govern the hops between them, all in a mutually consistent framework. This is a beautiful example of how choosing the right physical representation transforms a difficult problem into a solvable one.
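A toy version of the diabatic construction in NumPy, with `diabatic_H` a hypothetical stand-in for the matrix a network would learn: two diabatic states that cross linearly in a nuclear coordinate $R$, coupled with strength $c$. Diagonalizing yields both adiabatic surfaces, and the coupling opens an avoided-crossing gap of $2c$.

```python
import numpy as np

def diabatic_H(R, a=1.0, c=0.3):
    """Toy 2x2 diabatic Hamiltonian as a function of a nuclear coordinate R
    (a stand-in for the matrix a network would learn): two diabatic states
    crossing linearly, coupled with constant strength c."""
    return np.array([[a * R,     c],
                     [c,     -a * R]])

R = 0.2
# The observable adiabatic surfaces are the eigenvalues of the learned matrix.
E_lower, E_upper = np.linalg.eigvalsh(diabatic_H(R))

# For this 2x2 form the eigenvalues are +/- sqrt((a R)^2 + c^2), so the
# coupling c opens a gap of 2c at the crossing point R = 0 (avoided crossing).
gap_at_crossing = np.diff(np.linalg.eigvalsh(diabatic_H(0.0)))[0]
```

One learned object, two mutually consistent surfaces; the couplings come along for free from the same diagonalization.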
Beyond dynamics, a central task in quantum physics is to find the lowest-energy configuration of a system—its ground state. The variational principle provides a powerful path: any trial wavefunction we can imagine will have an energy expectation value that is greater than or equal to the true ground state energy. So, the game is to guess a wavefunction, calculate its energy, and then tweak the guess to lower the energy, iterating until we can go no lower. What if our "guess" is a neural network? This is the revolutionary idea behind Neural Quantum States (NQS). The network's parameters become the variational parameters of the wavefunction ansatz. By feeding the network a description of the quantum state (like the configuration of a chain of spins) and using a gradient-based optimizer to minimize the expectation value of the Hamiltonian, we can find incredibly accurate approximations to the ground states of complex many-body quantum systems.
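The variational loop can be demonstrated in miniature. Below, the "ansatz" is just a raw 4-component vector rather than a neural network, optimized by gradient descent on the Rayleigh quotient of a two-spin transverse-field Ising Hamiltonian. The structure (parameterize a state, evaluate $\langle H \rangle$, descend) is the same one NQS uses at scale.

```python
import numpy as np

# A small many-body Hamiltonian: the transverse-field Ising model on two
# spins, H = -sx(1) - sx(2) - sz(1) sz(2), written as a dense 4x4 matrix.
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)
H = -np.kron(sx, I2) - np.kron(I2, sx) - np.kron(sz, sz)

def variational_energy(v):
    """Rayleigh quotient <v|H|v> / <v|v>: an upper bound on the ground energy."""
    return v @ H @ v / (v @ v)

# Gradient descent on the trial vector's entries -- the role a neural
# quantum state's parameters would play in a real NQS calculation.
rng = np.random.default_rng(6)
v = rng.normal(size=4)
for _ in range(500):
    v = v / np.linalg.norm(v)
    E = variational_energy(v)
    grad = 2 * (H @ v - E * v)       # gradient of the Rayleigh quotient
    v = v - 0.1 * grad

E_var = variational_energy(v)
E_exact = np.linalg.eigvalsh(H)[0]
# Variational principle: E_var >= E_exact, with near-equality after optimization.
```

Swapping the raw vector for a network that maps spin configurations to amplitudes, and exact expectation values for Monte Carlo estimates, recovers the full NQS recipe.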
The true power of a great idea is its universality. The Hamiltonian framework, at its heart, is a structured way of thinking about dynamics, conservation, and optimization. This mode of thinking is so powerful that it transcends the boundaries of traditional physics.
In our journey so far, we have assumed we knew the form of the Hamiltonian and used a neural network to learn its parameters. But what if we don't know the physical laws governing a system at all? What if all we have is data—observations of the system's state and how it changes over time? Here, we can turn the problem on its head. Instead of using a known Hamiltonian to generate data, we can use data to discover the unknown Hamiltonian. By designing a loss function that penalizes any mismatch between the observed dynamics and the dynamics predicted by a trial Hamiltonian (represented by a neural network), we can train the network to become an effective representation of the system's true Hamiltonian. This approach, which can be elegantly combined with other mathematical frameworks like Koopman operator theory, effectively creates a 'discovery machine' for physical laws, powered by data.
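A minimal 'discovery machine' for the simplest possible case: assume a quadratic trial Hamiltonian $H = a q^2 + b p^2$ and fit $(a, b)$ by least squares so that Hamilton's equations reproduce the observed time derivatives. The data here is synthetic and noiseless; a real pipeline would use a neural $H$ and noisy trajectories.

```python
import numpy as np

# Ground truth (hidden from the 'learner'): H = 0.5 q^2 + 0.5 p^2, so that
# Hamilton's equations give qdot = p and pdot = -q.
rng = np.random.default_rng(7)
q = rng.normal(size=200)
p = rng.normal(size=200)
qdot = p          # observed time derivatives (here: noiseless)
pdot = -q

# Trial Hamiltonian H_ab = a q^2 + b p^2. Hamilton's equations predict
# qdot = dH/dp = 2 b p and pdot = -dH/dq = -2 a q. Matching the observed
# derivatives is a linear least-squares problem in (a, b).
A = np.zeros((400, 2))
A[:200, 1] = 2 * p        # qdot rows depend only on b
A[200:, 0] = -2 * q       # pdot rows depend only on a
y = np.concatenate([qdot, pdot])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
# Recovers a = b = 0.5 (up to floating-point error): the data has
# revealed the hidden Hamiltonian.
```

Replacing the two-parameter quadratic with a neural network, and the least-squares solve with gradient descent on the same mismatch, gives the general data-driven version described above.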
And finally, who said a Hamiltonian has to represent physical energy? A Hamiltonian can be any function we wish to study or optimize. Consider the world of networks—social networks, biological networks, the internet. A common task is to find 'communities', or densely connected clusters of nodes. It turns out that this problem can be rephrased in the language of statistical physics. We can define a quantity called 'modularity' that measures the quality of a given partition of the network into two communities. Maximizing this modularity is mathematically equivalent to finding the lowest-energy state, or 'ground state', of an abstract object we can call the modularity Hamiltonian. A problem from computer science is thus mapped onto a problem from physics. We can then deploy the very same tools, like a Graph Neural Network trained to find low-energy configurations, to solve this problem and uncover the hidden structure within the network.
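A compact sketch of the modularity-as-Hamiltonian idea, using Newman's spectral relaxation in place of a trained graph network: build the modularity matrix for a toy graph of two triangles joined by one edge, and read the two-community partition off the sign pattern of its leading eigenvector.

```python
import numpy as np

# Two triangles joined by a single bridge edge (nodes 0-2 and 3-5).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

k = A.sum(axis=1)            # node degrees
m = A.sum() / 2              # number of edges

# Modularity matrix B = A - k k^T / 2m. Maximizing s^T B s over spin
# vectors s in {-1, +1}^n is the 'ground state' search; the spectral
# relaxation takes s = sign of the leading eigenvector of B.
B = A - np.outer(k, k) / (2 * m)
eigvals, eigvecs = np.linalg.eigh(B)
s = np.sign(eigvecs[:, -1])

communities = (s == s[0])    # True for nodes grouped with node 0
```

The spin vector $s$ plays exactly the role of a physical configuration, and the negative modularity plays the role of its energy; a GNN trained to minimize that energy is solving the same ground-state problem by other means.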
Our tour is complete. We have journeyed from the tangible world of molecules and materials to the abstract realm of quantum wavefunctions and on to the interconnected webs of complex networks. Through it all, a single, unifying thread has guided us: the principle of the Hamiltonian.
We have seen that by building the fundamental laws of nature—conservation of energy, symmetries, dissipation, and the structure of quantum mechanics—into the design of our machine learning models, we create tools of uncanny power. These models are not just black-box mimics; they are more accurate, more robust, and far more insightful. They learn not only to predict what will happen but begin to capture why it happens, in a language consistent with centuries of physical law. Whether we are calculating the properties of a new material, discovering the ground state of a magnet, or revealing the communities in a social network, this fusion of physics and machine learning represents a new and exciting way of doing science. It is a symphony of simulation and discovery, and its most beautiful music is yet to be written.