
Cycle-Skipping in Full-Waveform Inversion

Key Takeaways
  • Cycle-skipping in Full-Waveform Inversion is an optimization failure in which the algorithm gets trapped in a local minimum, typically because the initial model's predicted wave arrival is off by more than half a wavelength.
  • The standard L2 misfit function's landscape is filled with these false minima because it is mathematically related to the negative autocorrelation of the waveform, which is naturally oscillatory.
  • A primary strategy to avoid cycle-skipping is multi-scale inversion, which starts with low-frequency data (creating a smoother misfit landscape) and gradually incorporates higher frequencies.
  • Advanced solutions involve changing the misfit function itself, using concepts from Optimal Transport theory like the Wasserstein distance to create a convex landscape that eliminates local minima related to time shifts.

Introduction

Full-Waveform Inversion (FWI) represents a pinnacle of geophysical imaging, promising to create high-resolution maps of the Earth's subsurface by matching simulated seismic waves to real-world recordings. However, this powerful technique is plagued by a fundamental challenge that can derail the entire process: a phenomenon known as cycle-skipping. This issue arises when the initial guess of the Earth's structure is too far from the truth, causing the optimization algorithm to lock onto a fundamentally incorrect solution, much like a musician losing their place in a song by a full beat. This article addresses this critical knowledge gap, providing a clear explanation of what cycle-skipping is and how it can be overcome.

The following chapters will guide you through the core of this complex problem. First, we will explore the Principles and Mechanisms behind cycle-skipping, visualizing the problem as a treacherous "misfit landscape" and uncovering its mathematical roots in wave autocorrelation. We will then examine Applications and Interdisciplinary Connections, showcasing the practical strategies used in geophysics—from multi-scale approaches to revolutionary new methods inspired by abstract mathematics—and revealing surprising parallels in the field of synthetic biology. By the end, you will have a comprehensive understanding of not just the problem, but the elegant solutions that science has devised to master it.

Principles and Mechanisms

At its heart, Full-Waveform Inversion (FWI) is a matching game of cosmic proportions. We have a recording of seismic waves that have traveled through the Earth—squiggles on a chart that hold the secrets of the planet's interior. We also have a computer model of the Earth, a digital guess about its structure. We use this model to simulate our own seismic waves. The game is to tweak the model until our simulated squiggles perfectly match the real ones.

But how do we keep score in this game? The most natural way is to measure the difference between the two sets of squiggles at every single moment in time, square these differences to make them all positive, and add them all up. This total score is what we call the $L_2$ misfit, or least-squares misfit. A perfect match gives a score of zero. Any mismatch gives a positive score. Our goal is to find the Earth model that minimizes this score. We are, in essence, trying to find the lowest point in a vast, complex "misfit landscape" where each point corresponds to a different Earth model and its height is the misfit score.

If this landscape were a simple, smooth bowl, finding the bottom would be easy. We could start anywhere, feel which way is "downhill" (by calculating the gradient of the misfit), and just walk in that direction until we reach the bottom. But the landscape of FWI is rarely so simple. Instead, it is often a treacherous terrain full of hills, valleys, and potholes—a landscape shrouded in fog. And the most common and frustrating trap in this landscape is the phenomenon of cycle-skipping.

The Misfit Landscape: A Walk in the Fog

Imagine you are trying to park your car in a specific, designated parking spot, but the entire lot is covered in a thick fog. If you are already very close to your spot, you can see its outlines and easily steer your car into place. But if you start far away, you might see the faint outline of a parking spot and, thinking it's the right one, confidently park there. You have found a place to park—a low point in the local terrain—but it is the wrong one.

This is precisely what happens in FWI. The true Earth model corresponds to the deepest valley in the misfit landscape, the global minimum. However, the landscape is riddled with other, shallower valleys—local minima—that can easily trap our optimization algorithm. If our initial guess of the Earth model is too far from the truth, the "downhill" direction might point not towards the true answer, but towards one of these false parking spots. Converging to a local minimum is cycle-skipping. The algorithm has found a solution that looks locally optimal, but it is fundamentally wrong, often because it has misaligned an entire cycle (or "wiggle") of a wave.

Unpacking the Fog: The Autocorrelation Secret

Why does this landscape of false valleys even exist? The answer lies in a surprisingly beautiful and simple mathematical relationship. Let's strip the problem down to its absolute essence, a "toy model" where the only difference between our observed data, $d(t)$, and our synthetic data, $s(t; \tau)$, is a simple time shift, $\tau$. Let's say the true wave is $w(t)$ and our simulation produces $w(t-\tau)$. The misfit, $J(\tau)$, is the squared difference integrated over time:

$$ J(\tau) = \frac{1}{2} \int \left( w(t) - w(t-\tau) \right)^2 dt $$

If we expand this expression, a wonderful simplification occurs. The misfit turns out to be nothing more than the total energy of the wave minus its own autocorrelation function, $C_{ww}(\tau)$:

$$ J(\tau) = \text{Energy} - C_{ww}(\tau) = \left( \int w(t)^2 \, dt \right) - \left( \int w(t)\, w(t-\tau) \, dt \right) $$

The autocorrelation function is a measure of how similar a signal is to a shifted version of itself. To minimize the misfit $J(\tau)$, we must maximize the autocorrelation $C_{ww}(\tau)$. Of course, a wave is most similar to itself when there is no shift ($\tau = 0$), so the autocorrelation has its global peak there. This corresponds to the global minimum (a score of zero) in our misfit landscape.

But what happens for an oscillatory, wavy signal? Think of sliding two identical corrugated metal sheets over each other. They line up perfectly at zero shift. But they also line up quite well when you shift one sheet by a full corrugation, or two, or three. For a seismic wave with a dominant period $T_0$, its autocorrelation will also be oscillatory. It will have a main peak at $\tau = 0$ and smaller, secondary peaks at integer multiples of the period, $\tau \approx \pm T_0, \pm 2T_0, \dots$
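This energy-minus-autocorrelation identity, and the false valleys it creates, can be checked directly. The sketch below is an illustrative Python toy, not part of any FWI code; the windowed-cosine wavelet and every parameter value are assumptions chosen for clarity.

```python
import numpy as np

# Illustrative toy: an oscillatory wavelet (windowed cosine) on a time grid.
dt = 1e-3                       # sample interval [s]
t = np.arange(-1.0, 1.0, dt)    # time axis [s]
f0 = 10.0                       # dominant frequency [Hz], so T0 = 0.1 s
w = np.cos(2*np.pi*f0*t) * np.exp(-t**2 / (2*0.15**2))

energy = np.sum(w**2) * dt

shifts = np.arange(0, 301)      # shifts of 0 .. 0.3 s, in samples
tau = shifts * dt
J = np.empty_like(tau)
C = np.empty_like(tau)
for i, k in enumerate(shifts):
    w_shift = np.roll(w, k)             # w(t - tau); envelope ~0 at the edges
    J[i] = 0.5 * np.sum((w - w_shift)**2) * dt
    C[i] = np.sum(w * w_shift) * dt     # autocorrelation at lag tau

# The identity J(tau) = Energy - C(tau) holds sample for sample.
assert np.allclose(J, energy - C)

# A secondary peak of C near tau = T0 creates a false valley (local minimum) of J.
window = (tau > 0.05) & (tau < 0.15)
tau_false = tau[window][np.argmin(J[window])]
print(f"false minimum near tau = {tau_false:.3f} s (T0 = {1/f0:.3f} s)")
```

The false minimum sits just short of one full period, pulled slightly inward by the decaying envelope of the wavelet.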

Since the misfit is the negative of the autocorrelation (plus a constant), every one of these secondary peaks in the autocorrelation function creates a false valley—a local minimum—in the misfit landscape. For the simplest case of a pure sinusoid, $d(t) = \sin(2\pi f t)$, the misfit function becomes a perfect cosine shape:

$$ J(\Delta t) = C \left( 1 - \cos(2\pi f \Delta t) \right) $$

Here, $C$ is a constant, $f$ is the frequency, and $\Delta t$ is the time-shift error. The minima of this function occur whenever the cosine term is 1, i.e., when the time-shift error is an integer number of periods: $\Delta t = k/f$ for any integer $k$. These are the false parking spots.
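For the pure sinusoid, this cosine landscape can be reproduced to machine precision. A minimal check (the frequency and sampling below are arbitrary choices):

```python
import numpy as np

# Pure sinusoid sampled over an integer number of periods.
f = 5.0                          # frequency [Hz]
dt = 1e-3
t = np.arange(0, 1.0, dt)        # 1 s window: exactly 5 periods
d = np.sin(2*np.pi*f*t)

shifts = np.arange(0, 251)       # time-shift errors of 0 .. 0.25 s
tau = shifts * dt
J = np.array([0.5*np.sum((d - np.roll(d, k))**2)*dt for k in shifts])

# Predicted landscape: J = C*(1 - cos(2*pi*f*tau)), with C = T/2 = 0.5 here.
assert np.allclose(J, 0.5*(1 - np.cos(2*np.pi*f*tau)), atol=1e-9)

# Global minimum at tau = 0, and a false minimum at one full period tau = 1/f.
assert np.isclose(J[200], 0.0, atol=1e-9)    # shift of 0.2 s = 1/f
print("misfit at one-period shift:", J[200])
```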

The Half-Wavelength Rule of Thumb

This simple cosine landscape immediately reveals a crucial rule. The basin of attraction for the true solution at $\Delta t = 0$ is the central valley. This valley is bounded by the nearest peaks of the misfit function, which occur at $\Delta t = \pm 1/(2f)$. This means if your initial guess is off by more than half a period, or "half a wavelength" in time, the local "downhill" direction will point you away from the true solution and towards a wrong one. This is the famous half-wavelength criterion that haunts geophysicists. It tells us that for FWI to succeed with a standard $L_2$ misfit, our initial model must already be accurate enough to predict wave arrival times to within half a period of the highest frequency we are trying to match.

The Frequency-Wavelength-Convexity Connection

This rule immediately highlights the profound importance of frequency.

Low frequencies correspond to long wavelengths and long periods. In our landscape analogy, this means the valleys are very wide and gently sloped. It's much easier to start in the correct valley, as the half-wavelength criterion is very generous. If you are trying to match waves with a period of 1 second, your initial model can be off by almost half a second and still find its way home. A numerical experiment clearly shows that a low-frequency misfit function can have a single, broad minimum over a wide range of model parameters.

High frequencies, on the other hand, correspond to short wavelengths and short periods. The valleys in the landscape become extremely narrow and numerous. The terrain is "wiggly" and non-convex. If you are trying to match a wave with a period of 0.1 seconds, your timing must be correct to within 0.05 seconds—a daunting requirement.

This insight leads to a powerful practical strategy: multi-scale inversion. We start the game using only the lowest frequencies in our data. The landscape is smooth, the valleys are wide, and we can easily find the approximate location of the global minimum. This gives us an improved model. Then, we gradually introduce higher frequencies. At each stage, our model is already accurate enough to fall within the now-narrower basin of attraction for the higher-frequency data. We are essentially using the low frequencies to get out of the fog and into the right neighborhood, and then using the high frequencies to read the house numbers and park perfectly in our designated spot.
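The widening of the basin at low frequency is easy to demonstrate. The following sketch (illustrative only; the Gaussian-windowed wavelets and the two frequencies are assumptions) measures the half-width of the central valley, the shift at which the misfit first turns over, for a low-frequency and a high-frequency wavelet:

```python
import numpy as np

def misfit_curve(f0, dt=1e-3):
    """Shift misfit J(tau) for a Gaussian-windowed cosine of dominant frequency f0."""
    t = np.arange(-2.0, 2.0, dt)
    w = np.cos(2*np.pi*f0*t) * np.exp(-t**2 / (2*(2.0/f0)**2))  # a few cycles long
    shifts = np.arange(0, int(0.5/dt))
    J = np.array([0.5*np.sum((w - np.roll(w, k))**2)*dt for k in shifts])
    return shifts*dt, J

def basin_edge(tau, J):
    """First local maximum of J for tau > 0: the edge of the central valley."""
    for i in range(1, len(J)-1):
        if J[i] >= J[i-1] and J[i] >= J[i+1]:
            return tau[i]
    return tau[-1]

tau_lo, J_lo = misfit_curve(5.0)    # 5 Hz: period 0.2 s, expect edge near 0.1 s
tau_hi, J_hi = misfit_curve(20.0)   # 20 Hz: period 0.05 s, expect edge near 0.025 s
edge_lo, edge_hi = basin_edge(tau_lo, J_lo), basin_edge(tau_hi, J_hi)
print(f"basin half-width: {edge_lo:.3f} s at 5 Hz vs {edge_hi:.3f} s at 20 Hz")
```

The half-widths land close to $1/(2f)$ for each wavelet, exactly the half-wavelength criterion: the low-frequency basin is roughly four times wider.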

From a frequency-domain perspective, the "sharpness" or curvature of the central valley is proportional to the second moment of the source's power spectrum, $\int (2\pi f)^2 |S(f)|^2 \, df$. High frequencies contribute disproportionately to this sharpness. A sharp valley is good for precision once you're inside it, but it's a sign that the surrounding landscape is highly oscillatory and full of traps.
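This curvature formula can be sanity-checked numerically. By Parseval's theorem the spectral second moment $\int (2\pi f)^2 |S(f)|^2 \, df$ equals the time-domain integral $\int \dot{w}(t)^2 \, dt$, so a toy script (assumed wavelet and parameters) can compare that integral against the finite-difference curvature of $J$ at zero shift:

```python
import numpy as np

dt = 1e-4
t = np.arange(-1.0, 1.0, dt)
f0 = 10.0
w = np.cos(2*np.pi*f0*t) * np.exp(-t**2 / (2*0.15**2))

# Curvature of the misfit at zero shift, J''(0), from a one-sample shift:
# J(h) ~ 0.5 * h^2 * J''(0) for small h, since J(0) = 0 and J is even in the shift.
h = dt
J_h = 0.5 * np.sum((w - np.roll(w, 1))**2) * dt
curvature_fd = 2 * J_h / h**2

# Parseval: int (2*pi*f)^2 |S(f)|^2 df  ==  int w'(t)^2 dt (computed in time).
curvature_spec = np.sum(np.gradient(w, dt)**2) * dt

print(curvature_fd, curvature_spec)   # the two estimates agree closely
```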

From Toy Models to the Real Earth

So far, we have explored a simplified world. The real Earth is far more complex. A change in the velocity model, $c(\mathbf{x})$, doesn't just shift waves uniformly; it stretches, squeezes, and scatters them in complicated ways. The mapping from the Earth model to the seismic data is profoundly nonlinear.

Furthermore, in a realistic 3D Earth, waves don't just travel on a single path. They bounce off multiple layers, scatter off small objects, and bend around large ones. The seismogram we record is a superposition of countless arrivals from different paths, creating a complex interference pattern. This means that two very different Earth models could, by chance, produce similar-looking seismograms due to different combinations of interfering waves. This adds even more local minima to our misfit landscape, creating traps that go beyond simple cycle-skips.

Even so, the fundamental principles we learned from our simple toy model—the oscillatory nature of the misfit and the critical role of frequency—remain the guiding lights for understanding and navigating this complex reality.

Changing How We Keep Score

Given the treacherous nature of the standard $L_2$ landscape, a brilliant question arises: are we playing the right game? Is comparing waveforms point-by-point the smartest way to measure their similarity? This question has led to a paradigm shift in FWI, drawing inspiration from other fields of mathematics.

Practical optimization algorithms like L-BFGS are smarter than simple gradient descent. They learn the local shape of the valley as they descend, allowing them to take more efficient steps. They are like a skilled driver who can navigate a chosen valley quickly. However, they are still local explorers; they cannot see over the hills to find a better valley elsewhere. They don't change the landscape itself.

To truly solve the problem, we must change the landscape. One of the most promising approaches is to use a different misfit function, one derived from the mathematical theory of Optimal Transport. Instead of the $L_2$ norm, we can use the Wasserstein distance.

Imagine two piles of sand with different shapes, representing the energy of our two waveforms.

  • The $L_2$ distance is like measuring the height difference between the two piles at every single location. It is a local, point-wise comparison. If one pile is just shifted slightly, the $L_2$ distance can be huge because you're comparing sand to bare ground at many points.
  • The Wasserstein distance measures the minimum "work" (mass times distance) required to shovel the sand from the first pile to form the shape of the second pile. It is a global comparison of the overall distribution of mass.

For a simple time shift, the work required to move one wave to match the other is simply proportional to the amount of the shift. The squared Wasserstein distance, $W_2^2$, becomes a perfect quadratic bowl: $(\Delta t)^2$. All the false valleys vanish! The landscape becomes perfectly convex for time shifts, eliminating the cycle-skipping problem at its root.
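A quick numerical illustration of this convexity, as a sketch under simplifying assumptions (a smooth Gaussian "energy density" stands in for a squared, normalized seismic trace; the grid, width, and shifts are arbitrary): the 1-D $W_2$ distance, computed from quantile functions, depends quadratically on the shift.

```python
import numpy as np

def w2_squared(p, q, t):
    """Squared 1-D Wasserstein-2 distance between densities p and q on grid t."""
    dt = t[1] - t[0]
    Fp = np.cumsum(p)*dt; Fp /= Fp[-1]    # CDFs, normalized to end at 1
    Fq = np.cumsum(q)*dt; Fq /= Fq[-1]
    u = np.linspace(1e-4, 1-1e-4, 4001)   # quantile levels
    invFp = np.interp(u, Fp, t)           # quantile functions F^{-1}(u)
    invFq = np.interp(u, Fq, t)
    # W2^2 = integral over u of (F^{-1}(u) - G^{-1}(u))^2; approximate by the mean.
    return np.mean((invFp - invFq)**2)

t = np.arange(-3.0, 3.0, 2e-3)
density = lambda mu: np.exp(-(t - mu)**2 / (2*0.2**2))  # unnormalized Gaussian blob

# For a pure shift tau, W2^2 comes out as tau^2: a convex bowl with no side valleys.
for tau in (0.1, 0.2, 0.4):
    print(tau, w2_squared(density(0.0), density(tau), t))
```

Doubling the shift quadruples the misfit, exactly the parabola described above, whereas the point-wise $L_2$ comparison of the same shifted signals would oscillate.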

This choice of misfit is not arbitrary; it corresponds to choosing a different statistical model for the noise in our data—one that assumes errors are better described as redistributions of energy rather than point-wise additive noise. Viewing the problem through the lens of Bayesian inference, changing the misfit is equivalent to changing our likelihood function, $p(\text{data} \mid \text{model})$, which reshapes the entire posterior probability landscape we are trying to explore.

While advanced misfits like the Wasserstein distance don't solve all of FWI's challenges, they represent a profound shift in perspective. By understanding the deep mathematical and physical principles behind why our methods fail, we can invent new ones that are fundamentally better. The problem of cycle-skipping, once seen as an insurmountable curse, has become a driving force for innovation, revealing a beautiful unity between wave physics, optimization theory, and abstract mathematics.

Applications and Interdisciplinary Connections

Now that we have grappled with the thorny nature of cycle skipping—this vexing tendency of our optimization algorithms to get caught in the wrong rhythm—we can take a step back and ask a more rewarding question. Where does this problem appear in the real world, and what clever strategies have scientists and engineers devised to tame it? The journey to answer this is a fascinating one, revealing deep connections that stretch from the solid crust of our planet all the way into the heart of a living cell. It is a story not just of avoiding error, but of learning to listen to the world in a more profound and intelligent way.

Listening to the Earth's Heartbeat

Imagine you are a doctor trying to use ultrasound to see inside a patient. You send sound waves in and listen to the echoes that return. The timing and shape of these echoes tell you about the tissues and organs inside. Now, imagine your patient is the entire Earth, and you want to map out its complex interior—to find reservoirs of oil and gas, to understand the plumbing of volcanoes, or to map the fault lines that generate devastating earthquakes. This is the grand challenge of geophysics, and one of its most powerful tools is Full Waveform Inversion (FWI).

The idea behind FWI is breathtakingly ambitious: we create a miniature earthquake (using, for example, a powerful vibrator truck), record the resulting seismic waves at thousands of locations, and then try to build a computer model of the Earth's crust that can perfectly replicate those recordings. The problem is, FWI is exquisitely sensitive to cycle skipping. If our initial guess of the Earth's structure is too far off, the predicted waves will be out of sync with the observed ones by more than half a cycle. The algorithm, trying to match the waves peak-for-peak, gets confused. It tries to match a peak in our simulation with the wrong peak in the real data, leading it down a path to a completely nonsensical picture of the Earth.

So, how do we proceed? The most intuitive and widely used strategy is to follow a simple, profound principle: start simple, then add complexity. Instead of trying to match the full, intricate waveform from the outset, we first listen only to the lowest notes—the lowest frequencies in our data. Low-frequency waves have very long periods. This means that even if our initial guess of the wave travel time is quite wrong, the error is still likely to be less than half a period. We are safe from cycle skipping. These low frequencies allow us to build a blurry, long-wavelength picture of the Earth—a coarse but kinematically correct background.

Once we have this blurry picture, we can use it as a starting point for the next stage, where we start listening to slightly higher frequencies. Because our model is now more accurate, the time errors are smaller, and we can safely use these shorter-period waves to add more detail without skipping a cycle. We continue this process, progressively incorporating higher and higher frequencies, sharpening the image at each step until we are using the full bandwidth of our data to see the Earth in glorious detail. This multiscale strategy, often called frequency continuation, is not just a clever trick; it can be engineered into a precise schedule, calculating the maximum safe frequency we can introduce at each stage based on our current model's accuracy. This method, however, also reveals a deeper connection between the physics of waves and the mathematics of optimization. As we add higher frequencies to resolve finer details, the underlying mathematical problem becomes more "ill-conditioned," meaning the solution becomes more sensitive and potentially unstable, demanding stronger regularization and more sophisticated numerical methods to keep it under control.
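As a worked example of such a schedule (all numbers below are invented for illustration), the half-period rule $f_{\max} \le 1/(2\,\Delta t)$ turns a travel-time error into the highest frequency that is safe to invert:

```python
def max_safe_frequency(distance_m, v_model, v_true):
    """Highest frequency (Hz) safe from cycle-skipping, by the half-period rule.

    The travel-time error for a wave crossing distance_m is
    dt = |d/v_model - d/v_true|; cycle-skipping is avoided while dt < 1/(2 f).
    """
    dt_err = abs(distance_m / v_model - distance_m / v_true)
    return 1.0 / (2.0 * dt_err)

# Hypothetical numbers: a 2 km path through rock whose true velocity is 2000 m/s.
print(max_safe_frequency(2000, 1800, 2000))   # crude starting model: ~4.5 Hz only
print(max_safe_frequency(2000, 1950, 2000))   # after one update: ~19.5 Hz
```

A 10% velocity error confines the first inversion stage to a few hertz; once the model improves to a 2.5% error, frequencies near 20 Hz become safe, which is the frequency-continuation schedule in miniature.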

The Art of Comparison: Beyond Simple Subtraction

The multiscale approach is powerful, but it relies on having good data at low frequencies, which is not always the case. This forced scientists to ask a deeper question: Is our method of comparing waves the problem? The standard least-squares ($L_2$) misfit simply subtracts the predicted wave from the observed one at every single instant in time and adds up the squares of the differences. It is, in a sense, a "stupid" comparison. It is blind to the nature of the error. If a beautiful symphony is played perfectly but starts one second too late, a simple subtraction would judge it as a cacophony. A human listener, however, would instantly recognize the error as a simple time shift.

This insight led to a revolution in FWI: the design of smarter misfit functions that are less sensitive to time shifts. The goal is to separate the kinematic error (getting the timing wrong) from the dynamic error (getting the shape or amplitude wrong). One way to do this is to decompose the signal into its amplitude and phase. Using a mathematical tool called the Hilbert transform, we can construct the "analytic signal," which allows us to define an instantaneous phase at every moment in time. We can then design an objective function that compares only the phase of the predicted and observed waves, while ignoring their amplitudes. This is particularly powerful in complex situations, like in anisotropic rocks where seismic waves travel at different speeds in different directions. In such media, some physical parameters strongly affect the wave's amplitude but have little effect on its travel time. A standard $L_2$ misfit gets hopelessly confused by this, but a phase-only misfit neatly decouples the problems, allowing the inversion to first fix the timing and then worry about the shape.
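A sketch of such a phase-only comparison, using a numpy-only FFT construction of the analytic signal (the toy signals and parameters are assumptions, not any specific package's recipe): an amplitude error leaves the instantaneous phase, and hence the phase misfit, essentially untouched, while the $L_2$ misfit is large.

```python
import numpy as np

def instantaneous_phase(s):
    """Unwrapped instantaneous phase via the analytic signal (FFT-based Hilbert)."""
    n = len(s)
    S = np.fft.fft(s)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1)//2] = 2.0       # double positive frequencies
    if n % 2 == 0:
        h[n//2] = 1.0           # keep the Nyquist bin
    analytic = np.fft.ifft(S * h)
    return np.unwrap(np.angle(analytic))

t = np.arange(0, 1.0, 1e-3)
observed = np.sin(2*np.pi*8*t) * np.exp(-(t - 0.5)**2 / 0.02)
predicted = 2.5 * observed      # correct timing, wrong amplitude

l2_misfit = 0.5*np.sum((observed - predicted)**2)
phase_misfit = 0.5*np.sum((instantaneous_phase(observed)
                           - instantaneous_phase(predicted))**2)

print(l2_misfit, phase_misfit)  # phase misfit ~ 0: the timing error is isolated
```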

An even more elegant idea comes from a beautiful corner of pure mathematics: optimal transport theory. Imagine you have two piles of sand with different shapes. The $L_2$ approach would be to measure the height difference at every point. Optimal transport asks a more physical question: "What is the minimum amount of work required to shovel the sand from the first pile to match the shape of the second pile?" The answer, known as the Wasserstein distance, is a much more natural measure of how "different" the two piles are.

When this concept is applied to seismic traces (which are first converted into non-negative "energy densities"), the result is magical. The Wasserstein-2 ($W_2$) distance between a signal and its time-shifted version turns out to be a perfect parabola as a function of the time shift. The misfit function $J_{W_2}(\tau) = \frac{1}{2}\tau^2$ is perfectly convex, with a single global minimum at zero shift. It has no other local minima, and thus, cycle skipping with respect to time shifts is completely eliminated! This is a stunning example of how an abstract mathematical tool can provide a powerful, practical solution to a stubborn physical problem. Other related ideas, such as Dynamic Time Warping, which algorithmically stretches and compresses the time axis to find the best alignment, also draw from this same well of inspiration.

Changing the Rules of the Game

So far, we have seen two grand strategies: simplify the data (multiscale methods) or change how we measure the error (advanced misfits). There is a third, even more audacious approach: change the rules of the game itself.

In classical FWI, we are very strict. We demand that our simulated wavefield obey the wave equation perfectly. The optimization problem is: "Find the Earth model $m$ such that the solution $u$ to the wave equation $A(m)u = f$ best matches the observed data $d$." The wave equation is a hard constraint.

Wavefield Reconstruction Inversion (WRI) plays a different game. It says: let's soften that constraint. Instead of demanding that the wave equation is perfectly satisfied, let's just encourage it. The new problem becomes: "Find an Earth model $m$ and a wavefield $u$ that, together, do two things: (1) the wavefield $u$ should match our observed data $d$, and (2) the pair $(u, m)$ should come close to satisfying the wave equation." The hard constraint becomes a penalty term in the objective function. This seemingly small change has a profound effect. By relaxing the physics, WRI can often dramatically smooth out the objective function's landscape, expanding the basin of attraction around the true solution and making the problem much easier to solve. It is a beautiful lesson in problem-solving: sometimes, the path to a better solution lies not in stricter adherence to the rules, but in a judicious relaxation of them.
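Schematically, the two problems can be written side by side; this is a standard way to present WRI, with $P$ denoting sampling at the receiver locations and $\lambda$ the penalty weight:

```latex
% Classical (reduced) FWI: the wave equation is a hard constraint.
\min_{m} \; \tfrac{1}{2} \lVert P\,u(m) - d \rVert_2^2
\quad \text{subject to} \quad A(m)\,u(m) = f

% Wavefield Reconstruction Inversion: the constraint becomes a penalty,
% and the wavefield u is a free variable of the optimization.
\min_{m,\,u} \; \tfrac{1}{2} \lVert P\,u - d \rVert_2^2
           + \tfrac{\lambda}{2} \lVert A(m)\,u - f \rVert_2^2
```

As $\lambda \to \infty$ the penalized problem recovers classical FWI; moderate values of $\lambda$ buy the smoother landscape described above.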

From Rocks to Life: A Universal Principle

We began our journey deep in the Earth's crust, but the principle of phase locking and cycle skipping is far more universal. Does this idea appear anywhere else? Let's look at one of the most exciting frontiers of modern science: synthetic biology.

Scientists are now learning to engineer living cells to perform new functions, like producing drugs, detecting diseases, or acting as tiny biological computers. One of the fundamental components they wish to build is a biological clock or counter. Imagine an engineered bacterium with a genetic circuit that causes it to fluoresce. This circuit is an oscillator, with its own natural period, say $T_0$. Now, we periodically shine a pulse of light on the cell. Each pulse of light "kicks" the oscillator, advancing its internal phase. The goal is to design the system so that for every pulse of light, the oscillator's phase advances by exactly one full cycle. If it does, the cell has successfully "counted" the pulse.

But here we meet our old adversary. If the period of the light pulses is too different from the oscillator's natural period, the system can fail. The kick from the light might not be enough to push the oscillator through a full cycle before the next pulse arrives. It gets stuck, failing to advance properly—it "skips a cycle" and miscounts. The mathematics describing the conditions for stable, 1:1 phase locking—the range of pulse periods for which the cell will count reliably—is precisely the same mathematics that describes the conditions for avoiding cycle skipping in seismic inversion. The "Arnold tongue" of synchronization theory, which defines the boundaries of stable entrainment for an oscillator, is the direct analogue of the convergence basin in FWI.
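This analogy can be made concrete with the standard sine circle map of synchronization theory (a generic textbook model, assumed here rather than taken from any particular biology experiment). Each pulse advances the oscillator's phase; the average advance per pulse, the rotation number, reveals whether the system locks 1:1 or skips cycles.

```python
import numpy as np

def rotation_number(omega, K, n_steps=20000):
    """Average phase advance per pulse for the sine circle map.

    theta_{n+1} = theta_n + omega - (K / 2 pi) * sin(2 pi theta_n)
    omega: pulse period / natural period T0;  K: kick strength.
    """
    theta = theta0 = 0.1
    for _ in range(n_steps):
        theta = theta + omega - (K/(2*np.pi)) * np.sin(2*np.pi*theta)
    return (theta - theta0) / n_steps

K = 1.0
rho_locked = rotation_number(1.05, K)    # small detuning: inside the Arnold tongue
rho_skipping = rotation_number(1.30, K)  # large detuning: outside, cycles skipped

print(rho_locked, rho_skipping)   # ~1.0 (reliable counting) vs noticeably != 1
```

Inside the Arnold tongue the rotation number pins to exactly 1 (the cell counts every pulse); outside it, the average advance drifts away from 1, the direct analogue of a cycle-skipped inversion.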

This is a truly remarkable revelation. The challenge of getting the rhythm right, of avoiding the trap of the skipped cycle, is a fundamental principle that nature confronts at all scales. The same mathematical laws that govern our ability to image a tectonic plate also govern our ability to engineer a bacterium to count. From the silent, slow dance of geology to the vibrant, rapid pulse of life, the universe seems to find a beautiful and unified harmony in its rhythms. The struggle to understand and master these rhythms is, in essence, the very heart of science.