Subhalo Abundance Matching

Key Takeaways

Subhalo Abundance Matching (SHAM) establishes a direct, rank-ordered relationship between galaxy properties, like stellar mass, and dark matter subhalo properties.
To overcome the effects of tidal stripping on satellite galaxies, SHAM uses historical halo properties like peak mass ( $M_{\rm peak}$ ) or peak velocity ( $V_{\rm peak}$ ).
The model successfully predicts large-scale galaxy clustering and explains phenomena like galaxy assembly bias by linking galaxy properties to halo formation history.
Modern SHAM incorporates complexities like scatter, a central/satellite galaxy split, and "orphan" galaxies to create highly realistic mock universe catalogs.

Introduction

The modern understanding of the cosmos reveals a universe dominated by an invisible substance: dark matter. This dark matter forms a vast, intricate "cosmic web" of filaments, nodes, and voids that provides the gravitational scaffolding for everything we can see. Yet, our telescopes see not dark matter, but the brilliant light of galaxies. A fundamental challenge in cosmology is bridging the gap between this invisible structure and the visible galactic population. How do we determine which galaxy lives in which dark matter halo? And how does this connection shape the universe we observe?

Subhalo Abundance Matching (SHAM) provides an elegant and remarkably powerful answer to these questions. It proposes a simple, intuitive "cosmic sorting hat" rule: the most luminous galaxies inhabit the most massive dark matter halos, the second-most luminous inhabit the second-most massive, and so on. This premise forms the basis of a sophisticated framework for populating cosmological simulations with galaxies, transforming theoretical dark matter maps into realistic mock universes that can be directly compared with observational data.

This article explores the Subhalo Abundance Matching framework in two parts. First, we will delve into the core "Principles and Mechanisms," examining the fundamental equation of abundance matching, the physical motivations for choosing specific halo properties, and the advanced techniques developed to account for real-world complexities like tidal stripping and simulation limits. Second, we will survey the model's diverse "Applications and Interdisciplinary Connections," discovering how SHAM is used to predict galaxy clustering, weigh dark matter halos, probe galaxy evolution across cosmic time, and even find surprising echoes in other scientific fields.

Principles and Mechanisms

Imagine you have two enormous piles of objects. One pile contains every galaxy we can see in a large patch of the universe, sorted by brightness or, more precisely, by stellar mass ( $M_\star$ ). The other pile contains every dark matter halo and subhalo found in a supercomputer simulation of that same patch of sky, sorted by some measure of their "size," let's call it property $X$ . How do we decide which galaxy lives in which halo? This is the central question that Subhalo Abundance Matching, or SHAM, sets out to answer.

The Cosmic Sorting Hat: A Simple, Powerful Idea

The simplest and most powerful idea you could have is a kind of cosmic sorting hat. You declare that the most massive galaxy in the universe must live in the most massive dark matter halo. The second-most massive galaxy lives in the second-most massive halo, and so on, all the way down the list. This is the heart of SHAM: a direct, rank-ordered correspondence between galaxies and halos.

We can state this more formally and powerfully. Instead of matching one by one, we can match in groups. We say that the number of galaxies more massive than some value $M_\star$ must be exactly equal to the number of (sub)halos with a property $X$ greater than some corresponding value. This is the fundamental equation of SHAM:

$$ n_{\rm gal}(>M_\star) = n_{\rm (sub)halo}(>X) $$

Here, $n(>...)$ is the cumulative number density—a fancy term for "how many things per unit volume are bigger than this value?". So, if you tell me a stellar mass, say $10^{11}$ solar masses, this equation allows me to find the exact value of the halo property $X$ that cuts off the same number of objects. This creates a perfect, monotonic mapping between galaxy mass and halo property.
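The rank-ordering can be sketched in a few lines of Python. This is a minimal illustration, assuming the halo and galaxy lists are complete samples of the same volume, so that equal ranks correspond to equal cumulative number densities. The function name `abundance_match` and all sample values are hypothetical.

```python
import numpy as np

def abundance_match(halo_x, galaxy_mstar):
    """Assign stellar masses to (sub)halos by rank, with no scatter.

    Assumes both arrays sample the same volume and that there is
    at least one galaxy per halo.
    """
    halo_x = np.asarray(halo_x, dtype=float)
    mstar_desc = np.sort(np.asarray(galaxy_mstar, dtype=float))[::-1]
    if mstar_desc.size < halo_x.size:
        raise ValueError("need at least one galaxy per halo")
    # Double argsort converts values to ranks; rank 0 is the largest X.
    rank = np.argsort(np.argsort(-halo_x))
    return mstar_desc[rank]

halo_vpeak = np.array([120.0, 310.0, 85.0, 200.0])   # km/s, made-up values
gal_mstar = np.array([3e10, 8e9, 1.2e11, 5e10])      # Msun, made-up values
print(abundance_match(halo_vpeak, gal_mstar))
```

In this toy input the halo with the largest $X$ (310 km/s) receives the largest stellar mass, and so on down the list.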

Now, an interesting subtlety arises. Should our list of dark matter homes include only the giant, isolated "host" halos, or should it also include the smaller "subhalos" that orbit within them? When we add subhalos to the list, our pool of potential homes for galaxies gets bigger and more crowded. To match the same number of galaxies, we have to move our cutoff to a higher value of $X$ . This means that in a more complete model that includes subhalos, any given galaxy gets assigned to a more "elite" or higher- $X$ halo than it would have otherwise. The competition is fiercer! This effect is most pronounced for the less massive galaxies, where subhalos are most numerous.

What is "Massive"? The Challenge of Cosmic Tides

This brings us to our first major puzzle. What halo property $X$ should we use for our ranking? The most obvious choice might be the halo's current mass. But this turns out to be a terrible idea, especially for satellite galaxies—those that have fallen into the gravitational grip of a larger host halo.

As a satellite orbits, it is pummeled by the immense tidal forces of its host. These cosmic tides rip stars and dark matter from the satellite's outer regions, a process called tidal stripping. A subhalo that was once massive can be stripped down to a pale shadow of its former self. Its current mass tells us very little about the grand potential well it once possessed—the potential well that was responsible for gathering the gas that formed its stars in the first place.

The solution is to use a halo property that is immune to this post-infall processing. Instead of a halo's current mass, we can use its mass or another property measured at the moment it was first accreted, or even better, the peak value that property ever reached in its entire history. Popular choices include the peak historical mass ( $M_{\rm peak}$ ) or, even more robustly, the peak maximum circular velocity ( $V_{\rm peak}$ ). The maximum circular velocity, $V_{\rm max} = \max_r \sqrt{G M(<r)/r}$ , is a measure of the depth of a halo's gravitational potential well and is less sensitive to the fluffy outer regions that are easily stripped. Using its historical peak value, $V_{\rm peak}$ , gives us a number that is "frozen in" from the time the halo was at its most magnificent, right before tidal stripping began to take its toll.
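As a rough illustration of these two quantities, the sketch below evaluates $V_{\rm max}$ from a tabulated enclosed-mass profile and takes $V_{\rm peak}$ as the largest $V_{\rm max}$ along a halo's history. The NFW profile, its parameters, and the velocity history are assumptions chosen for the example, not values from the text.

```python
import numpy as np

G = 4.30091e-6  # Newton's constant in kpc (km/s)^2 / Msun

def nfw_enclosed_mass(r, rho_s, r_s):
    """Mass inside radius r for an NFW profile (assumed here for illustration)."""
    x = np.asarray(r, dtype=float) / r_s
    return 4.0 * np.pi * rho_s * r_s**3 * (np.log1p(x) - x / (1.0 + x))

def v_max(r, m_enc):
    """Return (V_max, radius of the maximum) from a tabulated M(<r)."""
    r = np.asarray(r, dtype=float)
    v_circ = np.sqrt(G * np.asarray(m_enc, dtype=float) / r)
    i = int(np.argmax(v_circ))
    return float(v_circ[i]), float(r[i])

def v_peak(vmax_history):
    """Largest V_max the (sub)halo reached over its tracked history."""
    return float(np.max(vmax_history))

r = np.logspace(-1, 3, 2000)                      # radii in kpc
vmax_now, r_at_max = v_max(r, nfw_enclosed_mass(r, rho_s=1.0e7, r_s=20.0))
print(vmax_now, r_at_max / 20.0)                  # ratio is close to 2.16 for NFW
print(v_peak([150.0, 210.0, 230.0, 180.0]))       # made-up history in km/s
```

Because the maximum sits at only about two scale radii, material removed from the outskirts changes $V_{\rm max}$ much less than it changes the total mass.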

We can see why this is so important with a simple model. Imagine a galaxy's stellar mass, $M_\star$ , is fundamentally tied to its halo's $V_{\rm peak}$ . Now, suppose we try to use its current $V_{\rm max}$ to predict its stellar mass. Because of tidal stripping, the current $V_{\rm max}$ is some fraction of the original $V_{\rm peak}$ , and that fraction depends on the satellite's specific orbit—a plunging orbit causes more stripping than a wide, gentle one. This orbital diversity introduces an extra layer of uncertainty, or scatter, into the relationship. If we use $V_{\rm peak}$ directly, we bypass this environmental noise and uncover a much tighter, more fundamental connection between the galaxy and its dark matter home.
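This argument can be made semi-quantitative with a toy calculation. As assumptions introduced only for this sketch, take a power law $M_\star \propto V_{\rm peak}^{\alpha}$ with intrinsic scatter $\epsilon$ of dispersion $\sigma_{\rm int}$ (in dex), and write the stripped velocity as $V_{\rm max} = f\,V_{\rm peak}$ with an orbit-dependent factor $0 < f \le 1$ that is independent of $\epsilon$:

$$
\log M_\star = \alpha \log V_{\rm peak} + c + \epsilon = \alpha \log V_{\rm max} - \alpha \log f + c + \epsilon
$$

$$
\sigma^2(\log M_\star \mid V_{\rm max}) \approx \sigma_{\rm int}^2 + \alpha^2\,\sigma_{\log f}^2
$$

The second line is only approximate, since it ignores how the steep halo velocity function weights the objects found at a given $V_{\rm max}$. Even so, it shows the key point: the orbital term $\alpha^2 \sigma_{\log f}^2$ is added on top of the intrinsic scatter when current $V_{\rm max}$ is used, and vanishes when $V_{\rm peak}$ is used.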

Embracing the Messiness: Scatter and Real-World Complexity

Of course, nature is never so simple as a perfect one-to-one mapping. Even two halos with the exact same peak mass might have slightly different formation histories, leading them to form galaxies with slightly different stellar masses. This is known as intrinsic scatter.

SHAM incorporates this by smearing out the perfect monotonic relationship. Instead of saying a halo of property $X$ always hosts a galaxy of mass $M_\star$ , we say it hosts a galaxy with a distribution of possible masses, typically a log-normal distribution centered on the value from the simple mapping. Mathematically, this involves a convolution, which correctly accounts for the fact that a steep mass function means more small objects will scatter up to high masses than large objects will scatter down.

In practice, implementing this involves a clever trick. One starts with the halos, calculates their "ideal" galaxy masses, adds some random scatter, and then... finds that the resulting galaxy mass distribution doesn't quite match the observed one! The solution is to re-rank. You take your list of halos with their scattered galaxy masses and sort them. You also take a list of galaxy masses drawn perfectly from the observed distribution and sort it. Then, you assign the #1 galaxy mass from the target list to the #1 halo from your scattered list, #2 to #2, and so on. This remarkable procedure, a form of inverse transform sampling, gives you the best of both worlds: it perfectly enforces the observed galaxy numbers by construction, while preserving the physically motivated, rank-ordered connection (with scatter) between individual galaxies and halos.
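The scatter-then-re-rank procedure described above can be sketched as follows. This is a minimal version under stated assumptions: equal numbers of halos and galaxies, log-normal scatter of width `sigma_dex`, and a hypothetical function name `sham_with_scatter`.

```python
import numpy as np

def sham_with_scatter(halo_x, galaxy_mstar, sigma_dex, rng):
    """Rank-match with scatter, then re-rank onto the target mass list."""
    halo_x = np.asarray(halo_x, dtype=float)
    target_desc = np.sort(np.asarray(galaxy_mstar, dtype=float))[::-1]
    n = halo_x.size
    if target_desc.size != n:
        raise ValueError("this sketch assumes equal numbers of halos and galaxies")
    # Step 1: "ideal" masses from the scatter-free rank matching.
    rank_ideal = np.argsort(np.argsort(-halo_x))
    log_ideal = np.log10(target_desc[rank_ideal])
    # Step 2: perturb each ideal mass with log-normal scatter.
    log_scattered = log_ideal + rng.normal(0.0, sigma_dex, n)
    # Step 3: re-rank, so the k-th largest scattered value receives the
    # k-th largest target mass.
    rank_new = np.argsort(np.argsort(-log_scattered))
    return target_desc[rank_new]

rng = np.random.default_rng(0)
halo_x = rng.lognormal(5.0, 0.5, 1000)            # made-up halo proxy values
gal_mstar = 10 ** rng.normal(10.0, 0.5, 1000)     # made-up target masses
assigned = sham_with_scatter(halo_x, gal_mstar, 0.2, rng)
print(np.allclose(np.sort(assigned), np.sort(gal_mstar)))
```

The assigned masses are a permutation of the target list, so the input mass function is reproduced by construction. The scatter measured at fixed $X$ after re-ranking generally differs somewhat from the input `sigma_dex`, so in practice the input value would be tuned.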

From Simple Rules to a Realistic Universe

The basic principles of SHAM are elegant, but applying them to build a truly realistic model of the universe requires confronting a few more layers of complexity.

Kings and Vassals: Centrals and Satellites

A galaxy sitting peacefully at the center of its own vast dark matter halo (a central) is in a very different situation from a satellite being buffeted by its host. Their formation and evolution are different, so it makes sense to treat them separately in our model. Modern SHAM does just this: it takes the observed populations of central and satellite galaxies and matches them independently to the populations of host halos and subhalos from the simulation. For centrals, a present-day property like $V_{\rm max}$ works fine. For satellites, a historical property like $V_{\rm peak}$ is essential. This two-pronged approach allows for a much more physically detailed and accurate model. Interestingly, because SHAM links galaxies to halo properties that are known to correlate with formation time (like concentration, which is related to $V_{\rm max}$ ), it naturally incorporates a subtle but important clustering effect known as galaxy assembly bias, a feature that simpler models often miss.

Ghosts in the Machine: The Orphan Galaxy Solution

Our computer simulations, for all their power, have finite resolution. A simulation is made of discrete particles of mass $m_p$ , and we typically require a subhalo to have at least $N_{\rm min}$ particles to be reliably identified. This imposes a minimum resolvable mass, $M_{\rm min} = N_{\rm min} m_p$ . The problem is that a real subhalo can be tidally stripped below this mass limit and vanish from our simulation catalog, even though its dense, star-filled galaxy would likely survive for billions of years longer. This numerical over-merging is most severe in the dense inner regions of clusters. If we are not careful, our mock catalogs will be missing a huge number of inner satellites, which would completely ruin our predictions for galaxy distributions and clustering on small scales.

The solution is another beautiful piece of scientific ingenuity: the orphan galaxy treatment. When a subhalo track disappears from the simulation below the resolution limit, we don't give up on its galaxy. We continue to track it as a "ghost" or orphan. We model its subsequent orbital decay using the physics of dynamical friction, which acts like a brake on the satellite as it moves through the sea of dark matter particles in its host. We follow this orphan until the dynamical friction model predicts it should finally merge with the central galaxy, at which point we remove it from our catalog. This procedure allows us to resurrect the galaxies whose subhalo homes were prematurely destroyed by our simulation's limitations, painting a much more accurate picture of the cosmos.
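The bookkeeping behind orphan tracking can be sketched schematically. The friction timescale below is an assumed Chandrasekhar-style scaling with a free `prefactor`, not a calibrated formula, and the function names and the dictionary layout of an orphan record are hypothetical.

```python
import math

def min_resolved_mass(n_min, m_p):
    """Resolution limit M_min = N_min * m_p."""
    return n_min * m_p

def friction_time(m_host, m_sat, t_dyn, prefactor=1.0):
    """Assumed scaling: t_df = prefactor * (M_host/M_sat) / ln(1 + M_host/M_sat) * t_dyn."""
    ratio = m_host / m_sat
    return prefactor * ratio / math.log(1.0 + ratio) * t_dyn

def surviving_orphans(orphans, t_now):
    """Keep orphans whose assumed merger time has not yet passed."""
    return [o for o in orphans if o["t_lost"] + o["t_df"] > t_now]

m_min = min_resolved_mass(n_min=20, m_p=1.0e8)    # Msun, made-up resolution
orphans = [
    # t_lost: time the subhalo dropped out of the catalog (Gyr)
    {"id": 1, "t_lost": 5.0, "t_df": friction_time(1.0e13, 1.0e12, t_dyn=1.0)},
    {"id": 2, "t_lost": 5.0, "t_df": friction_time(1.0e14, 1.0e12, t_dyn=1.0)},
]
print(m_min, [o["id"] for o in surviving_orphans(orphans, t_now=10.0)])
```

More massive satellites relative to their host sink faster, so in this toy setup the orphan with the smaller mass ratio merges first and is removed.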

A Dynamic Universe: Abundance Matching Through Time

Putting all these pieces together—the central/satellite split, historical proxies, scatter, and orphan tracking—allows us to build incredibly sophisticated models. SHAM is not just a static picture of the universe today. By using simulation outputs from many different cosmic epochs and matching them to the observed galaxy populations at each of those epochs, we can create a fully dynamic model. We can populate halos with galaxies at high redshift and follow their descendants, tracking satellites as they are accreted, stripped, and eventually merge. This allows us to connect the evolving tapestry of galaxies we see across cosmic time to the underlying, evolving cosmic web of dark matter.

From a simple sorting-hat rule, SHAM has blossomed into a rich, physically motivated, and powerful framework. It is a testament to how a simple, intuitive idea, when confronted with the complex realities of the universe and the limitations of our tools, can evolve into a sophisticated instrument for understanding our cosmic origins.

Applications and Interdisciplinary Connections

Now that we have grappled with the machinery of Subhalo Abundance Matching, we can ask the most important question a physicist can ask: So what? Does this elegant, simple idea actually work? Does it tell us anything new about the universe, or is it just a clever game played with computer simulations? This is where the real fun begins. We are about to embark on a journey, using SHAM as our guide, to see how this one principle illuminates a vast and varied landscape of cosmic phenomena, from the grand architecture of the cosmic web to the life stories of individual galaxies, and even to surprising corners of other scientific fields.

The Cosmic Web in Focus: Predicting Galaxy Clustering

Imagine you have a perfect map of the dark matter universe from a simulation—a ghostly tapestry of filaments, nodes, and voids. How do you "light it up" with galaxies to see if it looks like the real sky? This is SHAM's first and most fundamental job. By postulating that the most massive galaxies live in the most massive (sub)halos, SHAM provides a direct recipe for populating our dark universe.

The immediate test is to see if these mock galaxies cluster in the right way. Astronomers measure clustering with the two-point correlation function, which, simply put, tells you the excess probability of finding two galaxies separated by a certain distance. If galaxies were scattered randomly like dust motes in a sunbeam, this function would be zero everywhere. But they are not. They trace the cosmic web.

SHAM gives us a beautiful physical picture of this clustering. The correlation function is naturally split into two parts. On small scales (less than a few million light-years), the signal is dominated by pairs of galaxies that live inside the same host dark matter halo: the 1-halo term. SHAM tells us these pairs must be either a central galaxy paired with one of its satellites, or two satellites orbiting together. There can be no central-central pairs in the same halo, because a halo, by definition, has only one center! On larger scales, the signal comes from galaxies in two different halos: the 2-halo term. This term tells us how the halos themselves are clustered across the universe.
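For a small mock catalog, the correlation function and the split into same-halo and different-halo pairs can be measured by brute force. The sketch below assumes a periodic cubic box, uses the simple estimator $\xi = DD/RR - 1$ with analytic random counts, and requires the largest bin edge to stay below half the box size. The function name `clustering_summary` is hypothetical.

```python
import numpy as np

def clustering_summary(pos, host_id, box, r_edges):
    """Return xi(r) plus same-host and different-host pair counts per bin."""
    pos = np.asarray(pos, dtype=float)
    host_id = np.asarray(host_id)
    r_edges = np.asarray(r_edges, dtype=float)
    n = len(pos)
    i, j = np.triu_indices(n, k=1)                # every unordered pair once
    d = pos[i] - pos[j]
    d -= box * np.round(d / box)                  # periodic minimum-image separation
    r = np.sqrt((d**2).sum(axis=1))
    same_host = host_id[i] == host_id[j]
    dd_1h, _ = np.histogram(r[same_host], bins=r_edges)    # 1-halo pairs
    dd_2h, _ = np.histogram(r[~same_host], bins=r_edges)   # 2-halo pairs
    shell_vol = 4.0 / 3.0 * np.pi * (r_edges[1:]**3 - r_edges[:-1]**3)
    rr = 0.5 * n * (n - 1) * shell_vol / box**3   # expected pairs for random points
    xi = (dd_1h + dd_2h) / rr - 1.0
    return xi, dd_1h, dd_2h

rng = np.random.default_rng(42)
pos = rng.uniform(0.0, 100.0, size=(600, 3))      # unclustered toy points
xi, dd_1h, dd_2h = clustering_summary(pos, np.arange(600), 100.0, [5.0, 15.0, 25.0, 40.0])
print(xi)                                         # scatters around zero for random points
```

The pair loop scales as $N^2$, so this form only suits small samples; production analyses would use tree-based pair counters.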

This simple model is surprisingly powerful. We can make it even more realistic by adding a bit of "scatter" to the mass-matching relation, acknowledging that nature isn't perfectly tidy. With this ingredient, we can use the model to derive, from first principles, a quantity of immense importance called galaxy bias ( $b_g$ ). This number tells us how much more strongly a certain type of galaxy clusters than the underlying dark matter itself. A high-bias galaxy is a faithful tracer of the densest cosmic peaks, while a low-bias galaxy is more democratically distributed. SHAM doesn't just predict clustering; it explains the origin of galaxy bias.
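For reference, the bias is conventionally defined on large scales through the ratio of galaxy to matter clustering, and for a mock sample it can be estimated as a number-weighted average over host halos. The notation here is introduced for illustration: $\xi_{gg}$ and $\xi_{mm}$ are the galaxy and matter correlation functions, and $b_h(i)$ is the linear bias of the host halo containing galaxy $i$.

$$
\xi_{gg}(r) \simeq b_g^2\,\xi_{mm}(r), \qquad b_g \simeq \frac{1}{N_{\rm gal}} \sum_{i=1}^{N_{\rm gal}} b_h(i)
$$

Both relations hold only in the large-scale, linear regime, where the 2-halo term dominates.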

Beyond theoretical predictions, SHAM is a workhorse for observational cosmology. Astronomers use it to create vast, realistic mock galaxy catalogs. These mocks are indispensable tools for testing the complex analysis pipelines used to interpret data from telescopes. For instance, an observer must contend with redshift-space distortions—the "Fingers of God" effect where galaxy clusters appear stretched out along our line of sight due to the internal motions of their galaxies. By creating SHAM mocks and applying the same observational distortions, we can perfect our methods for measuring the true underlying clustering, ensuring we are not being fooled by these cosmic illusions.

Weighing the Universe: Galaxy-Galaxy Lensing and Assembly Bias

SHAM's reach extends beyond just mapping galaxy positions. It helps us weigh the very halos they live in. Through the magic of gravitational lensing—where the gravity of a foreground galaxy and its dark matter halo bends the light from a more distant background galaxy—we can measure the average mass profile around galaxies of a certain type. This is like putting a galaxy and its halo on a cosmic scale.

Here, we encounter a fascinating scientific detective story, a duel between SHAM and another popular model, the Halo Occupation Distribution (HOD). A simple HOD model posits that the number of galaxies in a halo depends only on the halo's mass. SHAM, however, is more subtle. By matching to a property like a subhalo's peak velocity ( $V_{\rm peak}$ ), it implicitly connects a galaxy to its halo's formation history. Why? Because two halos of the same mass today can have very different biographies. An "old" halo that assembled early will be more concentrated and have more processed subhalos than a "young" halo of the same mass that formed recently. This effect, where clustering and other properties at fixed mass depend on formation time, is known as assembly bias.

A mass-only HOD knows nothing of this history. SHAM, on the other hand, builds it in for free. This leads to a stunning, testable prediction. Consider central galaxies of a fixed stellar mass. SHAM predicts that the ones we find in more tightly clustered regions (high galaxy bias) should live in older, more concentrated halos. The ones in less-clustered regions (low bias) should live in younger, puffier halos. A mass-only HOD predicts no such difference. How can we check? With gravitational lensing! By measuring the lensing signal for these two groups of galaxies separately, we can see if their average halo profiles are indeed different. Finding such a difference, as recent observations suggest, would be a triumph for the physical picture painted by SHAM and a powerful clue about the intimate connection between a galaxy and its cosmic history.

Painting the Universe: Beyond Mass and Position

The real beauty of the abundance matching principle is its flexibility. Who says we have to match on mass? We can try to match other ranked properties as well. This opens up a whole new palette for painting a more complete picture of the galaxy population.

A wonderful example is age matching. We know that galaxies come in two main "colors": red and blue. Red galaxies are typically older, with little ongoing star formation, while blue galaxies are young and actively forming stars. We also know that dark matter halos have a formation time. What if we make a new matching hypothesis: at a fixed stellar mass, the reddest (oldest) galaxies live in the halos that formed earliest? This is age matching.
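A minimal sketch of age matching follows. It assumes galaxies already carry stellar masses from the earlier matching step, that "redder" means a larger colour index, and that the observed colour distribution in each stellar-mass bin is supplied by a callable. The names `age_match` and `toy_colors`, and the made-up bimodal distribution, are hypothetical stand-ins.

```python
import numpy as np

def age_match(log_mstar, z_form, draw_colors, bin_edges, rng):
    """Within each stellar-mass bin, give the reddest colours to the earliest-forming halos."""
    log_mstar = np.asarray(log_mstar, dtype=float)
    z_form = np.asarray(z_form, dtype=float)
    colors = np.full(log_mstar.size, np.nan)
    bin_idx = np.digitize(log_mstar, bin_edges) - 1
    for b in range(len(bin_edges) - 1):
        sel = np.flatnonzero(bin_idx == b)
        if sel.size == 0:
            continue
        reddest_first = np.sort(draw_colors(b, sel.size, rng))[::-1]
        earliest_first = sel[np.argsort(-z_form[sel])]   # highest formation redshift first
        colors[earliest_first] = reddest_first
    return colors

def toy_colors(b, size, rng):
    """Made-up bimodal colour distribution (red and blue populations)."""
    is_red = rng.random(size) < 0.3 + 0.2 * b
    return np.where(is_red, rng.normal(0.95, 0.05, size), rng.normal(0.55, 0.10, size))

rng = np.random.default_rng(3)
log_mstar = rng.uniform(9.0, 11.0, 400)
z_form = rng.uniform(0.2, 3.0, 400)               # made-up halo formation redshifts
colors = age_match(log_mstar, z_form, toy_colors, np.array([9.0, 10.0, 11.0]), rng)
print(colors[:5])
```

Each bin reproduces its input colour distribution exactly, while colour becomes a monotonic function of formation redshift at fixed stellar mass. Galaxies outside the bin range are left as NaN.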

This simple idea beautifully explains a long-standing observation: red galaxies are more clustered than blue galaxies, even at the same mass. Why? Because of assembly bias! As we just learned, halos that form earlier are intrinsically more clustered. By linking old galaxies to old halos, age matching naturally predicts that the red population should trace the most clustered parts of the cosmic web. It’s a profound connection between the stellar populations inside a galaxy and the large-scale dynamics of its dark matter host.

We can push this even further. By applying SHAM at different snapshots in cosmic time (that is, at different redshifts), we can track the growth of galaxies alongside the growth of their parent subhalos. This allows us to move from a static picture to a dynamic movie of galaxy evolution. By measuring how much a galaxy's mass must have increased between two snapshots to maintain its rank in the abundance hierarchy, we can infer its average star formation rate. This transforms SHAM into a tool for probing the fuel cycle of galaxies over billions of years, connecting the dark matter skeleton's growth to the flesh and blood of stellar creation.
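The rank-preservation argument can be turned into a back-of-the-envelope estimate. The sketch below reads off the stellar mass at a fixed cumulative number density in two snapshots of the same volume and divides the difference by the elapsed time. It deliberately ignores mergers and stellar mass loss, so the result is only a crude proxy for the star formation rate. Function names and numbers are hypothetical.

```python
import numpy as np

def mass_at_number_density(mstar, volume, n_cum):
    """Stellar mass above which the cumulative number density equals n_cum."""
    m_desc = np.sort(np.asarray(mstar, dtype=float))[::-1]
    k = int(round(n_cum * volume))                # number of galaxies above the threshold
    if not 1 <= k <= m_desc.size:
        raise ValueError("n_cum is outside the range sampled by this catalog")
    return float(m_desc[k - 1])

def mean_growth_rate(mstar_early, mstar_late, volume, n_cum, dt_yr):
    """Average mass growth (Msun/yr) needed to hold rank between two snapshots."""
    m_early = mass_at_number_density(mstar_early, volume, n_cum)
    m_late = mass_at_number_density(mstar_late, volume, n_cum)
    return (m_late - m_early) / dt_yr

early = [1.0e10, 5.0e9, 2.0e9]                    # Msun, made-up earlier snapshot
late = [3.0e10, 1.0e10, 4.0e9]                    # Msun, made-up later snapshot
print(mean_growth_rate(early, late, volume=1000.0, n_cum=2.0e-3, dt_yr=1.0e9))
```

With these toy numbers the second-ranked galaxy grows from $5 \times 10^{9}$ to $10^{10}$ solar masses over one billion years, an average of 5 solar masses per year.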

Galaxies in their Environment: The Finer Details

Zooming in, SHAM also provides insights into the demographics and local environment of galaxies. For example, by analyzing a SHAM mock, we can directly predict the satellite fraction—what fraction of galaxies of a given mass are centrals, reigning over their own halo, versus satellites, orbiting within a larger system. This is a fundamental prediction that constrains models of galaxy quenching and transformation.
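Measuring the satellite fraction from a mock is a simple counting exercise. The sketch assumes each mock galaxy carries a stellar mass and a boolean satellite flag; the function name `satellite_fraction` and the sample values are hypothetical.

```python
import numpy as np

def satellite_fraction(log_mstar, is_satellite, bin_edges):
    """Fraction of galaxies flagged as satellites in each stellar-mass bin."""
    log_mstar = np.asarray(log_mstar, dtype=float)
    is_satellite = np.asarray(is_satellite, dtype=bool)
    n_all, _ = np.histogram(log_mstar, bins=bin_edges)
    n_sat, _ = np.histogram(log_mstar[is_satellite], bins=bin_edges)
    frac = np.full(n_all.size, np.nan)            # empty bins stay NaN
    filled = n_all > 0
    frac[filled] = n_sat[filled] / n_all[filled]
    return frac

log_mstar = [9.2, 9.7, 10.3, 10.8, 10.1]          # made-up log10 stellar masses
is_sat = [True, False, True, False, False]
print(satellite_fraction(log_mstar, is_sat, [9.0, 10.0, 11.0]))
```

Empty bins are returned as NaN rather than zero, so they are not mistaken for bins that contain only centrals.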

Furthermore, the matter distribution around an individual halo is far from isotropic. Accretion onto halos happens along preferred directions, guided by the gravitational tidal fields of the cosmic web. This should leave an imprint on the distribution of satellites. They shouldn't be arranged in a simple spherical swarm around their central galaxy, but should show some alignment with their host halo's shape and the direction of the local filament. SHAM can be extended to model this very effect. By conditioning the abundance matching not just on mass but also on information from the local tidal tensor, we can build models that predict the anisotropic alignment of satellite galaxies, linking their small-scale distribution to the grand geometry of the cosmos.

Beyond the Cosmos: An Interdisciplinary Echo

Perhaps the most delightful surprise is that the logic of abundance matching is not confined to cosmology. Its echo can be heard in other fields, demonstrating a beautiful unity of scientific thought. Consider the world of ecology. Ecologists study the distribution of species across a landscape of habitats. A habitat "patch" (like a forest fragment) has a certain "carrying capacity" (analogous to a halo's mass), and a species has a characteristic biomass.

What determines which species lives where? Here, we can formulate an ecological SHAM! We can hypothesize that the species with the highest biomass tend to occupy the patches with the highest carrying capacity. By rank-ordering species by biomass and patches by carrying capacity, we can create a model for the spatial distribution of species. We can then test this model by seeing if it reproduces the observed "clustering" of occupied patches. This cross-domain application shows that abundance matching is fundamentally a powerful statistical inference tool for any system where ranked properties are thought to be coupled.

From predicting the cosmic web, to weighing dark matter halos, to painting the life stories of galaxies, and even to modeling ecological communities, Subhalo Abundance Matching proves to be far more than a simple algorithm. It is a profound and versatile idea, a testament to the power of simple physical principles to unify and explain a breathtaking range of phenomena across the natural world.