Identity by Descent

SciencePedia

Key Takeaways

Identity by descent (IBD) distinguishes alleles that are physical copies of a single ancestral DNA molecule from those that are merely the same type (Identity by State, or IBS).
The inbreeding coefficient (F) measures the probability that an individual's two alleles are IBD, directly quantifying the resulting deficit of heterozygotes in a population compared to random-mating expectations.
IBD is used to calculate genetic relatedness, such as the kinship coefficient, which helps explain altruistic behavior via Hamilton's rule and the evolution of social structures.
The concept of IBD has critical applications in diverse fields, including assessing recessive disease risk in medicine, strengthening DNA evidence in forensics, and planning genetic rescue in conservation biology.

Introduction

In the study of life, few questions are more fundamental than "How are we related?" We intuitively understand that shared traits can signify a family connection, but they can also be mere coincidences. Genetics provides a powerful tool to formalize this question, moving beyond simple appearance to the level of DNA itself. The core concept that allows us to do this is Identity by Descent (IBD), a simple yet profound idea that distinguishes between a shared trait that is coincidental and one that arises from a shared, recent ancestor. Understanding IBD is not just an academic exercise; it is the key to unlocking the mechanisms of inbreeding, the mathematics of kinship, and the genetic fate of populations.

This article explores the principle of Identity by Descent and its far-reaching consequences. First, in the "Principles and Mechanisms" chapter, we will dissect the core theory, defining IBD in contrast to Identity by State (IBS). We will explore how IBD is quantified through the inbreeding and kinship coefficients and how it governs the genetic makeup of individuals and populations. Following this, the "Applications and Interdisciplinary Connections" chapter will showcase how this single concept serves as a master key across the biological sciences, revealing its critical role in medicine, forensic science, animal behavior, and conservation.

Principles and Mechanisms

Imagine you meet two people, both named John Smith. The fact that they share a name means they are identical by state for the property "name." But it tells you nothing about whether they are related. They could be from entirely different continents, their shared name a mere coincidence of history and culture. Now, imagine you meet two brothers. They also share the name "Smith." But in their case, the identity is not coincidental; it is a direct consequence of them sharing a recent common ancestor—their father. Their names are identical by descent.

This simple analogy is the key to one of the most powerful ideas in genetics: the distinction between identity by state (IBS) and identity by descent (IBD). Two alleles are IBS if they are the same type—say, both are the allele for blue eyes. Two alleles are IBD if they are physical copies of the very same ancestral DNA molecule from a recent common ancestor. IBD is a statement about genealogy, about history. IBS is just a statement about the current form. As you might guess, if two alleles are IBD, they must also be IBS (assuming no new mutation has occurred along the way). But the reverse is certainly not true; two blue-eye alleles in the population can be identical by state without being identical by descent. This distinction is the bedrock upon which our understanding of inbreeding, relatedness, and genetic drift is built.

Looking in the Mirror: The Inbreeding Coefficient

Let's start by looking not at two different people, but at the two alleles for a single gene inside one diploid individual. What is the chance that these two alleles are IBD—that they are copies of the same ancestral allele from the pedigree? This probability has a special name: the inbreeding coefficient, denoted by the letter $F$ .

If an individual's two alleles are IBD, we say the individual is autozygous at that locus. "Auto" means self, so this is literally "homozygous by self-origin." The alleles are identical because they are copies of the same ancestral piece of DNA. If the two alleles are not IBD, we say the individual is allozygous ("allo" for other). An allozygous individual can still be homozygous (e.g., genotype $AA$ ) if it happens to inherit two $A$ alleles that are IBS but come from different, unrelated ancestral lines. This is homozygosity by state, not by descent. The inbreeding coefficient $F$ is, therefore, simply the probability of autozygosity.

The Price of Inbreeding: Where Did the Heterozygotes Go?

So, what happens to a population's genetic makeup when there's inbreeding (i.e., when $F > 0$ )? Let's think it through. Consider a single gene with two alleles, $A$ and $a$ , with frequencies $p$ and $q$ in the population's gene pool. We want to figure out the expected frequencies of the genotypes $AA$ , $Aa$ , and $aa$ .

Take a random individual. There are two mutually exclusive possibilities for its pair of alleles:

The two alleles are IBD. The probability of this is $F$ . If they are IBD, they must be copies of a single ancestral allele. The chance that this ancestor was an $A$ is $p$ , and the chance it was an $a$ is $q$ . So, this path contributes a probability of $Fp$ to the $AA$ genotype and $Fq$ to the $aa$ genotype. Notice it's impossible to form a heterozygote $Aa$ this way!
The two alleles are not IBD. The probability of this is $1-F$ . If they are not IBD, they are like two independent, random draws from the gene pool. The probability of drawing two $A$ 's is $p^2$ , two $a$ 's is $q^2$ , and an $A$ and an $a$ is $2pq$ . So, this path contributes $(1-F)p^2$ to the $AA$ genotype, $(1-F)q^2$ to the $aa$ genotype, and $(1-F)2pq$ to the $Aa$ genotype.

Putting it all together, the total genotype frequencies are:

$P(AA) = Fp + (1-F)p^2 = p^2 + Fp(1-p) = p^2 + Fpq$
$P(aa) = Fq + (1-F)q^2 = q^2 + Fq(1-q) = q^2 + Fpq$
$P(Aa) = (1-F)2pq$

Look closely at these results. Inbreeding takes a fraction, $2Fpq$ , of individuals that would have been heterozygotes and turns them into homozygotes ( $Fpq$ to $AA$ and $Fpq$ to $aa$ , as the equations above show). The most striking result is the frequency of heterozygotes, $H$ . In a perfectly random-mating population (where $F=0$ ), the heterozygote frequency is $H_0 = 2pq$ . With inbreeding, it becomes $H = H_0(1-F)$ .

This gives us a beautifully intuitive interpretation of the inbreeding coefficient: $F$ is the proportional deficit of heterozygotes compared to the random-mating expectation. If a population has an inbreeding coefficient of $F=0.25$ , it means it has $25\%$ fewer heterozygotes than you'd expect based on its allele frequencies alone. Inbreeding doesn't change the allele frequencies $p$ and $q$ in the population, but it "packages" them into genotypes differently, creating a surplus of homozygotes and a deficit of heterozygotes.
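The bookkeeping above is easy to check numerically. The following is a minimal sketch, not a standard library routine: the function name `genotype_frequencies` and the example values $p = 0.6$ , $F = 0.25$ are chosen here purely for illustration.

```python
def genotype_frequencies(p: float, F: float) -> dict[str, float]:
    """Expected genotype frequencies at a two-allele locus with inbreeding coefficient F."""
    if not (0.0 <= p <= 1.0 and 0.0 <= F <= 1.0):
        raise ValueError("p and F must lie in [0, 1]")
    q = 1.0 - p
    return {
        "AA": p * p + F * p * q,        # random-mating term plus the IBD surplus
        "Aa": 2.0 * p * q * (1.0 - F),  # heterozygotes shrink by the factor (1 - F)
        "aa": q * q + F * p * q,
    }


# Illustrative values only.
freqs = genotype_frequencies(p=0.6, F=0.25)
allele_freq_A = freqs["AA"] + freqs["Aa"] / 2  # frequency of A recovered from genotypes
print(freqs, allele_freq_A)
```

With these inputs the heterozygote frequency drops from $2pq = 0.48$ to $0.36$ , a 25% deficit, while the frequency of $A$ recovered from the genotypes is still $0.6$ .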

A Straight Line to Homozygosity: The Case of Selfing

To see the accumulation of IBD in action, consider the most extreme form of inbreeding imaginable: self-fertilization, or "selfing," common in many plants. Imagine we start with a single plant that is heterozygous, $Aa$ , at generation $t=0$ . Its alleles are not IBD, so $F_0 = 0$ . What happens as it and its descendants self-fertilize generation after generation?

A heterozygous parent ( $Aa$ ) produces offspring in the classic Mendelian ratio: $1/4$ $AA$ , $1/2$ $Aa$ , and $1/4$ $aa$ . The key is that the homozygous offspring ( $AA$ and $aa$ ) can only ever produce more homozygous offspring by selfing. The only source of new heterozygotes is the existing heterozygotes. Each generation, half of the offspring of heterozygotes are themselves heterozygous. Therefore, the proportion of heterozygotes in the population, $H_t$ , is halved each generation: $H_t = \frac{1}{2} H_{t-1}$ .

Since we started with $H_0 = 1$ , the frequency of heterozygotes after $t$ generations is simply $H_t = (\frac{1}{2})^t$ . In this special system, an individual is heterozygous if and only if its alleles are not IBD. This gives us a direct link: $H_t = 1 - F_t$ . Combining these equations, we get a beautiful expression for the increase in the inbreeding coefficient over time: $F_t = 1 - \left(\frac{1}{2}\right)^t$ . After one generation, $F_1 = 1/2$ . After two, $F_2 = 3/4$ . After ten generations, $F_{10} \approx 0.999$ . The population rapidly approaches complete autozygosity, where nearly every individual is homozygous.
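The halving argument can be traced generation by generation. This is a small sketch using exact fractions; the function name `selfing_inbreeding` is a label invented for this example.

```python
from fractions import Fraction


def selfing_inbreeding(generations: int) -> list[Fraction]:
    """Return [F_0, F_1, ..., F_t] for a selfing line founded by one Aa plant."""
    H = Fraction(1)        # H_0 = 1: the founder is heterozygous
    F_values = [1 - H]     # F_0 = 0
    for _ in range(generations):
        H = H / 2          # half of a heterozygote's selfed offspring are Aa
        F_values.append(1 - H)
    return F_values


F = selfing_inbreeding(10)
print(F[1], F[2], float(F[10]))
```

The exact value after ten generations is $1023/1024$ , which is where the figure of roughly $0.999$ comes from.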

The Ties That Bind: Quantifying Genetic Relationships

So far, we've used IBD to look inside an individual. But its real power comes from using it to look between individuals. We can define a measure of relatedness called the kinship coefficient, $\phi_{xy}$ , as the probability that an allele picked at random from individual $x$ and an allele picked at random from individual $y$ are IBD.

This single definition unlocks the secrets of the pedigree. For instance, what is the inbreeding coefficient, $F_z$ , of a child $z$ whose parents are $x$ and $y$ ? The child gets one allele from $x$ (a random draw from $x$ 's alleles) and one from $y$ (a random draw from $y$ 's alleles). The probability that these two alleles are IBD is, by definition, the kinship of the parents, $\phi_{xy}$ . Thus, we have the elegant and powerful rule: $F_z = \phi_{xy}$ . If the parents are unrelated ( $\phi_{xy}=0$ ), their child is not inbred ( $F_z=0$ ). But if the parents are related (say, first cousins), their kinship will be greater than zero, and their child's inbreeding coefficient will equal that kinship.
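To make the rule concrete, kinship can be computed recursively on a small pedigree. The sketch below uses the standard recursion (the kinship of an individual with itself is $(1 + F)/2$ ; otherwise average the kinships with the two parents of the later-born individual). The pedigree, the individual names, and the assumption that founders are unrelated and non-inbred are all hypothetical choices made for this illustration.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical first-cousin pedigree: name -> (parent, parent); founders -> None.
# Entries are listed so that parents always appear before their children.
PEDIGREE = {
    "GF": None, "GM": None,                   # shared grandparents
    "U1": None, "U2": None,                   # unrelated spouses marrying in
    "P1": ("GF", "GM"), "P2": ("GF", "GM"),   # full siblings
    "X": ("P1", "U1"), "Y": ("P2", "U2"),     # first cousins
    "Z": ("X", "Y"),                          # child of the cousins
}
ORDER = {name: i for i, name in enumerate(PEDIGREE)}


@lru_cache(maxsize=None)
def kinship(a: str, b: str) -> Fraction:
    """Probability that random alleles drawn from a and from b are IBD."""
    if a == b:
        parents = PEDIGREE[a]
        F_a = kinship(*parents) if parents else Fraction(0)
        return (1 + F_a) / 2
    if ORDER[a] > ORDER[b]:
        a, b = b, a                # make b the later-listed individual
    parents = PEDIGREE[b]
    if parents is None:
        return Fraction(0)         # assumption: founders are unrelated to others
    # b's random allele comes from either parent with probability 1/2.
    return (kinship(a, parents[0]) + kinship(a, parents[1])) / 2


F_Z = kinship("X", "Y")            # inbreeding coefficient of the cousins' child
print(F_Z)
```

Under these assumptions the first cousins have kinship $1/16$ , so their child has $F_z = 1/16$ .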

A Modern View of Relatedness: Counting Shared Alleles

With modern genomics, we can get an even more detailed picture of relatedness than a single kinship coefficient. For any pair of diploid individuals, we can ask: how many alleles do they share IBD at a given locus? There are three possibilities: they share zero, one, or two alleles IBD. We can define $k_0$ , $k_1$ , and $k_2$ as the probabilities of these three events occurring.

This trio of numbers provides a rich genetic fingerprint for any relationship:

Parent-Offspring: A child shares exactly one allele IBD with each parent at every locus. The other allele comes from the other parent. So, for a parent-offspring pair, the IBD state is always $(k_0, k_1, k_2) = (0, 1, 0)$ .
Full Siblings: Two siblings have a $1/4$ chance of inheriting the same paternal allele AND the same maternal allele (sharing 2 IBD), and a $1/4$ chance of inheriting different paternal AND different maternal alleles (sharing 0 IBD). The remaining $1/2$ of the time, they share exactly one allele IBD. Thus, for full sibs, $(k_0, k_1, k_2) = (1/4, 1/2, 1/4)$ .

Notice the difference! You are guaranteed to share one allele with your parent, but you might share zero, one, or two with your sibling. This is the beautiful uncertainty of Mendelian segregation in action. These $k$ probabilities can be neatly linked back to the kinship coefficient with the formula $\phi = \frac{1}{4}k_1 + \frac{1}{2}k_2$ . For both parent-offspring and full-siblings, this gives $\phi = 1/4$ , confirming the classic result that you are, on average, equally related to your parent and your sibling. For half-siblings, who share only one parent, the state is $(1/2, 1/2, 0)$ , yielding a kinship of $\phi = 1/8$ .
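The conversion from the three sharing probabilities to a kinship coefficient is a one-line calculation. In this sketch the dictionary `RELATIONSHIPS` and the function name `kinship_from_k` are illustrative; the $k$ values are the ones given in the text.

```python
from fractions import Fraction as Fr

# (k0, k1, k2) for the relationships discussed in the text.
RELATIONSHIPS = {
    "parent-offspring": (Fr(0), Fr(1), Fr(0)),
    "full siblings": (Fr(1, 4), Fr(1, 2), Fr(1, 4)),
    "half siblings": (Fr(1, 2), Fr(1, 2), Fr(0)),
}


def kinship_from_k(k0: Fr, k1: Fr, k2: Fr) -> Fr:
    """Kinship coefficient phi = k1/4 + k2/2."""
    if k0 + k1 + k2 != 1:
        raise ValueError("k0, k1 and k2 must sum to 1")
    return k1 / 4 + k2 / 2


for name, ks in RELATIONSHIPS.items():
    print(name, kinship_from_k(*ks))
```

Parent-offspring and full-sibling pairs have the same kinship even though their $k$ vectors differ, which is why the full trio is more informative than $\phi$ alone.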

The Inexorable March of IBD: Genetic Drift and Population Size

Finally, let's zoom out to the whole population. What happens in a small, isolated population, even if mating is completely random? Imagine a small island with only 50 people. Inevitably, after a few generations, people will end up marrying a distant cousin by pure chance. The smaller the population, the faster this happens. Every generation, a small amount of IBD is generated simply because the pool of potential ancestors is limited. This unavoidable, random increase in inbreeding over time is a core part of genetic drift.

The rate of this increase is governed by the inbreeding effective population size, $N_e^{(I)}$ . In an idealized population, the probability that the two alleles drawn to make an offspring are copies of the very same allele in the previous generation is $\frac{1}{2N_e^{(I)}}$ . This is a new source of IBD. The full recurrence relation for the inbreeding coefficient becomes: $F_{t+1} = \frac{1}{2N_e^{(I)}} + \left(1 - \frac{1}{2N_e^{(I)}}\right)F_t$ . This equation tells us that every generation, $F$ creeps a little closer to $1$ . The change per generation is $\Delta F = \frac{1}{2N_e^{(I)}}(1 - F_t)$ , which is approximately $\frac{1}{2N_e^{(I)}}$ when $F$ is small. This is why conservation biologists are so concerned with effective population size. A small $N_e^{(I)}$ means a rapid increase in $F$ , a rapid loss of heterozygosity, and an increased risk of inbreeding depression, which can spell doom for an endangered species. Identity by descent, which began as a simple idea to distinguish two John Smiths, ends up being the master variable that governs the genetic fate of entire populations.
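Iterating the recurrence shows how quickly $F$ accumulates. The sketch below is illustrative only: the function name `inbreeding_trajectory` and the effective size of 50 (echoing the island example) are assumptions of this example.

```python
def inbreeding_trajectory(Ne: float, generations: int, F0: float = 0.0) -> list[float]:
    """Iterate F_{t+1} = 1/(2 Ne) + (1 - 1/(2 Ne)) * F_t, starting from F0."""
    if Ne <= 0:
        raise ValueError("Ne must be positive")
    step = 1.0 / (2.0 * Ne)   # chance of drawing the same parental allele twice
    F = [F0]
    for _ in range(generations):
        F.append(step + (1.0 - step) * F[-1])
    return F


traj = inbreeding_trajectory(Ne=50, generations=10)
print(traj[1], traj[10])
```

With an effective size of 50, $F$ rises by $0.01$ in the first generation and reaches roughly $0.096$ after ten, matching the closed form $F_t = 1 - (1 - \frac{1}{2N_e^{(I)}})^t$ for $F_0 = 0$ .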

Applications and Interdisciplinary Connections

Now that we have grasped the machinery of Identity by Descent (IBD)—the simple yet profound idea that two alleles can be physical copies of a single ancestral piece of DNA—we can embark on a journey. We will see how this one concept, like a master key, unlocks doors across the vast mansion of biological science. It is far from an abstract accounting tool for geneticists; it is a fundamental principle that explains the intimate details of our health, the structure of animal societies, the grand drama of evolution, and even the whispers we can coax from the bones of our ancestors. The trail of IBD leads everywhere.

The Double-Edged Sword in Our Genes: Medicine and Forensics

Perhaps the most personal place we encounter the consequences of IBD is in our own health. We all carry a handful of "sleeping" recessive alleles: variants that could cause disease but are harmless as long as they are paired with a normal, functional copy. In a large, randomly mating population, the chance that two individuals carrying the same rare recessive allele meet and have a child is exceedingly small.

But what happens when parents are related? Their child has a non-zero chance of inheriting two alleles at a locus that are identical by descent. This probability is precisely the child's inbreeding coefficient, $F$ , which equals the kinship coefficient of the parents and can be calculated by tracing paths through the family tree. In the classic case of a child of first cousins, we find $F = 1/16$ . Inbreeding doesn't create new faulty alleles, but it dramatically increases the chance that two sleeping copies will meet in the same person, awakening a recessive disease. The rarer the disease-causing allele is in the general population, the more dramatic this increase in relative risk becomes. Inbreeding acts as a concentrating lens for genetic risk, a fact that is of paramount importance in genetic counseling.
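The claim about rare alleles follows from the homozygote frequency $q^2 + Fpq$ derived earlier: dividing by the random-mating value $q^2$ gives a relative risk of $1 + F(1-q)/q$ . A short sketch, with function names and allele frequencies chosen only for illustration:

```python
def recessive_risk(q: float, F: float) -> float:
    """Probability of genotype aa when allele a has frequency q and inbreeding is F."""
    return q * q + F * q * (1.0 - q)


def relative_risk(q: float, F: float) -> float:
    """Risk for an inbred child relative to a child of unrelated parents."""
    if q <= 0.0:
        raise ValueError("q must be positive")
    return recessive_risk(q, F) / recessive_risk(q, 0.0)


F_cousins = 1 / 16   # child of first cousins
for q in (0.01, 0.001):   # hypothetical disease-allele frequencies
    print(q, relative_risk(q, F_cousins))
```

For a child of first cousins the risk is roughly 7 times higher when $q = 0.01$ and roughly 63 times higher when $q = 0.001$ , even though the absolute risk remains small in both cases.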

This principle isn't limited to severe diseases. Consider the Rh blood group system. A child being Rh-negative (genotype $dd$ ) when the $d$ allele is rare in the population is an unlikely event. If that child's parents are first cousins, however, we can use the logic of IBD to trace the likely path of that rare allele through the family. We can even calculate the probability that the child's two $d$ alleles are not just identical by state, but are in fact IBD copies inherited from a specific grandparent, providing a powerful window into the flow of genetic information through generations.

The same logic that helps us understand family resemblances can also be used to establish identity with astonishing precision. In forensic science, a DNA match between a crime scene sample and a suspect can be powerful evidence. But a crucial question must be asked: what is the probability that an unrelated person would match by chance? To answer this, forensic geneticists use reference databases of allele frequencies.

Here, a subtle application of IBD becomes critical. If a suspect belongs to a small, relatively isolated sub-population, the assumption of random mating may not hold. Individuals in that group are, on average, more related to each other than to a random person from the general population. They share a higher "background" level of IBD. To account for this, forensic analysts apply a "theta-correction" ( $\theta$ ). This $\theta$ is essentially a coancestry coefficient for the sub-population, playing the same role as the inbreeding coefficient $F$ : it quantifies the probability that two alleles drawn from the group are IBD due to shared ancestry within that group. Applying this correction gives a more conservative, and therefore more scientifically honest, estimate of the genotype frequency. It acknowledges that two alleles might look the same (identity by state) precisely because they are descended from a common ancestor (identity by descent), making the genotype less rare in that specific group than the broad population data might suggest.
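One simple way to see the direction of the effect is to reuse the homozygote formula from the inbreeding section with $\theta$ in place of $F$ . This is an illustrative assumption rather than a statement of the formula used in any particular forensic protocol, and the allele frequency and $\theta$ value below are hypothetical.

```python
def homozygote_frequency(p: float, theta: float = 0.0) -> float:
    """Homozygote frequency p^2 + theta * p * (1 - p); theta = 0 is random mating."""
    if not (0.0 <= p <= 1.0 and 0.0 <= theta <= 1.0):
        raise ValueError("p and theta must lie in [0, 1]")
    return p * p + theta * p * (1.0 - p)


uncorrected = homozygote_frequency(0.05)             # hypothetical allele frequency
corrected = homozygote_frequency(0.05, theta=0.03)   # hypothetical theta
print(uncorrected, corrected)
```

The corrected frequency is larger than the uncorrected one, so a matching homozygous genotype is treated as less rare, which is the conservative direction.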

The Architect of Societies and the Savior of Species

Moving from individuals to populations, we find that IBD is a chief architect of the social lives of animals. The puzzle of altruism—why would an animal risk its own life or sacrifice its own reproduction to help another?—was largely solved by W. D. Hamilton with the concept of inclusive fitness. The famous Hamilton's rule, $rB > C$ , states that an altruistic act is favored by selection if the benefit to the recipient ( $B$ ), weighted by the relatedness of the actor to the recipient ( $r$ ), exceeds the cost to the actor ( $C$ ).

That all-important coefficient of relatedness, $r$ , is a direct measure of IBD. It can be seen as the probability that a gene in the actor is an identical-by-descent copy of a gene in the recipient. For non-inbred diploids it is twice the kinship coefficient, $r = 2\phi$ . For first cousins, for instance, we can trace the paths of inheritance from their shared grandparents to find that $\phi = 1/16$ and therefore $r = 1/8$ . From a "gene's-eye view," helping a relative is a form of self-interest, as it promotes the survival of its own IBD copies residing in another body.
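Hamilton's rule itself is a single inequality, so a sketch is short. The function name `altruism_favored` and the benefit and cost values are made up for illustration; only the rule $rB > C$ and the cousin value $r = 1/8$ come from the text.

```python
def altruism_favored(r: float, benefit: float, cost: float) -> bool:
    """Hamilton's rule: the act is favored when r * B > C."""
    return r * benefit > cost


r_cousins = 1 / 8
print(altruism_favored(r_cousins, benefit=9.0, cost=1.0))   # B is more than 8 times C
print(altruism_favored(r_cousins, benefit=8.0, cost=1.0))   # exactly at the threshold
```

For first cousins the benefit must exceed eight times the cost; at exactly eight times the inequality is not satisfied.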

This logic finds its most spectacular confirmation in the Hymenoptera—ants, bees, and wasps. Due to their strange "haplodiploid" genetic system, where males are haploid (from unfertilized eggs) and females are diploid, a startling asymmetry in relatedness emerges. While diploid full sisters (like in humans or termites) share, on average, half their genes ( $r = 1/2$ ), haplodiploid full sisters are "super-sisters," sharing a remarkable three-quarters of their genes ( $r = 3/4$ ). A female is more related to her sister than she would be to her own offspring! This high relatedness provides a powerful evolutionary incentive for a female worker to forgo her own reproduction and instead dedicate her life to helping her mother, the queen, produce more sisters. The concept of IBD elegantly explains the origin of the highly organized and seemingly selfless eusocial colonies that dot the natural world.
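The three-quarters figure can be checked by brute-force enumeration. In the sketch below, relatedness is taken to be the chance that a randomly chosen allele in one sister has an IBD copy in the other; the allele labels and the function name `sister_relatedness` are hypothetical, and the parents are assumed unrelated and non-inbred.

```python
from fractions import Fraction
from itertools import product


def sister_relatedness(father_alleles: tuple[str, ...]) -> Fraction:
    """Chance that a random allele of one full sister has an IBD copy in the other."""
    mother_alleles = ("m1", "m2")   # the mother is diploid in both systems
    shared = 0
    total = 0
    # Enumerate every equally likely pair of daughters.
    for a_mat, a_pat, b_mat, b_pat in product(
        mother_alleles, father_alleles, mother_alleles, father_alleles
    ):
        sister_a, sister_b = (a_mat, a_pat), (b_mat, b_pat)
        for allele in sister_a:
            total += 1
            if allele in sister_b:
                shared += 1
    return Fraction(shared, total)


haplodiploid = sister_relatedness(("f",))        # haploid father: one allele to give
diploid = sister_relatedness(("f1", "f2"))       # diploid father: two possible alleles
print(haplodiploid, diploid)
```

Because a haploid father passes the same allele to every daughter, the paternal half of the genome is always shared, which lifts the sister-sister value from $1/2$ to $3/4$ .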

While IBD can build societies, it can also threaten the very existence of species. In small, isolated populations of endangered animals, mates are often related. This leads to a steady increase in the average inbreeding coefficient $F$ of the population. The consequence is "inbreeding depression": as IBD becomes more common, deleterious recessive alleles are expressed more often, leading to lower fertility, higher infant mortality, and greater susceptibility to disease. This results in a decline in the mean fitness of the population.

Fortunately, an understanding of IBD also provides the solution. Conservation managers can plan a "genetic rescue" by introducing a small number of unrelated individuals into the threatened population. Using the mathematics of IBD, they can calculate the minimum fraction of immigrants needed to reduce the population's average $F$ below a critical threshold in a single generation. This is IBD not as a passive descriptor, but as an active, quantitative tool used on the front lines of the fight to preserve biodiversity.
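As a rough illustration of such a calculation, consider one deliberately simple model, which is an assumption of this sketch and not necessarily the one managers use: immigrants are unrelated to everyone and make up a fraction $m$ of the breeding pool, mating is random, and resident pairs have mean kinship $\phi_{res}$ . A child is then inbred only if both parents are residents, so its expected $F$ is $(1-m)^2 \phi_{res}$ .

```python
import math


def min_immigrant_fraction(resident_kinship: float, F_target: float) -> float:
    """Smallest m with (1 - m)**2 * resident_kinship <= F_target (toy model)."""
    if resident_kinship <= 0.0 or F_target < 0.0:
        raise ValueError("resident_kinship must be > 0 and F_target >= 0")
    if F_target >= resident_kinship:
        return 0.0                  # already at or below the target
    return 1.0 - math.sqrt(F_target / resident_kinship)


# Hypothetical numbers: resident mean kinship 0.25, target F of 0.16.
m = min_immigrant_fraction(0.25, 0.16)
F_next = (1.0 - m) ** 2 * 0.25
print(m, F_next)
```

The value returned is the boundary case; any larger immigrant fraction pushes the expected $F$ of the next generation strictly below the target under this model.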

Reading the Past and Redefining the Present: The Frontiers of IBD

The power of IBD extends even into the deep past. With the advent of paleogenomics, we can now extract and sequence DNA from ancient bones and teeth. This data is often of low quality—fragmented, scarce, and riddled with chemical damage. How, from such noisy data, can we determine if two Neanderthals buried near each other were siblings, or a parent and child, or merely unrelated tribe members? We cannot draw a pedigree.

The modern solution is to estimate IBD in a more nuanced, statistical way. Instead of a single relatedness value, researchers estimate the probabilities that two individuals share 0, 1, or 2 alleles IBD, denoted $k_0, k_1,$ and $k_2$ . Using sophisticated likelihood models that explicitly account for sequencing errors, post-mortem damage, and population allele frequencies, scientists can infer these IBD states directly from the messy genomic data. This allows for the reconstruction of family structures and social dynamics in populations that have been extinct for tens of thousands of years.

This statistical view of IBD leads us to a final, profound generalization. What if we redefine relatedness itself, freeing it from the need for a family tree? In its most modern form, relatedness is cast as a regression coefficient: $r = \frac{\mathrm{Cov}(G_{\text{actor}}, G_{\text{recipient}})}{\mathrm{Var}(G_{\text{actor}})}$ . In plain English, this simply asks: to what extent does knowing the genetic makeup of an actor predict the genetic makeup of a recipient? This statistical association doesn't have to come from a neat, well-defined pedigree. It can arise from any process that makes interacting individuals genetically non-random: limited dispersal (neighbors tend to be kin), assortative interactions ("green-beard" effects where individuals with a gene for altruism recognize and help others with the same gene), or any form of population structure. This definition works for microbes in a biofilm just as well as it does for lions in a pride.
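The regression definition can be computed directly from paired genetic values. In this sketch each $G$ is taken to be an allele count (0, 1, or 2 copies) at one locus, which is one possible encoding, and the six actor-recipient pairs are invented data for illustration only.

```python
from statistics import mean


def regression_relatedness(actor: list[float], recipient: list[float]) -> float:
    """Slope of recipient genotype on actor genotype: Cov(actor, recipient) / Var(actor)."""
    if len(actor) != len(recipient) or len(actor) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    a_bar, r_bar = mean(actor), mean(recipient)
    cov = sum((a - a_bar) * (r - r_bar) for a, r in zip(actor, recipient))
    var = sum((a - a_bar) ** 2 for a in actor)
    if var == 0:
        raise ValueError("actor genotypes show no variation")
    return cov / var   # the shared 1/n factors cancel in the ratio


# Made-up allele counts for six interacting pairs.
actors = [0, 0, 1, 1, 2, 2]
recipients = [0, 1, 1, 1, 1, 2]
r_hat = regression_relatedness(actors, recipients)
print(r_hat)
```

No pedigree enters the calculation; any process that makes partners' genotypes covary will produce a non-zero slope.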

Here, the concept of IBD completes its transformation: from a simple rule about inheritance in families, it becomes a powerful, general measure of statistical genetic association, applicable across all life.

From the quiet consultation rooms of genetic counselors to the bustling societies of ants, from the desperate efforts to save the last members of a species to the ghostly genomes of our ancient cousins, Identity by Descent is the common thread. It is a testament to the beauty of science that such a simple idea—that two things can be copies of one original—can yield such a rich and intricate understanding of the living world.