
Identity by Descent

Key Takeaways
  • Identity by descent (IBD) distinguishes alleles that are physical copies of a single ancestral DNA molecule from those that are merely the same type (Identity by State, or IBS).
  • The inbreeding coefficient (F) measures the probability that an individual's two alleles are IBD, directly quantifying the resulting deficit of heterozygotes in a population compared to random-mating expectations.
  • IBD is used to calculate genetic relatedness, such as the kinship coefficient, which helps explain altruistic behavior via Hamilton's rule and the evolution of social structures.
  • The concept of IBD has critical applications in diverse fields, including assessing recessive disease risk in medicine, strengthening DNA evidence in forensics, and planning genetic rescue in conservation biology.

Introduction

In the study of life, few questions are more fundamental than "How are we related?" We intuitively understand that shared traits can signify a family connection, but they can also be mere coincidences. Genetics provides a powerful tool to formalize this question, moving beyond simple appearance to the level of DNA itself. The core concept that allows us to do this is Identity by Descent (IBD), a simple yet profound idea that distinguishes between a shared trait that is coincidental and one that arises from a shared, recent ancestor. Understanding IBD is not just an academic exercise; it is the key to unlocking the mechanisms of inbreeding, the mathematics of kinship, and the genetic fate of populations.

This article explores the principle of Identity by Descent and its far-reaching consequences. First, in the "Principles and Mechanisms" chapter, we will dissect the core theory, defining IBD in contrast to Identity by State (IBS). We will explore how IBD is quantified through the inbreeding and kinship coefficients and how it governs the genetic makeup of individuals and populations. Following this, the "Applications and Interdisciplinary Connections" chapter will showcase how this single concept serves as a master key across the biological sciences, revealing its critical role in medicine, forensic science, animal behavior, and conservation.

Principles and Mechanisms

Imagine you meet two people, both named John Smith. The fact that they share a name means they are identical by state for the property "name." But it tells you nothing about whether they are related. They could be from entirely different continents, their shared name a mere coincidence of history and culture. Now, imagine you meet two brothers. They also share the name "Smith." But in their case, the identity is not coincidental; it is a direct consequence of them sharing a recent common ancestor—their father. Their names are identical by descent.

This simple analogy is the key to one of the most powerful ideas in genetics: the distinction between identity by state (IBS) and identity by descent (IBD). Two alleles are IBS if they are the same type, for example if both are the allele for blue eyes. Two alleles are IBD if they are physical copies of the very same ancestral DNA molecule from a recent common ancestor. IBD is a statement about genealogy, about history. IBS is just a statement about the current form. As you might guess, if two alleles are IBD, they must also be IBS (assuming no new mutation has occurred along the way). But the reverse is certainly not true; two blue-eye alleles in the population can be identical by state without being identical by descent. This distinction is the bedrock upon which our understanding of inbreeding, relatedness, and genetic drift is built.

Looking in the Mirror: The Inbreeding Coefficient

Let's start by looking not at two different people, but at the two alleles for a single gene inside one diploid individual. What is the chance that these two alleles are IBD, that is, that they are copies of the same ancestral allele from the pedigree? This probability has a special name: the inbreeding coefficient, denoted by the letter $F$.

If an individual's two alleles are IBD, we say the individual is autozygous at that locus. "Auto" means self, so this is literally "homozygous by self-origin." The alleles are identical because they are copies of the same ancestral piece of DNA. If the two alleles are not IBD, we say the individual is allozygous ("allo" for other). An allozygous individual can still be homozygous (e.g., genotype $AA$) if it happens to inherit two $A$ alleles that are IBS but come from different, unrelated ancestral lines. This is homozygosity by state, not by descent. The inbreeding coefficient $F$ is, therefore, simply the probability of autozygosity.

The Price of Inbreeding: Where Did the Heterozygotes Go?

So, what happens to a population's genetic makeup when there's inbreeding (i.e., when $F > 0$)? Let's think it through. Consider a single gene with two alleles, $A$ and $a$, with frequencies $p$ and $q$ in the population's gene pool. We want to figure out the expected frequencies of the genotypes $AA$, $Aa$, and $aa$.

Take a random individual. There are two mutually exclusive possibilities for its pair of alleles:

  1. The two alleles are IBD. The probability of this is $F$. If they are IBD, they must be copies of a single ancestral allele. The chance that this ancestor was an $A$ is $p$, and the chance it was an $a$ is $q$. So, this path contributes a probability of $Fp$ to the $AA$ genotype and $Fq$ to the $aa$ genotype. Notice it's impossible to form a heterozygote $Aa$ this way!
  2. The two alleles are not IBD. The probability of this is $1-F$. If they are not IBD, they are like two independent, random draws from the gene pool. The probability of drawing two $A$'s is $p^2$, two $a$'s is $q^2$, and an $A$ and an $a$ is $2pq$. So, this path contributes $(1-F)p^2$ to the $AA$ genotype, $(1-F)q^2$ to the $aa$ genotype, and $(1-F)2pq$ to the $Aa$ genotype.

Putting it all together, the total genotype frequencies are:

  • $P(AA) = Fp + (1-F)p^2 = p^2 + Fp(1-p) = p^2 + Fpq$
  • $P(aa) = Fq + (1-F)q^2 = q^2 + Fq(1-q) = q^2 + Fpq$
  • $P(Aa) = (1-F)2pq$

Look closely at these results. Inbreeding takes a fraction $F$ of the individuals that would have been heterozygotes, a total frequency of $2Fpq$, and turns them into homozygotes ($Fpq$ to $AA$ and $Fpq$ to $aa$, if you check the math). The most striking result is the frequency of heterozygotes, $H$. In a perfectly random-mating population (where $F = 0$), the heterozygote frequency is $H_0 = 2pq$. With inbreeding, it becomes $H = H_0(1-F)$.

This gives us a beautifully intuitive interpretation of the inbreeding coefficient: $F$ is the proportional deficit of heterozygotes compared to the random-mating expectation. If a population has an inbreeding coefficient of $F = 0.25$, it means it has $25\%$ fewer heterozygotes than you'd expect based on its allele frequencies alone. Inbreeding doesn't change the allele frequencies $p$ and $q$ in the population, but it "packages" them into genotypes differently, creating a surplus of homozygotes and a deficit of heterozygotes.
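The genotype frequencies derived above are easy to check numerically. The following is a minimal sketch; the function names are illustrative rather than taken from any library, and it assumes a single locus with two alleles.

```python
def genotype_frequencies(p, F):
    """Expected genotype frequencies at a two-allele locus.

    p is the frequency of allele A, F is the inbreeding coefficient.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= F <= 1.0):
        raise ValueError("p and F must lie in [0, 1]")
    q = 1.0 - p
    return {
        "AA": p * p + F * p * q,        # p^2 + Fpq
        "Aa": 2.0 * p * q * (1.0 - F),  # 2pq(1 - F)
        "aa": q * q + F * p * q,        # q^2 + Fpq
    }


def allele_frequency_A(freqs):
    """Recover the frequency of A from genotype frequencies."""
    return freqs["AA"] + 0.5 * freqs["Aa"]


freqs = genotype_frequencies(p=0.3, F=0.25)
print(freqs)
print("frequency of A:", allele_frequency_A(freqs))
```

With $p = 0.3$ and $F = 0.25$, the heterozygote frequency falls from $0.42$ to $0.315$, a 25% deficit, while the frequency of $A$ recovered from the genotypes is still $0.3$.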

A Straight Line to Homozygosity: The Case of Selfing

To see the accumulation of IBD in action, consider the most extreme form of inbreeding imaginable: self-fertilization, or "selfing," common in many plants. Imagine we start with a single plant that is heterozygous, $Aa$, at generation $t = 0$. Its alleles are not IBD, so $F_0 = 0$. What happens as it and its descendants self-fertilize generation after generation?

A heterozygous parent ($Aa$) produces offspring in the classic Mendelian ratio: $1/4$ $AA$, $1/2$ $Aa$, and $1/4$ $aa$. The key is that the homozygous offspring ($AA$ and $aa$) can only ever produce more homozygous offspring by selfing. The only source of new heterozygotes is the existing heterozygotes. Each generation, half of the offspring of heterozygotes are themselves heterozygous. Therefore, the proportion of heterozygotes in the population, $H_t$, is halved each generation: $H_t = \frac{1}{2} H_{t-1}$.

Since we started with $H_0 = 1$, the frequency of heterozygotes after $t$ generations is simply $H_t = \left(\frac{1}{2}\right)^t$. In this special system, an individual is heterozygous if and only if its alleles are not IBD. This gives us a direct link: $H_t = 1 - F_t$. Combining these equations, we get a beautiful expression for the increase in the inbreeding coefficient over time:

$$F_t = 1 - \left(\frac{1}{2}\right)^t$$

After one generation, $F_1 = 1/2$. After two, $F_2 = 3/4$. After ten generations, $F_{10} \approx 0.999$. The population rapidly approaches complete autozygosity, where nearly every individual is homozygous.
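The selfing argument can be replayed generation by generation. This short sketch (the function name is illustrative) tracks exact genotype proportions with `fractions.Fraction`, starting from a population that is entirely heterozygous, and reports the heterozygote frequency and the implied inbreeding coefficient, one minus that frequency.

```python
from fractions import Fraction


def selfing_trajectory(generations):
    """Return (t, H_t, F_t) rows under repeated selfing, starting from all Aa."""
    AA, Aa, aa = Fraction(0), Fraction(1), Fraction(0)
    rows = [(0, Aa, 1 - Aa)]
    for t in range(1, generations + 1):
        # Homozygotes breed true; each heterozygote yields 1/4 AA, 1/2 Aa, 1/4 aa.
        AA, Aa, aa = AA + Aa / 4, Aa / 2, aa + Aa / 4
        rows.append((t, Aa, 1 - Aa))
    return rows


for t, H, F in selfing_trajectory(10):
    print(t, H, F, float(F))
```

The last row gives an inbreeding coefficient of $1023/1024$ after ten generations, which matches the closed-form expression.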

The Ties That Bind: Quantifying Genetic Relationships

So far, we've used IBD to look inside an individual. But its real power comes from using it to look between individuals. We can define a measure of relatedness called the kinship coefficient, $\phi_{xy}$, as the probability that an allele picked at random from individual $x$ and an allele picked at random from individual $y$ are IBD.

This single definition unlocks the secrets of the pedigree. For instance, what is the inbreeding coefficient, $F_z$, of a child $z$ whose parents are $x$ and $y$? The child gets one allele from $x$ (a random draw from $x$'s alleles) and one from $y$ (a random draw from $y$'s alleles). The probability that these two alleles are IBD is, by definition, the kinship of the parents, $\phi_{xy}$. Thus, we have the elegant and powerful rule:

$$F_z = \phi_{xy}$$

If the parents are unrelated ($\phi_{xy} = 0$), their child is not inbred ($F_z = 0$). But if the parents are related (say, first cousins), their kinship will be greater than zero, and their child's inbreeding coefficient will equal that kinship.
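To make the rule concrete, here is a sketch of the standard recursive kinship calculation on a small, hypothetical pedigree. All names in `PEDIGREE` are invented for illustration. It assumes founders are unrelated and non-inbred, so an individual's kinship with itself is $\frac{1}{2}(1 + F)$, and the kinship of two different individuals is the average of the kinships between one individual's parents and the other individual.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical pedigree: individual -> (father, mother); founders map to None.
PEDIGREE = {
    "GF": None, "GM": None,                    # shared grandparents
    "S1": None, "S2": None,                    # unrelated spouses
    "P1": ("GF", "GM"), "P2": ("GF", "GM"),    # full siblings
    "X": ("P1", "S1"), "Y": ("S2", "P2"),      # first cousins
    "Z": ("X", "Y"),                           # child of first cousins
}


@lru_cache(maxsize=None)
def depth(ind):
    """Generations separating an individual from its most distant founder."""
    parents = PEDIGREE[ind]
    return 0 if parents is None else 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def kinship(a, b):
    """Probability that random alleles drawn from a and from b are IBD."""
    if a == b:
        return (1 + inbreeding(a)) / 2
    # Recurse through the deeper individual, which cannot be an ancestor of the other.
    if depth(a) < depth(b):
        a, b = b, a
    parents = PEDIGREE[a]
    if parents is None:          # two distinct founders: assumed unrelated
        return Fraction(0)
    return (kinship(parents[0], b) + kinship(parents[1], b)) / 2


def inbreeding(ind):
    """F of an individual equals the kinship of its parents (0 for founders)."""
    parents = PEDIGREE[ind]
    return Fraction(0) if parents is None else kinship(*parents)


print("kinship of first cousins X, Y:", kinship("X", "Y"))
print("inbreeding coefficient of Z:", inbreeding("Z"))
```

For this pedigree both printed values are $1/16$: the cousins' kinship and their child's inbreeding coefficient are the same number.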

A Modern View of Relatedness: Counting Shared Alleles

With modern genomics, we can get an even more detailed picture of relatedness than a single kinship coefficient. For any pair of diploid individuals, we can ask: how many alleles do they share IBD at a given locus? There are three possibilities: they share zero, one, or two alleles IBD. We can define $k_0$, $k_1$, and $k_2$ as the probabilities of these three events occurring.

This trio of numbers provides a rich genetic fingerprint for any relationship:

  • Parent-Offspring: A child shares exactly one allele IBD with each parent at every locus. The other allele comes from the other parent. So, for a parent-offspring pair, the IBD state is always $(k_0, k_1, k_2) = (0, 1, 0)$.
  • Full Siblings: Two siblings have a $1/4$ chance of inheriting the same paternal allele AND the same maternal allele (sharing 2 IBD), and a $1/4$ chance of inheriting different paternal AND different maternal alleles (sharing 0 IBD). The remaining $1/2$ of the time, they share exactly one allele IBD. Thus, for full sibs, $(k_0, k_1, k_2) = (1/4, 1/2, 1/4)$.

Notice the difference! You are guaranteed to share one allele with your parent, but you might share zero, one, or two with your sibling. This is the beautiful uncertainty of Mendelian segregation in action. These $k$ probabilities can be neatly linked back to the kinship coefficient with the formula $\phi = \frac{1}{4}k_1 + \frac{1}{2}k_2$. For both parent-offspring and full-siblings, this gives $\phi = 1/4$, confirming the classic result that you are, on average, equally related to your parent and your sibling. For half-siblings, who share only one parent, the state is $(1/2, 1/2, 0)$, yielding a kinship of $\phi = 1/8$.
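The full-sibling probabilities can be recovered by brute force. The sketch below (function names are illustrative) enumerates the 16 equally likely ways two siblings can each receive one of the father's two alleles and one of the mother's two alleles, assuming the parents are unrelated and not inbred, and then applies the kinship formula.

```python
from fractions import Fraction
from itertools import product


def full_sib_k():
    """Return (k0, k1, k2) for full siblings by enumerating Mendelian transmissions."""
    counts = {0: 0, 1: 0, 2: 0}
    # Each sibling receives paternal allele 0 or 1 and maternal allele 0 or 1.
    for pat1, mat1, pat2, mat2 in product((0, 1), repeat=4):
        shared = int(pat1 == pat2) + int(mat1 == mat2)
        counts[shared] += 1
    total = sum(counts.values())
    return tuple(Fraction(counts[i], total) for i in range(3))


def kinship_from_k(k0, k1, k2):
    """phi = k1/4 + k2/2 (k0 contributes nothing but is kept for readability)."""
    return Fraction(1, 4) * k1 + Fraction(1, 2) * k2


print("full sibs:", full_sib_k(), "phi =", kinship_from_k(*full_sib_k()))
print("parent-offspring phi =", kinship_from_k(0, 1, 0))
print("half sibs phi =", kinship_from_k(Fraction(1, 2), Fraction(1, 2), 0))
```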

The Inexorable March of IBD: Genetic Drift and Population Size

Finally, let's zoom out to the whole population. What happens in a small, isolated population, even if mating is completely random? Imagine a small island with only 50 people. Inevitably, after a few generations, people will end up marrying a distant cousin by pure chance. The smaller the population, the faster this happens. Every generation, a small amount of IBD is generated simply because the pool of potential ancestors is limited. This unavoidable, random increase in inbreeding over time is a core part of genetic drift.

The rate of this increase is governed by the inbreeding effective population size, $N_e^{(I)}$. In an idealized population, the probability that two alleles drawn to make an offspring come from the very same ancestral allele in the previous generation is $\frac{1}{2N_e^{(I)}}$. This is a new source of IBD. The full recurrence relation for the inbreeding coefficient becomes:

$$F_{t+1} = \frac{1}{2N_e^{(I)}} + \left(1 - \frac{1}{2N_e^{(I)}}\right)F_t$$

This equation tells us that every generation, $F$ creeps a little closer to $1$. The change is approximately $\Delta F \approx \frac{1}{2N_e^{(I)}}$ when $F$ is small. This is why conservation biologists are so concerned with effective population size. A small $N_e$ means a rapid increase in $F$, a rapid loss of heterozygosity, and an increased risk of inbreeding depression, which can spell doom for an endangered species. Identity by descent, which began as a simple idea to distinguish two John Smiths, ends up being the master variable that governs the genetic fate of entire populations.
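Iterating the recurrence shows how quickly drift accumulates IBD. This is a minimal sketch with an illustrative function name; the effective size of 50 echoes the island example and is otherwise arbitrary.

```python
def drift_inbreeding(Ne, generations, F0=0.0):
    """Iterate F_{t+1} = 1/(2 Ne) + (1 - 1/(2 Ne)) * F_t and return [F_0, ..., F_T]."""
    if Ne <= 0:
        raise ValueError("Ne must be positive")
    step = 1.0 / (2.0 * Ne)
    trajectory = [F0]
    for _ in range(generations):
        trajectory.append(step + (1.0 - step) * trajectory[-1])
    return trajectory


traj = drift_inbreeding(Ne=50, generations=100)
for t in (1, 10, 50, 100):
    print(t, round(traj[t], 4))
```

The recurrence has the closed form $1 - F_t = \left(1 - \frac{1}{2N_e^{(I)}}\right)^t (1 - F_0)$, so the expected heterozygosity decays geometrically at a rate set entirely by the effective size.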

Applications and Interdisciplinary Connections

Now that we have grasped the machinery of Identity by Descent (IBD)—the simple yet profound idea that two alleles can be physical copies of a single ancestral piece of DNA—we can embark on a journey. We will see how this one concept, like a master key, unlocks doors across the vast mansion of biological science. It is far from an abstract accounting tool for geneticists; it is a fundamental principle that explains the intimate details of our health, the structure of animal societies, the grand drama of evolution, and even the whispers we can coax from the bones of our ancestors. The trail of IBD leads everywhere.

The Double-Edged Sword in Our Genes: Medicine and Forensics

Perhaps the most personal place we encounter the consequences of IBD is in our own health. We all carry a handful of "sleeping" recessive alleles—variants that could cause disease but are harmless as long as they are paired with a normal, functional copy. In a large, randomly-mating population, the chances of two individuals carrying the same rare recessive allele meeting and having a child are exceedingly small.

But what happens when parents are related? Their child has a non-zero chance of inheriting two alleles at a locus that are identical by descent. This probability is precisely the inbreeding coefficient, $F$, which equals the kinship of the parents. It can be calculated by tracing paths through a family tree; in the classic case of first cousins, $F = 1/16$. Inbreeding doesn't create new faulty alleles, but it dramatically increases the chance that two sleeping copies will meet in the same person, awakening a recessive disease. The rarer the disease-causing allele is in the general population, the more dramatic this increase in relative risk becomes. Inbreeding acts as a concentrating lens for genetic risk, a fact that is of paramount importance in genetic counseling.

This principle isn't limited to severe diseases. Consider the Rh blood group system. A child being Rh-negative (genotype $dd$) when the $d$ allele is rare in the population is an unlikely event. If that child's parents are first cousins, however, we can use the logic of IBD to trace the likely path of that rare allele through the family. We can even calculate the probability that the child's two $d$ alleles are not just identical by state, but are in fact IBD copies inherited from a specific grandparent, providing a powerful window into the flow of genetic information through generations.
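The claim that rarer alleles show a larger relative increase follows directly from the genotype frequency $q^2 + Fpq$ for the recessive homozygote. The sketch below uses illustrative function names and allele frequencies chosen only as examples.

```python
def recessive_risk(q, F):
    """Probability of the recessive homozygote: q^2 + F * q * (1 - q)."""
    return q * q + F * q * (1.0 - q)


def relative_risk(q, F):
    """Risk for a child with inbreeding coefficient F relative to a non-inbred child."""
    return recessive_risk(q, F) / recessive_risk(q, 0.0)


F_first_cousins = 1.0 / 16.0
for q in (0.1, 0.01, 0.001):
    print(q, round(relative_risk(q, F_first_cousins), 2))
```

Algebraically the ratio is $1 + F(1-q)/q$, so for a child of first cousins it is about 7 when $q = 0.01$ and about 63 when $q = 0.001$.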

The same logic that helps us understand family resemblances can also be used to establish identity with astonishing precision. In forensic science, a DNA match between a crime scene sample and a suspect can be powerful evidence. But a crucial question must be asked: what is the probability that an unrelated person would match by chance? To answer this, forensic geneticists use reference databases of allele frequencies.

Here, a subtle application of IBD becomes critical. If a suspect belongs to a small, relatively isolated sub-population, the assumption of random mating may not hold. Individuals in that group are, on average, more related to each other than to a random person from the general population. They share a higher "background" level of IBD. To account for this, forensic analysts apply a "theta-correction" ($\theta$). This $\theta$ is nothing more than an inbreeding coefficient for the sub-population, quantifying the probability that two alleles in an individual are IBD due to shared ancestry within that group. Applying this correction gives a more conservative, and therefore more scientifically honest, estimate of the genotype frequency. It acknowledges that two alleles might look the same (identity by state) precisely because they are descended from a common ancestor (identity by descent), making the genotype less rare in that specific group than the broad population data might suggest.
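The article does not say which formula analysts use, so the following sketch is an assumption: it implements one widely used form of the correction, the Balding-Nichols conditional match probability, in which the chance that a second person from the same sub-population carries the suspect's genotype is adjusted by $\theta$. The function name and the example frequencies are illustrative.

```python
def match_probability(p_i, p_j=None, theta=0.0):
    """Theta-corrected genotype match probability (Balding-Nichols form; an assumption here).

    Pass only p_i for a homozygous genotype, or p_i and p_j for a heterozygote.
    With theta = 0 this reduces to p_i**2 or 2 * p_i * p_j.
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    denom = (1.0 + theta) * (1.0 + 2.0 * theta)
    if p_j is None:
        return (2.0 * theta + (1.0 - theta) * p_i) * (3.0 * theta + (1.0 - theta) * p_i) / denom
    return 2.0 * (theta + (1.0 - theta) * p_i) * (theta + (1.0 - theta) * p_j) / denom


for theta in (0.0, 0.01, 0.03):
    print(theta, match_probability(0.1, theta=theta), match_probability(0.1, 0.2, theta=theta))
```

In this example, raising $\theta$ raises the match probability for both genotypes, which is the conservative direction described above: the evidence is treated as somewhat less rare than the uncorrected frequencies suggest.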

The Architect of Societies and the Savior of Species

Moving from individuals to populations, we find that IBD is a chief architect of the social lives of animals. The puzzle of altruism (why would an animal risk its own life or sacrifice its own reproduction to help another?) was largely solved by W. D. Hamilton with the concept of inclusive fitness. The famous Hamilton's rule, $rB > C$, states that an altruistic act is favored by selection if the benefit to the recipient ($B$), weighted by the relatedness of the actor to the recipient ($r$), exceeds the cost to the actor ($C$).

That all-important coefficient of relatedness, $r$, is a direct measure of IBD. It can be seen as the probability that a gene in the actor is an identical-by-descent copy of a gene in the recipient. For first cousins, for instance, we can trace the paths of inheritance from their shared grandparents to find that $r = 1/8$. From a "gene's-eye view," helping a relative is a form of self-interest, as it promotes the survival of its own IBD copies residing in another body.
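Hamilton's rule is a one-line inequality, but plugging in numbers shows how demanding it is for distant kin. In this sketch the function name is illustrative, and the benefit and cost values are arbitrary fitness units chosen for the example.

```python
from fractions import Fraction


def altruism_favored(r, benefit, cost):
    """Hamilton's rule: the act is favored when r * B > C."""
    return r * benefit > cost


r_cousins = Fraction(1, 8)
for benefit in (4, 8, 9):
    print("B =", benefit, "C = 1 ->", altruism_favored(r_cousins, benefit, 1))
```

With $r = 1/8$, the benefit to a first cousin must be more than eight times the cost to the actor before the act is favored.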

This logic finds its most spectacular confirmation in the Hymenoptera (ants, bees, and wasps). Due to their strange "haplodiploid" genetic system, where males are haploid (from unfertilized eggs) and females are diploid, a startling asymmetry in relatedness emerges. While diploid full sisters (like in humans or termites) share, on average, half their genes ($r = 1/2$), haplodiploid full sisters are "super-sisters," sharing a remarkable three-quarters of their genes ($r = 3/4$). A female is more related to her sister than she would be to her own offspring! This high relatedness provides a powerful evolutionary incentive for a female worker to forgo her own reproduction and instead dedicate her life to helping her mother, the queen, produce more sisters. The concept of IBD elegantly explains the origin of the highly organized and seemingly selfless eusocial colonies that dot the natural world.
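The three-quarters figure comes from splitting a female's genome into its paternal and maternal halves. The sketch below (illustrative function name, assuming an outbred, singly mated mother) asks, for a gene picked at random from one sister, how likely the other sister is to carry an IBD copy.

```python
from fractions import Fraction


def sister_relatedness(father_ploidy):
    """Relatedness between full sisters for a haploid (1) or diploid (2) father."""
    if father_ploidy not in (1, 2):
        raise ValueError("father_ploidy must be 1 (haploid) or 2 (diploid)")
    # A haploid father passes his single allele to every daughter;
    # a diploid father passes a given allele with probability 1/2.
    share_paternal = Fraction(1, father_ploidy)
    # The diploid mother passes a given allele with probability 1/2.
    share_maternal = Fraction(1, 2)
    # Half of a daughter's genes are paternal, half maternal.
    return Fraction(1, 2) * share_paternal + Fraction(1, 2) * share_maternal


print("haplodiploid sisters:", sister_relatedness(1))
print("diploid sisters:", sister_relatedness(2))
```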

While IBD can build societies, it can also threaten the very existence of species. In small, isolated populations of endangered animals, mates are often related. This leads to a steady increase in the average inbreeding coefficient $F$ of the population. The consequence is "inbreeding depression": as IBD becomes more common, deleterious recessive alleles are expressed more often, leading to lower fertility, higher infant mortality, and greater susceptibility to disease. This results in a decline in the mean fitness of the population.

Fortunately, an understanding of IBD also provides the solution. Conservation managers can plan a "genetic rescue" by introducing a small number of unrelated individuals into the threatened population. Using the mathematics of IBD, they can calculate the minimum fraction of immigrants needed to reduce the population's average $F$ below a critical threshold in a single generation. This is IBD not as a passive descriptor, but as an active, quantitative tool used on the front lines of the fight to preserve biodiversity.
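The article does not give the rescue calculation itself, so the sketch below rests on a deliberately simple model that should be read as an assumption: immigrants make up a fraction $m$ of the breeding pool, mating is random, any offspring with at least one immigrant parent has $F = 0$, and offspring of two residents have the current resident average. Under that model the next generation's average is $(1 - m)^2 F$. Function names are illustrative.

```python
import math


def next_generation_F(F_resident, m):
    """Average F after one generation of random mating with immigrant fraction m (toy model)."""
    return (1.0 - m) ** 2 * F_resident


def min_immigrant_fraction(F_resident, F_target):
    """Smallest m with (1 - m)^2 * F_resident <= F_target under the toy model."""
    if not 0.0 < F_resident <= 1.0:
        raise ValueError("F_resident must lie in (0, 1]")
    if F_target < 0.0:
        raise ValueError("F_target must be non-negative")
    if F_target >= F_resident:
        return 0.0  # already at or below the threshold
    return 1.0 - math.sqrt(F_target / F_resident)


m = min_immigrant_fraction(F_resident=0.25, F_target=0.16)
print("immigrant fraction needed:", round(m, 3))
print("resulting average F:", round(next_generation_F(0.25, m), 3))
```

A real management plan would also have to account for differences in reproductive success, overlapping generations, and the relatedness of the immigrants themselves, none of which this toy model captures.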

Reading the Past and Redefining the Present: The Frontiers of IBD

The power of IBD extends even into the deep past. With the advent of paleogenomics, we can now extract and sequence DNA from ancient bones and teeth. This data is often of low quality—fragmented, scarce, and riddled with chemical damage. How, from such noisy data, can we determine if two Neanderthals buried near each other were siblings, or a parent and child, or merely unrelated tribe members? We cannot draw a pedigree.

The modern solution is to estimate IBD in a more nuanced, statistical way. Instead of a single relatedness value, researchers estimate the probabilities that two individuals share 0, 1, or 2 alleles IBD, denoted $k_0$, $k_1$, and $k_2$. Using sophisticated likelihood models that explicitly account for sequencing errors, post-mortem damage, and population allele frequencies, scientists can infer these IBD states directly from the messy genomic data. This allows for the reconstruction of family structures and social dynamics in populations that have been extinct for tens of thousands of years.

This statistical view of IBD leads us to a final, profound generalization. What if we redefine relatedness itself, freeing it from the need for a family tree? In its most modern form, relatedness is cast as a regression coefficient:

$$r = \frac{\mathrm{Cov}(G_{\text{actor}}, G_{\text{recipient}})}{\mathrm{Var}(G_{\text{actor}})}$$

In plain English, this simply asks: to what extent does knowing the genetic makeup of an actor predict the genetic makeup of a recipient? This statistical association doesn't have to come from a neat, well-defined pedigree. It can arise from any process that makes interacting individuals genetically non-random: limited dispersal (neighbors tend to be kin), assortative interactions ("green-beard" effects where individuals with a gene for altruism recognize and help others with the same gene), or any form of population structure. This definition works for microbes in a biofilm just as well as it does for lions in a pride.
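The regression definition can be computed from nothing more than paired genetic values. In the sketch below the function name and the data are hypothetical: each entry is the frequency of a focal allele within one individual (0, 0.5, or 1), listed for actors and for the recipients they interact with.

```python
def regression_relatedness(actor, recipient):
    """Slope of recipient genetic value on actor genetic value: Cov / Var."""
    n = len(actor)
    if n != len(recipient) or n < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    mean_a = sum(actor) / n
    mean_r = sum(recipient) / n
    cov = sum((a - mean_a) * (r - mean_r) for a, r in zip(actor, recipient)) / n
    var = sum((a - mean_a) ** 2 for a in actor) / n
    if var == 0:
        raise ValueError("actor genetic values have no variance")
    return cov / var


# Hypothetical interaction pairs (illustrative values, not real data).
actors = [0.0, 0.5, 1.0, 0.5, 0.0, 1.0]
recipients = [0.0, 0.5, 0.5, 0.5, 0.5, 1.0]
print("estimated r:", regression_relatedness(actors, recipients))
```

If recipients were genetically identical to their actors the slope would be 1, and if partners were paired without regard to genotype it would be expected to be near 0.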

Here, the concept of IBD completes its transformation: from a simple rule about inheritance in families, it becomes a powerful, general measure of statistical genetic association, applicable across all life.

From the quiet consultation rooms of genetic counselors to the bustling societies of ants, from the desperate efforts to save the last members of a species to the ghostly genomes of our ancient cousins, Identity by Descent is the common thread. It is a testament to the beauty of science that such a simple idea—that two things can be copies of one original—can yield such a rich and intricate understanding of the living world.