Parents 1 and 2 have children. Assume an infinite, randomly mating population. How many generations until the median descendant by lineage of parent 1 has 0 base pairs inherited from parent 1? I tried calculating this, but it was difficult to model variability in crossover rates, and my computer is too slow for the Monte Carlo simulation I made. Here is my guess:

from math import ceil, log2

import random
import numpy as np

haploid_genome_size_bp = 3300 * 1e6
diploid_genome_size_bp = haploid_genome_size_bp * 2
chromosome_pairs = 23

# Recomb hotspots
hotspot_length = 1500 # Taking the average length of 1-2 kb, in bp
hotspot_interval = 75000 # Taking the average interval of 50-100 kb, in bp
hotspots_per_genome = 30000 # Number of hotspots in the human genome

# Average crossover frequency per hotspot
avg_crossover_per_hotspot = 1 / 1300 # One crossover per 1300 meioses

total_hotspot_length_bp = hotspots_per_genome * hotspot_length

initial_genome_fraction = 1 # Starting with 100% of the genome from Parent 1
generations = 0 # Counter for the number of generations

# Loop to calculate the number of generations required for the descendant to have less than 1 bp from Parent 1
while initial_genome_fraction * diploid_genome_size_bp > 1:
    # Introduce variance in the average number of crossovers per chromosome, for probably needless complication
    # Using a Gaussian distribution centered around the average, with a standard deviation of 0.5
    avg_crossover_per_chromosome = random.gauss(1.25, 0.5)  # this doesn't actually affect the result

    # Total length affected by crossovers in one meiosis event, because of hotspots
    total_crossover_length_bp = total_hotspot_length_bp * avg_crossover_per_hotspot * avg_crossover_per_chromosome * chromosome_pairs

    # Calculate the fraction of the genome that is affected by crossovers and recombination in each generation
    fraction_swapped_each_generation = 2 * (total_crossover_length_bp / diploid_genome_size_bp)

    # Update the fraction of the genome from Parent 1 in the descendant
    initial_genome_fraction *= 0.5 * (1 - fraction_swapped_each_generation) + 0.5 * fraction_swapped_each_generation / 2

    generations += 1

print(generations)

Result: 33 generations. In practice, information is lost sooner: non-SNP positions are (approximately) shared between all individuals, so they carry no information about ancestry. Let's say there are 600 million SNPs, randomly distributed throughout the genome, and assume that parent 1 differs from the entire rest of the population at, say, 70% of SNPs (this value doesn't matter much; the result is always 29 or 30).

from math import ceil, log2
import random
import numpy as np

total_snps = 600e6
haploid_genome_size_bp = 3300 * 1e6
diploid_genome_size_bp = haploid_genome_size_bp * 2
chromosome_pairs = 23

hotspot_length = 1500
hotspot_interval = 75000
hotspots_per_genome = 30000

avg_crossover_per_hotspot = 1 / 1300

total_hotspot_length_bp = hotspots_per_genome * hotspot_length

initial_snps_fraction = 0.7
generations = 0

while initial_snps_fraction * total_snps > 1:
    # Draw the crossovers per chromosome from a Gaussian (mean 1.25, standard deviation 0.1)
    avg_crossover_per_chromosome = random.gauss(1.25, 0.1)
    total_crossover_length_bp = total_hotspot_length_bp * avg_crossover_per_hotspot * avg_crossover_per_chromosome * chromosome_pairs
    fraction_swapped_each_generation = 2 * (total_crossover_length_bp / diploid_genome_size_bp)
    initial_snps_fraction *= 0.5 * (1 - fraction_swapped_each_generation) + 0.5 * fraction_swapped_each_generation / 2
    generations += 1
print(generations)

Result: 29 generations. The result is sensitive to the number of SNPs in existence: if we use 1 million SNPs, we get 15 generations. Is this right? Can we find a more accurate estimate? My code probably has some mistakes.

Asked by Krishnab (7.7k points)

1 Answer

1 vote

Final answer:

The question asks about the number of generations until the median descendant by lineage of parent 1 has 0 base pairs inherited from parent 1. The provided code attempts to estimate this, but has some errors and limitations. A more accurate estimate can be obtained by considering recombination frequency and genetic distance, but precise calculations require a more detailed model.

Step-by-step explanation:

The question asks how many generations it takes for the median descendant of Parent 1 to carry 0 base pairs inherited from Parent 1 in an infinite, randomly mating population. The asker mentions that their Monte Carlo simulation was too slow; the code they post is instead a simple loop that multiplies the remaining fraction by roughly one half each generation. That loop has limitations: it tracks the expected fraction rather than the median descendant, and the hotspot terms change the per-generation factor only slightly. It estimates 33 generations for the whole genome, and the asker notes that information is lost sooner when only SNPs are counted (29 generations for 600 million SNPs), with the result depending on the number of SNPs assumed.
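
As a rough check on those numbers, the loop in the question can be collapsed into a closed form. This is only a sketch, not the asker's code: the function names `swapped_fraction` and `generations_until_below_one` are made up for illustration, and the mean crossover count (1.25) is used in place of any random draw.

```python
import math


def swapped_fraction(hotspots=30000, hotspot_length=1500,
                     crossover_per_hotspot=1 / 1300,
                     crossovers_per_chromosome=1.25,
                     chromosome_pairs=23,
                     diploid_genome_size_bp=2 * 3300e6):
    """Fraction of the genome swapped per generation, as defined in the question."""
    total_crossover_length_bp = (hotspots * hotspot_length * crossover_per_hotspot
                                 * crossovers_per_chromosome * chromosome_pairs)
    return 2 * total_crossover_length_bp / diploid_genome_size_bp


def generations_until_below_one(initial_count, fraction_swapped):
    """Smallest g such that initial_count * factor**g <= 1 (hypothetical helper)."""
    # Same per-generation factor as the loop in the question; it equals 0.5 - f/4.
    factor = 0.5 * (1 - fraction_swapped) + 0.5 * fraction_swapped / 2
    return max(0, math.ceil(math.log(initial_count) / -math.log(factor)))


f = swapped_fraction()
print(generations_until_below_one(2 * 3300e6, f))   # whole diploid genome, in bp
print(generations_until_below_one(0.7 * 600e6, f))  # 70% of 600 million SNPs
print(generations_until_below_one(0.7 * 1e6, f))    # 70% of 1 million SNPs
```

With the question's inputs this prints 33, 29 and 20. The first two match the reported results; the third suggests that the 15 quoted for 1 million SNPs came from different inputs. Because the per-generation factor is so close to one half, the answer is essentially the base-2 logarithm of the starting count, rounded up, which is why the hotspot parameters barely matter.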

A more accurate estimate would work with recombination frequency and genetic distance rather than raw hotspot lengths. Recombination frequency is the proportion of recombinant (nonparental) gametes, and it indicates the genetic distance between loci on a chromosome. The hotspot count and the average crossover frequency per hotspot from the question can then be used to estimate how much of the genome is affected by crossovers in each generation.

By incorporating the number of SNPs and applying recombination frequency, a more accurate estimate of the number of generations can be obtained. However, precise calculations would require a more detailed model and analysis, including variations in crossover rates and other factors affecting genetic inheritance.
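
One way such a more detailed model might look is a block-counting approximation, sketched below. It is a different technique from the hotspot loop in the question and rests on assumptions that are not in the original post: 22 autosomes, roughly 33 crossovers per meiosis, each ancestral block surviving a generation with probability one half, and a Poisson approximation for the number of surviving blocks. The function names are hypothetical.

```python
import math


def expected_blocks(k, autosomes=22, crossovers_per_meiosis=33):
    """Expected number of autosomal blocks inherited from one ancestor k generations back.

    Assumption: about (autosomes + crossovers * (k - 1)) candidate blocks,
    each passed down the lineage with probability 1 / 2**(k - 1).
    """
    return (autosomes + crossovers_per_meiosis * (k - 1)) / 2 ** (k - 1)


def prob_no_dna(k, **kwargs):
    """Poisson approximation: probability that zero blocks survive after k generations."""
    return math.exp(-expected_blocks(k, **kwargs))


def first_generation_median_zero(max_k=60, **kwargs):
    """First k at which more than half of descendants are expected to carry no blocks."""
    for k in range(1, max_k + 1):
        if prob_no_dna(k, **kwargs) > 0.5:
            return k
    return None


for k in range(8, 12):
    print(k, round(expected_blocks(k), 3), round(prob_no_dna(k), 3))
print(first_generation_median_zero())
```

Under these assumptions the threshold is crossed at generation 10, far earlier than the 29 to 33 generations obtained by halving base pairs, because DNA is lost in whole blocks rather than one base pair at a time. The figure should be read as an order-of-magnitude estimate; it shifts with the assumed crossover count and ignores the sex chromosomes.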

Answered by Fulv (7.4k points)