34.5k views
5 votes
In the context of rare diseases, what sequencing coverage is sufficient to detect mutated loci (e.g. SMN1/SMN2 CNVs, SNPs, INDELs)? How does this relate to statistics, and how can one be confident in the results? Most commercial sequencing services are marketed as promising 95-99% at 20-30X. My question is how we arrive at these values. For example, why is 99% at 20X considered ideal, and how are such values derived mathematically?

User Darx
by
8.7k points

1 Answer

3 votes

Final answer:

Sufficient sequencing coverage for detecting rare disease mutations such as SMN1/SMN2 CNVs, SNPs, and INDELs is typically quoted as 20-30X mean depth. Higher coverage makes it more likely that a true variant is seen in enough reads to be called reliably, which is essential for accurate diagnosis and treatment. These values are derived from statistical models that account for sequencing error rates and for the fraction of reads expected to carry the variant.

Step-by-step explanation:

In the context of rare diseases, the coverage needed to detect mutated loci (SMN1/SMN2 CNVs, SNPs, INDELs, etc.) is a statistical question. The goal of whole-genome sequencing is to identify the genetic variants that cause disease. Because the human genome is so large, both the sequencing coverage and the statistical confidence in each call become crucial.

Most commercial sequencing services advertise 95-99% detection at 20-30X coverage, where 20-30X means that each base of the genome is read 20 to 30 times on average. High coverage helps ensure that true variants are detected and that random sequencing errors can be told apart from real variants. The coverage required depends on the sequencing technology used and on the reliability the study needs. Next-generation sequencing technologies have lowered costs and increased the speed of genome sequencing, making it a powerful tool for identifying disease-causing mutations.
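As a rough illustration of where an "X% at NX" figure can come from, here is a minimal sketch that assumes reads land uniformly at random, so per-base depth follows a Poisson distribution (the classic Lander-Waterman idealization). The function names and the read count, read length, and genome size below are illustrative assumptions, not values from any specific service. Real data is more variable than Poisson (GC bias, repeats, capture efficiency), so these numbers are optimistic.

```python
import math

def mean_coverage(n_reads, read_len, genome_size):
    """Mean depth C = N * L / G (number of reads * read length / genome size)."""
    return n_reads * read_len / genome_size

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

def fraction_covered_at_least(min_depth, mean_depth):
    """Expected fraction of bases with depth >= min_depth under the Poisson assumption."""
    if min_depth <= 0:
        return 1.0
    return 1.0 - poisson_cdf(min_depth - 1, mean_depth)

if __name__ == "__main__":
    # Hypothetical run: 600 million reads of 150 bp on a 3 Gb genome -> 30X mean depth.
    c = mean_coverage(600_000_000, 150, 3_000_000_000)
    for k in (10, 20):
        print(f"mean {c:.0f}X: fraction of bases at >= {k}X = "
              f"{fraction_covered_at_least(k, c):.4f}")
```

Under this idealized model, a 30X mean depth leaves roughly 97-98% of bases at 20X or more, which is the same order as the advertised figures.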

From a statistical perspective, higher coverage increases the confidence in detecting true genetic variants while reducing both false positives and false negatives. Values such as "99% confidence at 20X coverage" are derived from mathematical models that consider the sequencing error rate, the fraction of reads expected to carry the variant (about half for a heterozygous germline variant), and the accuracy needed for a correct diagnosis. Physicians and researchers can therefore make more informed decisions about treatment and about the genetic basis of disease.
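One simple version of such a model, sketched here under stated assumptions: at a heterozygous site each read carries the variant with probability 0.5, and a caller reports the variant only if it sees at least some minimum number of variant reads. The threshold of 3 reads and the function names are hypothetical choices for illustration, and sequencing errors, mapping problems, and CNV-specific signals (such as read-depth ratios for SMN1/SMN2) are deliberately left out.

```python
import math

def binom_pmf(k, n, p):
    """P(exactly k successes in n trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def p_detect_at_depth(depth, min_alt_reads=3, alt_fraction=0.5):
    """P(at least min_alt_reads variant reads) at a site with a fixed depth."""
    if depth < min_alt_reads:
        return 0.0
    return 1.0 - sum(binom_pmf(k, depth, alt_fraction) for k in range(min_alt_reads))

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def p_detect_at_mean_coverage(mean_depth, min_alt_reads=3, alt_fraction=0.5):
    """Average the per-depth detection probability over Poisson-distributed depth."""
    # Truncate the sum far into the upper tail, where the Poisson mass is negligible.
    max_depth = int(mean_depth + 10 * math.sqrt(mean_depth) + 10)
    return sum(
        poisson_pmf(d, mean_depth) * p_detect_at_depth(d, min_alt_reads, alt_fraction)
        for d in range(max_depth + 1)
    )

if __name__ == "__main__":
    for depth in (10, 20, 30):
        print(f"fixed depth {depth}X: P(detect het) = {p_detect_at_depth(depth):.4f}")
    for mean in (10, 20, 30):
        print(f"mean depth {mean}X:  P(detect het) = {p_detect_at_mean_coverage(mean):.4f}")
```

With these assumptions, a heterozygous variant at exactly 20X is seen in at least 3 reads about 99.98% of the time, versus about 94.5% at 10X. Averaging over the depth distribution and adding an error model pulls the numbers down, which is one way a figure like "99% at 20X" can arise.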

User Manuel Mourato
by
7.6k points