Final answer:
The reads give 20X coverage, meaning each base has been sequenced 20 times on average. Because the fragments are selected at random, some bases may still be missed. However, under the standard random-sampling (Poisson) model, the expected unsequenced fraction is e^(-20) ≈ 2.1 × 10^-9, which is about 0.2 bases out of 100,000,000. Essentially the entire genome will have been sequenced.
Step-by-step explanation:
To determine what proportion of a 100,000,000-base genome will not have been sequenced after sequencing a 500-base portion of each of 4,000,000 randomly selected DNA fragments, first compute the total number of bases sequenced:
Total sequenced bases = Number of fragments × Length of each sequence = 4,000,000 × 500 = 2,000,000,000 bases
Dividing by the genome size of 100,000,000 bases gives the average coverage (sequencing depth), not a proportion:
Coverage c = Total sequenced bases / Genome size = 2,000,000,000 / 100,000,000 = 20
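The arithmetic above can be checked with a few lines of Python. The variable names are illustrative only.

```python
# Sketch: average coverage (depth) from the numbers in the question.
genome_size = 100_000_000      # bases in the genome
num_fragments = 4_000_000      # randomly selected fragments
read_length = 500              # bases sequenced per fragment

total_bases = num_fragments * read_length   # total bases read
coverage = total_bases / genome_size         # average depth, not a fraction
print(total_bases, coverage)                 # 2000000000 20.0
```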
A value of 20 cannot be a proportion, because no more than 100% of the genome can be sequenced. Instead, it means the reads overlap so that each base has been sequenced about 20 times on average (20X coverage). Because the fragments are selected at random, coverage varies from base to base, and some bases may not have been sequenced at all.
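A small Monte Carlo sketch shows how random placement leaves gaps even when average depth is well above 1. It uses hypothetical, scaled-down parameters: a 1,000,000-base circular genome at 3X coverage. A circular genome is assumed to avoid edge effects. At 20X, gaps would be too rare to see in a small simulation, and simulating 10^8 bases in pure Python would be slow. The simulated uncovered fraction should land close to the Poisson prediction e^(-3) ≈ 0.05.

```python
import math
import random

def simulate_unsequenced_fraction(genome_size: int, num_reads: int,
                                  read_length: int, seed: int = 0) -> float:
    """Place reads uniformly at random on a circular genome (an assumption)
    and return the fraction of bases covered by no read."""
    rng = random.Random(seed)
    diff = [0] * (genome_size + 1)           # difference array for coverage depth
    for _ in range(num_reads):
        start = rng.randrange(genome_size)   # uniform random start position
        end = start + read_length
        diff[start] += 1
        if end <= genome_size:
            diff[end] -= 1
        else:                                # read wraps past the end of the circle
            diff[genome_size] -= 1
            diff[0] += 1
            diff[end - genome_size] -= 1
    depth = 0
    uncovered = 0
    for i in range(genome_size):             # prefix sum gives depth at each base
        depth += diff[i]
        if depth == 0:
            uncovered += 1
    return uncovered / genome_size

G, L, c = 1_000_000, 500, 3                  # hypothetical small-scale parameters
N = c * G // L                               # 6000 reads for 3X coverage
frac = simulate_unsequenced_fraction(G, N, L)
print(frac, math.exp(-c))
```

The difference array keeps each read placement O(1), so the cost is dominated by one pass over the genome.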
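To estimate the proportion not sequenced, assume the fragments start at independent, uniformly random positions (the standard Lander-Waterman assumption). A given 500-base read covers a particular base with probability 500 / 100,000,000 = 5 × 10^-6. With 4,000,000 reads, the number of reads covering that base is therefore approximately Poisson-distributed with mean c = 20. The probability that a base is covered by no read is P(0) = e^(-c) = e^(-20) ≈ 2.1 × 10^-9. Multiplying by the genome size gives an expected 100,000,000 × 2.1 × 10^-9 ≈ 0.2 unsequenced bases. Real libraries have fragmentation and sequencing biases that make coverage less uniform than this model assumes, so actual data usually leave somewhat more gaps. Still, under random sampling as the question describes, virtually none of the genome remains unsequenced.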
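The Poisson estimate can be compared with the direct per-read calculation. Under the same uniform-placement assumption and ignoring the genome ends, each read misses a fixed base with probability 1 - L/G, so all N reads miss it with probability (1 - L/G)^N. The function names below are hypothetical.

```python
import math

def unsequenced_fraction_poisson(coverage: float) -> float:
    # Lander-Waterman / Poisson model: P(a base is covered 0 times) = e^(-c)
    return math.exp(-coverage)

def unsequenced_fraction_per_read(genome_size: int, num_reads: int,
                                  read_length: int) -> float:
    # Each read independently misses a fixed base with probability 1 - L/G
    # (genome ends ignored), so all N reads miss it with (1 - L/G)^N.
    return (1 - read_length / genome_size) ** num_reads

G, N, L = 100_000_000, 4_000_000, 500    # values from the question
c = N * L / G                             # average coverage, 20.0
p_poisson = unsequenced_fraction_poisson(c)
p_per_read = unsequenced_fraction_per_read(G, N, L)
expected_missing_bases = p_poisson * G

print(f"{p_poisson:.3e} {p_per_read:.3e} {expected_missing_bases:.2f}")
```

Both estimates agree to about three significant figures (roughly 2.061 × 10^-9). The Poisson form e^(-c) is the usual shorthand because it depends only on the coverage.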