What can be added to the code to get the number of CDS? I was thinking that the length might have something to do with it. Here is the original problem, so you know what the code was intended to do: Use Biopython to extract the coding sequence (CDS) features from the GenBank format file (NCBI accession NC_006273.2) and make a FASTA file with the RefSeq protein_id as the CDS identifier. The code below does this, but I also want to know the number of CDS.

User Nic Gibson

1 Answer


Final answer:

To count the number of coding sequences (CDS) in a Biopython script, add a counter that increments for each CDS feature found while parsing the GenBank file, then print the total once all features have been processed.

Step-by-step explanation:

To count the coding sequences (CDS) with Biopython, add a counter to your script. When you parse a GenBank file to extract CDS features, you already loop through every feature of every record. Initialize the counter to zero before the loop and increment it by one each time a feature's type is 'CDS'. After the loop, print the total or use it as needed. The process would look something like this:

from Bio import SeqIO

cds_counter = 0
for record in SeqIO.parse("NC_006273.2.gb", "genbank"):
    for feature in record.features:
        # Count only features whose type is exactly "CDS"
        if feature.type == "CDS":
            cds_counter += 1
print("Total number of CDS:", cds_counter)

This script prints the total number of CDS features after parsing the entire GenBank file. The comparison is exact and case-sensitive, so the string must match the feature key used in the file, which for GenBank records is 'CDS'.
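The counter above only reports the count, while the original task also asks for a FASTA file keyed by the RefSeq protein_id. Here is a minimal sketch that does both in one pass. It assumes the nucleotide sequence of each CDS is what should go into the FASTA file (the /translation qualifier would give the protein sequence instead). The function name `write_cds_fasta`, the placeholder ID, and the output filename are hypothetical. Because every CDS becomes exactly one FASTA record, the count is simply the length of the collected list, which is where the length idea from the question fits in.

```python
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord


def write_cds_fasta(genbank_records, out_handle):
    """Write every CDS feature as a FASTA entry named by its protein_id.

    Returns the number of CDS features written.
    """
    cds_records = []
    for record in genbank_records:
        for feature in record.features:
            if feature.type != "CDS":
                continue
            # protein_id is stored as a list of strings; the placeholder is a
            # hypothetical fallback for CDS features that lack one.
            protein_id = feature.qualifiers.get("protein_id", ["unknown_protein_id"])[0]
            # extract() follows the strand and any join(...) in the location.
            cds_seq = feature.extract(record.seq)
            cds_records.append(SeqRecord(cds_seq, id=protein_id, description=""))
    SeqIO.write(cds_records, out_handle, "fasta")
    # One FASTA record per CDS, so the count is the length of the list.
    return len(cds_records)


# Example usage, assuming the GenBank file was saved locally as
# "NC_006273.2.gb" (the output filename is hypothetical):
#
# with open("NC_006273.2_cds.fasta", "w") as out:
#     n_cds = write_cds_fasta(SeqIO.parse("NC_006273.2.gb", "genbank"), out)
# print("Total number of CDS:", n_cds)
```

Collecting the records in a list and calling SeqIO.write once keeps the counting and the writing in sync. SeqIO.write also returns the number of records it wrote, so that value could be used instead of len.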

User Cleanshooter