221k views
3 votes
I am looking at the mouse reference genome in combination with the Ensembl annotation and am finding many transcripts that have no start codon.

For example, take the transcript ENSMUST00000193149 at position 165503943-165567747. Its first CDS has frame number 1. In the UCSC Genome Browser, one of the tracks shows the ADCY10 gene starting at this location, but in the protein sequence there is an X under the first base, followed by a T under the next three bases (which do indeed correspond to T).

I am puzzled: why does the transcript start at something other than the start codon? Why does the annotation indicate its start position as 165503943 instead of 165503944 where the actual codon starts?

User Phyllis
by
7.9k points

1 Answer

6 votes

Final answer:

A transcript may begin at something other than a canonical start codon because of regulatory features such as upstream open reading frames (uORFs) or a weak Kozak consensus sequence around the first AUG. In addition, pseudogenes and non-coding RNA elements in a genome annotation can have start sites that do not line up with the expected protein-coding sequence.

Step-by-step explanation:

A transcript that starts at a position other than the start codon can have several explanations. One is the presence of non-coding RNAs or upstream open reading frames (uORFs) with regulatory functions, which place the start site somewhere other than a typical start codon.

Additionally, eukaryotic translation does not always initiate at the first AUG codon; how efficiently initiation occurs at a given AUG depends on how well the surrounding sequence matches the Kozak consensus.

The coding sequence usually begins downstream of a 5' untranslated region (5' UTR), which may itself contain uORFs or regulatory elements.

The eukaryotic initiation complex recognizes the 7-methylguanosine cap at the 5' end and then scans the mRNA in the 5' to 3' direction. It may bypass some AUG codons until it reaches one whose surrounding sequence matches the Kozak consensus well.
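The scanning behaviour described above can be illustrated with a small Python sketch. It is a deliberately simplified model, not a predictor of real initiation sites: it assumes that an AUG is in a "strong" context when it has a purine at position -3 and a G at position +4 (the two most conserved positions of the Kozak consensus), and the function names and the toy sequence are made up for illustration.

```python
def kozak_strength(seq: str, atg_pos: int) -> str:
    """Classify the context of the ATG whose 'A' is at 0-based index atg_pos.

    Simplified rule (an assumption of this sketch): purine at -3 and G at +4
    is "strong", one of the two is "adequate", neither is "weak".
    """
    minus3 = seq[atg_pos - 3] if atg_pos >= 3 else ""
    plus4 = seq[atg_pos + 3] if atg_pos + 3 < len(seq) else ""
    purine_at_minus3 = minus3 in ("A", "G")
    g_at_plus4 = plus4 == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


def scan_for_starts(mrna: str) -> list:
    """Walk 5' to 3' and report every ATG with its context strength."""
    seq = mrna.upper().replace("U", "T")
    candidates = []
    pos = seq.find("ATG")
    while pos != -1:
        candidates.append((pos, kozak_strength(seq, pos)))
        pos = seq.find("ATG", pos + 1)
    return candidates


def first_strong_start(mrna: str):
    """Return the index of the first ATG in a strong context, or None."""
    for pos, strength in scan_for_starts(mrna):
        if strength == "strong":
            return pos
    return None


# Toy sequence: the first ATG sits in a weak context, the second in a strong one.
toy = "CCTTTATGCCCGCCACCATGGCG"
print(scan_for_starts(toy))     # [(5, 'weak'), (17, 'strong')]
print(first_strong_start(toy))  # 17
```

In this toy case the first ATG is skipped and the second one is reported, which mirrors the idea that the first AUG is not always the one that is used.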

Apparent discrepancies can also result from misannotations, or from looking at alternate transcripts, pseudogenes, or non-coding RNA elements rather than the main protein-coding transcript.
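One way to check whether an apparent discrepancy comes from the annotation record itself is to read the frame column of the CDS line. In GTF files the eighth column gives the number of bases (0, 1, or 2) that precede the first complete codon of that CDS feature, so a first CDS with frame 1 means the first full codon starts one base after the feature start. The sketch below shows that arithmetic. The GTF line is a made-up example: only the start coordinate, the frame value, and the transcript ID come from the question, while the sequence name, the plus strand, and the CDS end coordinate are placeholders.

```python
def parse_gtf_cds(line: str) -> dict:
    """Parse the nine tab-separated columns of one GTF line."""
    fields = line.rstrip("\n").split("\t")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields[:9]
    return {
        "seqname": seqname,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "frame": int(frame) if frame in ("0", "1", "2") else None,
        "attributes": attributes,
    }


def first_complete_codon_start(cds: dict):
    """Genomic position of the first base of the first complete codon."""
    if cds["frame"] is None:
        return None
    if cds["strand"] == "+":
        return cds["start"] + cds["frame"]
    # On the minus strand the CDS begins at the feature's end coordinate.
    return cds["end"] - cds["frame"]


# Hypothetical line: seqname, strand, and end coordinate are placeholders.
example_line = (
    "chrN\tensembl\tCDS\t165503943\t165504000\t.\t+\t1\t"
    'transcript_id "ENSMUST00000193149";'
)

cds = parse_gtf_cds(example_line)
print(first_complete_codon_start(cds))  # 165503944
```

Under these assumptions the result matches the position noted in the question, where the first whole codon begins at 165503944. A non-zero frame on the first CDS usually indicates a transcript whose coding start was not found rather than an unusual start codon. Ensembl and GENCODE GTF files can flag such records with tags like cds_start_NF; whether this particular transcript carries one was not checked here.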

Such transcripts may have alternate start points that do not give rise to a conventional protein, but they may still contain small open reading frames (smORFs) or serve other regulatory functions.
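As a rough illustration of what looking for smORFs means in practice, the following Python sketch lists ATG-to-stop open reading frames on the forward strand that fall under a length cutoff. The cutoff of 100 codons is a commonly used convention that is assumed here, not something stated in the answer, and the function name and toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq: str, max_codons: int = 100, min_codons: int = 2) -> list:
    """Return (start, end, orf_sequence) for short forward-strand ORFs.

    start and end are 0-based, end-exclusive, and the ORF includes its stop
    codon. The codon count used for filtering excludes the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Extend codon by codon until an in-frame stop or the end.
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(seq):  # an in-frame stop codon was found
                    n_codons = (j - i) // 3
                    if min_codons <= n_codons <= max_codons:
                        orfs.append((i, j + 3, seq[i:j + 3]))
                    i = j + 3  # resume after the stop codon
                    continue
            i += 3
    return orfs


toy = "CCATGAAACCCTAGGG"
print(find_small_orfs(toy))  # [(2, 14, 'ATGAAACCCTAG')]
```

A real analysis would also scan the reverse strand and would typically require supporting evidence, such as ribosome profiling data, before calling a smORF translated.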

Recent discoveries have shown that some long non-coding RNAs can be translated into short peptides, which indicates that our understanding of the genome is still evolving.

User Mazen Harake
by
8.0k points