DNA is the fundamental encoding of the instructions that govern the operation of living cells and, by extension, biological organisms. You can think of DNA as a storage medium in which the program that executes within all of your cells is written. The "machine code" of DNA, corresponding to the byte-code of Java, consists of only four nucleotides arranged in a linear sequence along the DNA molecule. The four nucleotides are distinguished by their bases: guanine (G), adenine (A), thymine (T), and cytosine (C). So a DNA molecule can be represented as a string made up of those four letters. The science of bioinformatics is largely concerned with computations on such genetic strings, or sequences. There are a variety of computations one might perform on genetic sequences; we will investigate two types: basic statistics of individual sequences, and pairwise alignments used to compare pairs of sequences.
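Because a sequence is just a string over the alphabet {A, C, G, T}, checking that a string is a legal sequence can be expressed as a single pattern match. The sketch below is an illustration, not the assignment's required approach: the class `DnaAlphabet` and method `isDna` are hypothetical names, it assumes uppercase input only, and it treats the empty string as invalid. The answer below uses an equivalent character-by-character loop instead.

```java
public class DnaAlphabet {
    // Hypothetical helper: true if s is non-empty and contains only A, C, G, T.
    // String.matches succeeds only when the WHOLE string matches the regex.
    static boolean isDna(String s) {
        return s.matches("[ACGT]+");
    }

    public static void main(String[] args) {
        System.out.println(isDna("AAGGT")); // true
        System.out.println(isDna("AAGGU")); // false: U is not a DNA base
        System.out.println(isDna(""));      // false: '+' needs at least one base
    }
}
```

Lowercase input such as "aaggt" is rejected by this pattern; calling `s.toUpperCase()` first would accept it, if that is desired.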

Your program will first prompt the user to enter a single DNA sequence, which it should validate for legality (i.e., the sequence must contain only the four valid bases). You might do this validation by writing a function that takes a String as a parameter and returns a boolean. Re-prompt the user if the input is invalid. Once you have a valid input, compute the following statistics (each should be implemented as a separate function, called from main()).
1. Count the number of occurrences of "C".
2. Determine the fraction of nucleotides that are cytosine or guanine. For example, if half of the nucleotides in the sequence are either "C" or "G", the fraction should be 0.5.
3. A DNA molecule is actually made up of pairs of bases: in effect, two strands that are cross-linked together. These two strands are complementary: if you know one, you can always determine the other, or complement, because each nucleotide pairs with only one other. In particular, "A" and "T" are complements, as are "C" and "G". So, for example, the complement of the sequence "AAGGT" would be "TTCCA". Compute the complement of the input sequence.
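The three computations in the list above are each a single pass over the string. Here is a minimal sketch of them as separate static functions, assuming the sequence has already been validated (non-empty, only A/C/G/T). The class and method names (`DnaStats`, `countC`, `gcFraction`, `complement`) are hypothetical and chosen only for illustration.

```java
public class DnaStats {
    // Task 1: number of occurrences of 'C'.
    static int countC(String seq) {
        int count = 0;
        for (char base : seq.toCharArray()) {
            if (base == 'C') count++;
        }
        return count;
    }

    // Task 2: fraction of bases that are 'C' or 'G'.
    // Assumes a non-empty sequence (an empty one would give NaN).
    static double gcFraction(String seq) {
        int gc = 0;
        for (char base : seq.toCharArray()) {
            if (base == 'C' || base == 'G') gc++;
        }
        return (double) gc / seq.length();
    }

    // Task 3: complement, base by base (A<->T, C<->G), in the same order.
    static String complement(String seq) {
        StringBuilder out = new StringBuilder(seq.length());
        for (char base : seq.toCharArray()) {
            switch (base) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default: throw new IllegalArgumentException("not a DNA base: " + base);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String seq = "AAGGT";
        System.out.println(countC(seq));     // 0
        System.out.println(gcFraction(seq)); // 0.4
        System.out.println(complement(seq)); // TTCCA
    }
}
```

Throwing on an unexpected character is a design choice for the sketch: after validation it should never trigger, and it makes a skipped validation step fail loudly instead of silently producing a shorter complement.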





Step-by-step explanation:

```java
import java.util.Scanner;

class Dna {

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        boolean b = false; // boolean variable to check validity
        String s1 = "", s2 = "";

        // input 1st sequence from user; re-prompt until it is valid
        while (!b) {
            System.out.print("Sequence 1: ");
            s1 = sc.nextLine();
            b = isValid(s1); // checks validity
        }

        int c = findCount(s1);                  // finds C-count for 1st sequence
        double ratio = findRatio(c, s1);        // finds CG-ratio for 1st sequence
        String complement = findComplement(s1); // finds complement of 1st sequence
        System.out.println("C-count: " + c);
        System.out.println("CG-ratio: " + ratio);
        System.out.println("Complement: " + complement + "\n");

        b = false; // re-initialize for 2nd sequence

        // input 2nd sequence from user; re-prompt until it is valid
        while (!b) {
            System.out.print("Sequence 2: ");
            s2 = sc.nextLine();
            b = isValid(s2); // checks validity
        }

        c = findCount(s2);                  // finds C-count for 2nd sequence
        ratio = findRatio(c, s2);           // finds CG-ratio for 2nd sequence
        complement = findComplement(s2);    // finds complement of 2nd sequence
        System.out.println("C-count: " + c);
        System.out.println("CG-ratio: " + ratio);
        System.out.println("Complement: " + complement + "\n");

        findAlignment(s1, s2); // finds and prints the best alignment
    }

    /* This function determines validity of a sequence:
       it must be non-empty and contain only A, C, G and T */
    public static boolean isValid(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!(c == 'A' || c == 'C' || c == 'G' || c == 'T')) {
                return false;
            }
        }
        return true;
    }

    /* This function finds count of 'C' by iterating over string */
    public static int findCount(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == 'C') {
                count++;
            }
        }
        return count;
    }

    /* This function finds the CG-ratio: the number of 'C's (passed in as c)
       plus the number of 'G's, divided by the length of the string,
       rounded to three decimal places */
    public static double findRatio(int c, String s) {
        int count = c; // 'C's were already counted by findCount
        int length = s.length();
        for (int i = 0; i < length; i++) {
            if (s.charAt(i) == 'G') {
                count++;
            }
        }
        double ratio = (double) count / length;
        ratio = (double) Math.round(ratio * 1000) / 1000;
        return ratio;
    }

    /* This function finds complement of a sequence */
    public static String findComplement(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 'A') sb.append('T');
            else if (c == 'T') sb.append('A');
            else if (c == 'C') sb.append('G');
            else if (c == 'G') sb.append('C');
        }
        return sb.toString();
    }

    /* This function finds the maximum alignment score by sliding the
       shorter sequence along the longer one, one position at a time,
       for every shift from 0 up to the difference in their lengths,
       and counting the matching characters at each shift */
    public static void findAlignment(String s1, String s2) {
        int l1 = s1.length(); // length of 1st sequence
        int l2 = s2.length(); // length of 2nd sequence
        String longer = (l1 >= l2) ? s1 : s2;
        String shorter = (l1 >= l2) ? s2 : s1;
        int min = shorter.length();         // length of the shorter sequence
        int offset = Math.abs(l1 - l2) + 1; // number of shifts to try (1 for equal lengths)
        int maxOffset = 0;                  // the shift where we get maximum alignment score
        int maxAlignment = 0;               // stores max alignment score

        // loop to find max alignment score
        for (int i = 0; i < offset; i++) {
            int count = 0; // alignment score for this shift
            for (int j = 0; j < min; j++) {
                if (longer.charAt(j + i) == shorter.charAt(j)) {
                    count++;
                }
            }
            // store highest alignment score in maxAlignment
            // and shift of the shorter sequence in maxOffset
            if (count > maxAlignment) {
                maxAlignment = count;
                maxOffset = i;
            }
        }

        // print the alignment score and the alignment of the sequences,
        // indenting the shorter sequence by maxOffset spaces
        System.out.println("Best alignment score: " + maxAlignment);
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < maxOffset; i++) {
            pad.append(' ');
        }
        if (l1 >= l2) {
            System.out.println(s1);
            System.out.println(pad + s2);
        } else {
            System.out.println(pad + s1);
            System.out.println(s2);
        }
    }
}
```






User Sreedeepkesav M S
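The answer's `findAlignment` scores only ungapped shifts: the shorter sequence slides along the longer one and each shift is scored by the number of matching positions. It does not consider gaps. Below is a minimal sketch of the same shift-and-count idea that returns the best shift and score instead of printing them, which makes it easier to test. The names `ShiftAlignment`, `Result`, and `bestShift` are hypothetical; like the answer, it keeps the smallest shift on ties (strict `>`).

```java
public class ShiftAlignment {
    // Hypothetical result holder: best shift of the shorter sequence and its score.
    static final class Result {
        final int shift;
        final int score;
        Result(int shift, int score) { this.shift = shift; this.score = score; }
    }

    // Tries every shift from 0 to (longer length - shorter length) inclusive,
    // counting positions where the overlapping characters are equal.
    static Result bestShift(String a, String b) {
        String longer = a.length() >= b.length() ? a : b;
        String shorter = a.length() >= b.length() ? b : a;
        int bestShift = 0;
        int bestScore = 0;
        for (int shift = 0; shift <= longer.length() - shorter.length(); shift++) {
            int score = 0;
            for (int j = 0; j < shorter.length(); j++) {
                if (longer.charAt(shift + j) == shorter.charAt(j)) score++;
            }
            if (score > bestScore) { // strict '>' keeps the smallest shift on ties
                bestScore = score;
                bestShift = shift;
            }
        }
        return new Result(bestShift, bestScore);
    }

    public static void main(String[] args) {
        Result r = bestShift("ACGTACGT", "TAC");
        // prints: Best alignment score: 3 at shift 3
        System.out.println("Best alignment score: " + r.score + " at shift " + r.shift);
    }
}
```

Note that the loop bound is `<=`, so the last shift (where the shorter sequence lines up with the end of the longer one) is also tried; for equal-length inputs the loop runs exactly once.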