Presentation is loading. Please wait.

Presentation is loading. Please wait.

Bioinformatics 生物信息学理论和实践 唐继军 13928761660.

Similar presentations


Presentation on theme: "Bioinformatics 生物信息学理论和实践 唐继军 13928761660."— Presentation transcript:

1 Bioinformatics 生物信息学理论和实践 唐继军 jtang@cse.sc.edu 13928761660

2 More Conditions

3

4 But Use ==,, >=, !=, ||, && for numeric numbers Use eq, lt, le, gt, ge, ne, or, and for string comparisons

5 More Arithmatics +, -, *, **, /, % +=, -=, *=, **=, /=, %= ++, --

6 $x = 28; $x = $x/2; print $x/=2, "\n"; print $x--, "\n"; print $x, "\n"; print --$x, "\n"; print $x, "\n"; print $x % 3, "\n"; print $x**2, "\n";

7 #!/usr/bin/perl -w print "Please type the filename of the DNA sequence data: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ; @DNA = ; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_CG = 0; $position = 0; while ( $position < length $DNA) { $base = substr($DNA, $position, 1); if ( $base eq 'C' or $base eq 'G') { ++$count_of_CG; } $position++; } print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

8 #!/usr/bin/perl –w print "Please type the filename of the DNA sequence data: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ; @DNA = ; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_CG = 0; for ( $position = 0 ; $position < length $DNA ; ++$position ) { $base = substr($DNA, $position, 1); if ( $base eq 'C' or $base eq 'G') { ++$count_of_CG; } print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

9 #!/usr/bin/perl –w print "Please type the filename of the DNA sequence data: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ; @DNA = ; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_CG = 0; while($DNA =~ /c/ig) {$count_of_CG++;} while($DNA =~ /g/ig) {$count_of_CG++;} print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

10 $DNA = "ACCTAAACCCGGGAGAATTCCCACCAATTCTACGTAAC"; $s = ""; for ($i = 0, $j = 5; $i < $j; $i+=2, $j++) { $s.= substr($DNA, $i, $j); } print $s, "\n";

11 $DNA = "ACCTAAACCCGGGAGAATTCCCACCAATTCTACGTAAC"; $s = ""; for ($i = 0, $j = 5; $i < $j; $i+=2, $j++) { $s.= substr $DNA, $i, $j; } print ($s, "\n");

12 Call functions/subroutines Name p1, p2, p3; Name(p1, p2, p3); print $DNA1, $DNA2, "\n"; print ($DNA1, $DNA2, "\n");

13 Exercise 1 Ask for a protein file in fasta format Ask for an amino acid Count the frequency of that amino acid TKFHSNAHFYDCWRMLQYQLDMRCMRAISTF SPHCGMEHMPDQTHNQGEMCKPRMWQVS MNQSCNHTPPFRKTYVEWDYMAKALIAPYTL GWLASTCFIW

14 Exercise 2 Ask for an RNA file in fasta format Convert it to RNA Ask for a codon Count the frequency of that codon TCGTACTTAGAAATGAGGGTCCGCTTTTGCCC ACGCACCTGATCGCTCCTCGTTTGCTTTTAAG AACCGGACGAACCACAGAGCATAAGGAGAA CCTCTAGCTGCTTTACAAAGTACTGGTTCCCT TTCCAGCGGGATGCTTTATCTAAACGCAATGA GAGAGGTATTCCTCAGGCCACATCGCTTCCTA GTTCCGCTGGGATCCATCGTTGGCGGCCGAA GCCGCCATTCCATAGTGAGTTCTTCGTCTGTG TCATTCTGTGCCAGATCGTCTGGCAAATAGCC GATCCAGTTTATCTCTCGAAACTATAGTCGTA CAGATCGAAATCTTAAGTCAAATCACGCGACT AGACTCAGCTCTATTTTAGTGGTCATGGGTTT TGGTCCCCCCGAGCGGTGCAACCGATTAGGA CCATGTAGAACATTAGTTATAAGTCTTCTTTTA AACACAATCTTCCTGCTCAGTGGTACATGGTT ATCGTTATTGCTAGCCAGCCTGATAAGTAACA CCACCACTGCGACCCTAATGCGCCCTTTCCAC GAACACAGGGCTGTCCGATCCTATATTACGA CTCCGGGAAGGGGTTCGCAAGTCGCACCCTA AACGATGTTGAAGGCTCAGGATGTACACGCA CTAGTACAATACATACGTGTTCCGGCTCTTAT CCTGCATCGGAAGCTCAATCATGCATCGCACC AGCGTGTTCGTGTCATCTAGGAGGGGCGCGT AGGATAAATAATTCAATTAAGATATCGTTATG CTAGTATACGCCTACCCGTCACCGGCCAACAG TGTGCAGATGGCGCCACGAGTTACTGGCCCT GATTTCTCCGCTTCTAATACCGCACACTGGGC AATACGAGCTCAAGCCAGTCTCGCAGTAACG CTCATCAGCTAACGAAAGAGTTAGAGGCTCG CTAAATCGCACTGTCGGGGTCCCTTGGGTATT TTACACTAGCGTCAGGTAGGCTAGCATGTGT CTTTCCTTCCAGGGGTATG

15 Subroutine Some code needs to be reused A good way to organize code Called “function” in some languages Name Return Parameters (@_)

16 #!/usr/bin/perl –w print "Please type the filename: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ; @DNA = ; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_G = countG($DNA); print $count_of_G; sub countG { my($dna) = @_; my($count) = 0; $count = ( $dna =~ tr/Gg//); return $count; }

17 #!/usr/bin/perl –w print "Please type the filename: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ; @DNA = ; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_G = count($DNA, 'Gg'); print $count_of_G; sub count { my($dna, $pattern) = @_; my($count) = 0; $count = ( eval("$dna =~ tr/$pattern//") ); return $count; }

18 Codon

19 sub codon2aa { my($codon) = @_; if ( $codon =~ /TCA/i ) { return 'S' } # Serine elsif ( $codon =~ /TCC/i ) { return 'S' } # Serine elsif ( $codon =~ /TCG/i ) { return 'S' } # Serine elsif ( $codon =~ /TCT/i ) { return 'S' } # Serine elsif ( $codon =~ /TTC/i ) { return 'F' } # Phenylalanine elsif ( $codon =~ /TTT/i ) { return 'F' } # Phenylalanine elsif ( $codon =~ /TTA/i ) { return 'L' } # Leucine elsif ( $codon =~ /TTG/i ) { return 'L' } # Leucine elsif ( $codon =~ /TAC/i ) { return 'Y' } # Tyrosine elsif ( $codon =~ /TAT/i ) { return 'Y' } # Tyrosine elsif ( $codon =~ /TAA/i ) { return '_' } # Stop elsif ( $codon =~ /TAG/i ) { return '_' } # Stop elsif ( $codon =~ /TGC/i ) { return 'C' } # Cysteine elsif ( $codon =~ /TGT/i ) { return 'C' } # Cysteine elsif ( $codon =~ /TGA/i ) { return '_' } # Stop elsif ( $codon =~ /TGG/i ) { return 'W' } # Tryptophan elsif ( $codon =~ /CTA/i ) { return 'L' } # Leucine elsif ( $codon =~ /CTC/i ) { return 'L' } # Leucine elsif ( $codon =~ /CTG/i ) { return 'L' } # Leucine elsif ( $codon =~ /CTT/i ) { return 'L' } # Leucine elsif ( $codon =~ /CCA/i ) { return 'P' } # Proline elsif ( $codon =~ /CCC/i ) { return 'P' } # Proline elsif ( $codon =~ /CCG/i ) { return 'P' } # Proline elsif ( $codon =~ /CCT/i ) { return 'P' } # Proline elsif ( $codon =~ /CAC/i ) { return 'H' } # Histidine elsif ( $codon =~ /CAT/i ) { return 'H' } # Histidine elsif ( $codon =~ /CAA/i ) { return 'Q' } # Glutamine elsif ( $codon =~ /CAG/i ) { return 'Q' } # Glutamine elsif ( $codon =~ /CGA/i ) { return 'R' } # Arginine elsif ( $codon =~ /CGC/i ) { return 'R' } # Arginine elsif ( $codon =~ /CGG/i ) { return 'R' } # Arginine elsif ( $codon =~ /CGT/i ) { return 'R' } # Arginine elsif ( $codon =~ /ATA/i ) { return 'I' } # Isoleucine elsif ( $codon =~ /ATC/i ) { return 'I' } # Isoleucine elsif ( $codon =~ /ATT/i ) { return 'I' } # Isoleucine elsif ( $codon =~ /ATG/i ) { return 'M' } # Methionine elsif ( $codon =~ /ACA/i ) { return 'T' } # Threonine elsif ( $codon =~ /ACC/i ) { return 'T' } # Threonine elsif ( $codon =~ /ACG/i ) { return 'T' } # Threonine elsif ( $codon =~ /ACT/i ) { return 'T' } # Threonine elsif ( $codon =~ /AAC/i ) { return 'N' } # Asparagine elsif ( $codon =~ /AAT/i ) { return 'N' } # Asparagine elsif ( $codon =~ /AAA/i ) { return 'K' } # Lysine elsif ( $codon =~ /AAG/i ) { return 'K' } # Lysine elsif ( $codon =~ /AGC/i ) { return 'S' } # Serine elsif ( $codon =~ /AGT/i ) { return 'S' } # Serine elsif ( $codon =~ /AGA/i ) { return 'R' } # Arginine elsif ( $codon =~ /AGG/i ) { return 'R' } # Arginine elsif ( $codon =~ /GTA/i ) { return 'V' } # Valine elsif ( $codon =~ /GTC/i ) { return 'V' } # Valine elsif ( $codon =~ /GTG/i ) { return 'V' } # Valine elsif ( $codon =~ /GTT/i ) { return 'V' } # Valine elsif ( $codon =~ /GCA/i ) { return 'A' } # Alanine elsif ( $codon =~ /GCC/i ) { return 'A' } # Alanine elsif ( $codon =~ /GCG/i ) { return 'A' } # Alanine elsif ( $codon =~ /GCT/i ) { return 'A' } # Alanine elsif ( $codon =~ /GAC/i ) { return 'D' } # Aspartic Acid elsif ( $codon =~ /GAT/i ) { return 'D' } # Aspartic Acid elsif ( $codon =~ /GAA/i ) { return 'E' } # Glutamic Acid elsif ( $codon =~ /GAG/i ) { return 'E' } # Glutamic Acid elsif ( $codon =~ /GGA/i ) { return 'G' } # Glycine elsif ( $codon =~ /GGC/i ) { return 'G' } # Glycine elsif ( $codon =~ /GGG/i ) { return 'G' } # Glycine elsif ( $codon =~ /GGT/i ) { return 'G' } # Glycine else { print STDERR "Bad codon \"$codon\"!!\n"; exit; } # }

20 sub codon2aa { my($codon) = @_; if ( $codon =~ /GC./i) { return 'A' } # Alanine elsif ( $codon =~ /TG[TC]/i) { return 'C' } # Cysteine elsif ( $codon =~ /GA[TC]/i) { return 'D' } # Aspartic Acid elsif ( $codon =~ /GA[AG]/i) { return 'E' } # Glutamic Acid elsif ( $codon =~ /TT[TC]/i) { return 'F' } # Phenylalanine elsif ( $codon =~ /GG./i) { return 'G' } # Glycine elsif ( $codon =~ /CA[TC]/i) { return 'H' } # Histidine elsif ( $codon =~ /AT[TCA]/i) { return 'I' } # Isoleucine elsif ( $codon =~ /AA[AG]/i) { return 'K' } # Lysine elsif ( $codon =~ /TT[AG]|CT./i) { return 'L' } # Leucine elsif ( $codon =~ /ATG/i) { return 'M' } # Methionine elsif ( $codon =~ /AA[TC]/i) { return 'N' } # Asparagine elsif ( $codon =~ /CC./i) { return 'P' } # Proline elsif ( $codon =~ /CA[AG]/i) { return 'Q' } # Glutamine elsif ( $codon =~ /CG.|AG[AG]/i) { return 'R' } # Arginine elsif ( $codon =~ /TC.|AG[TC]/i) { return 'S' } # Serine elsif ( $codon =~ /AC./i) { return 'T' } # Threonine elsif ( $codon =~ /GT./i) { return 'V' } # Valine elsif ( $codon =~ /TGG/i) { return 'W' } # Tryptophan elsif ( $codon =~ /TA[TC]/i) { return 'Y' } # Tyrosine elsif ( $codon =~ /TA[AG]|TGA/i) { return '_' } # Stop else {print STDERR "Bad codon \"$codon\"!!\n"; exit; } }

21 Exercise Make the subroutine of converting codon to aa Read in a dna fasta file, print out an Amino Acid sequence

22 #!/usr/bin/perl -w $dna = 'CGACGTCTTCGTACGGGACTAGCTCGTGTCGGTCGC'; $protein = ''; for(my $i=0; $i < (length($dna) - 2) ; $i += 3) { $codon = substr($dna,$i,3); $protein.= codon2aa($codon); } print "I translated the DNA\n\n$dna\n\n into the protein\n\n$protein\n\n"; sub codon2aa { #... }

23 Reading Frame 5' 3' atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa 1 atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag gac gat gta taa M P K L N S V E G F S S F E D D V * 2 tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agg acg atg tat C P S * I A * R G F H H L R T M Y 3 gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gga cga tgt ata A Q A E * R R G V F I I * G R C I three in the forward reading, three in the reverse complement reading

24 Exercise 3 Make the subroutine of converting codon to aa Read in a dna fasta file, print out an Amino Acid sequence There are 6 reading frame, can you try to print all 6 version?

25 #!/usr/bin/perl –w print "Please type the filename: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ;@DNA = ;close DNAFILE; $DNA = join( '', @DNA);$DNA =~ s/\s//g; print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n"; sub dna2peptide { my ($dna) = @_; my $protein = ""; for(my $i=0; $i < (length($dna) - 2) ; $i += 3) { $codon = substr($dna,$i,3); $protein.= codon2aa($codon); } return $protein; } sub codon2aa { #... }

26 Modules A Perl Module is a self-contained pieceof [Perl] code that can be used by a Perl program later Like a library End with extension.pm Needs a 1 at the end

27 #Bio.pm sub codon2aa { #.... } sub dna2peptide { #.... } 1

28 #!/usr/bin/perl -w use Bio; print "Please type the filename: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ;@DNA = ;close DNAFILE; $DNA = join( '', @DNA);$DNA =~ s/\s//g; print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

29 #Bio.pm sub codon2aa { #.... } sub dna2peptide { #.... } sub fasta_read { print "Please type the filename: "; my $dna_filename = ; chomp $dna_filename; unless (open(DNAFILE, $dna_filename)) { print "Cannot open file ", $dna_filename, "\n"; } $name = ;@DNA = ;close DNAFILE; $DNA = join( '', @DNA);$DNA =~ s/\s//g; return $DNA; } 1

30 #!/usr/bin/perl -w use Bio; $DNA = fasta_read(); print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

31 Scope my provides lexical scoping; a variable declared with my is visible only within the block in which it is declared. Blocks of code are hunks within curly braces {}; files are blocks. Use use vars qw([list of var names]) or our ([var_names]) to create package globals.

32 #!/usr/bin/perl -w use Bio; use strict; use warnings; $DNA = fasta_read(); print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

33 Variable "$DNA" is not imported at frame2.pl line 6. Variable "$DNA" is not imported at frame2.pl line 8. Variable "$DNA" is not imported at frame2.pl line 9. Variable "$DNA" is not imported at frame2.pl line 10. Variable "$DNA" is not imported at frame2.pl line 12. Variable "$DNA" is not imported at frame2.pl line 13. Variable "$DNA" is not imported at frame2.pl line 14. Variable "$DNA" is not imported at frame2.pl line 15. Global symbol "$DNA" requires explicit package name at frame2.pl line 6. Global symbol "$DNA" requires explicit package name at frame2.pl line 8. Global symbol "$DNA" requires explicit package name at frame2.pl line 9. Global symbol "$DNA" requires explicit package name at frame2.pl line 10. Global symbol "$DNA" requires explicit package name at frame2.pl line 12. Global symbol "$DNA" requires explicit package name at frame2.pl line 13. Global symbol "$DNA" requires explicit package name at frame2.pl line 14. Global symbol "$DNA" requires explicit package name at frame2.pl line 15. Execution of frame2.pl aborted due to compilation errors.

34 #!/usr/bin/perl -w use Bio; use strict; use warnings; my $DNA = fasta_read(); print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";


Download ppt "Bioinformatics 生物信息学理论和实践 唐继军 13928761660."

Similar presentations


Ads by Google