Presentation is loading. Please wait.

Presentation is loading. Please wait.

Bioinformatics 生物信息学理论和实践 唐继军 13928761660.

Similar presentations


Presentation on theme: "Bioinformatics 生物信息学理论和实践 唐继军 13928761660."— Presentation transcript:

1 Bioinformatics 生物信息学理论和实践 唐继军 jtang@cse.sc.edu 13928761660

2 Exercise 1 Ask for a protein file in fasta format Ask for an amino acid Count the frequency of that amino acid TKFHSNAHFYDCWRMLQYQLDMRCMRAISTF SPHCGMEHMPDQTHNQGEMCKPRMWQVS MNQSCNHTPPFRKTYVEWDYMAKALIAPYTL GWLASTCFIW

3 Exercise 2 Ask for an RNA file in fasta format Convert it to RNA Ask for a codon Count the frequency of that codon TCGTACTTAGAAATGAGGGTCCGCTTTTGCCC ACGCACCTGATCGCTCCTCGTTTGCTTTTAAG AACCGGACGAACCACAGAGCATAAGGAGAA CCTCTAGCTGCTTTACAAAGTACTGGTTCCCT TTCCAGCGGGATGCTTTATCTAAACGCAATGA GAGAGGTATTCCTCAGGCCACATCGCTTCCTA GTTCCGCTGGGATCCATCGTTGGCGGCCGAA GCCGCCATTCCATAGTGAGTTCTTCGTCTGTG TCATTCTGTGCCAGATCGTCTGGCAAATAGCC GATCCAGTTTATCTCTCGAAACTATAGTCGTA CAGATCGAAATCTTAAGTCAAATCACGCGACT AGACTCAGCTCTATTTTAGTGGTCATGGGTTT TGGTCCCCCCGAGCGGTGCAACCGATTAGGA CCATGTAGAACATTAGTTATAAGTCTTCTTTTA AACACAATCTTCCTGCTCAGTGGTACATGGTT ATCGTTATTGCTAGCCAGCCTGATAAGTAACA CCACCACTGCGACCCTAATGCGCCCTTTCCAC GAACACAGGGCTGTCCGATCCTATATTACGA CTCCGGGAAGGGGTTCGCAAGTCGCACCCTA AACGATGTTGAAGGCTCAGGATGTACACGCA CTAGTACAATACATACGTGTTCCGGCTCTTAT CCTGCATCGGAAGCTCAATCATGCATCGCACC AGCGTGTTCGTGTCATCTAGGAGGGGCGCGT AGGATAAATAATTCAATTAAGATATCGTTATG CTAGTATACGCCTACCCGTCACCGGCCAACAG TGTGCAGATGGCGCCACGAGTTACTGGCCCT GATTTCTCCGCTTCTAATACCGCACACTGGGC AATACGAGCTCAAGCCAGTCTCGCAGTAACG CTCATCAGCTAACGAAAGAGTTAGAGGCTCG CTAAATCGCACTGTCGGGGTCCCTTGGGTATT TTACACTAGCGTCAGGTAGGCTAGCATGTGT CTTTCCTTCCAGGGGTATG

4 Subroutine Some code needs to be reused A good way to organize code Called "function" in some languages Name Return Parameters (@_)

5 sub codon2aa { my($codon) = @_; if ( $codon =~ /GC./i) { return 'A' } Alanine elsif ( $codon =~ /TG[TC]/i) { return 'C' } Cysteine elsif ( $codon =~ /GA[TC]/i) { return 'D' } Aspartic Acid elsif ( $codon =~ /GA[AG]/i) { return 'E' } Glutamic Acid elsif ( $codon =~ /TT[TC]/i) { return 'F' } Phenylalanine elsif ( $codon =~ /GG./i) { return 'G' } Glycine elsif ( $codon =~ /CA[TC]/i) { return 'H' } Histidine elsif ( $codon =~ /AT[TCA]/i) { return 'I' } Isoleucine elsif ( $codon =~ /AA[AG]/i) { return 'K' } Lysine elsif ( $codon =~ /TT[AG]|CT./i) { return 'L' } Leucine elsif ( $codon =~ /ATG/i) { return 'M' } Methionine elsif ( $codon =~ /AA[TC]/i) { return 'N' } Asparagine elsif ( $codon =~ /CC./i) { return 'P' } Proline elsif ( $codon =~ /CA[AG]/i) { return 'Q' } Glutamine elsif ( $codon =~ /CG.|AG[AG]/i) { return 'R' } Arginine elsif ( $codon =~ /TC.|AG[TC]/i) { return 'S' } Serine elsif ( $codon =~ /AC./i) { return 'T' } Threonine elsif ( $codon =~ /GT./i) { return 'V' } Valine elsif ( $codon =~ /TGG/i) { return 'W' } Tryptophan elsif ( $codon =~ /TA[TC]/i) { return 'Y' } Tyrosine elsif ( $codon =~ /TA[AG]|TGA/i) { return '_' } Stop else {print STDERR "Bad codon \"$codon\"!!\n"; exit; } }

6 !/usr/bin/perl –w print "Please type the filename: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ;@DNA = ;close DNAFILE; $DNA = join( '', @DNA);$DNA =~ s/\s//g; print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n"; sub dna2peptide { my ($dna) = @_; my $protein = ""; for(my $i=0; $i < (length($dna) - 2) ; $i += 3) { $codon = substr($dna,$i,3); $protein.= codon2aa($codon); } return $protein; } sub codon2aa {... }

7 Modules A Perl Module is a self-contained pieceof [Perl] code that can be used by a Perl program later Like a library End with extension.pm Needs a 1 at the end

8 Bio.pm sub codon2aa {.... } sub dna2peptide {.... } 1

9 !/usr/bin/perl -w use Bio; print "Please type the filename: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ;@DNA = ;close DNAFILE; $DNA = join( '', @DNA);$DNA =~ s/\s//g; print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

10 Bio.pm sub codon2aa {.... } sub dna2peptide {.... } sub fasta_read { print "Please type the filename: "; my $dna_filename = ; chomp $dna_filename; unless (open(DNAFILE, $dna_filename)) { print "Cannot open file ", $dna_filename, "\n"; } $name = ;@DNA = ;close DNAFILE; $DNA = join( '', @DNA);$DNA =~ s/\s//g; return $DNA; } 1

11 !/usr/bin/perl -w use Bio; $DNA = fasta_read(); print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

12 Scope my provides lexical scoping; a variable declared with my is visible only within the block in which it is declared. Blocks of code are hunks within curly braces {}; files are blocks. Use use vars qw([list of var names]) or our ([var_names]) to create package globals.

13 !/usr/bin/perl -w use Bio; use strict; use warnings; $DNA = fasta_read(); print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

14 Variable "$DNA" is not imported at frame2.pl line 6. Variable "$DNA" is not imported at frame2.pl line 8. Variable "$DNA" is not imported at frame2.pl line 9. Variable "$DNA" is not imported at frame2.pl line 10. Variable "$DNA" is not imported at frame2.pl line 12. Variable "$DNA" is not imported at frame2.pl line 13. Variable "$DNA" is not imported at frame2.pl line 14. Variable "$DNA" is not imported at frame2.pl line 15. Global symbol "$DNA" requires explicit package name at frame2.pl line 6. Global symbol "$DNA" requires explicit package name at frame2.pl line 8. Global symbol "$DNA" requires explicit package name at frame2.pl line 9. Global symbol "$DNA" requires explicit package name at frame2.pl line 10. Global symbol "$DNA" requires explicit package name at frame2.pl line 12. Global symbol "$DNA" requires explicit package name at frame2.pl line 13. Global symbol "$DNA" requires explicit package name at frame2.pl line 14. Global symbol "$DNA" requires explicit package name at frame2.pl line 15. Execution of frame2.pl aborted due to compilation errors.

15 !/usr/bin/perl -w use Bio; use strict; use warnings; my $DNA = fasta_read(); print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

16 my $x = 10; for (my $x = 0; $x < 5; $x++) { Scope(); print $x, "\n"; } print $x, "\n"; sub Scope { my $x = 0; }

17 sub get_file_data { my($filename) = @_; use strict; use warnings; # Initialize variables my @filedata = ( ); unless( open(GET_FILE_DATA, $filename) ) { print STDERR "Cannot open file \"$filename\"\n\n"; exit; } @filedata = ; close GET_FILE_DATA; return @filedata; }

18 sub extract_sequence_from_fasta_data { my(@fasta_file_data) = @_; my $sequence = ''; foreach my $line (@fasta_file_data) { if ($line =~ /^\s*$/) { next; } elsif($line =~ /^\s*#/) { next; } elsif($line =~ /^>/) { next; } else { $sequence.= $line; } # remove non-sequence data (in this case, whitespace) from $sequence string $sequence =~ s/\s//g; return $sequence; }

19 Molecular Scissors Molecular Cell Biology, 4 th edition

20 Discovering Restriction Enzymes HindII - first restriction enzyme – was discovered accidentally in 1970 while studying how the bacterium Haemophilus influenzae takes up DNA from the virus Recognizes and cuts DNA at sequences: GTGCAC GTTAAC

21 Recognition Sites of Restriction Enzymes Molecular Cell Biology, 4 th edition

22 Uses of Restriction Enzymes Recombinant DNA technology Cloning cDNA/genomic library construction DNA mapping

23 Restriction Enzyme Database http://rebase.neb.com/rebase/rebase.html

24 http://rebase.neb.com/rebase/rebase.files.html

25

26 R = G or A Y = C or T M = A or C K = G or T S = G or C W = A or T B = not A (C or G or T) D = not C (A or G or T) H = not G (A or C or T) V = not T (A or C or G) N = A or C or G or T

27 sub IUB_to_regexp { my($iub) = @_; my $regular_expression = ‘’; my %iub2character_class = ( A => 'A', C => 'C', G => 'G', T => 'T', R => '[GA]', Y => '[CT]', M => '[AC]', K => '[GT]', S => '[GC]', W => '[AT]', B => '[CGT]', D => '[AGT]', H => '[ACT]', V => '[ACG]', N => '[ACGT]', ); $iub =~ s/\^//g; for ( my $i = 0 ; $i < length($iub) ; ++$i ) { $regular_expression.= $iub2character_class{substr($iub, $i, 1)}; } return $regular_expression; }

28 Hash Initialize: my %hash = (); Add key/value pair: $hash{$key} = $value; Add more keys: %hash = ( 'key1', 'value1', 'key2', 'value2 ); %hash = ( key1 => 'value1', key2 => 'value2', ); Delete: delete $hash{$key};

29 while ( my ($key, $value) = each(%hash) ) { print "$key => $value\n"; } for my $key ( keys %hash ) { my $value = $hash{$key}; print "$key => $value\n"; }

30 sub codon2aa { my($codon) = @_; $codon = uc $codon; my %genetic_code = ( 'TCA' => 'S', # Serine 'TCC' => 'S', # Serine 'TCG' => 'S', # Serine 'TCT' => 'S', # Serine 'TTC' => 'F', # Phenylalanine 'TTT' => 'F', # Phenylalanine 'TTA' => 'L', # Leucine 'TTG' => 'L', # Leucine #Many more ); if(exists $genetic_code{$codon}) { return $genetic_code{$codon}; }else{ print STDERR "Bad codon \"$codon\"!!\n"; exit; }

31 sub parseREBASE { my($rebasefile) = @_; my @rebasefile = ( ); my %rebase_hash = ( ); my $name; my $site; my $regexp; open($rebase_filehandle, $rebasefile) or die "Cannot open file\n"; while( ) { # Discard header lines ( 1.. /Rich Roberts/ ) and next; # Discard blank lines /^\s*$/ and next; # Split the two (or three if includes parenthesized name) fields my @fields = split( " ", $_); $name = shift @fields; $site = pop @fields; # Translate the recognition sites to regular expressions $regexp = IUB_to_regexp($site); # Store the data into the hash $rebase_hash{$name} = "$site $regexp"; } # Return the hash containing the reformatted REBASE data return %rebase_hash; }

32 Range ( 1.. /Rich Roberts/ ) and next from first line till some line containing Rich Roberts If that is true, it will check the statement after "and" If that is not true, it will not check the statement after "and" open(…) or die If can open, the statement is already true, no need to check the statement after "or" If cannot open, the statement is false, need to check the statement after "or" to see if it can be true

33 @fred = (1,2,3); @barney = @fred; @huh = 1; @fred = qw(one two); @barney = (4,5,@fred,6,7); @barney = (8,@barney); ($a,$b,$c) = (1,2,3); @fred = (@barney = (2,3,4)); @fred = @barney = (2,3,4); @fred = (1,2,3); $fred[3] = "hi"; $fred[6] = "ho"; # @fred is now (1,2,3,"hi",undef,undef,"ho")

34 Array operators push and pop (right-most element) @mylist = (1,2,3); push(@mylist,4,5,6); $oldvalue = pop(@mylist); shift and unshift (left-most element) @fred = (5,6,7); unshift(@fred,2,3,4); $x = shift(@fred); reverse: @a = (7,8,9); @b = reverse(@a); sort: @a = (7,9,9); @b = sort(@a);

35 sub match_positions { my($regexp, $sequence) = @_; use BeginPerlBioinfo; my @positions = ( ); while ( $sequence =~ /$regexp/ig ) { push ( @positions, pos($sequence) - length($&) + 1); } return @positions; }

36 use BeginPerlBioinfo; my %rebase_hash = ( ); my @file_data = ( ); my $query = ''; my $dna = ''; my $recognition_site = ''; my $regexp = ''; my @locations = ( ); @file_data = get_file_data("sample.dna"); $dna = extract_sequence_from_fasta_data(@file_data); %rebase_hash = parseREBASE('bionet'); do { print "Search for what restriction site for (or quit)?: "; $query = ; chomp $query; if ($query =~ /^\s*$/ ) { exit; } if ( exists $rebase_hash{$query} ) { ($recognition_site, $regexp) = split ( " ", $rebase_hash{$query}); @locations = match_positions($regexp, $dna); if (@locations) { print "Searching for $query $recognition_site $regexp\n"; print "Restriction site for $query at :", join(" ", @locations), "\n"; } else { print "A restriction enzyme $query is not in the DNA:\n"; } } until ( $query =~ /quit/ ); exit;


Download ppt "Bioinformatics 生物信息学理论和实践 唐继军 13928761660."

Similar presentations


Ads by Google