Bioinformatics 生物信息学理论和实践唐继军 13928761660.

Bioinformatics 生物信息学理论和实践唐继军 jtang@cse.sc.edu 13928761660

Exercise 1 Ask for a protein file in fasta format Ask for an amino acid Count the frequency of that amino acid TKFHSNAHFYDCWRMLQYQLDMRCMRAISTF SPHCGMEHMPDQTHNQGEMCKPRMWQVS MNQSCNHTPPFRKTYVEWDYMAKALIAPYTL GWLASTCFIW

Exercise 2 Ask for an RNA file in fasta format Convert it to RNA Ask for a codon Count the frequency of that codon TCGTACTTAGAAATGAGGGTCCGCTTTTGCCC ACGCACCTGATCGCTCCTCGTTTGCTTTTAAG AACCGGACGAACCACAGAGCATAAGGAGAA CCTCTAGCTGCTTTACAAAGTACTGGTTCCCT TTCCAGCGGGATGCTTTATCTAAACGCAATGA GAGAGGTATTCCTCAGGCCACATCGCTTCCTA GTTCCGCTGGGATCCATCGTTGGCGGCCGAA GCCGCCATTCCATAGTGAGTTCTTCGTCTGTG TCATTCTGTGCCAGATCGTCTGGCAAATAGCC GATCCAGTTTATCTCTCGAAACTATAGTCGTA CAGATCGAAATCTTAAGTCAAATCACGCGACT AGACTCAGCTCTATTTTAGTGGTCATGGGTTT TGGTCCCCCCGAGCGGTGCAACCGATTAGGA CCATGTAGAACATTAGTTATAAGTCTTCTTTTA AACACAATCTTCCTGCTCAGTGGTACATGGTT ATCGTTATTGCTAGCCAGCCTGATAAGTAACA CCACCACTGCGACCCTAATGCGCCCTTTCCAC GAACACAGGGCTGTCCGATCCTATATTACGA CTCCGGGAAGGGGTTCGCAAGTCGCACCCTA AACGATGTTGAAGGCTCAGGATGTACACGCA CTAGTACAATACATACGTGTTCCGGCTCTTAT CCTGCATCGGAAGCTCAATCATGCATCGCACC AGCGTGTTCGTGTCATCTAGGAGGGGCGCGT AGGATAAATAATTCAATTAAGATATCGTTATG CTAGTATACGCCTACCCGTCACCGGCCAACAG TGTGCAGATGGCGCCACGAGTTACTGGCCCT GATTTCTCCGCTTCTAATACCGCACACTGGGC AATACGAGCTCAAGCCAGTCTCGCAGTAACG CTCATCAGCTAACGAAAGAGTTAGAGGCTCG CTAAATCGCACTGTCGGGGTCCCTTGGGTATT TTACACTAGCGTCAGGTAGGCTAGCATGTGT CTTTCCTTCCAGGGGTATG

Subroutine Some code needs to be reused A good way to organize code Called "function" in some languages Name Return Parameters (@_)

sub codon2aa { my($codon) = @_; if ( $codon =~ /GC./i) { return 'A' } Alanine elsif ( $codon =~ /TG[TC]/i) { return 'C' } Cysteine elsif ( $codon =~ /GA[TC]/i) { return 'D' } Aspartic Acid elsif ( $codon =~ /GA[AG]/i) { return 'E' } Glutamic Acid elsif ( $codon =~ /TT[TC]/i) { return 'F' } Phenylalanine elsif ( $codon =~ /GG./i) { return 'G' } Glycine elsif ( $codon =~ /CA[TC]/i) { return 'H' } Histidine elsif ( $codon =~ /AT[TCA]/i) { return 'I' } Isoleucine elsif ( $codon =~ /AA[AG]/i) { return 'K' } Lysine elsif ( $codon =~ /TT[AG]|CT./i) { return 'L' } Leucine elsif ( $codon =~ /ATG/i) { return 'M' } Methionine elsif ( $codon =~ /AA[TC]/i) { return 'N' } Asparagine elsif ( $codon =~ /CC./i) { return 'P' } Proline elsif ( $codon =~ /CA[AG]/i) { return 'Q' } Glutamine elsif ( $codon =~ /CG.|AG[AG]/i) { return 'R' } Arginine elsif ( $codon =~ /TC.|AG[TC]/i) { return 'S' } Serine elsif ( $codon =~ /AC./i) { return 'T' } Threonine elsif ( $codon =~ /GT./i) { return 'V' } Valine elsif ( $codon =~ /TGG/i) { return 'W' } Tryptophan elsif ( $codon =~ /TA[TC]/i) { return 'Y' } Tyrosine elsif ( $codon =~ /TA[AG]|TGA/i) { return '_' } Stop else {print STDERR "Bad codon \"$codon\"!!\n"; exit; } }

!/usr/bin/perl –w print "Please type the filename: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ;@DNA = ;close DNAFILE; $DNA = join( '', @DNA);$DNA =~ s/\s//g; print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n"; sub dna2peptide { my ($dna) = @_; my $protein = ""; for(my $i=0; $i < (length($dna) - 2) ; $i += 3) { $codon = substr($dna,$i,3); $protein.= codon2aa($codon); } return $protein; } sub codon2aa {... }

Modules A Perl Module is a self-contained pieceof [Perl] code that can be used by a Perl program later Like a library End with extension.pm Needs a 1 at the end

Bio.pm sub codon2aa {.... } sub dna2peptide {.... } 1

!/usr/bin/perl -w use Bio; print "Please type the filename: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ;@DNA = ;close DNAFILE; $DNA = join( '', @DNA);$DNA =~ s/\s//g; print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

Bio.pm sub codon2aa {.... } sub dna2peptide {.... } sub fasta_read { print "Please type the filename: "; my $dna_filename = ; chomp $dna_filename; unless (open(DNAFILE, $dna_filename)) { print "Cannot open file ", $dna_filename, "\n"; } $name = ;@DNA = ;close DNAFILE; $DNA = join( '', @DNA);$DNA =~ s/\s//g; return $DNA; } 1

!/usr/bin/perl -w use Bio; $DNA = fasta_read(); print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

Scope my provides lexical scoping; a variable declared with my is visible only within the block in which it is declared. Blocks of code are hunks within curly braces {}; files are blocks. Use use vars qw([list of var names]) or our ([var_names]) to create package globals.

!/usr/bin/perl -w use Bio; use strict; use warnings; $DNA = fasta_read(); print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

Variable "$DNA" is not imported at frame2.pl line 6. Variable "$DNA" is not imported at frame2.pl line 8. Variable "$DNA" is not imported at frame2.pl line 9. Variable "$DNA" is not imported at frame2.pl line 10. Variable "$DNA" is not imported at frame2.pl line 12. Variable "$DNA" is not imported at frame2.pl line 13. Variable "$DNA" is not imported at frame2.pl line 14. Variable "$DNA" is not imported at frame2.pl line 15. Global symbol "$DNA" requires explicit package name at frame2.pl line 6. Global symbol "$DNA" requires explicit package name at frame2.pl line 8. Global symbol "$DNA" requires explicit package name at frame2.pl line 9. Global symbol "$DNA" requires explicit package name at frame2.pl line 10. Global symbol "$DNA" requires explicit package name at frame2.pl line 12. Global symbol "$DNA" requires explicit package name at frame2.pl line 13. Global symbol "$DNA" requires explicit package name at frame2.pl line 14. Global symbol "$DNA" requires explicit package name at frame2.pl line 15. Execution of frame2.pl aborted due to compilation errors.

!/usr/bin/perl -w use Bio; use strict; use warnings; my $DNA = fasta_read(); print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

my $x = 10; for (my $x = 0; $x < 5; $x++) { Scope(); print $x, "\n"; } print $x, "\n"; sub Scope { my $x = 0; }

sub get_file_data { my($filename) = @_; use strict; use warnings; # Initialize variables my @filedata = ( ); unless( open(GET_FILE_DATA, $filename) ) { print STDERR "Cannot open file \"$filename\"\n\n"; exit; } @filedata = ; close GET_FILE_DATA; return @filedata; }

sub extract_sequence_from_fasta_data { my(@fasta_file_data) = @_; my $sequence = ''; foreach my $line (@fasta_file_data) { if ($line =~ /^\s*$/) { next; } elsif($line =~ /^\s*#/) { next; } elsif($line =~ /^>/) { next; } else { $sequence.= $line; } # remove non-sequence data (in this case, whitespace) from $sequence string $sequence =~ s/\s//g; return $sequence; }

Molecular Scissors Molecular Cell Biology, 4 th edition

Discovering Restriction Enzymes HindII - first restriction enzyme – was discovered accidentally in 1970 while studying how the bacterium Haemophilus influenzae takes up DNA from the virus Recognizes and cuts DNA at sequences: GTGCAC GTTAAC

Recognition Sites of Restriction Enzymes Molecular Cell Biology, 4 th edition

Uses of Restriction Enzymes Recombinant DNA technology Cloning cDNA/genomic library construction DNA mapping

Restriction Enzyme Database http://rebase.neb.com/rebase/rebase.html

http://rebase.neb.com/rebase/rebase.files.html

R = G or A Y = C or T M = A or C K = G or T S = G or C W = A or T B = not A (C or G or T) D = not C (A or G or T) H = not G (A or C or T) V = not T (A or C or G) N = A or C or G or T

sub IUB_to_regexp { my($iub) = @_; my $regular_expression = ‘’; my %iub2character_class = ( A => 'A', C => 'C', G => 'G', T => 'T', R => '[GA]', Y => '[CT]', M => '[AC]', K => '[GT]', S => '[GC]', W => '[AT]', B => '[CGT]', D => '[AGT]', H => '[ACT]', V => '[ACG]', N => '[ACGT]', ); $iub =~ s/\^//g; for ( my $i = 0 ; $i < length($iub) ; ++$i ) { $regular_expression.= $iub2character_class{substr($iub, $i, 1)}; } return $regular_expression; }

Hash Initialize: my %hash = (); Add key/value pair: $hash{$key} = $value; Add more keys: %hash = ( 'key1', 'value1', 'key2', 'value2 ); %hash = ( key1 => 'value1', key2 => 'value2', ); Delete: delete $hash{$key};

while ( my ($key, $value) = each(%hash) ) { print "$key => $value\n"; } for my $key ( keys %hash ) { my $value = $hash{$key}; print "$key => $value\n"; }

sub codon2aa { my($codon) = @_; $codon = uc $codon; my %genetic_code = ( 'TCA' => 'S', # Serine 'TCC' => 'S', # Serine 'TCG' => 'S', # Serine 'TCT' => 'S', # Serine 'TTC' => 'F', # Phenylalanine 'TTT' => 'F', # Phenylalanine 'TTA' => 'L', # Leucine 'TTG' => 'L', # Leucine #Many more ); if(exists $genetic_code{$codon}) { return $genetic_code{$codon}; }else{ print STDERR "Bad codon \"$codon\"!!\n"; exit; }

sub parseREBASE { my($rebasefile) = @_; my @rebasefile = ( ); my %rebase_hash = ( ); my $name; my $site; my $regexp; open($rebase_filehandle, $rebasefile) or die "Cannot open file\n"; while( ) { # Discard header lines ( 1.. /Rich Roberts/ ) and next; # Discard blank lines /^\s*$/ and next; # Split the two (or three if includes parenthesized name) fields my @fields = split( " ", $_); $name = shift @fields; $site = pop @fields; # Translate the recognition sites to regular expressions $regexp = IUB_to_regexp($site); # Store the data into the hash $rebase_hash{$name} = "$site $regexp"; } # Return the hash containing the reformatted REBASE data return %rebase_hash; }

Range ( 1.. /Rich Roberts/ ) and next from first line till some line containing Rich Roberts If that is true, it will check the statement after "and" If that is not true, it will not check the statement after "and" open(…) or die If can open, the statement is already true, no need to check the statement after "or" If cannot open, the statement is false, need to check the statement after "or" to see if it can be true

@fred = (1,2,3); @barney = @fred; @huh = 1; @fred = qw(one two); @barney = (4,5,@fred,6,7); @barney = (8,@barney); ($a,$b,$c) = (1,2,3); @fred = (@barney = (2,3,4)); @fred = @barney = (2,3,4); @fred = (1,2,3); $fred[3] = "hi"; $fred[6] = "ho"; # @fred is now (1,2,3,"hi",undef,undef,"ho")

Array operators push and pop (right-most element) @mylist = (1,2,3); push(@mylist,4,5,6); $oldvalue = pop(@mylist); shift and unshift (left-most element) @fred = (5,6,7); unshift(@fred,2,3,4); $x = shift(@fred); reverse: @a = (7,8,9); @b = reverse(@a); sort: @a = (7,9,9); @b = sort(@a);

sub match_positions { my($regexp, $sequence) = @_; use BeginPerlBioinfo; my @positions = ( ); while ( $sequence =~ /$regexp/ig ) { push ( @positions, pos($sequence) - length($&) + 1); } return @positions; }

use BeginPerlBioinfo; my %rebase_hash = ( ); my @file_data = ( ); my $query = ''; my $dna = ''; my $recognition_site = ''; my $regexp = ''; my @locations = ( ); @file_data = get_file_data("sample.dna"); $dna = extract_sequence_from_fasta_data(@file_data); %rebase_hash = parseREBASE('bionet'); do { print "Search for what restriction site for (or quit)?: "; $query = ; chomp $query; if ($query =~ /^\s*$/ ) { exit; } if ( exists $rebase_hash{$query} ) { ($recognition_site, $regexp) = split ( " ", $rebase_hash{$query}); @locations = match_positions($regexp, $dna); if (@locations) { print "Searching for $query $recognition_site $regexp\n"; print "Restriction site for $query at :", join(" ", @locations), "\n"; } else { print "A restriction enzyme $query is not in the DNA:\n"; } } until ( $query =~ /quit/ ); exit;

Bioinformatics 生物信息学理论和实践唐继军 13928761660.

Similar presentations

Presentation on theme: "Bioinformatics 生物信息学理论和实践唐继军 13928761660."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Bioinformatics 生物信息学理论和实践 唐继军 13928761660.

Similar presentations

Presentation on theme: "Bioinformatics 生物信息学理论和实践 唐继军 13928761660."— Presentation transcript:

Similar presentations

About project

Feedback

Bioinformatics 生物信息学理论和实践唐继军 13928761660.

Presentation on theme: "Bioinformatics 生物信息学理论和实践唐继军 13928761660."— Presentation transcript: