Bioinformatics: Theory and Practice of Bioinformatics. 唐继军 (Tang Jijun) 13928761660.


Exercise 1
Ask for a protein file in FASTA format
Ask for an amino acid
Count the frequency of that amino acid
TKFHSNAHFYDCWRMLQYQLDMRCMRAISTF
SPHCGMEHMPDQTHNQGEMCKPRMWQVS
MNQSCNHTPPFRKTYVEWDYMAKALIAPYTL
GWLASTCFIW
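One possible way to approach the counting step of Exercise 1 is sketched below. It is only a sketch: the name count_residue is made up for illustration, and the file and keyboard input that the exercise asks for are left out so that the example runs on its own.

```perl
use strict;
use warnings;

# Count how often one amino acid (single-letter code) occurs in a protein string.
# count_residue is a hypothetical helper name, not part of the course code.
sub count_residue {
    my ($protein, $residue) = @_;
    my $count = 0;
    # In scalar context each /g match continues where the previous one ended.
    $count++ while $protein =~ /\Q$residue\E/ig;
    return $count;
}

# First line of the sample sequence from the exercise.
my $protein = 'TKFHSNAHFYDCWRMLQYQLDMRCMRAISTF';
print "M occurs ", count_residue($protein, 'M'), " time(s)\n";
```

In a full solution the protein string would come from the FASTA file and the residue from the user's answer.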

Exercise 2
Ask for a DNA file in FASTA format
Convert it to RNA
Ask for a codon
Count the frequency of that codon
TCGTACTTAGAAATGAGGGTCCGCTTTTGCCC
ACGCACCTGATCGCTCCTCGTTTGCTTTTAAG
AACCGGACGAACCACAGAGCATAAGGAGAA
CCTCTAGCTGCTTTACAAAGTACTGGTTCCCT
TTCCAGCGGGATGCTTTATCTAAACGCAATGA
GAGAGGTATTCCTCAGGCCACATCGCTTCCTA
GTTCCGCTGGGATCCATCGTTGGCGGCCGAA
GCCGCCATTCCATAGTGAGTTCTTCGTCTGTG
TCATTCTGTGCCAGATCGTCTGGCAAATAGCC
GATCCAGTTTATCTCTCGAAACTATAGTCGTA
CAGATCGAAATCTTAAGTCAAATCACGCGACT
AGACTCAGCTCTATTTTAGTGGTCATGGGTTT
TGGTCCCCCCGAGCGGTGCAACCGATTAGGA
CCATGTAGAACATTAGTTATAAGTCTTCTTTTA
AACACAATCTTCCTGCTCAGTGGTACATGGTT
ATCGTTATTGCTAGCCAGCCTGATAAGTAACA
CCACCACTGCGACCCTAATGCGCCCTTTCCAC
GAACACAGGGCTGTCCGATCCTATATTACGA
CTCCGGGAAGGGGTTCGCAAGTCGCACCCTA
AACGATGTTGAAGGCTCAGGATGTACACGCA
CTAGTACAATACATACGTGTTCCGGCTCTTAT
CCTGCATCGGAAGCTCAATCATGCATCGCACC
AGCGTGTTCGTGTCATCTAGGAGGGGCGCGT
AGGATAAATAATTCAATTAAGATATCGTTATG
CTAGTATACGCCTACCCGTCACCGGCCAACAG
TGTGCAGATGGCGCCACGAGTTACTGGCCCT
GATTTCTCCGCTTCTAATACCGCACACTGGGC
AATACGAGCTCAAGCCAGTCTCGCAGTAACG
CTCATCAGCTAACGAAAGAGTTAGAGGCTCG
CTAAATCGCACTGTCGGGGTCCCTTGGGTATT
TTACACTAGCGTCAGGTAGGCTAGCATGTGT
CTTTCCTTCCAGGGGTATG
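The conversion and counting steps of Exercise 2 might look roughly like the sketch below. The names dna2rna and count_codon are hypothetical, and the sketch assumes that codons are read as non-overlapping triplets starting at the first base; the exercise itself does not say which reading frame to use.

```perl
use strict;
use warnings;

# Transcribe a DNA string to RNA by replacing T with U (hypothetical helper).
sub dna2rna {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/Tt/Uu/;
    return $rna;
}

# Count a codon in non-overlapping triplets, starting at the first base.
# The choice of reading frame is an assumption made for this sketch.
sub count_codon {
    my ($rna, $codon) = @_;
    my $count = 0;
    for (my $i = 0; $i + 3 <= length($rna); $i += 3) {
        $count++ if uc(substr($rna, $i, 3)) eq uc($codon);
    }
    return $count;
}

# First 18 bases of the sample sequence from the exercise.
my $rna = dna2rna('TCGTACTTAGAAATGAGG');
print "$rna\n";
print "UAC occurs ", count_codon($rna, 'UAC'), " time(s) in this frame\n";
```

Counting in the other two frames only requires passing substr($rna, 1) or substr($rna, 2) instead of $rna.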

Subroutine
Some code needs to be reused
A good way to organize code
Called "function" in some languages
Name
Return
Parameters
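The three parts named above (name, parameters, return value) can be seen in a very small subroutine. The sketch below is illustrative only; gc_content is a made-up example and does not come from the course material.

```perl
use strict;
use warnings;

# Name: gc_content
# Parameter: one DNA string
# Return value: the fraction of bases that are G or C
sub gc_content {
    my ($dna) = @_;                    # parameters arrive in the array @_
    return 0 if length($dna) == 0;     # avoid dividing by zero
    my $gc = ($dna =~ tr/GCgc//);      # tr with an empty replacement only counts
    return $gc / length($dna);
}

printf "GC content: %.2f\n", gc_content('ATGCGC');
```

Once defined, the subroutine can be called from anywhere in the program, which is what makes the code reusable.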

sub codon2aa {
    my($codon) = @_;

    if    ( $codon =~ /GC./i )          { return 'A' }   # Alanine
    elsif ( $codon =~ /TG[TC]/i )       { return 'C' }   # Cysteine
    elsif ( $codon =~ /GA[TC]/i )       { return 'D' }   # Aspartic Acid
    elsif ( $codon =~ /GA[AG]/i )       { return 'E' }   # Glutamic Acid
    elsif ( $codon =~ /TT[TC]/i )       { return 'F' }   # Phenylalanine
    elsif ( $codon =~ /GG./i )          { return 'G' }   # Glycine
    elsif ( $codon =~ /CA[TC]/i )       { return 'H' }   # Histidine
    elsif ( $codon =~ /AT[TCA]/i )      { return 'I' }   # Isoleucine
    elsif ( $codon =~ /AA[AG]/i )       { return 'K' }   # Lysine
    elsif ( $codon =~ /TT[AG]|CT./i )   { return 'L' }   # Leucine
    elsif ( $codon =~ /ATG/i )          { return 'M' }   # Methionine
    elsif ( $codon =~ /AA[TC]/i )       { return 'N' }   # Asparagine
    elsif ( $codon =~ /CC./i )          { return 'P' }   # Proline
    elsif ( $codon =~ /CA[AG]/i )       { return 'Q' }   # Glutamine
    elsif ( $codon =~ /CG.|AG[AG]/i )   { return 'R' }   # Arginine
    elsif ( $codon =~ /TC.|AG[TC]/i )   { return 'S' }   # Serine
    elsif ( $codon =~ /AC./i )          { return 'T' }   # Threonine
    elsif ( $codon =~ /GT./i )          { return 'V' }   # Valine
    elsif ( $codon =~ /TGG/i )          { return 'W' }   # Tryptophan
    elsif ( $codon =~ /TA[TC]/i )       { return 'Y' }   # Tyrosine
    elsif ( $codon =~ /TA[AG]|TGA/i )   { return '_' }   # Stop
    else {
        print STDERR "Bad codon \"$codon\"!!\n";
        exit;
    }
}

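#!/usr/bin/perl -w

print "Please type the filename: ";
$dna_filename = <STDIN>;
chomp $dna_filename;

open(DNAFILE, $dna_filename);
$name = <DNAFILE>;            # first line: the FASTA header
@DNA = <DNAFILE>;             # remaining lines: the sequence
close DNAFILE;

$DNA = join('', @DNA);
$DNA =~ s/\s//g;              # remove whitespace, including newlines

# Forward strand, three reading frames
print "First ", dna2peptide($DNA), "\n";
print "Second ", dna2peptide(substr($DNA, 1)), "\n";
print "Third ", dna2peptide(substr($DNA, 2)), "\n";

# Reverse complement, three reading frames
$DNA = reverse $DNA;
$DNA =~ tr/ACGTacgt/TGCAtgca/;   # complement the bases; reversing alone is not enough
print "Fourth ", dna2peptide($DNA), "\n";
print "Fifth ", dna2peptide(substr($DNA, 1)), "\n";
print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

sub dna2peptide {
    my ($dna) = @_;
    my $protein = "";
    # Translate each complete three-base codon
    for (my $i = 0; $i < (length($dna) - 2); $i += 3) {
        my $codon = substr($dna, $i, 3);
        $protein .= codon2aa($codon);
    }
    return $protein;
}

sub codon2aa { ... }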

Modules
A Perl module is a self-contained piece of Perl code that a Perl program can load and use later
Like a library
The file name ends with the extension .pm
The file must end with a true value, conventionally 1;

Bio.pm

sub codon2aa { .... }
sub dna2peptide { .... }
1;

#!/usr/bin/perl -w

use Bio;

print "Please type the filename: ";
$dna_filename = <STDIN>;
chomp $dna_filename;

open(DNAFILE, $dna_filename);
$name = <DNAFILE>;            # first line: the FASTA header
@DNA = <DNAFILE>;             # remaining lines: the sequence
close DNAFILE;

$DNA = join('', @DNA);
$DNA =~ s/\s//g;

print "First ", dna2peptide($DNA), "\n";
print "Second ", dna2peptide(substr($DNA, 1)), "\n";
print "Third ", dna2peptide(substr($DNA, 2)), "\n";

$DNA = reverse $DNA;
$DNA =~ tr/ACGTacgt/TGCAtgca/;
print "Fourth ", dna2peptide($DNA), "\n";
print "Fifth ", dna2peptide(substr($DNA, 1)), "\n";
print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

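Bio.pm

sub codon2aa { .... }
sub dna2peptide { .... }

sub fasta_read {
    print "Please type the filename: ";
    my $dna_filename = <STDIN>;
    chomp $dna_filename;

    unless (open(DNAFILE, $dna_filename)) {
        print "Cannot open file ", $dna_filename, "\n";
        exit;                     # nothing to read, so stop here
    }

    my $name = <DNAFILE>;         # first line: the FASTA header
    my @DNA = <DNAFILE>;          # remaining lines: the sequence
    close DNAFILE;

    my $DNA = join('', @DNA);
    $DNA =~ s/\s//g;
    return $DNA;
}
1;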

#!/usr/bin/perl -w

use Bio;

$DNA = fasta_read();

print "First ", dna2peptide($DNA), "\n";
print "Second ", dna2peptide(substr($DNA, 1)), "\n";
print "Third ", dna2peptide(substr($DNA, 2)), "\n";

$DNA = reverse $DNA;
$DNA =~ tr/ACGTacgt/TGCAtgca/;
print "Fourth ", dna2peptide($DNA), "\n";
print "Fifth ", dna2peptide(substr($DNA, 1)), "\n";
print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

Scope
my provides lexical scoping; a variable declared with my is visible only within the block in which it is declared.
Blocks of code are the sections enclosed in curly braces {}; a file also counts as a block.
Use "use vars qw([list of var names])" or "our ([var_names])" to create package globals.
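The difference between a lexical variable and a package global can be shown in a few lines. The sketch below is illustrative; the variable and subroutine names ($species, describe) are invented for the example.

```perl
use strict;
use warnings;

our $species = 'E. coli';      # package global, visible throughout this package

sub describe {
    my $label = 'organism';    # lexical: exists only inside this subroutine
    return "$label: $species"; # refers to the package global
}

{
    my $species = 'yeast';     # a new lexical that hides the global in this block only
    print "inside block: $species\n";
}

# Outside the block the package global is visible again.
print describe(), "\n";
```

The inner my $species disappears at the closing brace, so describe() still reports the value stored in the package global.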

#!/usr/bin/perl -w

use Bio;
use strict;
use warnings;
$DNA = fasta_read();

print "First ", dna2peptide($DNA), "\n";
print "Second ", dna2peptide(substr($DNA, 1)), "\n";
print "Third ", dna2peptide(substr($DNA, 2)), "\n";

$DNA = reverse $DNA;
$DNA =~ tr/ACGTacgt/TGCAtgca/;
print "Fourth ", dna2peptide($DNA), "\n";
print "Fifth ", dna2peptide(substr($DNA, 1)), "\n";
print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

Variable "$DNA" is not imported at frame2.pl line 6.
Variable "$DNA" is not imported at frame2.pl line 8.
Variable "$DNA" is not imported at frame2.pl line 9.
Variable "$DNA" is not imported at frame2.pl line 10.
Variable "$DNA" is not imported at frame2.pl line 12.
Variable "$DNA" is not imported at frame2.pl line 13.
Variable "$DNA" is not imported at frame2.pl line 14.
Variable "$DNA" is not imported at frame2.pl line 15.
Global symbol "$DNA" requires explicit package name at frame2.pl line 6.
Global symbol "$DNA" requires explicit package name at frame2.pl line 8.
Global symbol "$DNA" requires explicit package name at frame2.pl line 9.
Global symbol "$DNA" requires explicit package name at frame2.pl line 10.
Global symbol "$DNA" requires explicit package name at frame2.pl line 12.
Global symbol "$DNA" requires explicit package name at frame2.pl line 13.
Global symbol "$DNA" requires explicit package name at frame2.pl line 14.
Global symbol "$DNA" requires explicit package name at frame2.pl line 15.
Execution of frame2.pl aborted due to compilation errors.

#!/usr/bin/perl -w

use Bio;
use strict;
use warnings;
my $DNA = fasta_read();

print "First ", dna2peptide($DNA), "\n";
print "Second ", dna2peptide(substr($DNA, 1)), "\n";
print "Third ", dna2peptide(substr($DNA, 2)), "\n";

$DNA = reverse $DNA;
$DNA =~ tr/ACGTacgt/TGCAtgca/;
print "Fourth ", dna2peptide($DNA), "\n";
print "Fifth ", dna2peptide(substr($DNA, 1)), "\n";
print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

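my $x = 10;
for (my $x = 0; $x < 5; $x++) {
    Scope();
    print $x, "\n";     # prints 0 through 4: the loop has its own $x
}
print $x, "\n";         # prints 10: the outer $x was never changed

sub Scope {
    my $x = 0;          # a third, private $x; it does not affect the caller
}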

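sub get_file_data {
    my($filename) = @_;

    use strict;
    use warnings;

    # Initialize variables
    my @filedata = ( );

    unless( open(GET_FILE_DATA, $filename) ) {
        print STDERR "Cannot open file \"$filename\"\n\n";
        exit;
    }

    # Read every line of the file into the array
    @filedata = <GET_FILE_DATA>;
    close GET_FILE_DATA;

    return @filedata;
}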

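sub extract_sequence_from_fasta_data {
    my(@fasta_file_data) = @_;

    my $sequence = '';

    foreach my $line (@fasta_file_data) {
        if ($line =~ /^\s*$/) {          # discard blank lines
            next;
        } elsif ($line =~ /^\s*#/) {     # discard comment lines
            next;
        } elsif ($line =~ /^>/) {        # discard the FASTA header line
            next;
        } else {                         # keep the line: it is sequence data
            $sequence .= $line;
        }
    }

    # remove non-sequence data (in this case, whitespace) from $sequence string
    $sequence =~ s/\s//g;
    return $sequence;
}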

Molecular Scissors
Molecular Cell Biology, 4th edition

Discovering Restriction Enzymes
HindII, the first restriction enzyme, was discovered accidentally in 1970 during a study of how the bacterium Haemophilus influenzae takes up DNA from a virus
It recognizes and cuts DNA at sequences of the form GTYRAC (Y = C or T, R = G or A), for example GTCGAC and GTTAAC

Recognition Sites of Restriction Enzymes
Molecular Cell Biology, 4th edition

Uses of Restriction Enzymes
Recombinant DNA technology
Cloning
cDNA/genomic library construction
DNA mapping

Restriction Enzyme Database

R = G or A
Y = C or T
M = A or C
K = G or T
S = G or C
W = A or T
B = not A (C or G or T)
D = not C (A or G or T)
H = not G (A or C or T)
V = not T (A or C or G)
N = A or C or G or T

sub IUB_to_regexp {
    my($iub) = @_;

    my $regular_expression = '';

    my %iub2character_class = (
        A => 'A',
        C => 'C',
        G => 'G',
        T => 'T',
        R => '[GA]',
        Y => '[CT]',
        M => '[AC]',
        K => '[GT]',
        S => '[GC]',
        W => '[AT]',
        B => '[CGT]',
        D => '[AGT]',
        H => '[ACT]',
        V => '[ACG]',
        N => '[ACGT]',
    );

    # Remove the ^ signs that mark the cut position in the recognition site
    $iub =~ s/\^//g;

    # Translate each IUB code into its character class
    for ( my $i = 0 ; $i < length($iub) ; ++$i ) {
        $regular_expression .= $iub2character_class{substr($iub, $i, 1)};
    }

    return $regular_expression;
}
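To see what such a translation produces, the sketch below restates the same idea in a compact, self-contained form and applies it to one recognition site written with IUB codes. The names %iub_class and iub_to_regexp_sketch are hypothetical, and the check for unknown codes is an addition that the slide's version does not have.

```perl
use strict;
use warnings;

# IUB ambiguity codes mapped to regular-expression character classes.
my %iub_class = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    R => '[GA]',  Y => '[CT]',  M => '[AC]',  K => '[GT]',
    S => '[GC]',  W => '[AT]',  B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Hypothetical stand-in for IUB_to_regexp, with an added check for bad codes.
sub iub_to_regexp_sketch {
    my ($iub) = @_;
    $iub =~ s/\^//g;                         # drop the cut-site marker
    my $regexp = '';
    for my $code (split //, uc $iub) {
        die "Unknown IUB code '$code'\n" unless exists $iub_class{$code};
        $regexp .= $iub_class{$code};
    }
    return $regexp;
}

my $regexp = iub_to_regexp_sketch('GTY^RAC');
print "GTY^RAC becomes $regexp\n";
for my $candidate ('GTTAAC', 'GTCGAC', 'GTGCAC') {
    my $verdict = ($candidate =~ /^$regexp$/) ? 'matches' : 'does not match';
    print "$candidate $verdict\n";
}
```

The resulting string can be interpolated directly into a match, which is how the later slides use it to search a DNA sequence.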

Hash
Initialize: my %hash = ();
Add key/value pair: $hash{$key} = $value;
Assign several pairs at once (this replaces the whole hash):
%hash = ( 'key1', 'value1', 'key2', 'value2' );
%hash = ( key1 => 'value1', key2 => 'value2', );
Delete: delete $hash{$key};

while ( my ($key, $value) = each(%hash) ) {
    print "$key => $value\n";
}

for my $key ( keys %hash ) {
    my $value = $hash{$key};
    print "$key => $value\n";
}

sub codon2aa {
    my($codon) = @_;

    $codon = uc $codon;

    my %genetic_code = (
        'TCA' => 'S',    # Serine
        'TCC' => 'S',    # Serine
        'TCG' => 'S',    # Serine
        'TCT' => 'S',    # Serine
        'TTC' => 'F',    # Phenylalanine
        'TTT' => 'F',    # Phenylalanine
        'TTA' => 'L',    # Leucine
        'TTG' => 'L',    # Leucine
        # Many more
    );

    if (exists $genetic_code{$codon}) {
        return $genetic_code{$codon};
    } else {
        print STDERR "Bad codon \"$codon\"!!\n";
        exit;
    }
}

sub parseREBASE {
    my($rebasefile) = @_;

    my @rebasefile = ( );
    my %rebase_hash = ( );
    my $name;
    my $site;
    my $regexp;

    open(my $rebase_filehandle, '<', $rebasefile) or die "Cannot open file\n";

    while (<$rebase_filehandle>) {
        # Discard header lines
        ( 1 .. /Rich Roberts/ ) and next;

        # Discard blank lines
        /^\s*$/ and next;

        # Split the two (or three if includes parenthesized name) fields
        my @fields = split( " ", $_);

        # Keep the first field (name) and the last field (recognition site)
        $name = shift @fields;
        $site = pop @fields;

        # Translate the recognition sites to regular expressions
        $regexp = IUB_to_regexp($site);

        # Store the data into the hash
        $rebase_hash{$name} = "$site $regexp";
    }

    # Return the hash containing the reformatted REBASE data
    return %rebase_hash;
}

Range
( 1 .. /Rich Roberts/ ) and next
The range is true from the first line through the first line that contains "Rich Roberts"
If the range is true, Perl goes on to evaluate the statement after "and", so next skips the line
If the range is false, the statement after "and" is never evaluated
open(…) or die
If the file opens, the expression is already true, so the statement after "or" is never evaluated
If the file cannot be opened, the expression is false, so Perl evaluates the statement after "or" and die stops the program
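The header-skipping idiom can be tried without a real REBASE file. The sketch below reads from a string through an in-memory filehandle; the sample text and the name data_lines are invented for the illustration, and only the two short-circuit statements are taken from the slide.

```perl
use strict;
use warnings;

# Made-up text standing in for a file whose header ends at "Rich Roberts".
my $sample = join "\n",
    'Sample header line',
    'Rich Roberts',
    '',
    'EnzA GACGT^C',
    'EnzB GTY^RAC',
    '';

# Return the lines that remain after the header and blank lines are skipped.
sub data_lines {
    my ($text) = @_;
    my @kept;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file\n";
    while (<$fh>) {
        # True from input line 1 through the first line matching the pattern.
        ( 1 .. /Rich Roberts/ ) and next;
        # Skip blank lines.
        /^\s*$/ and next;
        chomp;
        push @kept, $_;
    }
    close $fh;
    return @kept;
}

print "$_\n" for data_lines($sample);
```

When the left side of the range is a plain number, Perl compares it with the current input line number, which is why the range starts being true on the first line read.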

@fred = = = = qw(one = ($a,$b,$c) = = = =
@fred = (1,2,3); $fred[3] = "hi"; $fred[6] = "ho"; # @fred is now (1,2,3,"hi",undef,undef,"ho")

Array operators
push and pop (right-most element) = (1,2,3); $oldvalue =
shift and unshift (left-most element) = (5,6,7); $x = = = = =
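A short runnable sketch of the four operators follows. It is not the slide's original example; the name name_and_site and the record 'XxxI (YyyI) G^AATTC' are invented to show how shift and pop pick the first and last fields of a line.

```perl
use strict;
use warnings;

# Split one whitespace-separated record and keep only its first and last fields.
sub name_and_site {
    my ($line) = @_;
    my @fields = split " ", $line;
    my $name = shift @fields;   # removes and returns the left-most element
    my $site = pop @fields;     # removes and returns the right-most element
    return ($name, $site);
}

# A made-up three-field record; the middle field is simply dropped.
my ($name, $site) = name_and_site('XxxI (YyyI) G^AATTC');
print "$name cuts at $site\n";

my @numbers = (1, 2, 3);
push @numbers, 4;        # add on the right: (1, 2, 3, 4)
unshift @numbers, 0;     # add on the left:  (0, 1, 2, 3, 4)
print "@numbers\n";
```

push and pop work on the right end of an array, while unshift and shift work on the left end.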

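sub match_positions {
    my($regexp, $sequence) = @_;

    use BeginPerlBioinfo;

    my @positions = ( );

    # Each /g match continues from where the previous one ended
    while ( $sequence =~ /$regexp/ig ) {
        # pos() is the offset just past the match; convert to a 1-based start
        push( @positions, pos($sequence) - length($&) + 1 );
    }

    return @positions;
}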
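The same search can be tried on a short string without the BeginPerlBioinfo module. In the sketch below the name match_starts is hypothetical, and the start offset is taken from the built-in array @- rather than from pos() and $&; that is a substitution made for this example, not the slide's method.

```perl
use strict;
use warnings;

# Return the 1-based start position of every match of $regexp in $sequence.
sub match_starts {
    my ($regexp, $sequence) = @_;
    my @positions;
    while ($sequence =~ /$regexp/ig) {
        # $-[0] holds the 0-based offset at which the last match began.
        push @positions, $-[0] + 1;
    }
    return @positions;
}

my @hits = match_starts('GAATTC', 'ccGAATTCggaattcGAATTC');
print "Matches start at: @hits\n";
```

Because the match is case-insensitive, lower-case copies of the site are reported as well.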

use BeginPerlBioinfo;

my %rebase_hash = ( );
my @file_data = ( );
my $query = '';
my $dna = '';
my $recognition_site = '';
my $regexp = '';
my @locations = ( );

# Read the DNA sequence and the REBASE data
@file_data = get_file_data("sample.dna");
$dna = extract_sequence_from_fasta_data(@file_data);
%rebase_hash = parseREBASE('bionet');

do {
    print "Search for what restriction enzyme (or quit)?: ";
    $query = <STDIN>;
    chomp $query;

    # Stop on an empty answer
    if ($query =~ /^\s*$/ ) {
        exit;
    }

    if ( exists $rebase_hash{$query} ) {
        ($recognition_site, $regexp) = split ( " ", $rebase_hash{$query});
        @locations = match_positions($regexp, $dna);

        if (@locations) {
            print "Searching for $query $recognition_site $regexp\n";
            print "Restriction site for $query at: ", join(" ", @locations), "\n";
        } else {
            print "A restriction site for $query is not in the DNA\n";
        }
    }
} until ( $query =~ /quit/ );

exit;