Introduction to perl programming: the minimum to know! Bioinformatic and Comparative Genome Analysis Course HKU-Pasteur Research Centre - Hong Kong, China.


Introduction to perl programming: the minimum to know! Bioinformatic and Comparative Genome Analysis Course, HKU-Pasteur Research Centre - Hong Kong, China, August 17 - August 29, 2009. Fredj Tekaia, Institut Pasteur

A basic perl program
    #!/bin/perl
    # Program to print a message
    print 'Hello world.';       # Print a message

Variables, Arrays
    $val = 9;
    $val = "ABC transporter";
Variable names are case sensitive: $val is different from $Val.

Operations and Assignment
Perl uses the usual arithmetic operators:
    $a = 1 + 2;     # Add 1 and 2 and store in $a
    $a = 3 - 4;     # Subtract 4 from 3 and store in $a
    $a = 5 * 6;     # Multiply 5 and 6
    $a = 7 / 8;     # Divide 7 by 8 to give 0.875
    $a = 9 ** 10;   # Nine to the power of 10
    $a = 5 % 2;     # Remainder of 5 divided by 2
    $a++;           # Return $a and then increment it
    $a--;           # Return $a and then decrement it
For strings perl has, among others:
    $a = $b . $c;   # Concatenate $b and $c
    $a = $b x $c;   # $b repeated $c times

To assign values perl includes:
    $a = $b;        # Assign $b to $a
    $a += $b;       # Add $b to $a
    $a -= $b;       # Subtract $b from $a
    $a .= $b;       # Append $b onto $a

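Array variables
An array variable is a list of scalars (i.e. numbers and/or strings). Array names are prefixed by @:
    @SEQNAME = ("MG001", "MG002", "MG003");
    $SEQNAME[2]     # MG003
Attention: indices start at 0, so the three elements above have indices 0, 1, 2. A list of numbers is written the same way: (0,1,2,3).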

@L_CODONS = ('TTT','TTC','TTA','TTG', 'CTT','CTC','CTA','CTG', 'ATT','ATC','ATA','ATG', 'GTT','GTC','GTA','GTG', 'TCT','TCC','TCA','TCG', 'CCT','CCC','CCA','CCG', 'ACT','ACC','ACA','ACG', 'GCT','GCC','GCA','GCG', 'TAT','TAC','TAA','TAG', 'CAT','CAC','CAA','CAG', 'AAT','AAC','AAA','AAG', 'GAT','GAC','GAA','GAG', 'TGT','TGC','TGA','TGG', 'CGT','CGC','CGA','CGG', 'AGT','AGC','AGA','AGG', 'GGT','GGC','GGA','GGG');

@AA = ('a','r','n','d','c','q','e','g','h','i','l','k','m','f','p','s','t','w','y','v','b');
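
The array slides above can be tried out with a short sketch like the one below. It reuses the @SEQNAME values from the slides; the extra name "MG004" and the variable names $first, $third, $count and $last_index are only illustrative assumptions.

```perl
use strict;
use warnings;

# Gene names as on the slide
my @SEQNAME = ("MG001", "MG002", "MG003");

my $first      = $SEQNAME[0];       # indices start at 0
my $third      = $SEQNAME[2];       # third element has index 2
my $count      = scalar(@SEQNAME);  # number of elements
my $last_index = $#SEQNAME;         # index of the last element

push(@SEQNAME, "MG004");            # append a (hypothetical) name at the end

print "first=$first third=$third count=$count last_index=$last_index\n";
print "after push: @SEQNAME\n";
```

Note that a single element is written with $ (it is a scalar), while the whole array is written with @.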

Associative arrays: hash tables
Ordinary list arrays allow us to access their elements by number. The first element of @AA is $AA[0], the second element is $AA[1], and so on. But perl also allows us to create arrays which are accessed by string. These are called associative arrays (hashes). The name of an associative array is prefixed by a % sign.

    %ages = ("Michael", 39,
             "Angie", 27,
             "Willy", "21 years",
             "The Queen Mother", 108);
    $ages{"Michael"};           # Returns 39
    $ages{"Angie"};             # Returns 27
    $ages{"Willy"};             # Returns "21 years"
    $ages{"The Queen Mother"};  # Returns 108
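
In a bioinformatics setting the same structure is often used to map codons to amino acids. The following is a minimal sketch, assuming a hash named %CODE (not part of the slides) that holds only a few entries of the standard genetic code:

```perl
use strict;
use warnings;

# A small part of the standard genetic code, keyed by codon ('*' marks a stop)
my %CODE = (
    'ATG' => 'M',
    'TGG' => 'W',
    'TTT' => 'F',
    'TTC' => 'F',
    'TAA' => '*',
);

my $aa    = $CODE{'ATG'};                   # look up one codon
my $known = exists $CODE{'GGG'} ? 1 : 0;    # is this key present yet?
$CODE{'GGG'} = 'G';                         # add a new key/value pair
my $n     = scalar(keys %CODE);             # number of keys

# Hashes are unordered, so sort the keys for a reproducible listing
foreach my $codon (sort keys %CODE) {
    print "$codon\t$CODE{$codon}\n";
}
```

The => operator is just a more readable comma that also quotes the word on its left; the list form used on the slide works equally well.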

File handling
    #!/bin/perl
    open(FILE, "GMG.pep");
    while (<FILE>)              # Read the file line by line into $_
    {
        print $_;
    }
    close (FILE);
This script (cat.pl) is equivalent to the UNIX cat command applied to GMG.pep. Use: chmod a+x cat.pl ; cat.pl

split
A very useful function in perl: it splits up a string and places the pieces into an array.
    #!/bin/perl
    open(FILE, "GMG.pep");
    while (<FILE>)
    {
        @tab = split(/\s+/, $_);    # Split the line on whitespace
        print $tab[0];              # Print the first field
    }
    close (FILE);

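    #!/bin/perl
    open(FILE, "GMG.pep");
    while (<FILE>)
    {
        @tab = split(/\s+/, $_, 2);     # Split into at most 2 fields
        $NOM{$tab[0]} = $tab[1];        # First field is the key, the rest is the value
        print $NOM{$tab[0]};
    }
    close (FILE);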
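
The effect of the third argument of split can be seen without any input file. The sketch below is an assumption-laden illustration: the header lines and their descriptions are invented, and only the hash name %NOM comes from the slides.

```perl
use strict;
use warnings;

# Hypothetical FASTA header lines; a real script would read them from a file
my @lines = (
    ">MG001 example protein one\n",
    ">MG002 example protein two\n",
);

my %NOM;
foreach my $line (@lines) {
    chomp($line);                   # remove the trailing newline
    $line =~ s/^>//;                # remove the leading '>'
    # Limit 2: the name is the first field, the description keeps its spaces
    my ($name, $desc) = split(/\s+/, $line, 2);
    $NOM{$name} = $desc;
}

foreach my $name (sort keys %NOM) {
    print "$name => $NOM{$name}\n";
}
```

Without the limit, the description would be cut into one field per word.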

Control structures: foreach
To go through each element of an array or other list-like structure (such as lines in a file) perl uses the foreach structure. This has the form:
    foreach $nom (@SEQNAME)     # Visit each item in turn
                                # and call it $nom
    {
        print "$nom\n";         # Print the item
    }

    foreach $j (0 .. 2)         # Visit each value in turn
                                # and call it $j
    {
        print "$SEQNAME[$j]\n"; # Print the item
    }
    foreach $j (0 .. $#AA)      # Visit each index of @AA in turn
                                # and call it $j
    {
        print "$AA[$j]\n";      # Print the item
    }

Testing
Here are some tests on numbers and strings:
    $a == $b        # Is $a numerically equal to $b?
                    # Beware: Don't use the = operator.
    $a != $b        # Is $a numerically unequal to $b?
    $a eq $b        # Is $a string-equal to $b?
    $a ne $b        # Is $a string-unequal to $b?
You can also use logical and, or and not:
    ($a && $b)      # Are $a and $b both true?
    ($a || $b)      # Is either $a or $b true?
    !($a)           # Is $a false?

for
    for (initialise; test; inc)
    {
        first_action;
        second_action;
        etc....
    }
    for ($i = 0; $i < 10; ++$i) # Start with $i = 0
                                # Do it while $i < 10
                                # Increment $i before repeating
    {
        print "$i\n";
    }
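
As a small illustration of both loop forms, the following sketch cuts a coding sequence into codons with a for loop and then lists them with foreach. The sequence and the variable names ($dna, @codons) are hypothetical.

```perl
use strict;
use warnings;

my $dna = "ATGGAGGAAGAAGATGAATAA";   # hypothetical coding sequence, 21 bases

# C-style for loop: step through the sequence 3 bases at a time
my @codons;
for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
    push(@codons, substr($dna, $i, 3));
}

# foreach over the indices 0 .. last index
foreach my $j (0 .. $#codons) {
    print "$j\t$codons[$j]\n";
}
```

The test $i + 3 <= length($dna) simply ignores any incomplete codon at the end of the sequence.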

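Conditionals
    if ($a)
    {
        print "The string is not empty\n";
    }
    else
    {
        print "The string is empty\n";
    }
The following script prints only the lines of GMG.pep that contain ">" (the fasta header lines):
    #!/bin/perl
    open(FILE, "GMG.pep");
    while (<FILE>)
    {
        print $_ if ( m/>/ );
    }
    close (FILE);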
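
The same test can drive an if/else that treats header lines and sequence lines differently. This is only a sketch on invented in-memory lines; the names @lines, $headers and $residues are assumptions.

```perl
use strict;
use warnings;

# Hypothetical lines of a small protein fasta file (newlines already removed)
my @lines = (">seq1", "MKV", "LLA", ">seq2", "MTT");

my ($headers, $residues) = (0, 0);
foreach my $line (@lines) {
    if ($line =~ m/^>/) {
        $headers++;                     # a header line starts with '>'
    }
    else {
        $residues += length($line);     # any other line holds residues
    }
}

print "sequences: $headers, residues: $residues\n";
```

Anchoring the pattern with ^ is slightly stricter than m/>/, since it only accepts ">" at the start of the line.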

String matching
    $a eq $b        # Is $a string-equal to $b?
    $a ne $b        # Is $a string-unequal to $b?
Here are some special RE (regular expression) characters and their meaning:
    .               # Any single character except a newline
    ^               # The beginning of the line or string
    $               # The end of the line or string
    *               # Zero or more of the last character
    +               # One or more of the last character
    ?               # Zero or one of the last character

Some more special characters
    \n              # A newline
    \t              # A tab
    \w              # Any alphanumeric (word) character.
                    # The same as [a-zA-Z0-9_]
    \W              # Any non-word character.
                    # The same as [^a-zA-Z0-9_]
    \d              # Any digit. The same as [0-9]
    \D              # Any non-digit. The same as [^0-9]
    \s              # Any whitespace character: space,
                    # tab, newline, etc
    \S              # Any non-whitespace character
    \b              # A word boundary, outside [] only
    \B              # No word boundary

Characters like $, |, [, ), \, / and so on have a special meaning in regular expressions. If you want to match one of them literally, you have to precede it by a backslash (\). So:
    \|              # Vertical bar
    \[              # An open square bracket
    \)              # A closing parenthesis
    \*              # An asterisk
    \^              # A caret symbol
    \/              # A slash
    \\              # A backslash
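
Escaped characters matter as soon as a header uses "|" as a field separator. The sketch below assumes a made-up header in that style (the identifiers in it are invented) and captures three fields with parentheses:

```perl
use strict;
use warnings;

# Hypothetical header with '|' separated fields
my $header = ">gi|12345|ref|XX_000001.1| example protein";

my ($gi, $db, $acc);
# \| matches a literal vertical bar; \d+ digits; \w+ word characters;
# [^|]+ one or more characters that are not a vertical bar
if ($header =~ m/^>gi\|(\d+)\|(\w+)\|([^|]+)\|/) {
    ($gi, $db, $acc) = ($1, $2, $3);
    print "gi=$gi db=$db accession=$acc\n";
}
else {
    print "header not recognised\n";
}
```

Inside a character class such as [^|] the vertical bar is an ordinary character and needs no backslash.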

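Substitution and translation
    s/london/London/                # Replace the first "london" in $_
    $sentence =~ s/london/London/   # The same, applied to $sentence
The g option makes the substitution global (every occurrence is replaced); the i option is for "ignore case":
    s/london/London/gi
Translation
    $sentence =~ tr/abc/edf/        # Replace a by e, b by d and c by f
    tr/a-z/A-Z/;                    # converts $_ to upper case
    tr/A-Z/a-z/;                    # converts $_ to lower case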
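
For sequence work, tr and s are typically combined as in the sketch below, which computes a reverse complement, counts G and C, and makes an RNA copy. The sequence and variable names are hypothetical, and the code assumes an upper-case sequence containing only A, C, G and T.

```perl
use strict;
use warnings;

my $dna = "ATGGCCTTA";              # hypothetical sequence

# Reverse complement: reverse the string, then translate each base
my $rc = reverse($dna);             # scalar context reverses the characters
$rc =~ tr/ACGT/TGCA/;

# tr returns the number of characters it matched; with an empty
# replacement list the string itself is left unchanged
my $gc = ($dna =~ tr/GC//);

# Copy $dna into $rna, then substitute every T by U in the copy
(my $rna = $dna) =~ s/T/U/g;

print "reverse complement: $rc\n";
print "G+C count: $gc\n";
print "RNA: $rna\n";
```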

Simple scripts
- given a nucleotide sequence: base composition;
- given a protein sequence: amino-acid composition;
- given a nucleic database (in fasta format): base composition;
- given a protein database (in fasta format): amino-acid composition.
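
One possible way to approach the composition exercises is a hash of counters, as sketched here. The subroutine name composition and the test sequence are assumptions made for the illustration; the same code works for amino acids because it counts whatever symbols it finds.

```perl
use strict;
use warnings;

# Count how often each symbol (base or amino acid) occurs in a sequence
sub composition {
    my ($seq) = @_;
    my %count;
    foreach my $symbol (split(//, uc($seq))) {  # one element per character
        $count{$symbol}++;
    }
    return %count;
}

my $seq  = "ATGGCCTTA";             # hypothetical nucleotide sequence
my $len  = length($seq);
my %comp = composition($seq);

foreach my $symbol (sort keys %comp) {
    printf("%s\t%d\t%.3f\n", $symbol, $comp{$symbol}, $comp{$symbol} / $len);
}
```

For a whole database, the same subroutine could be called on each sequence and the counts added up.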

- sequence size (bases or amino acids);
- extract a portion of a sequence (start position; end position);
- extract a sequence by name (from a database of sequences);
- gene sequence: codon count;
- given the allxxseqnew file: script to compute frequencies of multiple matches;
- see splitfasta.pl; splitdnafasta.pl
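
Three of the exercises above (size, portion, codon count) can be sketched with length, substr and a global match. The subroutine names subsequence and codon_count are hypothetical, positions are assumed to be 1-based and inclusive, and the sequence is assumed to contain no newlines.

```perl
use strict;
use warnings;

# Extract positions $start..$end (1-based, inclusive) from a sequence
sub subsequence {
    my ($seq, $start, $end) = @_;
    return substr($seq, $start - 1, $end - $start + 1);
}

# Count codons, reading non-overlapping triplets from the first base
sub codon_count {
    my ($seq) = @_;
    my %n;
    while ($seq =~ /(...)/g) {      # each match consumes the next 3 characters
        $n{$1}++;
    }
    return %n;
}

my $gene   = "ATGGAGGAAGAAGATGAATAA";   # hypothetical coding sequence
my $size   = length($gene);
my $part   = subsequence($gene, 4, 6);
my %codons = codon_count($gene);

print "size: $size\n";
print "positions 4-6: $part\n";
foreach my $codon (sort keys %codons) {
    print "$codon\t$codons{$codon}\n";
}
```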

Data manipulation exercises:
- home directory, mkdir, cd, path, pwd, find;
- naming convention: DB.pep, DB.dna, seq.dna, seq.prt;
- use "tab" as a separator;
- using sed and grep;
- the fasta sequence format;
- count the number of sequences in a sequence database in fasta format (grep ">" DB.pep | wc -l);
- change one character into another;
- extract the sequences of a database (file in fasta format) (splitfasta.pl, splitdnafasta.pl);
- extract a part of a sequence (the sequence is in fasta format);
- amino-acid frequencies of a protein sequence;
- base frequencies of a nucleotide sequence;
- size of a sequence;
- sizes of the sequences of a database;
- codon frequencies of a coding sequence;
- codon volatility: codon/amino-acid correspondence.
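
As a starting point for the exercises on counting sequences and computing their sizes, here is a minimal sketch that reads fasta-formatted text line by line. To stay self-contained it opens an in-memory string instead of a file such as DB.pep; the sequences and the names $fasta, %length and @order are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical fasta content; a real script would open a file instead
my $fasta = ">seq1 first example\nMKVLA\nGT\n>seq2 second example\nMTT\n";

# Perl can open a scalar reference as if it were a file
open(my $fh, '<', \$fasta) or die "cannot open in-memory file: $!";

my (%length, @order, $name);
while (my $line = <$fh>) {
    chomp($line);
    if ($line =~ /^>(\S+)/) {           # header: the name is the first word
        $name = $1;
        push(@order, $name);            # remember the order of appearance
        $length{$name} = 0;
    }
    elsif (defined $name) {
        $length{$name} += length($line);    # sequence may span several lines
    }
}
close($fh);

print "number of sequences: ", scalar(@order), "\n";
foreach my $n (@order) {
    print "$n\t$length{$n}\n";
}
```

The number of sequences printed here corresponds to what the grep and wc command in the list above reports for a file.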