Computer Programming for Biologists, Class 7, Nov 27th, 2014, Karsten Hokamp



Hash Variables  associative arrays  list of key/value pairs  values and keys  scalars  access values by key names  Great for look-ups! Description

Hash Variables: Look-up Table
Look-up table in real life for translation (the genetic code):
    AAA  K
    AAC  N
    AAG  K
    AAU  N
    …
    UUG  L
    UUU  F
In Perl, use a hash variable:
    %genetic_code = (
        'AAA' => 'K',
        'AAC' => 'N',
        'AAG' => 'K',
        'AAU' => 'N',
        …
        'UUG' => 'L',
        'UUU' => 'F'
    );
Keys are unique!
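To illustrate the look-up idea, here is a minimal sketch that translates a short RNA string codon by codon. It assumes a partial table holding only the six codons shown on the slide; the input sequence and the 'X' placeholder for unknown codons are made up for this example.

```perl
use strict;
use warnings;

# Partial look-up table: only the six codons listed on the slide.
my %genetic_code = (
    'AAA' => 'K', 'AAC' => 'N', 'AAG' => 'K', 'AAU' => 'N',
    'UUG' => 'L', 'UUU' => 'F',
);

my $rna     = 'AAAUUUAACUUG';   # hypothetical input sequence
my $protein = '';

# Walk through the sequence three bases at a time.
for (my $i = 0; $i + 3 <= length($rna); $i += 3) {
    my $codon = substr($rna, $i, 3);
    # 'X' marks codons missing from this partial table (an assumption).
    $protein .= exists $genetic_code{$codon} ? $genetic_code{$codon} : 'X';
}

print "$protein\n";
```

With a complete table of all 64 codons the same loop would translate any RNA sequence.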

Hash Variables  %bases = ('a', 'purine', 'c', 'pyrimidine', 'g', 'purine', 't', 'pyrimidine');  %complement = ('a' =>'t', 'c' => 'g', 'g' => 'c', 't' => 'a');  %letters = (1, 'a', 2, 'b', 3, 'c', 4, 'd'); Examples Hashes: Lists with special relationship between each pair of elements!

Hash Variables: Storing Data
Without a hash, each nucleotide needs its own counter:
    # count frequency of nucleotides:
    my $As = 0;
    my $Cs = 0;
    my $Gs = 0;
    my $Ts = 0;
    foreach my $nuc (split //, $dna) {
        if ($nuc eq 'A') {
            $As++;
        } elsif ($nuc eq 'C') {
            $Cs++;
        } elsif ($nuc eq 'G') {
            $Gs++;
        } elsif ($nuc eq 'T') {
            $Ts++;
        }
    }

Hash Variables: Storing Data
With a hash, one statement does the counting:
    # count frequency of nucleotides:
    my %freq = ();
    foreach my $nuc (split //, $dna) {
        $freq{$nuc}++;
    }

Hash Variables: Storing Data
    # count frequency of nucleotides:
    my %freq = ();
    foreach my $nuc (split //, 'ACTTGGGT') {
        $freq{$nuc}++;
    }
Resulting hash:
    key  value
    A    1
    C    1
    G    3
    T    3
- keys are stored in no specific order
- auto-initialisation with '' or 0

Hash Variables: Scalar vs Hash
Four separate scalar variables, each initialised to 0:
    $As = 0;    # As: 0
    $Cs = 0;    # Cs: 0
    $Gs = 0;    # Gs: 0
    $Ts = 0;    # Ts: 0
After incrementing each one:
    $As = 0; $As++;    # As: 1
    $Cs = 0; $Cs++;    # Cs: 1
    $Gs = 0; $Gs++;    # Gs: 1
    $Ts = 0; $Ts++;    # Ts: 1
The same counters as keys of a single hash (keys Cs, As, Gs, Ts in %freq):
    %freq = ();
    $freq{'Gs'}++;     # Gs: 1

Computer Programming for Biologists Practical: Exercises

Hash Variables: Accessing Elements
General:
    $value = $hash{$key};
Special functions: keys and values
    # get complement of a base
    my $new_base = $complement{$base};

    # get amino acid for a codon
    my $aa = $genetic_code{$codon};

    # list all the aa's that occurred (loop through all keys)
    foreach my $aa (keys %list) {
        print "$aa was found!\n";
    }
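The slide names both keys and values but only shows keys in use. A minimal sketch of both, assuming the counts obtained earlier from 'ACTTGGGT'; the variable names $distinct and $total are chosen for this example:

```perl
use strict;
use warnings;

# Counts from the 'ACTTGGGT' example earlier in the slides.
my %freq = ('A' => 1, 'C' => 1, 'G' => 3, 'T' => 3);

# In scalar context, keys returns the number of keys in the hash.
my $distinct = keys %freq;

# values returns all values; here they are summed to get the sequence length.
my $total = 0;
$total += $_ for values %freq;

print "$distinct different nucleotides, $total in total\n";
```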

Hash Variables: Retrieving a key/value pair
    $freq = $freq{'Gs'};
    print "Gs: $freq\n";
Output:
    Gs: 3

Hash Variables: Retrieving a key/value pair
    $nuc = 'Gs';
    print "$nuc: $freq{$nuc}\n";
Output:
    Gs: 3

Hash Variables: Retrieving a key/value pair
    foreach my $nuc (keys %freq) {
        print "$nuc: $freq{$nuc}\n";
    }
Output (keys come back in no particular order):
    Cs: 1
    Ts: 3
    Gs: 3
    As: 1

Hash Variables: Retrieving a key/value pair
    foreach my $nuc (sort keys %freq) {
        print "$nuc: $freq{$nuc}\n";
    }
Output (keys sorted alphabetically):
    As: 1
    Cs: 1
    Gs: 3
    Ts: 3
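Sorting by key is shown above; a frequency report often wants the most common item first. The following sketch, which goes beyond the slide, sorts the keys by their values in descending order and breaks ties alphabetically. The hash contents are the counts used in the slides.

```perl
use strict;
use warnings;

my %freq = ('As' => 1, 'Cs' => 1, 'Gs' => 3, 'Ts' => 3);

# Compare the counts first (highest first); if they are equal,
# fall back to comparing the key names alphabetically.
my @by_count = sort { $freq{$b} <=> $freq{$a} or $a cmp $b } keys %freq;

foreach my $nuc (@by_count) {
    print "$nuc: $freq{$nuc}\n";
}
```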

Hash Variables: Checking for keys/values
    # does the key exist?
    if (exists $hash{$key}) { }

    # does the key have a defined value?
    if (defined $hash{$key}) { }

    # does the key have a true value (not 0, '' or undef)?
    if ($hash{$key}) { }
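The three tests give different answers for values such as 0 or undef. A small sketch with made-up keys and values shows the difference; the %result hash only collects the answers for printing.

```perl
use strict;
use warnings;

# Hypothetical hash with a zero, an undefined and an ordinary value.
my %hash = ('zero' => 0, 'undef_val' => undef, 'three' => 3);

my %result;
foreach my $key ('zero', 'undef_val', 'three', 'missing') {
    my $exists  = exists  $hash{$key} ? 'yes' : 'no';
    my $defined = defined $hash{$key} ? 'yes' : 'no';
    my $true    = $hash{$key}         ? 'yes' : 'no';
    $result{$key} = "$exists/$defined/$true";
    print "$key: exists=$exists defined=$defined true=$true\n";
}
```

A count of 0 exists and is defined but is not true, so a plain truth test can be misleading when checking whether a nucleotide was counted at all.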

Computer Programming for Biologists: Exercises
Use hashes in your sequence analysis tool for:
- reporting frequencies of nucleotides or amino acids
- reporting the GC content
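One possible starting point for the GC content exercise is sketched below. It is not the course solution; the input sequence is the example string from the earlier slides, and the variable names are assumptions for this sketch.

```perl
use strict;
use warnings;

my $dna = 'ACTTGGGT';   # example sequence from the earlier slides

# Count every nucleotide, ignoring case.
my %freq;
$freq{$_}++ for split //, uc $dna;

# A base that never occurred has no key, so fall back to 0 (needs Perl 5.10+).
my $gc = ($freq{'G'} // 0) + ($freq{'C'} // 0);

# Guard against an empty sequence before dividing.
my $gc_content = length($dna) ? 100 * $gc / length($dna) : 0;

printf "GC content: %.1f%%\n", $gc_content;
```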