Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 10.10 The Parsimony criterion GKN 13.10 Stochastic Models of.

Slides:



Advertisements
Similar presentations
Chapter 1 overview 1.1DNA is the molecule of heredity “discovered” by Miesher (1869)
Advertisements

Lecture 3 Molecular Evolution and Phylogeny. Facts on the molecular basis of life Every life forms is genome based Genomes evolves There are large numbers.
Bioinformatics What is bioinformatics? Why bioinformatics? The major molecular biology facts Brief history of bioinformatics Typical problems of bioinformatics:
Lecture 2 Molecular Biology Primer Saurabh Sinha.
Molecular Evolution Revised 29/12/06
RNA and Protein Synthesis
How does the cell manufacture these magnificent machines? Proteins, that is…
. Class 1: Introduction. The Tree of Life Source: Alberts et al.
Lectures on Computational Biology HC Lee Computational Biology Lab Center for Complex Systems & Biophysics National Central University EFSS II National.
Schedule Day 1: Molecular Evolution Introduction Lecture: Models of Sequence Evolution Practical: Phylogenies Chose Project and collect literature Read.
Exciting Developments in Molecular Biology As seen by an amateur.
DNA and RNA. I. DNA Structure Double Helix In the early 1950s, American James Watson and Britain Francis Crick determined that DNA is in the shape of.
I. P(0) = I Q and P(t) What is the probability of going from i (C?) to j (G?) in time t with rate matrix Q? vi. QE=0 E ij =1 (all i,j) vii. PE=E viii.
Introduction to Molecular Biology. G-C and A-T pairing.
Chapter 17 AP Biology From Gene to Protein.
The Human Genome (Harding & Sanger) * *20  globin (chromosome 11) 6*10 4 bp 3*10 9 bp *10 3 Exon 2 Exon 1 Exon 3 5’ flanking 3’ flanking 3*10 3.
Comparative Genomics & Annotation The Foundation of Comparative Genomics The main methodological tasks of CG Annotation: Protein Gene Finding RNA Structure.
Central Dogma First described by Francis Crick
Comparative Genomics Ana C. Marques
Comparative Genomics & Annotation The Foundation of Comparative Genomics The main methodological tasks of CG Annotation: Protein Gene Finding RNA Structure.
Molecular Biology Primer for CS and engineering students Alan Qi January, 2010.
Manifestations of a Code Genes, genomes, bioinformatics and cyberspace – and the promise they hold for biology education.
Molecular basis of evolution. Goal – to reconstruct the evolutionary history of all organisms in the form of phylogenetic trees. Classical approach: phylogenetic.
Molecular Biology Primer for CS and engineering students Alan Qi Jan. 10, 2008.
More on translation. How DNA codes proteins The primary structure of each protein (the sequence of amino acids in the polypeptide chains that make up.
Lecture 25 - Phylogeny Based on Chapter 23 - Molecular Evolution Copyright © 2010 Pearson Education Inc.
BINF6201/8201 Molecular phylogenetic methods
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Molecular Biology Primer. Starting 19 th century… Cellular biology: Cell as a fundamental building block 1850s+: ``DNA’’ was discovered by Friedrich Miescher.
Molecular Genetics Section 1: DNA: The Genetic Material
CSCI 6900/4900 Special Topics in Computer Science Automata and Formal Grammars for Bioinformatics Bioinformatics problems sequence comparison pattern/structure.
Chapter 8 Microbial Genetics part A. Life in term of Biology –Growth of organisms Metabolism is the sum of all chemical reactions that occur in living.
Chapter 24: Molecular and Genomic Evolution CHAPTER 24 Molecular and Genomic Evolution.
From Genomes to Genes Rui Alves.
Gene, Proteins, and Genetic Code. Protein Synthesis in a Cell.
Algorithms for Biological Sequence Analysis Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan University,
Hidden Markov Models in Bioinformatics
The iPlant Collaborative Vision Enable life science researchers and educators to use and extend cyberinfrastructure.
1 From Mendel to Genomics Historically –Identify or create mutations, follow inheritance –Determine linkage, create maps Now: Genomics –Not just a gene,
The iPlant Collaborative Vision Enable life science researchers and educators to use and extend cyberinfrastructure.
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
Schedule Day 1: Molecular Evolution Lecture: Models of Sequence Evolution and Statistical Alignment Practical: Molecular Evolution.
Lesson Four Structure of a Gene. Gene Structure What is a gene? Gene: a unit of DNA on a chromosome that codes for a protein(s) –Exons –Introns –Promoter.
DNA and RNA II Sapling Chapter 6 short version You are responsible for textbook material covered by the worksheets. CP Biology Paul VI Catholic High School.
CS273a A Zero-Knowledge Based Introduction to Biology Courtesy of George Asimenos.
1 Mona Singh What is computational biology?. 2 Mona Singh Genome The entire hereditary information content of an organism.
The Central Dogma of Molecular Biology DNA  RNA  Protein  Trait.
Introduction to molecular biology Data Mining Techniques.
Bioinformatics Overview
Exploring Molecular Evolution
Lesson Four Structure of a Gene.
Molecular Biology Molecular biology is the study of DNA its structure
Lesson Four Structure of a Gene.
Which of the following would be the corresponding amino acid sequence that would be translated as a protein product of the following segment of DNA? A.
Stochastic Context-Free Grammars for Modeling RNA
Molecular Biology DNA Expression
Algorithms for Biological Sequence Analysis
Goals of Phylogenetic Analysis
Molecular basis of evolution.
Schedule The Parsimony criterion GKN 13.10
Stochastic Context-Free Grammars for Modeling RNA
DNA to protein DNA, transcription, translation
From Gene to Protein.
Hidden Markov Models in Bioinformatics min
Exploring Molecular Evolution
CISC 667 Intro to Bioinformatics (Spring 2007) Review session for Mid-Term CISC667, S07, Lec14, Liao.
The gene: structure, function and location
Presentation transcript:

Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) The Parsimony criterion GKN Stochastic Models of Sequence Evolution GKN The Likelihood criterion GKN Tut: =12 (Friday) Trees in phylogenetics and population genetics GKN Estimating phylogenies and genealogies I GKN Tut: (Friday) Estimating phylogenies and genealogies II GKN Estimating phylogenies and genealogies III 3.11 Tut: (Friday) Alignment Algorithms I (Optimisation) (JH) 7.11 Alignment Algorithms II (Statistical Inference) (JH) Tut: (Friday) Finding Signals in Sequences (JH) Stochastic Grammars and their Biological Applications: Hidden Markov Models (JH) Tut: (Friday) Stochastic Grammars and their Biological Applications: Context Free Grammars (JH) RNA molecules and their analysis (JH) Tut: (Friday) Open Problems in Bioinformatics and Computational Biology I (JH) Possibly: Evolving Grammars, Pedigrees from Genomes Open Problems in Bioinformatics and Computational Biology II (GKN) 1.12 Possibly: The phylogeny of language: traits and dates, What can FIV sequences tell us about their host cat population? Tut: (Friday)

Bioinformatics and Computational Biology: History & Biological Background 1838 Schwann and Schleiden Cell Theory 1859 Charles Darwin publishes Origin of Species 1865 Mendel discovers basic laws of inheritance (largely ignored) 1869 Miescher Discovers DNA 1900 Mendels laws rediscovered Avery shows DNA contains genetic information 1951 Corey & Pauling Secondary structure elements of a protein Watson & Crick proposes DNA structure and states Early History up to 1953

Proteins Proteins: a string of amino acids. Often folds up in a well defined 3 dimensional structure. Has enzymatic, structural and regulatory functions.

DNA & RNA DNA: The Information carrier in the genetic material. Usually double helix. RNA: messenger tape from DNA to protein, regulatory, enzymatic and structural roles as well. More labile than DNA

An Example: t-RNA From Paul Higgs

History up to Sanger first protein sequence – Bovine Insulin 1957 Kendrew structure of Whale Myoglobin 1958 Crick, Goldschmidt,…. Central Dogma 1958 First quantitative method for phylogeny reconstruction (UGPMA - Sokal and Michener) 1959 Operon Models proposed (Jakob and Monod) 1966 Genetic Code Determined 1967 First RNA sequencing Gene Promoter

The Central Dogma

The Genetic Code Ser Thr Glu Met Cys Leu Met Gly Gly TCA ACT GAG ATG TGT TTA ATG GGG GGA * * * * * * * ** TCG ACA GGG ATA TAT CTA ATG GGT ATA Ser Thr Gly Ile Tyr Leu Met Gly Ile Substitutions Number Percent Total in all codons Synonymous Nonsynonymous Missense Nonsense 23 4 Genetic Code: Mapping from 3- nucleotides (codons) to amino acids (20) + stop codon. This 64-->21 mapping creates the distinction silent/replacement substitution.

History Temin + Baltimore Reverse Transcriptase 1970 Needleman-Wunch algorithm for pairwise alignment Hartigan-Fitch-Sankoff algorithm for assigning nucleotides to inner nodes on a tree. 1976/79 First viral genome – MS2/  X /8 Sharp/Roberts Introns 1979 Alternative Splicing 1980 Mitochondrial Genome (16.569bp) and the discovery of alternative codes

Cartegni,L. et al.(2002) “Listening to Silence and understanding nonsense: Exonic mutations that affect splicing” Nature Reviews Genetics , HMG p A challenge to automated annotation. 2.How widespread is it? 3.Is it always functional? 4.How does it evolve? Genes, Gene Structure & Alternative Splicing Presently estimated Gene Number: , Average Gene Size: 27 kb The largest gene: Dystrophin 2.4 Mb - 0.6% coding – 16 hours to transcribe. The shortest gene: tRNA TYR 100% coding Largest exon: ApoB exon 26 is 7.6 kb Smallest: <10bp Average exon number: 9 Largest exon number: Titin 363 Smallest: 1 Largest intron: WWOX intron 8 is 800 kb Smallest: 10s of bp Largest polypeptide: Titin smallest: tens – small hormones. Intronless Genes: mitochondrial genes, many RNA genes, Interferons, Histones,..

Strings and Comparing Strings 1970 Needleman-Wunch algorithm for pairwise alignment for maximizing similarity Alignment: CTAGG I=2 v=5) g=10 i v Cost 17 TT-GT T G T T C T A G G Initial condition: D 0,0 =0. D i,j := D(s1[1:i], s2[1:j]) D i,j =min{D i-1,j-1 + d(s1[i],s2[j]), D i,j-1 + g, D i-1,j +g} 1972 Sellers-Sankoff algorithm for pairwise alignment for minimizing distance (Parsimony) Sankoff algorithm for multiple alignment for minimizing distance (Parsimony) and finding phylogeny simultaneously

History Felsenstein Proposes algorithm to calculate probability of observed nucleotides on leaves on a tree Griffiths, Hudson The Ancestral Recombination Graph. 1987/89 First biological use of Hidden Markov Model (HMM) (Lander and Green, Churchill) 1991 Thorne, Kishino and Felsenstein proposes statistical model for pairwise alignment First biological use of stochastic context free grammar (Haussler)

Genealogical Structures Homology : The existence of a common ancestor (for instance for 2 sequences) Phylogeny Pedigree: Ancestral Recombination Graph – the ARG ccagtcg ccggtcg cagtct Only finding common ancestors. Only one ancestor. i. Finding common ancestors. ii. A sequence encounters Recombinations iii. A “point” ARG is a phylogeny

Time slices Population N Time All positions have found a common ancestors All positions have found a common ancestors on one sequence

Enumerating Trees: Unrooted & valency Recursion: T n = (2n-5) T n-1 Initialisation: T 1 = T 2 = T 3 =1

History First prokaryotic genome – H. influenzae 1996 First unicellular eukaryotic genome – Yeast 1998 The first multi-cellular eukaryotic genome – C.elegans 2000 Drosophila melanogaster, Arabidopsis thaliana 2001 Human Genome 2002 Mouse Genome 2005 Chimp Genome

 globin Exon 2 Exon 1 Exon 3 5’ flanking 3’ flanking (chromosome 11) The Human Genome & R.Harding & HMG (2004) p 245 *5.000 *20 6*10 4 bp 3.2*10 9 bp *10 3 3*10 3 bp ATTGCCATGTCGATAATTGGACTATTTGGA30 bp Myoglobin  globin aa DNA: Protein: X Y mitochondria.016

Molecular Evolution and Gene Finding: Two HMMs Simple Prokaryotic Simple Eukaryotic AGTGGTACCATTTAATGCG..... P coding {ATG-->GTG} or AGTGGTACTATTTAGTGCG..... P non-coding {ATG-->GTG}

Three Questions for Hidden Structures. What is the probability of the data? What is the most probable ”hidden” configuration? What is the probability of specific ”hidden” state? Training: Given a set of instances, find parameters making them probable if they were independent. O 1 O 2 O 3 O 4 O 5 O 6 O 7 O 8 O 9 O 10 H1H1 H2H2 H3H3 W i j 1 L WLWL WRWR i’j’ HMM/Stochastic Regular Grammar SCFG - Stochastic Context Free Grammars

Bioinformatics and Computational Biology: History and Biological Background (JH) The Parsimony criterion GKN Stochastic Models of Sequence Evolution GKN The Likelihood criterion GKN Trees in phylogenetics and population genetics GKN Estimating phylogenies and genealogies I GKN Estimating phylogenies and genealogies II GKN Estimating phylogenies and genealogies III GKN Alignment Algorithms I (Optimisation) (JH) Alignment Algorithms II (Statistical Inference) (JH) Finding Signals in Sequences (JH) Stochastic Grammars and their Biological Applications: Hidden Markov Models (JH) Stochastic Grammars and their Biological Applications: Context Free Grammars (JH) RNA molecules and their analysis (JH) Open Problems in Bioinformatics and Computational Biology I (JH) Possibly: Evolving Grammars, Pedigrees from Genomes Open Problems in Bioinformatics and Computational Biology II (GKN) Possibly: The phylogeny of language: traits and dates, What can FIV sequences tell us about their host cat population?