Comparative Biology


Comparative Biology
Diagram: two observable sequences (e.g. ATTGCGTATATAT….CAG) are connected through an unobservable evolutionary path to an unobservable Most Recent Common Ancestor (?). Time direction runs from the ancestor to the observations.
Parameters: time, rates, selection.
Key Questions: Which phylogeny? Which ancestral states? Which process?
Key Generalisations: Homologous objects. Co-modelling. Genealogical Structures?

Structure of Biology: Physical Systems and Evolution
Data: Sequences, Structures, Expression Levels, ….
Models: M1, .., Mk. Framework for model formulation.
Knowledge & Representation: Scientific Texts, Systems Biology Markup Language, Process Algebras, …
Structure of Biological Systems: Atoms, Molecules, Networks, Motors, Central Dogma, Genetic Code, …
Dynamics: the system as a physical entity.
Evolution: the system has evolved. It is part of individuals in a population and part of species in the tree of life.

The Data
Sequence Data
Metabonomics/Metabolomics and Small Molecule Detection
Expression Data
Proteomics and Protein Interactions
Structures from Crystallography, NMR and Cryo-EM
Single Molecule Measurements
Microscopy

Example of Reduction/Levels
Enzyme catalysis. Such reductions are based on "biological concepts".
A molecular dynamics sample path involving one catalysis event: from the set of E + S initial states, through ES states (?), to the set of E + P final states. About 10^9 time steps and 10^4 atoms.
Discrete models of one catalysis event: E + S -> ES -> E + P. A reduction to 3-5 steps.
Other clear reductions:
Individual molecules -> Concentration of molecules
Set of atoms -> Nucleotide
Lipid molecules -> Membrane

Elements of Physical Dynamic Modeling
Time: Continuous Time. Discrete Time (0, 1, 2, .., k). No Time (Equilibrium).
State & Space: Continuous Space. Discrete Space. No Space or Space Homogeneity.
Time/Space dependency: Deterministic or Stochastic, in Discrete Time (0, 1, .., k-i, .., k-1, k) or Continuous Time (diagram states p0, p1, p2, p3).
Complicated & contentious.

Physical Dynamic Modeling: Key Models
Molecular Dynamics
Quantum Mechanics
Classical Potential
Continuous Time Markov Chains / Gillespie Algorithm
Ordinary Differential Equations (ODE)
Partial Differential Equations (PDE), e.g. the Turing Model
Stochastic Ordinary Differential Equations (SODE)
Stochastic Partial Differential Equations (SPDE)
Models on Networks
Boolean Networks
Kinetic Models
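As an illustration of the Continuous Time Markov Chain / Gillespie entry, here is a minimal sketch that simulates the discrete catalysis model E + S -> ES -> E + P from the reduction slide. The function name, the rate constants k_on and k_cat, and the molecule counts are assumptions made for the example, not values from the slides.

```python
import random

def gillespie_enzyme(e, s, k_on, k_cat, t_max, seed=0):
    """Simulate E + S -> ES -> E + P with the Gillespie algorithm.

    k_on and k_cat are hypothetical rate constants for illustration.
    Returns the visited states as (time, E, S, ES, P) tuples.
    """
    rng = random.Random(seed)
    t, es, p = 0.0, 0, 0
    path = [(t, e, s, es, p)]
    while t < t_max:
        a_bind = k_on * e * s        # propensity of E + S -> ES
        a_cat = k_cat * es           # propensity of ES -> E + P
        a_total = a_bind + a_cat
        if a_total == 0:             # no reaction can fire any more
            break
        t += rng.expovariate(a_total)          # exponential waiting time
        if rng.random() * a_total < a_bind:    # pick a reaction by its share
            e, s, es = e - 1, s - 1, es + 1
        else:
            e, es, p = e + 1, es - 1, p + 1
        path.append((t, e, s, es, p))
    return path

path = gillespie_enzyme(e=5, s=20, k_on=0.01, k_cat=1.0, t_max=1e6)
print(len(path), path[-1][1:])
```

Each step draws one waiting time and one reaction, so the state is a sample path of the underlying Markov chain rather than a solution of the corresponding ODE.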

Elusive Biological Concepts: Emergence
Other EBCs: function, robustness, modularity, purpose, top-down, downward causation.
Strong emergence (never observed): The dynamic laws for k components are not deducible from their properties and their relationships.
Weak emergence: something "new" emerges.
Reduction: Lower level (high dimensional, detailed description) versus Higher level (low dimensional, "surprising" stable, robust properties).
Ex. 1 Network Dynamics. Lower level: large set of enzymes and atoms. Higher level: oscillations, sensitive amplification.
Ex. 2 Neural Networks. Lower level: large set of cells. Higher level: ability to calculate, consciousness.
Questions: Automatic detection of emergence? How frequent is it? Does selection pull out emergent systems?

Levels & Objects

How to Compare?
Examples: Protein Structures, Networks, Craniums/Shape.
Homologous or Non-Homologous? Homologous components, e.g. aligned positions:
A C G T
A - T T
Matching, Similarity, Distance. Distance from shortest paths.
The ideal: the probability of the first observation times the sum over possible evolutionary trajectories to the second observation.
Informal. A set: AGT, ACCT. A pair: P( )
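The "distance from shortest paths" idea can be sketched for sequences as an edit distance: the smallest number of substitutions, insertions and deletions that turns one string into the other. This is a generic dynamic programming sketch with unit costs (an assumption), applied to the slide's pair AGT and ACCT. It is not the probabilistic ideal described above, which sums over all trajectories instead of taking the shortest.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions and deletions from a to b."""
    prev = list(range(len(b) + 1))       # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]                       # deleting the first i letters of a
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]

print(edit_distance("AGT", "ACCT"))  # 2: one substitution and one insertion
```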

"Natural" Evolutionary Modeling
Components: Birth and Death Process. Components are born with rate λ and die with rate μ.
Discrete states: Continuous Time Finite State Markov Chains. Initially all rates are the same (diagram states p0, p1, p2, p3).
Continuous states: Continuous Time Continuous State Markov Processes, specifically Diffusions. Initially the simplest Diffusion, Brownian Motion, then Ornstein-Uhlenbeck.
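For nucleotides, a continuous time finite state Markov chain in which all rates are the same is the Jukes-Cantor model. The sketch below gives its transition probabilities and, for a gap-free pair of aligned sequences, the resulting pair probability. The names jc69_prob, pair_prob and the parameter alpha (the rate of change to each other nucleotide) are chosen here for illustration.

```python
import math

def jc69_prob(x, y, t, alpha=1.0):
    """P(nucleotide y at time t | nucleotide x at time 0) under Jukes-Cantor.

    alpha is the substitution rate to each of the three other nucleotides.
    """
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def pair_prob(seq1, seq2, t, alpha=1.0):
    """Probability of two aligned, gap-free sequences separated by total time t.

    Sites are assumed independent; by reversibility the unknown ancestor
    can be placed at seq1, with a uniform (1/4) distribution per site.
    """
    return math.prod(0.25 * jc69_prob(x, y, t, alpha)
                     for x, y in zip(seq1, seq2))

print(jc69_prob("A", "A", 0.1), jc69_prob("A", "C", 0.1))
print(pair_prob("ACGT", "ACTT", 0.1))
```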

Comparative Biology
Choice of Representation. Observed or predicted?
Nucleotides/Amino Acids
Continuous Quantities
Sequences
Gene Structure
Structure: RNA, Protein
Networks: Metabolic Pathways, Protein Interaction, Regulatory Pathways, Signal Transduction
Macromolecular Assemblies
Motors
Shape
Patterns
Tissue/Organs/Skeleton/ ….
Dynamics: MD movements of proteins, Locomotion
Culture: Language (Vocabulary, Grammar, Phonetics, Semantics)

Comparative Biology: Evolutionary Models
Object                              Type                                   Reference
Nucleotides/Amino Acids/codons      CTFS (continuous time finite state)    Jukes-Cantor, other
Continuous Quantities               CTCS                                   Felsenstein, other
Sequences                           CT countable S                         Thorne, Kishino, Felsenstein
Gene Structure                      Matching                               DeGroot, 07
Genome Structure                    CTCS                                   MM
Structure: RNA                      SCFG-model like                        Holmes, I, few others
Structure: Protein
Networks                            CT countable S                         Snijder, T
  Metabolic Pathways
  Protein Interaction
  Regulatory Pathways
  Signal Transduction
Macromolecular Assemblies
Motors
Shape
Patterns
Tissue/Organs/Skeleton/ ….
Dynamics: MD movements of proteins, Locomotion
Culture: Language
  Vocabulary                        "Infinite Allele Model" (CTCS)         Swadesh, 52, Sankoff, 72, …
  Grammar
  Phonetics
  Semantics
Phenotype

"Natural" Co-Modeling
Joint evolutionary modeling of X(t), Y(t): the ideal, rarely if ever done.
Conditional evolutionary modeling of X(t) given Y(t): the standard in comparative genomics. The distribution of Y(t) is not derived from evolution, but from practicality. Examples: protein gene prediction, RNA structure prediction, regulatory signal prediction.
Y(t) a deterministic function of X(t): movement of proteins, protein structures.

Examples
RNA structure prediction
Comparative Genomics
Networks
Patterns
Protein Structures

Structure Dependent Molecular Evolution
RNA Secondary Structure. Figure from Durbin et al. (1998), Biological Sequence Analysis.
Secondary Structure: a set of paired positions.
A-U and C-G can base pair. Some other pairings can occur, and triple interactions exist.
Pseudoknot: non-nested pairing, i.e. positions i < j < k < l with pairs i-k and j-l.
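The pseudoknot condition is easy to check mechanically. The sketch below takes a secondary structure as a collection of paired positions and reports whether any two pairs cross, that is, whether there are positions i < j < k < l with pairs i-k and j-l. The function name and the representation as integer tuples are choices made for this example.

```python
def has_pseudoknot(pairs):
    """Return True if two base pairs cross: i < j < k < l with pairs i-k and j-l."""
    ordered = sorted(tuple(sorted(p)) for p in pairs)   # each pair as (low, high)
    for a, (i, k) in enumerate(ordered):
        for j, l in ordered[a + 1:]:
            if i < j < k < l:
                return True
    return False

print(has_pseudoknot([(1, 10), (2, 9)]))   # nested pairs: False
print(has_pseudoknot([(1, 5), (3, 8)]))    # crossing pairs: True
```

Structures without crossing pairs are exactly the ones a context free grammar can generate, which is why pseudoknots are usually excluded in grammar-based structure models.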

Simple String Generators
Variables (capital), Letters (small).
Regular Grammar: Start with S
S --> aT | bS
T --> aS | bT | ε
One sentence (odd # of a's): S -> aT -> aaS -> aabS -> aabaT -> aaba
Context Free Grammar:
S --> aSa | bSb | aa | bb
One sentence (even length palindromes): S -> aSa -> abSba -> abaaba
Regular grammars are a special case of Context Free grammars.

Stochastic Grammars
The grammars above classify each string as belonging to the language or not. Each variable has a finite set of substitution rules. Assigning probabilities to the use of each rule assigns probabilities to the strings in the language.
If a string has a unique (1-1) derivation, its probability is the product of the probabilities of the applied rules.
i. Start with S.
S --> (0.3)aT | (0.7)bS
T --> (0.2)aS | (0.4)bT | (0.2)ε
S -> aT -> aaS -> aabS -> aabaT -> aaba, with probability 0.3 * 0.2 * 0.7 * 0.3 * 0.2
ii.
S --> (0.3)aSa | (0.5)bSb | (0.1)aa | (0.1)bb
S -> aSa -> abSba -> abaaba, with probability 0.3 * 0.5 * 0.1
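Because the palindrome grammar in ii. gives every string at most one derivation, the probability of a string can be computed by peeling off matching outer letters and multiplying the rule probabilities. The sketch below does this with the probabilities from the slide; the function name and the dictionary layout are choices made for the example.

```python
# Rule probabilities for S --> (0.3)aSa | (0.5)bSb | (0.1)aa | (0.1)bb
RULES = {"aSa": 0.3, "bSb": 0.5, "aa": 0.1, "bb": 0.1}

def palindrome_prob(s):
    """Probability of string s under the stochastic palindrome grammar."""
    if len(s) < 2 or len(s) % 2 or s[0] != s[-1] or s[0] not in "ab":
        return 0.0                      # not an even length palindrome over {a, b}
    if len(s) == 2:
        return RULES[s]                 # terminating rule S --> aa or S --> bb
    # outer letters come from S --> aSa or S --> bSb, the rest from S again
    return RULES[s[0] + "S" + s[0]] * palindrome_prob(s[1:-1])

print(palindrome_prob("abaaba"))   # product of the rule probabilities 0.3, 0.5, 0.1
print(palindrome_prob("abab"))     # 0.0, not in the language
```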

Secondary Structure Generators
S --> LS | L
F --> dFd | LS
L --> s | dFd
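To see what this grammar generates, the sketch below samples structures from it and prints them in dot-bracket notation, writing the unpaired letter s as "." and the paired letters d...d as "(" and ")". The rule probabilities are hypothetical values picked only so that sampling terminates quickly; they are not estimates from the slides or from the cited papers.

```python
import random

# Hypothetical rule probabilities (assumed for illustration only).
RULES = {
    "S": [(0.7, ["L", "S"]), (0.3, ["L"])],              # S --> LS | L
    "F": [(0.6, ["(", "F", ")"]), (0.4, ["L", "S"])],    # F --> dFd | LS
    "L": [(0.85, ["."]), (0.15, ["(", "F", ")"])],       # L --> s | dFd
}

def sample_structure(seed=None):
    """Sample one secondary structure as a dot-bracket string."""
    rng = random.Random(seed)
    stack, out = ["S"], []
    while stack:
        sym = stack.pop()
        if sym in RULES:                       # variable: apply a random rule
            weights, bodies = zip(*RULES[sym])
            body = rng.choices(bodies, weights=weights)[0]
            stack.extend(reversed(body))       # expand leftmost symbol first
        else:                                  # terminal: emit it
            out.append(sym)
    return "".join(out)

for seed in range(3):
    print(sample_structure(seed))
```

Since F can only end a stem through F --> LS, every stem encloses at least two further elements, so an empty hairpin "()" is never produced.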

RNA Structure Application
From Knudsen & Hein (1999); Knudsen & Hein, 2003

Co-Modelling and Conditional Modelling
Diagram: Observable and Unobservable levels, with RNA sequence U C G A C A U A C.
References: Goldman, Thorne & Jones, 96; Knudsen.., 99; Eddy & co.; Meyer and Durbin 02; Pedersen …, 03; Siepel & Haussler 03; Pedersen, Meyer, Forsberg…, Simmonds 2004a,b; McCauley ….; Firth & Brown
Conditional Modelling Needs: Footprinting, Signals (Blanchette)
AGGTATATAATGCG..... P coding {ATG-->GTG} or
AGCCATTTAGTGCG..... P non-coding {ATG-->GTG}

Network Evolution
Statistics of Networks
Comparing Networks
Networks in Cellular Biology
A. Metabolic Pathways
B. Regulatory Networks
C. Signaling Pathways
D. Protein Interaction Networks (PIN)
Empirical Facts
Dynamics on Networks (models)
Models of Network Evolution

A Model for Network Inference
A core metabolism:
A given set of metabolites.
A given set of possible reactions (arrows not shown).
A set of present reactions M (black and red arrows).
Reactions are deleted at a rate of deletion and inserted at a rate of insertion [rate symbols and the rate equation not recovered].
Restriction R: A metabolism must define a connected graph.
M + R defines:
1. a set of deletable (dashed) edges D(M)
2. a set of addable edges A(M)
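A simplified sketch of how D(M) and A(M) can be computed under the connectedness restriction R. It assumes that each reaction is an undirected edge between two metabolites and that connectedness is required over a fixed set of metabolites; the actual model may define reactions and connectedness differently. All function names are made up for the example.

```python
def is_connected(nodes, edges):
    """True if the undirected graph (nodes, edges) is connected."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, todo = {start}, [start]
    while todo:                                  # depth-first search
        for nb in adj[todo.pop()]:
            if nb not in seen:
                seen.add(nb)
                todo.append(nb)
    return seen == nodes

def deletable(nodes, present):
    """D(M): present reactions whose removal keeps the metabolism connected."""
    return {e for e in present if is_connected(nodes, present - {e})}

def addable(nodes, possible, present):
    """A(M): absent possible reactions whose insertion keeps it connected."""
    return {e for e in possible - present if is_connected(nodes, present | {e})}

nodes = {"A", "B", "C"}
possible = {("A", "B"), ("B", "C"), ("A", "C")}
present = {("A", "B"), ("B", "C")}
print(deletable(nodes, present), addable(nodes, possible, present))
```

In a continuous time Markov chain on metabolisms, these two sets are the moves available from state M, so they determine which neighbouring metabolisms can be reached in one step.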

Likelihood of Homologous Pathways
Number of Metabolisms: symmetrical versions.
For two metabolisms M1 and M2 (drawn as graphs on the slide): P(M1, M2) = P(M1) P(M1 -> M2)
Eleni Giannoulatou
Approaches: Continuous Time Markov Chains with computational tricks. MCMC. Importance Sampling.

PIN Network Evolution
Barabasi & Oltvai, 2004; Berg et al., 2004; Wiuf et al., 2006
A gene duplicates. The copy inherits its connections. The connections can change.
Berg et al., 2004: Gene duplication is slow, ~10^-9 /year. Connection evolution is fast, ~10^-6 /year.
Observed networks can be modeled as if the node number was fixed.

Likelihood of PINs
Can only handle 1 graph. Limited Evolution Model.
de-DAing, De-connecting
Data: 2386 nodes and 7221 links. Irreducible (and isomorphic): 735 nodes.
Wiuf et al., 2006

The Phylogenetic Turing Patterns I

The Phylogenetic Turing Patterns II
Stripes: p small. Spots: p large.
Reaction-Diffusion Equations:
Analysis Tasks:
1. Choose Class of Mechanisms
2. Observe Empirical Patterns
3. Choose Closest set of Turing Patterns T1, T2, .., Tk
4. Choose parameters p1, p2, .., pk (sets?) behind T1, ..
Evolutionary Modelling Tasks:
1. p(t1) - p(t2) ~ N(0, (t1 - t2) σ^2)
2. Non-overlapping intervals have independent increments
I.e. Brownian Motion
Scientific Motivation:
1. Is there evolutionary information on pattern mechanisms?
2. How do patterns evolve?
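The two evolutionary modelling conditions define Brownian motion for the pattern parameter p. A minimal sketch of simulating such a parameter along a lineage, assuming a scalar p and a variance rate called sigma2 here (both the name and the numerical values are illustrative assumptions):

```python
import math
import random

def brownian_path(times, sigma2=1.0, p0=0.0, seed=0):
    """Simulate p(t) at the given increasing times under Brownian motion.

    Each increment p(t_next) - p(t_prev) is drawn independently from
    N(0, (t_next - t_prev) * sigma2).
    """
    rng = random.Random(seed)
    path = [p0]
    for t_prev, t_next in zip(times, times[1:]):
        step_sd = math.sqrt(sigma2 * (t_next - t_prev))
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path

print(brownian_path([0.0, 0.5, 1.0, 2.0], sigma2=0.2, p0=1.0))
```

On a phylogeny the same recipe is applied branch by branch, starting each daughter branch from the parameter value at the end of its parent branch.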

Protein Structure
Known and Unknown: -globin and Myoglobin, separated by 300 amino acid changes, 800 nucleotide changes, 1 structural change and 1.4 Gyr. The intermediate states are unknown (? ? ? ?).
1. Given Structure, what are the possible events that could happen?
2. What are their probabilities? Old fashioned substitution + indel process with bias. Bias: Folding (Sequence -> Structure) & Fitness of Structure.
3. Summation over all paths.

Summary: The Virtues of Comparative Modeling
It is the natural setup for much modeling and for the transfer of knowledge from one species/system to another.
Even 1 system/species x is an evolutionary observation, with P(x) and P(Further history of x). Example x: U C G A C A U A C