IAENG_IMECS_ICB II, Room E 10:45~13:00, March 21, 2007, Hong Kong Pseudo-Reverse Approach in Genetic Evolution: An Empirical Study with Enzymes Sukanya.


1 IAENG_IMECS_ICB II, Room E, 10:45~13:00, March 21, 2007, Hong Kong. Pseudo-Reverse Approach in Genetic Evolution: An Empirical Study with Enzymes. Sukanya Manna, Cheng-Yuan Liou, National Taiwan University, Department of Computer Science and Information Engineering

2 NTU: land area ~360 square kilometers; a huge botanic garden in the high mountains (>3000 meters); the NTU water-penny beetle (台大扁泥蟲); eleven colleges, 54 departments and 96 graduate institutes (offering 96 master's programs and 83 doctoral programs); research centers including the Division of Population and Gender Studies, the Center for Condensed Matter Sciences, the Center for Biotechnology, the Japanese Research Center and the Biodiversity Center. The number of students reached 29,877 in 2004, including students from the Division of Continuing Education & Professional Development.

3 Concepts used – Under neutral evolution, the rate of synonymous substitution equals the rate of nonsynonymous substitution. – Estimating the rates of synonymous and nonsynonymous substitution has become an important subject in molecular evolution.

4 Why? 'Draft' theory: an initial, intuitive model of evolution. Part of evolution is based on a set of core systems that are relatively invariant (hard and strong) over evolution. Qualitative changes occur as distinct systems are integrated: separate systems conjoin to produce distinctive patterns of evolutionary change. This model provides evolutionary flexibility.

5 Assumptions – For comparative genomics, species that are not distantly related, such as human and mouse, share the vast majority of their genes. – The amino acid sequences obtained for each enzyme are highly similar, like homologous genes.

6 Our Approach (overview of the steps undertaken): 1. Obtain the amino acid sequences for each enzyme protein. 2. Find the least mismatch between two aa sequences and select the trio. 3. Generate the nucleotide (nt) sequences for the aa sequences of the trio. 4. Perform the dn/ds ratio test between each pair of species using the randomly generated nt sequences.

7 Our Approach (contd.) Nt to AA: AATGATTGTCAAGAGCAT AAG TTT TAT → NDCQEHKFY. AA to nt (REVERSE): NDCQEHKFY → AATGATTGTCAAGAGCAT AAG TTT TAT, AACGATTGCCAAGAACAT AAG TTT TAT, AATGACTGTCAGGAGCAC AAG TTC TAT, … Generating all possible combinations is infeasible because of the high space and time complexity.

8 Basic Concepts – Nucleotides: A, G, T, C (DNA); A, G, U, C (mRNA). – Amino acids: 20 naturally occurring, each coded by a triplet of nucleotide bases (referred to as a codon). – Synonymous/nonsynonymous substitution: a substitution of a base within a codon that does not/does change the amino acid it represents. 4³ = 64 codons code for 20 amino acids; 3 of the 64 codons are stop codons, which mark the end of the protein-coding sequence.
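As a quick check of the codon arithmetic above, here is a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1; the slides only say "universal genetic mapping table"), and `CODE` and `translate` are hypothetical helper names, not part of the authors' software. It counts codons, stop codons and amino acids, and translates the nucleotide example from slide 7.

```python
# Standard genetic code (assumed: NCBI translation table 1).
# Codons are enumerated with bases in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nt: str) -> str:
    """Translate a DNA or mRNA coding sequence codon by codon (hypothetical helper)."""
    nt = nt.upper().replace("U", "T").replace(" ", "")
    if len(nt) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODE[nt[i:i + 3]] for i in range(0, len(nt), 3))


n_codons = len(CODE)                                 # 4**3 possible triplets
n_stops = sum(aa == "*" for aa in CODE.values())     # stop codons
n_amino = len(set(CODE.values()) - {"*"})            # distinct amino acids
print(n_codons, n_stops, n_amino)                    # 64 3 20
print(translate("AATGATTGTCAAGAGCAT AAG TTT TAT"))   # NDCQEHKFY
```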

9 Model Used: Jukes and Cantor (one-parameter method) – Assumes the rate of substitution between all pairs of A, T, C, G is the same. – $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$, where $p$ is either $p_s$ or $p_n$ (giving $d_s$ and $d_n$ respectively), with $p_s = S_d/S$ and $p_n = N_d/N$. – $S_d$ / $N_d$: total number of synonymous / nonsynonymous differences over all codons compared. – $S$ / $N$: numbers of synonymous / nonsynonymous sites.
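A minimal Python sketch of the Jukes-Cantor correction and the resulting dn/ds, assuming the counts $S_d$, $N_d$, $S$ and $N$ have already been obtained. The function names are hypothetical. Returning `None` when $p \ge 3/4$ (where the logarithm is undefined) is one plausible way to represent the "no valid result" cases mentioned in the summary.

```python
import math
from typing import Optional


def jukes_cantor(p: float) -> Optional[float]:
    """Jukes-Cantor corrected distance d = -(3/4) ln(1 - 4p/3).

    Returns None when p >= 3/4 (or p < 0): the logarithm is then undefined,
    so no valid distance exists.
    """
    if not 0.0 <= p < 0.75:
        return None
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(Sd: float, Nd: float, S: float, N: float) -> Optional[float]:
    """dn/ds from difference and site counts; None if a distance is invalid or ds is 0."""
    d_s = jukes_cantor(Sd / S)   # p_s = S_d / S
    d_n = jukes_cantor(Nd / N)   # p_n = N_d / N
    if d_s is None or d_n is None or d_s == 0.0:
        return None
    return d_n / d_s


print(jukes_cantor(0.1))   # slightly larger than p: the correction for multiple hits
print(dn_ds(Sd=10, Nd=5, S=60, N=180))
```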

10 Our Approach (contd.) Normally, amino acid sequences are obtained from nucleotide sequences using the universal genetic code table. Generating nucleotide sequences from amino acid sequences is the reverse process. For a particular amino acid sequence there can be numerous nucleotide sequences, one for each combination of synonymous codons, but generating all of them is infeasible because of the very large time and space complexity. We use this reverse mechanism to obtain closely related nucleotide sequences for the respective amino acid sequences. The next slide shows the method we used to handle this situation.
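To see why full enumeration is infeasible: the number of nucleotide sequences for a protein is the product of the degeneracies of its amino acids. A sketch, assuming the standard genetic code and ignoring the choice of stop codon; `count_reverse_translations` is a hypothetical helper.

```python
from collections import Counter
from math import prod

# Standard genetic code (assumed: NCBI translation table 1), bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}

# Degeneracy: how many sense codons encode each amino acid (e.g. M -> 1, L -> 6).
DEGENERACY = Counter(aa for aa in CODE.values() if aa != "*")


def count_reverse_translations(protein: str) -> int:
    """Number of distinct coding sequences (stop codon excluded) that translate to `protein`."""
    return prod(DEGENERACY[aa] for aa in protein.upper())


print(count_reverse_translations("NDCQEHKFY"))            # 512 (= 2**9)
print(len(str(count_reverse_translations("L" * 300))))   # 234 digits
```

For the nine-residue example every residue has exactly two codons, giving 2⁹ = 512 sequences; a stretch of 300 leucines (six codons each) already has 6³⁰⁰ possible encodings.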

11 Our Approach (contd.) – Calculated the total frequency of each codon from each genome. – Calculated the cumulative probability of the codons from these frequencies.
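One reading of these two steps, sketched in Python: count codons over a genome's coding sequences, then, for each amino acid, turn the counts of its synonymous codons into a cumulative distribution. Normalising within each amino acid, and falling back to a uniform choice when an amino acid was never observed, are assumptions; the slides do not say exactly how the cumulative probabilities were formed. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import accumulate

# Standard genetic code (assumed: NCBI translation table 1), bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}


def codon_frequencies(coding_sequences):
    """Count codon occurrences over a collection of coding sequences (e.g. one genome)."""
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - 2, 3):   # ignore a trailing partial codon
            codon = cds[i:i + 3]
            if codon in CODE:                 # skip codons with ambiguity codes such as N
                counts[codon] += 1
    return counts


def cumulative_table(counts):
    """Per amino acid: (synonymous codons, cumulative probabilities of those codons)."""
    synonymous = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":
            synonymous[aa].append(codon)
    table = {}
    for aa, codons in synonymous.items():
        weights = [counts[c] for c in codons]
        if sum(weights) == 0:                 # assumption: unseen amino acid -> uniform choice
            weights = [1] * len(codons)
        total = sum(weights)
        table[aa] = (codons, list(accumulate(w / total for w in weights)))
    return table


table = cumulative_table(codon_frequencies(["ATGAATAATAACTTTTAA"]))
print(table["N"])   # AAT seen twice, AAC once
```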

12 Our Approach (contd.) Generated the random sequences using the cumulative probability: – Best matched pairs: generate sequences for the trio. – All pairs with least mismatch: generate sequences for all such pairs.
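The random generation step can be sketched as inverse-CDF sampling: draw r uniformly in [0, 1) and take the first synonymous codon whose cumulative probability exceeds r. The table below uses illustrative numbers, not real human, mouse or rat codon usage, and `sample_codon` / `reverse_translate` are hypothetical names.

```python
import bisect
import random

# Hypothetical cumulative codon table for a few amino acids (illustrative numbers,
# not real codon usage): amino acid -> (synonymous codons, cumulative probabilities).
TABLE = {
    "N": (["AAT", "AAC"], [0.45, 1.0]),
    "K": (["AAA", "AAG"], [0.40, 1.0]),
    "F": (["TTT", "TTC"], [0.45, 1.0]),
    "Y": (["TAT", "TAC"], [0.43, 1.0]),
}


def sample_codon(codons, cumulative, rng):
    """Inverse-CDF sampling: first codon whose cumulative probability exceeds r."""
    r = rng.random()                          # uniform in [0, 1)
    i = bisect.bisect_right(cumulative, r)
    return codons[min(i, len(codons) - 1)]    # guard against rounding in the last entry


def reverse_translate(protein, table, rng):
    """Generate one random nucleotide sequence that encodes `protein`."""
    return "".join(sample_codon(*table[aa], rng) for aa in protein)


rng = random.Random(2007)   # fixed seed so the run is reproducible
for _ in range(3):
    print(reverse_translate("NKFY", TABLE, rng))
```

Every generated sequence translates back to the same protein; only the synonymous codon choices vary, weighted by the codon frequencies.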

13 Our Approach (contd.) A = [a1, a2, …, an]: aa sequences for HUMAN; B = [b1, b2, …, bm]: aa sequences for MOUSE; C = [c1, c2, …, ck]: aa sequences for RAT (the rat sequences are written r1, r2, … below). Calculate all possible mismatches between AB, BC and CA and keep the aa sequences with least mismatch: a1b1, a1b3, a2b2 (AB); b1r1, b1r2, b2r6 (BC); a1r2, a2r5, a1r1 (CA). Selecting the best matched pair: choose randomly such that the three pairs are a1b1, b1r2 and a1r2; then a1b1r2 is the trio.
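The trio selection can be sketched as a search for three sequences whose three pairwise combinations are all least-mismatch pairs. The pair lists below are the ones on this slide; `find_trios` is a hypothetical helper. With these pairs both a1b1r1 and a1b1r2 qualify, which is why a random choice is needed; the slide's trio is one of the two.

```python
import random


def find_trios(hm_pairs, mr_pairs, hr_pairs):
    """All (a, b, r) whose pairs (a, b), (b, r) and (a, r) are all least-mismatch pairs."""
    hr = set(hr_pairs)
    return [
        (a, b, r)
        for a, b in sorted(set(hm_pairs))
        for b2, r in sorted(set(mr_pairs))
        if b2 == b and (a, r) in hr
    ]


# Least-mismatch pairs from the slide's example.
HM = [("a1", "b1"), ("a1", "b3"), ("a2", "b2")]   # human-mouse
MR = [("b1", "r1"), ("b1", "r2"), ("b2", "r6")]   # mouse-rat
HR = [("a1", "r2"), ("a2", "r5"), ("a1", "r1")]   # human-rat

trios = find_trios(HM, MR, HR)
print(trios)                            # [('a1', 'b1', 'r1'), ('a1', 'b1', 'r2')]
print(random.Random(0).choice(trios))   # one trio picked at random
```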

14 Our Approach (contd.) Least mismatch means maximum similarity between sequences. Let A, B, C be the amino acid sequences for human, mouse and rat respectively. We compare two sequences one amino acid at a time, calculate the mismatches between all pairs of sequences, and keep the pairs with the least mismatch. The example shown is for the amino acid sequences of one particular enzyme.
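A minimal sketch of the position-by-position mismatch count and the least-mismatch selection. How sequences of different lengths were handled is not stated; here unmatched trailing residues are counted as mismatches, which is an assumption. The sequences are toy examples, not real enzyme data, and the function names are hypothetical.

```python
def count_mismatches(x: str, y: str) -> int:
    """Residue-by-residue mismatches; unmatched tail residues count as mismatches (assumption)."""
    differing = sum(1 for p, q in zip(x, y) if p != q)
    return differing + abs(len(x) - len(y))


def least_mismatch_pairs(A, B):
    """Return the index pairs (i, j) of A x B with the fewest mismatches, and that count."""
    scores = {(i, j): count_mismatches(a, b)
              for i, a in enumerate(A) for j, b in enumerate(B)}
    best = min(scores.values())
    return [pair for pair, s in scores.items() if s == best], best


human = ["NDCQEHKFY", "MDCQEHKWY"]                  # toy sequences, not real enzyme data
mouse = ["NDCQEHKFY", "NDCQDHKFY", "NDCQEHKF"]
print(least_mismatch_pairs(human, mouse))
```

Returning every tied pair (rather than the first) supports both variants in the slides: "best matched pairs" picks among them, while "all pairs with least mismatch" keeps them all.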

15 Our Approach (contd.) Generalized algorithm: – Pathway analysis by the model of Nei and Gojobori. – No transition matrix used. – No phylogenetic tree used for codon comparison. – A sliding buffer of 3 characters used for codon comparison. – Jukes and Cantor's model used for multiple-substitution correction.
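The site-counting half of the Nei-Gojobori method can be sketched as follows: for each codon position, the fraction of the three possible single-base changes that keep the amino acid is added to S, and N = 3 × (number of codons) − S. Treating changes to stop codons as nonsynonymous is a simplifying assumption (implementations differ on this point), and the helper names are hypothetical.

```python
# Standard genetic code (assumed: NCBI translation table 1), bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites of one codon: over its 3 positions, the fraction of the
    3 possible single-base changes that keep the amino acid (stop counts as a change)."""
    aa = CODE[codon]
    synonymous_changes = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    synonymous_changes += 1
    return synonymous_changes / 3


def count_sites(seq: str):
    """Return (S, N) for a sequence of sense codons; N = 3 * codons - S."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    S = sum(synonymous_sites(c) for c in codons)
    return S, 3 * len(codons) - S


for codon in ("ATG", "TTT", "CTG", "GGG"):
    print(codon, CODE[codon], round(synonymous_sites(codon), 3))
```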

16 Our Approach (contd.) AATGATTGTCAAGAGCAT AAG TTT TAT vs. AATGACTGTCAGGAGCAC AAG TTC TAT: the sliding buffer compares the codons of the two sequences one codon at a time. Nei and Gojobori's model is used to calculate the pathways, and Jukes and Cantor's model to obtain dn/ds.
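Putting the pieces together, a compact sketch of the codon-by-codon comparison: each pair of codons is read through a 3-character window, differences are averaged over all mutational pathways (skipping pathways that pass through a stop codon), and Jukes-Cantor is applied to $p_s$ and $p_n$. Assumptions beyond the slides: the standard genetic code, stop codons skipped, sites averaged over the two sequences, and the hypothetical names `ng86`, `codon_differences` and `syn_sites`. This is not the authors' program.

```python
import math
from itertools import permutations
from typing import Optional

# Standard genetic code (assumed: NCBI translation table 1), bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}


def syn_sites(codon: str) -> float:
    """Synonymous sites of one codon (changes to stop codons count as nonsynonymous)."""
    same = sum(CODE[codon[:p] + b + codon[p + 1:]] == CODE[codon]
               for p in range(3) for b in BASES if b != codon[p])
    return same / 3


def codon_differences(c1: str, c2: str) -> tuple:
    """(sd, nd) averaged over every mutational pathway from c1 to c2,
    skipping pathways whose intermediate codon is a stop codon."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    if not positions:
        return 0.0, 0.0
    sd_sum, nd_sum, n_paths = 0.0, 0.0, 0
    for order in permutations(positions):
        current, sd, nd, valid = c1, 0, 0, True
        for step, p in enumerate(order):
            nxt = current[:p] + c2[p] + current[p + 1:]
            if step < len(order) - 1 and CODE[nxt] == "*":
                valid = False              # pathway passes through a stop codon
                break
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        if valid:
            sd_sum, nd_sum, n_paths = sd_sum + sd, nd_sum + nd, n_paths + 1
    if n_paths == 0:
        raise ValueError(f"no valid pathway between {c1} and {c2}")
    return sd_sum / n_paths, nd_sum / n_paths


def jukes_cantor(p: float) -> Optional[float]:
    """d = -(3/4) ln(1 - 4p/3); None when p >= 3/4 (no valid result)."""
    return -0.75 * math.log(1 - 4 * p / 3) if 0 <= p < 0.75 else None


def ng86(seq1: str, seq2: str) -> dict:
    """Compare two aligned coding sequences codon by codon (a 3-character window)."""
    seq1 = seq1.replace(" ", "").upper()
    seq2 = seq2.replace(" ", "").upper()
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        a, b = seq1[i:i + 3], seq2[i:i + 3]
        if CODE[a] == "*" or CODE[b] == "*":
            continue                              # assumption: stop codons skipped
        s = (syn_sites(a) + syn_sites(b)) / 2     # sites averaged over both sequences
        S, N = S + s, N + 3 - s
        sd, nd = codon_differences(a, b)
        Sd, Nd = Sd + sd, Nd + nd
    d_s = jukes_cantor(Sd / S) if S > 0 else None
    d_n = jukes_cantor(Nd / N) if N > 0 else None
    valid = d_s is not None and d_s > 0 and d_n is not None
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd,
            "dS": d_s, "dN": d_n, "dN/dS": d_n / d_s if valid else None}


result = ng86("AATGATTGTCAAGAGCAT AAG TTT TAT", "AATGACTGTCAGGAGCAC AAG TTC TAT")
print(result)
```

On the slide's pair all four differences are synonymous third-position changes, so Sd = 4 and Nd = 0. With only nine codons there are about 3 synonymous sites, so $p_s = 4/3$ exceeds 3/4, dS is undefined and no dn/ds is returned: the same kind of case the summary reports as NVR.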

17 Experimental Results (Best matched pairs) dn/ds ratio of the human-mouse, mouse-rat and human-rat comparisons for the enzymes common to all three species. Numbers in brackets are the lengths of the sequences compared.

18 Experimental Results (contd.) (Best matched pairs) dn/ds ratio of the human-mouse and mouse-rat comparisons for the enzymes not common to all three species.

19 Experimental Results (contd.) (Best matched pairs) Valid dn/ds ratio of the mouse-rat comparison for the enzymes found only in these two species and not in human.

20 Experimental Results (contd.) (All pairs with least mismatch) dn/ds ratio of the human-mouse, mouse-rat and human-rat comparisons for the enzymes common to all three species. This graph shows the enzymes with only one least-mismatch sequence pair for each species pair.

21 Experimental Results (contd.) (All pairs with least mismatch) Panels: Transaldolase; Carboxylesterase. dn/ds ratio of the human-mouse, mouse-rat and human-rat comparisons for enzymes common to all three species that have more than one least-mismatch sequence pair for each species pair. The x-axis label indicates the sequence pair number and has no other significance.

22 Experimental Results (contd.) (All pairs with least mismatch) Enzymes found only in the human-mouse comparison

23 Experimental Results (contd.) (All pairs with least mismatch) Enzymes found only in the mouse-rat comparison

24 Experimental Results (contd.) (All pairs with least mismatch) Enzymes found only in the human-rat comparison

25 Experimental Results (contd.) (All pairs with least mismatch) Estimated time for aa substitution per for the enzymes

26 Experimental Results (contd.) (All pairs with least mismatch) Estimated time for aa substitution per for the enzymes common in all three species

27 Summary – The rate of synonymous substitution varies considerably from gene to gene. – Many enzymes, in spite of being proteins, do not provide valid results. – The accuracy rate is about 50% to 55%. – In some cases the nonsynonymous sites were too high, so no valid result could be obtained.

28 Summary (contd.) For enzymes, the variation is high compared with the ordinary proteins in the case study by Prof. Li. Enzymes possess a restoration capability after chemical reactions, which means they can resist many mutations.

29 Summary (contd.) In this work, the estimated time for mutation is around 5 times longer (~400 Myr). We can say that enzymes are 5 times stronger than ordinary proteins.

30 Summary (contd.) Comparison between the already established result and our approach (NVR: no valid results; H: human, M: mouse, R: rat). Columns: Enzymes | Li's approach: codons compared (H-M/R), dn/ds ratio | Our approach: codons compared (H-M), dn/ds ratio; codons compared (H-R), dn/ds ratio. Rows: Aldolase A: NVR; Creatine kinase M; Lactate dehydrogenase A; Glyceraldehyde-3-phosphate dehydrogenase: NVR, 332, NVR; Glutamine synthetase; Adenine phosphoribosyltransferase: NVR, 179, NVR; Carbonic anhydrase I: NVR.

31 Summary (contd.) None of the values can be considered accurate; all may vary with the parameters or assumptions taken into account. We can only observe the nature of selection: neutral, purifying or diversifying. Variations appear in this table, but we do not know which pair of genes Prof. Li used. In our case, the randomly generated sequence might differ considerably from the original nucleotide sequence of that gene. NVR means no valid result: in these cases the ratio could not be calculated because the value of ds obtained was not a valid, computable number.
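The reading rule on this slide can be written as a small function (dn/ds < 1: purifying, ≈ 1: neutral, > 1: diversifying, no computable ratio: NVR). Treating exact equality with 1 as neutral, with an optional, hypothetical `tolerance` band, is a simplification; a real analysis would test whether the ratio differs significantly from 1.

```python
from typing import Optional


def classify_selection(ratio: Optional[float], tolerance: float = 0.0) -> str:
    """Label a dn/ds ratio; `tolerance` is a hypothetical band around 1 treated as neutral."""
    if ratio is None:
        return "NVR"            # ds was not a valid, computable number
    if ratio < 1.0 - tolerance:
        return "purifying"      # nonsynonymous changes removed by selection
    if ratio > 1.0 + tolerance:
        return "diversifying"   # nonsynonymous changes favoured
    return "neutral"


for r in (None, 0.12, 1.0, 1.8):
    print(r, classify_selection(r))
```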

Thank You. Supplementary materials are on the website. The evolution model is the Hairy model.