Andreas Bender - Research Group Gisbert Schneider - Goethe-University Frankfurt
Analysis of mitochondrial transit peptides of Plasmodium falciparum

Presentation transcript:

Analysis of mitochondrial transit peptides of Plasmodium falciparum
Andreas Bender
Diploma thesis (Diplomarbeit), Research Group Gisbert Schneider
April to September 2002
Goethe-University, Frankfurt

Contents
Why … ?
Our results – in short
Biological background
Data coding and analysis
Detailed results
P. falciparum and other organisms
Summary and outlook

Why … ?
Why P. falciparum?
– It causes malaria
– Genome sequencing recently completed
– "Apicoplastic pressure"
– Closely related to Toxoplasma gondii etc.

Why … ?
Why mitochondrial transit peptides?
– Recent related work for apicoplast exists
– Major compartment
– Failure of established tools

Our results – in short
Artificial neural network results:
– Matthews correlation coefficient cc = 0.74 (test set), corresponding to ~90% correct predictions
– 381 to 1177 mitochondrial transit peptides (mTPs) found in 5334 annotated genes (7% to 22%) of Plasmodium falciparum

Biological background
Female Anopheles

Biological background
Courtesy of Mark F. Wiser, Tulane University

Biological background - Targeting
Courtesy of the Division of Biological Sciences, University of Montana

Biological background
Mitochondrial targeting signals – Characteristics
– N-terminal, internal, C-terminal
– Matrix-targeting or IMS-targeting (bipartite)
– No sequence conservation
– On average amino acids
– Net positive charge, forms α-helix
– Distinct cleavage site (Arg at -2 or -3,…)
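
Two of the characteristics listed on this slide, the net positive charge and the Arg at position -2 or -3 relative to the cleavage site, can be checked directly on a sequence. The following is a minimal sketch, not the method used in the thesis. It assumes a simplified charge model (Lys and Arg count +1, Asp and Glu count -1, His and the termini are ignored). The function names and the example sequence are hypothetical.

```python
# Simplified charge model (an assumption): Lys/Arg count +1, Asp/Glu count -1.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(segment: str) -> int:
    """Sum of side-chain charges under the simplified model above."""
    return sum(CHARGE.get(residue, 0) for residue in segment.upper())


def has_arg_near_cleavage(sequence: str, cleavage_index: int) -> bool:
    """True if Arg sits at position -2 or -3 relative to the cleavage site.

    cleavage_index is the 0-based index of the first residue of the mature
    protein, so position -1 is sequence[cleavage_index - 1].
    """
    positions = (cleavage_index - 2, cleavage_index - 3)
    return any(0 <= p < len(sequence) and sequence[p].upper() == "R"
               for p in positions)


# Made-up presequence for illustration only, not a real protein.
example = "MLSRAVKRLASTRGFASQ"
print(net_charge(example))                 # 4
print(has_arg_near_cleavage(example, 14))  # True (Arg at position -2)
```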

Data coding and analysis
3 lengths: N-terminal 24, 31, 42 residues
Redundancy reduction
Two representations:
– Relative amino acid frequencies (20-dim.)
– Physicochemical properties (19-dim.)
Self-organizing map (SOM)
Artificial neural network (ANN)
Variable selection
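
The first representation on this slide, a 20-dimensional vector of relative amino acid frequencies over the N-terminal 24, 31 or 42 residues, could be computed along the following lines. This is only a sketch: the function name, the alphabet ordering, the handling of non-standard residues (they are skipped) and the example sequence are assumptions, not details taken from the slides.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, order is arbitrary


def relative_frequencies(sequence: str, n_terminal_length: int) -> list[float]:
    """Encode the first n_terminal_length residues as a 20-dim frequency vector."""
    segment = sequence.upper()[:n_terminal_length]
    # Assumption: residues outside the 20-letter alphabet (e.g. X) are skipped.
    counted = [residue for residue in segment if residue in AMINO_ACIDS]
    if not counted:
        return [0.0] * len(AMINO_ACIDS)
    return [counted.count(aa) / len(counted) for aa in AMINO_ACIDS]


# Made-up sequence for illustration only.
sequence = "MLSRAVKRLASTRGFASQLLKNNYSPIHHKDEAWNNKLYFVTGE"
for length in (24, 31, 42):
    vector = relative_frequencies(sequence, length)
    print(length, len(vector), round(sum(vector), 6))
```

Each vector has 20 entries that sum to 1, so sequences of different lengths map to inputs of the same size.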

Data coding and analysis

Data coding and analysis
Three-layer feed-forward perceptrons
Input data
– N-terminal 24, 31 and 42 amino acids
– Coded in relative amino acid frequency and in physicochemical space
All parameters varied one-at-a-time
10-fold cross-validation, 40 positive examples, 135 negative
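
The setup on this slide can be illustrated with a forward pass through a three-layer (input, hidden, output) perceptron and a 10-fold split of 40 positive and 135 negative examples. This is a sketch under stated assumptions: the sigmoid activation, the hidden-layer size, the random weights and the round-robin stratified split are illustrative choices, and all function names are hypothetical. No training is shown.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input layer -> sigmoid hidden layer -> one sigmoid output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def stratified_folds(n_positive, n_negative, k=10, seed=0):
    """Split example indices into k folds with a similar class ratio in each.

    Indices 0..n_positive-1 are positives, the remaining ones negatives.
    """
    rng = random.Random(seed)
    positives = list(range(n_positive))
    negatives = list(range(n_positive, n_positive + n_negative))
    rng.shuffle(positives)
    rng.shuffle(negatives)
    folds = [[] for _ in range(k)]
    for i, index in enumerate(positives + negatives):
        folds[i % k].append(index)  # deal indices out round-robin
    return folds


rng = random.Random(1)
n_inputs, n_hidden = 20, 3  # hidden-layer size is an arbitrary assumption
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in range(n_hidden)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
x = [1.0 / n_inputs] * n_inputs  # placeholder frequency vector
print(forward(x, w_hidden, [0.0] * n_hidden, w_out, 0.0))

folds = stratified_folds(40, 135)
print([len(fold) for fold in folds])  # [18, 18, 18, 18, 18, 17, 17, 17, 17, 17]
```

In each round of cross-validation one fold would serve as the test set and the remaining nine as training data. With 40 positives, every fold holds exactly 4 of them.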

Data coding and analysis

Data coding and analysis
Two ANNs
– Best cc: 1177 of 5334 annotated genes have mTPs (~22%)
– High penalty for overpredictions: 381 of 5334 annotated genes have mTPs (~7%)
– Arabidopsis thaliana: 8% mTPs
– Saccharomyces cerevisiae: 11% mTPs
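
The rounded percentages on this slide follow from the quoted counts. A short check of the arithmetic, where the helper name is hypothetical and the counts are the ones given above:

```python
ANNOTATED_GENES = 5334  # number of annotated genes quoted on the slide


def percent_predicted(n_predicted: int, n_total: int = ANNOTATED_GENES) -> float:
    """Share of annotated genes predicted to carry an mTP, in percent."""
    return 100.0 * n_predicted / n_total


for label, count in (("best cc", 1177), ("high overprediction penalty", 381)):
    print(f"{label}: {percent_predicted(count):.1f}%")
# best cc: 22.1%
# high overprediction penalty: 7.1%
```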

Data coding and analysis
Comparison table (values not preserved in this transcript)
Columns: Matthews cc, Sensitivity, Selectivity
Rows: MitoProtII, TargetP, PlasMit
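
The three measures named on this slide can all be derived from a confusion matrix. The sketch below uses the standard definition of the Matthews correlation coefficient and of sensitivity. The slides do not define selectivity, so it is assumed here to mean TP / (TP + FP), which is a common usage but still an assumption. The counts in the example are made up and are not results for any of the three tools.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true mTPs that are predicted as mTPs."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def selectivity(tp: int, fp: int) -> float:
    """Fraction of predicted mTPs that are true mTPs (assumed definition)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, 0.0 if the denominator vanishes."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Made-up confusion matrix for illustration only.
tp, fn, tn, fp = 36, 4, 128, 7
print(round(sensitivity(tp, fn), 3))
print(round(selectivity(tp, fp), 3))
print(round(matthews_cc(tp, tn, fp, fn), 3))
```

Unlike the plain fraction of correct predictions, the Matthews coefficient stays informative when the classes are unbalanced, as with 40 positive and 135 negative examples.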

P. falciparum and other organisms

P. falciparum and other organisms
25% G+C content in coding regions (sample of chromosomes 2 and 3)
In good agreement with work of Lobry for 50 bacterial genomes
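
G+C content is the fraction of bases in a DNA sequence that are G or C. A minimal sketch of the calculation follows. The function name is hypothetical, the treatment of ambiguous bases (they are ignored) is an assumption, and the example sequence is made up rather than taken from chromosome 2 or 3.

```python
def gc_content(dna: str) -> float:
    """Fraction of G and C among the unambiguous bases A, C, G and T."""
    bases = [base for base in dna.upper() if base in "ACGT"]
    if not bases:
        return 0.0
    return sum(base in "GC" for base in bases) / len(bases)


# Made-up AT-rich fragment for illustration only.
print(gc_content("ATGAAATTTGCATAA"))  # 0.2
```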

Summary
Failure of established tools for mTP prediction
There are general differences in amino acid (AA) usage between P. falciparum and other eukaryotes
Low G+C content of coding regions
New tool PlasMit outperforms existing algorithms

Outlook
Question: Why are there so many positive predictions in P. falciparum?
Using PlasMit for assembling putative metabolic pathways in the mitochondria will now be possible
Final goal: Full map of P. falciparum's metabolism

Thank you!