ABSTRACT
We conducted an extensive computational analysis of the Culex quinquefasciatus genome to find and annotate a specific group of transposable elements (TEs), the Class I non-long terminal repeat retrotransposons (non-LTRs), by building a semi-automated pipeline [9]. First, we ran BLAST searches for similarity to known non-LTRs, using the reverse transcriptase (RT) amino acid sequences of known non-LTRs as starting queries [5,6]. The BLAST hits (DNA sequences) were then combined and extracted with Perl scripts to obtain Culex non-LTR candidates. These sequences were assembled with the SeqMan module of DNASTAR, then manually truncated, adjusted, and annotated. Annotation was done in two steps: (I) we annotated all sequences by BLAST against the NCBI nr database and identified some Culex non-LTR consensus sequences as belonging to known non-LTR families; (II) we conducted a phylogenetic analysis of all Culex non-LTRs, which allowed us to further annotate our consensus sequences. Some elements were too degraded to be assigned to a specific clade. After preliminary annotation, we determined the copy number of each element in the genome within the threshold used. A comparison of Aedes aegypti, Anopheles gambiae, and Culex quinquefasciatus revealed different non-LTR clade compositions, suggesting different evolutionary histories for these species.

INTRODUCTION
Culex quinquefasciatus is an important vector of human pathogens in the United States and worldwide, transmitting the agents of West Nile encephalitis and lymphatic filariasis. Genomic analysis can help us better understand how this mosquito adapts to various climatic environments and to the parasites it transmits. A significant part of any eukaryotic genome consists of various types of repeats, including DNA and RNA transposable elements (TEs). Because of their repetitive nature and mobility, TEs make genomes difficult to assemble.
Annotating and characterizing TEs is therefore an essential task of any genome project. The recent Culex quinquefasciatus genome sequencing project gave us an opportunity to identify and annotate its non-LTR retrotransposons.

RESULTS
Fig. 1. Phylogenetic analysis classifies Culex quinquefasciatus non-LTR clades in the same way as the semi-automated pipeline does. Phylogram produced with the PhyloDraw visualization tool [7] from a multiple alignment file created with ClustalX [8] (neighbor-joining algorithm). C. quinquefasciatus non-LTRs are indicated as light green leaves. Clades shown: CR1, L2, Jockey, I, Loner, R1, RTE, L1, LOA.

Table: Comparison of the number of elements per non-LTR clade (L1, CR1, L2, I, Jockey, LOA, RTE, Loner, R1, CM-gag, Outcast, R4) within three mosquito genomes: Aedes aegypti [3], Anopheles gambiae [2], and Culex quinquefasciatus.

Table: Culex quinquefasciatus non-LTRs, genome distribution: percentage of the genome and copy number for each clade (Jockey, CM-gag, RTE, CR1, L1, L2, R1, Loner, LOA, I), for unclassified elements, and in total.

Table: Comparative contribution of TEs to mosquito genome sizes: non-LTR percentage and total TE percentage of the genome in Aedes aegypti, Anopheles gambiae, and Culex quinquefasciatus.

DISCUSSION
When only protein sequences were used as starting queries in our semi-automated pipeline, a large portion of elements (those for which no protein sequences were available from TEfam [6] or Repbase [5]) was overlooked. We fixed this by adding DNA sequences as BLAST queries to the pipeline, which allowed us to identify and classify most of the overlooked elements. The C. quinquefasciatus genome contains a rich diversity of non-LTRs. Although we found no evidence of members of the Outcast and R4 clades, the genome contains CM-gag, a unique Gag-only non-LTR retrotransposon, and LOA, which is absent from A. gambiae. Non-LTR clades vary widely in copy number: Jockey, CR1, and CM-gag have thousands of copies, whereas I, L2, LOA, Loner, and R1 have only hundreds. Jockey contributes more to genome size than any other non-LTR clade (1.76% of the genome), and non-LTRs in total make up 4.8% of the Culex genome.

CONCLUSIONS
Using a semi-automated pipeline approach, we identified 9 non-LTR clades in the Culex quinquefasciatus genome. Phylogenetic analysis classifies the representatives of C. quinquefasciatus non-LTR clades in the same way as the semi-automated pipeline does, which supports the reliability of the pipeline's classification. The L1, CR1, and Jockey clades have a wide variety of elements and a high copy number in the genome, which suggests recent non-LTR activity.

FUTURE GOALS
Identify all possible protein sequences of the elements and conduct a phylogenetic analysis of them. Identify, if possible, active non-LTRs in the Culex quinquefasciatus genome.

ACKNOWLEDGEMENTS
We thank James Biedler, Vladimir Kapitonov, Scott Christley, Karine Mouline, members of the Frank H. Collins and Nora J. Besansky labs, and VectorBase for helpful discussions and support. This work was supported by the US National Institute of Allergy and Infectious Diseases (NIAID) contract HHSN C.

REFERENCES
1. R. Holt, et al. The Genome Sequence of the Malaria Mosquito Anopheles gambiae. Science, 298.
2. J. Biedler, Z. Tu. Non-LTR Retrotransposons in the African Malaria Mosquito, Anopheles gambiae: Unprecedented Diversity and Evidence of Recent Activity. Molecular Biology and Evolution, 20(11).
3. V. Nene, et al. Genome Sequence of Aedes aegypti, a Major Arbovirus Vector. Science, 316:1718.
4. D. Lawson, et al. VectorBase: a data resource for invertebrate vector genomics. Nucleic Acids Research, 37:D58307.
5. Repbase.
6. TEfam.
7. PhyloDraw.
8. ClustalX2.
9. VectorBase.

Bioinformatic detection and annotation of non-LTR retrotransposons in the Culex quinquefasciatus mosquito genome
Maria F. Unger ×, Ryan C. Kennedy *, Jenica L. Abrudan ×, Peter Arensburger ¤, Greg Madey * ×, Frank H. Collins × *
× Eck Institute for Global Health, University of Notre Dame
* Department of Computer Science & Engineering, University of Notre Dame
¤ Department of Entomology, University of California, Riverside
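The pipeline step in which BLAST hits were combined into candidate loci, and the later copy-number and genome-percentage estimates, can be sketched as follows. This is a minimal illustration in Python, not the authors' Perl scripts. It assumes BLAST+ tabular output (-outfmt 6) of consensus queries searched against genome contigs, a hypothetical query-to-clade mapping (unmapped queries count as "unclassified"), a hypothetical e-value cutoff standing in for the unspecified threshold, and that each merged locus counts as one copy.

```python
"""Sketch: collapse BLAST hits of non-LTR queries on genome contigs into loci
and tally per-clade copy number and genome share.

Assumptions (not from the poster): BLAST+ tabular output (-outfmt 6), a
hypothetical query-to-clade mapping, an e-value cutoff standing in for the
unspecified threshold, and one merged locus counted as one copy.
"""
from collections import defaultdict


def parse_outfmt6(lines, max_evalue=1e-10):
    """Yield (query, contig, start, end) for -outfmt 6 rows passing the cutoff."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, contig = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue  # hypothetical significance threshold
        # Minus-strand hits have sstart > send; normalize so start <= end.
        yield query, contig, min(sstart, send), max(sstart, send)


def merge_intervals(intervals):
    """Merge overlapping or abutting 1-based closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def clade_summary(lines, query_to_clade, genome_size, max_evalue=1e-10):
    """Return {clade: (copy_number, covered_bp, percent_of_genome)}."""
    hits = defaultdict(list)  # (clade, contig) -> list of (start, end)
    for query, contig, start, end in parse_outfmt6(lines, max_evalue):
        clade = query_to_clade.get(query, "unclassified")
        hits[(clade, contig)].append((start, end))

    totals = defaultdict(lambda: [0, 0])  # clade -> [copies, bp]
    for (clade, _contig), intervals in hits.items():
        for start, end in merge_intervals(intervals):
            totals[clade][0] += 1
            totals[clade][1] += end - start + 1

    return {clade: (n, bp, 100.0 * bp / genome_size)
            for clade, (n, bp) in totals.items()}


if __name__ == "__main__":
    # Toy rows in -outfmt 6 column order (qseqid sseqid pident length mismatch
    # gapopen qstart qend sstart send evalue bitscore); values are invented.
    rows = [
        "JockeyA\tctg1\t90.0\t500\t10\t0\t1\t500\t1000\t1499\t1e-50\t800",
        "JockeyA\tctg1\t88.0\t300\t10\t0\t1\t300\t1400\t1699\t1e-30\t500",
        "CR1A\tctg2\t85.0\t400\t10\t0\t1\t400\t2399\t2000\t1e-40\t600",
        "CR1A\tctg2\t70.0\t100\t10\t0\t1\t100\t5000\t5099\t1e-3\t50",
    ]
    clades = {"JockeyA": "Jockey", "CR1A": "CR1"}  # hypothetical mapping
    print(clade_summary(rows, clades, genome_size=10_000))
```

On the toy rows, the two overlapping Jockey hits collapse into a single 700 bp locus, the minus-strand CR1 hit is normalized to 2000-2399, and the CR1 hit with e-value 1e-3 is discarded. Counting merged loci as copies is a simplification: one fragmented copy can yield several loci, and adjacent insertions of the same clade can merge into one.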