
Abstract
Although transposable elements (TEs) were discovered over 50 years ago, robustly discovering them in newly sequenced genomes remains a difficult problem. The many TE types with differing structural characteristics, sequence degradation, multiple insertions within existing elements, and co-option by the organism’s regulatory system are among the issues that confound the discovery process. We have developed an automated pipeline that employs a homology-based approach, complemented by de novo and structure-based approaches, to discover and annotate TEs in invertebrate genomes. Once fully automated, our pipeline will be integrated with VectorBase, an NIAID Bioinformatics Resource Center for invertebrate vectors of human pathogens, to produce a first-pass discovery and annotation of TEs for newly sequenced genomes. VectorBase currently hosts five organisms, with more on the way, and provides the Ensembl genome browser, computational tools, and other data specific to the study of invertebrate vectors. The annotation component of our pipeline enhances the Ensembl genome browser to give TEs greater prominence, displaying their genomic location, structural details, alignments with consensus TEs, and homology with other organisms. VectorBase has developed a community annotation system through which researchers can upload annotation corrections to genes for curation and broad dissemination; we plan to extend this system to TEs. We hope this will provide an invaluable resource for researchers studying the biology of TEs and their genomic impact.

Ryan C. Kennedy (1,2), Maria F. Unger (1,3), Scott Christley (4), Jenica L. Abrudan (1,3), Neil F. Lobo (1,3), Greg Madey (1,2), Frank H. Collins (1,2,3)

(1) Eck Institute for Global Health, University of Notre Dame
(2) Department of Computer Science and Engineering, University of Notre Dame
(3) Department of Biological Sciences, University of Notre Dame
(4) Department of Mathematics & Department of Computer Science, University of California, Irvine

The VectorBase project is funded by the US National Institute of Allergy and Infectious Diseases (NIAID), contract HHSN C.

TE Discovery
TEs are difficult to characterize thoroughly because of their complex and varying structure (or lack thereof). Most current TE discovery techniques fall into three categories: homology-based, structure-based, and de novo. Popular tools exist within each category, yet most are not automated or easily accessible to all researchers. We have developed a semi-automated discovery pipeline that uses a homology-based approach complemented by de novo and structure-based components. Our pipeline relies on several well-known technologies, including BLAST, Perl (and BioPerl), and DNASTAR SeqMan II. It also requires a library of representative TEs, which we obtain from Repbase, TEfam, and the literature.

VectorBase
VectorBase is an NIAID Bioinformatics Resource Center that provides web-based access to information and tools pertaining to invertebrate vectors of human pathogens. VectorBase currently houses genome information for the mosquitoes Aedes aegypti, Anopheles gambiae, and Culex quinquefasciatus, the body louse Pediculus humanus, and the tick Ixodes scapularis. Current features and capabilities include:
- Integrated use of the Ensembl genome browser
- Integrated tools, including BLAST, ClustalW, and HMMER
- Community annotation pipeline for genes
- Microarray and gene expression repository
- Controlled vocabularies

TE Discovery Pipeline
Our homology-based TE discovery pipeline can be broken down into the following steps, also shown graphically in Figure 2:
1. BLAST representative sequences against the genome. Each representative sequence is individually BLASTed against the genome.
2. Process the result files and extract hits. Hits that fall within a prespecified threshold of one another are combined and represented as a single hit, hits that do not meet a minimum length threshold are ignored, and the corresponding sequence, including flanking sequence, is extracted from the genome.
3. Assemble the sequences. The extracted sequences are assembled into contigs; biologists familiar with TEs manually determine the TE boundaries, and consensus sequences are generated.
4. BLAST the consensus sequences from step 3 against the genome and characterize the results. The generated consensus sequences are BLASTed against the genome to determine coverage, and the hits are then analyzed by scripts and the TEs annotated.

Figure 2. Simplified visual diagram of the homology-based discovery pipeline.

Community Gene Annotation
Annotation is the process by which meaning is given to genomic data. Ensembl’s automatic gene annotation system is one of the better-known gene annotation systems. VectorBase currently hosts a community annotation pipeline through which registered users of the site can contribute annotation data for any of the hosted genomes. VectorBase accepts four types of annotation information: gene models, publications, controlled vocabulary terms, and comments. Users and curators take the following steps to submit gene models:
1. Users download, fill out, and upload the gene submission form through VectorBase.
2. Users preview and submit the data.
3. Curators can then approve the data.
4. If approved, the genes are integrated into the manual annotation DAS track and displayed in the genome browser, as shown in Figure 1.

Community TE Annotation
Although TE annotation is not yet fully implemented on VectorBase, it will follow the same general steps as gene annotation, and TEs will be shown within the genome browser.
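Because TE submissions are planned to follow the same four steps as gene models, the workflow can be pictured as a small state machine that refuses to skip a step. The following is a minimal, purely illustrative Python sketch. The names `TESubmission`, `Status`, `ALLOWED`, and `manual_annotation_track` are hypothetical and are not part of VectorBase. It also assumes that a submission only moves forward (upload, preview, submission, curator review), since the poster does not say whether rejected submissions can be revised.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical states mirroring the four gene-submission steps.
    UPLOADED = "uploaded"    # step 1: form filled out and uploaded
    PREVIEWED = "previewed"  # step 2: user previews the data
    SUBMITTED = "submitted"  # step 2: user submits the data
    APPROVED = "approved"    # step 3: curator approves
    REJECTED = "rejected"    # step 3: curator declines


# Forward-only transitions (an assumption: the poster does not say
# whether rejected submissions can be revised and resubmitted).
ALLOWED = {
    Status.UPLOADED: {Status.PREVIEWED},
    Status.PREVIEWED: {Status.SUBMITTED},
    Status.SUBMITTED: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: set(),
    Status.REJECTED: set(),
}


@dataclass
class TESubmission:
    """Hypothetical community annotation record for one TE."""
    te_name: str
    seq_region: str
    start: int
    end: int
    submitter: str
    status: Status = Status.UPLOADED
    history: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject obviously malformed coordinates before any review.
        if self.start < 1 or self.end < self.start:
            raise ValueError(f"bad coordinates {self.start}-{self.end}")

    def advance(self, new_status: Status) -> None:
        """Move to new_status, refusing any transition the workflow forbids."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot move from {self.status.value} to {new_status.value}")
        self.history.append((self.status, new_status))
        self.status = new_status


def manual_annotation_track(submissions):
    """Step 4: only curator-approved TEs reach the browser track."""
    return [s for s in submissions if s.status is Status.APPROVED]


if __name__ == "__main__":
    # Hypothetical example: TE name, region, and user are placeholders.
    te = TESubmission("example_TE", "contig_1", 1000, 2500, "user1")
    for step in (Status.PREVIEWED, Status.SUBMITTED, Status.APPROVED):
        te.advance(step)
    print([s.te_name for s in manual_annotation_track([te])])  # prints ['example_TE']
```

Keeping the permitted transitions in a single table (`ALLOWED`) means that a different curation policy, such as letting rejected TEs be resubmitted, would only require editing that table.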
Current work has produced a means to store consensus TEs in the same Chado database schema as genes and to provide a structural display of TEs. Existing online TE repositories typically lack both this structural display and the user-feedback system that VectorBase employs. Additionally, BLAST will be used to show the coverage of TEs within a genome. Figure 1 shows the information flow for TE community annotation on VectorBase.

Figure 1. Information flow diagram for TE annotation.

Goal
We aim to provide an automatic, easy-to-use method, integrated with VectorBase, to identify and annotate TEs in invertebrate genomes.

References
D. Lawson, et al., VectorBase: a data resource for invertebrate vector genomics. Nucleic Acids Research, 37:D583-D587, 2009.
Repbase.
TEfam.

Discovery and Annotation of Transposable Elements on VectorBase