Comprehensive Microbial Resource www.tigr.org/CMR Bioinformatics Visualization Workshop Owen White May 30, 2002.



Curation
Genome Annotation: Michelle Gwinn, Bob Dodson, Bob DeBoy, James Kolonay, Bill Nelson, Ramana Madupu, Sean Daugherty, Maureen Beanan, Scott Durkin, Lauren Brinkac
Bioinformatics Engineers: Jeremy Peterson, Lowell Umayam, Samuel Angiuoli
TIGRFAMs/Groups: Dan Haft, Jeremy Selengut, Maria Ermolaeva (Operons/Terminators), Erik Ferlanti (All vs. All)
Faculty: Jonathan Eisen (DNA repair), Ian Paulsen (transporters), Steven Salzberg
Collaborators: Swiss-Prot, Monica Riley, the open source crowd, Art Delcher (Glimmer)

Retrieval
Caudal Fins: Heterocercal, Forked, Lunate, Emarginate, Truncate, Rounded, Pointed

Retrieval across data types
Caudal Fins, Dorsal Spines, Dorsal Rays

Typical annotation datatypes
clone_info: Tracks information related to the parent nucleotide assembly, including its annotation status, the institution from which the sequence was derived, and whether it is part of a larger assembly such as a chromosome.
asm_feature: Stores all major features of the parent assembly, including annotated genes, predicted genes, repetitive elements, splice sites, and all underlying components of a gene (models, transcript exons, and CDS exons).
phys_ev: Attribute for each gene component within the asm_feature table. For example, each predicted and annotated gene has a model and multiple exons stored in the asm_feature table. Linking the feature to phys_ev identifies the type of feature present, i.e., glimmer, genscan+, genemarkHMM, or working (annotation). This becomes important if a single feature in the asm_feature table is shared by multiple model types.
feat_link: This table is key to the principles behind representing gene models in the database. All parent and child relationships are defined here.
evidence: The main repository for all sequence database search results. It also retains information regarding gene model attributes such as the best BLAST match and all Pfam matches.
ident: Stores attributes for the highest element of the gene component hierarchy, the transcriptional unit. Gene names, loci, EC symbols, and other attributes are available.
role_link: The role category assignments for each gene are available here. Roles include examples such as ‘transcription’, ‘DNA synthesis’, ‘translation’, ‘DNA repair’, ‘amino acid metabolism’, etc.
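
As a rough illustration of how feat_link ties gene components together, the sketch below builds a tiny in-memory database with Python's sqlite3 module. Only the table names (asm_feature, feat_link, phys_ev) come from the slide; every column name, the feature identifiers, and the helper functions are assumptions made for this example and are not the actual TIGR schema.

```python
import sqlite3

# Hypothetical, simplified columns: only the three table names are from the slide.
SCHEMA = """
CREATE TABLE asm_feature (feat_name TEXT PRIMARY KEY, feat_type TEXT,
                          end5 INTEGER, end3 INTEGER);
CREATE TABLE feat_link   (parent_feat TEXT, child_feat TEXT);
CREATE TABLE phys_ev     (feat_name TEXT, ev_type TEXT);
"""


def build_demo():
    """Create an in-memory database holding one made-up gene."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # One transcriptional unit (TU) -> one model -> two exons.
    conn.executemany(
        "INSERT INTO asm_feature VALUES (?, ?, ?, ?)",
        [("TU1", "TU", 100, 900), ("M1", "model", 100, 900),
         ("E1", "exon", 100, 400), ("E2", "exon", 600, 900)],
    )
    conn.executemany(
        "INSERT INTO feat_link VALUES (?, ?)",
        [("TU1", "M1"), ("M1", "E1"), ("M1", "E2")],
    )
    # phys_ev records which predictor (or curator) produced the model.
    conn.execute("INSERT INTO phys_ev VALUES (?, ?)", ("M1", "glimmer"))
    return conn


def children_of(conn, parent):
    """Return (name, type) for every direct child of a feature."""
    rows = conn.execute(
        "SELECT f.feat_name, f.feat_type FROM feat_link l "
        "JOIN asm_feature f ON f.feat_name = l.child_feat "
        "WHERE l.parent_feat = ? ORDER BY f.feat_name",
        (parent,),
    )
    return rows.fetchall()


def model_types(conn, feat_name):
    """Return the evidence types recorded in phys_ev for a feature."""
    rows = conn.execute(
        "SELECT ev_type FROM phys_ev WHERE feat_name = ? ORDER BY ev_type",
        (feat_name,),
    )
    return [r[0] for r in rows]
```

Calling children_of(conn, "M1") on the demo database walks one level down the hierarchy from the model to its exons, and model_types(conn, "M1") reports which prediction method the model came from.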

Omniome Content, Genes
Total # of genes: 132,998 from the worldwide effort (43,311 from TIGR projects).
36,274 with genetic names.
15,098 genes placed into 5,451 paralogous families.
413 rRNAs tRNAs.
49 sRNAs.
293 IS elements.

Omniome Content
Evidence: 1073 distinct EC#s, assigned to genes
Rows of allVall data: 3,996,851
Rows of HMM TIGRFAM data: 91,550
Rows of HMM Pfam data: 131,963
Rows of COG data: 149,940
Rows of Interpro data: 175,760
Rows of Prosite data: 53,132
Rows of BER data: 91,899

TIGRFAM Matrix

The Genome Browser: Linear Display of DNA Molecules

Genome vs. Genome Protein Hits

MUMmer: The Whole Genome Alignment Tool

Role Category Graph

Multi-Genome Query Tool
Query across all genomes based on different properties:
MW, pI, membrane spanning regions
Taxon, paralogous families, TIGRFAMs, role category
Best match to: organism, locus, kingdom, etc.
“Genes with >5 membrane spanning regions and MW 36,000-51,000 Da.”
“E. coli genes with best match to Archaeoglobus involved in DNA metabolism.”
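
The two example queries can be read as conjunctions of simple per-gene filters. The sketch below expresses that idea in Python over a handful of made-up records. The Gene class, its field names, the select helper, and the demo data are all hypothetical; the real tool queries the Omniome database, not an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    # All field names are assumptions chosen for this illustration.
    locus: str
    organism: str
    mol_weight: float         # molecular weight in daltons
    tm_regions: int           # predicted membrane spanning regions
    role: str                 # role category
    best_match_organism: str  # organism of the best database match


def select(genes, *predicates):
    """Return the loci of genes that satisfy every predicate."""
    return [g.locus for g in genes if all(p(g) for p in predicates)]


# Made-up records, not real Omniome data.
DEMO = [
    Gene("g1", "Escherichia coli", 42000.0, 7, "DNA metabolism",
         "Archaeoglobus fulgidus"),
    Gene("g2", "Escherichia coli", 30000.0, 2, "translation",
         "Salmonella enterica"),
    Gene("g3", "Vibrio cholerae", 50000.0, 6, "transport",
         "Escherichia coli"),
]

# "Genes with >5 membrane spanning regions and MW 36,000-51,000 Da."
membrane = select(
    DEMO,
    lambda g: g.tm_regions > 5,
    lambda g: 36000 <= g.mol_weight <= 51000,
)

# "E. coli genes with best match to Archaeoglobus involved in DNA metabolism."
ecoli_archaeal = select(
    DEMO,
    lambda g: g.organism == "Escherichia coli",
    lambda g: g.best_match_organism.startswith("Archaeoglobus"),
    lambda g: g.role == "DNA metabolism",
)
```

Keeping each criterion as a separate predicate mirrors the form-style interface the slide describes, where any subset of properties can be combined in one query.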

Pseudo-Restriction Digest and Linear Depiction of Cuts
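
For readers unfamiliar with the idea, an in-silico digest can be sketched as follows: find every occurrence of an enzyme's recognition site in a linear sequence, record the cut positions, and derive the fragment lengths. This is a minimal illustration, not the CMR implementation. The function names, the text depiction, and the test sequence are assumptions; it scans only the given strand and treats the molecule as linear. The one enzyme fact used is that EcoRI recognizes GAATTC and cuts after the G.

```python
def digest(seq, site, cut_offset):
    """Return (cut_positions, fragment_lengths) for a linear sequence.

    cut_offset is the number of bases into the recognition site at which
    the enzyme cuts (1 for EcoRI, which cuts G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)  # step by 1 to allow overlapping sites
    bounds = [0] + cuts + [len(seq)]
    fragments = [b - a for a, b in zip(bounds, bounds[1:])]
    return cuts, fragments


def depict(length, cuts, width=40):
    """Draw the molecule as a line of '-' with '|' at each scaled cut position."""
    line = ["-"] * width
    for c in cuts:
        line[min(width - 1, c * width // length)] = "|"
    return "".join(line)


# Made-up 20-base sequence containing two EcoRI sites.
SEQ = "AAGAATTCTTTTGAATTCAA"
CUTS, FRAGMENTS = digest(SEQ, "GAATTC", 1)
PICTURE = depict(len(SEQ), CUTS, width=20)
```

The fragment lengths always sum to the sequence length, which is a quick sanity check when adapting the sketch to a circular molecule or to enzymes with non-palindromic sites.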

Position effect: