New data and tools at TAIR (The Arabidopsis Information Resource)

Overview of TAIR
Data coming into TAIR:
– Genome release: RNA-seq, proteomic data, corrections
– Gene function: published papers, journal collaborations, direct submission
– Other data: markers, ecotypes, gene symbols
– New genomes
– New tools
Reaching researchers: directly (TAIR pages) AND via other databases

TAIR10 Genome Release
No assembly updates
Will incorporate:
– 200M Ecker and Mockler RNA-seq reads
– Additional proteomics data
– Individual gene structure corrections sent to us

Mapping and Assembly
1. Mapping
– RNA-seq sequences (Tophat (C. Trapnell), Supersplat (T.C. Mockler))
– Peptides (6-frame translation, spliced exon graph)
2. Assembly approaches
– Augustus (M. Stanke)
  o Uses spliced RNA-seq reads, peptides
  o Aim: identify additional splice variants, update existing genes
– TAU (T.C. Mockler)
  o Uses spliced RNA-seq reads
  o Aim: identify additional splice variants
– Cufflinks (C. Trapnell)
  o Uses spliced and unspliced RNA-seq data
  o Aim: identify novel genes
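The 6-frame translation mentioned for peptide mapping can be illustrated with a minimal sketch. This is not TAIR's pipeline: the function names are made up for this example, the standard genetic code is assumed, and a real peptide mapper would go on to search each translated frame for the observed peptides.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate the three forward and three reverse-strand reading frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Translating all six frames lets a peptide be placed on the genome without relying on existing gene models, which is why it suits the search for unannotated or mis-annotated genes.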

Preliminary Results
Augustus/TAU/Cufflinks predicted models are classified into categories:

Category          Models
Novel genes           21
Updated genes        812
Splice variants     2134
B-list              1586
Rejects             2318

TAIR10 Genome Release
Release expected in August 2010

Experimentally Verified Gene Function
Where does it come from?
– From research articles read by TAIR curators
– From TAIR’s collaboration with journals
– From direct submissions by researchers to TAIR

Literature Curation
How?
– Papers are prioritized according to novelty of gene function results
– Highest-priority papers are read and gene function is extracted
Why?
– A lot of high-quality experimental gene function information is only available in the form of articles
How many?
– About 1/3 of all new articles containing gene function data are curated at TAIR each year

Journal Collaboration
How?
– Author instructions, Excel sheet or online form
Why?
– To capture a larger fraction of gene function data
– Because publication is the right time to get the data into TAIR
What journals?
– 2010:
  o Journal of Integrative Plant Biology
  o Journal of Experimental Botany
  o Plant Science
  o Environmental Botany
  o Plant Physiology and Biochemistry
  o Plant, Cell and Environment
– Plant Physiology (2008)
– The Plant Journal (2009)

Direct Submission of Gene Function
How?
– Excel sheet or online form
Why?
– To capture more data with a small curation team
– Because researchers are the experts on the genes they study

New online submission form

Why Gene Ontology?
– Standardization allows comparison across experiments and species
– Hierarchical structure allows high-level categorization
– Well-structured ontology framework facilitates computational analysis
– Attached to data source (peer-reviewed published research)
– Experimental evidence can be distinguished from predictions
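The point about hierarchical structure can be made concrete with a small sketch. The parent links below are a toy fragment written for illustration and are not copied from the ontology; `ancestors` and `roll_up` are hypothetical helper names.

```python
# Toy is_a links (child -> parents). Illustrative only, not copied from GO.
PARENTS = {
    "phototropism": ["tropism"],
    "tropism": ["response to stimulus"],
    "response to stimulus": ["biological process"],
    "biological process": [],
}


def ancestors(term, parents):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def roll_up(term, high_level_terms, parents):
    """Map a specific term to the high-level categories it falls under."""
    lineage = ancestors(term, parents) | {term}
    return sorted(lineage & set(high_level_terms))


if __name__ == "__main__":
    print(roll_up("phototropism", ["response to stimulus", "transport"], PARENTS))
```

Because a specific annotation implies all of its ancestors, genes annotated at different levels of detail can still be grouped under the same broad category.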

Example Gene Ontology annotations

Gene    GO term                              Evidence          Reference
Phot1   Phototropism                         Mutant phenotype  Huala et al 1997
Phot1   Cytoplasm                            Direct assay      Sakamoto et al 2002
Phot1   Serine / threonine kinase activity   Direct assay      Christie et al 1998

3 GO flavors: biological process (phototropism), cellular component (cytoplasm), molecular function (serine / threonine kinase activity)
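The Phot1 annotations above can be held in a simple record type, which also shows how experimental evidence is separated from predictions. This is a sketch under assumptions: the `Annotation` class and helper names are invented here, "Mutant phenotype" and "Direct assay" are mapped to the standard GO evidence codes IMP and IDA, and the `GENE_X` row is a made-up placeholder for a computational prediction (IEA), not real data.

```python
from dataclasses import dataclass

# GO evidence codes that denote experimental support.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    gene: str
    go_term: str
    aspect: str
    evidence_code: str
    reference: str


ANNOTATIONS = [
    Annotation("Phot1", "phototropism", "biological process",
               "IMP", "Huala et al 1997"),
    Annotation("Phot1", "cytoplasm", "cellular component",
               "IDA", "Sakamoto et al 2002"),
    Annotation("Phot1", "serine/threonine kinase activity", "molecular function",
               "IDA", "Christie et al 1998"),
    # Hypothetical placeholder row: an electronically inferred annotation.
    Annotation("GENE_X", "kinase activity", "molecular function",
               "IEA", "placeholder prediction"),
]


def experimental(annotations):
    """Keep only annotations backed by an experimental evidence code."""
    return [a for a in annotations if a.evidence_code in EXPERIMENTAL_CODES]


def genes_with_experimental_function(annotations):
    """Return the set of genes with at least one experimental annotation."""
    return {a.gene for a in experimental(annotations)}


if __name__ == "__main__":
    print(genes_with_experimental_function(ANNOTATIONS))
```

Counting the genes returned by a filter of this kind is one way a figure such as "genes with experimental evidence" could be derived from an annotation set.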

New online submission form
Autocomplete: just start typing to get a list of matching terms
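How the form matches terms is not described in the slides. As a rough illustration only, prefix-based autocomplete over a sorted term list could look like the sketch below; the function names and the short hand-picked term list are assumptions.

```python
from bisect import bisect_left

# Small hand-picked term list for illustration.
TERMS = [
    "phototropism",
    "cytoplasm",
    "serine/threonine kinase activity",
    "photosynthesis",
    "photoperiodism",
]


def build_index(terms):
    """Sort terms case-insensitively so prefix matches sit next to each other."""
    return sorted((t.lower(), t) for t in terms)


def autocomplete(index, typed, limit=10):
    """Return up to `limit` terms that start with the typed text."""
    prefix = typed.lower()
    if not prefix:
        return []
    # (prefix,) sorts just before any (key, term) whose key starts with prefix.
    start = bisect_left(index, (prefix,))
    matches = []
    for key, term in index[start:]:
        if not key.startswith(prefix):
            break
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


if __name__ == "__main__":
    index = build_index(TERMS)
    print(autocomplete(index, "photo"))
```

Offering only terms that already exist in the vocabulary keeps submissions standardized, which is the property the Gene Ontology is chosen for.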

What is the result of TAIR’s effort to capture gene function?
How many genes have experimental gene function in TAIR?

Genes in TAIR with experimental evidence for biological process, molecular function or cellular component
9342 genes (May )
[Chart: number of genes]

Arabidopsis Gene Function in TAIR
[Chart: genes by year; series: protein coding genes, predicted function, experimental function]

GBrowse_syn
Tool by Sheldon McKay, CSHL
Alignment data from Pedro Pattyn, Van de Peer lab, U. of Ghent

GBrowse_syn
[Screenshot: A. lyrata, A. thaliana, poplar]

NBrowse
Tool by H.-L. Kao, F. Piano, M. Schuman, M. Gibson, Kris Gunsalus, NYU
Interaction datasets curated by TAIR, BioGRID and IntAct

Arabidopsis lyrata
– Genes have been loaded
– Working on adding some gene function information and improving searching

Central registry for Gene Symbols

Helpdesk

RSS news feed

TAIR Facebook Page

TAIR Twitter Feed

TAIR Staff
Gene Function/GO: Tanya Berardini, Donghui Li
Genome Annotation: David Swarbreck, Philippe Lamesch, Rajkumar Sasidharan
Tech Team: Bob Muller, Larry Ploetz, Chris Wilks (50%), ?, Cynthia Lee, Shanker Singh

TAIR Sponsors: Funding Agencies: Host Institution: Partner: