New data and tools at TAIR (The Arabidopsis Information Resource)
Overview of TAIR
– Genome release: RNA-seq, proteomic data, corrections
– Gene function: published papers, journal collaborations, direct submission
– Other data: markers, ecotypes, gene symbols, new genomes
– New tools
– Researchers: reached directly (TAIR pages) and via other databases
TAIR10 Genome Release
No assembly updates
Will incorporate:
– 200M Ecker and Mockler RNA-seq reads
– Additional proteomics data
– Individual gene structure corrections sent to us
Mapping and Assembly
1. Mapping
– RNA-seq sequences (TopHat (C. Trapnell), Supersplat (T.C. Mockler))
– Peptides (6-frame translation, spliced exon graph)
2. Assembly approaches
– Augustus (M. Stanke)
  o Uses spliced RNA-seq reads, peptides
  o Aim: Identify additional splice-variants, update existing genes
– TAU (T.C. Mockler)
  o Uses spliced RNA-seq reads
  o Aim: Identify additional splice-variants
– Cufflinks (C. Trapnell)
  o Uses spliced and unspliced RNA-seq data
  o Aim: Identify novel genes
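The peptide mapping step mentions 6-frame translation. The slides do not show how it was implemented, so the following is only a minimal sketch of the general technique: translate a genomic sequence in all three reading frames on both strands so that peptides can be searched against the result. The function names (`reverse_complement`, `translate`, `six_frame_translation`) are hypothetical and chosen for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Map frame labels (+1..+3, -1..-3) to protein strings ('*' marks a stop)."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

A peptide identified by mass spectrometry could then be located with a plain substring search in each frame. Peptides that span an intron will not appear in any single frame, which is presumably why the slide also lists a spliced exon graph.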
Preliminary Results
Augustus/TAU/Cufflinks predicted models are classified into categories:
– Novel genes: 21
– Updated genes: 812
– Splice-variants: 2134
– B-list: 1586
– Rejects: 2318
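To put the category counts above in proportion, they can be tallied as follows. This is a small sketch that assumes the five categories are mutually exclusive, which the slide does not state; the variable names are illustrative.

```python
# Counts as reported on the Preliminary Results slide.
counts = {
    "Novel genes": 21,
    "Updated genes": 812,
    "Splice-variants": 2134,
    "B-list": 1586,
    "Rejects": 2318,
}

# Assumption: categories do not overlap, so the total is a simple sum.
total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

if __name__ == "__main__":
    print("Total predicted models:", total)
    for name, pct in shares.items():
        print(f"{name}: {counts[name]} ({pct}%)")
```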
TAIR10 Genome Release
Release expected in August 2010
Experimentally Verified Gene Function
Where does it come from?
– From research articles read by TAIR curators
– From TAIR’s collaboration with journals
– From direct submissions by researchers to TAIR
Literature Curation
How?
– Papers are prioritized according to novelty of gene function results
– Highest priority papers are read and gene function is extracted
Why?
– A lot of high-quality experimental gene function information is only available in the form of articles
How many?
– About 1/3 of all new articles containing gene function data are curated at TAIR each year
Journal Collaboration
How?
– Author instructions, Excel sheet or online form
Why?
– To capture a larger fraction of gene function data
– Because publication is the right time to get the data into TAIR
What journals?
– Plant Physiology (2008)
– The Plant Journal (2009)
– 2010:
  o Journal of Integrative Plant Biology
  o Journal of Experimental Botany
  o Plant Science
  o Environmental Botany
  o Plant Physiology and Biochemistry
  o Plant, Cell and Environment
Direct Submission of Gene Function
How?
– Excel sheet or online form
Why?
– To capture more data with a small curation team
– Because researchers are the experts on the genes they study
New online submission form
Why Gene Ontology?
– Standardization allows comparison across experiments and species
– Hierarchical structure allows high-level categorization
– Well-structured ontology framework facilitates computational analysis
– Annotations are attached to a data source (peer-reviewed published research)
– Experimental evidence can be distinguished from predictions
Example Gene Ontology annotations
3 GO flavors: biological process, cellular component, molecular function

Gene   GO term                              Evidence          Reference            GO flavor
Phot1  Phototropism                         Mutant phenotype  Huala et al 1997     Biological process
Phot1  Cytoplasm                            Direct assay      Sakamoto et al 2002  Cellular component
Phot1  Serine / threonine kinase activity   Direct assay      Christie et al 1998  Molecular function
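The annotations above can be pictured as simple records, which makes it easy to see how experimental evidence is separated from predictions. This is a hedged sketch, not TAIR's data model: the `Annotation` class and helper functions are hypothetical, and mapping the slide's "Mutant phenotype" and "Direct assay" to the standard GO evidence codes IMP and IDA is an assumption. The `GeneX` row is invented purely to show a computational prediction (IEA) being filtered out.

```python
from dataclasses import dataclass

# Standard GO evidence codes for experimental evidence.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    gene: str
    term: str
    aspect: str      # one of the three GO flavors
    evidence: str    # GO evidence code
    reference: str


ANNOTATIONS = [
    Annotation("Phot1", "Phototropism", "biological process",
               "IMP", "Huala et al 1997"),
    Annotation("Phot1", "Cytoplasm", "cellular component",
               "IDA", "Sakamoto et al 2002"),
    Annotation("Phot1", "Serine / threonine kinase activity",
               "molecular function", "IDA", "Christie et al 1998"),
    # Hypothetical row, not from the slides: an electronic prediction.
    Annotation("GeneX", "Kinase activity", "molecular function",
               "IEA", "hypothetical"),
]


def experimental(annotations):
    """Keep only annotations backed by an experimental evidence code."""
    return [a for a in annotations if a.evidence in EXPERIMENTAL_CODES]


def genes_with_experimental_function(annotations):
    """Return the set of genes with at least one experimental annotation."""
    return {a.gene for a in experimental(annotations)}


if __name__ == "__main__":
    for a in experimental(ANNOTATIONS):
        print(a.gene, a.term, a.aspect, a.evidence, a.reference, sep=" | ")
```

Counting the result of `genes_with_experimental_function` over a full annotation set is one way a figure such as "genes with experimental evidence" could be derived.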
New online submission form
– Autocomplete (just start typing to get a list of matching terms)
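The autocomplete behavior can be sketched as a case-insensitive prefix lookup over a sorted list of terms. How the TAIR form actually implements it is not described in the slides; the functions `build_index` and `autocomplete` and the short term list are assumptions made for illustration.

```python
from bisect import bisect_left


def build_index(terms):
    """Sort terms by their lowercase form so prefixes form a contiguous run."""
    return sorted((term.lower(), term) for term in terms)


def autocomplete(index, prefix, limit=10):
    """Return up to `limit` terms starting with `prefix`, ignoring case."""
    key = prefix.strip().lower()
    if not key:
        return []
    matches = []
    # Binary search for the first entry that is >= the prefix.
    for lowered, original in index[bisect_left(index, (key,)):]:
        if not lowered.startswith(key) or len(matches) >= limit:
            break
        matches.append(original)
    return matches


# Illustrative terms only; a real form would load the full ontology.
INDEX = build_index([
    "phototropism",
    "photosynthesis",
    "photomorphogenesis",
    "cytoplasm",
    "protein serine/threonine kinase activity",
])

if __name__ == "__main__":
    print(autocomplete(INDEX, "photo"))
```

Sorting once and using binary search keeps each lookup cheap as the user types, which matters when the vocabulary contains tens of thousands of terms.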
What is the result of TAIR’s effort to capture gene function?
How many genes have experimental gene function in TAIR?
Number of genes in TAIR with experimental evidence for biological process, molecular function or cellular component: 9342 genes (May )
Arabidopsis Gene Function in TAIR
[Chart: number of genes by year; series: protein coding genes, predicted function, experimental function]
GBrowse_syn
– Tool by Sheldon McKay, CSHL
– Alignment data from Pedro Pattyn, Van de Peer lab, U. of Ghent
GBrowse_syn
[Screenshot: A. lyrata, A. thaliana, poplar]
NBrowse
– Tool by H.-L. Kao, F. Piano, M. Schuman, M. Gibson, Kris Gunsalus, NYU
– Interaction datasets curated by TAIR, BioGRID and IntAct
Arabidopsis lyrata
– Genes have been loaded
– Working on adding some gene function information and improving searching
Central registry for Gene Symbols
Helpdesk
RSS news feed
TAIR Facebook Page
TAIR Twitter Feed
TAIR Staff
– Gene Function/GO: Tanya Berardini, Donghui Li
– Genome Annotation: David Swarbreck, Philippe Lamesch, Rajkumar Sasidharan
– Tech Team: Bob Muller, Larry Ploetz, Chris Wilks (50%), ?, Cynthia Lee, Shanker Singh
TAIR Sponsors: Funding Agencies: Host Institution: Partner: