Issues in Managing and Disseminating Changing Information in Biology Sue Rhee Carnegie Institution Department of Plant Biology.

Presentation transcript:

Issues in Managing and Disseminating Changing Information in Biology Sue Rhee Carnegie Institution Department of Plant Biology Stanford, CA

Information Dissemination Media in Biology
Journals: ~150 years; peer-reviewed; highly referenced; limited size; static
Public Repositories: ~20 years; minimum review; minimum reference; unlimited size; static
Community Databases: ~5 years; curator-reviewed; moderately referenced; unlimited size; dynamic

TAIR: the Arabidopsis Information Resource
- A community database about Arabidopsis information
- Researchers can search, download, and analyze data via commonly used web browsers and FTP
- NSF-funded project ( )
- Collaboration between Carnegie (Stanford, CA), NCGR (Santa Fe, NM) and ABRC (Columbus, OH)

Who are the users?
People, Groups, Organism of Interest
Total: 12,300 individuals and 4,700 labs working on plant research

Usage Statistics
Monthly:
- ~5 million files served
- ~900,000 page views
- ~29,000 IP addresses
- ~30 Gb served

What do we do?
1. Capture data generated by large genome projects and individual researchers
   - Read and extract information from the literature; establish contact with large-scale project groups
2. Curate and analyze the information
   - Error checking, making associations, synthesizing summaries, and adding quality-control filters through a series of standard operating procedures and analysis pipelines
3. Make information accessible to users in an intuitive form
   - In-house biologists and user feedback from surveys and workshops
4. Develop data query, analysis, curation, and visualization tools
   - Collaboration between software developers and biologists; an iterative process
5. Communicate with the users
   - Data submission, suggestions, error and other problem reports

What is PubSearch?
A web application and database for literature curation
- Stores complete literature information: references, abstracts, full-text articles (PDF)
- Stores biological information: genes, proteins, descriptions
- Stores ontologies (GO terms)
- Links literature, GO terms and biological information
- Assists manual curation with fast, automatic matching (using suffix tree indices)
- Is password-protected, and easy to set up and use
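The automatic matching step can be illustrated with a small sketch. This is not PubSearch's code: the slides only say that suffix tree indices are used, and the sketch swaps in a suffix array (a simpler relative of the suffix tree) to show the same idea of indexing the text once and then looking up many terms quickly. The function names and the sample abstract are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start offsets of all suffixes of `text`, sorted lexicographically.

    Naive construction, fine for short abstracts; real indexers use faster algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, suffix_array: list[int], term: str) -> list[int]:
    """Return the sorted offsets at which `term` occurs in `text`."""
    if not term:
        return []
    # Binary search for the first suffix whose leading characters are >= term.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + len(term)] < term:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes that begin with `term` are contiguous from that position.
    hits = []
    while lo < len(suffix_array) and text.startswith(term, suffix_array[lo]):
        hits.append(suffix_array[lo])
        lo += 1
    return sorted(hits)


# Hypothetical abstract text and gene names, lower-cased before matching.
abstract = "ap2 regulates flowering; ap2 mutants flower late"
index = build_suffix_array(abstract)
for gene in ("ap2", "lfy"):
    print(gene, find_occurrences(abstract, index, gene))
```

Candidate hits found this way would still need curator validation, as the installation statistics for validated hits suggest.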

PubSearch System Architecture

TAIR Installation Statistics (9/12/03)
- 20,272 literature references
- 14,920 research papers with abstracts
- 8,642 full-text papers (58%)
- 16,956 controlled vocabulary terms
- 105,671 hits between terms and articles (2,359 terms)
- 38,010 gene names
- 29,841 hits between genes and articles (4,268 genes)
- 14,943 hits validated (70% valid, 29% not valid, 0.5% maybe)
- 11,497 manual annotations to 5,981 genes from 2,113 articles
- 38 relationship types for gene2term and gene2gene
- 103 evidence types

Pub* Tools Website:

TAIR Data Size
(Type of info stored: size in 1999; size in 2003)
- Website (general information, help, external sites): 0.7 Gb; 25 Gb
- Database (data, external links, definition of database fields): 3 Gb; 20 Gb
- FTP directory (large datasets generated from database or external sites): N.D.; 13 Gb
- DVD archive (microarray raw data): 0; 1.6 Gb

Current Issues in Community Databases
1. How to maximize connection with public repositories and journals?
2. How to ensure information is up-to-date?
3. How to cross-reference all the information in independent sites?
4. What happens after the funding?

Overlap and Interconnection Between Existing Media Journals Public Repositories Community Databases

Making Connections with Public Repositories
1. Utilizing existing standards
   A. LinkOut
   Steps:
      a. Data capture includes the GenBank accession (e.g. a seed stock containing an insertion and the insert-site sequence with its GenBank accession)
      b. Data are downloaded from GenBank by accession using E-utilities
      c. Data curation/analysis generates additional associations (e.g. the insertion site is used to identify the associated gene and a polymorphism for that gene)
      d. Sequence-associated information is sent back to GenBank using the LinkOut XML format
2. Collaborating to make new standards
   A. Plant microarray submission standards with ArrayExpress
   B. MIAME standards for microarrays
   Steps:
      a. Researchers submit microarray data in prefilled Excel sheets
      b. Excel is converted into XML and loaded into the TAIR database
      c. Data curation/analysis generates additional associations (e.g. usage of controlled vocabularies)
      d. Data are exported into XML and sent to ArrayExpress
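Two of the steps above (downloading a record by accession with E-utilities, and converting spreadsheet rows into XML) can be sketched as follows. This is a minimal illustration under stated assumptions, not TAIR's pipeline: the `nuccore` database name and `gb` return type are today's E-utilities values rather than anything named in the slides, the accession `X00000` is a made-up placeholder, and the XML element names are hypothetical and do not follow the LinkOut or ArrayExpress schemas.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities EFetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build an EFetch URL that requests one nucleotide record as GenBank flat text."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def rows_to_xml(rows: list[dict[str, str]]) -> str:
    """Serialize spreadsheet rows (column name -> cell value) as a small XML document.

    The <submission>/<sample> element names are hypothetical; column names are
    used as element names, so they must be valid XML names (no spaces).
    """
    root = ET.Element("submission")
    for row in rows:
        sample = ET.SubElement(root, "sample")
        for column, value in row.items():
            ET.SubElement(sample, column).text = value
    return ET.tostring(root, encoding="unicode")


# "X00000" is a placeholder accession, not a real record.
print(efetch_url("X00000"))
print(rows_to_xml([{"tissue": "leaf", "ecotype": "Col-0"}]))
```

The sketch only builds the request URL; actually fetching the record (for example with `urllib.request.urlopen`) would also need error handling and respect for NCBI's usage guidelines.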

Making Connections with Journals
1. Publication requirement to adhere to existing standards
   A. Stock accessions
   B. Gene symbol registry (currently under discussion)
2. Data sharing
   A. Image data for gene expression
   B. Supplementary data (e.g. microarray results)
3. Resource sharing
   A. Publication through community databases?

Keeping Information Up-To-Date
1. In-house curation
   - Pro: experience and standard operating procedures can ensure consistency
   - Con: becoming difficult to keep up as the amount and complexity of information increases
2. Community involvement
   - Pro: expertise and sheer number of the community
   - Con: has not worked successfully (no incentive in the current academic reward structure; not considered to be a typical role of a scientist)
3. Others?

Impact Factor of Top Journals

Impact Factor of Top Databases?

Impact of TAIR

Current Issues in Community Databases
1. How to maximize connection with public repositories and journals?
2. How to ensure information is up-to-date?
3. How to cross-reference all the information in independent sites?
4. What happens after the funding?

The End

People Involved
TAIR-Carnegie: Tanya Berardini, Marga Garcia-Hernandez, Eva Huala, Suparna Mundodi, Leonore Reiser, Julie Tacklind, Iris Xu, Danny Yoo, Peifen Zhang, Nick Moseyko, Brandon Zoekler, Jessie Zhang
TAIR-NCGR: Dan Weems, Neil Miller, Mary Montoya
ABRC: Randy Scholl, Debbie Crist, Emma Knee, Luz Rivero

Information Dissemination Media in Biology
1. Scientific Journals
   - Traditional medium of knowledge dissemination
   - Long history of publishing
   - Recently have moved to electronic publishing
2. Public Repositories
   - Permanent operations for electronic storage and dissemination of basic data
   - Shorter history than journals, about 20 years
   - A good example is NCBI's GenBank
3. Community Databases
   - Information resources that are created, maintained, and improved by the research community
   - Funded by governments, not permanent
   - A few large databases share a similar history with public repositories
   - Recently there has been a radiation of community databases

What is the infrastructure?
- Web browser applications
- TAIR DB
- Data object layer
- Application Program Interface
- Analysis cluster
- FTP directory
- DVD archive
- Software development, curation, testing, and staging environments