Efforts to Link Ecological Metadata with Bacterial Gene Sequences at the Sapelo Island Microbial Observatory Wade M. Sheldon Mary Ann Moran James T. Hollibaugh.

Genetic Sequence Databases
- Major informatics success story
- Large repositories for nucleotide sequences (e.g., GenBank/EMBL/DDBJ, ~16M sequences)
- Automated, web-based data submission, required as part of the publication process
- Standardized alignment/search tools support their use for classification
- Numerous ‘environmental sequences’: ecologists now use them to study biogeography, community structure, and eco-physiology

Problems with GenBank
- Metadata voluntary and limited in scope:
  - Title (DEFINITION line), authors, keywords, comments, literature citation
- Many sequences unpublished and undescribed
- Quality control standards poorly enforced
- No direct way to link to ancillary data (URLs not officially supported and often removed)
- Very inefficient, and often impossible, for investigators to obtain ecological context information, even from journal articles
- Comparisons of matched taxa by traits not possible
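To make the scope problem concrete, the sketch below parses the top-level header fields of a GenBank-style flat-file record. The record itself (accession `XX000001`, clone name, date) is hypothetical, and the parser assumes the flat-file convention that header keywords occupy columns 1-12 with data starting at column 13. The point is what the output lacks: there is no field for habitat, depth, salinity, or sampling conditions, so any ecological context can only appear as free text.

```python
# A hypothetical, abbreviated GenBank-style header (illustration only).
RECORD = """\
LOCUS       XX000001                1450 bp    DNA     linear   ENV 01-JAN-2004
DEFINITION  Uncultured bacterium clone X1 16S ribosomal RNA gene, partial
            sequence.
ACCESSION   XX000001
KEYWORDS    .
SOURCE      uncultured bacterium
"""


def header_fields(text):
    """Collect top-level header fields, joining continuation lines.

    Assumes keywords sit in columns 1-12 and values start at column 13;
    a line whose first 12 columns are blank continues the previous field.
    """
    fields = {}
    key = None
    for line in text.splitlines():
        if line[:12].strip():              # a new keyword starts here
            key = line[:12].strip()
            fields[key] = line[12:].strip()
        elif key is not None:              # continuation of the previous field
            fields[key] += " " + line[12:].strip()
    return fields


fields = header_fields(RECORD)
for name in sorted(fields):
    print(f"{name:<12}{fields[name]}")
# Nothing here carries habitat, depth, salinity, or collection conditions.
```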

Consequence
- Tremendous amount of bacterial sequence data relevant to microbial ecologists
- No established interface linking these sequences to their ecological context

Example – Insufficient Metadata

Sapelo Island Microbial Observatory (…)
- MObs: NSF-funded network of sites, or "microbial observatories," established to discover novel microorganisms, microbial consortia, communities, activities, and other novel properties, and to study their roles in diverse environments
- Supported projects are expected to establish a new, or participate in an existing, Internet-accessible knowledge network to disseminate the information resulting from these activities
- SIMO: investigating the diversity of prokaryotes, their physiological and genetic characteristics, and their biogeochemical activities in a salt marsh/estuarine ecosystem in the southeastern U.S.
- Knowledge networks:
  - GenBank
  - GCE-LTER Information System
  - SIMO 16S rRNA Database

SIMO 16S rRNA Database
- Purpose: LIMS (laboratory information management system), research tool, data dissemination
- Designed to store sequence data and all supporting SIMO research information
- Hierarchical structure modeled after the research workflow
- Metadata on site geography, sample collection, all methodology, personnel, and ancillary measurements
- Extensive content control and error checking
- Links to information in external databases (RDP II, GenBank, GCE-LTER)
- Queries by phylogenetic and/or ecological characteristics
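A hierarchy modeled on the research workflow can be sketched as a chain of tables in which each level references its parent. The sketch below uses SQLite; the four levels (site, sample, clone library, sequence) and all table and column names are assumptions for illustration, not the actual SIMO schema. Foreign keys provide one form of error checking: a sequence cannot be stored unless its parent library exists.

```python
import sqlite3

# Hypothetical, simplified schema: site -> sample -> library -> sequence.
SCHEMA = """
CREATE TABLE site     (site_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       latitude REAL, longitude REAL);
CREATE TABLE sample   (sample_id INTEGER PRIMARY KEY,
                       site_id INTEGER NOT NULL REFERENCES site(site_id),
                       collected TEXT NOT NULL);
CREATE TABLE library  (library_id INTEGER PRIMARY KEY,
                       sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
                       primer_set TEXT);
CREATE TABLE sequence (seq_id INTEGER PRIMARY KEY,
                       library_id INTEGER NOT NULL REFERENCES library(library_id),
                       clone TEXT NOT NULL, bases TEXT NOT NULL);
"""

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript(SCHEMA)


def insert_sequence(con, library_id, clone, bases):
    """Return True if stored, False if the parent library does not exist."""
    try:
        con.execute(
            "INSERT INTO sequence (library_id, clone, bases) VALUES (?, ?, ?)",
            (library_id, clone, bases),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Build one valid chain with hypothetical values, then try an orphan sequence.
con.execute("INSERT INTO site (site_id, name) VALUES (1, 'Site A')")
con.execute("INSERT INTO sample (sample_id, site_id, collected) VALUES (1, 1, '2003-06-01')")
con.execute("INSERT INTO library (library_id, sample_id, primer_set) VALUES (1, 1, 'hypothetical')")
print(insert_sequence(con, 1, "clone-001", "ACGT"))   # True
print(insert_sequence(con, 99, "clone-002", "ACGT"))  # False: no library 99
```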

Conceptual Diagram of the SIMO Database

List-based data entry linked to metadata tables

Controlled vocabulary supports finely targeted queries
Automatic hyperlinks provide links to related tasks
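A controlled vocabulary makes queries precise because every record points to one canonical term instead of storing free text. The sketch below, again in SQLite, keeps habitat terms in a lookup table and rejects any spelling not on the list; the table names and the two terms are hypothetical. Without the check, variants such as "Saltmarsh" would be stored and silently missed by an exact-match query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE habitat (habitat_id INTEGER PRIMARY KEY, term TEXT UNIQUE NOT NULL);
CREATE TABLE sample  (sample_id INTEGER PRIMARY KEY,
                      habitat_id INTEGER NOT NULL REFERENCES habitat(habitat_id));
INSERT INTO habitat (term) VALUES ('salt marsh sediment'), ('estuarine water');
""")


def add_sample(term):
    """Look up the controlled term; unknown spellings are rejected, not stored."""
    row = con.execute("SELECT habitat_id FROM habitat WHERE term = ?", (term,)).fetchone()
    if row is None:
        raise ValueError(f"not in controlled vocabulary: {term!r}")
    con.execute("INSERT INTO sample (habitat_id) VALUES (?)", (row[0],))


def count_by_term(term):
    """Exact-match query on the canonical term."""
    return con.execute(
        "SELECT COUNT(*) FROM sample JOIN habitat USING (habitat_id) WHERE habitat.term = ?",
        (term,),
    ).fetchone()[0]


add_sample("salt marsh sediment")
add_sample("salt marsh sediment")
add_sample("estuarine water")
print(count_by_term("salt marsh sediment"))  # 2
```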

List-based queries also simplify public interface

Phylogenetic and ecological characteristics combined dynamically to create overview and query interface
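Combining the two kinds of characteristics "dynamically" can be read as building the overview at query time from current records rather than storing a summary. The sketch below cross-tabulates hypothetical (phylogenetic group, habitat) pairs, one per sequence, into a count table; the groups, habitats, and counts are invented for illustration only.

```python
from collections import Counter

# Hypothetical (phylogenetic group, habitat) pairs, one per sequence.
records = [
    ("Alphaproteobacteria", "estuarine water"),
    ("Alphaproteobacteria", "estuarine water"),
    ("Bacteroidetes", "estuarine water"),
    ("Deltaproteobacteria", "salt marsh sediment"),
    ("Alphaproteobacteria", "salt marsh sediment"),
]


def overview(records):
    """Cross-tabulate sequence counts: one row per group, one column per habitat."""
    counts = Counter(records)
    groups = sorted({group for group, _ in records})
    habitats = sorted({habitat for _, habitat in records})
    # Counter returns 0 for combinations that never occur.
    rows = [[group] + [counts[(group, h)] for h in habitats] for group in groups]
    return habitats, rows


habitats, rows = overview(records)
print("group", *habitats, sep="\t")
for row in rows:
    print(*row, sep="\t")
```

Because the table is recomputed from the records on each request, new sequences or newly classified groups appear without any change to the overview code.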

SIMO Metadata
- Metadata primarily stored in managed lists, linked to records by foreign key fields
- Scalable design: details can be added independently without altering data records
- Complete metadata for each sequence generated by relational joins
- Links to external metadata in the GCE-LTER database add site geography, research history, and long-term environmental characteristics
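The managed-list design can be illustrated with a small SQLite example: sequence rows hold only foreign keys, the descriptive detail lives once in list tables, and a join assembles the complete metadata on demand. The tables, columns, and values below are hypothetical. Note that the method description is filled in after the sequences exist, and every joined row reflects it without any sequence record being edited.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE method   (method_id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE person   (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sequence (seq_id INTEGER PRIMARY KEY, clone TEXT,
                       method_id INTEGER REFERENCES method(method_id),
                       person_id INTEGER REFERENCES person(person_id));
INSERT INTO method VALUES (1, 'PCR-16S', NULL);
INSERT INTO person VALUES (1, 'Analyst A');
INSERT INTO sequence VALUES (1, 'clone-001', 1, 1), (2, 'clone-002', 1, 1);
""")

# Complete metadata for each sequence, assembled by joins at query time.
FULL_METADATA = """
SELECT s.clone, m.name, m.description, p.name
FROM sequence AS s
JOIN method AS m ON m.method_id = s.method_id
JOIN person AS p ON p.person_id = s.person_id
ORDER BY s.clone
"""

# Detail added once to the managed list; no sequence row is touched.
con.execute("UPDATE method SET description = ? WHERE method_id = 1",
            ("hypothetical primer/cycling details",))

rows = con.execute(FULL_METADATA).fetchall()
for row in rows:
    print(row)
```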

Metadata Standards
- No existing standard for environmental sequence metadata
- Sequence formats (FASTA, BioML, BSML) designed for data parsing and sequence annotation
- SIMO metadata currently displayed in summary form on sequence detail pages
- Exploring adoption of emerging standards such as EML (Ecological Metadata Language)
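FASTA illustrates why sequence formats do not solve the metadata problem: each record has a single header line, and everything after the identifier is unstructured text. The minimal reader below (records and clone names hypothetical) splits each header into an identifier and a free-text description. A parser can recover "salt marsh sediment" as a string, but nothing in the format says that it is a habitat, which is why structured metadata has to be carried separately.

```python
def read_fasta(text):
    """Yield (identifier, description, sequence) tuples from FASTA-formatted text."""
    ident, desc, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:
                yield ident, desc, "".join(chunks)
            header = line[1:].split(None, 1) or [""]
            ident = header[0]
            desc = header[1] if len(header) > 1 else ""  # free text, no structure
            chunks = []
        else:
            chunks.append(line)  # sequence lines may wrap
    if ident is not None:
        yield ident, desc, "".join(chunks)


# Hypothetical batch file: one record with a description, one without.
BATCH = """>clone-001 salt marsh sediment
ACGTACGT
ACGT
>clone-002
GGCC
"""

records = list(read_fasta(BATCH))
for record in records:
    print(record)
```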

Sequence Details

Future Directions
- Incorporating batch upload features for library submissions
- Integrating the database with ‘RDP SeqMatch Agent’ programs for automatic phylogenetic analysis and sequence annotation
- Providing full metadata in formatted/printable and parsable ASCII (XML) formats
- Participating in Entrez LinkOut to provide links from GenBank to SIMO sequence entries
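A parsable XML rendering of full sequence metadata could look roughly like the sketch below, which serializes a nested dictionary with Python's standard `xml.etree.ElementTree`. The element names (`sequenceMetadata`, `site`, `sample`, `method`) and all values are hypothetical placeholders, not EML and not an existing SIMO export format.

```python
import xml.etree.ElementTree as ET


def metadata_to_xml(record):
    """Serialize one sequence's metadata as XML (element names are hypothetical)."""
    root = ET.Element("sequenceMetadata", id=record["clone"])
    for section in ("site", "sample", "method"):
        elem = ET.SubElement(root, section)
        for key, value in record[section].items():
            ET.SubElement(elem, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical metadata, as it might be assembled by relational joins.
record = {
    "clone": "clone-001",
    "site": {"name": "Site A"},
    "sample": {"collected": "2003-06-01", "depth_m": 0.5},
    "method": {"name": "PCR-16S"},
}

xml_text = metadata_to_xml(record)
print(xml_text)
```

Producing XML from the same joined records that drive the web pages keeps the printable and machine-readable views consistent.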