USGS Bioinformatics Activities. Ecoinformatics, January 2010. Gladys Cotter, Mike Frame.


1 USGS Bioinformatics Activities
Ecoinformatics, January 2010
Gladys Cotter, Mike Frame

2 Topics for Discussion
1. USGS Bioinformatics Activities
2. Potential areas of collaboration
3. Questions

3 Bioinformatics
USGS NBII – addressing bioinformatics challenges through collaboration, content development, technology, and creating long-term infrastructure
Diagram labels: Tools Protocols Standards Collecting Cross-referencing Relationship of data Linking DBMS Central & Distributed Security Backups Archival Standards Storage Structure Governance Standards Policies Organization Multi-levels Difficult Mashups Standards Integration Tools Standards Usability Training Non-biased Analysis Synthesis Tools Governance Infrastructure User analysis Delivery Tools Protocols Standards Applications for Fusion Blending Related Integration Analysis Models Research Decision Making Policies Education Outreach Sustainable Reliable Outreach Training

4 Biological Spatial Infrastructure NBII
• Over 72,000 records
• Based on FGDC BDP
• Training Program
• QA/QC Program
• Standards Cross-walks
  – EML
  – Dublin Core
• Establishing Administrative Tools
• Expanding internationally
• Embedding in-line visualization
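
A standards crosswalk like the one named on this slide can be sketched as a lookup from one standard's elements to another's. The mapping below is a minimal illustration, not NBII's actual crosswalk table: the Dublin Core element names and EML paths are real, but the pairing, the `DC_TO_EML` table, and the `crosswalk` function are assumptions made for this example.

```python
# Illustrative Dublin Core -> EML crosswalk.
# Hypothetical mapping for demonstration only; not NBII's actual crosswalk.
DC_TO_EML = {
    "title": "dataset/title",
    "creator": "dataset/creator",
    "description": "dataset/abstract",
    "subject": "dataset/keywordSet/keyword",
}


def crosswalk(dc_record):
    """Split a Dublin Core record into mapped EML fields and unmapped leftovers."""
    mapped, unmapped = {}, {}
    for element, value in dc_record.items():
        target = DC_TO_EML.get(element)
        if target is None:
            # Keep elements with no EML counterpart so nothing is silently lost.
            unmapped[element] = value
        else:
            mapped[target] = value
    return mapped, unmapped


record = {
    "title": "Breeding bird counts",
    "creator": "Example Observer",
    "format": "text/csv",
}
mapped, unmapped = crosswalk(record)
```

Returning the unmapped elements separately makes gaps in the crosswalk visible, which is usually the point of a QA/QC pass over converted metadata.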

5 World Data Center for Biodiversity & Ecology
• World Data System created through the International Council of Scientific Unions (ICSU) in 1957
• Currently 50 World Data Centers (WDC) in place internationally
• USGS National Biological Information Infrastructure (NBII) network designated as the WDC for Biodiversity & Ecology in 2002

6 WDC Current Activities
• Renewable Energy Project Prequalification Demonstration project
  – Goal: support rapid prequalification of sites across the nation that are potentially suitable for renewable energy (with an initial focus on federal lands).
  – Data sets include, but are not limited to: Land Cover (GAP), Protected areas/Stewardship (GAP), Species Distributions/Habitat Affinities (GAP), Species Occurrences (US-GBIF Mirror Site and NBII), Integrated Taxonomic Information System (ITIS), Topography (USGS), Landforms (USGS/GAM), Soil Moisture (USGS/GAM), Ecosystems (USGS/GAM), Renewable Energy Potential (i.e., wind, solar, geothermal, and biofuels; NREL), and Infrastructure (i.e., power grid, projected smart grid, and roads; NREL and USGS).
• Protected areas – working with WDPA, USGS GAP
• Sponsoring WDC for Biodiversity & Human Health
  – South Africa is hosting
  – Providing workshops, training, demonstration projects
  – Evaluating how to leverage ILTER activities

7 Multilingual IABIN Catalog
• Ability to search by:
  – IABIN TN
  – Map interface
  – Resource Type
  – Language
  – Taxonomy
• Multi-lingual thesaurus
• Thesaurus web-services
• English, Spanish, Portuguese

8 NBII Search
Unique Facets: Dynamic biological clusters; Refine Results; Biological images; Map Display

9 Additional Unique Facets
• Thesaurus integration
• Publisher refinement
• Diverse Sources: DBMS, Websites, Federation, Documents
• Weighting of sources

10 Integrated Taxonomic Information System
• Multi-agency partnership
• Primarily North America Taxa
• Used Globally
• Web-services released Summer 2009
• Taxonomic Workbench 2010

11 NBII Species Mashups
• Designed for
  – One-stop-shop for species information in SE
  – Integrate diverse sources
• Content Type
• UI Presentation

12 USGS Data Integration
3 Major Goals:
1. Establishing corporate data available via ESRI services
2. Improving access to modeling data, including water quality, stream, etc.
3. Providing easy-to-use “data upload”, “registry”, and “discovery tools”

13 North American EOL
Multi-agency partnership designed to develop a prototype for “species information” within the Great Lakes and Chesapeake Bay regions

14 NSF DataNet Grant Background
• NSF solicitation to establish
  – Long-term archives for science data
  – Develop sustainable business model to support these activities
  – Involve multi-disciplinary domains
  – Develop various R&D needed to support effort
  – Provide ongoing “operational” support
• Funded 2: DataONE and The Data Conservancy

15 DataONE Areas of emphasis
• Data loss: preserving all the work that has been done; by preserving at-risk (orphaned) biological, ecological, and environmental data from individual scientists
• Data dispersion: finding the needle in the haystack; by facilitating discovery of and access to data through a single easy-to-use portal
• Data deluge: navigating the flood of increasingly heterogeneous data; by providing a toolbox that empowers scientists and organizations to more easily and effectively manage, analyze, and synthesize data
• Data practice: using the best tools to do the job; by creating an informatics-literate workforce through innovative outreach and training efforts (e.g., best-practice videos, podcasts, on-line certificate programs, downloadable best practice guides and exemplars of data management plans)

16 DataONE Technology Directions
DataONE will enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it by:
  – making the scientist an active member of the data preservation process,
  – creating cyberinfrastructure that supports the full data life cycle,
  – promulgating cultural changes that value data stewardship and data sharing,
  – broadly promoting best practices,
  – engaging citizens in science,
  – domain-agnostic solutions

17 Partnering organizations
• Libraries & digital libraries
• Academic institutions
• Research networks
• NSF- and government-funded synthesis & supercomputer centers/networks
• Governmental organizations
• International organizations
• Data and metadata archives
• Professional societies
• NGOs
• Commercial sector

18 Why is this relevant to Ecoinformatics?
• Share similar cyberinfrastructure needs
• Architecture
• Portals
• Distributed approaches
• Replication
• Secure, controlled access
• Authentication methods
• Tools deployed and supported
• Data discovery & interoperability methods
• Standards developed, deployed
• Life Cycle Data Management tools (e.g., Investigator toolkit, CI)
• R&D activities in the areas of CS, IS, SS, GIS, Env., etc.
• Opportunity for broad Governmental & International Participation (e.g., working groups, tool evaluations)
• Complementary to several of our groups' goals, projects, activities
• Potential Microsoft-related projects (e.g., MS Excel)

19 Potential areas of collaboration
• NBII Metadata Expansion
• Incorporation of additional species data into NA EOL, NBII Species Mashups, etc.
• USGS Data Integration activities
• NSF DataONE Grant
• Potential Microsoft tools

20 Questions & Comments
Mike Frame, mike_frame@usgs.gov, 865 576-3605
Gladys Cotter, Gladys_cotter@usgs.gov, 703 648-4182

21 Technical Architecture & Discussions DataONE: Enabling Data-Intensive Biological and Environmental Research

22 Existing biological data archives
• ESA’s Ecological Archive
• Long Term Ecological Research Network
• Fire Research & Management Exchange System
• National Biological Information Infrastructure
• Distributed Active Archive Center
• Knowledge Network for Biocomplexity

23 Example data holdings
Metadata Interoperability Across Data Holdings
Columns: Types of Data Managed | Metadata Standard(s) (the Data Archive column did not survive in the transcript)
Biodiversity, taxonomic, ecological | BDP, DwC, DC, OGIS
Biogeochemical dynamics, terrestrial ecological, Earth observation imagery | DIF, BDP, ECHO
Ecological, biodiversity, biophysical, social, genomics, and taxonomic | EML
Avian populations and molecular biology | DwC
Biological and taxonomic | DC subset
Biophysical, biodiversity, disturbance, and Earth observation imagery | EML
Biodiversity, biotic structure, function/process, biogeochemical, climate, and hydrologic | EML
Abbreviations: EML = Ecological Metadata Language; BDP = Biological Data Profile; DwC = Darwin Core; DC = Dublin Core; ECHO = EOS ClearingHOuse; OGIS = OpenGIS; DC subset = Dublin Core subset; DIF = Directory Interchange Format

24 Distributed framework
• Member Nodes
  – diverse institutions
  – serve local community
  – provide resources for managing their data
• Coordinating Nodes
  – retain complete metadata catalog
  – subset of all data
  – perform basic indexing
  – provide network-wide services
  – ensure data availability (preservation)
  – provide replication services
• Flexible, scalable, sustainable network
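
The division of labor between Member Nodes and Coordinating Nodes can be sketched in a few lines. This is a toy model under stated assumptions, not the DataONE API: the class names, the `deposit`, `harvest`, `search`, and `replicate` methods, and the example identifier are all hypothetical and chosen only to mirror the roles listed on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Hypothetical member node: holds data and metadata for a local community."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> (metadata, data)

    def deposit(self, identifier, metadata, data):
        self.objects[identifier] = (metadata, data)


@dataclass
class CoordinatingNode:
    """Hypothetical coordinating node: metadata catalog, indexing, replication."""
    catalog: dict = field(default_factory=dict)    # identifier -> metadata
    locations: dict = field(default_factory=dict)  # identifier -> set of node names

    def harvest(self, node):
        # Copy metadata (not the data itself) into the network-wide catalog.
        for identifier, (metadata, _data) in node.objects.items():
            self.catalog[identifier] = metadata
            self.locations.setdefault(identifier, set()).add(node.name)

    def search(self, keyword):
        # Basic indexing: case-insensitive substring match over metadata values.
        keyword = keyword.lower()
        return sorted(
            identifier
            for identifier, metadata in self.catalog.items()
            if any(keyword in str(value).lower() for value in metadata.values())
        )

    def replicate(self, identifier, source, target):
        # Preservation: copy an object to a second member node and record it.
        metadata, data = source.objects[identifier]
        target.deposit(identifier, metadata, data)
        self.locations.setdefault(identifier, set()).add(target.name)


node_a = MemberNode("A")
node_b = MemberNode("B")
node_a.deposit("example:1", {"title": "Bird counts"}, b"site,count\n1,12\n")

cn = CoordinatingNode()
cn.harvest(node_a)
hits = cn.search("bird")
cn.replicate("example:1", node_a, node_b)
```

Keeping only metadata at the coordinating node, and tracking where copies live, is what lets the catalog stay complete while the data itself remains distributed.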

25 Supporting the data lifecycle
Nodes: UCSB Node, UNM Node, ORC Node
The data lifecycle:
1. Deposition/acquisition/ingest
2. Curation and metadata management
3. Protection, including privacy
4. Discovery, access, use, and dissemination
5. Interoperability, standards, and integration
6. Evaluation, analysis, and visualization

26 Use Cases, Architecture Planning http://mule1.dataone.org/ArchitectureDocs/index.html

27 Changing science culture
1. Education and training
2. Engaging citizens in science
3. Building global communities of practice

28 Education and training
Career Long Learning:
• best practice guides
• exemplary data management plans
• podcasts, web-casts
• workshops and seminars
• downloadable curricula
Example materials: Best Practice Guide “How to Cite Your Data” (6 in a series); Best Practice Guide “Using Metadata for e-research” (5 in a series); Gold Star Data Management Plan “Here’s How”

29 Engaging citizens in science
www.CitizenScience.org

30 Building global long-lived communities of practice:
• Broad, active community engagement
  – Involvement of library and science educators engaging new generations of students in best practices
  – Existing outreach and education programs
• Transparent, participatory governance
• Adoption/creation of innovative and sustainable business and organizational models

31 Organization chart
Chart labels: NSF; DataNet Partners; External Advisory Committee; DIUG; DataONE Office; Executive Director; Principal Investigator; Leadership Team; Director Community Engagement & Outreach; Education and Outreach Team; Director Development & Operations; Core CI Team; R&D; Operations; Coordinating Nodes; Member Nodes
Engagement Working Groups: Sociocultural barriers to data sharing and preservation; Long-term sustainability and governance; Community engagement and education; Citizen science and public outreach; Usability and assessment
Infrastructure and Research Working Groups: Data integration and semantics; Data preservation, metadata, and interoperability; Distributed storage; Federated security; Scientific workflows; Usability and assessment; Exploration, Visualization, Analysis

32 Why is this relevant to Ecoinformatics?
• Share similar cyberinfrastructure needs
• Architecture
• Portals
• Distributed approaches
• Replication
• Secure, controlled access
• Authentication methods
• Tools deployed and supported
• Data discovery & interoperability methods
• Standards developed, deployed
• Life Cycle Data Management tools (e.g., Investigator toolkit, CI)
• R&D activities in the areas of CS, IS, SS, GIS, Env., etc.
• Opportunity for broad Governmental & International Participation (e.g., working groups, tool evaluations)
• Complementary to several of our groups' goals, projects, activities
• Potential Microsoft-related projects (e.g., MS Excel)

33 Thanks!
Management Team: Suzie Allard – UT; John Cobb – ORNL; Bob Cook – ORNL; Patricia Cruse – CDL; Mike Frame – USGS; Stephanie Hampton – UCSB; Viv Hutchison – USGS; Matt Jones – UCSB; Steve Kelling – Cornell; Kathleen Smith – UNC; Carol Tenopir – UT; Bruce Wilson – Joint ORNL – UT
Projects and Funding Sources: DataONE Partners; Virtual Data Center – InterOP; Kepler-CORE Team; SEEK & KNB Teams
Leadership Team: Bill Michener – UNM, PI; Suzie Allard – UT; John Cobb – ORNL; Bob Cook – ORNL; Patricia Cruse – CDL; Mike Frame – USGS; Stephanie Hampton – UCSB; Viv Hutchison – USGS; Matt Jones – UCSB; Steve Kelling – Cornell; Kathleen Smith – Duke; Carol Tenopir – UT; Dave Vieglais – KU, DataONE; Bruce Wilson – Joint ORNL – UT

