http://knb.ecoinformatics.org http://seek.ecoinformatics.org Ecological Informatics: Challenges and Benefits Presentation to ESA Visions Committee March.

Presentation transcript:

Ecological Informatics: Challenges and Benefits
Presentation to ESA Visions Committee, March 31, 2003
Mark Schildhauer, Ph.D., Director of Computing, NCEAS
http://knb.ecoinformatics.org http://seek.ecoinformatics.org

Research Team and Collaborators
PISCO, LTER Network, San Diego Supercomputer Center, Arizona State University, University of Kansas, University of North Carolina, OBFS Network, UC NRS
Sandy Andelman, Chad Berkley, Matthew Brooke, John Harris, Dan Higgins, Matt Jones, Jim Reichman, Mark Schildhauer, Jing Tao

What is Ecoinformatics? Data Acquisition Integration Storage, archiving Distributed Access Results

Ecoinformatics
The Goal: to develop technology tools and services to enable more efficient acquisition, integration, and analysis of ecological data
Specific Challenges
An Approach to Technology Solutions (KNB)
Future Directions: a Science Environment for Ecological Knowledge, SEEK

Status of Ecological Data
Highly dispersed
  Different individuals, organizations, and locations
Extreme heterogeneity in form, content, and meaning
Lack of documentation (metadata)
  Lack of metadata overall
  Many standards in use, many custom types
  Implementations are not modular

Data are Highly Dispersed…
Data are distributed among:
  Independent researcher holdings
  Research station collections
    LTER Network (24 sites)
    Org. of Biological Field Stations (160+ sites)
    Univ. Cal Natural Reserve System (36 sites)
  Agency databases
  Museum databases

Data are physically dispersed… Visitors to NCEAS Field Stations in North America

Data are very heterogeneous…
Population survey, Experimental, Taxonomic survey, Behavioral, Meteorological, Oceanographic, Hydrology, …
Syntax (format), Schema (organization), Semantics (meaning/methods)

Thematic heterogeneity due to Vast Scope of Ecology Biosphere Abiotic Biomes Communities Organisms Genes

Classifying Data Heterogeneity Syntax (format) Schema (organization) Semantics (knowledge/meaning/methods) Add pictures of these things here

Data Lacking in Documentation
Majority of ecological data undocumented
  Lack information on syntax, structure and semantics of data
  Impossible to understand data without contacting the original researchers; even then memories can fail, individuals retire or expire
Documentation conventions vary widely
  Requires large time investment to understand each data set

Summary of Technical Challenges
Because of:
  Data dispersion
  Data heterogeneity
  Lack of documentation
Integration and synthesis are limited to a manual process, which makes it difficult to scale integration efforts up to large numbers of data sets

Solutions Standardized measurements Changes needed in culture, training Technology development- metadata, data servers, desktop tools

Ecoinformatics Research Objectives
Enhance access to ecological and environmental data
  Promote data sharing & re-use
  Enable national data discovery
  Provide access to research stations’ data resources
  Maintain local autonomy for data management
Synthesis and Analysis
  Promote cross-cutting analysis
  Taxonomic, Spatial, Temporal, Conceptual integration of data
Data preservation
  Long term data description
  Provide archiving capabilities

Functional breakdown for Analysis Data discovery Data access Data storage/archive Data interpretation Quality assessment Data Conversion & Integration Analysis & Modeling Visualization

KNB Development Projects (Knowledge Network for Biocomplexity)
Ecological Metadata Language (EML): prospective standard for ecological metadata
Metacat: a freely available database for storing metadata
Morpho: a freely available tool for creating metadata

KNB Overview
[Diagram: Client (Morpho, Web Browser) and Server (Metacat) components exchanging Metadata (EML) and Data over the Web]

KNB Development Projects Ecological Metadata Language (EML) Metacat Morpho

Why the big buzz about Metadata?
Metadata are the basis for the next generation of the Web: “The Semantic Web is a web of data, in some ways like a global database… The driver for the Semantic Web is …metadata” -- Tim Berners-Lee, father of the Web
Digital Library Community: “Era of Metadata 1998-200?” -- Carol Mandel, Digital Librarian

Central Role of Metadata
What are metadata?
  Data documentation
  Ownership, attribution, structure, contents, methods, quality, etc.
Critical for addressing data heterogeneity issues
Critical for developing extensible systems
Critical for long-term data preservation
Allows advanced services to be built

Data – just numbers
  072998  29.5  17.0
  073098  29.7  6.1
  073198  29.1  0
A brief example may serve to illustrate the point. Here, data, consisting of rows and columns of numbers, have little or no information content. On the next slide,

Data + Metadata = numbers + context
            Date    Temp (C)  Precip. (mm)
  Obs. #1   072998  29.5      17.0
  Obs. #2   073098  29.7      6.1
  Obs. #3   073198  29.1      0
A minimal amount of metadata adds some information content to the data. However, unless you were the originator of this particular data set, you would not know where the data were collected, nor would you be able to effectively use or interpret the data.
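The step from bare numbers to labeled observations can be sketched in a few lines of Python. This is only an illustration: the metadata keys (`name`, `unit`, `format`) are hypothetical and are not EML element names, and reading the first column as MMDDYY dates is an assumption about the slide's example.

```python
from datetime import date, datetime

# Raw rows as shown on the slide: no column names, no units.
RAW_ROWS = [
    ("072998", 29.5, 17.0),
    ("073098", 29.7, 6.1),
    ("073198", 29.1, 0.0),
]

# Hypothetical metadata record describing each column.
# Assumption: the first column is a date written as MMDDYY.
ATTRIBUTES = [
    {"name": "date", "unit": None, "format": "%m%d%y"},
    {"name": "temp", "unit": "celsius", "format": None},
    {"name": "precip", "unit": "millimeter", "format": None},
]


def apply_metadata(rows, attributes):
    """Turn anonymous tuples into labeled records using the metadata."""
    records = []
    for row in rows:
        record = {}
        for value, attr in zip(row, attributes):
            if attr["format"] is not None:
                # Parse date strings according to the documented format.
                value = datetime.strptime(value, attr["format"]).date()
            record[attr["name"]] = value
        records.append(record)
    return records


records = apply_metadata(RAW_ROWS, ATTRIBUTES)
for record in records:
    print(record)
```

Even with this much metadata, the records still say nothing about where or how the observations were made, which is the point the slide goes on to make.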

Data Integration → synthesis
[Figure: panels B, C]

Rules of Thumb (Michener 2000)
There are at least four rules of thumb that may prove useful for implementing metadata:
(1) The more comprehensive the metadata, the greater the longevity (and value) of the data. Nevertheless, bear in mind the caveat that the goal of 100% complete metadata that can meet the needs for all conceivable uses of a data set is probably unrealistic and, ultimately, unattainable.
(2) Structured metadata can greatly facilitate data discovery, encourage “best metadata practices” and support data and metadata use by others. For example, the checklist nature of metadata entry programs (e.g., MORPHO) greatly facilitates metadata authoring.
(3) Metadata implementation takes time!!! Build time into the project for metadata authoring by all contributors. Much of the metadata can be directly used in later project reports and methods sections of scientific papers.
(4) Start implementing metadata for new data collection efforts and then prioritize “legacy” and ongoing data sets that are of greatest benefit to the broadest user community. The idea is to start with a data set that is fresh in mind and, presumably, easier to document.

EML 2.0: a formal ecological metadata specification
  eml-resource -- Basic resource info
  eml-dataset -- Data set info
  eml-literature -- Citation info
  eml-software -- Software info
  eml-party -- People and Organizations
  eml-entity -- Data entity (table) info
  eml-attribute -- Attribute (variable) info
  eml-constraint -- Integrity constraints
  eml-physical -- Physical format info
  eml-access -- Access control
  eml-distribution -- Distribution info
  eml-project -- Research project info
  eml-coverage -- Geographic, temporal and taxonomic coverage
  eml-protocol -- Methods and QA/QC
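To give a feel for what a modular, structured metadata record looks like, the sketch below builds a small XML document with Python's standard library. The element names loosely echo a few of the modules listed above (dataset, party, entity, attribute), but they are illustrative only and have not been checked against the EML 2.0 schema; the title, creator, and variables are made-up example values.

```python
import xml.etree.ElementTree as ET


def build_record(title, creator, variables):
    """Build a small, EML-inspired metadata record.

    Element names here are hypothetical and are NOT validated
    against the real EML 2.0 schema.
    """
    root = ET.Element("metadata")
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title

    # People and organizations responsible for the data set.
    party = ET.SubElement(dataset, "party")
    ET.SubElement(party, "name").text = creator

    # One data entity (table) with one attribute per variable.
    entity = ET.SubElement(dataset, "entity")
    for name, unit in variables:
        attribute = ET.SubElement(entity, "attribute", {"name": name})
        ET.SubElement(attribute, "unit").text = unit
    return root


record = build_record(
    "Daily weather observations",       # made-up title
    "Example Researcher",               # made-up creator
    [("temp", "celsius"), ("precip", "millimeter")],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because each part of the record lives in its own element, a tool can read just the attribute descriptions or just the party information without understanding the rest.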

KNB Development Projects Ecological Metadata Language (EML) Metacat Morpho

Metacat – metadata storage
Metadata storage, search, presentation
Schema independent – supports arbitrary XML types
Multiple metadata standards
  Ecological Metadata Language
  NBII Biological Data Profile
Data storage + preservation
Replication
Flexible access control system
National distributed directory service
Strong version control
Configurable web interface (XSLT)

Metacat network
[Diagram: Metacat nodes labeled NRS, OBFS, AND, SEV, NCEAS, CAP LTER, and SDSC. Key: Metacat Catalog, Morpho clients, Web clients, Site metadata system, XML output filter]

Web interface Change this to screen shots of the KNB web interface

KNB Development Projects Ecological Metadata Language (EML) Metacat Morpho

Morpho – Window to the KNB Jones

Morpho Features
Guided metadata creation
  Wizards & editor
  Automatically extract metadata during data import
Search all metadata – structured + free text
Contribute to KNB
Windows, Mac, Linux
Multiple metadata standards
  EML
  NBII Biological Data Profile
  Extensible
Standalone (non-networked) mode

Objectives of the KNB & SEEK
National network for ecological data
  Data discovery
  Data access
  Data interpretation
Enable advanced services
  Quality management
  Data integration through advanced queries
  Visualization and analysis

Solutions
KNB
  Ecological Metadata Language (EML)
  Metacat -- flexible metadata database
  Morpho -- data management for ecologists
SEEK (partners include NCEAS, KU, SDSC, LTER Network Office, CAP, Napier Univ., UVM, UNC)
  Unified Portal to Ecological Data (ECOGRID)
  Quality Assurance engine
  Semantic Query Processor
  Data integration and Analytical Pipelines

SEEK (Science Environment for Ecological Knowledge) – addressing semantic integration
Ontologies
EcoGrid: one-stop access to ecological and environmental data
Semantic Mediation: data integration using logic-based reasoning
Analysis and Modeling Pipelines: analysis workflows using semantic mediation

Quality Assessment
Integrity constraint checking
Data type checking
Metadata completeness
Data entry errors
Outlier detection
Check assertions about data
  e.g., trees don’t shrink
  e.g., sea urchins do
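Two of the checks listed above, assertion checking and outlier detection, can be sketched as follows. This is a minimal illustration, not the SEEK quality assurance engine: the function names are hypothetical, and the outlier rule (a modified z-score based on the median absolute deviation, with a cutoff of 3.5) is one common choice swapped in for the example.

```python
from statistics import median


def check_non_decreasing(values):
    """Return the indices where a value is smaller than the one before it.

    Encodes an assertion such as "trees don't shrink": repeated
    measurements of one tree should never go down.
    """
    return [i for i in range(1, len(values)) if values[i] < values[i - 1]]


def flag_outliers(values, threshold=3.5):
    """Return the indices of values with a large modified z-score.

    Uses the median absolute deviation (MAD), which is less affected
    by the outliers themselves than the mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so nothing can be called an outlier
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Made-up example data: one shrinking tree, one likely data entry error.
diameters_cm = [12.1, 12.4, 12.3, 12.9]
temperatures_c = [29.5, 29.7, 29.1, 92.1, 29.4]

print(check_non_decreasing(diameters_cm))
print(flag_outliers(temperatures_c))
```

Which assertion applies depends on the organism, as the slide notes: the same non-decreasing check would wrongly flag valid sea urchin measurements, so the rule has to come from metadata about what was measured.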

Semantic metadata Describes the relationship between measurements and ecologically relevant concepts Drawn from a controlled vocabulary Ontology for ecological measurements

Representing ontologies OWL –Web Ontology Language CKML – Conceptual Knowledge Markup Language RDF – Resource Description Framework

Ecological Ontologies

Semantic Data Discovery
Knowledge of SQL or database languages is a barrier to data access and re-use
  SELECT dsname FROM dslist WHERE meas_type LIKE 'pop_den' AND location = 'GBNPP' AND common_name = 'barnacles';
Semantic Queries: allow scientists to express data queries in familiar scientific terms
  What data sets contain population density estimates for barnacles in Glacier Bay National Park and Preserve?
Functionality enabled through semantic metadata
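The contrast between the SQL query and the plain-language question can be made concrete with a small sketch. It reuses the table and column names from the slide's query (`dslist`, `dsname`, `meas_type`, `location`, `common_name`), but the vocabulary mapping, the sample rows, and the `find_datasets` function are all hypothetical. A simple lookup table stands in here for the ontology-based reasoning that semantic metadata would make possible.

```python
import sqlite3

# Hypothetical controlled vocabulary: familiar scientific terms mapped to
# the column and code used in the catalog.
CONCEPTS = {
    "population density": ("meas_type", "pop_den"),
    "glacier bay national park and preserve": ("location", "GBNPP"),
    "barnacles": ("common_name", "barnacles"),
}


def find_datasets(conn, terms):
    """Translate familiar terms into a parameterized SQL query and run it."""
    clauses, params = [], []
    for term in terms:
        column, code = CONCEPTS[term.lower()]
        clauses.append(f"{column} = ?")  # column names come from CONCEPTS only
        params.append(code)
    sql = "SELECT dsname FROM dslist WHERE " + " AND ".join(clauses)
    return [row[0] for row in conn.execute(sql, params)]


# In-memory catalog with made-up rows for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dslist "
    "(dsname TEXT, meas_type TEXT, location TEXT, common_name TEXT)"
)
conn.executemany(
    "INSERT INTO dslist VALUES (?, ?, ?, ?)",
    [
        ("intertidal_survey", "pop_den", "GBNPP", "barnacles"),
        ("kelp_cover", "pct_cover", "GBNPP", "kelp"),
    ],
)

matches = find_datasets(
    conn,
    ["population density", "barnacles", "Glacier Bay National Park and Preserve"],
)
print(matches)
```

The scientist supplies only terms from their own field; knowledge of the table layout and the codes stays inside the mapping.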

Data Integration
[Diagram labels: Data + Metadata + Researcher Decisions → Integrated Data Set; Semantic; Researcher]

Re-using data from the KNB
Goal – support visualization & analysis
Scalability
  Efficiently process more data from investigators
  Broader spatial extent, longer temporal extent, robust taxonomic extent
Analytical Pipelines (Monarch prototype)
  Flexible tool for exploratory analysis of data
  Directly process data in the network
  Utilize powerful analytical environments (SAS, Matlab, R, …)
Analysis audit trail
  Reproduce analyses
  Communicate about analyses
  Automate new analyses based on earlier ones
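The idea of an analytical pipeline with an audit trail can be sketched as a list of steps, each declaring its inputs, output, description, and code. This is a hedged illustration and not the Monarch prototype: the `AnalysisStep` and `Pipeline` classes and the two example steps are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class AnalysisStep:
    """One step: named inputs, one named output, a description, and code."""
    name: str
    description: str
    inputs: List[str]
    output: str
    code: Callable[..., Any]


@dataclass
class Pipeline:
    steps: List[AnalysisStep]
    audit_trail: List[Dict[str, Any]] = field(default_factory=list)

    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Run each step in order, binding inputs by name at run time."""
        values = dict(data)
        for step in self.steps:
            args = [values[name] for name in step.inputs]
            values[step.output] = step.code(*args)
            # Record what was run so the analysis can be reproduced later.
            self.audit_trail.append({
                "step": step.name,
                "description": step.description,
                "inputs": list(step.inputs),
                "output": step.output,
            })
        return values


pipeline = Pipeline(steps=[
    AnalysisStep(
        "to_kelvin", "Convert temperatures from Celsius to Kelvin",
        ["temp_c"], "temp_k", lambda xs: [x + 273.15 for x in xs],
    ),
    AnalysisStep(
        "mean", "Mean temperature in Kelvin",
        ["temp_k"], "mean_temp_k", lambda xs: sum(xs) / len(xs),
    ),
])

result = pipeline.run({"temp_c": [29.5, 29.7, 29.1]})
print(result["mean_temp_k"])
for entry in pipeline.audit_trail:
    print(entry)
```

Because the trail records each step's name, inputs, and output, the same pipeline can be rerun on new data or used as the starting point for a new analysis.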

Analysis Pipelines
[Diagram: repeated Analysis Step boxes, each with Inputs, Outputs, and Description and Code, linked by Runtime Data Binding]

Scaling Analysis and Modeling

Data Acquisition (Jalama prototype)
Application to assist in data collection
Capture relevant metadata (e.g., EML) during initial data collection
Encourage good informatics practice via automating design of field data forms
Integration with metadata and data storage frameworks (e.g., Metacat)

Ecoinformatics Solutions! Integration: MORPHO Data Acquisition: JALAMA Storage, archiving: ECOGRID Distributed Access: METACAT Analysis & Viz: MONARCH

Fin http://knb.ecoinformatics.org