Published by Holger Nielsen. Modified over 5 years ago.
1
http://knb.ecoinformatics.org http://seek.ecoinformatics.org
Ecological Informatics: Challenges and Benefits Presentation to ESA Visions Committee March 31, 2003 Mark Schildhauer, Ph.D. Director of Computing, NCEAS
2
Research Team and Collaborators
PISCO LTER Network San Diego Supercomputer Center Arizona State University University of Kansas University of North Carolina OBFS Network UC NRS Sandy Andelman Chad Berkley Matthew Brooke John Harris Dan Higgins Matt Jones Jim Reichman Mark Schildhauer Jing Tao
3
What is Ecoinformatics?
Data Acquisition Integration Storage, archiving Distributed Access Results
4
Ecoinformatics The Goal: to develop technology tools and services to enable more efficient acquisition, integration, and analysis of ecological data Specific Challenges An Approach to Technology Solutions (KNB) Future Directions a Science Environment for Ecological Knowledge, SEEK
5
Status of Ecological Data
Highly dispersed Different individuals, organizations, and locations Extreme heterogeneity in Form, Content, and Meaning Lack of Documentation (metadata) Lack of metadata overall Many standards in use, many custom types Implementations are not modular
6
Data are Highly Dispersed…
Data are distributed among: Independent researcher holdings Research station collections LTER Network (24 sites) Org. of Biological Field Stations (160+ sites) Univ. Cal Natural Reserve System (36 sites) Agency databases Museum databases
7
Data are physically dispersed…
Visitors to NCEAS Field Stations in North America
8
Data are very heterogeneous…
Population survey Experimental Taxonomic survey Behavioral Meteorological Oceanographic Hydrology … Syntax (format) Schema (organization) Semantics (meaning/methods)
9
Thematic heterogeneity due to Vast Scope of Ecology
Biosphere Abiotic Biomes Communities Organisms Genes
10
Classifying Data Heterogeneity
Syntax (format) Schema (organization) Semantics (knowledge/meaning/methods)
11
Data Lacking in Documentation
Majority of ecological data undocumented Lack information on syntax, structure and semantics of data Impossible to understand data without contacting the original researchers; even then memories can fail, individuals retire or expire Documentation conventions widely vary Requires large time investment to understand each data set
12
Summary of Technical Challenges
Because of: Data dispersion Data heterogeneity Lack of documentation Integration and synthesis are limited to a manual process --difficult to scale integration efforts up to large numbers of data sets
13
Solutions Standardized measurements Changes needed in culture, training Technology development: metadata, data servers, desktop tools
14
Ecoinformatics Research Objectives
Enhance access to ecological and environmental data Promote data sharing & re-use Enable national data discovery Provide access to research stations’ data resources Maintain local autonomy for data management Synthesis and Analysis Promote cross-cutting analysis Taxonomic, Spatial, Temporal, Conceptual integration of data Data preservation Long term data description Provide archiving capabilities
15
Functional breakdown for Analysis
Data discovery Data access Data storage/archive Data interpretation Quality assessment Data Conversion & Integration Analysis & Modeling Visualization
16
KNB Development Projects (Knowledge Network for Biocomplexity)
Ecological Metadata Language (EML) Prospective standard for ecological metadata Metacat A freely available database for storing metadata Morpho A freely available tool for creating metadata
17
KNB Overview
[Diagram labels: Client (Morpho, Web Browser), Server (Metacat), Metadata (EML), Data]
18
KNB Development Projects
Ecological Metadata Language (EML) Metacat Morpho
19
Why the big buzz about Metadata?
Metadata are the basis for the next generation of the Web: “The Semantic Web is a web of data, in some ways like a global database… The driver for the Semantic Web is …metadata” (Tim Berners-Lee, father of the Web). Digital Library Community: “Era of Metadata?” (Carol Mandel, Digital Librarian)
20
Central Role of Metadata
What are metadata? Data documentation Ownership, attribution, structure, contents, methods, quality, etc. Critical for addressing data heterogeneity issues Critical for developing extensible systems Critical for long-term data preservation Allows advanced services to be built
21
Data – just numbers A brief example may serve to illustrate the point. Here, data, consisting of rows and columns of numbers, have little or no information content. On the next slide,
22
Data + Metadata =numbers + context
[Table labels: Date, Temp (C), Precip. (mm), Obs. #]
A minimal amount of metadata adds some information content to the data. However, unless you were the originator of this particular data set, you would not know where the data were collected, nor would you be able to effectively use or interpret the data.
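The point of the two slides above can be sketched in a few lines of Python. This is a minimal illustration only: the values are invented, and the column names (`obs_number`, `temp`, and so on) are hypothetical stand-ins for the slide's Obs. #, Date, Temp (C), and Precip. (mm) labels.

```python
import csv
import io

# Bare numbers, as on the "just numbers" slide: no header, no context.
# (Values are invented for illustration.)
raw = "1,2003-03-01,12.5,0.0\n2,2003-03-02,14.1,3.2\n"

# Minimal metadata: a name, unit, and type for each column (hypothetical names).
columns = [
    {"name": "obs_number", "unit": None, "type": int},
    {"name": "date", "unit": "ISO 8601 date", "type": str},
    {"name": "temp", "unit": "degrees C", "type": float},
    {"name": "precip", "unit": "mm", "type": float},
]

def annotate(text, columns):
    """Attach column names and types to each raw row."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        rows.append({c["name"]: c["type"](v) for c, v in zip(columns, record)})
    return rows

rows = annotate(raw, columns)
print(rows[0])
```

Even with names and units attached, nothing here says where or how the observations were made, which is the gap the slide notes describe.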
23
Data Integration synthesis
24
Rules of Thumb (Michener 2000)
There are at least four rules of thumb that may prove useful for implementing metadata:
(1) The more comprehensive the metadata, the greater the longevity (and value) of the data. Nevertheless, bear in mind the caveat that the goal of 100% complete metadata that can meet the needs for all conceivable uses of a data set is probably unrealistic and, ultimately, unattainable.
(2) Structured metadata can greatly facilitate data discovery, encourage “best metadata practices” and support data and metadata use by others. For example, the checklist nature of metadata entry programs (e.g., MORPHO) greatly facilitates metadata authoring.
(3) Metadata implementation takes time! Build time into the project for metadata authoring by all contributors. Much of the metadata can be directly used in later project reports and methods sections of scientific papers.
(4) Start implementing metadata for new data collection efforts and then prioritize “legacy” and ongoing data sets that are of greatest benefit to the broadest user community. The idea is to start with a data set that is fresh in mind and, presumably, easier to document.
25
EML 2.0 a formal ecological metadata specification
eml-resource -- Basic resource info
eml-dataset -- Data set info
eml-literature -- Citation info
eml-software -- Software info
eml-party -- People and Organizations
eml-entity -- Data entity (table) info
eml-attribute -- Attribute (variable) info
eml-constraint -- Integrity constraints
eml-physical -- Physical format info
eml-access -- Access control
eml-distribution -- Distribution info
eml-project -- Research project info
eml-coverage -- Geographic, temporal and taxonomic coverage
eml-protocol -- Methods and QA/QC
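To give a feel for how a modular metadata record fits together, the sketch below assembles a small EML-like XML document with Python's standard library. The element names are illustrative and have not been checked against the EML 2.0 schema; the title, surname, and dates are invented placeholders. The comments indicate which module from the list above each part loosely corresponds to.

```python
import xml.etree.ElementTree as ET

def build_record(title, surname, attributes, begin, end):
    """Assemble a small EML-like XML record (not schema-validated)."""
    root = ET.Element("eml")
    dataset = ET.SubElement(root, "dataset")               # cf. eml-dataset
    ET.SubElement(dataset, "title").text = title           # cf. eml-resource
    creator = ET.SubElement(dataset, "creator")            # cf. eml-party
    ET.SubElement(creator, "surName").text = surname
    coverage = ET.SubElement(dataset, "temporalCoverage")  # cf. eml-coverage
    ET.SubElement(coverage, "begin").text = begin
    ET.SubElement(coverage, "end").text = end
    table = ET.SubElement(dataset, "entity")               # cf. eml-entity
    for name, unit in attributes:                          # cf. eml-attribute
        attr = ET.SubElement(table, "attribute", name=name)
        ET.SubElement(attr, "unit").text = unit
    return root

record = build_record(
    "Example weather observations",  # invented title
    "Example",                       # placeholder surname
    [("temp", "degrees C"), ("precip", "mm")],
    "2003-03-01",
    "2003-03-31",
)
print(ET.tostring(record, encoding="unicode"))
```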
26
KNB Development Projects
Ecological Metadata Language (EML) Metacat Morpho
27
Metacat – metadata storage
Metadata storage, search, presentation Schema independent – supports arbitrary XML types Multiple metadata standards Ecological Metadata Language NBII Biological Data Profile Data storage + preservation Replication Flexible access control system National distributed directory service Strong version control Configurable web interface (XSLT)
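Two of the features listed above, schema independence and version control, can be illustrated with a toy in-memory store. This is only a sketch of the idea, not Metacat's actual design or API: the class `ToyMetadataStore`, its methods, and the document identifiers are all hypothetical.

```python
import xml.etree.ElementTree as ET

class ToyMetadataStore:
    """Keeps every revision of any well-formed XML document."""

    def __init__(self):
        self._docs = {}  # docid -> list of revisions (XML strings)

    def put(self, docid, xml_text):
        ET.fromstring(xml_text)  # reject malformed XML, accept any schema
        revisions = self._docs.setdefault(docid, [])
        revisions.append(xml_text)
        return len(revisions)  # revision number, starting at 1

    def get(self, docid, revision=None):
        revisions = self._docs[docid]
        return revisions[-1] if revision is None else revisions[revision - 1]

    def search(self, text):
        """Return docids whose latest revision mentions `text` in any element."""
        needle = text.lower()
        hits = []
        for docid, revisions in self._docs.items():
            root = ET.fromstring(revisions[-1])
            if any(needle in (el.text or "").lower() for el in root.iter()):
                hits.append(docid)
        return hits

store = ToyMetadataStore()
store.put("doc.1", "<eml><title>Barnacle density</title></eml>")
store.put("doc.1", "<eml><title>Barnacle population density</title></eml>")
store.put("doc.2", "<bdp><abstract>Kelp forest survey</abstract></bdp>")
print(store.search("barnacle"))
```

Because the store only requires well-formed XML, the second document can use a different root element and structure without any change to the code.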
28
Metacat network
[Diagram labels: NRS OBFS Metacat, AND SEV Metacat, NCEAS Metacat, CAP LTER Metacat, SDSC Metacat. Key: Metacat Catalog, Morpho clients, Web clients, Site metadata system, XML output filter]
29
Web interface
[Placeholder for screen shots of the KNB web interface]
30
KNB Development Projects
Ecological Metadata Language (EML) Metacat Morpho
31
Morpho – Window to the KNB
Jones
32
Morpho Features Guided Metadata creation
Wizards & editor Automatically extract metadata during data import Search all metadata – structured + free text Contribute to KNB Windows, Mac, Linux Multiple metadata standards EML NBII Biological Data Profile Extensible Standalone (non-networked) mode
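As a rough illustration of extracting metadata automatically during data import, the sketch below reads a CSV table and infers a name and a coarse type for each column. It is not Morpho's implementation; the function names and the three type labels are assumptions made for this example.

```python
import csv
import io

def infer_type(values):
    """Return the narrowest of 'integer', 'real', 'text' that fits all values."""
    for caster, label in ((int, "integer"), (float, "real")):
        try:
            for v in values:
                caster(v)
            return label
        except ValueError:
            continue
    return "text"

def extract_attributes(csv_text):
    """Infer attribute (column) names and types from a CSV with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = list(zip(*reader))  # transpose rows into columns
    return [{"name": n, "type": infer_type(col)} for n, col in zip(header, columns)]

sample = "site,count,temp\nA,3,12.5\nB,7,14\n"
attributes = extract_attributes(sample)
print(attributes)
```

A real import wizard would still need the researcher to supply units, methods, and definitions, which cannot be inferred from the values alone.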
33
Objectives of the KNB & SEEK
National network for ecological data Data discovery Data access Data interpretation Enable advanced services Quality management Data integration through advanced queries Visualization and analysis
34
Solutions KNB Ecological Metadata Language (EML) Metacat -- flexible metadata database Morpho -- data management for ecologists SEEK (partners include NCEAS, KU, SDSC, LTER Network Office, CAP, Napier Univ., UVM, UNC) Unified Portal to Ecological Data (ECOGRID) Quality Assurance engine Semantic Query Processor Data integration and Analytical Pipelines
35
SEEK – addressing semantic integration
Ontologies EcoGrid One-stop access to ecological and environmental data Semantic Mediation Data integration using logic-based reasoning Science Environment for Ecological Knowledge Analysis and Modeling Pipelines Analysis workflows using semantic mediation
36
Quality Assessment Integrity constraint checking Data type checking
Metadata completeness Data entry errors Outlier detection Check assertions about data e.g., trees don’t shrink e.g., sea urchins do
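Three of the checks named above (data type checking, outlier detection, and assertions about the data) might look roughly like the following. This is a sketch under assumptions: the function names are hypothetical, the outlier rule (distance from the median in units of median absolute deviation) is one common choice rather than anything the slides specify, and the sample values are invented.

```python
from statistics import median

def type_errors(values):
    """Indexes of entries that cannot be parsed as numbers."""
    bad = []
    for i, v in enumerate(values):
        try:
            float(v)
        except (TypeError, ValueError):
            bad.append(i)
    return bad

def outliers(values, k=5.0):
    """Indexes of values more than k median absolute deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - med) > k * mad]

def shrink_violations(sizes, can_shrink=False):
    """Indexes where a repeated size measurement decreased.

    Taxa that can legitimately shrink (can_shrink=True) never violate.
    """
    if can_shrink:
        return []
    return [i for i in range(1, len(sizes)) if sizes[i] < sizes[i - 1]]

print(type_errors(["1.2", "abc", "3"]))
print(outliers([10, 11, 9, 10, 12, 100]))
print(shrink_violations([12.0, 12.4, 12.1, 13.0]))                   # tree diameters
print(shrink_violations([5.0, 4.6, 4.9], can_shrink=True))           # urchin tests
```

The `can_shrink` flag stands in for the kind of taxon-specific knowledge that the slide suggests should come from metadata rather than be hard-coded.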
37
Semantic metadata Describes the relationship between measurements and ecologically relevant concepts Drawn from a controlled vocabulary Ontology for ecological measurements
38
Representing ontologies
OWL – Web Ontology Language CKML – Conceptual Knowledge Markup Language RDF – Resource Description Framework
39
Ecological Ontologies
40
Semantic Data Discovery
Knowledge of SQL or database languages is a barrier to data access and re-use:
SELECT dsname FROM dslist WHERE meas_type LIKE 'pop_den' AND location = 'GBNPP' AND common_name = 'barnacles';
Semantic queries allow scientists to express data queries in familiar scientific terms:
What data sets contain population density estimates for barnacles in Glacier Bay National Park and Preserve?
Functionality enabled through semantic metadata
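The contrast between the SQL query and the plain-language question can be sketched with an in-memory SQLite database. The table and column names (`dslist`, `meas_type`, `location`, `common_name`) and the codes `pop_den` and `GBNPP` come from the slide; the data set names, the `VOCABULARY` lookup, and `find_datasets` are invented for illustration. A lookup table like this is far simpler than the logic-based semantic mediation proposed for SEEK and is shown only to convey the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dslist (dsname TEXT, meas_type TEXT, location TEXT, common_name TEXT)"
)
conn.executemany(
    "INSERT INTO dslist VALUES (?, ?, ?, ?)",
    [
        ("gb_barnacle_density", "pop_den", "GBNPP", "barnacles"),  # invented rows
        ("gb_mussel_density", "pop_den", "GBNPP", "mussels"),
        ("sb_barnacle_cover", "pct_cover", "SBC", "barnacles"),
    ],
)

# Hypothetical controlled vocabulary: scientific terms -> database codes.
VOCABULARY = {
    "measurement": {"population density": "pop_den"},
    "place": {"Glacier Bay National Park and Preserve": "GBNPP"},
}

def find_datasets(conn, measurement, organism, place):
    """Translate familiar terms into the coded query and run it."""
    sql = (
        "SELECT dsname FROM dslist "
        "WHERE meas_type = ? AND location = ? AND common_name = ? "
        "ORDER BY dsname"
    )
    params = (
        VOCABULARY["measurement"][measurement],
        VOCABULARY["place"][place],
        organism,
    )
    return [row[0] for row in conn.execute(sql, params)]

matches = find_datasets(
    conn, "population density", "barnacles", "Glacier Bay National Park and Preserve"
)
print(matches)
```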
41
Data Integration
[Diagram labels: Researcher Data + Semantic Metadata + Researcher Decisions, Integrated Data Set]
42
Re-using data from the KNB
Goal – support visualization & analysis Scalability-- Efficiently process more data from investigators Broader Spatial extent, longer temporal extent, robust taxonomic extent Analytical Pipelines (Monarch prototype) Flexible tool for exploratory analysis of data Directly process data in the network Utilize powerful analytical environments (SAS, Matlab, R, …) Analysis audit trail Reproduce analyses Communicate about analyses Automate new analyses based on earlier ones
43
Analysis Pipelines
[Diagram: a chain of Analysis Steps, each with Inputs, Outputs, and Description And Code, joined by Runtime Data Binding]
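The structure in this diagram (steps with inputs, outputs, a description, and code, bound to data at run time) can be sketched as follows. This is not the Monarch prototype; `Step`, `run_pipeline`, the step descriptions, and the sample counts are all hypothetical. The audit list illustrates the "analysis audit trail" idea from the previous slide.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    description: str
    inputs: list      # names of values the step reads
    output: str       # name of the value the step produces
    code: Callable    # function applied to the bound inputs

def run_pipeline(steps, bindings):
    """Run steps in order, binding named data at run time and logging each step."""
    data = dict(bindings)
    audit = []
    for step in steps:
        args = [data[name] for name in step.inputs]
        data[step.output] = step.code(*args)
        audit.append((step.description, tuple(step.inputs), step.output))
    return data, audit

steps = [
    Step(
        "Convert counts per quadrat to density",
        ["counts", "quadrat_area_m2"],
        "density",
        lambda counts, area: [c / area for c in counts],
    ),
    Step(
        "Average density across quadrats",
        ["density"],
        "mean_density",
        lambda density: sum(density) / len(density),
    ),
]

results, audit = run_pipeline(steps, {"counts": [4, 8, 6], "quadrat_area_m2": 0.5})
print(results["mean_density"])
for entry in audit:
    print(entry)
```

Because the steps refer to data only by name, the same pipeline can be rerun with different bindings, which is what makes the analysis reproducible and reusable.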
44
Scaling Analysis and Modeling
45
Data Acquisition (Jalama prototype)
Application to assist in data collection Capture relevant metadata (e.g., EML) during initial data collection Encourage good informatics practice via automating design of field data forms Integration with Metadata and Data storage frameworks (e.g., Metacat)
46
Ecoinformatics Solutions!
Integration: MORPHO; Data Acquisition: JALAMA; Storage, archiving: ECOGRID; Distributed Access: METACAT; Analysis & Viz: MONARCH
47
Fin