Published by Mavis Gray. Modified over 9 years ago.
1
Data Integration, Analysis, and Synthesis
Matthew B. Jones
National Center for Ecological Analysis and Synthesis
University of California Santa Barbara
Scalable Information Networks for the Environment
http://knb.ecoinformatics.org
Funding: National Science Foundation (DEB99-80154, DBI99-04777)
2
NCEAS’ Mission
  Integrate existing data for broad ecological synthesis
  Use synthesis to inform policy and management
3
Synthesis at NCEAS
  Research, Management, Policy
  200+ synthesis projects
  1900+ participating scientists
4
Research projects
  Hunsaker -- Quantification of Uncertainty in Spatial Data for Ecological Applications
  Ives & Frost -- Intrinsic and Extrinsic Variability in Community Dynamics
  Osenberg -- Meta-Analysis, Interaction Strength and Effect Size; Application of Biological Models to the Synthesis of Experimental Data
  Murdoch -- Complex Population Dynamics
5
Management projects
  Andelman -- Designing and Assessing the Viability of Nature Reserve Systems at Regional Scales: Integration of Optimization, Heuristic and Dynamic Models
  Boersma & Kareiva -- Prospectus For An Analysis of Recovery Plans and Delisting
  Kareiva -- Habitat Conservation Planning for Endangered Species
  Lubchenco, Palumbi, & Gaines -- Developing the Theory of Marine Reserves
6
Policy projects
  Costanza & Farber -- The Value of the World's Ecosystem Services and Natural Capital: Toward a Dynamic, Integrated Approach
http://www.nceas.ucsb.edu/
7
Synthesis projects
Use existing data...
  Distributed sources
  Varying protocols
  Varying formats
  Obtained via personal collaboration
8
Functional breakdown for synthesis
  Data discovery
  Data access
  Data storage
  Data interpretation
  Quality assessment
  Data Conversion & Integration
  Analysis & Modeling
  Visualization
9
Presentation Outline
  Integration, Analysis, and Synthesis: Challenges
10
Data Heterogeneity
  Population survey
  Experimental
  Taxonomic survey
  Behavioral
  Meteorological
  Oceanographic
  Hydrology
  …
  Economic
  Social (urban ecology)
  Paleoecological
  Historical
  Land use
  Demographics
11
Types of Heterogeneity
Intensional vs. Arbitrary Heterogeneity
Syntax (format)
  CSV, Fixed ASCII, proprietary binary
Schema (organization)
  Non-normalized models
Semantics (meaning/methods)
  Protocol semantics (e.g., scale)
  Parameter semantics (e.g., bodysize (g))
  Conceptual framework (e.g., experimental treatments)
  Taxonomy + nomenclature
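As an illustration of syntax heterogeneity only, the sketch below reads the same two observations from a CSV file and from a fixed-width ASCII file and shows that they carry identical content once parsed. The sample values, the column names, and the fixed-width layout are hypothetical and were made up for this example; they are not taken from any data set in the presentation.

```python
import csv
import io

# Hypothetical sample data: the same two observations in two syntaxes.
CSV_TEXT = "site,species,bodysize_g\nA1,Peromyscus,21.5\nB2,Microtus,34.0\n"
FIXED_TEXT = "A1  Peromyscus  21.5\nB2  Microtus    34.0\n"

# Assumed layout of the fixed-width file: (column name, start, end).
FIXED_LAYOUT = [("site", 0, 4), ("species", 4, 16), ("bodysize_g", 16, 20)]


def read_csv(text):
    """Parse comma-separated text into a list of dict records."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rec = dict(row)
        rec["bodysize_g"] = float(rec["bodysize_g"])
        rows.append(rec)
    return rows


def read_fixed(text, layout):
    """Parse fixed-width text using an explicit column layout."""
    rows = []
    for line in text.splitlines():
        rec = {name: line[start:end].strip() for name, start, end in layout}
        rec["bodysize_g"] = float(rec["bodysize_g"])
        rows.append(rec)
    return rows


if __name__ == "__main__":
    # Both readers yield the same records despite the different formats.
    print(read_csv(CSV_TEXT) == read_fixed(FIXED_TEXT, FIXED_LAYOUT))
```

The fixed-width reader only works because the column layout is written down somewhere, which is the kind of information the later slides assign to metadata. Schema and semantic heterogeneity are not addressed by this sketch.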
12
Data Dispersion
Data are distributed among:
  Independent researcher holdings
  Research station collections
    LTER Network (24 sites)
    Org. of Biological Field Stations (168 sites)
    Univ. Cal Natural Reserve System (36 sites)
    MARINE (62 sites)
    PISCO
  Agency databases
  Museum databases
Access via personal networking
  Not scalable
13
Lack of Metadata
Majority of ecological data are undocumented
  Lack information on syntax, schema, and semantics of data
  Impossible to understand data without contacting the original researchers
Documentation conventions vary widely
  Requires large time investment to understand each data set
14
Scaling Data Integration
Because of:
  Data heterogeneity
  Data dispersion
  Lack of documentation
Integration and synthesis are limited to a manual process
Thus, difficult to scale integration efforts up to large numbers of data sets
15
Data Integration (figure with elements labeled A, B, C)
16
Presentation Outline
  Integration, Analysis, and Synthesis: Challenges
  Current work
    Knowledge Network for Biocomplexity
    Partnership for Biodiversity Informatics
17
Knowledge Network for Biocomplexity (KNB)
National network for biocomplexity data
  Data discovery
  Data access
  Data interpretation
Enable advanced services
  Data integration
  Analysis framework
  Hypothesis modeling
  Visualization
18
Central Role of Metadata
What metadata?
  Ownership, attribution, structure, contents, methods, quality, etc.
Critical for addressing data heterogeneity issues
Critical for developing extensible systems
Critical for long-term data preservation
Allows advanced services to be built
19
KNB Components
Ecological Metadata Language (EML)
Morpho -- data management for ecologists
  Cross platform Java application
Metacat -- flexible metadata & data system
Analysis and Modeling engine
  Data integration engine
  Semantic Query Processor
  Hypothesis Modeling Engine
20
Ecological Metadata Language
  XML syntax for representing metadata
  Extensible -- can add new metadata
  Modular -- can subset metadata for specific applications
21
EML 2.0beta3 modules
  eml-resource -- Basic resource info
  eml-dataset -- Data set info
  eml-literature -- Citation info
  eml-software -- Software info
  eml-party -- People and Organizations
  eml-entity -- Data entity (table) info
  eml-attribute -- Attribute (variable) info
  eml-constraint -- Integrity constraints
  eml-physical -- Physical format info
  eml-access -- Access control
  eml-distribution -- Distribution info
  eml-project -- Research project info
  eml-coverage -- Geographic, temporal and taxonomic coverage
  eml-protocol -- Methods and QA/QC
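To make the idea of XML metadata that can be subset concrete, here is a minimal sketch that builds a small metadata document with Python's standard library and then pulls out only the attribute (variable) descriptions. The element and attribute names (dataset, title, owner, entity, attribute, unit) are hypothetical and chosen for readability; they loosely echo the module names above but are not the actual EML schema.

```python
import xml.etree.ElementTree as ET


def build_metadata(title, owner, attributes):
    """Build a small metadata tree. Tag names are illustrative, not EML."""
    root = ET.Element("dataset")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "owner").text = owner
    entity = ET.SubElement(root, "entity", name="observations")
    for name, unit, definition in attributes:
        attr = ET.SubElement(entity, "attribute", name=name, unit=unit)
        attr.text = definition
    return root


def attribute_summary(root):
    """Subset the metadata: return (name, unit) for every attribute."""
    return [(a.get("name"), a.get("unit")) for a in root.iter("attribute")]


if __name__ == "__main__":
    # Hypothetical data set description used only for this example.
    doc = build_metadata(
        "Small mammal survey",
        "Example Lab",
        [("bodysize", "gram", "Body mass of the captured animal"),
         ("site", "none", "Code of the trapping site")],
    )
    print(ET.tostring(doc, encoding="unicode"))
    print(attribute_summary(doc))
```

An application that only needs variable names and units can read that subset and ignore the rest of the document, which is one way to read the "modular" claim on the EML slide.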
23
Metacat metadata system (figure; labels: Metacat Catalog; LTER, NCEAS, SDSC, NRS, and SEV Metacat servers; site metadata systems AND, SEV, CAP, OBFS; XML wrapper; Morpho clients; Web clients)
24
Metacat architecture
25
Metacat web interface
26
UC Natural Reserve System, OBFS Network, LTER Network
27
Functional breakdown for synthesis
  Data discovery
  Data access
  Data storage
  Data interpretation
  Quality assessment
  Data Conversion & Integration
  Analysis & Modeling
  Visualization
28
Quality Assessment system
  Data + Semantic Metadata + Researcher Decisions -> Quality Assessment Report
29
Quality Assessment
  Integrity constraint checking
  Data type checking
  Metadata completeness
  Data entry errors
  Outlier detection
  Check assertions about data
    e.g., trees don’t shrink
    e.g., sea urchins do
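Three of the checks listed above (data type checking, outlier detection, and a domain assertion) can be sketched in a few lines. This is an illustrative sketch, not the KNB quality assessment system: the function names, the median-absolute-deviation rule, and its threshold of 5 are all assumptions made for this example. As the slide notes, the "does not shrink" assertion fits trees but not sea urchins, so a real system would need to know which taxa it applies to.

```python
import statistics


def non_numeric(values):
    """Data type check: return indexes of values that do not parse as numbers."""
    bad = []
    for i, value in enumerate(values):
        try:
            float(value)
        except (TypeError, ValueError):
            bad.append(i)
    return bad


def mad_outliers(values, threshold=5.0):
    """Outlier check using the median absolute deviation (assumed rule)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread, so nothing can be flagged by this rule
    return [v for v in values if abs(v - med) / mad > threshold]


def shrinking_trees(measurements):
    """Assertion check: flag trees whose diameter decreases over time.

    measurements is a list of (tree_id, year, diameter_cm) tuples.
    Returns (tree_id, earlier_year, later_year) for each violation.
    """
    violations = []
    last = {}
    for tree_id, year, diameter in sorted(measurements):
        if tree_id in last and diameter < last[tree_id][1]:
            violations.append((tree_id, last[tree_id][0], year))
        last[tree_id] = (year, diameter)
    return violations


if __name__ == "__main__":
    # Hypothetical values used only to exercise the checks.
    print(non_numeric(["1.2", "abc", None, "3"]))
    print(mad_outliers([10, 11, 9, 10, 12, 50]))
    print(shrinking_trees([("t1", 2000, 30.1), ("t1", 2001, 29.0),
                           ("t2", 2000, 12.0), ("t2", 2001, 12.5)]))
```

Each check returns the offending items rather than raising an error, so the results could be collected into a report and left for the researcher to judge.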
30
Data Integration
  Data + Semantic Metadata + Researcher Decisions -> Integrated Data Set
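A minimal sketch of metadata-driven integration follows, assuming each data set ships with a small metadata record that names its species and mass columns and states the mass unit. The data sets, column names, metadata keys, and conversion table are hypothetical; the point is only that the merge logic reads the metadata instead of hard-coding knowledge about each source.

```python
# Conversion factors to grams for the units used in this example.
TO_GRAMS = {"g": 1.0, "kg": 1000.0}

# Hypothetical data sets A and B with different column names and units.
DATASET_A = {
    "metadata": {"id": "A", "species_column": "sp",
                 "mass_column": "wt", "mass_unit": "g"},
    "rows": [{"sp": "Microtus", "wt": 34.0}],
}
DATASET_B = {
    "metadata": {"id": "B", "species_column": "taxon",
                 "mass_column": "mass", "mass_unit": "kg"},
    "rows": [{"taxon": "Marmota", "mass": 0.5}],
}


def integrate(datasets):
    """Merge data sets into one table with a common schema and unit."""
    merged = []
    for dataset in datasets:
        meta = dataset["metadata"]
        factor = TO_GRAMS[meta["mass_unit"]]  # KeyError on an unknown unit
        for row in dataset["rows"]:
            merged.append({
                "source": meta["id"],
                "species": row[meta["species_column"]],
                "mass_g": row[meta["mass_column"]] * factor,
            })
    return merged


if __name__ == "__main__":
    for record in integrate([DATASET_A, DATASET_B]):
        print(record)
```

Adding a third data set would require only a new metadata record, not new code. The sketch covers purely mechanical transformations; decisions that need a researcher's judgment are outside it.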
31
Data Integration (figure with elements labeled A, B, C)
32
Scaling Analysis and Modeling
35
Semantic metadata
  Describes the relationship between measurements and ecologically relevant concepts
  Drawn from a controlled vocabulary
  Ontology for ecological measurements
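One simple reading of semantic metadata is a set of annotations that tie each column of a data set to a term from a controlled vocabulary. The sketch below, under that assumption, rejects terms that are not in the vocabulary and groups columns from different data sets that measure the same concept. The vocabulary terms, data set names, and column names are hypothetical, and a flat set of terms stands in for a real ontology, which would also record relationships between concepts.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary of measurement concepts.
VOCABULARY = {"BodyMass", "AirTemperature", "SiteIdentifier"}

# Hypothetical annotations: (data set, column name, concept).
ANNOTATIONS = [
    ("A", "wt", "BodyMass"),
    ("B", "mass", "BodyMass"),
    ("B", "temp_c", "AirTemperature"),
    ("C", "plot", "SiteIdentifier"),
    ("C", "len", "BodyLength"),  # not in the vocabulary
]


def unknown_terms(annotations, vocabulary):
    """Return the annotation concepts that are missing from the vocabulary."""
    return sorted({c for _, _, c in annotations if c not in vocabulary})


def columns_by_concept(annotations, vocabulary):
    """Group (data set, column) pairs by the concept they measure."""
    groups = defaultdict(list)
    for dataset, column, concept in annotations:
        if concept in vocabulary:
            groups[concept].append((dataset, column))
    return dict(groups)


if __name__ == "__main__":
    print(unknown_terms(ANNOTATIONS, VOCABULARY))
    print(columns_by_concept(ANNOTATIONS, VOCABULARY))
```

Columns that share a concept, such as wt in A and mass in B, become candidates for integration even though their names differ.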
36
Ecological Ontologies
37
What drives synthesis
  Science questions
  Hypotheses
  Analyses + Models
  Integrated Data
  Original Data
38
Conclusions
  Barriers to integration can be addressed using structured metadata
  Can accomplish a lot with ‘just’ mechanical transformations
  Domain ontologies + semantic mediation are paths to scaling integration
  Analysis drives all other phases of integration