Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara.


1 Data Integration, Analysis, and Synthesis
  Matthew B. Jones
  National Center for Ecological Analysis and Synthesis
  University of California Santa Barbara
  Scalable Information Networks for the Environment
  http://knb.ecoinformatics.org
  Funding: National Science Foundation (DEB99-80154, DBI99-04777)

2 NCEAS’ Mission
  Integrate existing data for broad ecological synthesis
  Use synthesis to inform policy and management

3 Synthesis at NCEAS
  Research
  Management
  Policy
  200+ synthesis projects
  1900+ participating scientists

4 Research projects
  Hunsaker – Quantification of Uncertainty in Spatial Data for Ecological Applications
  Ives & Frost – Intrinsic and Extrinsic Variability in Community Dynamics
  Osenberg – Meta-Analysis, Interaction Strength and Effect Size; Application of Biological Models to the Synthesis of Experimental Data
  Murdoch – Complex Population Dynamics

5 Management projects
  Andelman – Designing and Assessing the Viability of Nature Reserve Systems at Regional Scales: Integration of Optimization, Heuristic and Dynamic Models
  Boersma & Kareiva – Prospectus For An Analysis of Recovery Plans and Delisting
  Kareiva – Habitat Conservation Planning for Endangered Species
  Lubchenco, Palumbi, & Gaines – Developing the Theory of Marine Reserves

6 Policy projects
  Costanza & Farber -- The Value of the World's Ecosystem Services and Natural Capital: Toward a Dynamic, Integrated Approach
  http://www.nceas.ucsb.edu/

7 Synthesis projects
  Use existing data...
  Distributed sources
  Varying protocols
  Varying formats
  Obtained via personal collaboration

8 Functional breakdown for synthesis
  Data discovery
  Data access
  Data storage
  Data interpretation
  Quality assessment
  Data Conversion & Integration
  Analysis & Modeling
  Visualization

9 Presentation Outline
  Integration, Analysis, and Synthesis: Challenges

10 Data Heterogeneity
  Population survey
  Experimental
  Taxonomic survey
  Behavioral
  Meteorological
  Oceanographic
  Hydrology
  …
  Economic
  Social (urban ecology)
  Paleoecological
  Historical
  Land use
  Demographics

11 Types of Heterogeneity
  Intensional vs. Arbitrary Heterogeneity
  Syntax (format)
    CSV, Fixed ASCII, proprietary binary
  Schema (organization)
    Non-normalized models
  Semantics (meaning/methods)
    Protocol semantics (e.g., scale)
    Parameter semantics (e.g., bodysize (g))
    Conceptual framework (e.g., experimental treatments)
    Taxonomy + nomenclature
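
The syntax (format) case on this slide can be illustrated with a minimal Python sketch. It assumes two hypothetical files that hold the same observations, one as CSV and one as fixed-width ASCII. The sample records, column names, and fixed-width layout are invented for illustration and do not come from the presentation. Once each reader knows the syntax of its file, both yield identical records.

```python
import csv
import io

# Hypothetical sample data: the same two observations in two syntaxes.
CSV_TEXT = "site,species,bodysize_g\nSEV,Dipodomys,45.2\nAND,Peromyscus,21.0\n"
FIXED_TEXT = "SEV Dipodomys   45.2\nAND Peromyscus  21.0\n"

# Assumed layout of the fixed-width file: (column name, start, end) offsets.
FIXED_LAYOUT = [("site", 0, 4), ("species", 4, 16), ("bodysize_g", 16, 20)]


def read_csv(text):
    """Parse CSV text into a list of records with a numeric body size."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "site": rec["site"],
            "species": rec["species"],
            "bodysize_g": float(rec["bodysize_g"]),
        })
    return rows


def read_fixed(text, layout):
    """Parse fixed-width text into the same record structure."""
    rows = []
    for line in text.splitlines():
        rec = {name: line[start:end].strip() for name, start, end in layout}
        rec["bodysize_g"] = float(rec["bodysize_g"])
        rows.append(rec)
    return rows


if __name__ == "__main__":
    print(read_csv(CSV_TEXT))
    print(read_fixed(FIXED_TEXT, FIXED_LAYOUT))
```

The layout table plays the role of syntax metadata: without it, the fixed-width file cannot be split into columns reliably.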

12 Data Dispersion
  Data are distributed among:
    Independent researcher holdings
    Research station collections
      LTER Network (24 sites)
      Org. of Biological Field Stations (168 sites)
      Univ. Cal Natural Reserve System (36 sites)
      MARINE (62 sites)
      PISCO
    Agency databases
    Museum databases
  Access via personal networking
  Not scalable

13 Lack of Metadata
  Majority of ecological data undocumented
  Lack information on syntax, schema and semantics of data
  Impossible to understand data without contacting the original researchers
  Documentation conventions widely vary
  Requires large time investment to understand each data set

14 Scaling Data Integration
  Because of:
    Data heterogeneity
    Data dispersion
    Lack of documentation
  Integration and synthesis are limited to a manual process
  Thus, difficult to scale integration efforts up to large numbers of data sets

15 Data Integration
  [Diagram; labels: A, B, C]

16 Presentation Outline
  Integration, Analysis, and Synthesis: Challenges
  Current work
    Knowledge Network for Biocomplexity
    Partnership for Biodiversity Informatics

17 Knowledge Network for Biocomplexity (KNB)
  National network for biocomplexity data
    Data discovery
    Data access
    Data interpretation
  Enable advanced services
    Data integration
    Analysis framework
    Hypothesis modeling
    Visualization

18 Central Role of Metadata
  What metadata?
    Ownership, attribution, structure, contents, methods, quality, etc.
  Critical for addressing data heterogeneity issues
  Critical for developing extensible systems
  Critical for long-term data preservation
  Allows advanced services to be built

19 KNB Components
  Ecological Metadata Language (EML)
  Morpho -- data management for ecologists
    Cross platform Java application
  Metacat -- flexible metadata & data system
  Analysis and Modeling engine
  Data integration engine
  Semantic Query Processor
  Hypothesis Modeling Engine

20 Ecological Metadata Language
  XML syntax for representing metadata
  Extensible – can add new metadata
  Modular – can subset metadata for specific applications
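
The three properties on this slide (XML syntax, extensible, modular) can be sketched with Python's standard `xml.etree.ElementTree` module. The element names below (`metadata`, `dataset`, `attribute`, `protocol`) are simplified placeholders chosen for illustration. They are not taken from the EML schema, and the output is not a valid EML document.

```python
import xml.etree.ElementTree as ET


def build_metadata():
    """Build a small metadata document (placeholder element names, not EML)."""
    root = ET.Element("metadata")
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = "Example small mammal survey"
    attr = ET.SubElement(root, "attribute")
    ET.SubElement(attr, "name").text = "bodysize"
    ET.SubElement(attr, "unit").text = "g"
    return root


def subset(root, tag):
    """Return a new document holding only the top-level children named `tag`."""
    out = ET.Element(root.tag)
    for child in root.findall(tag):
        out.append(child)
    return out


doc = build_metadata()

# Extensible: new kinds of metadata are added as new elements.
ET.SubElement(doc, "protocol").text = "Live trapping, monthly"

# Modular: an application that needs only attribute info takes a subset.
attrs_only = subset(doc, "attribute")

xml_text = ET.tostring(doc, encoding="unicode")
subset_text = ET.tostring(attrs_only, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
    print(subset_text)
```

A real system would validate documents against the published EML schema rather than relying on ad hoc element names.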

21 EML 2.0beta3 modules
  eml-resource -- Basic resource info
  eml-dataset -- Data set info
  eml-literature -- Citation info
  eml-software -- Software info
  eml-party -- People and Organizations
  eml-entity -- Data entity (table) info
  eml-attribute -- Attribute (variable) info
  eml-constraint -- Integrity constraints
  eml-physical -- Physical format info
  eml-access -- Access control
  eml-distribution -- Distribution info
  eml-project -- Research project info
  eml-coverage -- Geographic, temporal and taxonomic coverage
  eml-protocol -- Methods and QA/QC

23 Metacat metadata system
  [Diagram; labels: LTER Metacat, NCEAS Metacat, SDSC Metacat, NRS Metacat, SEV Metacat, Metacat Catalog, Morpho clients, Web clients, XML wrapper, AND, SEV, CAP, OBFS. Key: Metacat, Site metadata system]

24 Metacat architecture

25 Metacat web interface

26 [Figure; labels: UC Natural Reserve System, OBFS Network, LTER Network]

27 Functional breakdown for synthesis
  Data discovery
  Data access
  Data storage
  Data interpretation
  Quality assessment
  Data Conversion & Integration
  Analysis & Modeling
  Visualization

28 Quality Assessment system
  [Diagram; labels: Semantic Metadata, Researcher Decisions, Data, Quality Assessment Report, joined by "+" signs]

29 Quality Assessment
  Integrity constraint checking
  Data type checking
  Metadata completeness
  Data entry errors
  Outlier detection
  Check assertions about data
    e.g., trees don’t shrink
    e.g., sea urchins do
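
Three of the checks listed on this slide (data type checking, outlier detection, and assertions about data) can be sketched in Python as follows. The function names, the median-absolute-deviation outlier rule, and its 3.5 threshold are illustrative assumptions. The presentation does not say how the KNB quality assessment system implements these checks.

```python
from statistics import median


def check_types(values):
    """Return the indices of values that cannot be parsed as numbers."""
    bad = []
    for i, v in enumerate(values):
        try:
            float(v)
        except (TypeError, ValueError):
            bad.append(i)
    return bad


def find_outliers(values, threshold=3.5):
    """Flag indices whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def check_non_decreasing(series):
    """Assertion 'values never shrink': return indices that violate it."""
    return [i for i in range(1, len(series)) if series[i] < series[i - 1]]


if __name__ == "__main__":
    print(check_types(["3.2", "n/a", "4"]))
    print(find_outliers([10.1, 10.3, 9.9, 10.0, 55.0]))
    print(check_non_decreasing([12.0, 12.4, 12.1, 13.0]))
```

As the slide's examples suggest, an assertion such as `check_non_decreasing` only makes sense for some organisms, so the choice of assertions has to be driven by metadata about what was measured.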

30 Data Integration
  [Diagram; labels: Semantic Metadata, Researcher Decisions, Data, Integrated Data Set, joined by "+" signs]
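
A minimal Python sketch of metadata-driven integration follows, assuming each data set carries metadata that names its mass column and unit. The data set structure, the field names (`mass_column`, `mass_unit`), the conversion table, and the sample values are all hypothetical. The sketch shows only a mechanical unit and column-name transformation, not the KNB data integration engine.

```python
# Assumed conversion factors from each declared unit to grams.
TO_GRAMS = {"g": 1.0, "kg": 1000.0, "mg": 0.001}


def integrate(datasets):
    """Merge data sets into one list of records with a shared 'mass_g' column.

    Each data set is a dict with 'metadata' (name, mass_column, mass_unit)
    and 'rows' (a list of dicts keyed by that data set's own column names).
    """
    merged = []
    for ds in datasets:
        meta = ds["metadata"]
        factor = TO_GRAMS[meta["mass_unit"]]  # KeyError signals an unknown unit
        column = meta["mass_column"]
        for row in ds["rows"]:
            merged.append({"source": meta["name"],
                           "mass_g": row[column] * factor})
    return merged


# Hypothetical inputs: same measurement, different column names and units.
DATASET_A = {
    "metadata": {"name": "A", "mass_column": "wt", "mass_unit": "kg"},
    "rows": [{"wt": 0.5}, {"wt": 2.0}],
}
DATASET_B = {
    "metadata": {"name": "B", "mass_column": "bodysize", "mass_unit": "g"},
    "rows": [{"bodysize": 21.0}],
}

if __name__ == "__main__":
    for record in integrate([DATASET_A, DATASET_B]):
        print(record)
```

Researcher decisions, the other input named on the slide, would enter where the metadata alone cannot settle a question, for example whether two sampling protocols are comparable.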

31 Data Integration
  [Diagram; labels: A, B, C]

32 Scaling Analysis and Modeling

35 Semantic metadata
  Describes the relationship between measurements and ecologically relevant concepts
  Drawn from a controlled vocabulary
  Ontology for ecological measurements
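
One way to picture semantic metadata is as annotations that tie each data set column to a term from a controlled vocabulary. The Python sketch below assumes a tiny, invented vocabulary and invented column names. It stands in for the idea only, and is far simpler than an ontology for ecological measurements.

```python
# Hypothetical controlled vocabulary of measurement concepts.
VOCABULARY = {"body_mass", "air_temperature", "population_density"}

# Hypothetical annotations: (data set, column) -> concept term.
ANNOTATIONS = {
    ("dataset_A", "wt"): "body_mass",
    ("dataset_B", "bodysize"): "body_mass",
    ("dataset_B", "temp"): "air_temperature",
}


def invalid_annotations(annotations, vocabulary):
    """Return the annotated columns whose concept is not in the vocabulary."""
    return sorted(key for key, concept in annotations.items()
                  if concept not in vocabulary)


def columns_for(concept, annotations):
    """Return every (data set, column) pair annotated with the given concept."""
    return sorted(key for key, c in annotations.items() if c == concept)


if __name__ == "__main__":
    print(invalid_annotations(ANNOTATIONS, VOCABULARY))
    print(columns_for("body_mass", ANNOTATIONS))
```

Because both `wt` and `bodysize` map to the same term, a query for that concept finds comparable columns across data sets even though their local names differ.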

36 Ecological Ontologies

37 What drives synthesis
  Science questions
  Hypotheses
  Analyses + Models
  Integrated Data
  Original Data

38 Conclusions
  Barriers to integration can be addressed using structured metadata
  Can accomplish a lot with ‘just’ mechanical transformations
  Domain ontologies + semantic mediation are paths to scaling integration
  Analysis drives all other phases of integration

