1
Informatics and Computational Challenges for Satellite Monitoring of Global Biodiversity
Mark Schildhauer (NCEAS, UCSB) and Ryan Pavlick (NASA/JPL)
NASA/NCEAS Workshop, Dec 10, 2014
2
Analytical challenges… Ecology and Biodiversity Sciences:
–inherently multi-disciplinary: bio + earth
–critical, societally-relevant environmental questions are typically not “local” – at regional if not global scale
–analyses become far more robust and efficient with faster access to a wider range and larger volumes of DATA
From: Halpern et al., “A Global Map of Human Impact on Marine Ecosystems,” Science, 15 February 2008. DOI: 10.1126/science.1149345
3
Good news – more and more data
There is a growing deluge of environmental data to assist in these investigations…
4
Fundamental Problem: Big Data
Ecological/biodiversity data are Big Data:
–globally distributed, voluminous
–highly heterogeneous in structure and content
–rapidly growing!
i.e., the 3 “V”s: Volume, Variety, Velocity
5
Informatics challenges
–Discovering and integrating data across scales: micro (and nano) to global
–aligning heterogeneous schema and themes:
  land-use/land-cover, geology, soils, atmosphere, hydrology, oceanography
  genes to ecosystems
  human sciences: culture & traditions, demographics, economics, governance
–dealing with volume: TB to PB ++
  satellite images
  sensors (aerial and ground-based)
  observational data
  access and storage – even GB’s are problems at the Desktop!!!
Example: documenting effects of climate change on forest composition requires large amounts of relevant data…
–E.g., over 25,000 data sets are available in the Knowledge Network for Biocomplexity repository (KNB – http://knb.ecoinformatics.org)
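To ground the discovery challenge, below is a minimal sketch of a keyword search across the federated metadata index that KNB exposes through DataONE. The coordinating-node Solr endpoint, index field names (identifier, title, keywords), and search terms are assumptions for illustration, not details taken from the slides.

```python
# Sketch: keyword search over the federated DataONE/KNB metadata index.
# Endpoint and Solr field names (identifier, title, keywords) are assumptions.
import requests

SOLR_URL = "https://cn.dataone.org/cn/v2/query/solr/"

params = {
    "q": 'title:"forest composition" AND keywords:climate',  # free-text metadata query
    "fl": "identifier,title",                                 # fields to return
    "rows": 10,
    "wt": "json",
}

resp = requests.get(SOLR_URL, params=params, timeout=30)
resp.raise_for_status()
docs = resp.json()["response"]["docs"]

for doc in docs:
    print(doc.get("identifier"), "-", doc.get("title"))
```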
6
Environmental Data – the status quo
–Distributed: stewarded by many groups, individuals
–Under-documented: sparsely and inconsistently documented; jargon and acronyms; critical details about data in natural language (journals, white papers)
–Inaccessible: varying degrees and mechanisms of presentation via FTP, Web, etc.
–Heterogeneous: broad range of relevant topics (semantics), lots of different data formats (structure), data access protocols (syntax), data models, etc.
7
Data collected by thousands of trained field scientists provides invaluable on-the-ground, in situ information:
–fine-grain detail on biodiversity
–but highly idiosyncratic approaches to methods and naming of measurements
Also, there is the “long tail of dark data”* in ecology/biodiversity sciences…
* Heidorn, P. Bryan. 2008. DOI: 10.1353/lib.0.0036
8
Ground-truthing & Observational Data
AGGREGATORS are KEY:
–Plant Occurrences and Vegetation Plots: BIEN, Turboveg, sPlot, CTFS, GBIF, Natural History Museum Collections, Map of Life
–Plant Functional Traits (PFT): TRY, BIEN
9
Ground-truthing & Observational Data
AGGREGATORS are KEY:
–Sensor data: NEON
–Genomic data: iPlant
–Remote sensing and global climate data: NASA DAAC’s, IPCC... others...
and MANY independent, “dark-tail”, in situ data sets
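As a concrete illustration of using one of these aggregators, here is a minimal sketch of pulling a few georeferenced occurrence records from GBIF's public occurrence-search web service; the taxon and query parameters are illustrative only.

```python
# Sketch: retrieve a few georeferenced occurrence records from the GBIF
# occurrence search API. The query parameters below are illustrative.
import requests

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"

params = {
    "scientificName": "Quercus robur",  # example taxon
    "hasCoordinate": "true",            # only georeferenced records
    "limit": 5,
}

resp = requests.get(GBIF_SEARCH, params=params, timeout=30)
resp.raise_for_status()
data = resp.json()

print("total matching records:", data["count"])
for rec in data["results"]:
    print(rec.get("scientificName"),
          rec.get("decimalLatitude"),
          rec.get("decimalLongitude"),
          rec.get("country"))
```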
10
Several Existing Resources
(eScience 2010)
11
Geospatial Data
Need better discovery, access to, and integration of remote-sensing data with ground-truthing and observational (in situ) data!!
12
Informatics Challenges
–Preservation
–Discovery/Integration
–Attribution
13
for Preservation: ARCHIVES
Archives should be permanent, reliable, powerful, comprehensive, AND useful (“usable”)
NSF DataNet program: data stewardship and interoperability; exploring models for sustainability
–federating major earth science data archives
–distributed framework (shared responsibility)
–API (new groups can participate, and are welcome!)
–Data, metadata, ontologies
14
for Discovery, Integration: SHARED KNOWLEDGE MODELS
–Consistency and rigor in terminology
–Standardized protocols, methods when possible
–Semantics approaches
  Ontologies for terminologies
  Ontologies to describe data schemas
–Machine-assisted discovery, reasoning, integration
15
Environmental Data – the status quo
–Under-documented: sparsely and inconsistently documented; jargon and acronyms; critical details about data in natural language (journals, white papers)
  measurements: MAT, MAR, LL, LMA, LNA, PET, PLNTHT, VPD, VSWIR
  techniques: SMLR, PLSR
  projects and models: LOPEX, ACCP, PROSPECT
  instruments: AVIRIS, CASI, HyspIRI
16
Semantics approaches to support machine-processing of data
–Advance consistency and rigor in terminology and data descriptions
–Standardize protocols, methods when possible
Development Tasks:
–Ontologies for domains
–Ontologies to describe data schemas
–Mechanisms to bind data with Knowledge Models
–Machine-assisted discovery, reasoning, integration
–Observational data model as foundational template
17
Metadata-based Data Integration
Metadata standards are a step in the right direction… Expose data in a standard schema for transfer:
–Dublin Core
–ISO 19115 (geospatial metadata) and OGC
–Darwin Core (biodiversity specimen metadata)
–EML (Ecological Metadata Language)
–GeoSciML
–Can map one format to another to resolve minor differences (but this gets arduous)
–And these still allow for terminological inconsistencies, and don’t support hierarchy, synonymy, or complex relationships well
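A minimal sketch of what mapping one format to another looks like in practice: a hand-maintained crosswalk from a few EML-style field names to Dublin Core terms. The field names and the sample record are illustrative, not drawn from a real dataset, and a production crosswalk would be far larger – which is exactly the "arduous" part.

```python
# Sketch: a hand-maintained crosswalk between metadata schemas.
# Field names and the sample record are illustrative only.
EML_TO_DUBLIN_CORE = {
    "title": "dc:title",
    "creator": "dc:creator",
    "abstract": "dc:description",
    "keywordSet": "dc:subject",
    "pubDate": "dc:date",
}

def to_dublin_core(eml_record: dict) -> dict:
    """Translate an EML-style record into Dublin Core terms, silently
    dropping any field the crosswalk does not cover."""
    return {
        EML_TO_DUBLIN_CORE[k]: v
        for k, v in eml_record.items()
        if k in EML_TO_DUBLIN_CORE
    }

record = {
    "title": "Forest plot survey, Sierra Nevada",
    "creator": "A. Researcher",
    "keywordSet": ["forest composition", "climate change"],
}
print(to_dublin_core(record))
```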
18
Semantic Data Integration
W3C Semantic standards – RDF, OWL – provide greater expressivity, formalization, enhanced search, reasoning:
–Class/subclass subsumption
–Axioms/properties: reflexivity, transitivity, domain/range
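A minimal sketch of stating such axioms with the rdflib package; the class and property URIs are made up for illustration.

```python
# Sketch: declaring subsumption and a transitive property with RDF/OWL
# vocabulary via rdflib. All URIs below are illustrative.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/eco#")
g = Graph()
g.bind("ex", EX)

# Class/subclass subsumption
g.add((EX.TropicalBroadleafForest, RDFS.subClassOf, EX.Forest))
g.add((EX.Forest, RDFS.subClassOf, EX.Biome))

# A transitive property with an explicit domain and range
g.add((EX.locatedIn, RDF.type, OWL.TransitiveProperty))
g.add((EX.locatedIn, RDFS.domain, EX.StudySite))
g.add((EX.locatedIn, RDFS.range, EX.Region))

print(g.serialize(format="turtle"))
```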
19
Simple Darwin Core (2013)
[Diagram: dwc:Occurrence, dwc:Event, dcterms:Location, dwc:Identification, dwc:Taxon, dwc:MaterialSample, linked by relations such as detected_during, happened_at, to_taxon, basis_for, documented_by, derived_from]
Can formalize in RDF: leads to greater clarity of how concepts are related; conversion to triple format can enable basic graph traversal.
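A minimal sketch of that formalization in rdflib: one occurrence linked to an event and a taxon, then traversed as a graph. The dwc: namespace is the real Darwin Core terms namespace, but the linking properties (detectedDuring, toTaxon) are placeholders mirroring the diagram rather than standard terms.

```python
# Sketch: a single occurrence record as RDF triples. The dwc: namespace is
# the Darwin Core terms namespace; the relation properties (detectedDuring,
# toTaxon) are placeholders mirroring the diagram, not standard terms.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

DWC = Namespace("http://rs.tdwg.org/dwc/terms/")
EX = Namespace("http://example.org/obs#")

g = Graph()
g.bind("dwc", DWC)
g.bind("ex", EX)

g.add((EX.occ1, RDF.type, DWC.Occurrence))
g.add((EX.event1, RDF.type, DWC.Event))
g.add((EX.taxon1, RDF.type, DWC.Taxon))

g.add((EX.occ1, EX.detectedDuring, EX.event1))   # occurrence detected during event
g.add((EX.occ1, EX.toTaxon, EX.taxon1))          # occurrence identified to taxon
g.add((EX.taxon1, DWC.scientificName, Literal("Quercus robur")))

# Basic graph traversal: which taxa were detected during event1?
for occ in g.subjects(EX.detectedDuring, EX.event1):
    for taxon in g.objects(occ, EX.toTaxon):
        print(g.value(taxon, DWC.scientificName))
```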
20
RDFS-based inferencing
Example: ENVO “tropical broadleaf forest biome”
BENEFITS:
–Enhanced searching along subsumption hierarchies (classes or properties)
–Formalized descriptions
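A minimal sketch of that benefit, assuming the owlrl package to materialize the RDFS closure over an rdflib graph; the toy hierarchy and the ENVO-like URIs are placeholders, not real ontology terms.

```python
# Sketch: RDFS inferencing so a search for "Biome" also finds instances of
# its subclasses. The owlrl package and the toy hierarchy are assumptions.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS
import owlrl

EX = Namespace("http://example.org/envo-like#")
g = Graph()

g.add((EX.TropicalBroadleafForestBiome, RDFS.subClassOf, EX.ForestBiome))
g.add((EX.ForestBiome, RDFS.subClassOf, EX.Biome))
g.add((EX.site42, RDF.type, EX.TropicalBroadleafForestBiome))

# Materialize the RDFS entailments (adds site42 rdf:type ForestBiome, Biome, ...)
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

# A plain search along the subsumption hierarchy now succeeds
for s in g.subjects(RDF.type, EX.Biome):
    print(s)  # e.g. http://example.org/envo-like#site42
```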
21
Observational Data Model
Implemented as an OWL-DL ontology
–Provides basic concepts for describing observations
–Specific “extension points” for domain-specific terms
[Class diagram: core concepts Observation, Entity, Characteristic, Measurement, Value, Standard, Protocol, Context]
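A minimal sketch of describing a single measurement against such a model; the obs: namespace and term names are placeholders standing in for the real ontology's URIs.

```python
# Sketch: one tree-height measurement described with observation-model
# concepts (Observation, Entity, Characteristic, Measurement, Standard).
# The obs: namespace and term names are placeholders, not the real ontology.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

OBS = Namespace("http://example.org/obsmodel#")
EX = Namespace("http://example.org/data#")

g = Graph()
g.bind("obs", OBS)

g.add((EX.obs1, RDF.type, OBS.Observation))
g.add((EX.obs1, OBS.ofEntity, EX.tree17))            # the observed entity
g.add((EX.tree17, RDF.type, EX.Tree))                # domain-specific extension point

g.add((EX.meas1, RDF.type, OBS.Measurement))
g.add((EX.obs1, OBS.hasMeasurement, EX.meas1))
g.add((EX.meas1, OBS.ofCharacteristic, EX.Height))   # what was measured
g.add((EX.meas1, OBS.usesStandard, EX.Meter))        # unit / standard
g.add((EX.meas1, OBS.hasValue, Literal(12.4, datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```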
22
Semantic annotation
[Figure: attribute mappings – dataset attributes bound to terms in the knowledge model]
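A minimal sketch of such an attribute mapping in plain Python: cryptic column names from a field data table are bound to the entity, characteristic, and unit they measure. The column names echo the acronyms on the earlier slide, but the expansions and labels here are illustrative assumptions.

```python
# Sketch: semantic annotation of a data table. Each column name is mapped to
# the observation-model components it measures. Names below are illustrative.
COLUMN_ANNOTATIONS = {
    "PLNTHT": {"entity": "Plant", "characteristic": "Height", "standard": "meter"},
    "LMA":    {"entity": "Leaf",  "characteristic": "MassPerArea", "standard": "gram per square meter"},
    "MAT":    {"entity": "Site",  "characteristic": "MeanAnnualTemperature", "standard": "degree Celsius"},
}

def describe(column: str) -> str:
    """Expand a cryptic column name into its annotated meaning."""
    a = COLUMN_ANNOTATIONS[column]
    return f"{column}: {a['characteristic']} of {a['entity']} in {a['standard']}"

for col in COLUMN_ANNOTATIONS:
    print(describe(col))
```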
23
Open Science
“Scientists should communicate the data they collect and the models they create, to allow free and open access, and in ways that are intelligible, assessable and usable for other specialists in the same or linked fields wherever they are in the world. Where data justify it, scientists should make them available in an appropriate data repository.”
– “Science as an open enterprise”, The Royal Society Science Policy Centre report 02/12
24
Why Open Science now?
–Technology is available to do it (Internet + Web + Semantics + FLOPS)
–Growing politicization of science: need for transparency
–Importance of large-scale/interdisciplinary science
–Efficiencies in re-using or sharing available data, code
–A return to the fundamental premise of science: objective, repeatable, transparent, general
25
Open Science:
–Open Data: repositories (e.g. NASA DAAC’s, NSF’s DataONE)
–Open Source: code and algorithms (e.g. Python, R)
–Open Access: journals (e.g. PLoS)
–Open Notebook: blog++ (e.g. iPython)
26
Open Data
Rapid, highly affordable access to ALL the data supporting scientific findings
27
Open Source
Easy, fast, low- to no-cost barriers to languages, code, libraries/packages, algorithms, and frameworks for accomplishing analyses
Multi-platform, scalable
28
Open Access (OA)
Rapid, highly affordable access to the latest scientific findings
Issues:
–Peer-review process
–Copyright (IP issues)
–Costs
29
The (sad) status quo
Researchers still struggling to…
–Discover relevant datasets
–Access and integrate these
…it’s getting more difficult as the volume, diversity and complexity of data increase
and Data Quality is always a concern!!!
30
Steps towards global-scale, Open Biodiversity Science?
–Aggregators (coordinated, international): service providers who assemble and harmonize distributed data for the scientific community
–Better services and interfaces:
  requires standardization of metadata and semantics
  overcome limitations of desktop tools/frameworks
  all FOSS
–Cross-scale, cross-data-type integration:
  genomic, organismal, observational/ecological, sensors, remote-sensing
–Must train researchers in the use of these new tools, data types, and frameworks!!