1 A Case Study in E-Science: Building Ecological Informatics Solutions for Multi-Decadal Research
  ARL/CNI 2008 Conference
  Washington, DC
  16 October 2008
2 Roadmap
  Why multi-decadal research?
    –A brief history of LTER
  Data/information challenges
  Ecological informatics
    –Current state-of-the-art
    –Future
Cyberinfrastructure: Informatics Across the Biological Sciences
4 Long-Term Research is Required to Reveal:
  Slow processes or transients
  Episodic or infrequent events
  Decadal trends
  Multi-factor responses
  Processes with major time lags
7 Roadmap
  Why multi-decadal research?
    –A brief history of LTER
  Data/information challenges
  Ecological informatics
    –Current state-of-the-art
    –Future
8 Data Dispersion
  Data are massively dispersed
    –Ecological field stations and research centers (100s)
    –Natural history museums and biocollection facilities (100s)
    –Agency data collections (100s to 1000s)
    –Individual scientists (1000s to 10,000s)
9 Data Entropy
[Figure: Information Content plotted against Time, starting at the time of publication; annotations along the curve: specific details, general details, accident, retirement or career change, death. (Michener et al. 1997)]
10 Data Integration
  Jones et al
  Data are heterogeneous
    –Syntax (format)
    –Schema (model)
    –Semantics (meaning)
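The three kinds of heterogeneity named on this slide can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two sources, their field names, the site codes, and the values are hypothetical and do not come from the talk. Source A is CSV in degrees Celsius; source B is JSON in degrees Fahrenheit with different field names.

```python
import csv
import io
import json

# Hypothetical source A: CSV text, temperature already in degrees Celsius.
SOURCE_A = "site,date,temp_c\nSEV,2008-10-16,21.5\n"

# Hypothetical source B: JSON text, different field names, degrees Fahrenheit.
SOURCE_B = '[{"station": "KNZ", "obs_date": "2008-10-16", "temperature_f": 68.0}]'


def read_a(text):
    """Syntax: parse CSV. Schema and semantics already match the target."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"site": r["site"], "date": r["date"], "temp_c": float(r["temp_c"])}
        for r in rows
    ]


def read_b(text):
    """Syntax: parse JSON. Schema: rename fields. Semantics: convert F to C."""
    records = json.loads(text)
    return [
        {
            "site": r["station"],
            "date": r["obs_date"],
            "temp_c": round((r["temperature_f"] - 32.0) * 5.0 / 9.0, 2),
        }
        for r in records
    ]


def integrate(a_text, b_text):
    """Combine both sources into one list of records with a shared schema."""
    return read_a(a_text) + read_b(b_text)


merged = integrate(SOURCE_A, SOURCE_B)
```

Each reader handles one source's format, field names, and units, so the merged records share a single schema and a single unit.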
11 Information and Storage
[Figure: Information versus Available Storage, in petabytes worldwide; one region is labeled "Transient information or unfilled demand for storage".]
Source: John Gantz, IDC Corporation: The Expanding Digital Universe
12 Roadmap
  Why multi-decadal research?
    –A brief history of LTER
  Data/information challenges
  Ecological informatics
    –Current state-of-the-art
    –Future
13 Ecological Informatics
  A discipline that incorporates both concepts and practical tools for the understanding, generation, processing, preservation and propagation of ecological data, information and knowledge.
14 Data Archives
15 Existing Tools Provide Needed Functionality:
16 Metacat Data Distribution
17 Roadmap
  Why multi-decadal research?
    –A brief history of LTER
  Data/information challenges
  Ecological informatics
    –Current state-of-the-art
    –Future
      Science
      Technology
      Sociocultural dimension
18 Global Change
  Smith, Knapp, Collins. In press.
19 Critical Areas in the Earth System
20 Knowledge Pyramid
[Figure: pyramid with arrows labeled "Decreasing Spatial Coverage" and "Increasing Process Knowledge"; tier labels: Remote sensing; Intensive science sites and experiments; Extensive science sites; Volunteer & education networks. Adapted from CENR-OSTP.]
21 Technology Directions
  CI enabling the science
  Whole-data-life-cycle
  Domain-agnostic solutions
22 Focus on CI that Enables the Science (end-to-end solutions)
  Discovery, access, and use
  Open access to holdings (and tools)
23 Support the Data Lifecycle
    –Reliable, replicated storage infrastructure
    –Interoperability across data centers
24 Metadata Interoperability Across Data Holdings
Examples of Data Holdings
Data Center | Types of Data Managed | Metadata Standard(s)
National Biological Information Infrastructure | Biodiversity, taxonomic, ecological | BDP, Dar, Dub, OGIS
Oak Ridge National Laboratory – Distributed Active Archive Center | Biogeochemical dynamics, terrestrial ecological Earth observation imagery | DIF, BDP, ECHO
Long Term Ecological Research Network | Ecological, biodiversity, biophysical, social, genomics, and taxonomic | EML
Avian Knowledge Network | Avian populations and molecular biology | Dub
Atlas of Living Australia (ALA) | Biological and taxonomic | Dub subset
South African Environmental Observatory Network (SAEON) | Biophysical, biodiversity, disturbance, and Earth observation imagery | EML
Taiwan Ecological Research Network (TERN) | Biodiversity, biotic structure, function/process, biogeochemical, climate, and hydrologic | EML
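One common way to approach metadata interoperability across holdings that use different standards is a crosswalk that maps each standard's elements onto a shared set of fields. The Python sketch below is a simplified illustration and not the method used by any of the data centers listed. It assumes "Dub" abbreviates Dublin Core, treats records as flat dictionaries (real EML and Dublin Core records are XML documents with many more elements), and uses hypothetical sample records.

```python
# Illustrative crosswalk: source element name -> common field name.
# Only a handful of elements are covered; this is an assumption-laden subset.
CROSSWALK = {
    "EML": {
        "title": "title",
        "creator": "creator",
        "pubDate": "date",
        "abstract": "description",
    },
    "Dub": {
        "title": "title",
        "creator": "creator",
        "date": "date",
        "description": "description",
    },
}


def to_common(record, standard):
    """Translate a flat metadata record into the common field names.

    Elements that the crosswalk does not cover are dropped.
    """
    mapping = CROSSWALK[standard]
    return {
        common: record[source]
        for source, common in mapping.items()
        if source in record
    }


# Hypothetical records, for illustration only.
eml_record = {"title": "Plant cover", "creator": "A. Researcher", "pubDate": "2008"}
dub_record = {"title": "Bird counts", "creator": "B. Observer", "date": "2007"}

catalog = [to_common(eml_record, "EML"), to_common(dub_record, "Dub")]
```

Once both records use the same field names, a single search over `catalog` can span holdings that were described with different standards.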
25 Data Interoperability: Ontologies and Semantic Mediation
26 Domain-Agnostic Solutions
  DISCIPLINE: major branch of knowledge or learning (Science, Humanities, Social Science)
  DOMAINS: top-level designation for areas of study within a discipline (Earth & Space, Life, Physical, Engineering)
  Domain agnostic: practice or tool that crosses domains
27 Kilo Nalu Workflow
  Streaming data from observatory
  DataTurbine Server
  Graphs and derived data can be archived and displayed
  Support application scripts in R, Matlab, etc.
      # Build a POSIXct value for the Unix epoch
      now <- Sys.time()
      Epoch <- now - as.numeric(now)
      # Convert numeric timestamps (seconds since the epoch) to date-times
      timeval <- Epoch + timestamps
      # Median observation time (as a number) and mean of the data values
      posixtmedian = median(timeval)
      mediantime = as.numeric(posixtmedian)
      meantemp = mean(data)
  Modular components, easily saved and shared
  Publish to workflow repository with accession number
  Documents the linkage between publication, analysis, and data
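The R fragment on this slide reduces a window of streamed readings to a median observation time and a mean temperature. A rough Python restatement of that step is sketched below. It assumes the timestamps arrive as seconds since the Unix epoch, which is what the R code's `Epoch + timestamps` suggests; the function name `summarize_window` and the sample values are hypothetical.

```python
from datetime import datetime, timezone
from statistics import mean, median


def summarize_window(timestamps, values):
    """Summarize one window of sensor readings.

    timestamps: observation times as seconds since the Unix epoch (assumed).
    values: the matching readings, e.g. temperatures.
    Returns (median time in epoch seconds, median time as a UTC datetime,
    mean of the values).
    """
    median_seconds = median(timestamps)
    median_time = datetime.fromtimestamp(median_seconds, tz=timezone.utc)
    return median_seconds, median_time, mean(values)


# Hypothetical one-minute readings starting at 2008-10-16 00:00:00 UTC.
sample_times = [1224115200.0, 1224115260.0, 1224115320.0]
sample_temps = [24.0, 24.5, 25.0]

median_seconds, median_time, mean_temp = summarize_window(sample_times, sample_temps)
```

Keeping the step as a small function with explicit inputs mirrors the slide's point about modular components that can be saved and shared.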
28 Kepler Use Cases Represent Many Science Domains
  Ecology
    –SEEK: Ecological Niche Modeling
    –REAP: environmental sensor networks
    –NEON: Ecological sensor networks
  Molecular biology
    –SDM: Gene promoter identification
    –ChIP-chip: genome research
    –CAMERA: metagenomics
  Oceanography
    –REAP: SST data processing
    –LOOKING: ocean observing CI
    –ROADNet: real-time modeling
    –Ocean Life project
  Physics
    –CPES: Plasma fusion simulation
    –FermiLab: particle physics
  Chemistry
    –Resurgence: Computational chemistry
    –DART (X-Ray crystallography)
  Library science
    –DIGARCH: Digital preservation
    –Cheshire digital library: archival
  Conservation biology
    –SanParks: Thresholds of Potential Concerns
  Geosciences
    –GEON: LiDAR data processing
    –GEON: Geological data integration
29 Workflow Sharing Portal
30 Sociocultural Directions
  Education and training
  Engaging citizens in science
  Building global communities of practice
[Image: mound built by cathedral termites]
31 Experiential, Career-long Education and Training
32 Citizen Science
33 Building Global Science Communities of Practice via CI
34 …a wide range of partnering organizations
  Libraries & digital libraries
  Academic institutions
  Research networks
  NSF- and government-funded synthesis & supercomputer centers/networks
  Governmental organizations
  International organizations
  Data and metadata archives
  Professional societies
  NGOs
  Commercial sector
35 Longevity of CI Enterprises
  Broad, active community engagement
    –Involvement of library and science educators engaging new generations of students in best practices
    –Existing outreach and education programs
  Transparent, participatory governance
  Adoption/creation of sustainable business models
  Strong organizational sustainability
36 University of New Mexico Academic CI Planning for the 21st Century
[Figure: tiered diagram; the recoverable labels are listed below in their original order.]
  Panel labels: Research Services, Cyberinfrastructure, Academics
  SPECIALIZED: [FEW USERS – e.g., Econ. Dev.] Massively Parallel Systems, Specialized Codes
  ADVANCED: [MODERATE USE – e.g., Research] HPC clusters, Community Codes, Viz Tools
  BASIC: [UBIQUITOUS USE] Campus Data Networking, Video-conferencing, Data Archive (UNM, Branch Campuses, NM Institutions of Higher Learning, etc.)
  Help-Desk Support, Databases, Collections, Digital Archive, Collaboration Technologies, etc.
  Visualization & Analytical Tools
  HPC & Natl Networks: Large cycles and High Bandwidth
  SPECIALIZED: Fundamental research
  ADVANCED: Undergraduate and Graduate Programs (in Computing, Library, and Cognitive Sciences)
  BASIC: University-wide informatics courses (e.g., creation of an Information Sciences Program (ISP Certificate))
  Building a Computational- and Information-Literate Work Force (i.e., evolving a School of Computing, Information & Library Science)
  Libraries, HPC Centers, etc. – Preserving, Protecting, Processing, and Disseminating Data, Information, and Knowledge
  Serving UNM, Academia, the State, and Economic Development in NM
37 Thanks!
  Suzie Allard – University of Tennessee
  Matt Jones – University of California Santa Barbara
  Mike Frame – USGS, National Biological Information Infrastructure
  Patricia Cruse – California Digital Library
  Bob Cook – Oak Ridge National Laboratory DAAC
  Steve Kelling – Cornell Lab of Ornithology
  DataNetONE Partners & Kepler-CORE Team