EarthCube Transforming the Geosciences UCGIS Symposium - George Mason U: May 23, 2013 A Joint Venture of the NSF Directorate of Geosciences and Office of Cyberinfrastructure
Big Questions, Big Problems!! geohazards climate change life as a geologic agent formation & evolution of the atmosphere & oceans environmental change & resilience extreme events – causes, periodicity, & implications future world the origin of life resource discovery & abundance human-earth interactions continental evolution & changes thru time deep – surface earth Interactions & feedbacks
YOU ARE HERE! Community: Crazy, Complicated, Fascinating Present Relative State of Cyber-Sophistication and Knowledge in the Geosciences Atmospheric and Climate Science Communities Seismology/Earthquake and Physical Oceanography Communities Nearly all other Geoscience groups Geospatial/Cyberinfrastructure Communities Age of EnlightenmentIndustrial AgeModern Age Bronze Age I am here the 15% the 85%
The 85% spend about 80% of their time looking for, collecting, and getting the necessary data together in a format they can use and about 20% of their time actually thinking/doing science Read It and Weep The 15% spend an increasing amount of time having problems wrestling with unmanageably large data arrays and problems scaling from global to regional or local scales Neither are well integrated with each other and both types of data and types of geoscience disciplines are required to solve the complex, inter-related, and pressing environmental problems we and the earth are facing
Two very different levels of investment HPC, big iron, federal archives, modeling centers, data repositories, dedicated personnel and facilities Excel spreadsheets, hero code, dark data, cultural issues, no sustainability Two very different relationships with data Array-based: No personal ownership, don’t care about any given data point, computationally intensive processing and modeling Point-based: intense personal ownership, care deeply about each point, can interpret directly or simply Two very different types of data sensor, bit-stream, real-time: GB/TB size (satellite, radar, seismic) point-based, observations, images, multi informational, hard to describe The Problem (the 15% vs the 85%)
Software Analytics Modeling Communities Visualization Interoperability Multi-disciplinary & multi-scale integration The Geosciences: Diverse Communities, Data Types, Cultures, and Levels of Cyber Sophistication
Our Biggest Present Problem
Dynamic Earth Changing Climate Earth & Life Geosphere- Biospheric Connection Water: Changing Perspectives Transform the conduct of data-enabled geoscience- related research. Create effective community-driven cyberinfrastructure. Allow global data discovery and knowledge management. Achieve interoperability and data integration across disciplines. Transform the conduct of data-enabled geoscience- related research. Create effective community-driven cyberinfrastructure. Allow global data discovery and knowledge management. Achieve interoperability and data integration across disciplines. What Is EarthCube?
Atmosphere Chemistry Climate Dynamics Paleo- climate Meteor- ology Aeronomy Cyber Computer Science Geodetics Space Physics Solar Terrestial Geo- chemistry Tectonics Structure Earth Science Education Earth Science Education Polar Programs NCAR Geophysics EarthScope Tectonics Structure. Tectonics Structure. Geobiology Biological Oceano- graphy Geomorph- ology Hydrology Sediment- ology Marine Geophysics Physical Ocaeno- graphy Ocean Drilling Chemical Oceano- graphy Marine Geology Ocean Education HPC, super computing Biology Glaciology Ecosystem Geospatial Data manage- ment Software Engineering EarthCube CI Who Is EarthCube? You Are!!!
An alternative approach to respond to daunting science and CI challenges EarthCube is an outcome AND a process EarthCube will require broad community involvement; new ways of doing Path to the Vision Unidata IRIS IEDA NCAR OOI CUASHI Important Features: Builds off existing data/modeling systems/cyberinfrastructure investments Provides tools/approaches that enhance data discovery, access, and integration Addresses serious cyber needs in fields where individual data points and observations are important Leverages investments across fields Allows for more integrative and interdisciplinary science Important Features: Builds off existing data/modeling systems/cyberinfrastructure investments Provides tools/approaches that enhance data discovery, access, and integration Addresses serious cyber needs in fields where individual data points and observations are important Leverages investments across fields Allows for more integrative and interdisciplinary science
Convergence Using Spiral Development 10 Years Given: Technology improves and changes over time. Result: EarthCube being designed in a step-wise, modular fashion to accommodate change and allow refreshing over time. Given: Technology improves and changes over time. Result: EarthCube being designed in a step-wise, modular fashion to accommodate change and allow refreshing over time.
Timeline Release of umbrella Solicitation w/ 1 st Amendment Nov 2012 May 2013 GEO End-User Workshops Phase 1 FY 2014-FY 2016 (cycle repeats) Proto-Gov & EC-RCN Awards – 1 st Amend Deadline of 1 st & release of 2 nd Amendment Feb 2013 Oct Mar 2013 Jun 2013 Building Blocks & Concept Design Architecture Awards – 2 nd Amend Sept 2013 Community Meeting Release of 3 rd Amendment Nov 2013 End-User Workshops Phase 2
Feel Our Pain! help me!
Seven Modes of Failure Unrealistic or misaligned expectations among people presently involved in EarthCube “Build it and they will come” mindset – users don’t show up, data is not shared, etc. Not valuing what presently exists – current cyber/geo science efforts and initiatives that represent parts of the EarthCube vision Not advancing the frontier in transformative ways relative to what presently exists – only automating the current state Not engaging the 120,000+ geoscience and cyber stakeholders not presently involved in EarthCube Not anticipating the needs of the next generation of geoscience and cyber stakeholders (todays doctoral students and post docs, as well as the generation behind them) “Unknown Unknowns” – additional unknown unknowns including transformational changes in the technology, catastrophic shifts in the policy arena, etc.
Barriers to Progress Lack of cyber-readiness for some; and lack of unawareness of tools and approaches that could speed discovery and analysis from those other than “the usual suspects” Interoperability of disparate data types and formats; bringing dark data to light and allowing “power processing” Need for automation and smart tools to create metadata and facilitate direct lab/lab notebook data delivery to data systems in the appropriate format for ingestion Need for vastly improved handling of “big data” and ability to extract the needed information that may only be a tiny part of the whole dataset Overcoming cultural and semantic barriers between cyber/computer scientists and geoscientists to allow acceleration of development and identification of user needs Anticipating the needs of the next generation of geoscientists and questions/models focusing on more realistically simulating complex natural systems “Unknown Unknowns” including extensibility into transformational changes in the technology, catastrophic shifts in the policy, etc.
Now: Imagine a world with easy, unlimited access to scientific data from any field. Imagine a world where anyone can easily plot data of interest and display it any way they want. Imagine a world with where people can easily model their results and explore any ideas they might have. Blue-Skying the Future What science could they do? What discoveries could you help them make?