1 Cyberinfrastructure Summer Institute for Geoscientists August 14-18, 2006 San Diego Supercomputer Center
2 WELCOME !
3 Acknowledgements Instructors –Prof. Ramon Arrowsmith, Arizona State –Dr. Steve Cutchin, SDSC –Efrat Jaeger, SDSC/GEON –Prof. Randy Keller, University of Oklahoma –Dr. Kai Lin, SDSC/GEON –Prof. Bertram Ludaescher, UC Davis –Dr. Charles Meertens, UNAVCO –Ashraf Memon, SDSC/GEON –Prof. Krishna Sinha, VaTech –Dr. David Valentine, SDSC –Nancy Wilkins-Diehr, SDSC
4 Acknowledgements GEON Team at SDSC –Margaret Banton –Sandeep Chandra –Ghulam Memon –Vishu Nandigam –Dogan Seber –Nancy White –Choonhan Youn Synthesis Center Staff –Linda Ferri –John Moreland
5 Acknowledgements Others at SDSC –Ilkay Altintas –Jeff Filliez –Nancy Jensen –Matt Kullberg –Emilio Valente –Peggy Wagner NSF – CSIG is funded as a supplement to GEON
6 Schedule Monday – Introduction to Cyberinfrastructure, Data Integration, and Web Services Tuesday – Web Services, GIS Wednesday – GIS, Knowledge Representation Thursday – Workflow Systems Friday – Path Forward: Integration scenarios, Synthesis Center, TeraGrid Science Gateways
7 LOGISTICS Webcasting and video archives Machine userid/password –Userid: 279user –Password: 279Class
8 INTRODUCTIONS !
9 Distributed Systems for Geoinformatics What is the need?
10 Geoinformatics Ref: David Lambert, NSF EAR/GEO Presentation at GEON Annual Meeting, 2005
12 Role of Cyberinfrastructure in Geoinformatics Cyberinfrastructure GEON
13 What is Cyberinfrastructure? From NSF’s Cyberinfrastructure Vision for 21st Century Discovery, July 20, 2006, www.nsf.gov/od/oci/ci-v7.pdf: “The comprehensive infrastructure needed to capitalize on dramatic advances in information technology has been termed cyberinfrastructure. Cyberinfrastructure integrates hardware for computing, data and networks, digitally-enabled sensors, observatories and experimental facilities, and an interoperable suite of software and middleware services and tools. Investments in interdisciplinary teams and cyberinfrastructure professionals with expertise in algorithm development, system operations, and applications development are also essential to exploit the full power of cyberinfrastructure to create, disseminate, and preserve scientific data, information, and knowledge…” From p. 40 of the report: “In 1999, the PITAC released the seminal report ITR-Investing in our Future, prompting new and complementary NSF investments in CI projects, such as the Grid Physics Network (GriPhyN) and international Virtual Data Grid Laboratory (iVDGL) and the Geosciences Network, known as GEON.”
14 CI-TEAM: CI Training, Education, Advancement, and Mentoring
15 Integrated Cyberinfrastructure System Source: Dr. Deborah Crawford, Chair, NSF CI Working Committee Applications (Geosciences, Environmental Sciences, Neurosciences, High Energy Physics, …): Domain-specific Cybertools (software) Development Tools & Libraries, Middleware Services: Shared Cybertools (software) Hardware: Distributed Resources (computation, storage, communication, etc.) Education and Training Discovery & Innovation
16 Data, Tools, & Computation Data –Field observations –Laboratory analyses –Sensor-based data (land, airborne, satellite) Tools –QA/QC, simple transformations and analyses –Complex models Computation –Community codes –Access to high-performance computing –Data Intensive Computing
17 Variety of Geoinformatics Efforts Data collection –Digital data collection in the field –“When does it become cyberinfrastructure?” Database curation –E.g. EarthChem, Paleobiology, MorphoBank, Paleo Pollen, etc. –When does it become “tools” and “community codes”? Software Development –Tools: gravity and magnetics, paleogeography, geochemistry, seismic data products, … –Community codes: SCEC-CME, CIG, …
18 Variety of Geoinformatics Efforts High Performance Computing –LiDAR data management –Seismic analyses –Petascale initiative Data Integration –E.g. CUAHSI HIS –Also, a pressing need in projects like EarthScope
19 Cyberinfrastructure To provide access to all of these “resources” and support “interoperability” among them Cyberinfrastructure: The Common Platform Across Distributed Projects Data Collection Data Management And Curation Tool Development Modeling and Integration
20 Example: USArray Data Flow Deploy field sensor arrays –Across US Collect data from sensor arrays and perform QA/QC –One of the sites is SIO, San Diego Archive data for community access –IRIS, Seattle EarthScope/USArray: Single project, multiple participants.
21 Example: LiDAR Workflow Survey → Point Cloud (x, y, z, …) → Interpolate / Grid → Analyze / “Do Science” Single goal: Multiple projects, multiple participants, e.g. NCALM, GEON, ASU, NASA, USGS, … D. Harding, NASA Courtesy: Chris Crosby, ASU
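The Interpolate / Grid step of the LiDAR workflow can be illustrated with a minimal sketch. It assumes the simplest possible gridding rule: each (x, y, z) return is binned into a square cell and the cell takes the mean elevation. The function name grid_point_cloud, the cell_size parameter, and the sample points are hypothetical and are not taken from the GEON or NCALM tools, which may use other interpolation methods.

```python
from collections import defaultdict
import math


def grid_point_cloud(points, cell_size):
    """Bin (x, y, z) points into square cells; return mean z per cell.

    The result maps (col, row) cell indices to the mean elevation of the
    points that fall in that cell. Cells with no points are absent.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, z in points:
        # Integer cell index from the horizontal coordinates.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        sums[cell] += z
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical sample returns (not real survey data).
points = [
    (0.2, 0.3, 10.0),
    (0.8, 0.6, 12.0),
    (1.5, 0.4, 20.0),
    (1.1, 1.9, 30.0),
]
dem = grid_point_cloud(points, cell_size=1.0)
```

Mean-per-cell binning is chosen here only because it is short and easy to check by hand; it leaves empty cells unfilled, which a real interpolation step would have to address.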
22 The CI Challenge Support multiple science goals, each requiring access to and “integration” of resources from multiple projects and involving multiple participants and partners Distributed Systems Interoperability And creation of “Virtual Organizations”…
24 Community Cyberinfrastructure Projects Middleware Services Development Tools & Libraries Distributed Computing, Instruments and Data Resources Friendly Work-Facilitating Portals Authentication - Authorization - Auditing - Workflows - Visualization - Analysis Biomedical Informatics (BIRN) High Energy Physics (GriPhyN) Geosciences (GEON) Ecological Observatories (NEON) Earthquake Engineering (NEES) Ocean Observing (ORION) Hardware Adapted from: Prof. Mark Ellisman, UC San Diego Shared Tools Science Domains Your Specific Tools & User Apps.
25 GEON Cyberinfrastructure Funded by NSF IT Research program Multi-institution collaboration between IT and Earth Science researchers GEON Cyberinfrastructure provides: –Authenticated access to data and Web services –Registration of data sets, tools, and services with metadata –Search for data, tools, and services, using ontologies –Scientific workflow environment and access to HPC –Data and map integration capability –Scientific data visualization and GIS mapping
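The idea of registering resources with metadata and then searching them "using ontologies" can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the NARROWER term hierarchy, the REGISTRY entries, and the expand and search functions are hypothetical and do not represent the GEON portal's actual data model or API.

```python
# Hypothetical toy ontology: each term maps to its narrower terms.
NARROWER = {
    "rock": ["igneous rock", "sedimentary rock"],
    "igneous rock": ["granite", "basalt"],
    "sedimentary rock": ["sandstone"],
}

# Hypothetical registry: each resource is registered with keyword metadata.
REGISTRY = [
    {"name": "Granite geochemistry table", "keywords": ["granite", "geochemistry"]},
    {"name": "Gravity anomaly grid", "keywords": ["gravity"]},
    {"name": "Sandstone porosity samples", "keywords": ["sandstone"]},
]


def expand(term):
    """Return the term plus every narrower term reachable in the ontology."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(NARROWER.get(current, []))
    return seen


def search(registry, term):
    """Return names of resources whose keywords match the expanded term set."""
    terms = expand(term)
    return [r["name"] for r in registry if terms & set(r["keywords"])]


hits = search(REGISTRY, "igneous rock")
broad_hits = search(REGISTRY, "rock")
```

A plain keyword search for "igneous rock" would find nothing in this registry; expanding the query through the ontology is what lets the granite data set match.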
26 Key Informatics Areas Portals –Authenticated, role-based access to cyber resources: data, tools, models, model outputs, collaboration spaces, … Data Integration –Search, discovery and integration of data from heterogeneous information sources (“mediation” and “semantic integration”) Use of workflow systems, and access to HPC –Ability to “program” at a higher level of abstraction –Sharing of models, along with “provenance” information –Gateways to HPC environments Management of Geospatial Information –Using GIS capabilities, map services, geospatial data integration Visualization of 3D, 4D geospatial data and information