CyberInfrastructure to Support Scientific Exploration and Collaboration
Dennis Gannon (based on work with many collaborators, most notably Beth Plale)
School of Informatics, Indiana University



Overview
– CyberInfrastructure for virtual organizations in computational science: science portals and the gateway concept.
– Automating the trail from data to scientific discovery: an example in depth, mesoscale storm prediction.
– The challenge for the individual researcher: connecting tools to on-line services.
– The promise of the future: multicore, personal petabytes, and gigabit bandwidth.

The Realities of Science in the U.S.
“Big Science” dominates the funding hierarchy.
– Why? It's important and easy to sell to Congress.
The NSF is investing in a vast network of supercomputers to support big science.
– The results are empowering a broad range of scientific communities.
Where is the single investigator?
– The Web has enabled the democratization of information access.
– Is there a similar path for access to advanced computational resources?

Democratizing Access to Science
What is needed for the individual or small team to do large-scale science?
– Access to data and the tools to analyze and transform it.
– A means to publish not just the results of a study but a way to share the path to discovery.
Where are the resources?
– What we have now: TeraGrid.
– What is emerging?

The TeraGrid
The U.S. national supercomputer grid: cyberinfrastructure composed of a set of compute and data resources that provide common services for
– wide-area data management,
– single sign-on user authentication,
– distributed job scheduling and management (in the works).
Collectively:
– 1 petaflop, 20 petabytes; soon to triple, adding a petaflop each year.
– But at a slower rate than Google, eBay, and Amazon add resources.

TeraGrid Wide: Science Gateways
Science portals:
– A portal = a web-based home + personal workspace + personal tools.
– Web portal technology + Grid middleware.
Enables a community of researchers to:
– access shared resources (both data and computation),
– collaborate on shared problem solving.
TeraGrid Science Gateways allow the TeraGrid to be the back-end resource.

NEESGrid: real-time access to earthquake shake-table experiments at remote sites.

BIRN – Biomedical Informatics Research Network

Geological Information Grid Portal

Renci Bio Portal
Provides access to biotechnology tools running on a back-end Grid:
– leverages a state-wide investment in bioinformatics,
– supports undergraduate and graduate education and faculty research,
– another portal coming soon: the National Evolutionary Synthesis Center.

Nanohub - nanotechnology

X-Ray Crystallography

ServoGrid Portal

The LEAD Project

Predicting Storms
Hurricanes and tornadoes cause massive loss of life and damage to property.
The underlying physical systems involve highly non-linear dynamics, so prediction is computationally intense.
Data comes from multiple sources:
– “real time,” derived from streams of sensor data,
– archived in databases of past storms.
Infrastructure challenges:
– Data-mine instrument radar data for storms.
– Allocate supercomputer resources automatically to run forecast simulations.
– Monitor results and retarget instruments.
– Log provenance and metadata about experiments for auditing.

Traditional Methodology (diagram): static observations (radar data, mobile mesonets, surface observations, upper-air balloons, commercial aircraft, geostationary and polar-orbiting satellites, wind profilers, GPS satellites) feed analysis/assimilation (quality control, retrieval of unobserved quantities, creation of gridded fields), then prediction/detection (PCs to teraflop systems), then product generation, display, and dissemination to end users (NWS, private companies, students). The process is entirely serial and static (pre-scheduled): no response to the weather!

The LEAD Vision: enabling a new paradigm of scientific exploration (diagram): the same pipeline, but driven by dynamic observations, with models and algorithms driving the sensors. The CS challenge:
– Build cyberinfrastructure services that provide adaptability, scalability, availability, and usability.
– Create a new paradigm of meteorology research.

Building Experiments that Respond to the Future
Can we pose a scientific search-and-discovery query that the cyberinfrastructure executes as our agent?
In the LEAD case it is data driven, persistent, and agile:
– Weather data streams define the nature of the computation.
– Mine the data streams and detect “interesting” features; an event triggers a workflow scenario that has been waiting for months.
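The triggering idea above can be sketched as a small loop: a detector watches an incoming stream, and a pre-registered workflow fires only when an interesting feature appears. All names and the reflectivity threshold are illustrative assumptions, not the actual LEAD APIs.

```python
# Hypothetical sketch of data-driven triggering: a detector scans
# observation records and fires a stored workflow on detection.

def detect_feature(record):
    """Toy detector: flag records whose radar reflectivity is storm-like."""
    return record.get("reflectivity_dbz", 0) >= 50  # threshold is assumed

def run_workflow(name, trigger_record):
    """Stand-in for launching a pre-registered forecast workflow."""
    return f"launched {name} for region {trigger_record['region']}"

def monitor(stream, workflow_name):
    """Consume a data stream; trigger the waiting workflow on detection."""
    launches = []
    for record in stream:
        if detect_feature(record):
            launches.append(run_workflow(workflow_name, record))
    return launches

stream = [
    {"region": "OK-01", "reflectivity_dbz": 35},
    {"region": "OK-02", "reflectivity_dbz": 55},  # storm signature
]
print(monitor(stream, "wrf-forecast"))  # only the second record triggers
```

In a real deployment the detector would be a data-mining service over live radar streams; the point here is only that the workflow waits, and the data decides when it runs.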

The LEAD Gateway Portal
Supports three classes of users:
– meteorology research scientists and grad students,
– undergraduates in meteorology classes,
– people who want easy access to weather data.
Go to:

Gateway Components
A framework for discovery, with four basic components:
– Data discovery: catalogs and index services.
– The experiment: computational workflow managing on-demand resources.
– Data analysis and visualization.
– Data product preservation: automatic metadata generation and experimental data provenance.

Data Search: select a region, a time range, and the desired attributes.

Building Experiments
As users interact with the portal they are creating “experiments.”
An experiment is:
– a collection of data (or desired data),
– a set of analysis, transformation, or prediction tasks, defined by a workflow or a high-level query,
– a provenance document that encodes a repeatable history of the experiment.
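The three parts of an experiment described above can be sketched as one record that accumulates provenance as its tasks run. The field names are assumptions for illustration, not the LEAD schema.

```python
# Illustrative sketch of an "experiment": inputs, tasks, and a
# provenance log that makes the run repeatable.

from dataclasses import dataclass, field

@dataclass
class Experiment:
    name: str
    inputs: list       # data (or desired data) identifiers
    tasks: list        # workflow steps, in order
    provenance: list = field(default_factory=list)

    def record(self, task, **details):
        """Append one provenance entry as a task runs."""
        self.provenance.append({"task": task, **details})

exp = Experiment(
    name="spring-forecast",
    inputs=["lead:data/radar/level2"],
    tasks=["assimilate", "forecast", "visualize"],
)
for t in exp.tasks:
    exp.record(t, status="ok")
print(len(exp.provenance))  # one provenance entry per completed task
```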

Portal: Experimental Data & Metadata Space
CyberInfrastructure extends the user's desktop to incorporate a vast data-analysis space.
As users go about doing scientific experiments, the CI manages back-end storage and compute resources.
– The portal provides ways to explore, search, and discover this data.
Metadata about experiments is largely automatically generated and highly searchable.
– It describes the data object (the file) in application-rich terms, and provides a URI to a data service that can resolve an abstract unique identifier to a real, on-line data “file.”
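The separation just described can be sketched as two operations over a catalog: searching application-rich metadata, and resolving an abstract identifier to a concrete location. The catalog layout, identifiers, and URL are hypothetical, not the myLEAD schema.

```python
# Hypothetical catalog: abstract identifiers map to searchable metadata
# plus a resolvable on-line location.

CATALOG = {
    "lead:uid/forecast-0042": {
        "metadata": {"model": "WRF", "region": "Oklahoma", "grid_km": 2},
        "location": "https://data.example.org/store/forecast-0042.nc",
    }
}

def resolve(uid):
    """Map an abstract unique identifier to a real, on-line data file."""
    return CATALOG[uid]["location"]

def search(**terms):
    """Find identifiers whose application-rich metadata matches all terms."""
    return [uid for uid, entry in CATALOG.items()
            if all(entry["metadata"].get(k) == v for k, v in terms.items())]

print(search(model="WRF", grid_km=2))  # finds the forecast by its metadata
print(resolve("lead:uid/forecast-0042"))
```

The design point is that users never handle file paths: they search in the vocabulary of their science, and the data service turns the identifier into bytes.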

Typical weather forecast run as a workflow (diagram): ~400 data products are consumed, produced, and transformed during the workflow lifecycle.
– Pre-processing: arpssfc, arpstrn, ext2arps-ibc, ext2arps-lbc (terrain and surface data files; ETA, RUC, GFS data).
– Assimilation: 88d2arps, mci2arps, nids2arps feeding ADAS assimilation (level II and level III radar data; satellite data; surface, upper-air, mesonet, and wind-profiler data).
– Forecast: arps2wrf, WRF, wrf2arps.
– Visualization: arpsplot, IDV.
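A workflow like the one above is a dependency graph, and an engine runs each step once its inputs are ready. The step names follow the slide; the scheduling shown is a generic topological sort (Python's standard `graphlib`), not the LEAD workflow engine.

```python
# Minimal sketch: order forecast-workflow steps so that every step runs
# after the steps it depends on.

from graphlib import TopologicalSorter

# each step maps to the set of steps it depends on
deps = {
    "ADAS-assimilation": {"88d2arps", "mci2arps", "nids2arps"},
    "arps2wrf": {"ADAS-assimilation"},
    "WRF": {"arps2wrf"},
    "wrf2arps": {"WRF"},
    "arpsplot": {"wrf2arps"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # radar/satellite decoders first, visualization last
```

A real engine would additionally stage the ~400 data products between steps and dispatch each step to on-demand compute resources; the graph ordering is the common core.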

The Experiment Builder
A portal “wizard” that leads the user through the set-up of a workflow. It asks the user:
– “Which workflow do you want to run?”
Once this is known, it can prompt the user for the required input data sources; then it “launches” the workflow.

Parameter Selection

Selecting the forecast region

Experience So Far
First release supported “WxChallenge: the new collegiate weather forecast challenge.”
– The goal: “forecast the maximum and minimum temperatures, precipitation, and maximum sustained wind speeds for select U.S. cities,” and “to provide students with an opportunity to compete against their peers and faculty meteorologists at 64 institutions for honors as the top weather forecaster in the nation.”
– 79 users ran 1,232 forecast workflows, generating 2.6 TB of data. Over 160 processors were reserved on Tungsten from 10am to 8pm EDT/EST, five days each week.
National spring forecast:
– First use of user-initiated 2 km forecasts as part of that program; generated serious interest from the National Severe Storm Center.
Integration with the CASA project is scheduled for the final year of the LEAD ITR.

Is TeraGrid the Only Enabler?
The web has evolved a set of information and service “super nodes”:
– directories and indexes (Google, Microsoft, Yahoo),
– transactional mosh pits (eBay, Facebook, Wikipedia),
– raw data and compute services (Amazon …).
Can we build the tools for scientific discovery on this “private sector” grid?
– Yes: one CS student + one bioinformatician + Amazon Storage Service + Amazon Compute Cloud = …

A Virtual Lab for Evolutionary Genomics
– Data and databases live on S3.
– Computational tools run on demand as services on EC2.
– The user composes workflows.
– Result data and metadata are visible to the user through a desktop client.

Validating Scientific Discovery
The gateway is becoming part of the process of science by being an active repository of data provenance.
The system records each computational experiment that a user initiates:
– a complete audit trail of the experiment or computation,
– published results can include a link to provenance information, for repeatability and transparency.
The scientific method is all about repeatability of experiments. Are we there yet?

Almost
The provenance contains the workflow, and if we publish it, it can be re-run.
– Are the same resources still available? Not a necessary condition for validation.
– Has the data changed?
Another user can modify it:
– replace an analysis step with another,
– test it on different data.
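Replaying a published provenance record with substitutions, as described above, can be sketched as a small function: re-run the recorded steps, optionally swapping one analysis step or the input data. The record format and names are illustrative assumptions.

```python
# Hedged sketch of re-running a recorded workflow from its provenance,
# allowing another user to substitute a step or the data.

def rerun(provenance, replace_step=None, new_inputs=None):
    """Re-execute a recorded workflow plan, with optional substitutions."""
    steps = list(provenance["steps"])
    if replace_step:
        old, new = replace_step
        steps = [new if s == old else s for s in steps]
    inputs = new_inputs if new_inputs is not None else provenance["inputs"]
    return {"inputs": inputs, "steps": steps}

prov = {"inputs": ["radar/2007-05-29"], "steps": ["assimilate", "forecast"]}

# validate on different data, with an alternative assimilation step
replay = rerun(prov,
               replace_step=("assimilate", "assimilate-v2"),
               new_inputs=["radar/2007-06-01"])
print(replay["steps"])  # ['assimilate-v2', 'forecast']
```

An unmodified `rerun(prov)` reproduces the original plan, which is the repeatability claim; the substitutions are what turn a published result into a testable one.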

The Future Experimental Testbed
In five years: multicore, personal petabytes, and ubiquitous gigabit bandwidth.
– Much richer experimental capability on the desktop; more of the computational work can be downloaded.
Do we no longer need the massive remote data/compute center?
– Demand scales with capability. But there is more.

Last Thought
Vastly improved capability for interactive experimentation:
– data exploration and visualization, interacting with hundreds of incoming data streams,
– tracking our path and exploring 100 possible experimental scenarios concurrently,
– deep search agents: discovering new data and new tools; grabbing data, automatically fetching and analyzing its provenance, and setting up the workflow to be re-run.

Questions

The Realization in Software (architecture diagram): a user portal backed by a portal server; a data catalog service; the myLEAD user metadata catalog and myLEAD agent service; a data management service over data storage; a workflow engine executing the workflow graph of application services on the compute engine; a provenance collection service; a fault-tolerance and scheduling component; all connected by an event notification bus.
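The event notification bus that ties these services together can be sketched as a simple publish/subscribe pattern: services subscribe to topics and react to events other services publish. The real system used wide-area service messaging; this in-process version, with invented topic and service names, only illustrates the pattern.

```python
# Minimal publish/subscribe sketch of an event notification bus
# connecting loosely coupled services.

from collections import defaultdict

class NotificationBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = NotificationBus()
log = []
# e.g. the metadata catalog and the provenance collector both react
# to the same task-completion event, without knowing about each other
bus.subscribe("task.completed", lambda e: log.append(("metadata", e)))
bus.subscribe("task.completed", lambda e: log.append(("provenance", e)))
bus.publish("task.completed", {"task": "forecast", "status": "ok"})
print(len(log))  # 2: both subscribers received the event
```

Decoupling through the bus is what lets the fault-tolerance, provenance, and metadata services observe the same workflow without the workflow engine calling each of them directly.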