May 29, 2007 Dynamically Adaptive Weather Analysis and Forecasting in LEAD: Issues in Data Management, Metadata, and Search Beth Plale Director, Center.

Presentation transcript:

May 29, 2007 Dynamically Adaptive Weather Analysis and Forecasting in LEAD: Issues in Data Management, Metadata, and Search Beth Plale Director, Center for Data and Search Informatics School of Informatics Indiana University, US

Introduction: Linked Environments for Atmospheric Discovery (LEAD) makes meteorological data, forecast models, and analysis and visualization tools available to anyone who wants to interactively explore the weather as it evolves. In this talk we describe key data management aspects of the project, specifically the work being carried out in the Center for Data and Search Informatics at Indiana University.

Infrastructure is portal based: all services are available through a web server.

e-Science Gateway Architecture [1]
User's Grid Desktop; Grid Portal Server.
Gateway Services: Proxy Certificate Server (Vault); Events & Messaging; Resource Broker; Community & User Metadata Catalog; Workflow Engine; Resource Registry; Application Deployment.
Core Grid Services: Execution Management; Information Services; Self Management; Data Services; Resource Management; Security Services.
Resource Virtualization (OGSA): Compute Resources; Data Resources; Instruments & Sensors.
[1] Service Oriented Architectures for Science Gateways on Grid Systems, Gannon, D., et al.; ICSOC, 2005.

A typical weather forecast runs as a workflow. About 400 data products are consumed and produced (transformed) during the workflow lifecycle.
Stages: Pre-Processing; Assimilation; Forecast; Visualization.
Workflow components: arpssfc; arpstrn; Ext2arps-ibc; 88d2arps; mci2arps; ADAS assimilation; arps2wrf; nids2arps; WRF; Ext2arps-lbc; wrf2arps; arpsplot; IDV viz.
Input data: terrain data files; surface data files; ETA, RUC, GFS data; radar data (level II); radar data (level III); satellite data; surface, upper air mesonet & wind profiler data.

To set up a workflow experiment, we select a workflow (not shown) and then set the model parameters here.

Supported community data collections

Data Integration
Data sources: CASA radar collection, months (ftp); latest 3 days, Unidata IDD distribution (XML web server); Level II and III radar, latest 3 days (XML web server); ETA, NCEP, NAM, METAR, etc. (XML web server).
Sites: Oklahoma; Indiana; Colorado.
Local view: a crosswalk point of presence supports crawling and publishes a difference list as LEAD Metadata Schema (LMS) documents.
Crawler: crawls catalogs; builds an index of results; web service API; Boolean search query with spatial/temporal support.
Index: XMLDB native XML database, with Lucene for the index.
Globally integrated view: Data Catalog Service; web service API; Boolean search query; list of results returned as LEAD Metadata Schema documents.
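The Boolean search with spatial/temporal support mentioned on this slide can be illustrated with a small sketch. This is not the Data Catalog Service API: the `Record` class, the `search` function, its parameter names, and the sample catalog entries are all hypothetical stand-ins for LEAD Metadata Schema documents, chosen only to show how keyword, bounding-box, and time-range filters might combine.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one LEAD Metadata Schema document."""
    name: str
    keywords: frozenset
    bbox: tuple          # (west, south, east, north) in degrees
    start: datetime
    end: datetime


def boxes_overlap(a, b):
    """True when two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, all_of=(), any_of=(), none_of=(), bbox=None, time_range=None):
    """Boolean keyword query (AND / OR / NOT) plus optional spatial and temporal filters."""
    hits = []
    for r in records:
        if not set(all_of) <= r.keywords:              # AND terms
            continue
        if any_of and not (set(any_of) & r.keywords):  # OR terms
            continue
        if set(none_of) & r.keywords:                  # NOT terms
            continue
        if bbox is not None and not boxes_overlap(r.bbox, bbox):
            continue
        if time_range is not None and not (
            r.start <= time_range[1] and time_range[0] <= r.end
        ):
            continue
        hits.append(r.name)
    return hits


# Illustrative catalog entries (made-up names, extents, and dates).
CATALOG = [
    Record("radar-level2-ok", frozenset({"radar", "level2"}),
           (-103.0, 33.6, -94.4, 37.0), datetime(2007, 5, 26), datetime(2007, 5, 29)),
    Record("radar-level3-in", frozenset({"radar", "level3"}),
           (-88.1, 37.8, -84.8, 41.8), datetime(2007, 5, 26), datetime(2007, 5, 29)),
    Record("metar-co", frozenset({"surface", "metar"}),
           (-109.1, 37.0, -102.0, 41.0), datetime(2007, 5, 1), datetime(2007, 5, 10)),
]
```

For example, `search(CATALOG, all_of=["radar"], none_of=["level3"])` keeps only radar records that are not level III, and passing `bbox` or `time_range` narrows the result further by overlap.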

LEAD Personal Workspace: The cyberinfrastructure (CI) extends the user's desktop to incorporate a vast data analysis space. As users carry out scientific experiments, the CI manages back-end storage and compute resources. The portal provides ways to explore, search, and discover this data. Metadata about experiments is largely generated automatically and is highly searchable. It describes each data object (the file) in application-rich terms and provides a URI to a data service that can resolve an abstract unique identifier to a real, online data "file".

Searching for experiments using model configuration parameters: 2 attributes selected

Searching for experiments based on model parameters: 4 returned experiments; one displayed

How forecast model configuration parameters are stored in the personal catalog: the forecast model configuration file is handed off to a plugin that shreds the XML document into queryable attributes associated with the experiment.
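A minimal sketch of the shredding idea, assuming a simple nested XML configuration: the element names in `SAMPLE_CONFIG`, the dotted-path attribute naming, and the `shred` and `find_experiments` helpers are all illustrative assumptions, not the actual plugin or the real forecast model configuration format.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document; the real format is not shown in the talk.
SAMPLE_CONFIG = """
<modelConfig>
  <grid><nx>203</nx><ny>203</ny><dx>4000</dx></grid>
  <physics><microphysics>lin</microphysics></physics>
</modelConfig>
"""


def shred(xml_text):
    """Flatten every leaf element into a {dotted.path: text} attribute pair."""
    root = ET.fromstring(xml_text.strip())
    attrs = {}

    def walk(elem, path):
        children = list(elem)
        if not children:
            # Leaf element: its text becomes the attribute value.
            # Repeated sibling tags would overwrite each other in this sketch.
            attrs[path] = (elem.text or "").strip()
        for child in children:
            walk(child, f"{path}.{child.tag}")

    walk(root, root.tag)
    return attrs


def find_experiments(catalog, criteria):
    """Return ids of experiments whose shredded attributes match every criterion."""
    return sorted(
        exp_id
        for exp_id, attrs in catalog.items()
        if all(attrs.get(key) == value for key, value in criteria.items())
    )


# A toy personal catalog: experiment id -> shredded attributes.
CATALOG = {
    "exp-1": shred(SAMPLE_CONFIG),
    "exp-2": shred(SAMPLE_CONFIG.replace("4000", "2000")),
}
```

With the attributes flattened this way, a search on two model parameters becomes a plain dictionary match, for example `find_experiments(CATALOG, {"modelConfig.grid.nx": "203", "modelConfig.grid.dx": "2000"})`.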

What & Why of Provenance
Provenance is the derivation history of a data product: what application created the data (and when and where); its parameters & configuration; other input data used by the application.
A workflow is composed from building blocks like these, so provenance for the data used in a workflow gives a workflow trace.
Example building block: Application A takes Data.In.1, Data.In.2, and Config.A and produces Data.Out.1.
Data Provenance::Data.Out.1
Process: Application_A
Timestamp: T12:45:23
Host: tyr20.cs.indiana.edu
…
Input: Data.In.1, Data.In.2
Config: Config.A
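To make the "provenance gives a workflow trace" point concrete, here is a small sketch. The `ProvenanceRecord` fields mirror the example record on the slide; the second record (`Application_B`, `Data.Out.2`, `Config.B`, its timestamp) and the `trace` helper are hypothetical additions for illustration and do not reflect Karma's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """Provenance of one data product, with the fields shown on the slide."""
    product: str
    process: str
    timestamp: str
    host: str
    inputs: list = field(default_factory=list)
    config: str = ""


def trace(product, records, _seen=None):
    """Walk input links upstream and return processes in execution order."""
    seen = set() if _seen is None else _seen
    if product in seen or product not in records:
        return []          # already visited, or a raw input with no record
    seen.add(product)
    rec = records[product]
    steps = []
    for item in rec.inputs:
        steps.extend(trace(item, records, seen))
    steps.append(rec.process)
    return steps


RECORDS = {
    # Record taken from the slide (timestamp kept as shown there).
    "Data.Out.1": ProvenanceRecord(
        "Data.Out.1", "Application_A", "T12:45:23", "tyr20.cs.indiana.edu",
        ["Data.In.1", "Data.In.2"], "Config.A"),
    # Hypothetical downstream step added for illustration.
    "Data.Out.2": ProvenanceRecord(
        "Data.Out.2", "Application_B", "T12:50:00", "tyr20.cs.indiana.edu",
        ["Data.Out.1"], "Config.B"),
}
```

Calling `trace("Data.Out.2", RECORDS)` follows the input links back through `Data.Out.1` and lists the applications in the order they ran.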

The What & Why of Provenance
Trace workflow execution: What services were used during workflow execution? Were all steps of the execution successful?
Audit trail: What resources were used during workflow execution?
Data quality & reuse: What applications were used to derive data products? Which workflows use a certain data product?
Attribution: Who performed the experiment? Who owns the workflow & data products?
Discovery: Locate data generated by a workflow. Locate workflows containing App-X that succeeded.

Karma Provenance Service
Collection framework: the Workflow Engine and each service (Service 1, Service 2, …, Service 9, Service 10) publish provenance activities as notifications on the message bus (WS-Messenger Notification Broker, WS-Eventing Service API). In the illustrated Workflow Instance, 10 data products are consumed (10C) and produced (10P) by each service.
Activities published: Workflow-Started & -Finished; Application-Started & -Finished; Data-Produced & -Consumed.
Provenance service: the Provenance Listener subscribes and listens to activity notifications and feeds the Activity DB; the Provenance Query API serves the Provenance Browser Client, which queries for workflow, process, and data provenance.
Reference: A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Simmhan, Y., et al., ICWS Conference, 2006.
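The publish/subscribe flow on this slide can be sketched in a few lines. This is an in-process stand-in only: the real system uses WS-Messenger and WS-Eventing over web services, whereas the `NotificationBroker` and `ProvenanceListener` classes, the topic name, and the dictionary layout of an activity below are all assumptions made for illustration.

```python
from collections import defaultdict


class NotificationBroker:
    """Toy in-process broker standing in for the notification message bus."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, activity):
        for callback in self._subscribers[topic]:
            callback(activity)


class ProvenanceListener:
    """Subscribes to a topic and keeps every activity in a list-based 'activity DB'."""

    def __init__(self, broker, topic):
        self.activity_db = []
        broker.subscribe(topic, self.activity_db.append)

    def producers_of(self, data):
        """Services that reported producing the given data product."""
        return [a["service"] for a in self.activity_db
                if a["type"] == "dataProduced" and a["data"] == data]

    def consumers_of(self, data):
        """Services that reported consuming the given data product."""
        return [a["service"] for a in self.activity_db
                if a["type"] == "dataConsumed" and a["data"] == data]


broker = NotificationBroker()
listener = ProvenanceListener(broker, "provenance")

# Two services publish activities as they run (illustrative values).
broker.publish("provenance", {"type": "applicationStarted", "service": "Service 1"})
broker.publish("provenance", {"type": "dataProduced", "service": "Service 1", "data": "d1"})
broker.publish("provenance", {"type": "applicationFinished", "service": "Service 1"})
broker.publish("provenance", {"type": "dataConsumed", "service": "Service 2", "data": "d1"})
```

Because services only publish and the listener only subscribes, the services need no knowledge of how or where provenance is stored; queries such as `listener.producers_of("d1")` run against the collected activities afterwards.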

Generating Karma Provenance Activities
Applications are instrumented to publish provenance.
A simple Java library is available to create provenance activities and publish them as messages.
Jython "wrapper" scripts use the library to publish provenance and invoke the application.
The Generic Factory toolkit easily converts applications to web services, with built-in provenance instrumentation.

Sample Sequence of Activities
appStarted( App1 )
info( 'App1 starting' )
fileReceiveStarted( File1 )
-- do gridftp get to stage input file File1 --
fileReceiveFinished( File1 )
fileConsumed( File1 )
computationStarted( Code1 )
-- call Fortran code Code1 to process input files --
computationFinished( Code1 )
fileProduced( File2 )
fileSendStarted( File2 )
-- do gridftp put to save output file File2 --
fileSendFinished( File2 )
publishURL( File2 )
appFinishedSuccess( App1, File2 ) | appFinishedFailed( App1, ERR )
flush()
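The sequence above could be produced by a wrapper along the following lines. This is a plain-Python sketch, not the Karma Java library or an actual Jython wrapper: `ActivityLog`, `run_wrapped`, and the `stage_in`, `compute`, and `stage_out` callables are hypothetical placeholders for the notifier, the GridFTP transfers, and the Fortran invocation. Only the activity names come from the slide.

```python
class ActivityLog:
    """Hypothetical notifier that records activities instead of sending messages."""

    def __init__(self):
        self.sent = []
        self.flushed = False

    def publish(self, name, *args):
        self.sent.append((name, args))

    def flush(self):
        self.flushed = True


def run_wrapped(app, in_file, out_file, code, notifier, stage_in, compute, stage_out):
    """Invoke an application and publish the activity sequence around each step."""
    notifier.publish("appStarted", app)
    notifier.publish("info", f"{app} starting")
    try:
        notifier.publish("fileReceiveStarted", in_file)
        stage_in(in_file)                      # e.g. a GridFTP get
        notifier.publish("fileReceiveFinished", in_file)
        notifier.publish("fileConsumed", in_file)

        notifier.publish("computationStarted", code)
        compute(in_file, out_file)             # e.g. call the scientific code
        notifier.publish("computationFinished", code)
        notifier.publish("fileProduced", out_file)

        notifier.publish("fileSendStarted", out_file)
        stage_out(out_file)                    # e.g. a GridFTP put
        notifier.publish("fileSendFinished", out_file)
        notifier.publish("publishURL", out_file)

        notifier.publish("appFinishedSuccess", app, out_file)
        return True
    except Exception as err:
        # Any failing step ends the run with the failure activity.
        notifier.publish("appFinishedFailed", app, str(err))
        return False
    finally:
        notifier.flush()


def no_op(*args):
    """Placeholder step that always succeeds."""


def failing_step(*args):
    """Placeholder step that always fails."""
    raise RuntimeError("ERR")
```

Running `run_wrapped("App1", "File1", "File2", "Code1", ActivityLog(), no_op, no_op, no_op)` emits the success path; swapping `failing_step` in for `compute` ends the sequence with `appFinishedFailed` instead, and `flush()` is called in both cases.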

Performance perturbation

Scalability Study [4]
[4] Performance Evaluation of the Karma Provenance Framework for Scientific Workflows, Simmhan, Y., et al.; IPAW Workshop, 2006.

Resource monitoring as two planes of control

Resource adaptation illustrated (1): Resource changes
A resource has failed, so the remaining parts of the workflow need to be rescheduled: stop the earlier workflow, then replan the workflow.
Diagram components: LEAD BPEL Workflow Engine; Workflow Configuration Service; Portal; Event Broker; Workflow Application Service (per task); Workflow and File Status; DAG; myLEAD (subscribes to messages from the broker, knows what magic to do with input/output files, and talks to RLS/DRS); App. Factory; Resource Management Services; Sensor; Actuator.
Flow labels: Run workflow one step at a time; Run job; Job notification; Create Services; Launch Services.
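A rough sketch of the replanning step in this first scenario, under simplifying assumptions: the workflow is a DAG of four stages (named after the forecast stages earlier in the talk), each task has one assigned resource, and failed placements are moved round-robin onto healthy resources. The `replan` function and the resource names are hypothetical; this is not the LEAD scheduler's algorithm. It uses `graphlib` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def replan(dag, placement, completed, failed_resource, healthy_resources):
    """Return (task, resource) pairs for the unfinished part of the workflow.

    dag maps each task to the set of tasks it depends on.
    Tasks placed on the failed resource are moved to a healthy one.
    """
    # Keep only unfinished tasks and their unfinished dependencies.
    remaining = {
        task: {dep for dep in deps if dep not in completed}
        for task, deps in dag.items()
        if task not in completed
    }
    plan = []
    moved = 0
    for task in TopologicalSorter(remaining).static_order():
        resource = placement[task]
        if resource == failed_resource:
            resource = healthy_resources[moved % len(healthy_resources)]
            moved += 1
        plan.append((task, resource))
    return plan


# Illustrative workflow and placement (resource names are made up).
DAG = {
    "pre-processing": set(),
    "assimilation": {"pre-processing"},
    "forecast": {"assimilation"},
    "visualization": {"forecast"},
}
PLACEMENT = {
    "pre-processing": "cluster-a",
    "assimilation": "cluster-a",
    "forecast": "cluster-a",
    "visualization": "cluster-c",
}
```

If `cluster-a` fails after pre-processing has finished, `replan(DAG, PLACEMENT, {"pre-processing"}, "cluster-a", ["cluster-b"])` yields a new plan that moves assimilation and forecast to `cluster-b`, leaves visualization where it was, and does not rerun the completed stage.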

Resource adaptation illustrated (2): Weather change
Implement strict deadline scheduling; plan resources for sub-components; change priorities for users (e.g., Lavanya's workflow gets lower priority); implement Adverse Weather Policy.
Diagram components: LEAD BPEL Workflow Engine; Workflow Configuration Service; Portal; Event Broker; Workflow Application Service (per task); Workflow and File Status; DAG; myLEAD (subscribes to messages from the broker, knows what magic to do with input/output files, and talks to RLS/DRS); App. Factory; Resource Management Services; Sensor; Actuator.
Flow labels: Run workflow one step at a time; Run job; Job notification; Create Services; Launch Services.

Resource adaptation illustrated (3): Services
Labels: "Service Overloaded"; "Replicate Service".
Diagram components: LEAD BPEL Workflow Engine; Workflow Configuration Service; Portal; Event Broker; Workflow Application Service (per task); DAG; myLEAD (subscribes to messages from the broker, knows what magic to do with input/output files, and talks to RLS/DRS); App. Factory; Resource Management Services; Sensor; Actuator.
Flow labels: Run workflow one step at a time; Run job; Job notification; Create Services; Launch Services.

Recent LEAD Highlight: Spring 2007 Weather Challenge
Forecast contest, February - March 2007. Students ran …..
Statistics from the Challenge: approximately 50 participants; 6696 jobs submitted to TeraGrid (52925 TG SUs); about 2.6 TB of data generated, which is archived at Indiana University and available through each participating user's personal workspace catalog.
Computational models run on TeraGrid resources. Portal and persistent back-end services run at Indiana University. Data storage resources (45 TB) for user-generated data products are provided by Indiana University.

Future Work
Optimizations and refinements: file movement; revisiting the metadata schema; improving crosswalks with an eye to reduced maintenance.
Personal predictor: packaging the LEAD framework into a single 8-16 core multicore machine for individual purchase.

Thanks to the whole LEAD team and to the National Science Foundation for their support. For more information, feel free to contact me at or go to