


1 May 29, 2007 Dynamically Adaptive Weather Analysis and Forecasting in LEAD: Issues in Data Management, Metadata, and Search Beth Plale Director, Center for Data and Search Informatics School of Informatics Indiana University, US

2 Introduction: Linked Environments for Atmospheric Discovery (LEAD) makes meteorological data, forecast models, and analysis and visualization tools available to anyone who wants to interactively explore the weather as it evolves. This talk describes key data management aspects of the project, specifically the work being carried out in the Center for Data and Search Informatics at Indiana University.

3 Infrastructure is portal based: all services are available through a web server.

4 e-Science Gateway Architecture [1]. Front end: User’s Grid Desktop; Grid Portal Server. Gateway Services: Proxy Certificate Server (Vault); Events & Messaging; Resource Broker; Community & User Metadata Catalog; Workflow engine; Resource Registry; Application Deployment. Core Grid Services: Execution Management; Information Services; Self Management; Data Services; Resource Management; Security Services. Resource Virtualization (OGSA) over Compute Resources, Data Resources, and Instruments & Sensors. [1] Service Oriented Architectures for Science Gateways on Grid Systems, Gannon, D., et al.; ICSOC, 2005.

5 Typical weather forecast runs as a workflow: ~400 data products are consumed and produced (transformed) during the workflow lifecycle. Stages: Pre-Processing, Assimilation, Forecast, Visualization. Workflow components: arpssfc, arpstrn, Ext2arps-ibc, 88d2arps, mci2arps, ADAS assimilation, arps2wrf, nids2arps, WRF, Ext2arps-lbc, wrf2arps, arpsplot, IDV viz. Input data: terrain data files; surface data files; ETA, RUC, GFS data; radar data (level II); radar data (level III); satellite data; surface, upper air mesonet & wind profiler data.

6 To set up a workflow experiment, we select a workflow (not shown), then set model parameters here.

7 Supported community data collections

8 Data Integration. Sites: Oklahoma, Indiana, Colorado. Data sources: CASA radar collection, months (ftp); Unidata IDD distribution, latest 3 days (XML web server); Level II and III radar, latest 3 days (XML web server); ETA, NCEP, NAM, METAR, etc. (XML web server). Local view: a crosswalk point of presence supports crawling and publishes a difference list as LEAD Metadata Schema (LMS) documents. Crawler: crawls the catalogs and builds an index of results; web service API; Boolean search query with spatial/temporal support. Index: XMLDB native XML database, with Lucene for the index. Globally integrated view: Data Catalog Service with a web service API and Boolean search query; returns a list of results as LEAD Metadata Schema documents.
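A Boolean search query with spatial/temporal support, as named on the slide above, can be pictured with the minimal sketch below. It is an illustration only and not the LEAD Data Catalog Service API: the record type `MetadataRecord`, the function `search`, and all field and parameter names are assumptions made for this example, and the bounding-box test ignores boxes that cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, List, Optional, Tuple

# (west, south, east, north) in degrees; a simplifying assumption.
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, simplified stand-in for one indexed metadata document."""
    resource_id: str
    keywords: FrozenSet[str]
    bbox: BBox
    start: datetime
    end: datetime


def boxes_intersect(a: BBox, b: BBox) -> bool:
    """True when two lat/lon boxes overlap (no antimeridian handling)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(
    records: Iterable[MetadataRecord],
    must: Iterable[str] = (),
    any_of: Iterable[str] = (),
    must_not: Iterable[str] = (),
    bbox: Optional[BBox] = None,
    time_range: Optional[Tuple[datetime, datetime]] = None,
) -> List[str]:
    """Return ids of records matching the AND / OR / NOT terms and filters."""
    must, any_of, must_not = set(must), set(any_of), set(must_not)
    hits = []
    for rec in records:
        if not must <= rec.keywords:                  # AND: every term present
            continue
        if any_of and not (any_of & rec.keywords):    # OR: at least one term
            continue
        if must_not & rec.keywords:                   # NOT: no excluded term
            continue
        if bbox is not None and not boxes_intersect(rec.bbox, bbox):
            continue
        if time_range is not None:
            q_start, q_end = time_range
            if not (rec.start <= q_end and q_start <= rec.end):
                continue
        hits.append(rec.resource_id)
    return hits
```

A real catalog would answer such queries from an index rather than a linear scan; the loop here only makes the Boolean, spatial, and temporal conditions explicit.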

9 LEAD Personal Workspace. Cyberinfrastructure (CI) extends the user’s desktop to incorporate a vast data analysis space. As users carry out scientific experiments, the CI manages back-end storage and compute resources. The portal provides ways to explore, search, and discover this data. Metadata about experiments is largely generated automatically and is highly searchable. The metadata describes the data object (the file) in application-rich terms and provides the URI of a data service that can resolve an abstract unique identifier to the real, on-line data “file”.

10 Searching for experiments using model configuration parameters: 2 attributes selected

11 Searching for experiments based on model parameters: 4 returned experiments; one displayed

12 How forecast model configuration parameters are stored in the personal catalog: the forecast model configuration file is handed off to a plugin that shreds the XML document into queryable attributes associated with the experiment.
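The shredding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual catalog plugin: the XML layout in `SAMPLE_CONFIG`, the path-style attribute names, and the functions `shred` and `find_experiments` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical configuration document; the real schema is not shown in the talk.
SAMPLE_CONFIG = """
<modelConfig model="WRF">
  <grid><dx>2000</dx><nx>183</nx></grid>
  <run><forecastHours>6</forecastHours></run>
</modelConfig>
"""


def shred(xml_text: str) -> List[Tuple[str, str]]:
    """Flatten an XML document into (path, value) attribute pairs."""
    rows: List[Tuple[str, str]] = []

    def walk(elem: ET.Element, path: str) -> None:
        here = f"{path}/{elem.tag}" if path else elem.tag
        # XML attributes become "path/@name" entries.
        for name, value in elem.attrib.items():
            rows.append((f"{here}/@{name}", value))
        # Leaf elements contribute their text content.
        text = (elem.text or "").strip()
        if text and len(elem) == 0:
            rows.append((here, text))
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return rows


def find_experiments(
    catalog: Dict[str, List[Tuple[str, str]]], criteria: Dict[str, str]
) -> List[str]:
    """Return ids of experiments whose shredded attributes match all criteria."""
    matches = []
    for experiment_id, rows in catalog.items():
        attrs = dict(rows)
        if all(attrs.get(path) == value for path, value in criteria.items()):
            matches.append(experiment_id)
    return sorted(matches)
```

Once each configuration file is reduced to flat pairs keyed by experiment, a search on two selected attributes becomes a simple equality filter, for example `find_experiments(catalog, {"modelConfig/grid/dx": "2000", "modelConfig/@model": "WRF"})`.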

13 What & Why of Provenance. Provenance is the derivation history of a data product: what application created the data (and when and where), its parameters and configuration, and the other input data used by the application. A workflow is composed from building blocks like these, so provenance for the data used in a workflow gives a workflow trace. Example: Application A takes Data.In.1, Data.In.2, and Config.A and produces Data.Out.1.
Data Provenance::Data.Out.1
Process: Application_A
Timestamp: 2006-06-23T12:45:23
Host: tyr20.cs.indiana.edu
…
Input: Data.In.1, Data.In.2
Config: Config.A
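The idea that per-product provenance records chain together into a workflow trace can be sketched as below. This is an assumption-laden illustration, not the Karma data model: the class `ProvenanceRecord`, the function `trace`, and the upstream record for `Data.In.1` (process `Application_0`, host `host.example.org`, input `Raw.1`) are invented for the example. Only the `Data.Out.1` values come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record mirroring the fields shown on the slide."""
    product: str
    process: str
    timestamp: datetime
    host: str
    inputs: Tuple[str, ...]
    config: str


RECORDS: Dict[str, ProvenanceRecord] = {
    # Values taken from the slide's example.
    "Data.Out.1": ProvenanceRecord(
        product="Data.Out.1",
        process="Application_A",
        timestamp=datetime.fromisoformat("2006-06-23T12:45:23"),
        host="tyr20.cs.indiana.edu",
        inputs=("Data.In.1", "Data.In.2"),
        config="Config.A",
    ),
    # Invented upstream step, added only to show a two-step trace.
    "Data.In.1": ProvenanceRecord(
        product="Data.In.1",
        process="Application_0",
        timestamp=datetime.fromisoformat("2006-06-23T12:40:00"),
        host="host.example.org",
        inputs=("Raw.1",),
        config="Config.0",
    ),
}


def trace(
    product: str,
    records: Dict[str, ProvenanceRecord],
    seen: Optional[Set[str]] = None,
) -> List[str]:
    """Return the processes that led to `product`, upstream steps first."""
    seen = set() if seen is None else seen
    record = records.get(product)
    if record is None or product in seen:
        return []  # external input with no record, or already visited
    seen.add(product)
    steps: List[str] = []
    for source in record.inputs:
        steps.extend(trace(source, records, seen))
    steps.append(record.process)
    return steps
```

Calling `trace("Data.Out.1", RECORDS)` walks the input links backwards and lists the producing applications in execution order, which is the sense in which data provenance yields a workflow trace.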

14 The What & Why of Provenance. Trace workflow execution: What services were used during workflow execution? Were all steps of the execution successful? Audit trail: What resources were used during workflow execution? Data quality & reuse: What applications were used to derive data products? Which workflows use a certain data product? Attribution: Who performed the experiment? Who owns the workflow & data products? Discovery: Locate data generated by a workflow; locate workflows containing App-X that succeeded.

15 Karma Provenance Service. Collection framework: the Workflow Engine and Services 1 through 10 of a workflow instance publish provenance activities as notifications on the message bus (WS-Eventing service API; WS-Messenger Notification Broker). In the example workflow instance, 10 data products are consumed and produced by each service (diagram labels: 10P, 10C, 10P/10C). Published activities: Workflow-Started & -Finished; Application-Started & -Finished; Data-Produced & -Consumed. Karma components: Provenance Listener (subscribes and listens to activity notifications); Activity DB; Provenance Query API; Provenance Browser Client (queries for workflow, process, & data provenance). Reference: A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Simmhan, Y., et al., ICWS Conference, 2006.

16 Generating Karma Provenance Activities. Instrument applications to publish provenance. A simple Java library is available to create provenance activities and publish activities as messages. Jython “wrapper” scripts use the library to publish provenance and invoke the application. The Generic Factory toolkit easily converts applications to web services, with built-in provenance instrumentation.

17 Sample Sequence of Activities
appStarted( App1 )
info( 'App1 starting' )
fileReceiveStarted( File1 )
-- do gridftp get to stage input file File1 --
fileReceiveFinished( File1 )
fileConsumed( File1 )
computationStarted( Code1 )
-- call Fortran code Code1 to process input files --
computationFinished( Code1 )
fileProduced( File2 )
fileSendStarted( File2 )
-- do gridftp put to save output file File2 --
fileSendFinished( File2 )
publishURL( File2 )
appFinishedSuccess( App1, File2 ) | appFinishedFailed( App1, ERR )
flush()

18 Performance perturbation

19 Scalability Study [4]. [4] Performance Evaluation of the Karma Provenance Framework for Scientific Workflows, Simmhan, Y., et al.; IPAW Workshop, 2006.

20 Resource monitoring as two planes of control

21 Resource adaptation illustrated (1): resource changes. A resource has failed, so the remaining parts of the workflow need to be rescheduled: stop the earlier workflow, then replan the workflow. Diagram components: Portal; Workflow Configuration Service; LEAD BPEL Workflow Engine; Workflow Application Service (per task); Event Broker; myLEAD (subscribes to messages from the broker, knows what magic to do with input/output files, and talks to RLS/DRS); App. Factory; Resource Management Services (Sensor, Actuator); Workflow and File Status; DAG. Diagram labels: run workflow one step at a time; run job; job notification; create services; launch services.
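The stop-and-replan step on this slide can be sketched as follows. This is a minimal illustration, not the LEAD scheduler: the function `replan`, the resource names `clusterA`, `clusterB`, and `clusterC`, and the round-robin reassignment policy are assumptions. The stage names come from the forecast workflow slide, and the sketch assumes outputs of already completed tasks remain accessible. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter
from itertools import cycle
from typing import Dict, Iterable, List, Set, Tuple

# Task -> set of tasks it depends on (stage names from the workflow slide).
WORKFLOW: Dict[str, Set[str]] = {
    "Pre-Processing": set(),
    "Assimilation": {"Pre-Processing"},
    "Forecast": {"Assimilation"},
    "Visualization": {"Forecast"},
}


def replan(
    dag: Dict[str, Set[str]],
    done: Set[str],
    assignment: Dict[str, str],
    failed: str,
    healthy: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (task, resource) pairs for the unfinished tasks, in a valid order.

    Tasks that were assigned to the failed resource are moved round-robin
    onto the healthy resources; all other tasks keep their assignment.
    """
    pool = sorted(healthy)
    if not pool:
        raise ValueError("no healthy resources left to replan onto")
    spare = cycle(pool)
    plan: List[Tuple[str, str]] = []
    # static_order() yields every task after all of its dependencies.
    for task in TopologicalSorter(dag).static_order():
        if task in done:
            continue
        resource = assignment[task]
        if resource == failed:
            resource = next(spare)
        plan.append((task, resource))
    return plan
```

For instance, if `Pre-Processing` has finished and `clusterA` fails, `replan` returns only the three remaining stages, with the ones that were bound to `clusterA` moved to healthy resources.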

22 Resource adaptation illustrated (2): weather change. Implement strict deadline scheduling; plan resources for sub-components; change priorities for users (e.g. Lavanya’s workflow gets lower priority); implement the Adverse Weather Policy. Shown on the same architecture diagram as slide 21.

23 Resource adaptation illustrated (3): services. Messages shown: “Service Overloaded” and “Replicate Service”. Shown on the same architecture diagram as slide 21.

24 Recent LEAD Highlight: Spring 2007 Weather Challenge. Forecast contest, February - March 2007. Students ran ….. Statistics from the Challenge: approximately 50 participants; 6696 jobs submitted to TeraGrid (52925 TG SUs); about 2.6 TB of data generated, which is archived at Indiana University and available through each participating user’s personal workspace catalog. Computational models ran on TeraGrid resources. Portal and persistent back-end services ran at Indiana University. Data storage resources (45 TB) for user-generated data products were provided by Indiana University.

25 Future Work. Optimizations and refinements: file movement, revisiting the metadata schema, and improving crosswalks with an eye to reduced maintenance. Personal predictor: packaging the LEAD framework into a single 8-16 core multicore machine for individual purchase.

26 Thanks to the whole LEAD team, and to the National Science Foundation for their support. For more information, feel free to contact me at plale@indiana.edu or go to http://www.leadportal.org

