Slide 1: A Quick Tour of LEAD for VGrADS
Dennis Gannon, Beth Plale, and Suresh Marru
Department of Computer Science, School of Informatics, Indiana University
(Lavanya and Dan have seen this many times.)
Slide 2: A Science-Driven Grid: LEAD
A Grid designed to change the paradigm for mesoscale weather prediction by building dynamic, adaptive workflows from data streams under better-than-real-time execution constraints.
Slide 3: The LEAD Project
Slide 4: Traditional Methodology
Static observations: radar data, mobile mesonets, surface observations, upper-air balloons, commercial aircraft, geostationary and polar-orbiting satellites, wind profilers, GPS satellites.
Analysis/assimilation: quality control, retrieval of unobserved quantities, creation of gridded fields.
Prediction/detection: PCs to teraflop systems.
Product generation, display, and dissemination to end users: the NWS, private companies, students.
The process is entirely serial and static (pre-scheduled): no response to the weather!
Slide 5: A Major Paradigm Shift: CASA NETRAD Adaptive Doppler Radars
Slide 6: The LEAD Vision: Adaptive Cyberinfrastructure
Dynamic observations feed analysis/assimilation (quality control, retrieval of unobserved quantities, creation of gridded fields), then prediction/detection (PCs to teraflop systems), then product generation, display, and dissemination to end users (the NWS, private companies, students), with the models and algorithms driving the sensors.
The CS challenge: build cyberinfrastructure services that provide adaptability, scalability, availability, usability, and real-time response.
Slide 7: Change the Paradigm
To make fundamental advances we need adaptivity in the computational model, but also cyberinfrastructure to:
- Execute complex scenarios in response to weather events: stream processing, triggers, closing the loop with the instruments.
- Acquire computational resources on demand: supercomputer-scale resources, invoked in response to weather events.
- Deal with the data deluge: users can no longer manage their own experiment products.
Slide 8: Reaching the LEAD Goal
Understanding the role of data is the key: from streams, from mining, from simulations, to visualizations.
Enabling discovery:
- The role of the experiment
- Sharing both process and results
- Creating an educational "context"
An agile architecture of composable services:
- The data and the computational resources are distributed.
- Requires all the Grid attributes: distributed resource allocation, robustness, scalability, security.
- All application components are services.
- Access to all LEAD resources should be easy.
Slide 9: Experiment as a Control/Data Flow Graph
Another paradigm shift for the users. Each activity a user initiates in LEAD is part of an experiment:
- Data discovery and collection
- Applied analysis and transformation
- A graph of activities (a workflow)
- Curated data products and results
Each activity is logged by an event system and stored as metadata in the user's workspace, providing a complete provenance of the work (sketched below).
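To make the graph-plus-provenance idea concrete, here is a minimal Python sketch of an experiment as a dependency graph of activities whose completion is logged as events. The Activity and Experiment classes and their fields are hypothetical illustrations, not the myLEAD API.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class Activity:
        name: str                                       # e.g. "data-discovery"
        depends_on: list = field(default_factory=list)

    @dataclass
    class Experiment:
        name: str
        activities: dict = field(default_factory=dict)
        provenance: list = field(default_factory=list)  # the event log

        def add(self, activity):
            self.activities[activity.name] = activity

        def run(self):
            # Walk the graph in dependency order; log one event per activity
            # so the workspace can later reconstruct the full provenance.
            done = set()
            while len(done) < len(self.activities):
                for act in self.activities.values():
                    if act.name in done or any(d not in done for d in act.depends_on):
                        continue
                    self.provenance.append({
                        "activity": act.name,
                        "status": "completed",
                        "time": datetime.now(timezone.utc).isoformat(),
                    })
                    done.add(act.name)

    exp = Experiment("forecast-experiment-1")
    exp.add(Activity("data-discovery"))
    exp.add(Activity("analysis", depends_on=["data-discovery"]))
    exp.add(Activity("forecast", depends_on=["analysis"]))
    exp.run()
    print(exp.provenance)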
Slide 10: The Architecture
From the user's desktop down to the physical resource layer:
- The user's desktop
- LEAD Grid portal server
- Gateway services: proxy certificate server/vault, user metadata catalog (myLEAD), workflow engine, application deployment, application events, resource broker, application and resource catalogs
- Core Grid services (an OGSA-like layer): security services, information services, self-management, resource management, execution management, data services
- Physical resource layer
Slide 11: User Workspace, Middleware Tools, and Resources
- User workspace and resources: ontology service, query service, geo GUI, the myWorkspace catalog, search tools, myExperiment space, authorization, the portal gateway, the experiment builder and experiment GUI, the community resource catalog, and personal tools (an IDV client, ...), all registered with the catalog.
- Middleware tools and resources: storm event detection, community datasets, forecast models, data mining tools, ensemble initial-condition generation, ... used in experiments, with results published to the myWorkspace catalog.
- Computational layer: the workflow orchestration engine, a monitoring and control service, and compute servers.
(Diagram legend: D = data, Q = query/answer, M = model, A = analysis.)
Slide 12: Data Discovery
Select community data products for import into the workspace or for use in an experiment.
Slide 13: LEAD Data Use Scenario
Importing community data products into the user's workspace: an indexed resource catalog is reached through the query service and the Noesis ontology service, and data flows over THREDDS, OPeNDAP, LDM, and CDM channels into the myLEAD user workspace and a Grid storage repository. The data itself is (often) binary; the metadata is kept in the LEAD schema.
Slide 14: User's Workspace (myLEAD)
- A metadata catalog of the user's data products
- The user's storage on the LEAD Grid
- An agent actively archives data products:
  - Derived data products: the results of processing original raw data
  - Temporally changing data products: data continuously changing through regular additions streamed into the archive
  - Archiving happens through ad hoc actions taken by content creators, or in conjunction with workflow processes
- Approach: a general, reusable data model; an open-source database (MySQL); standardized metadata schemas (XML); a service-oriented architecture (SOAP, WSDL, GridFTP, X.509 certificates). A sketch of such a metadata record follows.
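As an illustration of the standardized-metadata idea, here is a minimal sketch of building an XML metadata record for one archived data product. The element names and values are invented for the example; they are not the actual LEAD metadata schema.

    import xml.etree.ElementTree as ET

    def make_metadata_record(product_id, kind, source, experiment):
        # Build a small XML record describing one archived data product.
        rec = ET.Element("dataProduct", id=product_id)
        ET.SubElement(rec, "kind").text = kind          # "derived" or "streamed"
        ET.SubElement(rec, "source").text = source      # what it was derived from
        ET.SubElement(rec, "experiment").text = experiment
        return ET.tostring(rec, encoding="unicode")

    print(make_metadata_record("wrf-output-001", "derived",
                               "ADAS-analysis", "forecast-experiment-1"))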
Slide 15: Log In and See Your myLEAD Space
Slide 16: Workflows: Execution of Complex Experiments
LEAD requires the ability to construct workflows that are:
- Data driven: weather data streams define the nature of the computation.
- Persistent and agile: data mining of a data stream detects an "interesting" feature, and that event triggers a workflow scenario that may have been waiting for months.
- Adaptive: the weather changes, so the nature of the workflow may have to change on the fly; more resources may be needed, and sometimes resources become unavailable. Workflows need to be self-aware.
Slide 17: The LEAD Application Codes
All community codes: data transformers, data miners, converters, data assimilators, and forecast codes, written in Fortran, C, and Java. We don't mess with them beyond any instrumentation Dan's group may insert, and we don't have good profiles for how they behave ... yet.
The big one: WRF, the Weather Research and Forecasting code, is the standard. Over 3000 known versions exist ... all incompatible. Atmospheric scientists like to play with it: "Hmmm, what happens if I change this do loop?"
Slide 18: Application Services
Workflows are built by composing web services. Fortran applications are "wrapped" by an Application Factory, which generates a web service for the app:
- Instances of the service are created dynamically using Globus.
- The factory registers the service's WSDL with a registry.
- Each service generates a stream of notifications that log the service's actions back to the myLEAD experiment.
(Diagram: the Application Factory creates an App Service, which runs the program and publishes events.) A sketch of the pattern follows.
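The factory pattern itself is simple. Below is a minimal Python sketch, assuming a command-line (e.g. Fortran) executable: the factory manufactures a service object per application, and each invocation runs the program and publishes events. The real LEAD factory generated Globus-hosted web services and registered their WSDL; the class and method names here are hypothetical.

    import subprocess

    class ApplicationService:
        def __init__(self, name, command, notify):
            self.name, self.command, self.notify = name, command, notify

        def invoke(self, args):
            self.notify(self.name, "started")
            result = subprocess.run([self.command, *args], capture_output=True)
            self.notify(self.name, "finished" if result.returncode == 0 else "failed")
            return result

    class ApplicationFactory:
        # Creates a service instance per wrapped application on demand.
        # (In LEAD this step would also register the service's WSDL.)
        def __init__(self, notify):
            self.notify = notify

        def create_service(self, name, command):
            return ApplicationService(name, command, self.notify)

    factory = ApplicationFactory(lambda svc, msg: print(f"[{svc}] {msg}"))
    wrf = factory.create_service("WRF", "echo")   # "echo" stands in for the binary
    wrf.invoke(["running", "forecast"])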
Slide 19: Service Monitoring via Events
The service output is a stream of events:
1. I am running your request.
2. I have started to FTP your input files.
3. I have all the files.
4. I am running your application.
5. The application is finished.
6. I am moving the output to your file space.
7. I am done.
These are generated automatically by the service using a distributed event system (WS-Eventing / WS-Notification): a topic-based pub-sub system with a well-known "channel". (Diagram: an application service instance publishes its numbered events on a topic; listeners subscribe to that topic on the notification channel.)
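The pub-sub mechanics can be sketched in a few lines. This is an in-process stand-in for the WS-Eventing / WS-Notification channel the slide describes; the NotificationChannel class and the topic name are hypothetical.

    from collections import defaultdict

    class NotificationChannel:
        def __init__(self):
            self.subscribers = defaultdict(list)    # topic -> callbacks

        def subscribe(self, topic, callback):
            self.subscribers[topic].append(callback)

        def publish(self, topic, message):
            for callback in self.subscribers[topic]:
                callback(message)

    channel = NotificationChannel()
    channel.subscribe("experiment/run-42", lambda m: print("myLEAD log:", m))

    # The application service instance publishes its status events to its topic.
    for event in ["running your request", "started to FTP your input files",
                  "have all the files", "running your application",
                  "application finished", "moving output to your file space",
                  "done"]:
        channel.publish("experiment/run-42", event)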
Slide 20: Creating Structure in the User's Archive That Models the Investigation Steps
A 12-hour investigation: gather data products; run a 12-hour forecast (6 hours to complete); analyze the results; based on the analysis, gather other products; run a 6-hour forecast (3 hours to complete); analyze the results.
As each workflow runs, product requests, product registrations, and notification messages flow through the notification and decoder services to the myLEAD agent and the myLEAD server, which build the corresponding structure in the archive.
Slide 21: The Workflow Composer
The user designs the workflow; the compiler then generates GBPEL.
Slide 22: Assimilation-Forecast Workflow
Slide 23: Workflow Applied to Katrina
(Figures: a 2D image of sea level generated by the ARPS plotting service, and a 3D image generated by IDV.)
Slide 24: Monitoring the Event Stream
Slide 25: Resource Scheduling in LEAD
What we tell people: "Huh? Oh, VGrADS is doing that. So that is off our plate."
Slide 26: LEAD Static Workflow
The static workflow runs the same pipeline every time:
- Static data: terrain data files feed the terrain preprocessor; surface and terrestrial data files feed the WRF static preprocessor.
- Real-time data: NAM, RUC, and GFS data feed the 3D model data interpolators (initial and lateral boundary conditions); level II radar data goes through the 88D radar remapper, level III radar data through the NIDS radar remapper, and satellite data through the satellite data remapper; these, plus surface data, upper-air and mesonet data, and wind profiler data, feed ADAS.
- Initialization: the ARPS-to-WRF data interpolator assembles the WRF input.
- Forecast: WRF runs, and its output passes through the WRF-to-ARPS data interpolator.
- Visualization: the ARPS plotting program and an IDV bundle render the results.
A sketch of this graph as data appears below.
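Here is a minimal sketch of that static graph as data, with a topological-sort scheduler. The component names follow the slide; the exact dependency edges are a plausible reading of the diagram, not an authoritative one.

    # task -> list of tasks it depends on
    static_workflow = {
        "TerrainPreprocessor":   [],                       # terrain data files
        "WRFStaticPreprocessor": ["TerrainPreprocessor"],
        "88DRadarRemapper":      [],                       # level II radar data
        "NIDSRadarRemapper":     [],                       # level III radar data
        "SatelliteDataRemapper": [],
        "ADAS":                  ["88DRadarRemapper", "NIDSRadarRemapper",
                                  "SatelliteDataRemapper"],
        "ARPS2WRFInterpolator":  ["ADAS", "WRFStaticPreprocessor"],
        "WRF":                   ["ARPS2WRFInterpolator"],
        "WRF2ARPSInterpolator":  ["WRF"],
        "ARPSPlottingProgram":   ["WRF2ARPSInterpolator"],
    }

    def schedule(graph):
        # A topological order is a valid pre-scheduled (static) execution order.
        order, done = [], set()
        while len(done) < len(graph):
            for task, deps in graph.items():
                if task not in done and all(d in done for d in deps):
                    order.append(task)
                    done.add(task)
        return order

    print(schedule(static_workflow))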
Slide 27: Dynamic Workflows in LEAD
The same components, rearranged to respond to the weather:
- The terrain and WRF static preprocessors run once per forecast region.
- The remappers (88D radar, NIDS radar, satellite), ADAS, and the data interpolators are repeated periodically as new data arrives.
- A data mining step (ADAM) watches the stream, looking for a storm signature; the WRF forecast steps are triggered if a storm is detected.
- Visualization (the ARPS plotting program, an IDV bundle) runs on the user's request.
(The numbers 1-13 in the original diagram give the ordering of these steps.) The trigger pattern is sketched below.
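A minimal sketch of the trigger: a mining step scans the observation stream for a storm signature and fires the forecast workflow when one appears. The reflectivity threshold and the looks_like_storm and launch_forecast functions are illustrative stand-ins for the real mining step.

    import random

    def looks_like_storm(obs):
        # Stand-in for the real mining algorithm; 50 dBZ is an arbitrary cutoff.
        return obs["reflectivity_dbz"] > 50

    def launch_forecast(region):
        print(f"Storm detected: triggering WRF forecast workflow for {region}")

    def mine_stream(stream, region):
        for obs in stream:
            if looks_like_storm(obs):
                launch_forecast(region)     # the waiting workflow fires
                break

    # Simulated radar observations standing in for the live data stream.
    stream = ({"reflectivity_dbz": random.uniform(10, 60)} for _ in range(100))
    mine_stream(stream, region="Oklahoma")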
Slide 28: Where Does LEAD Need VGrADS?
The static-case workflows: we can build execution-time models for each of the major workflow components (an outgrowth of the RENCI team's work). We can convert these into a "workflow requirements schedule": a graph of tasks, in which each task is a service invocation of an application with associated metadata:
- Required resources (memory, number of processors)
- Volume of input/output data requirements
Edges are dataflow dependences, annotated with data requirements (a sketch of such a document follows). VGrADS can create a service contract that schedules the right resources at the right time. This is wide-area, deadline-driven contract negotiation.
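A minimal sketch of what a workflow requirements schedule might look like as a document. The field names, task list, and resource numbers are invented for the example; no LEAD/VGrADS schema is implied.

    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        memory_gb: float       # required memory
        processors: int        # required processor count
        runtime_min: float     # predicted by the execution-time model

    @dataclass
    class Edge:
        producer: str
        consumer: str
        data_gb: float         # data volume annotating the dataflow edge

    schedule_request = {
        "deadline": "18:00Z",  # the better-than-real-time constraint
        "tasks": [
            Task("ADAS",     memory_gb=8,  processors=16,  runtime_min=20),
            Task("ARPS2WRF", memory_gb=4,  processors=8,   runtime_min=10),
            Task("WRF",      memory_gb=64, processors=256, runtime_min=120),
        ],
        "edges": [
            Edge("ADAS", "ARPS2WRF", data_gb=5.0),
            Edge("ARPS2WRF", "WRF",  data_gb=12.0),
        ],
    }
    print(len(schedule_request["tasks"]), "tasks,",
          len(schedule_request["edges"]), "edges")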
Slide 29: What Would Our Ideal Outcome Be?
A contract negotiator service: we pass it a Workflow Requirements Schedule document, and the negotiator returns a contract with details of the form: "You have cluster A for task x at 6:30 for 20 minutes; then you have clusters B and C and data archive V for 30 minutes at 7:00 ..."
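A minimal sketch of the shape of such a contract and how a client might read it, with negotiate() standing in for the remote negotiator service; both the function and the record layout are hypothetical.

    def negotiate(schedule_request):
        # Stand-in for the wide-area, deadline-driven negotiator service.
        return [
            {"task": "x", "resource": "cluster-A",
             "start": "06:30", "duration_min": 20},
            {"task": "y", "resource": "clusters B+C, data archive V",
             "start": "07:00", "duration_min": 30},
        ]

    contract = negotiate({"deadline": "18:00Z"})
    for slot in contract:
        print(f"task {slot['task']}: {slot['resource']} "
              f"at {slot['start']} for {slot['duration_min']} min")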
Slide 30: Issues
Time to completion is critical. If the contract can't be satisfied, then perhaps a reduced request can be made: "OK, I don't really need that many processors. How about 400?" A sketch of this fallback loop follows.
The dynamic case is harder: do I build a contract based on a worst-case storm scenario, or just renegotiate frequently?
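The reduced-request fallback can be sketched as a simple renegotiation loop; try_negotiate, the capacity figure, and the halving strategy are all hypothetical.

    def try_negotiate(processors):
        # Stand-in: pretend the negotiator can only grant 400 processors.
        return processors <= 400

    request = 1024
    while request >= 64 and not try_negotiate(request):
        request //= 2                 # make a reduced request and retry
    if try_negotiate(request):
        print(f"Contract granted for {request} processors")
    else:
        print("No contract possible before the deadline")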