Published by Gwendoline Holmes. Modified over 9 years ago.
1
USC INFORMATION SCIENCES INSTITUTE
Artificial Intelligence and Large-Scope Science: Workflow Planning and Beyond
Yolanda Gil
USC/Information Sciences Institute
gil@isi.edu
www.isi.edu/~gil
In collaboration with others in the Intelligent Systems Division and the Center for Grid Technologies at USC/ISI, including Ewa Deelman, Carl Kesselman, and Jim Blythe.
Supported in part by NSF’s GriPhyN and SCEC/CME projects, and by internal grants from USC/ISI.
2
Outline
- Motivation
  - Large-scope large-scale science
  - Challenges and opportunities for Artificial Intelligence
- Research on workflow planning at USC/ISI
  - Using AI techniques in Pegasus to generate executable grid workflows
- Future directions in support of scientific workflows
  - Intelligent interactive assistance and automatic completion
  - Active workflows
  - Cognitive grids
- Knowledge infrastructure for science
  - Challenges in Community-Based Knowledge Capture and Representation
3
The Southern California Earthquake Center’s Community Modeling Environment (SCEC-CME)
(http://iowa.usc.edu/cmeportal/)
4
Integrating Diverse Models of Complex Phenomena…
- Fault models
- Fault ruptures
- Wave propagation
- Historic records
- Site response models
- Effect on structures
5
…for Broader Use
Geophysicists, civil and structural engineers, city planners, emergency managers, …
- Analyze seismic hazard
- Learn and understand seismic hazard
Of course, scientists need this infrastructure as well!
6
Not Just Large-Scale and HPC Issues: Large-Scope Science and Engineering Research
“Whereas large-scale means increasing the resolution of the solution to a fixed physical model problem, large-scope means increasing the physical complexity of the model itself. Increasing the scope involves adding more physical realism to the simulation, making the actual code more complex and heterogeneous, while keeping the resolution more or less constant.”
-- Report from ACM Workshop on Strategic Directions in Computing Research, A. Sameh et al. on Computational Science and Engineering, June 1996
7
How This is Done Today
Scientists:
- Verbal communication needed to compose models
- When an earthquake occurs, hard to respond quickly
Other users (e.g., building engineers):
- Use models based on correlations of historical data
- Employ consultants who know how to set up these models
- Delay in accessing state-of-the-art scientific models
8
Scientific Workflows
- Models composed into end-to-end scientific workflows that model/analyze complex physical phenomena
- In-silico experimentation
- Data collection and analysis
- Reproducibility, reusability, pedigree
[Figure: example workflow whose task result is a hazard curve (SA vs. prob. exc.). Component labels: Hazard Curve Calculator, Field (2000) IMR, Basin-Depth Calculator, UTM Converter (get-Lat-Long-given-UTM), CVM-get-Velocity-at-point, PEER-Fault ruptures. Data labels: Rupture, Site VS30, Site Basin-Depth-2.5, SA Period, Gaussian Truncation, Std. Dev. Type, Lat./Long., UTM, Velocity, Total Moment Rate, Duration-Year, Fault-Grid-Spacing, Rupture Offset, Mag-Length-sigma, Dip, Rake, Magnitude (min, max, mean).]
9
Executing Scientific Workflows on Grids
Grids support this process through middleware services:
- Seamless integration and management of resources (OGSA)
- Job submission (Condor)
- Resource Monitoring and Directory Service (MDS)
- Replica Location Service (RLS)
- Metadata Catalog Services (MCS)
From [Kesselman 04]:
- Discovery: many sources of data, services, computation; registries organize services of interest to a community
- Access: data integration activities may require access to, and exploration/analysis of, data at many locations
- Exploration and analysis may involve complex, multi-step workflows
- Resource management is needed to ensure progress and arbitrate competing demands
- Security and policy must underlie access and management decisions
11
Challenges
Complexity:
- Many choices are involved as workflow is composed
  - Alternative application components, files, and locations
- Many different interdependencies may occur among components
- May reach many dead ends
Usability:
- Users should not need to be aware of infrastructure details
  - Files are distributed, indexed, replicated
  - Match application requirements to host capabilities
Solution cost:
- Evaluate the alternative solution costs
  - Performance
  - Reliability
  - Resource usage
Global cost:
- Minimizing cost across organizations
- Individual user’s choices in light of other users’ choices
Reliability of execution:
- Job resubmission upon failure
- Detection, diagnosis, repair
- Anticipation and avoidance, resource reservations
12
Challenges and opportunities for Artificial Intelligence
We need alternative foundations that offer
- expressive representations to capture the complex knowledge involved in both the application domain and the execution environment
- flexible reasoners to explore this complex space systematically and incorporate constraints, tradeoffs, policies
Many Artificial Intelligence (AI) techniques are relevant:
- Planning to achieve given requirements
- Searching through problem spaces of related choices
- Using and combining heuristics
- Reasoners that can incorporate rules, definitions, axioms, etc.
- Schedulers and resource allocation techniques
- Coordination and communication in distributed problem solving
- Expressive knowledge representation languages
- Reasoning under uncertainty
- Dynamic replanning and reactive control
- Learning in complex dynamic environments
- Learning to improve problem solving skills
13
Outline
- Motivation
  - Scientific workflows
  - Challenges and opportunities for Artificial Intelligence
- Research on workflow planning at USC/ISI
  - Using AI techniques in Pegasus to generate executable grid workflows
- Future directions in support of scientific workflows
  - Intelligent interactive assistance and automatic completion
  - Active workflows
  - Cognitive grids
- Knowledge infrastructure for science
  - Challenges in Community-Based Knowledge Capture and Representation
14
Workflow Generation: Reasoning about Distributed Execution Infrastructure in Grids with Pegasus
(work with J. Blythe, E. Deelman, C. Kesselman, and others)
[Gil et al, IEEE IS 04]
15
Pegasus: Using AI Planning Techniques to Generate Executable Grid Workflows
Given: desired result and constraints
- A desired result (high-level description of data product)
- A set of application components described in the grid
- A set of resources in the grid (dynamic, distributed)
- A set of constraints and preferences on solution quality
Find: an executable job workflow
- A configuration of components that generates the desired result
- A specification of resources where components can be executed and data can be stored
- A specification of data sources and data movements
Approach:
- Use AI planning techniques to search the solution space and evaluate tradeoffs
- Exploit heuristics to direct the search for solutions and represent optimality and policy criteria
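The Given/Find formulation on this slide can be illustrated with a toy backward-chaining search. This is a minimal sketch and not the Pegasus planner: the `Component` type, the component names, and the data names are all hypothetical, and the search ignores resources, costs, heuristics, and policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical application component that consumes and produces named data."""
    name: str
    inputs: frozenset
    outputs: frozenset


def plan(goal, components, available, _visiting=frozenset()):
    """Return an ordered list of component names that produces `goal`, or None.

    Backward chaining with backtracking: pick a component that outputs the
    goal, plan for each of its inputs, and try the next candidate on failure.
    """
    if goal in available:
        return []                      # data already exists; nothing to run
    if goal in _visiting:
        return None                    # cyclic dependency: dead end
    for comp in components:
        if goal not in comp.outputs:
            continue
        steps = []
        for needed in sorted(comp.inputs):
            sub = plan(needed, components, available, _visiting | {goal})
            if sub is None:
                break                  # this candidate fails; backtrack
            steps += [s for s in sub if s not in steps]
        else:
            return steps + [comp.name]
    return None


# Hypothetical component library and initial data.
COMPONENTS = [
    Component("sim_from_cache", frozenset({"cache"}), frozenset({"waveform"})),
    Component("simulate", frozenset({"fault_model"}), frozenset({"waveform"})),
    Component("hazard", frozenset({"waveform", "site"}), frozenset({"hazard_curve"})),
]
AVAILABLE = {"fault_model", "site"}

print(plan("hazard_curve", COMPONENTS, AVAILABLE))  # ['simulate', 'hazard']
```

The first candidate for `waveform` needs a `cache` that nothing can produce, so the search backtracks to `simulate`. This is the kind of dead end the Challenges slide mentions.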
16
Advantages of Using AI Planning
- Provide broad-based, generic foundation
- Use general techniques to search for solutions
  - Explores alternatives, supports backtracking
  - Incorporates domain-specific and domain-independent heuristics (as search control rules)
- Allow easy addition of new constraints and rules
- Incorporate optimality and policy into the search for solutions
- Interleave decisions at various levels
- Can integrate the generation of workflows across users and policies within virtual orgs.
17
Reasoning about Workflows in Pegasus
[Figure: an original workflow of data processing tasks (nodes a to i), the desired results, and the final workflow derived from them. Key: the original node, input transfer node, registration node, output transfer node, unnecessary nodes.]
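The diagram on this slide did not survive extraction, but its key (unnecessary nodes, transfer nodes) suggests a pruning step: tasks whose results already exist need not be rerun. The sketch below shows one plausible reading of that idea, assuming a task graph given as a parent map. The function name, the graph, and the task names are hypothetical, and registration nodes are omitted.

```python
def reduce_workflow(parents, desired, already_available):
    """Return (tasks_to_run, transfers) for a task DAG.

    parents: dict mapping each task to the tasks whose output it consumes.
    desired: tasks whose output the user asked for.
    already_available: tasks whose output already exists somewhere on the grid.
    """
    to_run, transfers = set(), set()
    stack = list(desired)
    while stack:
        task = stack.pop()
        if task in to_run or task in transfers:
            continue
        if task in already_available:
            transfers.add(task)        # fetch existing data instead of recomputing
            continue                   # its ancestors are unnecessary on this branch
        to_run.add(task)
        stack.extend(parents.get(task, ()))
    return to_run, transfers


# Hypothetical chain a -> b -> c, with c feeding both d and e.
PARENTS = {"b": ["a"], "c": ["b"], "d": ["c"], "e": ["c"]}

to_run, transfers = reduce_workflow(PARENTS, ["d", "e"], {"c"})
print(sorted(to_run), sorted(transfers))  # ['d', 'e'] ['c']
```

Because the output of `c` already exists, tasks `a` and `b` drop out of the workflow and a single transfer of the data from `c` replaces them.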
18
Pegasus Application Domains (work with E. Deelman and dozens of scientists)
- Pulsar search for gravitational-wave physics (LIGO)
  - 975 tasks, 1365 data transfers, 975 output files, 96 hrs runtime
- Galaxy morphology for NVO and NASA in Montage
- Tomography for neural structure reconstruction
- High-energy physics: Compact Muon Solenoid
  - 7 days, 678 jobs, produced ~200 GB
- Gene alignment
  - In 24 hours, ~10,000 Grid jobs, >200,000 BLAST executions, produced 50 GB
19
Small Montage Workflow
~1200 nodes
[Deelman et al, 04]
20
Artemis: Integrating Distributed Info Sources on the Grid (work with E. Deelman, S. Thakkar, R. Tuchinda)
[Tuchinda et al, IAAI-04]
[Figure: architecture diagram. Labels: User Query Wizard, Ontology, Entity selection, Data Source Filters, Data Source Models, Model mappings, Dynamic Model Generator, Prometheus Query Mediator, Theseus query execution, Data Source, Metadata Catalog Services (several instances).]
21
Outline
- Motivation
  - Scientific workflows
  - Challenges and opportunities for Artificial Intelligence
- Research on workflow planning at USC/ISI
  - Using AI techniques in Pegasus to generate executable grid workflows
- Future directions in support of scientific workflows
  - Intelligent interactive assistance and automatic completion
  - Active workflows
  - Cognitive grids
- Knowledge infrastructure for science
  - Challenges in Community-Based Knowledge Capture and Representation
22
Scientific Workflows: Future Directions
- Using AI to support the workflow creation process
  - Interactive assistance and automatic completion
- Using AI to support the scientific experimentation process
  - Active workflows
- Using AI to augment the execution infrastructure
  - Cognitive grids
23
The Process of Creating an Executable Workflow
1. Creating a valid workflow template (human guided)
   - Selecting application components and connecting inputs and outputs
   - Adding other steps for data conversions/transformations
2. Creating instantiated workflow
   - Providing input data to pathway inputs (logical assignments)
3. Creating executable workflow (automatically)
   - Given requirements of each model, find and assign adequate resources for each model
   - Select physical locations for logical names
   - Include data movement steps, including data deposition steps
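The three stages can be pictured as successive refinements of one data structure. The sketch below is illustrative only: the dictionary layout, the `?variable` convention, and every component, file, and site name are assumptions made for the example, and data deposition steps are left out.

```python
# Stage 1: a workflow template wires components together through variables.
TEMPLATE = [
    {"component": "utm_converter", "inputs": ["?utm"], "outputs": ["?latlong"]},
    {"component": "velocity_lookup", "inputs": ["?latlong"], "outputs": ["?velocity"]},
]


def instantiate(template, bindings):
    """Stage 2: replace template variables with logical data names."""
    def bind(names):
        return [bindings[n] for n in names]
    return [{**s, "inputs": bind(s["inputs"]), "outputs": bind(s["outputs"])}
            for s in template]


def make_executable(workflow, replicas, site_for):
    """Stage 3: assign a site to each step and add data movement jobs.

    replicas: logical data name -> site that already holds a copy.
    site_for: component name -> site chosen to run it.
    """
    jobs = []
    produced = {}                      # logical name -> site where a step created it
    for step in workflow:
        site = site_for[step["component"]]
        for name in step["inputs"]:
            source = produced.get(name) or replicas[name]
            if source != site:
                jobs.append(("transfer", name, source, site))
        jobs.append(("run", step["component"], site))
        for name in step["outputs"]:
            produced[name] = site
    return jobs


BINDINGS = {"?utm": "site.utm", "?latlong": "site.latlong", "?velocity": "site.vel"}
REPLICAS = {"site.utm": "storage_a"}
SITE_FOR = {"utm_converter": "cluster_x", "velocity_lookup": "cluster_y"}

JOBS = make_executable(instantiate(TEMPLATE, BINDINGS), REPLICAS, SITE_FOR)
for job in JOBS:
    print(job)
```

In this toy version the user supplies the template and the bindings, while the site assignment and the transfer jobs are derived mechanically, mirroring the human-guided and automatic stages on the slide.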
24
Challenges for Interactive Composition of Valid Workflow Templates
Provide flexible interaction
- User can start from initial data, from data products, or steps
- User can specify abstract descriptions of steps and later specialize them
- User can reuse, merge, or build from scratch
Automatic tracking of workflow constraints
- User is notified if there are problems but does not have to keep track of details
Proactive assistance
- System should not just point out problems but help user by suggesting fixes (always)
And… how do we define what “valid” means?
25
Assisting Users in Creating Workflow Templates (with J. Kim and M. Spraragen)
User interaction results in modifications to workflows
- Specify desired result, external/user provided input
- Add/remove step, add/remove link
- Specialize step (e.g., IMR -> IMR-SA)
As user creates a workflow, intermediate stages result in possibly incorrect workflows
ErrorScan algorithm detects errors and generates possible fixes
- Knowledge base that represents components and constraints
- Formal definitions of desirable properties of workflows based on AI planning techniques
- Fixes are multi-step and “click-through”
- Errors and fixes are ranked using heuristics
- If no errors detected, workflow is guaranteed to be correct
[Kim et al, IUI-04] [Spraragen et al, 04]
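To make the detect-and-suggest loop concrete, here is a small sketch of a checker for one kind of error, an input that nothing supplies. It is not the ErrorScan algorithm from the cited papers: the function name, data layout, and catalog entries are hypothetical, only a single property is checked, and fixes are not ranked.

```python
def scan_errors(steps, links, user_inputs, catalog):
    """Return a list of (error, suggested_fixes) pairs for a draft workflow.

    steps: dict step name -> {"inputs": [...], "outputs": [...]} (data types)
    links: set of (producer_step, data_type, consumer_step)
    user_inputs: set of (step, data_type) pairs the user will supply
    catalog: dict component name -> list of output data types (knowledge base)
    """
    findings = []
    for name, step in sorted(steps.items()):
        for dtype in step["inputs"]:
            linked = any(c == name and d == dtype for _, d, c in links)
            if linked or (name, dtype) in user_inputs:
                continue
            fixes = [f"add component {comp} and link it to {name}"
                     for comp, outs in sorted(catalog.items()) if dtype in outs]
            fixes.append(f"mark {dtype} as user-provided input of {name}")
            findings.append((f"{name}: input {dtype} is unsatisfied", fixes))
    return findings


# Hypothetical draft workflow: the IMR-SA step still lacks a vs30 value.
STEPS = {
    "PEER-Fault": {"inputs": [], "outputs": ["rupture"]},
    "IMR-SA": {"inputs": ["rupture", "vs30"], "outputs": ["sa_exc_prob"]},
}
LINKS = {("PEER-Fault", "rupture", "IMR-SA")}
CATALOG = {"site_vs30_lookup": ["vs30"]}

for error, fixes in scan_errors(STEPS, LINKS, set(), CATALOG):
    print(error, "->", fixes)
```

Applying either suggested fix and rescanning returns an empty list for this property, which loosely mirrors the slide's claim that a workflow with no detected errors is correct with respect to the properties being checked.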
26
Scientific Workflows: Future Directions
- Using AI to support the workflow creation process
  - Interactive assistance and automatic completion
- Using AI to support the scientific experimentation process
  - Active workflows
- Using AI to augment the execution infrastructure
  - Cognitive grids
27
Supporting the Interactive and Incremental Nature of Scientific Exploration (with M. Ellisman, E. Deelman, C. Kesselman)
- Workflows cannot always be created in advance
  - Experimental design depends on initial / partial results
  - Scientific experimentation is often exploratory
- Need to support interactive and incremental creation and execution of workflows
- Active workflows: represent evolving workflows and are continually authored, refined, executed, and modified
28
Supporting the Evolution of Active Workflows (I)
29
Supporting the Evolution of Active Workflows (II)
30
Supporting the Evolution of Active Workflows (and III)
31
Scientific Workflows: Future Directions
- Using AI to support the workflow creation process
  - Interactive assistance and automatic completion
- Using AI to support the scientific experimentation process
  - Active workflows
- Using AI to augment the execution infrastructure
  - Cognitive grids
32
Pervasive Knowledge Sources and Reasoners (work with J. Blythe, E. Deelman, C. Kesselman, H. Tangmurarunkit)
[Gil et al, IEEE IS 04]
[Figure: architecture diagram. Labels: High-level specification of desired results, constraints, requirements, user policies; Workflow Manager; Smart Workflow Pool; Workflow History; Intelligent Reasoners; Workflow Refinement; Workflow Repair; Resource Matching; Policy Management; Pervasive Knowledge Sources; Application KB; Resource KB; Policy KB; Other KB; Policy Information Services; Resource Indexes; Replica Locators; Other Grid services; Simulation codes; Community Distributed Resources (e.g., computers, storage, network, simulation codes, data).]
33
Cognitive Grids: Pervasive Semantic Representations of the Environment at all Levels
[Figure: layered architecture. Layers: Users and Applications; Intelligent Reasoners (matchmaking, refinement, repair, coordination, negotiation…); Higher-Level Service (Virtual Data Tools, Resource Brokers); Basic Grid Middleware (Globus Toolkit, Condor-G, DAGMan); Grid Resources (Compute, Data, Network). Other labels: Semantic Resource Descriptions; Resource Knowledge-bases; Application Component Models; Resource Policy Descriptions; User and VO policy models; Policy Knowledge-bases; Current Request Status, Results, Provenance Information; High-level Request descriptions; Refined Workflow; Provenance and Monitoring Tasks; Monitoring, Resources knowledge; Semantics for File-based data.]
34
Cognitive Grids: Distributed Intelligent Reasoners that Incrementally Generate the Workflow
[Figure: workflow refinement plotted over time against levels of abstraction (application-level knowledge, logical tasks, tasks bound to resources and sent for execution). Labels: User’s Request, Relevant components, Full abstract workflow, Partial execution, Not yet executed. Reasoners shown: Workflow refinement, Onto-based Matchmaker, Workflow repair, Policy reasoner.]
35
Many Opportunities for AI Techniques
The Grid Now
- Syntax-based matchmaking of resources to job requirements
  - Condor matchmaker
  - Attribute based discovery and selection
- Scheduling of jobs based on Grid-able users that specify job execution sequences and computing requirements
  - Scripting languages
  - Workflow languages, task graphs
- Explicit mappings from task to jobs, simple job brokers
- Explicit service negotiation and recovery strategies
The Future Grid
- Knowledge-based reasoning about resources enables
  - Semantic matchmaking
  - Aggregate resource reasoning
- Task-level reasoning to plan and schedule jobs and resources
  - More agility and coordination
- Wide range of users can specify high level requirements in a mixed-initiative mode
- Mapping of high-level requirements to details required for execution
- End-to-end resource negotiation and adaptive strategies to accommodate failure
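The contrast between syntax-based and semantic matchmaking can be shown with a toy example. This sketch assumes a tiny hand-written class hierarchy; the class names, resource names, and functions are hypothetical and do not reflect the Condor matchmaker or any ontology language.

```python
# Hypothetical mini-ontology: child class -> parent class.
SUBCLASS_OF = {"RedHat9": "Linux", "Debian": "Linux", "Linux": "Unix", "Solaris": "Unix"}


def is_a(cls, ancestor):
    """True if `cls` equals `ancestor` or descends from it in SUBCLASS_OF."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def syntactic_match(required_os, resources):
    """Attribute-based matching: the advertised string must equal the request."""
    return [name for name, os_name in resources.items() if os_name == required_os]


def semantic_match(required_os, resources):
    """Ontology-based matching: any subclass of the requested class qualifies."""
    return [name for name, os_name in resources.items() if is_a(os_name, required_os)]


RESOURCES = {"node1": "RedHat9", "node2": "Solaris", "node3": "Linux"}

print(syntactic_match("Linux", RESOURCES))  # ['node3']
print(semantic_match("Linux", RESOURCES))   # ['node1', 'node3']
```

String equality misses `node1` because it advertises a more specific class than the one requested, while the subsumption check finds it. That gap is one way to read the move from attribute-based selection to semantic matchmaking.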
36
Outline
- Motivation
  - Scientific workflows
  - Challenges and opportunities for Artificial Intelligence
- Research on workflow planning at USC/ISI
  - Using AI techniques in Pegasus to generate executable grid workflows
- Future research in support of scientific workflows
  - Intelligent interactive assistance and automatic completion
  - Active workflows
  - Cognitive grids
- Knowledge infrastructure for science
  - Challenges in Community-Based Knowledge Capture and Representation
37
Knowledge Infrastructure for Science: Challenges in Community-Based Knowledge Capture & Representation
1. be a community-wide effort
2. have community-wide acceptance
3. be used in practice on a daily basis to compose simulation code and annotate their results
38
Scientists Ask Lots of Questions, Knowledge Representation has few Answers
- How do you get started?
- How to ensure the community will accept it (use it)?
- How do you (can you?) represent alternative views?
- What is the process to contribute to it?
- What is the process to make changes to it?
- What is the impact to my application when there is an update?
- How is it implemented?
- How is it managed?
- Who does what, when, where, why?
39
SCEC/GO Workshop on Ontology Development: Lessons Learned and Prospects [Bada et al, forthcoming]
SCEC learns from the Gene Ontology (GO) experience (Workshop Nov’02, Cambridge UK):
- Had a successful jumpstart
- Done by biologists, not knowledge engineers
- Developed by a wide, distributed community
- Focused on specific aspects of genomics
  - Fly-base, yeast, mouse
- Used 24/7 from day 1
- Accepted widely by the community
- Extended based on use requirements of a wide community
- Quite large (13K terms)
- Simple (and messy) representation
- Simple infrastructure
- Process to accommodate changes, curation
40
Some Policies for Organizing Contributions
- Curated by knowledge engineers, who process changes requested by users: http://www.ecocyc.org
- Curated by domain experts: a group of domain curators processes changes requested by users: http://www.geneontology.org
- Open contributions: any user can add content: http://www.dmoz.org, http://www.openmind.org
- Open editing: any user can edit and create any page on a web site: http://wiki.org
41
Broad Range of Contributors of Scientific Knowledge (with T. Chklovski)
[Figure: a spectrum of contributions. One end: more inexpensive, more inaccurate, more ambiguous, deeper into society/impact. Other end: more expensive, more accurate, more concrete, deeper into the science.]
42
Thank you!
- Scientific workflows: pegasus.isi.edu
- Cognitive grids: www.isi.edu/ikcap/cognitive-grids
- AI and science: IEEE Intelligent Systems Jan/Feb 2004, De Roure, Gil, Hendler (Eds), Special issue on e-Science
www.isi.edu/~gil
43
“As We May Think”
“Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them […]. The lawyer has at his touch the associated opinions and decisions of his whole experience, and of the experience of friends and authorities. The patent attorney has on call the millions of issued patents, with familiar trails to every point of his client's interest. […] The chemist, struggling with the synthesis of an organic compound, has all the chemical literature before him in his laboratory, with trails following the analogies of compounds, and side trails to their physical and chemical behavior. […] There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record. The inheritance from the master becomes, not only his additions to the world's record, but for his disciples the entire scaffolding by which [their additions] were erected.”
--- Vannevar Bush, 1945
http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
44
Searching for Pulsars with the Pegasus Planner
- Used AI planning techniques to compose executable grid workflows with hundreds of jobs
- Applied to data from the Laser Interferometer Gravitational-Wave Observatory (LIGO), which aims to detect the gravitational waves predicted by Einstein’s theory of relativity
- Used LIGO’s data collected during the first scientific run of the instruments in Fall 2002
- Targeted a set of 1000 locations of known pulsars as well as random locations in the sky
- Performed using compute and storage resources at Caltech, University of Southern California, and University of Wisconsin Milwaukee