
1 Part VII: Future Challenges in Computational Workflows and Opportunities for AI Research
AAAI-08 Tutorial on Computational Workflows for Large-Scale Artificial Intelligence Research
USC Information Sciences Institute, Yolanda Gil, AAAI-08 Tutorial, July 13, 2008

2 Scientific Collaborations: Publications [from Science, April 2005]

3 Sharing Data Collection: LIGO (ligo.caltech.edu)

4 Sharing Computing Resources

5 Ongoing Research

6 Workflow Lifecycle [Deelman and Gil 06]

7 Workflow Creation
Workflow completion
Automatically add data conversion and formatting components
Workflows as components of other workflows
Automatic workflow assembly from libraries of components [McDermott 02] [McIlraith & Son 03] [Blythe et al 04] …
Interleaving workflow composition and execution [Gil et al 07]
"Science of design" for computational workflows as software artifacts [Deelman & Gil 07] [Gil et al 07] [Gil 08]
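
Workflow completion, meaning the automatic addition of data conversion and formatting components, can be illustrated with a small sketch. It is not the algorithm of any system cited on this slide. It assumes a hypothetical library of single-step converters keyed by (source type, target type). Wherever two adjacent steps disagree on data type, the sketch splices in a converter. All component and type names (`download`, `parse_csv`, `cluster`, `csv`, `table`, ...) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    input_type: str
    output_type: str


# Hypothetical library of conversion/formatting components,
# keyed by (from_type, to_type).
CONVERTERS = {
    ("csv", "table"): Component("parse_csv", "csv", "table"),
    ("table", "matrix"): Component("table_to_matrix", "table", "matrix"),
}


def complete_workflow(steps, converters=CONVERTERS):
    """Return a new step list with a converter inserted between any two
    adjacent steps whose output and input types differ.

    Only single-step conversions are tried; if no converter exists for a
    mismatch, ValueError is raised.
    """
    if not steps:
        return []
    completed = [steps[0]]
    for nxt in steps[1:]:
        prev = completed[-1]
        if prev.output_type != nxt.input_type:
            key = (prev.output_type, nxt.input_type)
            if key not in converters:
                raise ValueError(f"no converter from {key[0]} to {key[1]}")
            completed.append(converters[key])  # splice in the conversion step
        completed.append(nxt)
    return completed
```

For example, completing `[Component("download", "url", "csv"), Component("cluster", "table", "clusters")]` produces the step names `download`, `parse_csv`, `cluster`. A realistic completion mechanism would also search for multi-step conversion chains and reason over richer semantic types than plain strings.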

8 Workflow Catalogs
Workflow description and formal representation
W3C semantic workflow language activity
Workflow discovery [Goble et al 06]
Workflow reuse and repurposing [Goderis et al 06] [Goderis et al 07]
Query-based workflow matching [Horrocks and Li 02] [Baader 01]
Workflow sharing [DeRoure & Goble 07]
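
Query-based workflow matching can be sketched as subsumption over a type hierarchy. A catalog workflow is a candidate if it accepts the kind of data the user has (or a more general kind) and produces the kind of result the user wants (or a more specific kind). This is a deliberately simplified stand-in for the description-logic reasoning cited on the slide. The type hierarchy, the catalog entries, and the function names are all hypothetical.

```python
# Hypothetical type hierarchy, stored as child -> parent.
PARENT = {
    "GeneExpressionTable": "Table",
    "Table": "Dataset",
    "ClusterReport": "Report",
    "Report": "Dataset",
}

# Hypothetical workflow catalog: each entry declares one input and one output type.
CATALOG = [
    {"name": "cluster-genes", "input": "Table", "output": "ClusterReport"},
    {"name": "summarize", "input": "Dataset", "output": "Report"},
    {"name": "align-reads", "input": "SequenceFile", "output": "Alignment"},
]


def is_subtype(t, ancestor):
    """True if t equals ancestor or reaches it by following parent links."""
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT.get(t)
    return False


def match(catalog, have, want):
    """Names of workflows that can consume `have` and whose output is a `want`."""
    return [
        w["name"]
        for w in catalog
        if is_subtype(have, w["input"]) and is_subtype(w["output"], want)
    ]
```

With this catalog, `match(CATALOG, "GeneExpressionTable", "Report")` returns both `cluster-genes` and `summarize`. The more general `summarize` qualifies because it accepts any `Dataset`.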

9 Workflow Learning
1) From a user’s demonstration of service invocations [Burstein et al 08] [Kim & Gil 08]
2) From tutorial instruction [Groth & Gil 08]
3) Generalizing from examples (from [Burstein et al 08])
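
Generalizing from examples can be illustrated with a very small form of anti-unification. Take two demonstrations that invoke the same sequence of services. Keep the arguments on which they agree, and replace the arguments that differ with variables. When the same pair of differing values recurs, reuse the same variable, so that data flow between steps is preserved. This is a hypothetical sketch, not the learning method of the cited systems. The service names, file names, and the `?x0` variable syntax are invented.

```python
def generalize(trace_a, trace_b):
    """Generalize two demonstration traces into a workflow template.

    Each trace is a list of (service, args_tuple). Arguments that agree are
    kept; each distinct pair of differing values becomes a variable ?x0, ?x1, ...
    """
    if [s for s, _ in trace_a] != [s for s, _ in trace_b]:
        raise ValueError("traces invoke different service sequences")
    var_for = {}  # (value_in_a, value_in_b) -> variable name
    template = []
    for (service, args_a), (_, args_b) in zip(trace_a, trace_b):
        if len(args_a) != len(args_b):
            raise ValueError(f"argument count mismatch in {service}")
        gen_args = []
        for a, b in zip(args_a, args_b):
            if a == b:
                gen_args.append(a)  # constant shared by both demonstrations
            else:
                if (a, b) not in var_for:
                    var_for[(a, b)] = f"?x{len(var_for)}"
                gen_args.append(var_for[(a, b)])
        template.append((service, tuple(gen_args)))
    return template
```

Consider two demonstrations that fetch `geneA.fa` and `geneB.fa` and then align the fetched file against `hg18`. They generalize to a template in which both the fetch and the align step share the variable `?x0`, while `hg18` stays constant.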

10 Five Opportunities for Future Research

11 1) Reduce Setup Cost -> Workflow as First-Class Citizen in Scientific Research
Today: Workflow design and implementation is costly
  Developed through collaboration
  – Application scientists in several areas, software engineers, distributed systems experts, etc.
  Developed over many months
  – Must adapt existing code, must create “glue” code
  Validated and refined over time
Goal: Must be done by scientists themselves at minimal cost:
  To create them
  To understand them
  To learn to use them for research
  To adapt them for another purpose or analysis variant
  To refine/update them over time

12 2) Workflow-Centered User Interaction
Workflow template as selected method
User visibility into the data analysis process
User steering during execution based on results
Interleaving generation and execution (data-driven adaptation)
Recording provenance
Automation of routine tasks that are not critical to the experiment
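
Recording provenance during execution can be sketched by having the workflow runner log, for every step, which step ran, a content digest of its input and output, and when it ran. The record format (a dict with `step`, `input`, `output`, `time`) and the function names are assumptions made for this illustration. A production workflow system would also capture parameters, component versions, and execution hosts.

```python
import hashlib
import json
from datetime import datetime, timezone


def digest(value):
    """Short SHA-256 digest of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


def run_with_provenance(steps, data):
    """Run (name, function) steps in order, recording one provenance entry per step."""
    log = []
    for name, fn in steps:
        in_digest = digest(data)
        data = fn(data)
        log.append({
            "step": name,
            "input": in_digest,
            "output": digest(data),
            "time": datetime.now(timezone.utc).isoformat(),
        })
    return data, log
```

Because each entry stores digests of both its input and its output, a consumer can check that step N+1 really consumed what step N produced: `log[1]["input"] == log[0]["output"]`.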

13 3) Workflows for Cross-Disciplinary Analyses -> Enable Integrative Science
Today: Workflow systems can generate detailed provenance and metadata for new data products
  – Describe individual datasets so they can be used by others
  Reuse of new data products by other systems is currently rare
  – Reuse is common within systems/communities
Goal: Workflows generating data that is used across disciplines
  Meaningful reuse of data products (results) by other workflows
  – True test of the utility of provenance and metadata information

14 4) Using Workflows for Educating New (and Old!) Scientists
Today: Scientific analyses are less and less accessible to newcomers
  Steep learning curve that spans a variety of areas of expertise
  – Application science(s), modeling, software engineering, distributed computing, etc.
Goal: Workflow systems could be configured to enable learning of additional capabilities on demand
  Could isolate less proficient users from advanced capabilities while enabling them to learn and practice what they learn
  Everyone should be able to contribute as they learn

15 5) Workflows as Efficient Instruments of Systematic Exploration and Discovery
Today: Workflows are manually selected by the user
  User decides what data/analysis to conduct
  Not a systematic exploration of the space
  Visualization is only one way to understand results
  The human is the bottleneck; current practice will not scale
Goal: Workflows conduct automated heuristic discovery and pattern detection
  Automate systematic exploration of all possible workflows
  Formulate heuristics for scientific discovery: recurring domain-independent data analysis patterns [Simon 82]
  Search for patterns (or pattern types)
  Workflows could include pattern detection and discovery components
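
Systematic exploration of all possible workflows can be sketched as a breadth-first enumeration. The search lists every component chain that turns a starting data type into a goal type, up to a length bound. The component library is hypothetical. Exhaustive enumeration is only feasible for small libraries; in practice, the discovery heuristics called for above would be needed to prune or rank the candidates.

```python
from collections import deque

# Hypothetical component library: name -> (input_type, output_type).
LIBRARY = {
    "normalize": ("raw", "normalized"),
    "log_transform": ("raw", "normalized"),
    "kmeans": ("normalized", "clusters"),
    "hierarchical": ("normalized", "clusters"),
    "pca": ("normalized", "reduced"),
    "kmeans_reduced": ("reduced", "clusters"),
}


def enumerate_workflows(library, start, goal, max_len=4):
    """Breadth-first enumeration of all component chains from start to goal.

    Chains are at most max_len components long, which also guarantees
    termination if the library contains type cycles.
    """
    results = []
    queue = deque([(start, [])])  # (current data type, chain so far)
    while queue:
        current, path = queue.popleft()
        if current == goal and path:
            results.append(path)
            continue
        if len(path) == max_len:
            continue
        for name, (inp, out) in sorted(library.items()):
            if inp == current:
                queue.append((out, path + [name]))
    return results
```

With the sample library, `enumerate_workflows(LIBRARY, "raw", "clusters")` returns six chains, ordered from shortest to longest. One of them is `["normalize", "pca", "kmeans_reduced"]`.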

16 Cyberinfrastructure: Not Just Big Iron
“The Federal government must rebalance R&D investments to:
  Create a new generation of well-engineered, scalable, easy-to-use software suitable for computational science that can reduce the complexity and time to solution for today’s challenging scientific applications and can create accurate models and simulations that answer new questions
  Design, prototype, and evaluate new hardware architectures that can deliver larger fractions of peak hardware performance on key applications
  Focus on sensor- and data-intensive computational science applications in light of the explosive growth of data”
President’s Information Technology Advisory Committee (PITAC) report on “Computational Science: Ensuring America’s Competitiveness”, May 2005

17 Tomorrow’s Cyberinfrastructure Layers Enabled by Knowledge-Rich Workflow Systems [Gil 08]
[Diagram layers: Resource Access, Resource Sharing, Data Services, Application Tools, Portals, Workflow Systems, Workflow Sharing, Heuristic Discovery, Portals, Workflow-Centered Interfaces]

18 “As We May Think”
“Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them […]. The lawyer has at his touch the associated opinions and decisions of his whole experience, and of the experience of friends and authorities. The patent attorney has on call the millions of issued patents, with familiar trails to every point of his client's interest. […] The chemist, struggling with the synthesis of an organic compound, has all the chemical literature before him in his laboratory, with trails following the analogies of compounds, and side trails to their physical and chemical behavior. […] There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record. The inheritance from the master becomes, not only his additions to the world's record, but for his disciples the entire scaffolding by which [their additions] were erected.”
--- Vannevar Bush,