Requirements for caBIG Infrastructure to Support Semantic Workflows (Yolanda Gil, Information Sciences Institute, January 10, 2010)

Presentation transcript:

1 Yolanda Gil, Information Sciences Institute, January 10, 2010
Requirements for caBIG Infrastructure to Support Semantic Workflows
Yolanda Gil, PhD
Information Sciences Institute and Department of Computer Science
University of Southern California

2 Outline
Brief background on semantic workflows
Semantic workflow representations in Wings
Five uses of semantic workflows to assist users and their resulting requirements
  Reproducibility
  Validation
  Metadata generation
  Data discovery
  Workflow discovery
Requirements for architecture components
  Ontology repositories and services
  Data/metadata catalogs and services
  Component/service catalogs and services
  Workflow catalogs and services

3 Benefits of Semantic Workflows [Gil JSP-09]
Execution management:
  Automation of workflow execution
  Managing distributed computation
  Managing large data sets
  Security and access control
  Provenance recording
  Low-cost, high-fidelity reproducibility
Semantics and reasoning:
  Workflow retrieval and discovery
  Automation of workflow generation
  Systematic exploration of design space
  Validation of workflows
  Automated generation of metadata
  Guarantees of data pedigree
  “Conceptual” reproducibility

4 Semantic Workflows in Wings [Kim et al CCPEJ 08; Gil et al IEEE eScience 09; Gil et al K-CAP 09; Kim et al IUI 06; Gil et al IEEE IS 2010]
Workflows augmented with semantic constraints
  Each workflow constituent has a variable associated with it
    –Nodes, links, workflow components, datasets
    –Workflow variables can represent collections of data as well as classes of software components
  Constraints are used to restrict variables, and include:
    –Metadata properties of datasets
    –Constraints across workflow variables
  Incorporate the function of workflow components: how data is transformed
Reasoning about semantic constraints in a workflow
  Algorithms for semantic enrichment of workflow templates
  Algorithms for matching queries against workflow catalogs
  Algorithms for generating workflows from high-level user requests
  Algorithms for generating metadata of new data products
  Algorithms for assisting users with the creation of valid workflow templates

5 Semantic Workflows in WINGS
Workflow templates
  Dataflow diagram
  Each constituent (node, link, component, dataset) has a corresponding variable
Semantic properties
  Constraints on workflow variables
  (TestData dcdom:isDiscrete false)
  (TrainingData dcdom:isDiscrete false)

6 Semantic Constraints as Metadata Properties
Constraints on reusable template (shown below)
Constraints on current user request (shown above)
[modelerInput_not_equal_to_classifierInput:
  (:modelerInput wflow:hasDataBinding ?ds1)
  (:classifierInput wflow:hasDataBinding ?ds2)
  equal(?ds1, ?ds2)
  (?t rdf:type wflow:WorkflowTemplate)
  -> (?t wflow:isInvalid "true"^^xsd:boolean)]
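The rule on this slide marks a workflow template as invalid when the modeler input and the classifier input are bound to the same dataset. A minimal Python sketch of that check follows. The function name, the dictionary layout and the dataset names are assumptions made for illustration; they are not part of Wings or its rule engine.

```python
from typing import Dict, Optional


def template_is_invalid(bindings: Dict[str, str]) -> bool:
    """Return True when modelerInput and classifierInput share a dataset.

    `bindings` maps a workflow variable name to the dataset bound to it
    (a hypothetical stand-in for wflow:hasDataBinding).
    """
    ds1: Optional[str] = bindings.get("modelerInput")
    ds2: Optional[str] = bindings.get("classifierInput")
    # Both variables must be bound, and to the same dataset, for the rule to fire.
    return ds1 is not None and ds1 == ds2


# Hypothetical dataset names, used only to exercise the check.
same = {"modelerInput": "weather_2009", "classifierInput": "weather_2009"}
different = {"modelerInput": "weather_2009", "classifierInput": "weather_2010"}
print(template_is_invalid(same), template_is_invalid(different))
```

In a rule-based system the same condition would be stated declaratively, as on the slide, so that it can be combined with other constraints by the reasoner.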

7 Outline
Brief background on semantic workflows
Semantic workflow representations in Wings
Five uses of semantic workflows to assist users and their resulting requirements
  Reproducibility
  Validation
  Metadata generation
  Data discovery
  Workflow discovery
Requirements for architecture components
  Ontology repositories and services
  Data/metadata catalogs and services
  Component/service catalogs and services
  Workflow catalogs and services

8 Uses of Semantic Workflows: 1) Easily Replicate Previously Published Results
A catalog of carefully crafted workflows of select state-of-the-art methods to cover a wide range of common analyses
Many implementations of the same algorithm, some proprietary
Same implementation but new versions and bug fixes
With such a catalog, the effort involved in reproducing results is greatly reduced
Semantics needed to assist users to use workflows correctly

9 Resulting Requirements (1)
Semantic representations of workflows need to abstract from software implementation
  Representing abstract classes of software components
    –Instances are the implemented codes
    –Workflow steps refer to component classes
  Representing abstract kinds of data (e.g., exclude format)
Semantic reasoning needed to specialize workflow
  To map the abstract workflow into an execution-ready workflow
  To insert lower-level steps (e.g., data transformations)
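The specialization step described on this slide can be sketched as a lookup from component classes to implemented codes, with a conversion step inserted when data formats do not line up. This is only an illustration under stated assumptions: the catalog entries, format names and the `specialize` function are hypothetical, and each implementation is assumed to produce output in the same format it consumes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Implementation:
    name: str             # hypothetical name of an implemented code
    component_class: str  # abstract class that workflow steps refer to
    data_format: str      # format consumed (and, by assumption, produced)


# Hypothetical component catalog.
CATALOG = [
    Implementation("normalizer_v1", "Normalizer", "csv"),
    Implementation("tree_modeler_v2", "DecisionTreeModeler", "arff"),
]


def specialize(abstract_steps: List[str], input_format: str) -> List[str]:
    """Map abstract steps to implementations, inserting format conversions."""
    plan: List[str] = []
    current_format = input_format
    for component_class in abstract_steps:
        impl = next((i for i in CATALOG if i.component_class == component_class), None)
        if impl is None:
            raise LookupError(f"no implementation for {component_class}")
        if impl.data_format != current_format:
            # Lower-level step that the abstract workflow did not mention.
            plan.append(f"convert_{current_format}_to_{impl.data_format}")
            current_format = impl.data_format
        plan.append(impl.name)
    return plan


print(specialize(["Normalizer", "DecisionTreeModeler"], "csv"))
```

Because the abstract workflow only names component classes, swapping in a newer implementation means changing the catalog, not the workflow.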

10 Uses of Semantic Workflows: 2) Ensure Correct Use of State-of-the-Art Methods
Analytic software and methods are well documented, but all of it is text (papers, manuals, etc.)
  Time consuming, hard to spot interdependencies, no validation
Semantics needed to guide users to set up workflows correctly and customize them to their datasets and goals

11 Requirements (2)
Semantic workflows can check constraints and guide users
  Representing requirements of software components
    –Constraints on input data
    –Constraints on parameter settings given properties of input data
  Representing metadata properties of datasets
Semantic reasoning needed:
  To check constraints of each workflow step
  To propagate constraints across the workflow
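One way to picture constraint checking and propagation is to push each step's input constraints back to the workflow input and then compare them with the metadata of the user's dataset. The sketch below assumes a linear workflow and a hypothetical step layout (`requires` for input constraints, `sets` for properties the step establishes); property names such as `hasMissingValues` and `isLabeled` are illustrative, and conflicts inside the workflow are only detected at the input.

```python
from typing import Any, Dict, List

Metadata = Dict[str, Any]


def input_requirements(steps: List[Dict[str, Metadata]]) -> Metadata:
    """Propagate step constraints back to the workflow input (linear workflow)."""
    required: Metadata = {}
    established = set()  # properties already set by an earlier step
    for step in steps:
        for prop, value in step["requires"].items():
            if prop in established:
                continue  # an upstream step is responsible for this property
            if prop in required and required[prop] != value:
                raise ValueError(f"conflicting constraints on {prop}")
            required[prop] = value
        established.update(step["sets"])
    return required


def violations(required: Metadata, metadata: Metadata) -> List[str]:
    """List the properties of a dataset that fail the propagated constraints."""
    return [p for p, v in required.items() if metadata.get(p) != v]


# Hypothetical two-step workflow: discretize, then build a model.
steps = [
    {"requires": {"hasMissingValues": False}, "sets": {"isDiscrete": True}},
    {"requires": {"isDiscrete": True, "isLabeled": True}, "sets": {}},
]
needed = input_requirements(steps)
print(needed)
print(violations(needed, {"hasMissingValues": True, "isLabeled": True}))
```

The second step's `isDiscrete` requirement never reaches the user because the first step establishes it, which is the kind of interdependency that is hard to spot from text documentation alone.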

12 Uses of Semantic Workflows: 3) Automatic Generation of Metadata
Metadata annotations are tedious and involved
  Often not done, an obstacle to sharing and to reuse
Semantic workflows can automate the generation of metadata for analysis data products

13 Requirements (3)
Semantic representations needed:
  Representing expected characteristics of output dataset for each software component given the input metadata
  Representing metadata properties of input datasets
Semantic reasoning needed:
  To propagate metadata for each workflow step
  To propagate metadata across the workflow
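Metadata generation can be sketched as a chain of functions, one per component, each deriving the metadata of its output from the metadata of its input. The components and property names below (`discretize`, `sample_half`, `numInstances`) are assumptions chosen for illustration and do not come from a real catalog.

```python
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]
Component = Callable[[Metadata], Metadata]


def discretize(meta: Metadata) -> Metadata:
    # Output keeps every input property except that it is now discrete.
    return {**meta, "isDiscrete": True}


def sample_half(meta: Metadata) -> Metadata:
    # Output has half as many instances; other properties carry over.
    return {**meta, "numInstances": meta["numInstances"] // 2}


def propagate(meta: Metadata, steps: List[Component]) -> List[Metadata]:
    """Return the generated metadata of every data product, in order."""
    products: List[Metadata] = []
    for step in steps:
        meta = step(meta)
        products.append(meta)
    return products


source = {"domain": "weather", "isDiscrete": False, "numInstances": 100}
for product in propagate(source, [discretize, sample_half]):
    print(product)
```

Properties that no component touches, such as `domain` here, are carried through unchanged, so the final data product is annotated without any manual effort.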

14 Uses of Semantic Workflows: 4) Discovery of Relevant Data
“Need a dataset of updated common (known) loci to annotate findings. Where can I find one?”
Workflows reused from a catalog may require additional data besides what is provided by the user
Semantic workflows can help identify characteristics of required datasets and query data catalogs to find them for the user

15 Requirements (4)
Semantic representations needed:
  Metadata properties of any additional input datasets in the workflow, including:
    –Default properties for the given workflow
    –Augmented properties that result from the specific input data provided by the user
Semantic reasoning needed:
  Propagation of semantic constraints through the workflow
  Formulation of queries to data catalogs based on semantic properties required of datasets in the workflow
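Query formulation can be illustrated by merging the default properties a workflow expects of an additional dataset with the properties implied by the user's own input, then filtering a catalog on the result. This is a sketch over an in-memory dictionary; the catalog contents, property names and function names are hypothetical, and a real data catalog would be queried through its own service interface.

```python
from typing import Any, Dict, List

Metadata = Dict[str, Any]


def build_query(defaults: Metadata, augmented: Metadata) -> Metadata:
    """Combine workflow defaults with properties derived from the user's data.

    Augmented properties take precedence when both define the same key.
    """
    return {**defaults, **augmented}


def find_datasets(catalog: Dict[str, Metadata], query: Metadata) -> List[str]:
    """Return the names of catalog entries that satisfy every queried property."""
    return [
        name
        for name, meta in catalog.items()
        if all(meta.get(prop) == value for prop, value in query.items())
    ]


# Hypothetical catalog entries.
catalog = {
    "loci_human": {"type": "KnownLoci", "organism": "human"},
    "loci_mouse": {"type": "KnownLoci", "organism": "mouse"},
    "expression_01": {"type": "ExpressionData", "organism": "human"},
}
query = build_query({"type": "KnownLoci"}, {"organism": "human"})
print(find_datasets(catalog, query))
```

The user never states the `organism` constraint directly; it is assumed here to come from propagating the metadata of the data they supplied.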

16 Uses of Semantic Workflows: 5) Retrieval of Workflows
Hard to find workflows for the type of analysis a user wants
  Semantic information is not provided when creating the workflow
  However, retrieval queries are often based on metadata properties of data
    –e.g., “Find workflows that can normalize data which is continuous and has missing values [<- constraints on inputs] to create a decision tree model [constraint on intermediate data products]”
Semantic workflows needed to augment user-provided workflows with semantic constraints from metadata catalogs and component catalogs

17 Requirements (5)
Semantic representations are needed:
  For workflow constituents
    –Metadata properties of input, intermediate and final data products
    –Metadata properties of workflow and component function
  For user queries
    –Express workflow sketches containing partial data descriptions (constraints)
Reasoning capabilities
  Automatic creation of metadata for expected workflow data products
  Workflow matching to queries (exact and partial)
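Exact and partial matching of a query against a workflow catalog can be sketched as a score: the fraction of the query's constraints that a workflow's data products satisfy. The representation below is an assumption made for illustration (one metadata dictionary per role such as `input` or `intermediate`, and hypothetical workflow names); it is not the matching algorithm used in Wings.

```python
from typing import Any, Dict, List, Tuple

Metadata = Dict[str, Any]
Description = Dict[str, Metadata]  # role -> metadata properties


def match_score(query: Description, workflow: Description) -> float:
    """Fraction of query constraints satisfied; 1.0 means an exact match."""
    total = 0
    hits = 0
    for role, constraints in query.items():
        props = workflow.get(role, {})
        for prop, value in constraints.items():
            total += 1
            if props.get(prop) == value:
                hits += 1
    return hits / total if total else 0.0


def rank(query: Description, catalog: Dict[str, Description]) -> List[Tuple[str, float]]:
    """Order workflows from best to worst match."""
    scored = [(name, match_score(query, wf)) for name, wf in catalog.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Query in the spirit of the slide's example: continuous data with missing
# values as input, a decision tree model as an intermediate product.
query = {
    "input": {"isDiscrete": False, "hasMissingValues": True},
    "intermediate": {"type": "DecisionTreeModel"},
}
catalog = {
    "bayes_workflow": {
        "input": {"isDiscrete": False, "hasMissingValues": True},
        "intermediate": {"type": "BayesModel"},
    },
    "tree_workflow": {
        "input": {"isDiscrete": False, "hasMissingValues": True},
        "intermediate": {"type": "DecisionTreeModel"},
    },
}
print(rank(query, catalog))
```

A score below 1.0 signals a partial match, which may still be useful to a user who is willing to adapt the workflow.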

18 Outline
Brief background on semantic workflows
Semantic workflow representations in Wings
Five uses of semantic workflows to assist users and their resulting requirements
  Reproducibility
  Validation
  Metadata generation
  Data discovery
  Workflow discovery
Requirements for architecture components
  Ontology repositories and services
  Data/metadata catalogs and services
  Component/service catalogs and services
  Workflow catalogs and services

19 Requirements on Core Ontology Repositories and Services
Component/service ontologies
  Extend with semantic representations that support reasoning, not just their execution
Workflow ontologies
  Develop workflow ontologies that enable shared workflow repositories
  Develop semantic layer for the workflow ontologies
    –Workflow steps must be able to represent component classes
    –Support reasoning about workflows in all architecture components

20 Requirements on Data/Metadata Catalogs and Services
Representing abstract kinds of data (e.g., exclude format)
Representing metadata properties that are relevant to data analysis
  E.g., the organization that contributed the data may be less relevant than the instrument used to collect it, its calibration, its quality and accuracy, etc.

21 Requirements on Component/Service Catalogs and Services
Represent abstract classes of software components
  Instances correspond to implemented codes/services
Represent constraints on input data
  Metadata properties that make the component appropriate for a given input dataset
Represent constraints on output data
  Metadata properties of expected input datasets given the required outcome of the execution of the component
Represent constraints on parameter values
  Constraints on parameter settings given properties of input or output data
Represent how metadata properties of inputs are related to metadata of outputs
  Metadata properties of output datasets given the properties of the input datasets
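A component catalog entry that covers these requirements could bundle input constraints, a parameter check and an input-to-output metadata mapping in one record. The sketch below is one possible shape, not a caBIG or Wings schema: the `ComponentClass` fields, the `RandomSampler` entry and the property names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Metadata = Dict[str, Any]
Params = Dict[str, Any]


@dataclass
class ComponentClass:
    name: str                                             # abstract class of components
    input_constraints: Metadata                           # properties the input must have
    parameter_ok: Callable[[Metadata, Params], bool]      # parameter rule given the input
    output_metadata: Callable[[Metadata, Params], Metadata]  # input -> output metadata
    implementations: Tuple[str, ...] = ()                 # implemented codes/services


# Hypothetical catalog entry for a sampling component.
sampler = ComponentClass(
    name="RandomSampler",
    input_constraints={"hasMissingValues": False},
    parameter_ok=lambda meta, params: 0 < params["sampleSize"] <= meta["numInstances"],
    output_metadata=lambda meta, params: {**meta, "numInstances": params["sampleSize"]},
    implementations=("sampler_v1", "sampler_v2"),
)


def applicable(component: ComponentClass, meta: Metadata, params: Params) -> bool:
    """Check input constraints first, then the parameter rule."""
    inputs_ok = all(meta.get(p) == v for p, v in component.input_constraints.items())
    return inputs_ok and component.parameter_ok(meta, params)


data = {"hasMissingValues": False, "numInstances": 100}
print(applicable(sampler, data, {"sampleSize": 10}))
print(sampler.output_metadata(data, {"sampleSize": 10}))
```

Keeping these rules with the component class rather than with each implementation lets every instance of the class share the same validation and metadata behavior.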

22 Requirements on Workflow Catalogs and Services
Semantic reasoning to specialize workflows
  Given user requirements and a high-level workflow, automatically generate valid execution-ready workflows
  Automatically insert lower-level steps when needed (e.g., data format conversions)
Semantic reasoning to propagate constraints of each workflow step
  Check constraints of each workflow step and propagate them throughout the workflow
  Incorporate constraints coming from the user’s requirements with constraints from the individual steps of the workflow
  Formulation of data catalog queries based on the metadata properties of a given dataset in the workflow
Workflow discovery and matching for a given user query
  Need a language to express user queries as workflow sketches containing partial data descriptions (constraints) and partial dataflow patterns
  Need semantic reasoning for matching such queries, both exact and partial matching