Discovery Workflow: (ServiceFlow) Programming the Grid Prof. Yike Guo Imperial College London.

Slides:



Advertisements
Similar presentations
Designing Services for Grid-based Knowledge Discovery A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio DEIS University of Calabria ITALY
Advertisements

Abstraction Layers Why do we need them? –Protection against change Where in the hourglass do we put them? –Computer Scientist perspective Expose low-level.
Copyright Discovery Net Imperial College SARS Analysis on the Grid Discovery Net in Bioinformatics.
OMII-UK Steven Newhouse, Director. © 2 OMII-UK aims to provide software and support to enable a sustained future for the UK e-Science community and its.
CVRG Presenter Disclosure Information Tahsin Kurc, PhD Center for Comprehensive Informatics Emory University CardioVascular Research Grid Core Infrastructure.
Kensington Oracle Edition: Open Discovery Workflow Meets Oracle 10g Professor Yike Guo.
ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
Discovery Net : A UK e-Science Pilot Project for Grid-based Knowledge Discovery Services Patrick Wendel Imperial College, London Data Mining and Exploration.
Performance-responsive Middleware for Grid Computing Dr Stephen Jarvis High Performance Systems Group University of Warwick, UK High Performance Systems.
0 General information Rate of acceptance 37% Papers from 15 Countries and 5 Geographical Areas –North America 5 –South America 2 –Europe 20 –Asia 2 –Australia.
1 Richard White Design decisions: architecture 1 July 2005 BiodiversityWorld Grid Workshop NeSC, Edinburgh, 30 June - 1 July 2005 Design decisions: architecture.
Building Enterprise Applications Using Visual Studio ®.NET Enterprise Architect.
MS DB Proposal Scott Canaan B. Thomas Golisano College of Computing & Information Sciences.
Bioinformatics: a Multidisciplinary Challenge Ron Y. Pinter Dept. of Computer Science Technion March 12, 2003.
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
9/30/2004TCSS588A Isabelle Bichindaritz1 Introduction to Bioinformatics.
Špindlerův Mlýn, Czech Republic, SOFSEM Semantically-aided Data-aware Service Workflow Composition Ondrej Habala, Marek Paralič,
Making Workflows Work  Prof. Yike Guo  Dept. of Computing  Imperial College London  InforSense Limited 
SCIENCE-DRIVEN INFORMATICS FOR PCORI PPRN Kristen Anton UNC Chapel Hill/ White River Computing Dan Crichton White River Computing February 3, 2014.
Title: GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes By Peter F. Hallin, Hans-Henrik Stærfeldt, Eva Rotenberg, Tim T. Binnewies,
Yike Guo/Jiancheng Lin InforSense Ltd. 15 September 2015 Bioinformatics workflow integration.
Multiple Examples of tumor tissue (public data from Whitehead/MIT) SVM Classification of Multiple Tumor Types DNA Microarray Data Oracle Data Mining 78.25%
Life Sciences Integrated Demo Joyce Peng Senior Product Manager, Life Sciences Oracle Corporation
Using the Open Metadata Registry (openMDR) to create Data Sharing Interfaces October 14 th, 2010 David Ervin & Rakesh Dhaval, Center for IT Innovations.
Recording application executions enriched with domain semantics of computations and data Master of Science Thesis Michał Pelczar Krakow,
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Molecular Science in NPACI Russ B. Altman NPACI Molecular Science Thrust Stanford Medical.
1 Another group of Patterns Architectural Patterns.
Orchestration of an OGSI-enabled scientific application using the Business Process Execution Language Ben Butchart Wolfgang Emmerich University College.
RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah
Scalable Clustering on the Data Grid Patrick Wendel Moustafa Ghanem Yike Guo Discovery Net Department of Computing Imperial College,
INFSO-RI Enabling Grids for E-sciencE V. Breton, 30/08/05, seminar at SERONO Grid added value to fight malaria Vincent Breton EGEE.
4 th Annual EPSRC e-science meeting The need for e-Science An industrial perspective Stephen Calvert – VP Cheminformatics GSKYike Guo – Imperial College.
Issues in (Financial) High Performance Computing John Darlington Director Imperial College Internet Centre Fast Financial Algorithms and Computing 4th.
GEM Portal and SERVOGrid for Earthquake Science PTLIU Laboratory for Community Grids Geoffrey Fox, Marlon Pierce Computer Science, Informatics, Physics.
Service - Oriented Middleware for Distributed Data Mining on the Grid ,劉妘鑏 Antonio C., Domenico T., and Paolo T. Journal of Parallel and Distributed.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
Large Scale Nuclear Physics Calculations in a Workflow Environment and Data Provenance Capturing Fang Liu and Masha Sosonkina Scalable Computing Lab, USDOE.
1 Computing Challenges for the Square Kilometre Array Mathai Joseph & Harrick Vin Tata Research Development & Design Centre Pune, India CHEP Mumbai 16.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
ACGT: Open Grid Services for Improving Medical Knowledge Discovery Stelios G. Sfakianakis, FORTH.
Presented by Jens Schwidder Tara D. Gibson James D. Myers Computing & Computational Sciences Directorate Oak Ridge National Laboratory Scientific Annotation.
A Practical Approach to Metadata Management Mark Jessop Prof. Jim Austin University of York.
Cooperative experiments in VL-e: from scientific workflows to knowledge sharing Z.Zhao (1) V. Guevara( 1) A. Wibisono(1) A. Belloum(1) M. Bubak(1,2) B.
6 February 2009 ©2009 Cesare Pautasso | 1 JOpera and XtremWeb-CH in the Virtual EZ-Grid Cesare Pautasso Faculty of Informatics University.
Mining the Biomedical Research Literature Ken Baclawski.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 8 Slide 1 System models.
XML-Based Grid Data System for Bioinformatics Development Noppadon Khiripet, Ph.D Wasinee Rungsarityotin, MS Chularat Tanprasert, Ph.D Royol Chitradon.
Data Integration in Bioinformatics Using OGSA-DAI The BioDA Project Shirley Crompton, Brian Matthews (CCLRC) Alex Gray, Andrew Jones, Richard White (Cardiff.
NeuroLOG ANR-06-TLOG-024 Software technologies for integration of process and data in medical imaging A transitional.
Some comments on Portals and Grid Computing Environments PTLIU Laboratory for Community Grids Geoffrey Fox, Marlon Pierce Computer Science, Informatics,
SEEK Science Environment for Ecological Knowledge l EcoGrid l Ecological, biodiversity and environmental data l Computational access l Standardized, open.
SEA Side – Extreme Programming 1 SEA Side Software Engineering Annotations Architectural Patterns Professor Sara Stoecklin Director of Software Engineering-
Distributed Data Mining in Discovery Net Dr. Moustafa Ghanem Department of Computing Imperial College London.
Toward a common data and command representation for quantum chemistry Malcolm Atkinson Director 5 th April 2004.
Example projects using metadata and thesauri: the Biodiversity World Project Richard White Cardiff University, UK
High Risk 1. Ensure productive use of GRID computing through participation of biologists to shape the development of the GRID. 2. Develop user-friendly.
High throughput biology data management and data intensive computing drivers George Michaels.
MSF and MAGE: e-Science Middleware for BT Applications Sep 21, 2006 Jaeyoung Choi Soongsil University, Seoul Korea
1 Survey of Biodata Analysis from a Data Mining Perspective Peter Bajcsy Jiawei Han Lei Liu Jiong Yang.
ACGT Architecture and Grid Infrastructure Juliusz Pukacki ‏ EGEE Conference Budapest, 4 October 2007.
Genomic Medicine Grid Juan Pedro Sánchez Merino Instituto de Salud Carlos III
Semantic Graph Mining for Biomedical Network Analysis: A Case Study in Traditional Chinese Medicine Tong Yu HCLS
Building Enterprise Applications Using Visual Studio®
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Core Platform The base of EmpFinesse™ Suite.
NSDL Data Repository (NDR)
Presentation transcript:

Discovery Workflow: (ServiceFlow) Programming the Grid Prof. Yike Guo Imperial College London

Discovery Net Funding : –One of the Eight UK National e-Science Projects (£2.4 M) Key Features: –Allow Scientists to Construct, Share and Execute Complex Knowledge Discovery Procedures & Services –Allow Institutions to Integrate, Manage and Utilise its Intellectual Property Applications: –Life Science –Environmental Modelling –Geo-hazard Prediction Goal : Constructing the World’s First Infrastructure for Global Wide Knowledge Discovery Services Using GRID Resources Scientific Information Scientific Discovery In Real Time Literature Databases Operational Data Images Instrument Data Real Time Data Integration Dynamic Application Integration Discovery Services Integrative Knowledge Management Discovery Workflow

Workflow Paradigms Business Processes Management: Workflow describes interaction and collaboration between different entities (inter/intra organization) –Workflow co-ordinates events and actions –Business Process Re-engineering/ Business Process Management Application Integration: Workflow provides Glue to integrate distributed applications. –Workflow describes composition of individual programs/components/applications –Leverage distributed software resources –Mechanisms for moving data and results –A natural model for service composition Data Integration and Analysis: Workflow describes how a particular data results (value, table,..aggregation, cluster and predicative model) have been generated and related –Workflow defines a virtual schema –Workflow for planning distributed query –Workflow defines a series of data transformation and analysis tasks Resource Planning: Workflow describes the action logic of computation –Workflow plans a computational task –Workflow provides audit trial of a complex process –Workflow models a communicating (control/data flow) protocol of a complex system (Petri Net).

Commercial Workflow Products Business WORKFLOWS Oracle Workflow Staffware Carnot Maximum COSA Workflow Eastman EW InConcert InTempo MQSeries Workflow SERfloware TeamWARE Flow Dolphin Visual Workflow W4 WFX BizFlow Web Service WORKFLOWS WSFL XLANG BPEL4WS BPML SCUFL Analytical WORKFLOWS Clementine SAS Kwiz Scitegic HPC Job Scheduling WORKFLOWS Unicore LSF TurboWorx

Discovery Workflow Design: Presenting Process Knowledge Capture of theories, analytical processes in science and business and the information relationships among processes Integrating of data and applications in a process Representing integrative knowledge Sharing and reuse of workflows through templates Organizing and planning complex processes in science and business Capture provenance and auditable history of processes Management and deployment of workflow for sharing process knowledge Discovery Workflow

Discovery Workflow Technology : Towards Compositional Services Workflow Execution A Compositional GRID Workflow Management Collaborative Knowledge Management Workflow Deployment: Grid Service and Portal Workflow Warehousing Resource Mapping Service Abstraction Workflow Authoring Composing Services Condor-G Native MPI OGSA-service Web Service Unicore Oralce 10g Web Wrapper Sun Grid Engine

Discovery Workflow: Language Issues Language for informatics process---rich data model –Data Access –Data Cleaning/Transformation/Processing –Data Analysis Language for open informatics process---integration capacity –Integration of data resource –Integration of applications –Integration of services Language for open informatics process management---integrative knowledge representation and management –Workflow for integrative knowledge representation –Rich meta data model for process provenance –Rich operations over workflow for abstraction, composition, storing, searching and deploying workflow Language for distributed service computing---support grid computing model –Composing distributed services –Mapping a workflow to distributed resources

Discovery Workflow: System Issues Support easy end user access –Powerful visual workflow editor and visual analysis –Automatic workflow capturing technology Support collaborative work –Groupware support for workflow construction Support enterprise infrastructure –J2EE compliant workflow middleware –Oracle based workflow management –Web service integration and deployment Support grid computing model –Automatic mapping workflow to grid resources –Compositing grid services using workflow –Optimal scheduling of workflow execution on a grid environment

Workflow for Information Process Rich Data Models: Table, text, sequence, stream, image, XML….

Power of Rich Data Model: Workflow for Multi-Modality Analysis Data mining Text mining Spectrum data mining chemical/sequence data model

Workflow for Open Informatics Workflow = Compositional Process by Dynamic Application/Service Integration instrument (remote/local) services data base Desktop application

Integrating an Application: Action Abstraction Applications/Services Functional Abstraction (parameters+meta data) Provenance Abstraction (history+control protocol) Data Abstraction (data type mapping)

Deploying a Composed Application: Workflow Parameterisation Workflow Functional Abstraction (parameters+meta data) Provenance Abstraction (history+control protocol) Data Abstraction (data type mapping) Web service Super node Wizard

An Oracle 10g Example (The Lymphoma Example) Generate and Compile Code Choose A Build Choose algorithm/ parameter Accuracy Testing View Model Feature Selection Sequence Search

Workflow with Oracle 10g Seamlessly Integrated Pre-processing Naïve Bayes Mining and Evaluation Adaptive NB Mining and Evaluation Apply Classification Naïve Bayes Mining, Evaluation & BLAST Oracle Components

Workflow for Integrative Knowledge Representation Workflow for Knowledge Integration = Dynamically construction of schema to organise related cross-domain analysis results and background knowledge Towards a Knowledge Schema Framework for integrative knowledge –A Mechanism of indexing and cross- annotating related analytical results: workflow as a schema for integrative knowledge : representing related knowledge by their generation process. –Workflow based knowledge management: workflow indexing, workflow ontology, workflow provenance form the base for building a process knowledge base Sequence Analysis Text Mining Genetic Analysis Pathway Analysis WF clinic WF Screening WF Genetics WF literature WF chemistry WF sequence workflow warehousing

Workflow for Distributed Service Computing Workflow = execution plan of distributed service computing Grid computing mapping = automatic mapping and scheduling workflow over distributed resources Discovery workflow can be mapped to various scheduling system : LSF, Sun GridEngine, Unicore and Condor(DAGman) Discovery workflow can be deployed as OGSA compliant grid services Text Analysis Clustering Classification Gene function perdition Homology Search Discovery workflow Image processing

Workflow Deployment  workflow parameterisation  Define actions  Deploy Significance threshold Sample size Plot dot size View scatter plot View table of interesting genes Effect threshold Volcano plot

Deploying Workflow as New Application/Service Parameters Actions Result

Using Distributed Resources Scientific Information Scientific Discovery Literature Databases Operational Data Images Instrument Data Discovery Workflow: Programming the Girds Real Time Data Integration Dynamic Application Integration Discovery Services Integrative Knowledge Management Discovery Workflow

Demo: China SARS Virtual Lab Relationship between SARS and other virus Mutual regions identification Homology search against viral genome DB Annotation using Artemis and GenSense Gene prediction Phylogenetic analysis Exon prediction Splice site prediction Immunogenetics Multiple sequence alignment Microarray analysis Bibliographic databases Key word search GeneSense Ontology D-Net: Integration, interpretation, and discovery Epidemiological analysis Predicted genes SARS patients diagnosis Homology search against protein DB Homology search against motif DB Protein localization site prediction Protein interaction prediction Relationship between SARS virus and human receptors prediction Classification and secondary structure prediction Bibliographic databases Genbank Annotation using Artemis and GenSense

Discovery Net Service Composition

Compositional Services for SARS Mutation Analysis  50 data resource, with scaling up to 1000’s  > 200 software applications and services Designed on top of the Web service environment Used by more than 100 scientists for SARS analysis

Resource Mapping

Service Abstraction and Workflow Deployment

Service Abstraction by Parameterizing Workflow

Executing Deployed Service through Portal

Further Composing of Deployed Services

Workflow Warehousing

Conclusion Discovery Net is developing an advanced scientific workflow technology Discovery workflow is more than just a scientific workflow system. It offers much more: –Discovery workflow enables dynamic build new services by integrating data/application/service-----a powerful EAI (enterprise application integration) tool –Discovery workflow is a uniform mean for integrative knowledge representation and management –Discovery workflow provides a systematic mechanism of mapping compositional services over distributed resources Discovery workflow: towards a language of programming the GRID