E-LICO An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences October 12 th, 2010 Delivering data mining.

Presentation transcript:

e-LICO: An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences
October 12th, 2010
Delivering data mining to the Life Science Community
Simon Jupp, School of Computer Science, University of Manchester, United Kingdom

e-LICO project overview
• Infrastructure to support collaborative, data mining enabled experimental research
• Knowledge-driven planning of DM workflows
  – Improve planning by meta-mining
• Support research in data-intensive, knowledge-rich domains
  – Systems biology use case

European Project
• European Project, 9 partners (month 20 of 36)
  – Specialists from Data Mining, Semantic Web, Grid computing and Systems Biology
Partners:
  University of Manchester, UK
  University of Geneva, Switzerland
  Inserm, France
  Jožef Stefan Institute, Slovenia
  NHRF, Greece
  Poznan University, Poland
  Rapid-I GmbH, Germany
  Ruđer Bošković Institute, Croatia
  University of Zurich, Switzerland
An EU-FP7 Collaborative Project ( ), Theme ICT-4.4: Intelligent Content and Semantics

Problems…
• Capturing the workflow
  – Explanation
  – Error detection / Repair
  – Reproducibility
  – Provenance
• Steep learning curve
  – Many operators to choose from
  – Best combination of operators
  – Hard for non Data Miners

Problems… and solutions (e-LICO planned workflows)
• Develop "Intelligent Discovery Assistant" (IDA) for Data Analysis
  – Automatically generate workflows by planning
  – Assist the user in solving a DM task
  – Structure workflows in workflow templates
  – Self-improvement through Meta-Mining
• Ontology based data model
  – Adds semantics
  – OWL/RDF based
  – Data Mining Experiment Repository
• Capturing the workflow
  – Explanation
  – Error Detection / Repair
  – Reproducibility
  – Provenance
• Steep learning curve
  – Many operators to choose from
  – Best combination of operators
  – Hard for non Data Miners

The e-LICO workflow
[Diagram stages: Input Data; Ontology based AI planner; Workflow execution engine; Publish and share; Output: Data, provenance and models; Meta-mining]

Ontology based AI planner
[Diagram: the e-LICO workflow stages, repeated from the previous slide]

Workflow planning
• Hierarchical Task Network (HTN) planning
• Set of Tasks to achieve possible Data Mining Goals
• Tasks have an I/O specification and a set of associated Methods to achieve that task
• Methods are composed of simpler Tasks/Methods
• Some Methods are Operators with Conditions and Effects
Example: My task is 'Data Mining With Evaluation', my Goal is to get a workflow that does this Evaluation via Cross-Validation
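The HTN idea on this slide can be sketched in a few lines of Python. This is only an illustrative toy, not the e-LICO planner: the operator names, the state facts (`data`, `clean_data`, `large_data`) and the method table are all assumptions made up for the example. Tasks are decomposed through methods until only operators remain, and an operator is usable only if its conditions hold in the current state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """A primitive step: usable when `conditions` hold, then adds `effects`."""
    name: str
    conditions: frozenset
    effects: frozenset


# Hypothetical operators (names and facts are assumptions for illustration).
OPERATORS = {
    op.name: op
    for op in [
        Operator("ReplaceMissingValues", frozenset({"data"}), frozenset({"clean_data"})),
        Operator("SplitValidation", frozenset({"clean_data", "large_data"}),
                 frozenset({"model", "report"})),
        Operator("CrossValidation", frozenset({"clean_data"}),
                 frozenset({"model", "report"})),
    ]
}

# Hypothetical methods: each task maps to alternative ordered decompositions.
METHODS = {
    "DataMiningWithEvaluation": [["Preprocess", "EvaluateModel"]],
    "Preprocess": [["ReplaceMissingValues"]],
    "EvaluateModel": [["SplitValidation"], ["CrossValidation"]],
}


def plan(tasks, state):
    """Depth-first HTN decomposition; returns operator names or None."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in OPERATORS:
        op = OPERATORS[head]
        if not op.conditions <= state:
            return None  # conditions not met, backtrack
        tail = plan(rest, state | op.effects)
        return None if tail is None else [op.name] + tail
    for decomposition in METHODS.get(head, []):
        result = plan(decomposition + rest, state)
        if result is not None:
            return result
    return None


workflow = plan(["DataMiningWithEvaluation"], frozenset({"data"}))
print(workflow)
```

In this toy, `SplitValidation` is rejected because the assumed fact `large_data` is missing, so the planner backtracks to the cross-validation method, mirroring the slide's example.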

The Data Mining Workflow Ontology (DMWF)
Class | Description | Examples
IO Object | Input and output used by operators | Data, Model, Report
MetaData | Characteristics of the IOObjects | Attribute, AttributeType, DataColumn, DataFormat
Operator | DM operators | DataTableProcessing, ModelProcessing, Modeling, MethodEvaluation
Goal | A DM goal that the user could solve | DescriptiveModelling, PatternDiscovery, PredictiveModelling, RetrievalByContent
Task | A task is used to achieve a goal | CleanMV, CategorialToScalar, DiscretizeAll, PredictTarget
Methods | A method is used to solve a task | CategorialToScalarRecursive, CleanMVRecursive, DiscretizeAllRecursive, DoPrediction

Workflow Planning
• AI Planner
• Brute force planning
• Probabilistic Planning
• What will likely produce better results?
• Case-based Planning
  – How did we solve that previously?
• DMOP (Workflow optimization ontology)
  – Algorithm and Model selection given a particular task
  – Meta-mining by abstraction and generalisation
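The case-based question "How did we solve that previously?" can be illustrated with a simple retrieval step. The sketch below is an assumption-laden toy and does not reflect the actual e-LICO planner: the case base, the descriptive keys (`goal`, `has_missing_values`, `has_categorical`) and the stored plans are hypothetical, with step names borrowed from the DMWF examples purely as labels.

```python
# Hypothetical case base: (problem description, plan that was used before).
CASES = [
    ({"goal": "PredictiveModelling", "has_missing_values": True, "has_categorical": True},
     ["CleanMV", "CategorialToScalar", "PredictTarget"]),
    ({"goal": "PredictiveModelling", "has_missing_values": False, "has_categorical": False},
     ["PredictTarget"]),
    ({"goal": "PatternDiscovery", "has_missing_values": True, "has_categorical": False},
     ["CleanMV", "DiscretizeAll"]),
]


def similarity(a, b):
    """Share of description keys (from either side) on which a and b agree."""
    keys = set(a) | set(b)
    return sum(1 for k in keys if k in a and k in b and a[k] == b[k]) / len(keys)


def retrieve(query, cases=CASES):
    """Return the stored (description, plan) pair most similar to the query."""
    return max(cases, key=lambda case: similarity(query, case[0]))


query = {"goal": "PredictiveModelling", "has_missing_values": True}
description, previous_plan = retrieve(query)
print(previous_plan)
```

A real system would adapt the retrieved plan to the new data rather than reuse it verbatim; the sketch stops at retrieval.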

Meta-Mining
• Initially, the AI planner recommends applicable DM workflows, not necessarily good ones
• Self-improves with experience through meta-mining
• The meta-miner
  – Applies DM techniques to meta-data from past DM experiments
  – Extracts workflow patterns that are signatures of high predictive performance
• The planner uses these workflow patterns to design and recommend promising workflows

Workflow Execution
[Diagram: the e-LICO workflow stages, repeated from the earlier slide]
Footer: 12/14/2015, e-LICO Kick-Off, Geneva (slide 12)

Workflow Execution
• All operators in the ontology (200+) are exposed as SOAP- or REST-based Web Services
• Plans are converted to a workflow execution language (SCUFL 2)
• Provenance capture
  – Execution times, intermediate model returned to planner
Taverna
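The provenance idea (recording execution times and intermediate results for each step) can be sketched generically as follows. This does not use Taverna or any web service API: the steps are plain Python functions standing in for operator calls, and `ProvenanceRecord`, `run_workflow` and the two example steps are hypothetical names introduced for the illustration.

```python
import time
from dataclasses import dataclass
from typing import Any


@dataclass
class ProvenanceRecord:
    """What was run, how long it took, and what it produced."""
    step: str
    seconds: float
    output: Any


def run_workflow(steps, data):
    """Run (name, function) steps in order, recording provenance per step."""
    trace = []
    for name, func in steps:
        start = time.perf_counter()
        data = func(data)
        trace.append(ProvenanceRecord(name, time.perf_counter() - start, data))
    return data, trace


# Hypothetical stand-ins for remote operator calls.
def drop_missing(rows):
    return [row for row in rows if None not in row]


def column_means(rows):
    return [sum(col) / len(col) for col in zip(*rows)]


steps = [("drop_missing", drop_missing), ("column_means", column_means)]
result, trace = run_workflow(steps, [(1.0, 2.0), (None, 4.0), (3.0, 6.0)])
for record in trace:
    print(record.step, f"{record.seconds:.6f}s", record.output)
```

The recorded trace is the kind of information that could be handed back to a planner or stored alongside the workflow outputs.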

Workflow Publishing and Sharing
[Diagram: the e-LICO workflow stages, repeated from the earlier slide]

Workflow Publishing and Sharing
• Workflows and data can be shared via myExperiment
• Build a community of data miners
• Set of re-usable workflows, data and workflow templates (packs)

Use case – Obstructive nephropathy
• Demonstrated with a Systems Biology use case
  – Biomarker discovery and pathway modelling in the study of chronic kidney disease
  – KUP challenge initiated (August 2010)
[Diagram labels: Expression data; KUP KB (RDF store); Text-mining / Image mining; New models and hypothesis; Further wet lab experiments]

Research Questions
• How and when does a planner-based "Intelligent Discovery Assistant" help the end user?
• Can we improve planning and suggest better workflows through meta-mining?
• Can we plan complex workflows with Scientific Goals that answer biological questions?
  – KUP goal is to construct diagnostic models that accurately connect the biological views to the severity of this pathology

Where are we now / Availability
• 1st year demo –
• eProPlan plugin for Protégé 4.0
• Ontologies available
• Taverna
• RapidMiner

Summary
• e-LICO: virtual laboratory for interdisciplinary collaborative research in data mining
• Ontology based AI planning of KDD workflows
• Generic e-Science platform for DM
• Application layer for Systems Biology

Acknowledgments
• Robert Stevens (Manchester)
• Alan Williams (Manchester)
• Rishi Ramgolam (Manchester)
• Jörg-Uwe Kietz (Zurich)
• Melanie Hilario (Geneva)
• e-LICO consortium