E-LICO: An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences. October 12th, 2010. Delivering data mining.


1 e-LICO: An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences
October 12th, 2010
Delivering data mining to the Life Science Community
Simon Jupp, School of Computer Science, University of Manchester, United Kingdom

2 e-LICO project overview
- Infrastructure to support collaborative, data mining enabled experimental research
- Knowledge-driven planning of DM workflows
  - Improve planning by meta-mining
- Support research in data-intensive, knowledge-rich domains
  - Systems biology use case

3 European Project
- European Project, 9 partners (month 20 of 36)
  - Specialists from Data Mining, Semantic Web, Grid computing and Systems Biology
- Partners:
  - University of Manchester, UK
  - University of Geneva, Switzerland
  - Inserm, France
  - Josef Stefan Institute, Slovenia
  - NHRF, Greece
  - Poznan University, Poland
  - Rapid-I GmbH, Germany
  - Ruder Boskovic Institute, Croatia
  - University of Zurich, Switzerland
- An EU-FP7 Collaborative Project (2009-2012)
- Theme ICT-4.4: Intelligent Content and Semantics

4 Problems…
- Capturing the workflow
  - Explanation
  - Error detection / Repair
  - Reproducibility
  - Provenance
- Steep learning curve
  - Many operators to choose from
  - Best combination of operators
  - Hard for non-data miners

5 Problems… and solutions (e-LICO planned workflows)
Solutions:
- Develop "Intelligent Discovery Assistant" (IDA) for Data Analysis
  - Automatically generate workflows by planning
  - Assist the user in solving DM tasks
  - Structure workflows in workflow templates
  - Self improvement through Meta-Mining
- Ontology based data model
  - Adds semantics
  - OWL/RDF based
  - Data Mining Experiment Repository
Problems:
- Capturing the workflow
  - Explanation
  - Error Detection / Repair
  - Reproducibility
  - Provenance
- Steep learning curve
  - Many operators to choose from
  - Best combination of operators
  - Hard for non-data miners

6 The e-LICO workflow
[Figure: workflow diagram with stages numbered 1 to 4. Boxes: Input Data; Ontology based AI planner; Workflow execution engine; Publish and share; Output: Data, provenance and models; Meta-mining]

7 Ontology based AI planner
[Figure: the e-LICO workflow diagram from slide 6]

8 Workflow planning
- Hierarchical Task Network (HTN) planning
- Set of Tasks to achieve possible Data Mining Goals
- Tasks have an I/O specification and a set of associated Methods to achieve that task
- Methods are composed of simpler Tasks/Methods
- Some methods are Operators with Conditions and Effects
Example: my task is 'Data Mining With Evaluation'; my Goal is to get a workflow that does this Evaluation via Cross-Validation
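
The decomposition described on this slide can be illustrated with a minimal HTN-style sketch. It is not the e-LICO planner: the operator names, the facts used as conditions and effects, and the `plan` function are all hypothetical and chosen only to mirror the "Data Mining With Evaluation" example. Tasks are expanded through their methods, and a method is abandoned when one of its operators has unmet conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    conditions: frozenset  # facts that must hold before the operator can run
    effects: frozenset     # facts that hold after the operator has run


# Hypothetical operators (assumption: not taken from the DMWF ontology).
OPERATORS = {
    op.name: op
    for op in [
        Operator("ReadData", frozenset(), frozenset({"data"})),
        Operator("ReplaceMissingValues", frozenset({"data"}), frozenset({"clean_data"})),
        Operator("HoldOutValidation", frozenset({"clean_data", "large_data"}),
                 frozenset({"model", "report"})),
        Operator("CrossValidation", frozenset({"clean_data"}),
                 frozenset({"model", "report"})),
    ]
}

# Each task has alternative methods; a method is an ordered list of
# sub-tasks and/or operator names.
METHODS = {
    "DataMiningWithEvaluation": [["Preprocess", "ModelWithEvaluation"]],
    "Preprocess": [["ReadData", "ReplaceMissingValues"]],
    "ModelWithEvaluation": [["HoldOutValidation"], ["CrossValidation"]],
}


def plan(tasks, state):
    """Return a list of operator names that achieves `tasks`, or None."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in OPERATORS:
        op = OPERATORS[head]
        if not op.conditions <= state:
            return None  # conditions unmet: this branch fails
        tail = plan(rest, state | op.effects)
        return None if tail is None else [op.name] + tail
    # Compound task: try each method in order, backtracking on failure.
    for method in METHODS.get(head, []):
        result = plan(method + rest, state)
        if result is not None:
            return result
    return None


workflow = plan(["DataMiningWithEvaluation"], frozenset())
print(workflow)
```

Starting from an empty state, the hold-out method is rejected because the assumed `large_data` fact is missing, so the sketch falls back to the cross-validation method.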

9 The Data Mining Workflow Ontology (DMWF)
Class | Description | Examples
IO Object | Input and output used by operators | Data, Model, Report
MetaData | Characteristics of the IOObjects | Attribute, AttributeType, DataColumn, DataFormat
Operator | DM operators | DataTableProcessing, ModelProcessing, Modeling, MethodEvaluation
Goal | A DM goal that the user could solve | DescriptiveModelling, PatternDiscovery, PredictiveModelling, RetrievalByContent
Task | A task is used to achieve a goal | CleanMV, CategorialToScalar, DiscretizeAll, PredictTarget
Methods | A method is used to solve a task | CategorialToScalarRecursive, CleanMVRecursive, DiscretizeAllRecursive, DoPrediction

10 Workflow Planning
- AI Planner
- Brute force planning
- Probabilistic Planning
- What will likely produce better results?
- Case-based Planning
  - How did we solve that previously?
- DMOP (Workflow optimization ontology)
  - Algorithm and Model selection given a particular task
  - Meta-mining by abstraction and generalisation

11 Meta-Mining
- Initially, the AI planner recommends applicable DM workflows, not necessarily good ones
- Self-improves with experience through meta-mining
- The meta-miner
  - Applies DM techniques to meta-data from past DM experiments
  - Extracts workflow patterns that are signatures of high predictive performance
- The planner uses these workflow patterns to design and recommend promising workflows
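
As a rough illustration of the loop on this slide, the sketch below scores operator pairs by the mean accuracy of the past workflows that contain them, then uses those scores to rank new candidate workflows. This is an assumption-laden toy, not the e-LICO meta-miner: the experiment records are made-up values, the operator names are hypothetical, and co-occurring operator pairs stand in for the richer workflow patterns the slide refers to.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Toy records of past experiments: (operators used, accuracy).
# These numbers are invented for illustration, not e-LICO results.
EXPERIMENTS = [
    (("Normalize", "FeatureSelection", "SVM"), 0.90),
    (("Normalize", "SVM"), 0.80),
    (("FeatureSelection", "DecisionTree"), 0.70),
    (("DecisionTree",), 0.60),
]


def operator_pairs(workflow):
    """All unordered operator pairs in a workflow, in a canonical order."""
    return combinations(sorted(set(workflow)), 2)


def mine_patterns(experiments, min_support=2):
    """Return (pair, mean accuracy) for pairs seen in >= min_support runs."""
    scores = defaultdict(list)
    for operators, accuracy in experiments:
        for pair in operator_pairs(operators):
            scores[pair].append(accuracy)
    patterns = {p: mean(a) for p, a in scores.items() if len(a) >= min_support}
    return sorted(patterns.items(), key=lambda item: item[1], reverse=True)


def rank(candidates, patterns):
    """Order candidate workflows by the summed score of patterns they contain."""
    bonus = dict(patterns)

    def score(workflow):
        return sum(bonus.get(pair, 0.0) for pair in operator_pairs(workflow))

    return sorted(candidates, key=score, reverse=True)


patterns = mine_patterns(EXPERIMENTS)
ranked = rank([("DecisionTree",), ("Normalize", "SVM")], patterns)
print(patterns)
print(ranked)
```

With the toy records, only the pair of `Normalize` and `SVM` appears in at least two experiments, so candidates containing it are ranked first.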

12 Workflow Execution
[Figure: the e-LICO workflow diagram from slide 6]
Slide footer: 12/14/2015, e-LICO Kick-Off, Geneva

13 Workflow Execution
- All operators in the ontology (200+) are exposed as SOAP or REST based Web Services
- Plans are converted to a workflow execution language (SCUFL 2)
- Provenance capture
  - Execution times, intermediate model returned to planner
Taverna
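
The provenance idea on this slide (recording execution times and intermediate results so they can be fed back to the planner) can be sketched in a few lines. This is only an illustration under stated assumptions: it runs plain Python functions locally rather than calling web services, it does not use Taverna or SCUFL 2, and `run_with_provenance` and the step names are hypothetical.

```python
import time


def run_with_provenance(steps, data):
    """Run (name, function) steps in order and record a provenance trace.

    Each trace entry holds the operator name, its wall-clock duration in
    seconds, and the intermediate output it produced.
    """
    trace = []
    for name, func in steps:
        start = time.perf_counter()
        data = func(data)
        trace.append({
            "operator": name,
            "seconds": time.perf_counter() - start,
            "output": data,
        })
    return data, trace


# Hypothetical two-step workflow: drop missing values, then average.
steps = [
    ("DropMissing", lambda rows: [r for r in rows if r is not None]),
    ("Mean", lambda rows: sum(rows) / len(rows)),
]

result, trace = run_with_provenance(steps, [1.0, None, 3.0])
for entry in trace:
    print(entry["operator"], entry["output"])
```

Keeping the trace as plain dictionaries makes it straightforward to serialise and hand to whatever component consumes the provenance.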

14 Workflow Publishing and Sharing
[Figure: the e-LICO workflow diagram from slide 6]

15 Workflow Publishing and Sharing
- Workflows and data can be shared via myExperiment
- Build a community of data miners
- Set of re-usable workflows, data and workflow templates (packs)

16 Use case – Obstructive nephropathy
- Demonstrated with Systems Biology Use Case
  - Biomarker discovery and pathway modelling in the study of chronic kidney disease
  - KUP challenge initiated (August 2010)
[Figure: diagram with labels: Expression data; KUP KB (RDF store); Text-mining / Image mining; New models and hypotheses; Further wet lab experiments]

17 Research Questions
- How and when does a planner based "Intelligent Discovery Assistant" help the end user?
- Can we improve planning and suggest better workflows through meta-mining?
- Can we plan complex workflows with Scientific Goals that answer biological questions?
  - KUP goal is to construct diagnostic models that accurately connect the biological views to the severity of this pathology

18 Where are we now: Availability
- http://www.e-lico.eu
- 1st year demo: http://www.youtube.com/watch?v=JtmqZfzyEKs
- eProPlan plugin for Protégé 4.0
- Ontologies available
- Taverna: http://www.taverna.org.uk
- RapidMiner: http://rapid-i.com

19 Summary
- e-LICO: virtual laboratory for interdisciplinary collaborative research in data mining
- Ontology based AI planning of KDD workflows
- Generic e-Science platform for DM
- Application layer for Systems Biology

20 Acknowledgments
- Robert Stevens (Manchester)
- Alan Williams (Manchester)
- Rishi Ramgolam (Manchester)
- Jorg-Uwe Kietz (Zurich)
- Melanie Hilario (Geneva)
- e-LICO consortium

