Published by Charleen Owen. Modified over 9 years ago.
1
e-LICO: An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences
Delivering data mining to the Life Science Community
October 12th, 2010
Simon Jupp
School of Computer Science, University of Manchester, United Kingdom
2
e-LICO project overview
Infrastructure to support collaborative, data mining enabled experimental research
Knowledge-driven planning of DM workflows
– Improve planning by meta-mining
Support research in data-intensive, knowledge-rich domains
– Systems biology use case
3
European Project
European project, 9 partners (month 20 of 36)
– Specialists from Data Mining, Semantic Web, Grid computing and Systems Biology
University of Manchester, UK
University of Geneva, Switzerland
Inserm, France
Josef Stefan Institute, Slovenia
NHRF, Greece
Poznan University, Poland
Rapid-I GmbH, Germany
Ruder Boskovic Institute, Croatia
University of Zurich, Switzerland
An EU-FP7 Collaborative Project (2009-2012)
Theme ICT-4.4: Intelligent Content and Semantics
4
Problems…
Capturing the workflow
– Explanation
– Error detection / Repair
– Reproducibility
– Provenance
Steep learning curve
– Many operators to choose from
– Best combination of operators
– Hard for non-data miners
5
Problems… and solutions (e-LICO planned workflows)
Solutions:
Develop “Intelligent Discovery Assistant” (IDA) for Data Analysis
– Automatically generate workflows by planning
– Assist the user in solving a DM task
– Structure workflows in workflow templates
– Self-improvement through Meta-Mining
Ontology based data model
– Adds semantics
– OWL/RDF based
– Data Mining Experiment Repository
Problems:
Capturing the workflow
– Explanation
– Error detection / Repair
– Reproducibility
– Provenance
Steep learning curve
– Many operators to choose from
– Best combination of operators
– Hard for non-data miners
6
The e-LICO workflow
[Diagram of four numbered stages: Input Data; Ontology based AI planner; Workflow execution engine; Publish and share; Meta-mining; Output: Data, provenance and models]
7
Ontology based AI planner
[Diagram repeated from slide 6: Input Data; Ontology based AI planner; Workflow execution engine; Publish and share; Meta-mining; Output: Data, provenance and models]
8
Hierarchical Task Network (HTN) planning
Set of Tasks to achieve possible Data Mining Goals
Tasks have an I/O specification and a set of associated Methods to achieve that task
Methods are composed of simpler Tasks/Methods
Some methods are Operators with Conditions and Effects
Example: My task is ‘Data Mining With Evaluation’, my Goal is to get a workflow that does this Evaluation via Cross-Validation
Workflow planning
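The decomposition idea on this slide can be illustrated with a minimal sketch. It is not the e-LICO planner: the task, method and operator names below are hypothetical, loosely modelled on the slide's example, and operator Conditions and Effects are left out for brevity.

```python
# Hypothetical task network (illustrative names, not taken from the DMWF ontology).
# Each task maps to a list of alternative methods; each method is an ordered
# list of sub-tasks. Names that do not appear as keys are primitive operators.
METHODS = {
    "DataMiningWithEvaluation": [["Preprocess", "CrossValidation"]],
    "Preprocess": [
        ["CleanMissingValues"],
        ["CleanMissingValues", "Discretize"],
    ],
    "CrossValidation": [
        ["SplitFolds", "TrainModel", "ApplyModel", "AveragePerformance"],
    ],
}


def plan(task):
    """Return every operator sequence that achieves `task`."""
    if task not in METHODS:
        # Primitive operator: nothing left to decompose.
        return [[task]]
    plans = []
    for method in METHODS[task]:
        # Combine the alternative expansions of each sub-task in order.
        partial = [[]]
        for subtask in method:
            partial = [p + s for p in partial for s in plan(subtask)]
        plans.extend(partial)
    return plans


workflows = plan("DataMiningWithEvaluation")
for workflow in workflows:
    print(" -> ".join(workflow))
```

Each alternative method for a task multiplies the number of candidate workflows, which is why the later slides discuss ways of ranking the plans rather than enumerating them blindly.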
9
The Data Mining Workflow Ontology (DMWF)
Class | Description | Examples
IO Object | Input and output used by operators | Data, Model, Report
MetaData | Characteristics of the IOObjects | Attribute, AttributeType, DataColumn, DataFormat
Operator | DM operators | DataTableProcessing, ModelProcessing, Modeling, MethodEvaluation
Goal | A DM goal that the user could solve | DescriptiveModelling, PatternDiscovery, PredictiveModelling, RetrievalByContent
Task | A task is used to achieve a goal | CleanMV, CategorialToScalar, DiscretizeAll, PredictTarget
Methods | A method is used to solve a task | CategorialToScalarRecursive, CleanMVRecursive, DiscretizeAllRecursive, DoPrediction
10
AI Planner
Brute force planning
Probabilistic Planning
– What will likely produce better results?
Case-based Planning
– How did we solve that previously?
DMOP (Workflow optimization ontology)
– Algorithm and Model selection given a particular task
– Meta-mining by abstraction and generalisation
Workflow Planning
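The case-based question ("how did we solve that previously?") can be sketched as a nearest-case lookup. This is an illustration only: the case base, the dataset characteristics and the similarity measure are all assumptions, and nothing here reflects how e-LICO or DMOP actually store or compare cases.

```python
# Hypothetical case base: dataset characteristics paired with the workflow
# that was used for that dataset. None of these entries come from e-LICO.
CASES = [
    ({"missing_values": True, "nominal_attributes": True,
      "task": "classification", "many_attributes": False},
     ["CleanMissingValues", "NominalToNumeric", "TrainModel"]),
    ({"missing_values": False, "nominal_attributes": False,
      "task": "classification", "many_attributes": True},
     ["Normalize", "TrainModel"]),
    ({"missing_values": True, "nominal_attributes": False,
      "task": "clustering", "many_attributes": False},
     ["CleanMissingValues", "Cluster"]),
]


def similarity(a, b):
    """Fraction of characteristics on which two dataset descriptions agree."""
    keys = set(a) | set(b)
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)


def retrieve(query):
    """Return the workflow of the stored case most similar to `query`."""
    _, workflow = max(CASES, key=lambda case: similarity(query, case[0]))
    return workflow


query = {"missing_values": False, "nominal_attributes": False,
         "task": "classification", "many_attributes": False}
suggested = retrieve(query)
print(suggested)
```

A real system would likely adapt the retrieved workflow to the new data rather than reuse it unchanged; the sketch stops at retrieval.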
11
Meta-Mining
Initially, the AI planner recommends applicable DM workflows, not necessarily good ones
Self-improves with experience through meta-mining
The meta-miner
– Applies DM techniques to meta-data from past DM experiments
– Extracts workflow patterns that are signatures of high predictive performance
The planner uses these workflow patterns to design and recommend promising workflows
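As a rough illustration of this loop, the sketch below mines a small set of past experiment records and uses the result to rank new candidate workflows. It makes strong simplifying assumptions: the experiment records are invented, a "pattern" is reduced to a single operator whose mean accuracy reaches a threshold, and ranking simply counts such operators. The actual meta-miner works on richer workflow patterns.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical experiment records: (operators used, observed accuracy).
EXPERIMENTS = [
    (("CleanMissingValues", "DecisionTree"), 0.71),
    (("CleanMissingValues", "FeatureSelection", "DecisionTree"), 0.80),
    (("CleanMissingValues", "FeatureSelection", "NaiveBayes"), 0.78),
    (("CleanMissingValues", "NaiveBayes"), 0.69),
]


def mine_patterns(experiments, threshold):
    """Return operators whose mean accuracy over past runs reaches `threshold`."""
    scores = defaultdict(list)
    for operators, accuracy in experiments:
        for op in operators:
            scores[op].append(accuracy)
    return {op: mean(vals) for op, vals in scores.items() if mean(vals) >= threshold}


def rank(candidates, patterns):
    """Order candidate workflows by how many promising operators they contain."""
    return sorted(candidates,
                  key=lambda wf: sum(op in patterns for op in wf),
                  reverse=True)


patterns = mine_patterns(EXPERIMENTS, threshold=0.75)
candidates = [
    ("CleanMissingValues", "NaiveBayes"),
    ("CleanMissingValues", "FeatureSelection", "NaiveBayes"),
]
ranked = rank(candidates, patterns)
print(sorted(patterns))
print(ranked[0])
```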
12
Workflow Execution
[Diagram repeated from slide 6: Input Data; Ontology based AI planner; Workflow execution engine; Publish and share; Meta-mining; Output: Data, provenance and models]
Slide footer: 12/14/2015, e-LICO Kick-Off, Geneva
13
Workflow Execution
All operators in the ontology (200+) are exposed as SOAP or REST based Web Services
Plans are converted to a workflow execution language (SCUFL 2)
Provenance capture
– Execution times, intermediate model returned to planner
Taverna
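The provenance-capture bullet can be pictured with a small sketch that runs a sequence of steps and records, for each one, the execution time and the intermediate output. This is only an analogy: the operators here are local Python functions with made-up names, whereas in e-LICO they are web services orchestrated by Taverna, and the log format is an assumption.

```python
import time


def run_with_provenance(steps, data):
    """Run `steps` in order; return the final result and a provenance log."""
    provenance = []
    for name, func in steps:
        start = time.perf_counter()
        data = func(data)
        elapsed = time.perf_counter() - start
        # Record what ran, how long it took, and the intermediate result.
        provenance.append({"operator": name, "seconds": elapsed, "output": data})
    return data, provenance


# Hypothetical stand-ins for operators; real ones would be web service calls.
steps = [
    ("DropMissing", lambda rows: [r for r in rows if r is not None]),
    ("Mean", lambda rows: sum(rows) / len(rows)),
]

result, log = run_with_provenance(steps, [1.0, None, 3.0])
for entry in log:
    print(entry["operator"], entry["output"])
```

A log of this kind is the sort of record that could be returned to the planner and reused as meta-data for meta-mining.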
14
Workflow Publishing and Sharing
[Diagram repeated from slide 6: Input Data; Ontology based AI planner; Workflow execution engine; Publish and share; Meta-mining; Output: Data, provenance and models]
15
Workflow Publishing and Sharing
Workflows and data can be shared via myExperiment
Build a community of data miners
Set of re-usable workflows, data and workflow templates (packs)
16
Use case – Obstructive nephropathy
Demonstrated with a Systems Biology use case
– Biomarker discovery and pathway modelling in the study of chronic kidney disease
– KUP challenge initiated (August 2010)
[Diagram labels: Expression data; KUP KB (RDF store); Text-mining / Image mining; New models and hypotheses; Further wet lab experiments]
17
Research Questions
How and when does a planner-based “Intelligent Discovery Assistant” help the end user?
Can we improve planning and suggest better workflows through meta-mining?
Can we plan complex workflows with Scientific Goals that answer biological questions?
– KUP goal is to construct diagnostic models that accurately connect the biological views to the severity of this pathology
18
Where are we now
Availability
http://www.e-lico.eu
1st year demo – http://www.youtube.com/watch?v=JtmqZfzyEKs
eProPlan plugin for Protégé 4.0
Ontologies available
Taverna http://www.taverna.org.uk
RapidMiner http://rapid-i.com
19
Summary
e-LICO: virtual laboratory for interdisciplinary collaborative research in data mining
Ontology based AI planning of KDD workflows
Generic e-Science platform for DM
Application layer for Systems Biology
20
Acknowledgments
Robert Stevens (Manchester)
Alan Williams (Manchester)
Rishi Ramgolam (Manchester)
Jorg-Uwe Kietz (Zurich)
Melanie Hilario (Geneva)
e-LICO consortium