Intelligent Grid Solutions. GridMiner: A Framework for Knowledge Discovery on the Grid – from a Vision to Design and Implementation. Peter Brezany, Ivan Janciak, Alexander Wöhrer, A Min Tjoa. University of Vienna, Institute for Software Science.
CGW'04, 13. Dec. 04. GridMiner Overview. Start: Jan. Host: University of Vienna; Vienna University of Technology. Target: provide tools to discover and access relevant knowledge and information from distributed, heterogeneous data sources. Test application area: medical treatment of traumatic brain injury, predicting the outcome of seriously ill patients. The analytical part focuses on data mining and On-Line Analytical Processing (OLAP).
Project members. Project leaders: Prof. A Min Tjoa (Vienna University of Technology), Prof. Peter Brezany (University of Vienna). Visualization: Radoslav Ivanov. Data streaming: Nguyen Manh Tho. OLAP: Bernhard Fiser, Umut Onan, Ibrahim Elsayed. Data mediation: Alexander Wöhrer. Knowledge Mgt: Ivan Janciak. Job Control: Günter Kickinger. Sequence Rules: Michael Rinner. Clustering: Markus Mayer. Decision rules: Christian Kloner, Juergen Hofer. GUI: Paul Panhofer. Autonomic aspects: Michael Bergmann.
Outline. Motivation/Requirements; GridMiner Services; Architecture; Dynamic Service Control Engine; OLAP; Knowledge base; Data Integration; Graphical user interface; Implementation; Summary.
The process to cover. Data is distributed over the participating hospitals and is accessed from different platforms (handheld, PC, …) for data generation, querying, and analysis. The process needs to access various data sources.
GridMiner Motivation: integrate knowledge discovery and knowledge management as an autonomic system; manage and control the whole lifecycle of knowledge; give strong support to other intelligent entities in their needs for knowledge. Basic Requirements: ability to access and analyze huge amounts of information, typically heterogeneous and geographically distributed; intelligent behavior, i.e. the ability to maintain, discover, extend, present and communicate knowledge; high-performance (real-time or soft real-time) query processing; high security guarantees.
GridMiner Services: Dynamic Workflow Control Service; data mining services: Sequences (SPADE), Clustering (SimpleKMeans), Decision rules (SPRINT); OLAP (sequential/parallel version); Association rules on OLAP; Grid Data Mediator Service.
GridMiner Architecture (diagram labels): Graphical User Interface; Knowledge Base; Service configuration; Dynamic service control engine (DSCE); Data Access and Integration; Data mining services; Grid; Web; User environment; DSCE Client.
Dynamic Service Control Engine. Processes a workflow described in DSCL. Based on the Open Grid Services Architecture (OGSA). Supports both interactive and batch processing. Processes the workflow independently of the user. Provides all intermediate results from the involved services. Gives the user full control during workflow execution. Supports the OGSA Notification Model.
Dynamic Service Control Engine (cont.)
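The engine's behavior (run a described workflow, keep every intermediate result, notify interested parties) can be illustrated with a minimal Python sketch. This is not DSCL and not the DSCE implementation: the names `run_workflow`, `Activity` and `Listener`, and the sample patient rows, are assumptions made only for illustration. Real activities would invoke Grid services rather than local functions.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-ins: an activity is a named callable that takes the
# previous result and returns a new one.
Activity = Tuple[str, Callable[[Any], Any]]
# A listener receives (activity name, intermediate result).
Listener = Callable[[str, Any], None]


def run_workflow(activities: List[Activity], initial: Any,
                 listeners: List[Listener]) -> Dict[str, Any]:
    """Run activities in sequence and publish every intermediate result."""
    results: Dict[str, Any] = {}
    current = initial
    for name, step in activities:
        current = step(current)
        results[name] = current          # keep the intermediate result
        for notify in listeners:
            notify(name, current)        # notification-style callback
    return results


# Illustrative data and workflow (assumed, not from the slides).
events: List[str] = []
workflow: List[Activity] = [
    ("select", lambda rows: [r for r in rows if r["age"] >= 18]),
    ("count", lambda rows: len(rows)),
]
patients = [{"age": 17}, {"age": 42}, {"age": 65}]
results = run_workflow(workflow, patients, [lambda n, r: events.append(n)])
```

Keeping the results keyed by activity name is one simple way to make all intermediate results available after the run; a client that disconnects can pick them up later, which matches the idea of user-independent processing.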
Knowledge Base (diagram labels): Metadata; Domain Ontology; Activity Ontology; Data mining Ont.; Data source Ont.; Rules; Facts. Technologies: XML, XML Schema (XSL) (WebRowSet, PMML, …); Web Ontology Language OWL + OWL-S; SWRL; OWL.
OLAP. Multidimensional data analysis by sequential and distributed/parallel OLAP engines. Cube construction and querying. Query results are represented in the OLAP Modeling Markup Language. Integration with data mining engines (association rules on OLAP).
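Cube construction and querying can be sketched in a few lines of Python. This is a sequential toy version under stated assumptions, not the GridMiner OLAP engine: `build_cube`, `query`, the `ALL` marker and the sample fact rows are hypothetical names and data chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Dict, List, Tuple

ALL = "*"  # marks a dimension that has been aggregated away


def build_cube(rows: List[Dict[str, Any]], dims: Tuple[str, ...],
               measure: str) -> Dict[Tuple[Any, ...], float]:
    """Sum `measure` for every combination of kept and aggregated dimensions."""
    cube: Dict[Tuple[Any, ...], float] = defaultdict(float)
    for row in rows:
        for k in range(len(dims) + 1):
            for kept in combinations(dims, k):
                key = tuple(row[d] if d in kept else ALL for d in dims)
                cube[key] += row[measure]
    return dict(cube)


def query(cube: Dict[Tuple[Any, ...], float], dims: Tuple[str, ...],
          **fixed: Any) -> float:
    """Look up one cell; dimensions not named in `fixed` are aggregated."""
    key = tuple(fixed.get(d, ALL) for d in dims)
    return cube.get(key, 0.0)


# Illustrative fact rows (assumed, not from the slides).
facts = [
    {"hospital": "A", "year": 2003, "cases": 3},
    {"hospital": "A", "year": 2004, "cases": 5},
    {"hospital": "B", "year": 2004, "cases": 2},
]
dims = ("hospital", "year")
cube = build_cube(facts, dims, "cases")
cases_at_a = query(cube, dims, hospital="A")
cases_in_2004 = query(cube, dims, year=2004)
cases_total = query(cube, dims)
```

Precomputing every aggregate makes queries a single dictionary lookup at the cost of memory that grows with the number of dimension combinations; a parallel engine would typically partition the fact rows and merge partial cubes.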
Grid Data Mediation Service. Principles: tight federation with a global (relational) schema; virtual integration: leave the data where it is, so the data is always up to date; no proprietary solution: inherit well-solved aspects from OGSA-DAI; not bound to a special architecture. Supported data sources: RDBMS (via JDBC), XML DB (Xindice), CSV files. Operators: “union all” and “inner join”. Operators are XQuery-based (using SAXON).
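The semantics of the two mediation operators can be illustrated in Python. The service itself implements them in XQuery on SAXON; the sketch below only mirrors their behavior on in-memory rows, and the function names, field names and sample rows are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def union_all(*sources: Iterable[Row]) -> Iterator[Row]:
    """Concatenate rows from all sources, keeping duplicates."""
    for source in sources:
        yield from source


def inner_join(left: Iterable[Row], right: Iterable[Row],
               key: str) -> Iterator[Row]:
    """Emit one merged row for every left/right pair with an equal key."""
    index: Dict[Any, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    for l in left:
        for r in index.get(l[key], []):
            yield {**l, **r}


# Illustrative rows from three hypothetical sources.
site_a = [{"id": 1, "p_name": "Ann Example"}]
site_b = [{"id": 2, "p_name": "Bob Sample"}, {"id": 3, "p_name": "Cy Test"}]
treatments = [{"id": 1, "therapy": "x"}, {"id": 3, "therapy": "y"}]

patients = list(union_all(site_a, site_b))
joined = list(inner_join(patients, treatments, "id"))
```

Both operators are written as generators so that rows stream through without materializing a full copy, which fits the virtual-integration principle of leaving the data at its source.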
Data Integration Scenario. Heterogeneities: the name in A is “First Last” (the target format); the name in C has to be combined. Distribution: 3 data sources.
Data Integration Scenario (cont.). Query: SELECT p_name FROM patient WHERE id=10 (diagram labels: Standard, optimized).
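How the mediator might answer this query over the three sources can be sketched as follows. The slides state only that A stores the name as “First Last” and that C needs its name combined; the layout of B, all field names (`name`, `first`, `last`), the sample rows and the function names are assumptions for illustration, not the actual mediation mapping.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def from_a(row: Row) -> Row:
    """Source A already stores the name in the target 'First Last' format."""
    return {"id": row["id"], "p_name": row["name"]}


def from_c(row: Row) -> Row:
    """Source C stores name parts separately (assumed), so combine them."""
    return {"id": row["id"], "p_name": f'{row["first"]} {row["last"]}'}


# Hypothetical sources; B is assumed to share A's layout.
SOURCES: Dict[str, Tuple[List[Row], Callable[[Row], Row]]] = {
    "A": ([{"id": 10, "name": "Jane Doe"}], from_a),
    "B": ([{"id": 11, "name": "John Roe"}], from_a),
    "C": ([{"id": 12, "first": "Max", "last": "Muster"}], from_c),
}


def select_p_name(patient_id: int) -> List[str]:
    """Mediated form of: SELECT p_name FROM patient WHERE id = <patient_id>."""
    names: List[str] = []
    for rows, to_global in SOURCES.values():
        for row in rows:
            # Filter at the source first, then map only matching rows
            # to the global schema.
            if row["id"] == patient_id:
                names.append(to_global(row)["p_name"])
    return names


name_10 = select_p_name(10)
name_12 = select_p_name(12)
```

Applying the `id` filter before the schema mapping means only matching rows are transformed, which is one plausible reading of the difference between a standard and an optimized plan.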
Implementation/Technology. Globus 3.2; OGSA-DAI. GUI: workflow construction and result visualization (JGraph, Java Web Start, JavaServer Pages). Service configuration (JavaServer Pages/PHP/..). Knowledge base (XML, OWL).
Data mining Scenario (diagram labels): Database (100k rows); Select 10k rows; Decision Rules (SPRINT); Decision Rules (C4.5); Select 20k rows; Decision Rules (C4.5).
Graphical User Interface
Summary. Integrated data mining infrastructure. Covers the whole process. Service-oriented architecture. Implemented prototype. Project ongoing: new data mining tasks (algorithms), knowledge management. More information:
Thank you. Questions?