Published by Harriet Sullivan. Modified over 9 years ago.
1
University of Vienna, P. Brezany
Knowledge Discovery in Grid Datasets – Goals, Design Concepts and the Architecture
Peter Brezany
University of Vienna
2
Collecting Data
Data repositories
Satellites
Laboratories (microscopes, MRI/CT scanners, ...)
Computer simulations
Experiments (high energy physics, ...)
Analysis
Business
3
Motivation
Computational Grid: a new-generation infrastructure
Challenge: advanced analysis of data managed by the Grid
Typical data in modern Grid applications:
– files, file collections, relational and XML DBs, virtual data, data objects
The data is often large and geographically distributed, and its complexity is increasing; some applications require special security precautions.
Our research aims:
– Phase 1: Knowledge discovery Grid system (GridMiner)
– Phase 2: Intelligent Grid system (WisdomGrid)
4
Outline
Motivation
Background and Related Work
Basic Concepts and GridMiner Architecture
Grid Data Integration System
Data Mining Layer
Implementation Issues and Experiments
Future Research
Conclusions
5
Background and Related Work
Basic Grid development (Globus 1): metacomputing
Data Grid (Globus 2, DataGrid of CERN, etc.)
Semantic Grid (myGrid)
Open Grid Services Architecture (Globus 3, OGSA-DAIS)
Parallel and Distributed Data Mining and Data Warehousing
Knowledge Grid (GridMiner and work of others)
Web Intelligence
6
GridMiner Requirements
Open architecture
Data distribution, complexity, heterogeneity, and large data size
Applying different kinds of analysis strategies
Compatibility with existing Grid infrastructure
Openness to tools and algorithms
Scalability
Grid, network, and location transparency
Security and data privacy
OLAP support
7
GridMiner (Layered) Abstract Architecture
Computational & Data Grid
Information Grid
Knowledge Grid
Data to Knowledge
Control
User Interface
Built on K. G. Jeffery's proposal
8
GridMiner Conceptual Architecture
Job Control
9
Service Architecture
Based on OGSA-DAIS
10
Data Distribution Scenarios
1. Single data source
2. Federated data sources with different types of partitioning
11
Example
Vertical and horizontal distribution of the virtual data source
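The two partitioning types named on this slide can be illustrated with a minimal sketch. It is not the GridMiner mediation service: the fragment names (`site_a`, `site_b`, `site_c`), the columns, and the values are hypothetical, and plain lists of dictionaries stand in for remote data sources. Horizontal fragments hold different rows with the same columns and are combined by union; vertical fragments hold different columns for the same rows and are combined by a join on a shared key.

```python
def union_horizontal(*fragments):
    """Horizontal partitioning: same columns, different rows. Combine by union."""
    rows = []
    for fragment in fragments:
        rows.extend(fragment)
    return rows


def join_vertical(left, right, key):
    """Vertical partitioning: same rows, different columns. Combine by key join."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


# Hypothetical fragments held at three sites (illustrative values only).
site_a = [{"id": 1, "age": 34}, {"id": 2, "age": 58}]   # rows 1-2
site_b = [{"id": 3, "age": 41}]                          # row 3, same columns
site_c = [                                               # extra column, all rows
    {"id": 1, "outcome": "good"},
    {"id": 2, "outcome": "poor"},
    {"id": 3, "outcome": "good"},
]

# The "virtual data source" a client would see after mediation.
virtual_source = join_vertical(union_horizontal(site_a, site_b), site_c, key="id")
```

In this sketch the client only ever touches `virtual_source`; where each fragment lives and how it is partitioned stays hidden behind the two helper functions.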
12
Mapping Schema
13
Grid Data Mediation Services
14
Architecture of a Data Mining System
15
Components of the Data Mining Layer
GridMiner Service Factory
GridMiner Service Registry
GridMiner Data Mining Service
GridMiner Preprocessing Service
GridMiner Presentation Service
GridMiner Orchestration Service
16
Centralized Data Mining
17
Parallel and Distributed Data Mining
18
GridMiner Orchestration Service
19
GridMiner Job Specification Language
20
Implementation
Prototype implementation of the Mediation Service for horizontal data partitioning
Implementation of Data Mining Services for decision tree construction as OGSA-compliant Grid services, based on the Globus Toolkit 3 release
We use
– Weka, a freely available Java-based data mining system, for data preprocessing and data mining tasks (main-memory oriented)
– a home-grown Java implementation of the SPRINT algorithm (disk oriented)
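SPRINT chooses decision tree splits with the Gini index, scanning a pre-sorted attribute list once per numeric attribute while keeping class counts for the records below and above the candidate split point. The sketch below illustrates only that scoring step, in memory; it is not the Java implementation mentioned on the slide, and the function names and sample records are hypothetical.

```python
from collections import Counter


def gini(counts, total):
    """Gini index of a partition: 1 - sum of squared class frequencies."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(attribute_list):
    """Find the best binary split of one numeric attribute.

    attribute_list holds (value, class_label) pairs already sorted by value.
    Returns (threshold, weighted_gini); threshold is None if no split exists.
    """
    n = len(attribute_list)
    below = Counter()                                        # classes left of the split
    above = Counter(label for _, label in attribute_list)    # classes right of the split
    best_score, best_threshold = float("inf"), None
    for i in range(n - 1):
        value, label = attribute_list[i]
        below[label] += 1
        above[label] -= 1
        next_value = attribute_list[i + 1][0]
        if value == next_value:
            continue  # no split point between equal values
        n_below = i + 1
        n_above = n - n_below
        score = (n_below / n) * gini(below, n_below) + (n_above / n) * gini(above, n_above)
        if score < best_score:
            best_score, best_threshold = score, (value + next_value) / 2
    return best_threshold, best_score


# Hypothetical (age, outcome) records, sorted once up front.
records = sorted([(43, "good"), (17, "poor"), (68, "poor"), (32, "good"), (23, "poor")])
threshold, score = best_numeric_split(records)
```

For these five records the lowest weighted Gini is found at the midpoint between ages 23 and 32. A disk-oriented implementation would stream the sorted attribute lists from storage rather than hold them in memory, but the per-split arithmetic is the same.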
21
Experimental Environment
Test data suites
– synthetic data (generated by an extended version of the IBM Quest Synthetic Data Generation Code)
– TBI (Traumatic Brain Injury) databases
Grid testbed
– Vienna
– CERN
– Dublin
– Zagreb
– Cracow
Goals in the first phases
– Verifying model accuracy
– Overhead of the service layers
22
Extending the Functionality
23
OLAM
24
Example: Mining Patterns for Data Classification and Associations
Query 1:
use database dat1, dat2
mine classifications
analyze patient_outcome
using g_parsimony
display as tree
Query 2:
use database DBs attributes
mine associations
using method_attributes
display as rules
25
Workflow 1: Interactive Mode
26
Workflow 2: Batch Mode
27
Workflow 3: Hybrid Mode
28
Execution Model Based on Static Workflow
29
Execution Model Based on Dynamic Workflow
30
Towards the Wisdom Grid (WG)
31
WG Architecture
Wisdom Grid
Agent Grid Service
Knowledge Base Service
Knowledge Discovery Service
Agent Platform
External Services
External Knowledge Base
Domain Knowledge Agents
Knowledge Explorer Agent
End User (personal) Agent
Grid KB
32
Work-Flow
End User Agent
Knowledge Agent
Knowledge Explorer Agent
Knowledge Base service
External Agents
Knowledge Base
Agent Service
Knowledge discovery service
Services...
33
Knowledge Discovery Service
Client for other services
Knowledge Discovery in Databases
GridMiner
data mining
on-line analytical processing (OLAP)
Web Mining
semantic web
Online libraries
Web/Grid Services
Knowledge Explorer Agent
34
Knowledge Base Service / KB
KBS: search, query, and expand the Knowledge Base
KB: a database that stores particular data about real objects, the relations between these objects, and their properties
Consists of ontologies and instances
Information about resources (location, query language) on the Web: web/grid services, agents, references to online databases
Languages: XML, RDF, DAML+OIL, DAML-S, OWL
35
Ontology: example
Patient is Human
Patient has Age
Language: DAML+OIL
36
Knowledge Base: example
Patient is Human
Patient has Temperature
Database Tables: jdbc://foo/hospital, table:PATIENTS, attribute:PAT_ID
Other diagram labels: Value, Attribute (edges: is, has)
37
Semantic mediator
Distributed heterogeneous databases
– Different database schemas
– Different query languages
– Different names of attributes/tables ... but the same semantics!
WG enables semantic mediation at a higher level
38
Semantic mediator (cont.)
Database in Hospital X, table PATIENTS: PAT_ID | PAT_AGE | PAT_BLOOD_TYPE | ...
Database in Hospital Z, table PAT_TAB: ID | AGE | BT | ...
Ontology: Patient is Human; Patient has Age; Patient has Blood Type
Mappings: AGE samePropertyAs PAT_AGE; BT samePropertyAs PAT_BLOOD_TYPE
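One way to read the samePropertyAs mappings on this slide is as a lookup from ontology properties to local column names, which lets a single ontology-level request be rewritten for each hospital schema. The sketch below is only an illustration under that reading: the source keys `hospital_x` and `hospital_z`, the assignment of tables to hospitals, the property name `BloodType`, and the function `rewrite_query` are all assumptions, and the generated SQL is not taken from the Wisdom Grid.

```python
# Ontology property -> local column, per data source.
# Table and column names come from the slide; which hospital owns which
# table is an assumption made for this example.
MAPPINGS = {
    "hospital_x": {
        "table": "PATIENTS",
        "columns": {"Age": "PAT_AGE", "BloodType": "PAT_BLOOD_TYPE"},
    },
    "hospital_z": {
        "table": "PAT_TAB",
        "columns": {"Age": "AGE", "BloodType": "BT"},
    },
}


def rewrite_query(source, properties):
    """Translate a list of ontology properties into a SELECT for one source."""
    mapping = MAPPINGS[source]
    select_list = ", ".join(
        f"{mapping['columns'][prop]} AS {prop}" for prop in properties
    )
    return f"SELECT {select_list} FROM {mapping['table']}"


# The same ontology-level request, rewritten for each local schema.
queries = {source: rewrite_query(source, ["Age", "BloodType"]) for source in MAPPINGS}
```

Because every rewritten query aliases its columns back to the ontology property names, results from both hospitals come back with identical column headings and can be merged directly.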
39
Distributed Knowledge Base
Nodes: uri:fooX#Patient, uri:fooY#Human, uri:fooZ#Temperature, uri:fooX#Ill_Person
Node kinds: class, property
Relations: is subclass, has property, is same class as
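The relations on this slide (subclass, has property, same class as) can be stored as triples whose subjects and objects live under different URIs. The sketch below shows how a query about one class might follow those links across knowledge bases. The slide does not make clear which nodes each edge connects, so the three triples are assumptions chosen for illustration, and the predicate names `subClassOf`, `sameClassAs`, and `hasProperty` are labels invented for this example rather than terms from any ontology language.

```python
# Assumed relations between the URIs shown on the slide.
TRIPLES = [
    ("uri:fooX#Patient", "subClassOf", "uri:fooY#Human"),
    ("uri:fooX#Ill_Person", "sameClassAs", "uri:fooX#Patient"),
    ("uri:fooX#Patient", "hasProperty", "uri:fooZ#Temperature"),
]


def equivalents(cls):
    """Return cls plus every class linked to it by sameClassAs, in either direction."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for subject, predicate, obj in TRIPLES:
            if predicate != "sameClassAs":
                continue
            for a, b in ((subject, obj), (obj, subject)):
                if a == current and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return seen


def superclasses(cls):
    """Return all classes reachable via subClassOf from cls or its equivalents."""
    found = set()
    frontier = list(equivalents(cls))
    visited = set(frontier)
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in TRIPLES:
            if predicate == "subClassOf" and subject == current:
                for eq in equivalents(obj):
                    found.add(eq)
                    if eq not in visited:
                        visited.add(eq)
                        frontier.append(eq)
    return found


def properties(cls):
    """Return properties declared on cls, its equivalents, or its superclasses."""
    classes = equivalents(cls) | superclasses(cls)
    return {obj for subject, predicate, obj in TRIPLES
            if predicate == "hasProperty" and subject in classes}
```

Under these assumed triples, asking for the properties of `uri:fooX#Ill_Person` yields `uri:fooZ#Temperature`, even though that property is declared on a different class and defined under a different URI prefix.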
40
Agent Grid Service
Gives the system the ability to communicate with the outside world in standard languages
FIPA standards
ACL: Agent Communication Language
KQML: Knowledge Query and Manipulation Language
Agent platform (JADE, FIPA-OS)
Agents
– Domain Knowledge Agent
– Knowledge Explorer Agent
– End-user Agent (personal)
41
Querying
End-user agent
– with own ontology: subset of ontology, merging of ontologies
– without own ontology: negotiating about domain of interest
Queries created from ontology
Templates
42
Answers
Mined knowledge (GridMiner)
– Decision trees/rules
» (clinical pathways)
– Association rules
Instances of domain ontology
– Particular data
– References
– Links to Web sites
– Information about other knowledge providers
43
Case Study: Medical Application
End User (personal) Agent
Q: Outcome? + data about patient's condition
Knowledge Agent
Training set
GridMiner
Test set
Hospital Databases
Knowledge Discovery Service
Knowledge Base
Semantic Web/Grid
A: probability of survival + references to the diagnoses
Knowledge Explorer Agent
resources
44
Conclusions and Future Work
Application and extension of Grid technology to knowledge discovery: an important but non-traditional Grid application domain
Introduction of a new Grid Data Mediation Service
Future work
– Performance evaluation on large synthetic data volumes
– Coupling of the Data Mining services architecture with the OLAP services architecture
– Development of a knowledge discovery oriented Grid Workflow Language and the appropriate Workflow Engine
– Application of GridMiner to a real medical application (management of patients with severe traumatic brain injuries)
– Development of the Wisdom Grid