Knowledge Discovery in Grid Datasets – Goals, Design Concepts and the Architecture
Peter Brezany, University of Vienna


1 Knowledge Discovery in Grid Datasets – Goals, Design Concepts and the Architecture
Peter Brezany, University of Vienna

2 Collecting Data
Data repositories
Satellites
Laboratories (microscopes, MRI/CT scanners, ...)
Computer simulations
Experiments (high-energy physics, ...)
Analysis
Business

3 Motivation
Computational Grid – a new-generation infrastructure
Challenge: advanced analysis of data managed by the Grid
Typical data in modern Grid applications:
  – files, file collections, relational and XML databases, virtual data, data objects
The data is often large and geographically distributed, and its complexity is increasing; some applications require special security precautions.
Our research aims:
  – Phase 1: knowledge discovery Grid system (GridMiner)
  – Phase 2: intelligent Grid system (WisdomGrid)

4 Outline
Motivation
Background and Related Work
Basic Concepts and GridMiner Architecture
Grid Data Integration System
Data Mining Layer
Implementation Issues and Experiments
Future Research
Conclusions

5 Background and Related Work
Basic Grid development (Globus 1) – metacomputing
Data Grid (Globus 2, DataGrid of CERN, etc.)
Semantic Grid (myGrid)
Open Grid Services Architecture (Globus 3, OGSA-DAIS)
Parallel and distributed data mining and data warehousing
Knowledge Grid (GridMiner and work of others)
Web Intelligence

6 GridMiner Requirements
Open architecture
Data distribution, complexity, heterogeneity, and large data size
Applying different kinds of analysis strategies
Compatibility with existing Grid infrastructure
Openness to tools and algorithms
Scalability
Grid, network, and location transparency
Security and data privacy
OLAP support

7 GridMiner (Layered) Abstract Architecture
Layers: Computational & Data Grid, Information Grid, Knowledge Grid
Data to Knowledge
Control
User Interface
Built on K. G. Jeffery's proposal

8 GridMiner Conceptual Architecture
(Figure; recoverable label: Job Control)

9 Service Architecture Based on OGSA-DAIS

10 Data Distribution Scenarios
1. Single data source
2. Federated data sources with different types of partitioning

11 Example
Vertical and horizontal distribution of the virtual data source
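The two partitioning types can be illustrated with a minimal Python sketch. It is not the GridMiner mediation service: the fragment contents, column names, and both helper functions are hypothetical and only show how a virtual table might be assembled, by concatenating horizontal fragments (same columns, different rows) and joining vertical fragments (same rows, different columns) on a shared key.

```python
# Hypothetical fragments of one virtual "patients" table (all names are illustrative).
site_a = [{"id": 1, "age": 34}, {"id": 2, "age": 58}]   # horizontal fragment 1
site_b = [{"id": 3, "age": 41}]                          # horizontal fragment 2
site_c = [                                               # vertical fragment (other columns)
    {"id": 1, "outcome": "good"},
    {"id": 2, "outcome": "poor"},
    {"id": 3, "outcome": "good"},
]


def union_horizontal(*fragments):
    """Horizontal partitioning: same columns, disjoint rows, so concatenate."""
    rows = []
    for fragment in fragments:
        rows.extend(dict(row) for row in fragment)
    return rows


def join_vertical(left, right, key):
    """Vertical partitioning: same rows, different columns, so join on the key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# The client sees a single table, regardless of how it is physically distributed.
virtual_table = join_vertical(union_horizontal(site_a, site_b), site_c, key="id")
```

A real mediator would push these operations down to the sources and stream results rather than materialize them in memory.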

12 Mapping Schema

13 Grid Data Mediation Services

14 Architecture of a Data Mining System

15 Components of the Data Mining Layer
GridMiner Service Factory
GridMiner Service Registry
GridMiner Data Mining Service
GridMiner Preprocessing Service
GridMiner Presentation Service
GridMiner Orchestration Service

16 Centralized Data Mining

17 Parallel and Distributed Data Mining

18 GridMiner Orchestration Service

19 GridMiner Job Specification Language

20 Implementation
Prototype implementation of the Mediation Service for horizontal data partitioning
Implementation of Data Mining Services for decision tree construction as OGSA-compliant Grid services, based on the Globus Toolkit 3 release
We use:
  – Weka, a freely available Java-based data mining system (data preprocessing and data mining tasks); main-memory oriented
  – a home-grown Java implementation of the SPRINT algorithm; disk oriented
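SPRINT chooses decision tree splits with the gini index. The Python sketch below shows only that split criterion for one numeric attribute, held entirely in memory; it is not the disk-oriented Java implementation mentioned on the slide, and the function names and sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a label multiset: 1 - sum of squared class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_numeric_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split 'value < threshold'."""
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal attribute values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        if score < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
            best = (threshold, score)
    return best


# Illustrative data only: patient ages and a binary outcome label.
ages = [20, 30, 40, 50]
outcomes = ["good", "good", "poor", "poor"]
split = best_numeric_split(ages, outcomes)
```

For clarity this version recomputes class counts at every candidate; SPRINT itself scans pre-sorted attribute lists and updates class histograms incrementally, which is what makes it suitable for data that does not fit in main memory.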

21 Experimental Environment
Test data suites
  – synthetic data (generated by an extended version of the IBM Quest Synthetic Data Generation Code)
  – TBI (Traumatic Brain Injury) databases
Grid testbed
  – Vienna
  – CERN
  – Dublin
  – Zagreb
  – Cracow
Goals in the first phases
  – Verifying model accuracy
  – Measuring the overhead of the service layers

22 Extending the Functionality

23 OLAM

24 Example: Mining Patterns for Data Classification and Associations
Classification query:
  use database dat1, dat2
  mine classifications
  analyze patient_outcome
  using g_parsimony
  display as tree
Association query:
  use database DBs attributes
  mine associations
  using method_attributes
  display as rules

25 Workflow 1: Interactive Mode

26 Workflow 2: Batch Mode

27 Workflow 3: Hybrid Mode

28 Execution Model Based on Static Workflow

29 Execution Model Based on Dynamic Workflow

30 Towards the Wisdom Grid (WG)

31 WG Architecture
Wisdom Grid
Agent Grid Service
Knowledge Base Service
Knowledge Discovery Service
Agent Platform
External Services
External Knowledge Base
Domain Knowledge Agents
Knowledge Explorer Agent
End User (personal) Agent
Grid KB

32 Work-Flow
End User Agent
Knowledge Agent
Knowledge Explorer Agent
Knowledge Base service
External Agents Knowledge Base Agent Service Knowledge discovery service Services...

33 Knowledge Discovery Service
Client for other services
Knowledge Discovery in Databases
GridMiner
data mining
on-line analytical processing (OLAP)
Web Mining
semantic web
Online libraries
Web/Grid Services
Knowledge Explorer Agent

34 Knowledge Base Service / KB
KBS: search, query, and expand the Knowledge Base
KB: database that stores particular data about real objects, the relations between these objects, and their properties
Consists of ontologies and instances
Information about resources (location, query language) on the Web: web/grid services, agents, references to the online database
Languages: XML, RDF, DAML+OIL, DAML-S, OWL

35 Ontology – example
Diagram nodes: Patient, Age, Human
Diagram relations: has, is
Language: DAML+OIL

36 Knowledge Base – example
Diagram nodes: Patient, Temperature, Human, DatabaseTables, Value, Attribute
Diagram relations: has, is
Data source reference: jdbc://foo/hospital, table: PATIENTS, attribute: PAT_ID

37 Semantic mediator
Distributed heterogeneous databases
  – Different database schemas
  – Different query languages
  – Different names of attributes/tables
... but the same semantics!
WG enables semantic mediation at a higher level

38 Semantic mediator (cont.)
Table PATIENTS: PAT_ID | PAT_AGE | PAT_BLOOD_TYPE
Table PAT_TAB: ID | AGE | BT
Ontology nodes: Patient, Age, Human, Blood Type (relations: has, is)
Mappings: AGE samePropertyAs PAT_AGE; BT samePropertyAs PAT_BLOOD_TYPE
Sources: Database in Hospital X, Database in Hospital Z
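The samePropertyAs idea can be sketched in Python as a per-source mapping from ontology properties to physical columns, used to rewrite one semantic query for each database. The table and column names are taken from the slide; the assignment of tables to Hospital X and Hospital Z, the property names, and the `rewrite_query` helper are assumptions made for illustration, not the Wisdom Grid implementation.

```python
# Ontology property -> physical column, per source.
# Table and column names come from the slide; which hospital owns which
# table is an assumption, as are the lowercase property names.
MAPPINGS = {
    "hospital_x": {
        "table": "PATIENTS",
        "columns": {"id": "PAT_ID", "age": "PAT_AGE", "blood_type": "PAT_BLOOD_TYPE"},
    },
    "hospital_z": {
        "table": "PAT_TAB",
        "columns": {"id": "ID", "age": "AGE", "blood_type": "BT"},
    },
}


def rewrite_query(source, properties, mappings=MAPPINGS):
    """Translate a list of ontology properties into a SELECT for one source."""
    mapping = mappings[source]
    columns = ", ".join(f'{mapping["columns"][p]} AS {p}' for p in properties)
    return f'SELECT {columns} FROM {mapping["table"]}'


# One semantic request, two schema-specific queries.
query_x = rewrite_query("hospital_x", ["age", "blood_type"])
query_z = rewrite_query("hospital_z", ["age", "blood_type"])
```

Aliasing every column back to its ontology property name means the results from both hospitals arrive with identical headers and can be merged directly.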

39 Distributed Knowledge Base
Diagram nodes: uri:fooX#Patient, uri:fooY#Human, uri:fooZ#Temperature, uri:fooX#Ill_Person
Diagram relations: is subclass, has property, is same class as
Legend: class, property
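One way to picture a knowledge base spread over several namespaces is as a set of triples in which "same class as" links let a query about one class reach facts stated about an equivalent class elsewhere. The Python sketch below uses the URIs from the slide, but the specific edges between them are an assumed reading of the flattened diagram, and the predicate names and helper functions are hypothetical.

```python
# (subject, predicate, object) triples. The URIs appear on the slide; how the
# diagram connects them is an assumption made for this sketch.
TRIPLES = {
    ("uri:fooX#Patient", "subClassOf", "uri:fooY#Human"),
    ("uri:fooX#Patient", "hasProperty", "uri:fooZ#Temperature"),
    ("uri:fooX#Ill_Person", "sameClassAs", "uri:fooX#Patient"),
}


def equivalents(cls, triples):
    """All classes reachable through sameClassAs links, in either direction."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if predicate != "sameClassAs":
                continue
            for a, b in ((subject, obj), (obj, subject)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


def properties_of(cls, triples):
    """Properties declared on the class or on any class equivalent to it."""
    classes = equivalents(cls, triples)
    return {o for s, p, o in triples if p == "hasProperty" and s in classes}


# A query about Ill_Person finds the property declared on Patient.
ill_person_properties = properties_of("uri:fooX#Ill_Person", TRIPLES)
```

The sketch resolves equivalence only; inheriting properties along subclass links would need a similar traversal over `subClassOf`.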

40 Agent Grid Service
Gives the system the ability to communicate with the outside world in standard languages
FIPA standards
ACL – Agent Communication Language
KQML – Knowledge Query and Manipulation Language
Agent platform (JADE, FIPA-OS)
Agents
  – Domain Knowledge Agent
  – Knowledge Explorer Agent
  – End-user Agent (personal)

41 Querying
End-user agent
  – with own ontology – subset of ontology
  – Merging of ontologies
  – without own ontology
  – Negotiating about domain of interest
Queries created from ontology
Templates

42 Answers
Mined knowledge (GridMiner)
  – Decision trees / rules (clinical pathways)
  – Association rules
Instances of domain ontology
  – Particular data
  – References
  – Links to Web sites
  – Information about other knowledge providers

43 Case Study – Medical Application
End User (personal) Agent
Q: Outcome? + data about patient's condition
A: probability of survival + references to the diagnoses
Knowledge Agent
Knowledge Explorer Agent
Knowledge Discovery Service
GridMiner: training set, test set
Hospital databases
Knowledge Base
Semantic Web/Grid resources

44 Conclusions and Future Work
Application and extension of Grid technology to knowledge discovery – an important but non-traditional Grid application domain
Introduction of a new Grid Data Mediation Service
Future work
  – Performance evaluation on large synthetic data volumes
  – Coupling of the Data Mining services architecture with the OLAP services architecture
  – Development of a knowledge-discovery-oriented Grid Workflow Language and the appropriate Workflow Engine
  – Application of GridMiner to a real medical application (management of patients with severe traumatic brain injuries)
  – Development of the Wisdom Grid

