Knowledge Discovery in Grid Datasets – Goals, Design Concepts and the Architecture
Peter Brezany, University of Vienna

Presentation transcript:

Slide 1: Knowledge Discovery in Grid Datasets – Goals, Design Concepts and the Architecture
Peter Brezany, University of Vienna

Slide 2: Collecting Data
[Figure: data from satellites, laboratories (microscopes, MRI/CT scanners, ...), computer simulations, experiments (high energy physics, ...), and business flows into data repositories and on to analysis.]

Slide 3: Motivation
Computational Grid: a new-generation infrastructure
Challenge: advanced analysis of data managed by the Grid
Typical data in modern Grid applications:
– files, file collections, relational and XML DBs, virtual data, data objects
The data is often large and geographically distributed, and its complexity is increasing; some applications require special security precautions.
Our research aims:
– Phase 1: Knowledge discovery Grid system (GridMiner)
– Phase 2: Intelligent Grid system (WisdomGrid)

Slide 4: Outline
Motivation
Background and Related Work
Basic Concepts and GridMiner Architecture
Grid Data Integration System
Data Mining Layer
Implementation Issues and Experiments
Future Research
Conclusions

Slide 5: Background and Related Work
Basic Grid development (Globus 1): metacomputing
Data Grid (Globus 2, DataGrid of CERN, etc.)
Semantic Grid (myGrid)
Open Grid Service Architecture (Globus 3, OGSA-DAIS)
Parallel and Distributed Data Mining and Data Warehousing
Knowledge Grid (GridMiner and work of others)
Web Intelligence

Slide 6: GridMiner Requirements
Open architecture
Data distribution, complexity, heterogeneity, and large data size
Applying different kinds of analysis strategies
Compatibility with existing Grid infrastructure
Openness to tools and algorithms
Scalability
Grid, network, and location transparency
Security and data privacy
OLAP support

Slide 7: GridMiner (Layered) Abstract Architecture
[Figure: layered architecture. Layers: Computational & Data Grid, Information Grid, Knowledge Grid. Other labels: Data to Knowledge, Control, User Interface.]
Built on K. G. Jeffery's proposal

Slide 8: GridMiner Conceptual Architecture
[Figure; surviving label: JobControl]

Slide 9: Service Architecture Based on OGSA-DAIS

Slide 10: Data Distribution Scenarios
1. Single data source
2. Federated data sources with different types of partitioning

Slide 11: Example
Vertical and horizontal distribution of the virtual data source
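The slide itself only names the two kinds of distribution. As a rough illustration of what a mediation layer has to do with them, the following minimal sketch rebuilds one virtual table from fragments: horizontal fragments (same columns, different rows) are concatenated, and vertical fragments (different columns, shared key) are joined on the key. The function names, the `id` key, and the sample rows are assumptions made for this example; they are not taken from GridMiner.

```python
from typing import Dict, List

Row = Dict[str, object]


def union_horizontal(fragments: List[List[Row]]) -> List[Row]:
    """Horizontal partitioning: fragments share columns but hold different rows."""
    rows: List[Row] = []
    for fragment in fragments:
        rows.extend(fragment)
    return rows


def join_vertical(fragments: List[List[Row]], key: str) -> List[Row]:
    """Vertical partitioning: fragments hold different columns for the same keys."""
    merged: Dict[object, Row] = {}
    for fragment in fragments:
        for row in fragment:
            # Collect all columns that belong to the same key value.
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical sample data: two sites split the rows, a third holds another column.
site_a = [{"id": 1, "age": 34}, {"id": 2, "age": 61}]
site_b = [{"id": 3, "age": 47}]
site_c = [
    {"id": 1, "outcome": "good"},
    {"id": 2, "outcome": "poor"},
    {"id": 3, "outcome": "good"},
]

virtual_table = join_vertical([union_horizontal([site_a, site_b]), site_c], key="id")
```

A real mediation service would push queries to the sources rather than copy all rows into memory; the sketch only shows how the two partitioning types combine.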

Slide 12: Mapping Schema

Slide 13: Grid Data Mediation Services

Slide 14: Architecture of a Data Mining System

Slide 15: Components of the Data Mining Layer
GridMiner Service Factory
GridMiner Service Registry
GridMiner Data Mining Service
GridMiner Preprocessing Service
GridMiner Presentation Service
GridMiner Orchestration Service

Slide 16: Centralized Data Mining

Slide 17: Parallel and Distributed Data Mining

Slide 18: GridMiner Orchestration Service

Slide 19: GridMiner Job Specification Language

Slide 20: Implementation
Prototype implementation of the Mediation Service for horizontal data partitioning
Implementation of Data Mining Services for decision tree construction as OGSA-compliant Grid services, based on the Globus Toolkit 3 release
We use:
– Weka, a freely available Java-based data mining system (data preprocessing and data mining tasks; main-memory oriented)
– a home-grown Java implementation of the SPRINT algorithm (disk-oriented)
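SPRINT, as described in the literature, chooses decision tree splits by the gini index evaluated over sorted attribute lists. The sketch below shows that split criterion for one numeric attribute. It is a small in-memory Python illustration, not the disk-oriented Java implementation mentioned on the slide, and the function names and sample values are assumptions.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a set of class labels (0.0 means the set is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_numeric_split(
    values: Sequence[float], labels: Sequence[str]
) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted gini) of the best binary split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))  # sorted attribute list
    n = len(pairs)
    best_threshold: Optional[float] = None
    best_score = float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score = score
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold, best_score


# Hypothetical sample: patient age against outcome class.
threshold, score = best_numeric_split([20, 25, 60, 70], ["good", "good", "poor", "poor"])
```

This version recomputes class counts for every candidate split, which is quadratic in the number of rows; a scalable implementation would update the counts incrementally while scanning the sorted list.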

Slide 21: Experimental Environment
Test data suites:
– synthetic data (generated by an extended version of the IBM Quest Synthetic Data Generation Code)
– TBI (Traumatic Brain Injury) databases
Grid testbed:
– Vienna
– CERN
– Dublin
– Zagreb
– Cracow
Goals in the first phases:
– Verifying model accuracy
– Measuring the overhead of the service layers

Slide 22: Extending the Functionality

Slide 23: OLAM

Slide 24: Example: Mining Patterns for Data Classification and Associations
    use database dat1, dat2
    mine classifications
    analyze patient_outcome
    using g_parsimony
    display as tree

    use database DBs attributes
    mine associations
    using method_attributes
    display as rules

Slide 25: Workflow 1: Interactive Mode

Slide 26: Workflow 2: Batch Mode

Slide 27: Workflow 3: Hybrid Mode

Slide 28: Execution Model Based on Static Workflow

Slide 29: Execution Model Based on Dynamic Workflow

Slide 30: Towards the Wisdom Grid (WG)

Slide 31: WG Architecture
[Figure: component diagram. Labels: Wisdom Grid; Agent Grid Service; Knowledge Base Service; Knowledge Discovery Service; Agent Platform; External Services; External Knowledge Base; Domain Knowledge Agents; Knowledge Explorer Agent; End User (personal) Agent; Grid; KB.]

Slide 32: Work-Flow
[Figure: workflow diagram. Labels: End User Agent; Knowledge Agent; Knowledge Explorer Agent; Knowledge Base service; External Agents; Knowledge Base; Agent Service; Knowledge discovery service; Services...]

Slide 33: Knowledge Discovery Service
Client for other services
Knowledge Discovery in Databases
GridMiner
data mining
on-line analytical processing (OLAP)
Web Mining
semantic web
Online libraries
Web/Grid Services
Knowledge Explorer Agent

Slide 34: Knowledge Base Service / KB
KBS: search, query, and expand the Knowledge Base
KB: database that stores particular data about real objects and the relations between these objects and their properties
Consists of ontologies and instances
Information about resources (location, query language) on the Web:
– web/grid services, agents
– references to the online database
Languages: XML/RDF/DAML+OIL/DAML-S/OWL

Slide 35: Ontology - example
[Figure: ontology graph. Patient is Human; Patient has Age.]
Language: DAML+OIL

Slide 36: Knowledge Base - example
[Figure: knowledge base graph. Patient is Human; Patient has Temperature. Other labels: DatabaseTables; Attribute; Value; has. Data source reference: jdbc://foo/hospital table:PATIENTS attribute:PAT_ID]

Slide 37: Semantic mediator
Distributed heterogeneous databases:
– Different database schemas
– Different query languages
– Different names of attributes/tables… but the same semantics!
WG enables semantic mediation at a higher level

Slide 38: Semantic mediator (cont.)
Database in Hospital X, table PATIENTS: PAT_ID | PAT_AGE | PAT_BLOOD_TYPE
Database in Hospital Z, table PAT_TAB: ID | AGE | BT
Ontology: Patient is Human; Patient has Age; Patient has Blood Type
Mappings:
– AGE samePropertyAs PAT_AGE
– BT samePropertyAs PAT_BLOOD_TYPE
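One way to read the samePropertyAs mappings is as a lookup from ontology properties to the local column names of each hospital database, so that a single ontology-level request can be rewritten per source. The sketch below assumes exactly that. The dictionary layout, the `rewrite_query` helper, the generated SQL, and the assignment of the table PATIENTS to Hospital X and PAT_TAB to Hospital Z are illustrative assumptions, not the Wisdom Grid mediator itself.

```python
from typing import Dict, List

# Ontology property -> local column name, per data source (assumed layout).
SAME_PROPERTY_AS: Dict[str, Dict[str, str]] = {
    "hospital_x": {
        "table": "PATIENTS",
        "id": "PAT_ID",
        "age": "PAT_AGE",
        "blood_type": "PAT_BLOOD_TYPE",
    },
    "hospital_z": {
        "table": "PAT_TAB",
        "id": "ID",
        "age": "AGE",
        "blood_type": "BT",
    },
}


def rewrite_query(properties: List[str], source: str) -> str:
    """Translate a list of ontology properties into a query on one source schema."""
    mapping = SAME_PROPERTY_AS[source]
    # Alias each local column back to the shared ontology name.
    columns = ", ".join(f"{mapping[prop]} AS {prop}" for prop in properties)
    return f"SELECT {columns} FROM {mapping['table']}"


query_x = rewrite_query(["age", "blood_type"], "hospital_x")
query_z = rewrite_query(["age", "blood_type"], "hospital_z")
```

Because both rewritten queries alias their columns to the same ontology names, results from the two hospitals can be merged without further renaming.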

Slide 39: Distributed Knowledge base
[Figure: graph linking classes and properties across namespaces. Nodes: uri:fooX#Patient, uri:fooY#Human, uri:fooZ#Temperature, uri:fooX#Ill_Person. Edge labels: is subclass; has property; is same class as. Node types: class, property.]

Slide 40: Agent Grid Service
Gives the system the ability to communicate with the outside world in standard languages
FIPA Standards
ACL: Agent Communication Language
KQML: Knowledge Query and Manipulation Language
Agent Platform (JADE, FIPA-OS)
Agents:
– Domain Knowledge Agent
– Knowledge Explorer Agent
– End-user Agent (personal)
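To give a feel for what the agents exchange, the sketch below models a message with a handful of the parameters that FIPA ACL defines (performative, sender, receiver, content, language, ontology) and renders it in a simplified parenthesised form. The rendering is deliberately reduced: real FIPA ACL encodes agent identifiers as structured expressions, and platforms such as JADE provide their own message classes. The agent names, content string, and ontology name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclMessage:
    """A reduced ACL-style message; field names follow FIPA ACL parameter names."""

    performative: str  # communicative act, e.g. "query-ref" or "inform"
    sender: str
    receiver: str
    content: str
    language: str
    ontology: str

    def render(self) -> str:
        """Render a simplified string form (not the full FIPA string encoding)."""
        escaped = self.content.replace('"', '\\"')
        return (
            f"({self.performative}"
            f" :sender {self.sender}"
            f" :receiver {self.receiver}"
            f' :content "{escaped}"'
            f" :language {self.language}"
            f" :ontology {self.ontology})"
        )


# Hypothetical query from an end-user agent to a knowledge explorer agent.
message = AclMessage(
    performative="query-ref",
    sender="end-user-agent",
    receiver="knowledge-explorer-agent",
    content="outcome of patient 42",
    language="plain-text",
    ontology="medical-ontology",
)
rendered = message.render()
```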

Slide 41: Querying
End-user agent
– with own ontology: subset of ontology; merging of ontologies
– without own ontology: negotiating about domain of interest
Queries created from ontology
Templates

Slide 42: Answers
Mined knowledge (GridMiner):
– Decision trees / rules (clinical pathways)
– Association rules
Instances of domain ontology:
– Particular data
– References
– Links to Web sites
– Information about other knowledge providers

Slide 43: Case Study - Medical Application
[Figure: query flow diagram. Components: End User (personal) Agent; Knowledge Agent; Knowledge Explorer Agent; Knowledge Discovery Service (GridMiner, with training set and test set from hospital databases); Knowledge Base; Semantic Web/Grid resources.]
Q: Outcome? + data about patient's condition
A: probability of survival + references to the diagnoses

Slide 44: Conclusions and Future Work
Application and extension of Grid technology to knowledge discovery, an important but non-traditional Grid application domain
Introduction of a new Grid Data Mediation Service
Future work:
– Performance evaluation on large synthetic data volumes
– Coupling of the Data Mining services architecture with the OLAP services architecture
– Development of a knowledge-discovery-oriented Grid Workflow Language and the appropriate Workflow Engine
– Application of GridMiner to a real medical application (management of patients with severe traumatic brain injuries)
– Development of the Wisdom Grid