Service-Oriented Middleware for Distributed Data Mining on the Grid 89721006, 劉妘鑏 Antonio C., Domenico T., and Paolo T. Journal of Parallel and Distributed.


Service-Oriented Middleware for Distributed Data Mining on the Grid, 劉妘鑏. Antonio C., Domenico T., and Paolo T. Journal of Parallel and Distributed Computing, 2008

Outline
-- Abstract
-- Introduction
-- Distributed Data Mining and Grids
-- Grid Services for Distributed Data Mining
-- The Knowledge Grid Framework
-- Weka4WS
-- Concluding Remarks

Abstract (Cont.)
-- Distributing data and computation makes it possible to solve larger problems and to run applications that are distributed by nature.
-- The grid is a distributed computing infrastructure that enables coordinated resource sharing within dynamic organizations consisting of individuals, institutions, and resources.

Abstract (Cont.)
-- Distributed data mining (DDM) is widely used in industrial, scientific, and commercial applications to analyze large data sets maintained over geographically distributed sites.
-- Current research on DDM focuses mainly on two aspects: the architecture of DDM systems and DDM algorithms.

Abstract
This paper:
-- discusses how grid computing can be used to support distributed data mining;
-- presents and discusses two systems developed according to this approach.

Introduction (Cont.)
-- Data mining in distributed environments such as the Internet, virtual organization networks, corporate intranets, sensor networks, and other decentralized infrastructures calls into question the suitability of centralized knowledge discovery architectures for large-scale data analysis in a networked environment.

Introduction (Cont.)
-- Distributed data mining analyzes data in a distributed fashion and pays careful attention to the trade-off between centralized collection and distributed analysis of data.
-- When data sets are large, scaling up the speed of a data mining task is crucial.
-- Parallel knowledge discovery techniques address this problem by using high-performance multicomputer machines.
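
The trade-off between centralized collection and distributed analysis can be illustrated with a minimal sketch. It assumes a simple item-counting task that is not taken from the paper: each site summarizes its own transactions, and only the small summaries travel to a coordinator. The function names and the sample data are hypothetical.

```python
from collections import Counter


def local_item_counts(transactions):
    """Run at each site: count in how many local transactions each item occurs."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count an item once per transaction
    return counts


def merge_counts(partial_counts):
    """Run at the coordinator: sum the per-site summaries."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def frequent_items(total_counts, n_transactions, min_support):
    """Return items whose global support (a fraction in [0, 1]) reaches min_support."""
    return sorted(
        item
        for item, count in total_counts.items()
        if count / n_transactions >= min_support
    )


# Hypothetical data held at two sites; the raw transactions never leave their site.
site_a = [["bread", "milk"], ["bread", "beer"], ["milk"]]
site_b = [["bread", "milk", "beer"], ["bread"]]

partials = [local_item_counts(site_a), local_item_counts(site_b)]
total = merge_counts(partials)
result = frequent_items(total, len(site_a) + len(site_b), min_support=0.5)
print(result)
```

Only the per-site counters cross the network here, which is the point of the trade-off: the amount of data moved depends on the number of distinct items, not on the number of transactions.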

Introduction (Cont.)
-- The increasing availability of such machines calls for extensive development of data mining algorithms that can scale up as we attempt to analyze data sets measured in terabytes and petabytes on parallel machines with hundreds or thousands of processors.
-- This technology is particularly suitable for applications that typically deal with very large amounts of data (e.g., transaction data, scientific simulation, and telecom data) that cannot be analyzed on traditional machines in acceptable times.

Introduction (Cont.)
-- Grid technology integrates both distributed computing and parallel computing, and thus represents a critical infrastructure for high-performance distributed knowledge discovery.
-- Grid computing can leverage the computing power of a large number of server computers, desktop PCs, clusters, and other kinds of hardware. It can therefore increase the efficiency and reduce the cost of computing networks by decreasing data processing time, optimizing resources, and distributing workloads, allowing users to obtain results on large operations much faster and at lower cost.

Introduction (Cont.)
-- Data mining algorithms and knowledge discovery processes are both compute- and data-intensive; the grid therefore offers a computing and data management infrastructure for supporting decentralized and parallel data analysis.
-- The grid can provide general data mining services, algorithms, and applications that help analysts, scientists, organizations, and professionals leverage grid capacity for high-performance distributed computing, so that they can solve their data mining problems in a distributed way.

Distributed data mining on the Grid (Cont.)
-- The Open Grid Services Architecture (OGSA) defines grid services as an extension of Web services, providing a standard model for using grid resources and for building distributed applications composed of several grid services.
-- OGSA provides a well-defined set of basic interfaces for the development of interoperable grid systems and applications.

Grid services for distributed data mining (Cont.)
-- OGSA defines standard mechanisms for creating, naming, and discovering transient grid service instances, and provides location transparency and multiple protocol bindings for service instances.
-- OGSA also defines the mechanisms required for creating and composing sophisticated distributed systems, including lifetime management, change management, and notification.

Grid services for distributed data mining (Cont.)
-- The Web Services Resource Framework (WSRF) provides the means to express stateful resources and codifies the relationship between Web services and stateful resources.
-- WSRF can be used to define basic services that support distributed data mining tasks in grids. These services can address all the aspects that must be considered in data mining and knowledge discovery processes, from data selection and transport to data analysis, knowledge model representation, and visualization.
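
The relationship between a Web service and its stateful resources can be sketched as follows. This is a minimal in-process illustration of the general pattern (a stateless service that operates on state looked up by a resource key), not the WSRF API or the Globus Toolkit library; every class and method name is hypothetical.

```python
import itertools


class ResourceHome:
    """Holds the stateful resources; the service object keeps no per-client state."""

    def __init__(self):
        self._resources = {}
        self._ids = itertools.count(1)

    def create(self, initial_state):
        key = f"res-{next(self._ids)}"
        self._resources[key] = dict(initial_state)
        return key

    def get(self, key):
        return self._resources[key]  # raises KeyError for unknown or destroyed keys

    def destroy(self, key):
        del self._resources[key]


class MiningService:
    """Stateless front end: every call names the resource it applies to."""

    def __init__(self, home):
        self._home = home

    def create_resource(self):
        return self._home.create({"status": "idle", "model": None})

    def set_property(self, key, name, value):
        self._home.get(key)[name] = value

    def get_property(self, key, name):
        return self._home.get(key)[name]

    def destroy_resource(self, key):
        self._home.destroy(key)


service = MiningService(ResourceHome())
key = service.create_resource()
service.set_property(key, "status", "running")
print(key, service.get_property(key, "status"))
service.destroy_resource(key)
```

Keeping the state outside the service object is what lets many clients share one service endpoint while each works with its own resource.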

-- When client and service are located in a LAG (Local Area Grid), the data mining phase takes from 96.1% to 99.6% of the total execution time.
-- In a WAG (Wide Area Grid), the data mining phase takes from 84.6% to 88.3% of the total execution time.
-- WSRF overhead: the sum of the WSRF-related steps (resource creation, notification subscription, task submission, results notification, resource destruction) accounts for a share of time ranging from about 0.4% to 4.1%.
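
How such percentages are obtained from per-step timings can be shown with a short sketch. The step names follow the slide; the timings are made-up values for illustration only, not measurements from the paper, and the function name is hypothetical.

```python
# Steps that the slide counts as WSRF overhead.
WSRF_STEPS = {
    "resource creation",
    "notification subscription",
    "task submission",
    "results notification",
    "resource destruction",
}


def time_shares(step_seconds):
    """Return (data mining %, WSRF overhead %, other %) of the total execution time."""
    total = sum(step_seconds.values())
    mining = step_seconds.get("data mining", 0.0)
    overhead = sum(t for step, t in step_seconds.items() if step in WSRF_STEPS)
    other = total - mining - overhead  # e.g. dataset download
    return tuple(100.0 * part / total for part in (mining, overhead, other))


# Made-up timings in seconds (assumption, not data from the paper).
example = {
    "resource creation": 0.5,
    "notification subscription": 0.25,
    "task submission": 0.5,
    "dataset download": 3.0,
    "data mining": 95.0,
    "results notification": 0.5,
    "resource destruction": 0.25,
}

mining_pct, overhead_pct, other_pct = time_shares(example)
print(f"data mining: {mining_pct:.1f}%  WSRF overhead: {overhead_pct:.1f}%  other: {other_pct:.1f}%")
```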

The Knowledge Grid Framework (Cont.)
-- The Knowledge Grid framework is a system implemented to support the development of distributed KDD (knowledge discovery in databases) processes in a grid. It uses basic grid mechanisms to build specific knowledge discovery services.
-- The Knowledge Grid provides users with high-level abstractions and a set of services that make it possible to integrate grid resources in support of all the phases of the knowledge discovery process, as well as basic related tasks such as data management, data mining, and knowledge representation.

The Knowledge Grid Framework (Cont.)
-- DAS: Data Access Services
-- TAAS: Tools and Algorithms Access Services
-- EPMS: Execution Plan Management Service
-- RPS: Result Presentation Services
-- RAEMS: Resource Allocation and Execution Management Service
-- KDS: Knowledge Directory Service
-- KBR: Knowledge Base Repository
-- KMR: Knowledge Metadata Repository
-- KEPR: Knowledge Execution Plan Repository


Weka4WS (Cont.)
-- Weka4WS is a framework that extends the widely used open-source Weka toolkit to support distributed data mining on WSRF-enabled grids.
-- Weka4WS adopts the WSRF technology for running remote data mining algorithms and managing distributed computations.
-- The Weka4WS software prototype has been developed using the Java WSRF library provided by Globus Toolkit 4 (GT4). All grid nodes involved in Weka4WS applications use the GT4 services for standard grid functionality, such as security and data management.

Weka4WS (Cont.)
1. Resource creation.
2. Notification subscription.
3. Task submission.
4. Dataset download.
5. Data mining.
6. Results notification.
7. Resource destruction.
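
The seven steps listed above can be walked through with a small simulation. This is a sketch under stated assumptions, not the Weka4WS or GT4 API: the `MiningNode` class, its methods, the in-memory dataset store, and the trivial majority-class "model" are all hypothetical stand-ins, and everything runs synchronously in one process, whereas a real remote task would complete asynchronously.

```python
from collections import Counter
import itertools


class MiningNode:
    """Hypothetical in-process stand-in for a remote data mining service."""

    def __init__(self, datasets):
        self.datasets = datasets   # dataset name -> rows; stands in for remote storage
        self.resources = {}        # resource key -> per-task state
        self.trace = []            # records the order of the steps
        self._ids = itertools.count(1)

    def create_resource(self):                      # step 1
        key = f"res-{next(self._ids)}"
        self.resources[key] = {"listener": None, "result": None}
        self.trace.append("resource creation")
        return key

    def subscribe(self, key, callback):             # step 2
        self.resources[key]["listener"] = callback
        self.trace.append("notification subscription")

    def submit(self, key, dataset_name, label_index):   # step 3
        self.trace.append("task submission")
        rows = self.datasets[dataset_name]          # step 4
        self.trace.append("dataset download")
        labels = [row[label_index] for row in rows]
        model = Counter(labels).most_common(1)[0][0]    # step 5: majority class
        self.trace.append("data mining")
        self.resources[key]["result"] = model
        self.trace.append("results notification")   # step 6
        self.resources[key]["listener"](model)

    def destroy_resource(self, key):                # step 7
        del self.resources[key]
        self.trace.append("resource destruction")


# Hypothetical dataset: (outlook, play) rows, with the class label in column 1.
node = MiningNode({"weather": [("sunny", "no"), ("rainy", "yes"), ("overcast", "yes")]})
received = []

key = node.create_resource()
node.subscribe(key, received.append)
node.submit(key, "weather", label_index=1)
node.destroy_resource(key)

print(received)
print(node.trace)
```

The client only creates the resource, subscribes, submits, and destroys; the download and mining steps happen on the service side, which matches the split between WSRF-related steps and the mining phase.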

Concluding Remarks
-- Many experts in IT, science, finance, and commerce are recognizing the importance of scalable data mining solutions in their business.
-- From Bill Gates' keynote speech at Supercomputing '05: "... And so now we need to take the techniques that have been developed for things like business intelligence and data mining that goes on around that and think how we can apply those in these realms as well, how we can take every step of the process and have it be very visual and only require as much software understanding as is absolutely necessary."

Concluding Remarks (Cont.)
-- The grid can offer an effective infrastructure for deploying data mining and knowledge discovery applications.
-- We discussed systems and services for implementing grid-enabled knowledge discovery by using dispersed resources connected through a grid. These services allow professionals and scientists to create and manage complex knowledge discovery applications composed as workflows that integrate data sets and mining tools provided as distributed services on a grid.

Concluding Remarks
-- As an example of this approach, we described how the Knowledge Grid and Weka4WS systems provide a higher level of abstraction of grid resources for distributed knowledge discovery activities, allowing end users to concentrate on the knowledge discovery process without worrying about grid infrastructure details.