
Grid-Based Data Mining and the KNOWLEDGE GRID Framework
DOMENICO TALIA (joint work with M. Cannataro, A. Congiusta, P. Trunfio)
DEIS, University of Calabria, ITALY
Minneapolis, September 18, 2003

2 OUTLINE
- Introduction
- Parallel and Distributed Data Mining on Grids
- The KNOWLEDGE GRID
- KNOWLEDGE GRID Architecture
- KNOWLEDGE GRID Services
- KNOWLEDGE GRID Tools
- VEGA
- Current Work
- Conclusion

3 PARALLEL & DISTRIBUTED DATA MINING
- Data mining is often a compute-intensive task.
- When large data sets are coupled with geographic distribution of data, users, and systems, different technologies must be combined to implement high-performance parallel and distributed knowledge discovery (PDKD) systems.
- Distributed data mining tools are available, but most of them do not run on Grids.

4 WHAT IS A GRID?
“By providing scalable, secure, high-performance mechanisms for discovering and negotiating access to remote resources, the Grid promises to make it possible for scientific collaborations to share resources on an unprecedented scale, and for geographically distributed groups to work together in ways that were previously impossible”
Ian Foster

5 PARALLEL & DISTRIBUTED DM ON GRIDS
Grid-aware PDKD systems
- Grid middleware targets technical challenges in areas such as
  - communication,
  - scheduling,
  - security,
  - information and data access, and
  - fault detection.
- Efforts are needed for the development of knowledge discovery tools and services on the Grid.

6 PARALLEL & DISTRIBUTED DM ON GRIDS
The basic principles that motivate the architecture design of grid-aware PDKD systems:
- Data heterogeneity and large data size
- Algorithm integration and independence
- Grid awareness
- Openness
- Scalability
- Security and data privacy

7 WHAT THE GRID OFFERS
- Grid infrastructure tools, such as the Globus Toolkit and Legion, provide basic services that can be effectively used in the development of data mining applications.
- Data Grid middleware (e.g. Globus Data Grid) implements data management architectures based on two main services: storage system and metadata management.
- Data Grids are useful, but are not sufficient for data mining.

8 THE KNOWLEDGE GRID
- The KNOWLEDGE GRID is a PDKD architecture that integrates data mining techniques and computational Grid resources.
- In the KNOWLEDGE GRID architecture, data mining tools are integrated with lower-level Grid mechanisms and services and exploit Data Grid services.
- This approach benefits from "standard" Grid services and offers an open PDKD architecture that can be configured on top of generic Grid middleware.

9 KNOWLEDGE GRID ENVIRONMENT
A KNOWLEDGE GRID application uses:
- A set of KNOWLEDGE GRID-enabled computers (K-GRID nodes) that declare their availability to participate in a PDKD computation, and that are connected by
- A Grid infrastructure offering basic Grid services (authentication, data location, service level negotiation) and implementing the KNOWLEDGE GRID services.

10 KNOWLEDGE GRID ENVIRONMENT
[Figure. Labels: LAN; Cluster containing data sets and/or DM algorithms; Cluster Element; K-GRID node; Generic Grid node; Basic Grid Infrastructure; Local Resources; Grid Middleware; K-GRID tools; KNOWLEDGE GRID services]

11 KNOWLEDGE GRID SERVICES
- The KNOWLEDGE GRID services are organized in two hierarchical layers: the Core K-Grid layer and the High-level K-Grid layer.
- The former refers to services directly implemented on top of generic Grid services.
- The latter is used to describe, develop, and execute PDKD computations over the KNOWLEDGE GRID.

12 KNOWLEDGE GRID ARCHITECTURE
[Figure. Labels: KNOWLEDGE GRID; Generic Grid Services]

13 KNOWLEDGE GRID SERVICES
Core K-Grid layer services:
- Knowledge directory service (KDS). Extends the basic Globus MDS and GIS services to maintain a description of all data and tools used in the KNOWLEDGE GRID.
- Resource allocation and execution management service (RAEMS). RAEMS services are used to find a mapping between an execution plan and available resources.
The Core K-Grid layer manages metadata describing features of data sources, third-party data mining tools, data management, and data visualization tools and algorithms.
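The mapping step attributed to RAEMS above can be pictured with a small sketch. This is not the KNOWLEDGE GRID implementation: the `Task` and `Node` types, the host names, and the greedy "first node that already holds both the tool and the data" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    needs_software: str  # tool the task must run (hypothetical field)
    needs_data: str      # data set the task reads (hypothetical field)


@dataclass(frozen=True)
class Node:
    host: str
    software: frozenset
    data: frozenset


def map_plan(tasks, nodes):
    """Greedy mapping: pick the first node holding both the tool and the data."""
    mapping = {}
    for task in tasks:
        for node in nodes:
            if task.needs_software in node.software and task.needs_data in node.data:
                mapping[task.name] = node.host
                break
        else:
            # No node satisfies the task; a real service might move data instead.
            raise LookupError(f"no node can run {task.name}")
    return mapping


# Hypothetical nodes and plan, for illustration only.
nodes = [
    Node("nodeA.example.org", frozenset({"autoclass"}), frozenset({"sales"})),
    Node("nodeB.example.org", frozenset({"iminer"}), frozenset({"unidb"})),
]
plan = [Task("cluster", "autoclass", "sales"), Task("mine", "iminer", "unidb")]
mapping = map_plan(plan, nodes)
```

A real allocator would also weigh load, data transfer cost, and user constraints; the sketch only shows the shape of the plan-to-resource mapping.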

14 KNOWLEDGE GRID SERVICES
High-level K-Grid layer services:
- Data Access: search and selection (Data search services), and extraction, transformation, and delivery (Data extraction services) of data to be mined.
- Tools and algorithms access: search, selection, and downloading of data mining tools and algorithms.
- Execution Plan Management: generation of a set of different execution plans that satisfy user, data, and algorithm requirements and constraints.
- Results presentation: specifies how to generate, present, and visualize the PDKD results (rules, associations, models, classification, etc.).

15 KNOWLEDGE GRID OBJECTS
- We use the Globus MDS model only for generic Grid resources, and extend it with an XML metadata model to manage specific KNOWLEDGE GRID resources.
- Metadata describing relevant K-Grid objects, such as data sources and data mining tools, are implemented using both LDAP and XML.
- The Knowledge Metadata Repository (KMR) is implemented by LDAP entries and XML documents. The LDAP portion is used as a first point of access to more specific information represented by XML documents.
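The two-level lookup described for the KMR (a coarse directory entry first, then a detailed XML document) can be sketched as follows. This is only an illustration: the dictionary stands in for the LDAP portion, no real directory is contacted, and the entry fields, file name, and XML element names are hypothetical. Only the AutoClass name and description come from the slides.

```python
import xml.etree.ElementTree as ET

# Stand-in for the LDAP portion: coarse attributes plus a pointer to the XML document.
# Field names and the file name are assumptions.
LDAP_ENTRIES = {
    "autoclass": {"kind": "software", "xml_doc": "autoclass.xml"},
}

# Stand-in for the XML documents; the element names are hypothetical.
XML_DOCS = {
    "autoclass.xml": """<tool>
  <name>AutoClass</name>
  <description>Unsupervised Bayesian Classifier</description>
</tool>""",
}


def describe(resource_id):
    """Resolve a resource through the directory entry, then read its XML metadata."""
    entry = LDAP_ENTRIES[resource_id]                  # first point of access
    root = ET.fromstring(XML_DOCS[entry["xml_doc"]])   # more specific information
    return {
        "kind": entry["kind"],
        "name": root.findtext("name"),
        "description": root.findtext("description"),
    }


info = describe("autoclass")
```

Keeping the directory entries small makes searches cheap, while the richer, tool-specific description stays in a document that is fetched only for the resources actually selected.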

16 APPLICATION COMPOSITION STEPS
[Figure. Steps: Search and selection of resources (DAS / TAAS); Design of the PDKD computation (EPMS). Other labels: KMRs; TMR; KEPR; Metadata about K-grid resources; Metadata about the selected K-grid resources; Execution Plan]

17 APPLICATION EXECUTION STEPS
[Figure. Steps: Execution Plan optimization and translation; Execution of the PDKD computation; Results presentation. Other labels: RAEMS; GRAM; RPS; KEPR; KBR; Execution Plan; RSL script; Computation results]

18 A TOOL: VEGA
- A prototype version of the KNOWLEDGE GRID architecture has been implemented using Java and the Globus Toolkit 2.x.
- To allow a user to build a grid-based data mining application, we developed a toolset named VEGA (a Visual Environment for Grid Applications).
- VEGA offers users support for:
  - task composition: definition of the entities involved in the computation and specification of relations among them;
  - checking the consistency of the planned task;
  - generation of the execution plan for a data mining task;
  - execution of the execution plan through the resource allocation manager of the underlying grid.

19 VEGA: OBJECTS and LINKS
- Objects: Hosts, Software, Data. Objects represent resources.
- Links: Output, Input, Execute, File Transfer. Links represent relations among resources.
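One way to picture the objects-and-links model, together with the consistency checking that VEGA is said to perform, is the sketch below. The typing rules in `LINK_RULES` (which object kinds each link may connect) are assumptions for illustration; the slides name the object and link kinds but do not state the rules VEGA enforces. Host and resource names are hypothetical.

```python
from dataclasses import dataclass

# Assumed typing rules: link kind -> (source object kind, target object kind).
LINK_RULES = {
    "input": ("data", "software"),
    "output": ("software", "data"),
    "execute": ("software", "host"),
    "file_transfer": ("data", "host"),
}


@dataclass(frozen=True)
class Obj:
    name: str
    kind: str  # "host", "software", or "data"


@dataclass(frozen=True)
class Link:
    kind: str
    source: Obj
    target: Obj


def check_workspace(links):
    """Return a list of human-readable errors; an empty list means consistent."""
    errors = []
    for link in links:
        expected = LINK_RULES.get(link.kind)
        actual = (link.source.kind, link.target.kind)
        if expected is None:
            errors.append(f"unknown link kind: {link.kind}")
        elif actual != expected:
            errors.append(
                f"{link.kind}: {link.source.name} -> {link.target.name} "
                f"connects {actual}, expected {expected}"
            )
    return errors


# Hypothetical workspace.
host = Obj("k2.example.org", "host")
tool = Obj("IMiner", "software")
data = Obj("Unidb", "data")
good = [Link("input", data, tool), Link("execute", tool, host)]
bad = [Link("execute", data, host)]  # data cannot be executed under the assumed rules
```

Here `check_workspace(good)` returns an empty list, while `check_workspace(bad)` reports one error, which is the kind of feedback a visual composer could show before generating an execution plan.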

20 VEGA
[Screenshot of the VEGA interface. Labels: Hosts pane; Resources pane]

21 VEGA
A K-Grid application can be composed of several workspaces.

22 XML METADATA in a KMR
[XML listing; the element tags did not survive extraction. Remaining values, in order:]
...
AutoClass
Unsupervised Bayesian Classifier
01 May 00
NASA Ames Research Center
icarus.isi.cs.cnr.it
/share/software/autoclass-c/autoclass
/share/software/autoclass-c/read-me.text
...

23 XML EXECUTION PLAN
...
<Destination ep:href="k2../Unidb.xml" ep:title="Unidb on k2.deis.unical.it"/>
<Output ep:href="k2../IMiner.out.xml" ep:title="IMiner.out on k2.deis.unical.it"/>
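The execution plan fragment above references resource metadata through `ep:href` and `ep:title` attributes. A minimal sketch of reading such references follows. The slide shows neither the enclosing element nor the namespace bound to the `ep` prefix, so the `Plan` wrapper and the `urn:example:ep` URI are assumptions added only to make the fragment parseable.

```python
import xml.etree.ElementTree as ET

EP_NS = "urn:example:ep"  # assumed namespace URI; not shown on the slide

# The two elements are copied from the slide; the <Plan> wrapper is hypothetical.
FRAGMENT = f"""<Plan xmlns:ep="{EP_NS}">
  <Destination ep:href="k2../Unidb.xml" ep:title="Unidb on k2.deis.unical.it"/>
  <Output ep:href="k2../IMiner.out.xml" ep:title="IMiner.out on k2.deis.unical.it"/>
</Plan>"""


def list_references(xml_text):
    """Return (element name, href, title) for each child of the plan element."""
    root = ET.fromstring(xml_text)
    href_key = f"{{{EP_NS}}}href"    # ElementTree spells qualified names as {uri}local
    title_key = f"{{{EP_NS}}}title"
    return [(el.tag, el.get(href_key), el.get(title_key)) for el in root]


references = list_references(FRAGMENT)
```

Each tuple pairs a role in the plan (for example `Destination`) with the metadata document that describes the resource playing that role.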

24 A GENERATED RSL SCRIPT
(&(resourceManagerContact=g1.isi.cs.cnr.it)
  (subjobStartType=strict-barrier)
  (label=ws1_dt2)
  (executable=$(GLOBUS_LOCATION)/bin/globus-url-copy)
  (arguments=-vb -notpt gsiftp://g1.isi.cs.cnr.it/.../Unidb
             gsiftp://k2.deis.unical.it/.../Unidb )
  ...
(&(resourceManagerContact=k2.deis.unical.it)
  (subjobStartType=strict-barrier)
  (label=ws2_c2)
  (executable=.../IMiner)
  ...
)
...
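The translation from execution plan steps to subjobs like those in the script above can be sketched with plain string building. This is an illustration under assumptions, not the RAEMS translator: the function names, host names, and file paths are hypothetical, and only the attribute names (`resourceManagerContact`, `subjobStartType`, `label`, `executable`, `arguments`) and the `globus-url-copy` options are taken from the slide.

```python
def rsl_subjob(contact, label, executable, arguments=None):
    """Render one subjob as an RSL conjunction: (&(attr=value)(attr=value)...)."""
    relations = [
        ("resourceManagerContact", contact),
        ("subjobStartType", "strict-barrier"),
        ("label", label),
        ("executable", executable),
    ]
    if arguments:
        relations.append(("arguments", arguments))
    return "(&" + "".join(f"({name}={value})" for name, value in relations) + ")"


def rsl_script(steps):
    """Join the subjobs, one per line. The slide elides whatever encloses them."""
    return "\n".join(rsl_subjob(**step) for step in steps)


# Hypothetical two-step plan: copy a data set, then run a mining tool on it.
steps = [
    {
        "contact": "src.example.org",
        "label": "ws1_dt2",
        "executable": "$(GLOBUS_LOCATION)/bin/globus-url-copy",
        "arguments": "-vb -notpt gsiftp://src.example.org/example/Unidb "
                     "gsiftp://dst.example.org/example/Unidb",
    },
    {"contact": "dst.example.org", "label": "ws2_c2", "executable": "/example/bin/IMiner"},
]
script = rsl_script(steps)
```

Generating the script from a list of steps keeps the plan itself independent of RSL, so the same plan could in principle be translated for a different resource manager.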

25 APPLICATION EXECUTION

26 ONGOING WORK: OTHER TOOLS
Some things we have done recently
VEGA:
- Support for more complex computation layouts,
- Execution plan optimization,
- Abstract resources definition and use.
KNOWLEDGE GRID:
- A peer-to-peer system for presence management and resource discovery on the Grid,
- A tool for optimized file transfer on the Grid based on GridFTP,
- A data mining ontology and an associated tool.

27 ONGOING WORK: OGSA and KNOWLEDGE DISCOVERY SERVICES
- The KNOWLEDGE GRID is an abstract service-based Grid architecture that does not limit the user in developing and using service-based knowledge discovery applications.
- We are defining a set of Grid Services that export functionalities and operations of the KNOWLEDGE GRID.
- Each of the KNOWLEDGE GRID services is exposed as a persistent service, using the OGSA conventions and mechanisms.
- We intend to offer those OGSA-compliant services for implementing distributed Data Mining applications and Knowledge Discovery processes on Grids.

28 CONCLUSION
- Parallel and distributed data mining suites and computational grid technology are two critical elements of future high-performance computing environments for
  - e-science (data-intensive experiments)
  - e-business (on-line services)
  - virtual organizations support (virtual teams, virtual enterprises)
- Knowledge Grids will enable entirely new classes of advanced applications for dealing with the data deluge.
- The Grid is not yet another distributed computing system: it is a medium to dynamically share heterogeneous resources, services, and knowledge.

29 CONCLUSION
- Grids are coupling computation-oriented services with data-oriented services and knowledge-based services.
- This trend enlarges the Grid application scenario and offers new opportunities for high-level applications.
- We are much more able to store data than to extract knowledge from it.
- The KNOWLEDGE GRID is a framework for the unification of knowledge discovery and grid technologies, helping us to climb some mountains of data.

30 MAIN REFERENCES
- M. Cannataro, D. Talia, The Knowledge Grid, Communications of the ACM, 46(1),
- M. Cannataro, D. Talia, P. Trunfio, Distributed Data Mining on the Grid, Future Generation Computer Systems, 18(8),
- D. Talia, The Open Grid Services Architecture: Where the Grid Meets the Web, IEEE Internet Computing, 6(6),

31 THANKS