
1 / 18 Federal University of Rio de Janeiro – COPPE/UFRJ
Author: Wladimir S. Meyer, doctoral student
Advisors: Jano Moreira de Souza, Ph.D.; Milton Ramos Ramirez, D.Sc.

2 / 18 Summary
- Introduction: Motivation, Objectives, Related Works
- Framework Description: Structure, Functioning, New functionalities added to Secondo
- The Case Study
- Final Considerations

3 / 18 Introduction
Motivation: the challenge of integrating spatial databases spread across a computational grid.
Objectives: add new functionalities to an extensible SDBMS so that it can act as a platform for studying distributed spatial databases in computational grids. This platform should:
- be capable of interacting (by itself) with other analogous platforms in a grid;
- offer some level of transparency [Özsu and Valduriez 1999]: data independence, network transparency and replication transparency;
- be modular, so that work can focus only on the experiments being developed;
- be capable of exchanging “specialized skills” (algebras, in this case).

4 / 18 Introduction – Related Works
- The GGF Data Access and Integration Services Working Group (GGF DAIS-WG) has produced many recommendations related to databases in grids [OGSA-DAI-WSRF 05]. They define a set of interfaces and services to be implemented outside the DBMS environment, and only relational, XML and file-system data models are supported.
- The OGSA-DAI project implements many of the DAIS-WG recommendations and offers a Java toolkit for clients.
- The OGSA-DQP project [Smith et al. 2002] uses OGSA-DAI to support distributed queries over a grid. Only relational databases benefit from it, and it does not support the new release of OGSA-DAI based on WSRF.

5 / 18 Framework Description – Structure
The framework is composed of:
- A spatial DBMS (*): Secondo [Dieker and Güting 2000] was adopted because of its modularity, formalism and extensibility. It was originally intended for experimental work with spatial and spatio-temporal data models [Güting et al. 2004].
- A grid middleware, which offers several services used by the SDBMS [Foster 2005]: Job Manager Service (GRAM), Reliable File Transfer Service (RFT) and Index Service (MDS). Globus Toolkit 4 was chosen because of its web-service approach and its set of powerful components.
- A set of tools, added to provide extra functionality: submitting queries to a set of servers; discovering an algebra in another Secondo instance, based on algebra description files; importing an algebra.
(*) When used with its spatial algebra.

6 / 18 Framework Description – Functioning
[Diagram: Central Index Service (MDS) and Secondo #1 to #4. Secondo #1 receives a QUERY, sends a request to the MDS, and receives the Global Schema and Fragments’ map as the response. Also shown: Algebras’ Description file.]

7 / 18 Framework Description – Functioning
[Diagram: using the Global Schema and Fragments’ map, Secondo #1 requests the servers’ status from the nodes that hold the same fragments.]

8 / 18 Framework Description – Functioning
[Diagram: the queried nodes send their responses to Secondo #1, each reporting: CPU load; total amount of memory; total amount of free memory; number of running processes; number of active processes; number of users logged in; total amount of free space on the hard disk.]

9 / 18 Framework Description – Functioning
[Diagram: Secondo #1 sends subqueries to the other nodes.]
Secondo #1 generates a job description file and a Secondo command file and submits them to the selected nodes using GRAM. The job description file can express a multijob, meaning, for example, that the result of one query must be transferred to another node to be used in a second step.

10 / 18 Framework Description – Functioning
[Diagram: the nodes return their results to Secondo #1 as nested lists, transferred with RFT.]

11 / 18 Framework Description – Functioning
[Diagram: Secondo #1 produces the Result.]
The returned results are aggregated to form a global result.
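The aggregation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the framework's code: it assumes each node returns exactly one relation in Secondo's nested-list form `(type value)`, and that aggregating a spatial select means concatenating the tuple lists of relations with identical types. The names `parse_nested_list` and `merge_results` are hypothetical.

```python
import re

# Tokens of a nested list: parentheses, double-quoted strings, or bare atoms.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')


def parse_nested_list(text):
    """Parse nested-list text into Python lists of string atoms."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        if len(tok) >= 2 and tok.startswith('"') and tok.endswith('"'):
            return tok[1:-1]  # drop the quotes of a string atom
        return tok  # symbols and numbers are kept as text

    value = read()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the nested list")
    return value


def merge_results(texts):
    """Concatenate the tuple lists of per-node results sharing one relation type."""
    merged_type, merged_tuples = None, []
    for text in texts:
        rel_type, tuples = parse_nested_list(text)
        if merged_type is None:
            merged_type = rel_type
        elif rel_type != merged_type:
            raise ValueError(f"incompatible result types: {rel_type} vs {merged_type}")
        merged_tuples.extend(tuples)
    return [merged_type, merged_tuples]
```

For example, merging `((rel (tuple ((nome string)))) (("Rio A")))` with a second result of the same type yields one relation that holds both tuple lists. A join would need a real second processing stage rather than a simple concatenation.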

12 / 18 Framework Description – New functionalities
The modified Secondo can:
- submit activities (jobs) to the grid;
- discover and monitor registered resources.
[Diagram adapted from [Ramirez 2001]; label: subqueries.]

13 / 18 Framework Description – New functionalities
Files generated automatically during a job submission:
- Job description file: specifies where and how a job must be executed.
- Secondo command file: specifies a set of commands to be run on a Secondo server.
Spatial select example, constructed with Spatial Algebra and R-tree Algebra operators:
    open database 28433;
    create tempBox: rect;
    update tempBox := [const rect value ( – – –25.339)];
    let temp = drain_line creatertree [shape];
    query temp drain_line windowintersects [tempBox] consume;
    delete temp;
    delete tempBox;
    close database 28433;
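Generating the command file automatically amounts to filling a template like the one above. Below is a minimal Python sketch that mirrors the slide's command sequence, one command per line and terminated with ";" as on the slide. The function name and its parameters are hypothetical, and so is the assumption that a Secondo `rect` value is written as `(xmin xmax ymin ymax)`. Check that against the Secondo documentation before relying on it.

```python
def spatial_select_commands(database, relation, attribute, box):
    """Build a Secondo command file for a window query over a temporary R-tree.

    box is (xmin, xmax, ymin, ymax); this rect value order is an assumption.
    """
    xmin, xmax, ymin, ymax = box
    if xmin > xmax or ymin > ymax:
        raise ValueError("box must be (xmin, xmax, ymin, ymax) with min <= max")
    commands = [
        f"open database {database};",
        "create tempBox: rect;",
        f"update tempBox := [const rect value ({xmin} {xmax} {ymin} {ymax})];",
        # Build a temporary R-tree over the spatial attribute.
        f"let temp = {relation} creatertree [{attribute}];",
        # Window query: tuples whose indexed bounding boxes intersect tempBox.
        f"query temp {relation} windowintersects [tempBox] consume;",
        # Remove the temporary objects before closing the database.
        "delete temp;",
        "delete tempBox;",
        f"close database {database};",
    ]
    return "\n".join(commands) + "\n"
```

Calling `spatial_select_commands("geo", "drain_line", "shape", (-43.5, -43.0, -23.0, -22.5))` with a hypothetical database name and hypothetical coordinates produces an eight-line file, ready to be staged to the selected node.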

14 / 18 The Case Study
To validate the proposed framework, a geographic database prototype is being built as follows:
- Composition: 4 computers running Fedora Linux as grid nodes; all machines run GT4 with the GRAM, MDS and RFT services; all machines run a modified Secondo (Secondo-grid).
- Distributed spatial database design: fragments can be replicated; all themes belong to the same region; federated architecture with a Global Schema; thematic fragmentation.

15 / 18 The Case Study
- Autonomy: moderate, because each Secondo must update the Global Schema and fragments’ map when necessary.
- Nature of data: cartographic data supplied by the Directorate of Geographic Service (Brazilian Army).
- Queries being implemented: spatial select and spatial join.

16 / 18 Final Considerations
- This framework is being developed as a platform for experimental purposes; performance is not its main focus.
- Many issues were not included in the present work and will be covered in future work: transaction control, an optimizer for distributed queries, security, etc.
- Modules of the framework that are running now:
  - registering and monitoring modules, based on the Global Schema, fragments’ map, servers’ status monitor and algebras’ description file;
  - automatic generation of files: job description and Secondo command file;
  - submission of single queries with GRAM clients.

17 / 18 Final Considerations
Next steps:
- conclude the data transfer module using RFT;
- implement multijob submission with complex queries;
- conclude the infrastructure to import algebras.

18 / 18 Thank you!

19 / 18 The Case Study

20 / 18 Framework Description – New functionalities
Files generated automatically during a job submission: the job description file specifies where and how a job must be executed. Example contents:
- Executable: SecondoTTYBDB
- Directory: ${GLOBUS_USER_HOME}/secondo/bin
- Argument: commands.txt
- Standard output: ${GLOBUS_USER_HOME}/secondo/bin/results.txt
- Standard error: ${GLOBUS_USER_HOME}/stderr
- Stage in: gsiftp://brasilia.gridbd.cos.ufrj.br:2888/${GLOBUS_USER_HOME}/secondo/bin/commands.txt to file:///${GLOBUS_USER_HOME}/secondo/bin/commands.txt
- Stage out: file://${GLOBUS_USER_HOME}/secondo/bin/results.txt to gsiftp://submit.host:2888/${GLOBUS_USER_HOME}/secondo/bin/results-srv1.txt
- Clean up: file://${GLOBUS_USER_HOME}/secondo/bin/results.txt
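A job description like this one can be produced with Python's standard `xml.etree.ElementTree`. The sketch below is hedged. Its element names (`job`, `executable`, `directory`, `argument`, `stdout`, `stderr`, `fileStageIn`/`fileStageOut` with `transfer`/`sourceUrl`/`destinationUrl`, and `fileCleanUp` with `deletion`/`file`) follow our understanding of the GT4 WS-GRAM job description schema and should be verified against the Globus documentation. The function `job_description` is hypothetical.

```python
import xml.etree.ElementTree as ET


def job_description(executable, directory, args, stdout, stderr,
                    stage_in=(), stage_out=(), cleanup=()):
    """Return a GT4-style job description as an XML string (element names assumed)."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    ET.SubElement(job, "directory").text = directory
    for arg in args:
        ET.SubElement(job, "argument").text = arg
    ET.SubElement(job, "stdout").text = stdout
    ET.SubElement(job, "stderr").text = stderr
    # Each (source, destination) pair becomes one <transfer> element.
    for tag, pairs in (("fileStageIn", stage_in), ("fileStageOut", stage_out)):
        if pairs:
            section = ET.SubElement(job, tag)
            for src, dst in pairs:
                transfer = ET.SubElement(section, "transfer")
                ET.SubElement(transfer, "sourceUrl").text = src
                ET.SubElement(transfer, "destinationUrl").text = dst
    # Files to delete on the execution node after the job finishes.
    if cleanup:
        section = ET.SubElement(job, "fileCleanUp")
        for path in cleanup:
            deletion = ET.SubElement(section, "deletion")
            ET.SubElement(deletion, "file").text = path
    return ET.tostring(job, encoding="unicode")


home = "${GLOBUS_USER_HOME}"  # left unexpanded; GRAM substitutes it on the node
print(job_description(
    "SecondoTTYBDB", f"{home}/secondo/bin", ["commands.txt"],
    f"{home}/secondo/bin/results.txt", f"{home}/stderr",
    stage_in=[(f"gsiftp://brasilia.gridbd.cos.ufrj.br:2888/{home}/secondo/bin/commands.txt",
               f"file:///{home}/secondo/bin/commands.txt")],
    stage_out=[(f"file://{home}/secondo/bin/results.txt",
                f"gsiftp://submit.host:2888/{home}/secondo/bin/results-srv1.txt")],
    cleanup=[f"file://{home}/secondo/bin/results.txt"],
))
```

Keeping the variable `${GLOBUS_USER_HOME}` literal, as the slide does, lets the same file be reused on any node, since the path is resolved where the job runs.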

21 / 18 Framework Description – Resources registered
Two resources are registered, as XML files, in a central MDS service. The first is a Global Schema: 28433NE 1 drainage_line geoData line 1 nome string 1.

22 / 18 Framework Description – Resources registered
A map of fragments’ locations (database 28433NE):
- hydrography: rio.cos.ufrj.br, recife.cos.ufrj.br
- vegetation: brasilia.cos.ufrj.br
- edification: vitoria.cos.ufrj.br, brasilia.cos.ufrj.br
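Once the fragments’ map is read, a coordinator needs two things: the candidate replicas for each theme a query touches, and the set of servers whose status must be polled. Below is a minimal Python sketch of that lookup. It uses the host names from the map above, but the dictionary representation and the functions `replicas_for` and `servers_to_poll` are hypothetical; in the framework, the map is stored as XML in MDS.

```python
# Fragments' map from the slide: database -> theme -> hosts holding a replica.
FRAGMENT_MAP = {
    "28433NE": {
        "hydrography": ["rio.cos.ufrj.br", "recife.cos.ufrj.br"],
        "vegetation": ["brasilia.cos.ufrj.br"],
        "edification": ["vitoria.cos.ufrj.br", "brasilia.cos.ufrj.br"],
    }
}


def replicas_for(fragment_map, database, themes):
    """Return theme -> candidate hosts for the themes a query touches."""
    fragments = fragment_map[database]
    missing = [t for t in themes if t not in fragments]
    if missing:
        raise KeyError(f"themes not in fragments' map: {missing}")
    return {t: list(fragments[t]) for t in themes}


def servers_to_poll(fragment_map, database, themes):
    """Hosts whose status must be read before choosing among replicas."""
    hosts = set()
    for candidates in replicas_for(fragment_map, database, themes).values():
        hosts.update(candidates)
    return sorted(hosts)
```

For a query over hydrography and edification, this sketch would poll brasilia, recife, rio and vitoria. Vegetation alone has a single replica, so no replica choice is needed for it.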

23 / 18 Framework Description – Resources registered
Each SDBMS server should register an “algebras’ description file” that specifies all its algebras. This is an XML file with the following format (example for rio.cos.ufrj.br):
- Algebra spatial: directory /usr/local/secondo/Algebras/Spatial; files SpatialAlgebra.h, SpatialAlgebra.cpp, SpatialAlgebra.spec, makefile; types point, points, region, line; operators intersects, inside, touches, attached, overlaps, ininterior, intersection.
- Algebra rtree (data structure; r-tree, b-tree): directory /usr/local/secondo/Algebras/RTree; files RTreeAlgebra.h, RTreeAlgebra.cpp, RTreeAlgebra.spec, makefile; type rtree; operators creatertree, windowintersects, insertrtree, deletertree, updatertree.
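Discovering an algebra in another Secondo instance reduces to searching the registered description files for a needed operator or type. Below is a minimal Python sketch. It models the entries listed above as in-memory records instead of parsing the XML, whose tag names are not shown on the slide. The class `AlgebraDescription` and the function `find_providers` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AlgebraDescription:
    server: str
    name: str
    directory: str
    files: list = field(default_factory=list)
    types: list = field(default_factory=list)
    operators: list = field(default_factory=list)


# Entries transcribed from the example description file of rio.cos.ufrj.br.
DESCRIPTIONS = [
    AlgebraDescription(
        "rio.cos.ufrj.br", "spatial", "/usr/local/secondo/Algebras/Spatial",
        ["SpatialAlgebra.h", "SpatialAlgebra.cpp", "SpatialAlgebra.spec", "makefile"],
        ["point", "points", "region", "line"],
        ["intersects", "inside", "touches", "attached", "overlaps", "ininterior", "intersection"],
    ),
    AlgebraDescription(
        "rio.cos.ufrj.br", "rtree", "/usr/local/secondo/Algebras/RTree",
        ["RTreeAlgebra.h", "RTreeAlgebra.cpp", "RTreeAlgebra.spec", "makefile"],
        ["rtree"],
        ["creatertree", "windowintersects", "insertrtree", "deletertree", "updatertree"],
    ),
]


def find_providers(descriptions, operator):
    """Return (server, algebra, directory) for every algebra offering operator."""
    return [(d.server, d.name, d.directory) for d in descriptions if operator in d.operators]
```

The directory and file list returned for a provider are exactly what an "import an algebra" tool would need to fetch the sources. Compiling and registering the algebra in the local Secondo is outside this sketch.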

24 / 18 Framework Description – Monitoring status
MDS can be used to provide any kind of information related to a resource. In this framework, all servers should be monitored (as resources) to allow a better choice among machines that contain replicas of a fragment. A script was developed to collect the following information:
- CPU load
- total amount of memory
- total amount of free memory
- number of running processes
- number of active processes
- number of users logged in
- total amount of free space on the hard disk
MDS exposes the results as an XML file.
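The monitoring script itself is not shown. As a hedged illustration, the Python sketch below collects only a portable subset of the listed metrics: 1-minute load average, memory from Linux's `/proc/meminfo`, and free disk space. It omits the process and user counts, and serializes the values as a small XML document that an MDS information provider could publish. The element names (`serverStatus`, `cpuLoad`, …) and all function names are hypothetical.

```python
import os
import shutil
import xml.etree.ElementTree as ET


def read_meminfo(path="/proc/meminfo"):
    """Return (MemTotal, MemFree) in kB from a Linux meminfo file, or (None, None)."""
    fields = {}
    try:
        with open(path) as f:
            for line in f:
                key, _, rest = line.partition(":")
                parts = rest.split()
                if parts and parts[0].isdigit():
                    fields[key] = int(parts[0])
    except OSError:
        return None, None
    return fields.get("MemTotal"), fields.get("MemFree")


def collect_status(disk_path="/"):
    """Collect a subset of the slide's metrics (Unix only)."""
    load1, _, _ = os.getloadavg()
    mem_total, mem_free = read_meminfo()
    return {
        "cpuLoad": load1,
        "memTotalKB": mem_total,
        "memFreeKB": mem_free,
        "diskFreeBytes": shutil.disk_usage(disk_path).free,
    }


def status_to_xml(host, metrics):
    """Serialize metrics as a small XML document; None values are omitted."""
    root = ET.Element("serverStatus", host=host)
    for name, value in metrics.items():
        if value is not None:
            ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

On a Linux grid node, `status_to_xml(host, collect_status())` yields one document per server. A periodic job (for example, cron) could refresh it so that MDS serves recent values.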

25 / 18 Framework Description – Resources registered
Two resources are registered, as XML files, in a central MDS service: a Global Schema and a map of fragments’ locations. In addition, each SDBMS server should register an “algebras’ description file” that specifies all its algebras.

26 / 18 The Case Study – Join
1. Read the Global Schema.
2. Read the fragments’ locations map.
3. Read the resource status of the nodes holding fragments involved in the query.
4. Select the nodes in the best condition when a fragment is replicated.
5. Break the global query into sub-queries.
6. Estimate the cardinality of the sub-queries.
7. Build a job description file that runs the sub-queries in an adequate order: sub-queries with smaller cardinality first.
8. Submit the job to GRAM.
9. Transfer the results of these first sub-queries to the nodes where the last stage of the query is executed as a local query in an SDBMS environment (a naive approach).
10. Transfer the final results to the original node and delete all temporary files.
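Steps 7 and 9 can be sketched as a small planning function. This is a hedged Python illustration, not the framework's planner. It sorts sub-queries by estimated cardinality, smallest first. It also assumes (this part is not stated on the slide) that the final stage runs on the node of the largest sub-query, so that only the smaller intermediate results travel over the grid. `SubQuery` and `plan_join` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubQuery:
    node: str             # host selected for the fragment
    text: str             # Secondo query to run there (hypothetical)
    est_cardinality: int  # estimated number of result tuples


def plan_join(subqueries):
    """Order sub-queries by estimated cardinality (smallest first) and decide transfers.

    Assumption: the final stage runs on the node of the largest sub-query,
    so only the smaller intermediate results are shipped over the grid.
    """
    if not subqueries:
        raise ValueError("a join needs at least one sub-query")
    ordered = sorted(subqueries, key=lambda q: q.est_cardinality)
    final_node = ordered[-1].node
    transfers = [(q.node, final_node) for q in ordered[:-1] if q.node != final_node]
    return ordered, final_node, transfers
```

The `transfers` list maps naturally onto RFT requests, and `ordered` gives the sequence to encode in a multijob description.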

27 / 18 The Case Study – Select
1. Read the Global Schema.
2. Read the fragments’ locations map.
3. Read the resource status of the nodes holding fragments involved in the query.
4. Select the nodes in the best condition when a fragment is replicated.
5. Break the global query into sub-queries.
6. Generate a job description file and submit the job to GRAM.
7. Receive and integrate the results to generate a global result.
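Step 4 leaves the notion of "best condition" open. One simple, hypothetical policy is shown in the Python sketch below: prefer the replica host with the lowest CPU load and break ties by the larger amount of free memory. The status keys `cpuLoad` and `memFreeKB` and the functions `best_replica` and `choose_nodes` are assumptions, not the framework's actual criteria.

```python
def best_replica(hosts, status):
    """Lowest CPU load wins; ties go to the host with more free memory."""
    candidates = [h for h in hosts if h in status]
    if not candidates:
        raise LookupError(f"no status reported for any of {hosts}")
    return min(candidates, key=lambda h: (status[h]["cpuLoad"], -status[h]["memFreeKB"]))


def choose_nodes(replicas, status):
    """Map each theme to the replica host selected for its sub-query."""
    return {theme: best_replica(hosts, status) for theme, hosts in replicas.items()}
```

Hosts that did not report a status are skipped rather than guessed at. A weighted score over all the monitored metrics would be a natural refinement of this policy.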