Interoperability of a Scalable Distributed Data Manager with an Object-relational DBMS. Thesis presentation, Yakham NDIAYE, November 13, 2001

Presentation transcript:

1 Interoperability of a Scalable Distributed Data Manager with an Object-relational DBMS
Thesis presentation
Yakham NDIAYE
November 13, 2001

2 Objective
- Develop techniques for the interoperability of a DBMS with an external SDDS file.
- Examine the architectural issues that determine how to make such a coupling most efficient.
- Validate our technical choices through prototyping and experimental performance analysis.
- Our approach lies at the intersection of main-memory DBMSs, object-relational DBMSs with foreign functions, and distributed/parallel DBMSs.

3 Plan
- Multicomputers
- SDDSs
- AMOS-II & DB2 DBMSs
- Coupling SDDS and AMOS-II
- Coupling SDDS and DB2
- Experimental analysis
- Conclusion

4 Multicomputers
- A collection of loosely coupled computers
  - Computers interconnected by high-speed local area networks.
- Cost/performance
  - Potentially offers storage and processing capabilities rivaling a supercomputer at a fraction of the cost.
- New architectural concepts
  - Offer applications the cumulative CPU and storage capabilities of a large number of interconnected computers.

5 SDDS
New data structures designed specifically for multicomputers:
- Data are structured
  - Records with keys
  - Parallel scans & function shipping
- Data are on servers
  - Waiting for access
- Overflowing servers split into new servers
  - Appended to the file without informing the clients
- Queries come from multiple autonomous clients
  - Access initiators
  - Not using any centralized directory for access computations
- For more, see the SDDS page.
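The splitting and addressing rules above can be illustrated with a minimal sketch. It assumes a range-partitioned file in which each server owns a key interval and forwards misdirected requests to the neighbor created by its split; the class names `Server` and `Client`, the bucket capacity, and the forwarding scheme are illustrative assumptions, not the SDDS manager's actual interface.

```python
from bisect import bisect_right


class Server:
    """Hypothetical storage site owning the key interval [low, high)."""

    def __init__(self, low, high, capacity):
        self.low, self.high, self.capacity = low, high, capacity
        self.records = {}
        self.right = None  # neighbor that took over the upper part of our range

    def insert(self, key, value, hops=0):
        if key >= self.high:                 # client image was outdated: forward
            return self.right.insert(key, value, hops + 1)
        self.records[key] = value
        if len(self.records) > self.capacity:
            self._split()                    # clients are NOT informed
            if key >= self.high:
                return self.right, hops
        return self, hops

    def lookup(self, key, hops=0):
        if key >= self.high:
            return self.right.lookup(key, hops + 1)
        return self.records.get(key), self, hops

    def _split(self):
        keys = sorted(self.records)
        middle = keys[len(keys) // 2]
        new = Server(middle, self.high, self.capacity)
        new.right = self.right
        for k in keys[len(keys) // 2:]:      # move the upper half of the records
            new.records[k] = self.records.pop(k)
        self.high, self.right = middle, new


class Client:
    """Autonomous client with a private, possibly outdated image of the file."""

    def __init__(self, first_server):
        self.image = [first_server]          # known servers, sorted by low bound

    def _guess(self, key):
        lows = [s.low for s in self.image]
        return self.image[bisect_right(lows, key) - 1]

    def _adjust(self, owner):
        if owner not in self.image:          # learn about the server that answered
            self.image.append(owner)
            self.image.sort(key=lambda s: s.low)

    def insert(self, key, value):
        owner, hops = self._guess(key).insert(key, value)
        self._adjust(owner)
        return hops

    def lookup(self, key):
        value, owner, hops = self._guess(key).lookup(key)
        self._adjust(owner)
        return value, hops
```

In this sketch no central directory exists: a client that has never seen the splits still reaches the right server through forwarding, and its next request for the same key goes there directly.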

6 AMOS-II DBMS
- AMOS-II: Active Mediating Object System
- A main-memory database system.
- Declarative query language: AMOSQL.
- External data sources capability.
- External programs interface with AMOS-II using:
  - Call-level interface (call-in)
  - Foreign functions (call-out)
- See the AMOS-II page for more.

7 DB2 Universal Database
- IBM object-relational DBMS "DB2 Universal Database".
- Typical representative of a commercial object-relational DBMS.
- Can handle external data through user-defined functions (UDFs).

8 Coupling Strategies
- AMOS-SDDS strategy:
  - For a scalable RAM file supporting database queries;
  - Use a DBMS for manipulations best handled through the query language;
  - Direct fast data access for manipulations supported poorly, or not at all, by a DBMS;
  - Distributed query processing with function shipping.

9 AMOS-SDDS System
[Figure: AMOS-SDDS scalable parallel query processing]

10 Coupling Strategies
- SD-AMOS strategy:
  - Uses AMOS-II as the memory manager at each SDDS storage site;
  - Scalable generalization of a parallel DBMS;
  - Data partitioning becomes dynamic.

11 SD-AMOS System
[Figure: SD-AMOS scalable parallel query processing]

12 Coupling SDDS & DB2
- DB2-SDDS strategy:
  - Coupling of a DBMS with an external data repository offering direct fast data access.
  - The DBMS uses an SDDS file as an external data repository.
  - Offers the user a richer interface than that of the SDDS manager, in particular through the DBMS query language.

13 Coupling SDDS & DB2
[Figure: DB2-SDDS overall architecture]
Register a user-defined external table function:
  CREATE FUNCTION scan(Varchar(20))
  RETURNS TABLE (ssn integer, name Varchar(20), city Varchar(20))
  EXTERNAL NAME 'interface!fullscan'

14 Coupling SDDS & DB2
Foreign functions to access SDDS records from DB2:
  range(cleMin, cleMax) -> list of the records whose key satisfies cleMin < key < cleMax
  scan(nom_fichier) -> list of all the records of the file
Sample queries:
- Parallel scan of all SDDS records:
  select * from table( scan('fichier') ) as table_sdds(SSN, NAME, CITY)
- Range query on SDDS records whose key is between 1 and 100:
  select * from table( range(1, 100) ) as table_sdds(SSN, NAME, CITY) order by Name
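To make the contract of the two foreign functions concrete, here is a small sketch of what they return. The in-memory `BUCKETS` dictionary, the `Person` type, the sample records, and the name `range_query` (used to avoid shadowing Python's built-in `range`) are all assumptions for illustration; the real functions run inside DB2 as external table functions and scan the SDDS servers in parallel, whereas this sketch visits the buckets one after another.

```python
from typing import Iterator, NamedTuple


class Person(NamedTuple):
    ssn: int
    name: str
    city: str


# Hypothetical stand-in for an SDDS file: one list of records per bucket.
BUCKETS = {
    "fichier": [
        [Person(1, "Awa", "Dakar"), Person(42, "Luc", "Paris")],
        [Person(100, "Zoe", "Lyon"), Person(7, "Ben", "Paris"),
         Person(250, "Ida", "Dakar")],
    ]
}


def scan(nom_fichier: str) -> Iterator[Person]:
    """Return every record of the file, bucket by bucket."""
    for bucket in BUCKETS[nom_fichier]:
        yield from bucket


def range_query(cleMin: int, cleMax: int,
                nom_fichier: str = "fichier") -> Iterator[Person]:
    """Return the records whose key satisfies cleMin < key < cleMax."""
    for record in scan(nom_fichier):
        if cleMin < record.ssn < cleMax:
            yield record


if __name__ == "__main__":
    # Counterpart of the second sample query: the ordering by name is done by
    # the caller (the DBMS), not by the foreign function itself.
    for person in sorted(range_query(1, 100), key=lambda p: p.name):
        print(person)
```

The split of work mirrors the slide: the foreign function only delivers tuples, and clauses such as `order by` remain the job of the query language.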

15 The Hardware
- Six Pentium III 700 MHz machines with 256 MB of RAM, running Windows 2000.
- Connected by a 100 Mbit/s Ethernet network.
- One site is used as the client and the other five as servers.
- Several servers can run on the same machine (up to 3 per machine).
- The file scales from 1 to 15 servers.

16 Benchmark queries
- Benchmark data:
  - Table Person (SS#, Name, City).
  - Size: 20,000 to 300,000 tuples of 25 bytes.
  - 50 cities.
  - Random distribution.
- Benchmark query: "couples of persons in the same city"
  - Query 1: the file resides at a single AMOS-II.
  - Query 2: the file resides at AMOS-SDDS.
  - Join evaluation: two strategies.
- Measures:
  - Speed-up & scale-up
  - Processing time of aggregate functions

17 Server Query Processing
- E-strategy
  - Data stay external to AMOS
    - Within the SDDS bucket
  - Custom foreign functions perform the query
- I-strategy
  - Data are dynamically imported into AMOS-II
    - Possibly with local index creation
    - Deleted after the processing
    - Good for joins
  - AMOS performs the query
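The benchmark join ("couples of persons in the same city") can be evaluated at a server either by a nested loop or by probing a local index on the city. The sketch below contrasts the two on a list of `(ssn, name, city)` tuples. It assumes that a couple is an unordered pair of distinct persons, and the function names are illustrative; it is not the code of the foreign functions or of AMOS-II.

```python
from collections import defaultdict


def nested_loop_pairs(persons):
    """Compare every person with every later one: quadratic in the bucket size."""
    pairs = []
    for i, (ssn1, _, city1) in enumerate(persons):
        for ssn2, _, city2 in persons[i + 1:]:
            if city1 == city2:
                pairs.append((ssn1, ssn2))
    return pairs


def index_lookup_pairs(persons):
    """Build a local index on city, then probe it once per person."""
    index = defaultdict(list)
    for pos, (ssn, _, city) in enumerate(persons):
        index[city].append((pos, ssn))
    pairs = []
    for pos1, (ssn1, _, city1) in enumerate(persons):
        for pos2, ssn2 in index[city1]:
            if pos2 > pos1:          # emit each unordered pair only once
                pairs.append((ssn1, ssn2))
    return pairs


if __name__ == "__main__":
    sample = [(1, "Awa", "Dakar"), (2, "Luc", "Paris"),
              (3, "Ben", "Paris"), (4, "Ida", "Dakar"), (5, "Zoe", "Paris")]
    print(nested_loop_pairs(sample))
    print(index_lookup_pairs(sample))
```

With an index, each person is compared only with residents of the same city, which is consistent with the slide's remark that importing data with local index creation is good for joins.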

18 Speed-up
Elapsed time of Query 2 according to the strategy, for a file of 20,000 records distributed over 1 to 5 servers.
[Tables: elapsed time (s) and time per tuple (ms) for 1 to 5 server nodes; nested-loop (s) and index-lookup (s) times for 1 to 5 server nodes. Captions: "I-Strategy for Query 2: elapsed time" and "E-Strategy for Query 2: elapsed time".]
[Figure: Elapsed time per tuple of Query 2 according to the strategy]

19 Discussion
- The results show a significant advantage of the I-strategy over the E-strategy for the evaluation of the join query.
- For 5 servers, the I-strategy is 6 times faster with the nested loop, and 9 times faster if an index is created.
- This favorable result led us to study the scale-up characteristics of AMOS-SDDS on a file that scales up to 300,000 tuples.

20 Scaling the number of servers
Elapsed time of join queries to AMOS-SDDS.
[Table: columns for file sizes 20,000, 60,000, 100,000, 160,000, 200,000, 240,000 and 300,000; rows: # SDDS servers, Q1 (ms), Q2 (ms), Q1 w. extrap. (ms), Q2 w. extrap. (ms), AMOS-II (ms)]
Q1 = AMOS-SDDS join; Q2 = AMOS-SDDS join with count.
[Figure: Time per tuple (extrapolated for AMOS-SDDS)]

21 Scaling the number of servers
[Figure: Expected time per tuple of join queries to AMOS-SDDS]
- Results are extrapolated to 1 server per machine.
  - Basically, the CPU component of the elapsed time is divided by 3.
- The extrapolation of the processing time of the join query with count shows a linear scalability of the system.
- Processing time per tuple remains constant (2.94 ms) when the file size and the number of servers increase by the same factor.

22 Aggregate Function count
Elapsed time of the aggregate function Count over a 100,000-tuple file on AMOS-SDDS.
[Table: elapsed time (ms) of the E-strategy and the I-strategy for 1 to 5 servers]
Elapsed time for AMOS-II = 280 ms.
[Figure: Elapsed time of the aggregate function Count under AMOS-SDDS]

23 Aggregate Function max
Elapsed time of the aggregate function Max over a 100,000-tuple file on AMOS-SDDS.
[Table: elapsed time (ms) per strategy for 1 to 5 servers]
Elapsed time for AMOS-II = 471 ms.
[Figure: Elapsed time of the aggregate function Max under AMOS-SDDS]

24 Discussion
- Contrary to the join query, the external strategy wins for the evaluation of aggregate functions.
- For the count function, the improvement is about 34 times.
- For the max function, the improvement is about 4 times.
- This is due to the importation cost and to an SDDS property: the current number of records is a parameter of a bucket.
- Linear speed-up: processing time decreases with the number of servers.
- The use of external functions can thus be very advantageous for certain kinds of operations.
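The reason the external strategy wins for count can be sketched as follows: each bucket already maintains its number of records, so an external function only has to read that parameter, while the internal strategy must first import the records. The `Bucket` class, the function names, and the use of a thread pool to stand in for parallel servers are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


class Bucket:
    """Hypothetical SDDS bucket that keeps its current number of records."""

    def __init__(self, records):
        self.records = list(records)
        self.record_count = len(self.records)   # bucket parameter, kept up to date


def count_external(bucket):
    """E-strategy: read the bucket parameter, no record is touched."""
    return bucket.record_count


def count_imported(bucket):
    """I-strategy: copy the records first (importation cost), then count them."""
    imported = [tuple(record) for record in bucket.records]
    return len(imported)


def parallel_count(buckets, local_count):
    """Ship the local function to every server and add up the partial counts."""
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        return sum(pool.map(local_count, buckets))


if __name__ == "__main__":
    file_buckets = [Bucket((i, f"name{i}", "Paris") for i in range(n))
                    for n in (300, 200, 500)]
    print(parallel_count(file_buckets, count_external))
    print(parallel_count(file_buckets, count_imported))
```

Both strategies return the same total; the difference lies in the work done per server, which is constant for the external count and proportional to the bucket size for the imported one.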

25 SD-AMOS performance measurements
Creation time of a 3,000,000-record file. The bucket size is 750,000 records of 100 bytes.
[Figure: Global and moving average insertion time of a record]

26 SD-AMOS performance measurements
[Figures: Elapsed time of range query; Average time per tuple]

27 Discussion
- The average insertion time of a record, including the splits, is 0.15 ms.
- The average access time to a record in a distributed file is 0.12 ms.
  - This is 100 times faster than with a traditional file on disk.
- Linear scalability: the insertion time and the access time per tuple remain constant when the file size and the number of servers increase.

28 DB2-SDDS performance measurements
[Figures: Elapsed time of range query; Time per tuple]
Compared: (i) access time to the data in a DB2 table, (ii) access time to the SDDS file from the DB2 external functions (DB2-SDDS), and (iii) direct access time to the SDDS file from an SDDS client.

29 Discussion
- Direct access to the SDDS file is much faster than access to a DB2 table: 0.02 ms versus 0.07 ms.
- Access to external data from DB2 (0.08 ms) is slower than access to internal data (0.07 ms): this is the coupling cost.
- An application has:
  - fast direct access to the data;
  - access by the query language, through the DBMS.

30 Conclusion
- We have coupled an SDDS manager with the main-memory DBMS AMOS-II and with DB2, to improve the current technologies for high-performance databases and for the coupling with external data repositories.
- The experiments reported in the thesis prove the efficiency of the system.
- AMOS-SDDS and DB2-SDDS: use of an SDDS file by a DBMS, with parallel query processing on the server sites.
- SD-AMOS: appears as a scalable generalization of a parallel main-memory DBMS where the data partitioning becomes automatic.

31 Future Work
- Other types of DBMS queries.
- A scalable distributed query decomposer at the client.
- A challenging issue is the design of a scalable distributed query optimizer that handles the dynamic data partitioning.

32 End
Thank you for your attention
CERIA, Université Paris IX Dauphine