ICDT 2005: Optimal Distributed Declustering using Replication
Keith Frikken, Purdue University, Jan 5, 2005


Declustering
Declustering data over multiple disks to improve the performance of range queries has been well studied. Applications include:
- Spatio-temporal databases
- Image and video data
- Scientific simulation datasets

Goal
- Divide the data uniformly along each dimension to create tiles.
- Put the records contained in each tile on different disks so that I/O can be parallelized.
- Assumptions:
  - The data can be tiled in this way.
  - Disks have constant retrieval times.
- Assigning tiles to disks is similar to a coloring problem (disks are colors).
- A range query is answered optimally if the number of I/O retrievals on every disk is at most $\lceil m/k \rceil$, where $m$ is the number of tiles in the query and $k$ the number of disks.
- Two approaches:
  - Coloring schemes
  - Replication
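A minimal Python sketch of the tiling and placement steps, assuming records are points in the unit cube $[0,1)^d$, each dimension is cut into $n$ equal slices, and a coloring maps each tile to one of $k$ disks. The names `tile_of`, `decluster`, `is_optimal`, and the example coloring `sum_color` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def tile_of(point, n):
    """Tile coordinates of a point in [0, 1)^d when each axis is cut into n equal slices."""
    return tuple(min(int(c * n), n - 1) for c in point)  # clamp c == 1.0 into the last slice

def sum_color(tile, k):
    """Hypothetical example coloring: sum of the tile coordinates mod k."""
    return sum(tile) % k

def decluster(points, n, k, color=sum_color):
    """Place each record on the disk named by its tile's color."""
    disks = defaultdict(list)
    for p in points:
        disks[color(tile_of(p, n), k)].append(p)
    return disks

def is_optimal(query_tiles, k, color=sum_color):
    """A query is optimal when no disk holds more than ceil(m / k) of its tiles."""
    counts = [0] * k
    for tile in query_tiles:
        counts[color(tile, k)] += 1
    m = len(query_tiles)
    return max(counts) <= -(-m // k)  # -(-m // k) is integer ceil(m / k)
```

For instance, a 2 x 2 block of tiles on two disks is optimal under this coloring, while the two diagonal tiles (0,0) and (1,1) land on the same disk and are not.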

Notations
- $k$ is the number of disks
- $m$ is the number of tiles in a query
- $r$ is the level of replication (here $r = 2$)
- $Q$ is the set of all range queries
- $\mathrm{ret}(q)$ is the actual retrieval time of query $q$
- The optimal retrieval time for a query $q$ is $o_q = \lceil m/k \rceil$
- The additive error is $\varepsilon = \max_{q \in Q} \{\mathrm{ret}(q) - o_q\}$
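To make $\varepsilon$ concrete, the sketch below brute-forces it for a two-dimensional $n \times n$ grid: it enumerates every axis-aligned range query, computes $\mathrm{ret}(q)$ as the largest per-disk tile count, and subtracts $o_q$. The function names and the `color(x, y, k)` signature are assumptions for illustration; the enumeration costs $O(n^6)$ and is meant only for tiny grids.

```python
from itertools import product

def range_queries(n):
    """All axis-aligned rectangles [x1..x2] x [y1..y2] in an n x n grid of tiles."""
    for x1, y1 in product(range(n), repeat=2):
        for x2 in range(x1, n):
            for y2 in range(y1, n):
                yield x1, y1, x2, y2

def ret(q, k, color):
    """Retrieval time: the largest number of the query's tiles stored on one disk."""
    x1, y1, x2, y2 = q
    counts = [0] * k
    for x in range(x1, x2 + 1):
        for y in range(y1, y2 + 1):
            counts[color(x, y, k)] += 1
    return max(counts)

def additive_error(n, k, color):
    """epsilon = max over all range queries q of ret(q) - ceil(m / k)."""
    worst = 0
    for q in range_queries(n):
        x1, y1, x2, y2 = q
        m = (x2 - x1 + 1) * (y2 - y1 + 1)
        o_q = -(-m // k)  # ceil(m / k)
        worst = max(worst, ret(q, k, color) - o_q)
    return worst
```

For example, a two-disk checkerboard `(x + y) % 2` gives $\varepsilon = 0$ on a 4 x 4 grid, while putting every tile on disk 0 gives $\varepsilon = 2$ on a 2 x 2 grid.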

Coloring schemes
- Disk Modulo (DM) [Du and Sobolewski, 1982]
- Fieldwise XOR (FX) [Kim and Pramanik, 1988]
- Cyclic schemes (RPHM, GFIB, EXH) [Prabhakar et al, 1998]
- Golden Ratio Sequences (GRS) [Bhatia et al, 2000]
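The first two schemes are simple closed-form colorings. The sketch assumes their commonly cited two-dimensional forms, DM: $(x + y) \bmod k$ and FX: $(x \oplus y) \bmod k$. The cyclic and GRS schemes are left out because their parameters are not given on the slide. The helper `grid` is a hypothetical name.

```python
def disk_modulo(x, y, k):
    """DM: tile (x, y) is stored on disk (x + y) mod k."""
    return (x + y) % k

def fieldwise_xor(x, y, k):
    """FX: bitwise XOR of the tile coordinates, reduced mod k."""
    return (x ^ y) % k

def grid(color, n, k):
    """Disk assignment for an n x n grid; entry [y][x] is the disk of tile (x, y)."""
    return [[color(x, y, k) for x in range(n)] for y in range(n)]

for scheme in (disk_modulo, fieldwise_xor):
    print(scheme.__name__)
    for row in grid(scheme, 4, 4):
        print(" ".join(map(str, row)))
```

With $k = 4$, DM shifts each row by one disk, while FX permutes the disks within each row by XOR.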

Other schemes
- [Atallah and Prabhakar, 2000] developed a scheme for two-dimensional grids with $k = 2^n$ disks that has an additive error of $O(\log k)$.
- [Sinha et al, 2001] proved lower bounds on the additive error of $\Omega(\log k)$ and $\Omega(\log^{(d-1)/2} k)$ for 2 dimensions and $d$ ($>2$) dimensions, respectively.
- [Chen and Cheng, 2002] showed that an additive error of $O(\log^{d-1} k)$ is achievable for any number of dimensions ($>2$).

Replication
- Placing records on multiple disks can further improve the performance of declustering schemes.
- Two problems:
  - How to schedule a query (i.e., which tiles are retrieved from each disk)
  - How to use replication to balance load
- Approaches:
  - Chained Declustering [Hsiao and DeWitt, 1990]
  - Random Duplicate Allocation (RDA) [Sanders et al, 2000], [Sanders, 2001], and [Czumaj and Scheideler, 2003]
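The slides do not spell out how a replicated query is scheduled in general. One standard formulation, assumed here rather than taken from the cited papers, treats it as a capacitated bipartite assignment: each tile must be read from one of its replicas, and no disk may serve more than a chosen limit. An augmenting-path search tests a limit, and a binary search finds the smallest feasible one. `can_schedule` and `min_retrieval` are hypothetical names.

```python
def can_schedule(replicas, k, limit):
    """Read every tile from one of its replicas with <= limit tiles per disk.
    replicas[j] lists the disks holding tile j. Returns {tile: disk} or None."""
    on_disk = [[] for _ in range(k)]  # tiles currently assigned to each disk

    def place(j, seen):
        # DFS for an augmenting path: put j on a disk, possibly moving another tile.
        for d in replicas[j]:
            if d in seen:
                continue
            seen.add(d)
            if len(on_disk[d]) < limit:
                on_disk[d].append(j)
                return True
            for other in on_disk[d]:
                if place(other, seen):  # 'other' found room on a different disk
                    on_disk[d].remove(other)
                    on_disk[d].append(j)
                    return True
        return False

    for j in range(len(replicas)):
        if not place(j, set()):
            return None
    return {j: d for d in range(k) for j in on_disk[d]}

def min_retrieval(replicas, k):
    """Smallest per-disk limit that admits a schedule (between ceil(m/k) and m)."""
    m = len(replicas)
    lo, hi = -(-m // k), m
    while lo < hi:
        mid = (lo + hi) // 2
        if can_schedule(replicas, k, mid) is not None:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

With chained declustering, a tile whose primary disk is $i$ has replicas `[i, (i + 1) % k]`; with RDA the two replicas would be chosen at random. Five tiles that all have primary disk 0 on three disks can do no better than a load of 3.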

Replication Results
- Chained Declustering
  - Fast scheduling algorithm: $O(m+k)$ time to test whether a specific retrieval time is possible [Aerts et al, 2000]
- RDA
  - If $m \ge c\,k \log k$, then optimal with high probability [Czumaj and Scheideler, 2003]
  - Fast scheduling algorithm: $O(\Delta k^{O(1)})$ time [Czumaj and Scheideler, 2003]
- Hybrid techniques [Chen and Cheng, 2002]
  - Use GRS with a second, random disk

Our Results
- We define a new class of schemes called shift schemes:
  - Deterministic
  - Any query with at least $k(k-1)\varepsilon$ tiles can be answered optimally
  - Queries can be scheduled in $O(m + k \log \varepsilon)$ time
  - If a single disk fails, any query with at least $k(k-1)\varepsilon$ tiles can still be answered optimally
  - Experimental performance is similar to RDA (better in many cases)

Shift Scheme Definition
- Use any strong coloring scheme
- Use a modified chained declustering, defined by a shift value $s$ (where $\gcd(s,k) = 1$)
- The base scheme is defined by a function $f(x,y)$
  - The second color is $(f(x,y) + s) \bmod k$

Example ($k = 5$ disks, $s = 3$; each entry is "first color,second color" for one tile):
0,3  1,4  2,0  3,1  4,2
2,0  3,1  4,2  0,3  1,4
4,2  0,3  1,4  2,0  3,1
1,4  2,0  3,1  4,2  0,3
3,1  4,2  0,3  1,4  2,0
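The 5 x 5 example can be regenerated from a base coloring and the shift $s = 3$. The base function used below, $f(x, y) = (x + 2y) \bmod 5$ with $x$ the column and $y$ the row, is an assumption inferred from the first colors in the example; the slide does not name it. `shift_scheme` and `base` are hypothetical names.

```python
from math import gcd

def shift_scheme(n, k, s, f):
    """n x n grid of (first, second) colors; the second color is (f(x, y) + s) mod k."""
    if gcd(s, k) != 1:
        raise ValueError("the shift s must satisfy gcd(s, k) = 1")
    return [[(f(x, y, k), (f(x, y, k) + s) % k) for x in range(n)] for y in range(n)]

def base(x, y, k):
    """Assumed base coloring, inferred from the example: (x + 2y) mod k."""
    return (x + 2 * y) % k

for row in shift_scheme(5, 5, 3, base):
    print("  ".join(f"{a},{b}" for a, b in row))
```

The `gcd` check mirrors the condition on $s$; with $\gcd(s,k) = 1$, following second colors from any disk visits all $k$ disks before returning.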

Scheduling
- A modification of the chained declustering scheduling algorithm schedules queries in $O(m + k \log \varepsilon)$ time.
- Essentially, use the previous algorithm to test whether a specific load is possible, and binary search over the possible loads.
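A sketch of the load test and binary search, under stated assumptions: disks are relabeled in the order $0, s, 2s, \ldots \pmod k$, which is a single cycle because $\gcd(s,k) = 1$, so each disk can hand tiles only to the next one, as in Bound(1). Taking minimal shifts $d_i = \max(0, d_{i-1} + t_i - L)$, one pass around the ring finds the smallest self-consistent wrap-around value and a second pass checks $d_i \le t_i$, so each test is $O(k)$. This two-pass test is a reconstruction, not necessarily the paper's algorithm. Counting the $t_i$ costs $O(m)$, and because $\max_i t_i \le o + \varepsilon$ the search range holds about $\varepsilon$ loads. `feasible`, `schedule`, and `loads` are hypothetical names.

```python
def feasible(t, L):
    """Ring model: D_i may hand d_i <= t[i] of its tiles to D_{(i+1) % k}.
    Returns minimal shifts keeping every disk at <= L tiles, or None."""
    k = len(t)
    if sum(t) > k * L:
        return None  # even a perfect split would exceed L
    # Pass 1: start from d_{-1} = 0; the final value is the smallest
    # wrap-around shift d_{k-1} that is consistent with itself.
    d = 0
    for ti in t:
        d = max(0, d + ti - L)
    # Pass 2: restart from that value, taking the minimal shift at each disk.
    shifts = []
    for ti in t:
        d = max(0, d + ti - L)
        if d > ti:
            return None  # D_i would have to pass on more tiles than it holds
        shifts.append(d)
    return shifts

def schedule(t):
    """Smallest feasible load and its shifts (t must be non-empty)."""
    lo, hi = -(-sum(t) // len(t)), max(t)  # hi is feasible: nobody needs to shift
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(t, mid) is not None:
            hi = mid
        else:
            lo = mid + 1
    return lo, feasible(t, lo)

def loads(t, d):
    """Per-disk load d_{i-1} + t_i - d_i (Python's d[-1] is d_{k-1})."""
    return [d[i - 1] + t[i] - d[i] for i in range(len(t))]
```

For `t = [0, 0, 4]`, disk 2 passes two tiles around the ring to disk 0, giving loads `[2, 0, 2]`.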

Bound(1)
- There are $k$ disks $D_0, \ldots, D_{k-1}$.
- Disk $D_i$ initially has $t_i$ tiles (as the primary disk).
- The number of tiles is $m = t_0 + \cdots + t_{k-1}$.
- $D_i$ shifts $d_i \le t_i$ tiles to $D_{i+1}$ (indices taken mod $k$, so $d_{-1} = d_{k-1}$).
- The goal is to minimize the largest number of tiles at any disk, i.e., $\max_{0 \le i \le k-1} \{d_{i-1} + t_i - d_i\}$.
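For tiny instances the objective above can be minimized by exhaustive search over every choice of $0 \le d_i \le t_i$, which is a useful cross-check for any faster scheduler. `max_load` and `brute_force` are hypothetical helper names; the search is exponential in $k$.

```python
from itertools import product

def max_load(t, d):
    """Objective from the slide: max_i (d_{i-1} + t_i - d_i), indices mod k."""
    k = len(t)
    return max(d[i - 1] + t[i] - d[i] for i in range(k))  # d[-1] is d_{k-1}

def brute_force(t):
    """Try every 0 <= d_i <= t_i; returns (best load, shifts). Tiny inputs only."""
    best = None
    for d in product(*(range(ti + 1) for ti in t)):
        cost = max_load(t, d)
        if best is None or cost < best[0]:
            best = (cost, d)
    return best
```

For `t = [5, 0, 0]` the best achievable load is 3, above $\lceil 5/3 \rceil = 2$, because disk 0 can hand tiles only to disk 1.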

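Bound(2)
- Recall:
  - $o = \lceil m/k \rceil$
  - $\max_{0 \le i \le k-1} \{t_i\} \le o + \varepsilon$ (the base coloring has additive error $\varepsilon$)
- Suppose $m \ge k(k-1)\varepsilon$. Then:
  - $o \ge (k-1)\varepsilon$
  - The surplus $\sum_i \max(0, t_i - o)$ is bounded by $(k-1)\varepsilon$, since each disk exceeds $o$ by at most $\varepsilon$ and at least one disk has $t_i \le o$
  - $\max_{0 \le i \le k-1} \{d_i\} \le (k-1)\varepsilon \le o$
- Two cases:
  - If a disk has a surplus
  - If a disk has a shortage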
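The inequality chain can be sanity-checked exhaustively on small cases. The sketch enumerates every load vector $t$ with $m \ge k(k-1)\varepsilon$ and $\max_i t_i \le o + \varepsilon$, asserts the surplus bound, and tests whether load $o$ is reachable in the ring model of Bound(1). The two-pass feasibility test is an assumed reconstruction, and `reachable` and `check` are hypothetical names; `check` returns a counterexample if it finds one.

```python
from itertools import product

def reachable(t, L):
    """Two-pass ring test (model of Bound(1)): can every disk end with <= L tiles?"""
    if sum(t) > len(t) * L:
        return False
    d = 0
    for ti in t:                    # pass 1: smallest consistent wrap-around shift
        d = max(0, d + ti - L)
    for ti in t:                    # pass 2: minimal shifts from that start
        d = max(0, d + ti - L)
        if d > ti:
            return False
    return True

def check(k, eps, top):
    """Search all t in {0..top}^k meeting the hypotheses; return a counterexample or None."""
    for t in product(range(top + 1), repeat=k):
        m = sum(t)
        o = -(-m // k)              # o = ceil(m / k)
        if m < k * (k - 1) * eps or max(t) > o + eps:
            continue                # hypotheses of Bound(2) not met
        surplus = sum(max(0, ti - o) for ti in t)
        assert surplus <= (k - 1) * eps <= o
        if not reachable(t, o):
            return t
    return None
```

The key step is the last inequality: every shift is at most the surplus, hence at most $o$, so a disk with a shortage never has to pass on more tiles than it holds.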

[Figures: four plot slides titled by number of disks, the last one for 3 dimensions; plot contents not recoverable.]

Generalizations
- Permutations
- Higher levels of replication
- Survivability
  - If the level of replication is $r$, any $r-1$ failures can be handled
  - When $r = 2$ and a single disk fails:
    - Fast scheduling is still possible
    - Large queries are still optimal
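For $r = 2$ with one failed disk $D_f$, the ring model of Bound(1) becomes a path: all $t_f$ primary tiles of $D_f$ must be read from $D_{f+1}$, and $D_{f-1}$ can no longer shift tiles because its second copies are on $D_f$. A single pass of minimal shifts then tests a load, which is consistent with the slide's claim that fast scheduling is still possible. This is a sketch under those assumptions; `feasible_with_failure` and `min_load_with_failure` are hypothetical names, and $k \ge 2$ is assumed.

```python
def feasible_with_failure(t, L, f):
    """Can the k - 1 surviving disks each serve <= L tiles when disk f has failed?"""
    k = len(t)
    d = t[f]  # every primary tile of D_f must be read from its copy on D_{f+1}
    for step in range(1, k):
        i = (f + step) % k
        need = max(0, d + t[i] - L)  # minimal shift that keeps D_i at <= L
        if need > t[i]:
            return False
        if i == (f - 1) % k and need > 0:
            return False  # D_{f-1}'s second copies live on the failed disk
        d = need
    return True

def min_load_with_failure(t, f):
    """Binary search between ceil(m / (k - 1)) and m (a load of m is always feasible)."""
    m, k = sum(t), len(t)
    lo, hi = -(-m // (k - 1)), m
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible_with_failure(t, mid, f):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For `t = [0, 0, 4]` with disk 0 failed, disk 2 must serve all four of its tiles, since their second copies were on disk 0.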

Summary
- Shift schemes are a new class of schemes:
  - Optimal for large enough queries
  - Efficient scheduling algorithm
  - Resilient to disk failures
- Future work:
  - Better analysis of the scheme
  - Choosing shift values