Data Placement Problems in Database Applications

Presentation transcript:

Data Placement Problems in Database Applications. An Zhu, Stanford University

Data Placement: Data objects; multiple disks; assignment of objects to disks. Optimize performance: optimize I/O; handle dynamic situations. 4/21/2019 AZ

Outline: (1) Multimedia Systems [GKKTZ 00]: maximize the total clients served. (2) Relational Database Layout [AFMPZ 03]: minimize the combined I/O access time. (3) Load Rebalancing Problem [AMZ 03]: minimize the makespan within allowed moves.

Multimedia Storage Systems: Movie objects; clients/subscribers; parallel disks. Limited storage: disk $j$ holds at most $N_j$ movies. Limited bandwidth: disk $j$ serves at most $C_j$ clients. Homogeneous system: $N_j = k$, $C_j = L$ for all $j$. Uniform ratio: $C_j / N_j = r$ for all $j$.

An Example: Total storage: 12 movies; total capacity: 1800 clients. [Figure: three disks, each shown as 000/600 when empty; ten movies with 100 clients each and two movies with 400 clients each. The disks are filled one at a time, reaching 400/600, 400/600, and 600/600.]

Not All Clients Can Be Satisfied: [Figure: disks at 400/600, 400/600, and 600/600; one movie with 400 clients is left over.] Total satisfied clients: 1400/1800 = 7/9.

Sliding Window Algorithm: Consider one disk at a time. Maintain an ordered list of movies. Find the first window of $k$ consecutive movies (or fewer) with at least $L$ combined clients. Assign the first $L$ clients to the disk and reconsider the leftover clients.
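
The steps on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sliding_window` and its parameters are hypothetical, movies are kept in increasing order of client count, and when no window reaches $L$ clients the last (largest) window is used for that disk.

```python
from bisect import insort


def sliding_window(clients, num_disks, k, L):
    """Return (per-disk client assignments, number of unsatisfied clients)."""
    movies = sorted(clients)              # ordered list, fewest clients first
    disks = []
    for _ in range(num_disks):
        # Assumption: fall back to the last window if none reaches L clients.
        last_start = max(len(movies) - k, 0)
        start = last_start
        for i in range(last_start + 1):
            if sum(movies[i:i + k]) >= L:
                start = i                 # first window with at least L clients
                break
        window = movies[start:start + k]
        del movies[start:start + k]
        load, placed = 0, []
        for c in window:
            take = min(c, L - load)       # assign only the first L clients
            if take > 0:
                placed.append(take)
                load += take
            if c > take:
                insort(movies, c - take)  # leftover clients return to the list
        disks.append(placed)
    return disks, sum(movies)


# The instance from the example slides: 12 movies, 3 disks, k = 4, L = 600.
demand = [100] * 10 + [400, 400]
disks, unsatisfied = sliding_window(demand, num_disks=3, k=4, L=600)
print([sum(d) for d in disks], unsatisfied)  # [600, 600, 400] 200
```

On this instance the sketch serves 1600 of the 1800 clients. Rescanning the windows for every disk keeps the code short; it is not meant to match the running time quoted in the recap.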

An Example (max window size $k = 4$): [Figure: the sliding window algorithm on the same instance. The window slides along the ordered movie list until its movies total 700 clients; the first 600 clients fill the first disk (600/600) and the leftover clients return to the list. The second disk is filled to 600/600 and the third ends at 400/600.] Total satisfied clients: 1600/1800 = 8/9.

Theoretical Bounds: Satisfies at least a $1 - \frac{1}{(1+\sqrt{k})^2}$ fraction of the total clients. In the worst case, no algorithm can satisfy more clients. Translates to a $\left(1 - \frac{1}{(1+\sqrt{k})^2}\right)^{-1}$-approximation. PTAS: $(1+\epsilon)$-approximation for any $\epsilon > 0$.

Proof Sketch: Load-saturated vs. storage-saturated disks: $M_L$ and $M_S$ of them, with $M_L + M_S = M$. Least loaded disk: load $cL$, $0 < c < 1$. All remaining movies each have no more than $cL/k$ clients. The initial instance is feasible (w.l.o.g.).

An Example: $M_L = 2$, $M_S = 1$, $c = 400/600$, $cL/k = 100$. [Figure: disks at 600/600, 600/600, and 400/600, with remaining movies of 100 clients each.] Total satisfied clients: 1600/1800 = 8/9.

Proof Outline: If there is a load-saturated disk with fewer than $k$ movies, all clients are satisfied. Otherwise, at most $M_L$ movies are left, and at least a $1 - \frac{1}{(1+\sqrt{k})^2}$ fraction of the clients is satisfied.

Lemma: If any load-saturated disk has fewer than $k$ movies, then any $k-1$ remaining movies in the list have $L$ clients or more combined.

Lemma: The remaining disks are all load saturated, so all clients are satisfied. [Figure: two windows of remaining movies, each labeled "At least L".]

Otherwise: Each disk has exactly $k$ movies. Total assigned movies: $M \cdot k$. Initial movies: $N \le M \cdot k$. "New" movies generated: $\le M_L$. Number of movies left: $\le M_L$. Clients per remaining movie: $\le cL/k$. Total remaining clients: $\le c L M_L / k$.

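Otherwise (continued): Total clients: $\le M \cdot L$. Assigned clients: $\ge M_L \cdot L + M_S \cdot cL$. Total remaining clients: $\le M_S (1-c) L$. Final bound: at least a $1 - \frac{1}{(1+\sqrt{k})^2}$ fraction of the clients is satisfied.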
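
One way the two bounds on the remaining clients could combine into the final guarantee is sketched below. The shorthand $x = M_L/M$ and the normalization by $M \cdot L$ (the extreme case in which the instance fills all bandwidth) are assumptions introduced here for illustration; they are not notation from the slides.

```latex
\begin{align*}
\frac{\text{remaining clients}}{M L}
  &\le \min\!\left( \frac{c\,x}{k},\; (1-x)(1-c) \right),
  \qquad x = \frac{M_L}{M},\quad 1 - x = \frac{M_S}{M} \\
\text{balancing the two terms:}\quad
  c &= \frac{k(1-x)}{x + k(1-x)}
  \;\Longrightarrow\;
  \frac{c\,x}{k} = \frac{x(1-x)}{x + k(1-x)} \\
\text{maximizing over } x:\quad
  x &= \frac{\sqrt{k}}{1+\sqrt{k}}
  \;\Longrightarrow\;
  \frac{x(1-x)}{x + k(1-x)} = \frac{1}{(1+\sqrt{k})^2}
\end{align*}
```

For $k = 4$ the worst case is $x = 2/3$ and $c = 2/3$, leaving $1/9$ of the clients unsatisfied, which agrees with the 8/9 in the example.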

Simulation Results: $M = 5$, $L = 100$, $N = M \cdot k$; Zipf distribution with parameter 0.0 (popularity of movie $i$ proportional to $i^{-1}$).

Recap: The problem is NP-complete. PTAS: best possible approximation bound. $1 - \frac{1}{(1+\sqrt{k})^2}$: best possible absolute bound. Sliding window algorithm: practical, with $O((M+N)\log(M+N))$ running time.

Outline: (1) Multimedia Systems [GKKTZ 00]: maximize the total clients served. (2) Relational Database Layout [AFMPZ 03]: minimize the total I/O access time. (3) Load Rebalancing Problem [AMZ 03]: minimize the makespan within allowed moves.

Relational Databases: Objects: indexes, tables, views. Multiple disks. Minimize the total I/O access time.

Past Work: Full striping: split each object uniformly across all available disks to utilize I/O parallelism. With a transfer rate of 0.05 s/MB, a 200 MB object on a single disk takes $T_t = 10$ s; striped across four disks (50 MB each) it takes $T_t = 2.5$ s.

Past Work: Co-accessed objects with random I/O. Seek time per block: 0.01 s per 0.1 MB block, i.e. a seek rate of 0.1 s/MB. The smaller object dominates: with A striped as 50 MB per disk and B as 100 MB per disk, $T_s = 50 \cdot 2 \cdot 0.1 = 10$ s.

Past Work: Combined access time. Transfer time: $T_t = (50 + 100) \cdot 0.05 = 7.5$ s. Seek time: $T_s = 2 \cdot \min(50, 100) \cdot 0.1 = 10$ s. Combined time: $T_t + T_s = 17.5$ s. [Figure: A (50 MB per disk) and B (100 MB per disk) striped across four disks.]

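Past Work: Full striping is no longer optimal [Agrawal Chaudhuri Das Narasayya 03]. [Figure: B stored as 200 MB on each of two disks and A as 100 MB on each of the other two, so no disk holds both objects.] Combined time: $200 \cdot 0.05 = 10$ s.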
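
The arithmetic on the striping slides can be reproduced with a short Python sketch. The function name `access_time`, the layout encoding, and the rule that the overall time is that of the slowest disk are assumptions made for illustration. The rates 0.05 s/MB and 0.1 s/MB and the factor 2 in the seek term are taken from the slide figures.

```python
def access_time(layout, transfer_rate=0.05, seek_rate=0.1):
    """Estimated I/O time (seconds) for two co-accessed objects "A" and "B".

    layout: one dict per disk, mapping object name -> MB stored on that disk.
    Assumption: disks work in parallel, so the slowest disk sets the time.
    """
    worst = 0.0
    for disk in layout:
        a, b = disk.get("A", 0), disk.get("B", 0)
        transfer = (a + b) * transfer_rate   # sequential transfer of both parts
        seek = 2 * min(a, b) * seek_rate     # the smaller object dominates seeks
        worst = max(worst, transfer + seek)
    return worst


# Full striping: every disk holds 50 MB of A and 100 MB of B.
striped = [{"A": 50, "B": 100}] * 4
# Separated layout: A and B never share a disk, so no seek cost is paid.
separated = [{"B": 200}, {"B": 200}, {"A": 100}, {"A": 100}]

print(access_time(striped), access_time(separated))
```

Up to floating-point rounding, the striped layout costs 17.5 s and the separated layout 10 s, as on the slides.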

Data Layout Problem: Workload (SQL DML): a set of queries and/or updates. A set of co-accessed objects (pairwise). Access statistics (pairwise). Minimize the estimated I/O access time.

Theoretical Questions: Approximation and its hardness. Transfer time: in P. Seek time: very hard. Combined time: hard. Minimizing transfer time alone is a "good" approximation.

Transfer Time: Heterogeneous disks: a different transfer rate for each disk $j$; storage constraint $c_j$. Objects: different size $s_i$; access frequency for each co-accessed pair $(i, i')$. Solvable using Linear Programming (LP).

LP: Variables: the amount of object $i$ assigned to disk $j$. Each object must be completely assigned. Each disk's storage limit is kept. Transfer time for $(i, i')$ on disk $j$. Overall transfer time for $(i, i')$. Minimize the total transfer time.
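
The constraints listed on this slide could be written out as the following linear program. This is a hedged reconstruction, not the authors' exact formulation: $s_i$ and $c_j$ come from the slides, while $x_{ij}$ (amount of object $i$ on disk $j$), $\alpha_j$ (transfer rate of disk $j$), $\phi_{i,i'}$ (access frequency of the pair), and $T_{i,i'}$ (overall transfer time of the pair, taken as the slowest disk) are symbols assumed here.

```latex
\begin{align*}
\text{minimize}\quad
 & \sum_{(i,i')} \phi_{i,i'}\, T_{i,i'} \\
\text{subject to}\quad
 & \sum_{j} x_{ij} = s_i
   && \text{for every object } i \\
 & \sum_{i} x_{ij} \le c_j
   && \text{for every disk } j \\
 & T_{i,i'} \ge \alpha_j \,(x_{ij} + x_{i'j})
   && \text{for every pair } (i,i') \text{ and every disk } j \\
 & x_{ij} \ge 0
   && \text{for every } i, j
\end{align*}
```

Because every constraint is linear in $x_{ij}$ and $T_{i,i'}$, a fractional optimum can be found in polynomial time, which matches the slide's claim that transfer time alone is solvable by LP.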

Seek Time: Hard even on disks with no storage constraint. Integral assignment: each object is assigned to one disk only. Conversion from a fractional assignment with no loss.

Conversion: [Figure: files A, B, and C spread across several disks; two of the three file pairs have co-access frequency $f = 1$ and the third has $f = 0$.] Want: each file is spread uniformly across a subset of disks. Total seek cost over the successive steps, in units of the seek rate: $100 \cdot 2 + 100 \cdot 2$ initially, then $100 \cdot 2 + 125 \cdot 2$, then $100 \cdot 2$, and finally 0, when each file resides on only one disk.

Implications: A polynomial-time algorithm. Equivalent to Minimum Edge Deletion k-Partition. NP-hard to approximate within $O(n^{2-\epsilon})$. Forces the combined time to be hard to approximate.

Combined Time Let Hard to approximate: ·, 1>>0 Optimize transfer time alone gives 1+

Outline: (1) Multimedia Systems [GKKTZ 00]: maximize the total clients served. (2) Relational Database Layout [AFMPZ 03]: minimize the combined I/O access time. (3) Load Rebalancing Problem [AMZ 03]: minimize the makespan within allowed moves.

Load Rebalancing: The access pattern changes, and the initial layout is no longer balanced. [Figure: objects 1 to 11 stacked on the disks, with the most loaded disk marked MAX LOAD.]

Load Rebalancing: Relocate objects. Minimize the max load with at most $k$ moves. [Figure: the layout of objects 1 to 11 after relocation.]

Simple Algorithm ($O(n \log n)$): Step 1: Repeat $k$ times: remove the largest object from the most loaded disk. The resulting max load: $L(1)$. Step 2: Relocate the removed $k$ objects: assign each object to the least loaded disk. The resulting max load: $L(2)$.
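
The two steps can be sketched in Python as follows. The function name `simple_rebalance` and the input encoding (a list of object sizes per disk) are hypothetical. Two choices here are assumptions and are not stated on the slide: removed objects are reassigned largest first, and disks are rescanned on every step for brevity, so this sketch does not achieve the $O(n \log n)$ bound (that would need heaps).

```python
def simple_rebalance(disks, k):
    """Greedy rebalancing; returns (new layout, L1, L2).

    disks: list of lists of object sizes, one inner list per disk.
    """
    disks = [sorted(d) for d in disks]     # copy; largest object is last
    removed = []

    # Step 1: k times, remove the largest object from the most loaded disk.
    for _ in range(k):
        src = max(range(len(disks)), key=lambda j: sum(disks[j]))
        if not disks[src]:
            break                          # nothing left to remove
        removed.append(disks[src].pop())
    l1 = max(sum(d) for d in disks)

    # Step 2: place each removed object on the least loaded disk.
    # Assumption: objects are placed in decreasing order of size.
    for size in sorted(removed, reverse=True):
        dst = min(range(len(disks)), key=lambda j: sum(disks[j]))
        disks[dst].append(size)
    l2 = max(sum(d) for d in disks)
    return disks, l1, l2


layout, l1, l2 = simple_rebalance([[8, 5, 3], [4, 2], [1]], k=2)
print(layout, l1, l2)  # [[3, 5], [2, 4], [1, 8]] 6 9
```

In this small made-up instance the max load drops from 16 to 9. An object that lands back on its original disk, as the object of size 5 does here, would not need to be counted as a move.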

Example ($k = 3$), Step 1: $L(1) \le OPT$. [Figure: objects 9, 1, and 6 are removed from the disks marked MAX LOAD; the resulting max load is $L(1)$.]

Example ($k = 3$), Step 2: $L(2) \le OPT + S \le 2 \cdot OPT$. Overall: $\max(L(1), L(2)) \le 2 \cdot OPT$. [Figure: the removed objects 9, 1, and 6 are each placed on the disk marked MIN LOAD; the resulting max load is $L(2)$.]

Can We Do Better? Blindly removing the largest object is not wise. [Figure: the layout of objects 1 to 11 with the most loaded disk marked MAX LOAD.]

How Can We Do Better? Take care of large objects. Large objects: size $> \frac{1}{2} OPT$. Small objects: size $\le \frac{1}{2} OPT$. [Figure: an optimal layout of objects 1 to 11 with max load OPT.]

Revising the Plan: Step 1: Repeat $k$ times: remove the largest object from the most loaded disk. The resulting max load: $L(1) \le OPT$. Step 2: Relocate the removed $k$ objects: assign each object to the least loaded disk. The resulting max load: $L(2) \le OPT + S \le 2 \cdot OPT$.

Revised Plan: Step 1: With no more than $k$ moves, shuffle large objects and remove small objects. The resulting max load: $L(1) \le \frac{3}{2} OPT$. Step 2: Relocate the removed objects: assign each object to the least loaded disk (they are all small). The resulting max load: $L(2) \le OPT + S \le \frac{3}{2} OPT$.
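
The Step 2 bound can be unpacked as below. This is a sketch under two assumptions that the slide does not spell out: $S$ denotes the size of the object being placed, and $M$ denotes the number of disks. The least loaded disk carries at most the average load, which cannot exceed $OPT$, and every object relocated in Step 2 is small.

```latex
\begin{align*}
\min_j \mathrm{load}(j)
  &\le \frac{1}{M} \sum_{j} \mathrm{load}(j) \le OPT \\
L(2)
  &\le OPT + S \le OPT + \tfrac{1}{2}\,OPT = \tfrac{3}{2}\,OPT
\end{align*}
```

The same argument with the weaker fact $S \le OPT$ gives the $2 \cdot OPT$ bound of the simple algorithm.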

Example, Step 1: [Figure: the layout of objects 1 to 11 after Step 1, with the max load at most $\frac{3}{2} OPT$.]

Example, Step 2: [Figure: the removed objects are placed on the disks marked MIN LOAD; the resulting max load is at most $OPT + S$.]

Recap: Fast 1.5-approximation ($O(n \log n)$). NP-complete. PTAS: generalized cost.

Summary: (1) Multimedia Systems [GKKTZ 00]: maximize the total clients served. (2) Relational Database Layout [AFMPZ 03]: minimize the combined I/O access time. (3) Load Rebalancing Problem [AMZ 03]: minimize the makespan within allowed moves.

Other Research Interests: Algorithms for mobile and sensor networks and privacy-preserving databases. Online algorithms: queue management, packet switching, web caching, scheduling. Approximation algorithms: network design, multi-product pricing. Streaming algorithms.