Download presentation
Presentation is loading. Please wait.
1
Data Placement Problems in Database Applications
An Zhu Stanford University
2
Data Placement Data objects Multiple disks
Assignment of objects to disks Optimize performance Optimize I/O Handle dynamic situations 4/21/2019 AZ
3
Outline Multimedia Systems [GKKTZ 00]
Maximize the total clients served Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed moves 4/21/2019 AZ
4
Outline Multimedia Systems [GKKTZ 00]
Maximize the total clients served Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed moves 4/21/2019 AZ
5
Multimedia Storage Systems
Movie objects Clients/subscribers Parallel disks Limited storage: # of movies—Nj Limited bandwidth: # of clients—Cj Homogeneous system: Nj=k, Cj=L, j Uniform ratio: Cj/Nj=r, j 4/21/2019 AZ
6
An Example Total Storage: 12 , Total Capacity: 1800 000/600 000/600
100 100 100 100 100 100 000/600 100 100 100 100 400 400 Total Storage: 12 , Total Capacity: 1800 4/21/2019 AZ
7
An Example Total Storage: 12 , Total Capacity: 1800 400/600 000/600
100 100 000/600 100 100 100 100 400 400 Total Storage: 12 , Total Capacity: 1800 4/21/2019 AZ
8
An Example Total Storage: 12 , Total Capacity: 1800 400/600 400/600
000/600 100 100 400 400 Total Storage: 12 , Total Capacity: 1800 4/21/2019 AZ
9
Not All Clients Can be Satisfied
400/600 400/600 600/600 400 Total Satisfied Clients: 1400/1800=7/9 4/21/2019 AZ
10
Sliding Window Algorithm
Consider one disk at a time Maintain an ordered list of movies The first consecutive k movies (or less) with at least L combined clients Assign the first L clients to the disk and reconsider leftover clients 4/21/2019 AZ
11
An Example Max window size k=4 100 000/600 000/600 100 100 100 100 100
400 400 Max window size k=4 4/21/2019 AZ
12
An Example Max window size k=4 200 000/600 000/600 100 100 100 100 100
400 400 Max window size k=4 4/21/2019 AZ
13
An Example Max window size k=4 400 000/600 000/600 100 100 100 100 100
4/21/2019 AZ
14
An Example Max window size k=4 400 000/600 000/600 100 100 100 100 100
4/21/2019 AZ
15
An Example Max window size k=4 000/600 000/600 100 100 100 100 100 100
400 400 700 Max window size k=4 4/21/2019 AZ
16
An Example Max window size k=4 600/600 000/600 100 100 100 100 100 100
100 400 Max window size k=4 4/21/2019 AZ
17
An Example Max window size k=4 600/600 000/600 100 100 100 100 100 100
400 Max window size k=4 4/21/2019 AZ
18
An Example Max window size k=4 600/600 600/600 100 100 100 100 100 100
400 000/600 Max window size k=4 4/21/2019 AZ
19
An Example Total Satisfied Clients: 1600/1800=8/9 600/600 600/600 100
400/600 Total Satisfied Clients: 1600/1800=8/9 4/21/2019 AZ
20
Theoretical Bounds Satisfies at least fraction of total clients
In the worst case, no algorithm can satisfy more clients Translates to an approximation PTAS: (1+)-approximation, >0 4/21/2019 AZ
21
Theoretical Bounds Satisfies at least fraction of total clients
In the worst case, no algorithm can satisfy more clients Translates to an approximation PTAS: (1+)-approximation, >0 4/21/2019 AZ
22
Proof Sketch Load vs. storage saturated: ML, MS Least loaded disk: cL
ML+MS=M, 0<c<1 All remaining movies each have no more than cL/k clients Initial instance is feasible (w.l.o.g.) 4/21/2019 AZ
23
An Example ML=2, MS=1, c=400/600 cL/k=100
600/600 600/600 100 100 ML=2, MS=1, c=400/600 cL/k=100 400/600 Total Satisfied Clients: 1600/1800=8/9 4/21/2019 AZ
24
Proof Outline If there is a load saturated disk with less than k movies All clients are satisfied Otherwise At most ML movies are left Satisfy at least fraction of the clients 4/21/2019 AZ
25
Lemma If any of the load saturated disk has less than k objects
Any k-1 remaining movies in the list has L clients or more 4/21/2019 AZ
26
Lemma The remaining disks are all load saturated
So, all clients are satisfied At least L At least L 4/21/2019 AZ
27
Otherwise… Each disk has exactly k movies Initial movies: N M·k
Total assigned movies: M·k Initial movies: N M·k “New” movies generated: ML # of movies left: ≤ ML # of clients/remaining movie: ≤ cL/k Total # of remaining clients: cLML/k 4/21/2019 AZ
28
Otherwise… Total clients: ≤ M·L Assigned clients: ML·L + Ms·cL
Total # of remaining clients : ≤ Ms·(1-c)L Final bound: 4/21/2019 AZ
29
Simulation Results M=5 L=100 N=M·k Zipf with =0.0 ( i-1 ) 4/21/2019
AZ
30
Recap The problem is NP-complete
PTAS: best possible approximation bound : best possible absolute bound Sliding window algorithm: practical with O((M+N)log(M+N)) running time 4/21/2019 AZ
31
Outline Multimedia Systems [GKKTZ 00]
Maximize the total clients served Relational Database Layout [AFMPZ 03] Minimize the total I/O access time Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed moves 4/21/2019 AZ
32
Relational Databases Objects: indexes, tables, views Multiple disks
Minimize the total I/O access time 4/21/2019 AZ
33
Past Work Full striping Split uniformly across all available disks
Utilize I/O parallelism : transfer rate 200MB 200MB =0.05s/MB,Tt=10s 4/21/2019 AZ
34
Past Work Full striping Split uniformly across all available disks
Utilize I/O parallelism : transfer rate 50MB 200MB =0.05s/MB,Tt=10s =0.05s/MB,Tt=2.5s 50MB 50MB 50MB 50MB 50MB 50MB 4/21/2019 AZ
35
Past Work Co-accessed objects with Random I/O
Seek time/per block size: 0.01s/0.1MB Seek rate: =0.1s/MB Smaller object dominates A Ts=50·2=10s 50MB 50MB 50MB 50MB B 100MB 100MB 100MB 100MB 4/21/2019 AZ
36
Past Work Combined access time Transfer time: Tt=(50+100)·=7.5s
Seek time: Ts=min(50,100)·=10s Combined time: Tt+Ts=17.5s A 50MB 50MB 50MB 50MB B 100MB 100MB 100MB 100MB 4/21/2019 AZ
37
Past Work Fully striping is no longer optimal [Agrawal Chaudhuri Das Narasayya 03’] Combined time: 200·=10s 200MB 200MB 100MB 100MB 4/21/2019 AZ
38
Data Layout Problem Work Load (SQL DML)
A set of queries and/or updates A set of co-accessed objects (pairwise) Access stats (pairwise) Minimize the estimated I/O access time 4/21/2019 AZ
39
Theoretical Questions
Approximation and its hardness Transfer time: P Seek time: Very Hard Combined time Hard Minimizing transfer time alone is a “good” approximation 4/21/2019 AZ
40
Transfer Time Heterogeneous disks Objects
Different rate: j Storage constraint: cj Objects Different size: si Access frequency: i,i’ Solvable using Linear Programming (LP) 4/21/2019 AZ
41
LP Amount of object i assigned to disk j
Each object must be completely assigned Each disk’s storage limit is kept Transfer time for (i,i’) on disk j Overall transfer time for (i,i’) Minimize the total transfer time 4/21/2019 AZ
42
Seek Time Hard even on disks with no storage constraint
Integral assignment Each object is assigned to one machine only Conversion from a fraction assignment with no loss 4/21/2019 AZ
43
Conversion f( , )=1, f( , )=1, f( , )=0
Total seek cost: 1002+1002 Want: each file is spread uniformly across a subset of disks A B C B A C 100MB 150MB 200MB 200MB 100MB 100MB 4/21/2019 AZ
44
Conversion f( , )=1, f( , )=1, f( , )=0
Total seek cost: 1002+1002 New cost: 1002+1252 A B C B A C 125MB 125MB 200MB 200MB 100MB 100MB 4/21/2019 AZ
45
Conversion f( , )=1, f( , )=1, f( , )=0
Total seek cost: 1002+1002 New cost: 1002 A B C B A C 250MB 125MB 125MB 200MB 200MB 100MB 100MB 4/21/2019 AZ
46
Conversion f( , )=1, f( , )=1, f( , )=0 Total seek cost: 0
Each file resides on only one disk A B C B A C 400MB 250MB 250MB 200MB 200MB 200MB 100MB 100MB 4/21/2019 AZ
47
Implications A polynomial time algorithm
Equivalent to Minimum Edge Deletion k-Partition NP-Hard to approximate: O(n2) Forces combined time be hard to approximate 4/21/2019 AZ
48
Combined Time Let Hard to approximate: ·, 1>>0
Optimize transfer time alone gives 1+ 4/21/2019 AZ
49
Outline Multimedia Systems [GKKTZ 00]
Maximize the total clients served Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed moves 4/21/2019 AZ
50
Load Rebalancing Access pattern changes
Initial layout no longer balanced MAX LOAD 1 3 6 9 7 4 10 2 8 5 11 4/21/2019 AZ
51
Load Rebalancing Relocate objects Minimize the max load with k moves
9 1 6 3 4 7 10 2 5 8 11 4/21/2019 AZ
52
Simple Algorithm (O(nlogn))
Step 1: Repeat k times Remove the largest object from the most loaded disk The resulting max load: L(1) Step2: Relocate the removed k objects Assign each object to the least loaded disk The resulting max load: L(2) 4/21/2019 AZ
53
Example (k=3) Step1: L(1) OPT 9 1 6 MAX LOAD 9 MAX LOAD 1 L(1) 6 3 4
7 10 2 5 8 11 4/21/2019 AZ
54
Example (k=3) Step2: L(2) OPT + S 2OPT
Overall: max(L(1),L(2)) 2OPT 9 1 6 L(2) 9 1 3 MIN LOAD 6 MIN LOAD 4 7 10 2 5 8 11 4/21/2019 AZ
55
Can We Do Better? Blindly remove the large object is not wise MAX LOAD
9 1 6 3 4 7 10 2 5 8 11 4/21/2019 AZ
56
How can we do better Take care of large objects
Large objects: size >1/2OPT Small objects: size 1/2OPT OPT 10 9 1 2 11 6 3 4 7 5 8 4/21/2019 AZ
57
Revising The Plan Step 1: Repeat k times
Remove the largest object from the most loaded disk The resulting max load: L(1) OPT Step2: Relocate the removed k objects Assign each object to the least loaded disk The resulting max load: L(2) OPT +S 2OPT 4/21/2019 AZ
58
Revised Plan Step 1: with no more than k moves
Shuffle large objects and remove small objects The resulting max load: L(1) 3/2 OPT Step2: Relocate the removed objects Assign each object to the least loaded disk (they are all small) The resulting max load: L(2) OPT +S 3/2 OPT just to fill in the space 4/21/2019 AZ
59
Example Step 1 2 10 11 MAX LOAD 9 1 3/2 OPT 1 6 3 4 7 10 2 5 8 11
4/21/2019 AZ
60
Example Step 2 2 10 11 OPT+S 2 9 1 MIN LOAD 10 11 MIN LOAD MIN LOAD
6 3 4 7 5 8 4/21/2019 AZ
61
Recap Fast 1.5-approximation (O(nlogn)) NP-complete
PTAS: generalized cost 4/21/2019 AZ
62
Summary Multimedia Systems [GKKTZ 00]
Maximize the total clients served Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed moves 4/21/2019 AZ
63
Other Research Interests
Algorithms for mobile, sensor networks and privacy preserving databases Online Algorithms: queue management, packet switching, web caching, scheduling Approximation Algorithms: network design, multi-product pricing Streaming Algorithms 4/21/2019 AZ
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.