Parallel Algorithms for Relational Operations
Many Processors... and Disks

There is a collection of processors.
– Often the number of processors p is large, in the hundreds or thousands.
– Of great importance to database processing is the fact that along with these processors come many disks, perhaps one or more per processor.
Shared-Nothing Architecture

This architecture, in which each processor has its own memory and disks, is the most commonly used for database operations.
Computing σ_C(R) in Parallel

Distribute data across as many disks as possible.
– Use some hash function h.
– Then, if there are p processors, divide relation R's tuples among the p processors' disks.
Each processor examines the tuples of R present on its own disk.
– The result, σ_C(R), is divided among the processors, just like R is.
Be careful about which attributes are used for distributing the data.
– E.g., for σ_{a=10}(R), do not distribute on attribute a: every tuple satisfying the selection would hash to the same processor, leaving the others idle.
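The distribute-then-scan-locally scheme can be sketched as follows; the toy relation, the predicate a = 10, and the hash function are illustrative assumptions, not part of the slides:

```python
# Sketch: hash-partition R across p processors, then evaluate the
# selection sigma_{a=10}(R) locally at each site.

P = 4  # number of processors (assumed)

# R(a, b): a toy relation as a list of tuples
R = [(1, 'x'), (10, 'y'), (10, 'z'), (7, 'w'), (3, 'v')]

def h(t):
    # Hash on attribute b, NOT on a: if we hashed on a, every tuple
    # satisfying a = 10 would land on a single processor.
    return hash(t[1]) % P

# Distribute R's tuples among the p "disks"
partitions = [[] for _ in range(P)]
for t in R:
    partitions[h(t)].append(t)

# Each processor scans only its own partition; the result is itself
# distributed across the processors, just like R.
result = [t for part in partitions for t in part if t[0] == 10]
print(sorted(result))  # the two tuples with a = 10
```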
Joins: R(X,Y) ⋈ S(Y,Z)

Hash the tuples of R and S, using the same hash function on Y, and distribute them among the p processors.
– Reading every block once requires B(R) + B(S) disk I/O's.
– We must then ship ((p−1)/p)(B(R) + B(S)) blocks of data across the network, since on average only a 1/p fraction of each relation is already at the right processor.
Cost of shipment: we assume that shipment across the network is significantly cheaper than movement of data between disk and memory, because no physical motion is involved in network shipment, while there is for disk I/O.
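A quick check of the two cost formulas above; the block counts and processor count below are chosen only for illustration:

```python
# Sketch: disk-I/O and network-shipment cost of the hash-distribution
# step, using the slide's formulas. Inputs are assumed example values.

def distribution_costs(b_r, b_s, p):
    read_ios = b_r + b_s                  # read every block of R and S once
    shipped = (p - 1) / p * (b_r + b_s)   # a 1/p fraction is already local
    return read_ios, shipped

read_ios, shipped = distribution_costs(1000, 500, 10)
print(read_ios, shipped)  # 1500 blocks read, 1350.0 blocks shipped
```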
Joins (cont.)

Each receiving processor stores the arriving data on its own disk, then executes a local join on the tuples received.
– Each processor receives approximately B(R)/p blocks of R and B(S)/p blocks of S.
– If we used a two-pass sort-join at each processor, we would do 3(B(R) + B(S))/p disk I/O's at each processor.
Add another 2(B(R) + B(S))/p disk I/O's per processor, to account for:
– the first read of each tuple during hashing and distribution, and
– the storing of each tuple by the receiving processor during hashing and distribution.
This is more I/O in total: 5 I/O's per block of data rather than 3. However, the elapsed time, as measured by the number of I/O's performed at each processor, drops from 3(B(R) + B(S)) to 5(B(R) + B(S))/p, a significant win for large p.
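The accounting above can be verified with a small calculation; the block counts and processor count are illustrative assumptions:

```python
# Sketch of the slide's cost accounting: 5 I/O's per block in total,
# but elapsed (per-processor) I/O drops by roughly a factor of p.

def parallel_sort_join_elapsed(b_r, b_s, p):
    per_proc = (b_r + b_s) / p
    local_join = 3 * per_proc     # two-pass sort-join at each processor
    distribution = 2 * per_proc   # first read + store on arrival
    return local_join + distribution  # = 5 * (B(R) + B(S)) / p

uniprocessor = 3 * (1000 + 500)                        # 4500 I/O's
parallel_elapsed = parallel_sort_join_elapsed(1000, 500, 10)
print(uniprocessor, parallel_elapsed)                  # 4500 vs 750.0
```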
We Can Do Even Better: Example

B(R) = 1000 blocks, B(S) = 500 blocks, M = 101 buffers per processor, p = 10 processors.
We begin by hashing each tuple of R and S to one of 10 "buckets," using a hash function h that depends only on the join attributes Y.
These 10 "buckets" represent the 10 processors, and tuples are shipped to the processor corresponding to their "bucket."
The total number of disk I/O's needed to read the tuples of R and S is 1500, or 150 per processor.
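The key invariant behind this scheme is that, because R and S are hashed with the same function on Y, any pair of tuples that can join is sent to the same processor. A minimal sketch, with toy relations and a stand-in hash function as assumptions:

```python
# Sketch: R and S tuples hashed with the SAME function h on the join
# attribute Y always meet at one processor. Relations are toy data.

P = 10  # processors, one "bucket" each

def h(y):
    return y % P  # deterministic stand-in for the slide's hash function

R = [(x, x % 7) for x in range(30)]   # toy R(X, Y)
S = [(y, 10 * y) for y in range(7)]   # toy S(Y, Z)

# Joining pairs share a Y value, hence a bucket:
co_located = all(h(ry) == h(sy)
                 for _, ry in R for sy, _ in S if ry == sy)
print(co_located)  # True
```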
Example (cont.)

Each processor will have about 15 blocks worth of data destined for each other processor, so it ships 135 blocks to the other nine processors.
We arrange that the processors ship the tuples of S before the tuples of R.
Since each processor receives about 50 blocks of S-tuples, it can store those tuples in a main-memory data structure, using 50 of its 101 buffers.
Then, when processors start sending R-tuples, each one is compared with the local S-tuples, and any resulting joined tuples are output.
In this way, the only disk cost of the join is the 1500 I/O's spent reading R and S, much less than for any other method discussed.
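The receive-S-then-stream-R strategy is essentially a one-pass hash join at each processor; a minimal sketch, with made-up tuple values standing in for the partitioned data:

```python
# Sketch: per-processor one-pass join. The small S partition that
# arrives first is held in memory; R-tuples are then streamed past it
# and probed, so R is never written to local disk.

from collections import defaultdict

# Tuples this processor received, already partitioned on Y (assumed)
s_local = [(1, 'z1'), (2, 'z2')]               # S(Y, Z), arrives first
r_stream = [('x1', 1), ('x2', 1), ('x3', 3)]   # R(X, Y), arrives later

# Build phase: index the S partition in main-memory buffers
table = defaultdict(list)
for y, z in s_local:
    table[y].append(z)

# Probe phase: each arriving R-tuple is joined immediately, so the
# join itself costs no extra disk I/O's.
joined = [(x, y, z) for x, y in r_stream for z in table[y]]
print(joined)  # [('x1', 1, 'z1'), ('x2', 1, 'z1')]
```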
Example (cont.)

Moreover, the elapsed time is primarily the 150 disk I/O's performed at each processor, plus the time to ship tuples between processors and perform the main-memory computations.
Note that 150 disk I/O's is less than 1/10th of the time to perform the same algorithm on a uniprocessor:
– we have gained not only because 10 processors were working for us, but also because the total of 1010 buffers among those 10 processors gives us additional efficiency.