1 SDDS-2000: A Prototype System for Scalable Distributed Data Structures on Windows 2000
Witold Litwin, Witold.Litwin@dauphine.fr
2 Plan
- What are SDDSs?
- Where are we in 2002?
– LH*: Scalable Distributed Hash Partitioning
– Scalable Distributed Range Partitioning: RP*
– High Availability: LH*RS & RP*RS
– DBMS coupling: AMOS-SDDS & SD-AMOS
- Architecture of SDDS-2000
- Experimental performance results
- Conclusion
– Future work
3 What is an SDDS
- A new type of data structure
– specifically for multicomputers
- Designed for data-intensive files:
– horizontal scalability to very large sizes
» larger than any single-site file
– parallel and distributed processing
» especially in (distributed) RAM
– record access time better than for any disk file
– 100-300 μs usually under Windows 2000 (100 Mb/s net, 700 MHz CPU, 100 B - 1 KB records)
– access by multiple autonomous clients
4 Killer apps
- Any traditional application using a large hash, B-tree, or k-d file
– access time to a RAM SDDS record is about 100 times faster than to a disk record
- Network storage servers (SAN and NAS)
- DBMSs
- WEB servers
- Video servers
- Real-time systems
- High-performance computing
5 Multicomputers
- A collection of loosely coupled computers
– common and/or preexisting hardware
– share-nothing architecture
– message passing through a high-speed net (Mb/s)
- Network multicomputers
– use general-purpose nets
» LANs: Ethernet, Token Ring, Fast Ethernet, SCI, FDDI...
» WANs: ATM...
- Switched multicomputers
– use a bus or a switch
» e.g., IBM SP2, Parsytec
6 Typical Network Multicomputer (diagram: clients and servers spread over interconnected network segments)
7 Why multicomputers?
- Potentially unbeatable price-performance ratio
– much cheaper and more powerful than supercomputers
» 1500 WSs at HPL with 500+ GB of RAM & TBs of disks
- Potential computing power
– file size
– access and processing time
– throughput
- For more pros & cons:
– Bill Gates at Microsoft Scalability Day
– NOW project (UC Berkeley)
– Tanenbaum: "Distributed Operating Systems", Prentice Hall, 1995
– www.microsoft.com White Papers from the Business Systems Div.
8 Why SDDSs
- Multicomputers need data structures and file systems
- Trivial extensions of traditional structures are not the best:
– hot-spots
– scalability
– parallel queries
– distributed and autonomous clients
– distributed RAM & distance to data
9 Distance to data (Jim Gray)
– RAM: 100 ns
– distant RAM (gigabit net): 1 μs
– distant RAM (Ethernet): 100 μs
– local disk: 10 ms
10 Distance to Data (Jim Gray), on a human scale (diagram): the same latencies (RAM, distant RAM over a gigabit net, distant RAM over Ethernet, local disk at 10 ms) translate roughly into 1 minute, 10 minutes, 2 hours, and 8 days; the local disk is as far away as the Moon.
11 Scalability Dimensions (Client view) (chart): operation time in distant RAM and # operations/s versus data size and # of servers and clients; scale-up is linear in the ideal case, sub-linear in the usual case.
12 Scalability Dimensions (Client view, SDDS-specific) (chart): operation time versus data size for a single computer (local RAM, then cache & disk, then tapes/juke-box), a cluster computer (fixed number of servers), and a multicomputer with an SDDS; the SDDS keeps the scale-up near-linear where the others degrade sub-linearly.
13 Scalability Dimensions (Client view) (chart): # operations/s versus # servers; speed-up is linear in the ideal case, sub-linear in the usual case.
14 What is an SDDS?
+ Queries come from multiple autonomous clients
+ Data are on servers
+ Data are structured
+ records with keys, objects with OIDs
+ more semantics than in the Unix flat-file model
+ abstraction most popular with applications
+ parallel scans & function shipping
+ Overflowing servers split into new servers
15-18 An SDDS (animation): clients on one side, servers on the other; the file grows through splits under inserts, spreading over more and more servers.
19 SDDS: Addressing Principles
+ SDDS clients:
+ are not informed about the splits
+ do not access any centralized directory for record address computations
+ each has a more or less adequate private image of the actual file structure
+ can make addressing errors
+ sending queries or records to incorrect servers
+ searching for a record that was moved elsewhere by splits
+ sending a record that should be elsewhere for the same reason
20
20, Servers are able to forward the queries to the correct address –perhaps in several messages + Servers may send Image Adjustment Messages »Clients do not make same error twice Servers supports parallel scans Sent out by multicast or unicast With deterministic or probabilistic termination n See the SDDS talk & papers for more –ceria.dauphine.fr/witold.html n Or the LH* ACM-TODS paper (Dec. 96) What is an SDDS ? SDDS : Addressing Principles
21-25 An SDDS: client access (animation): a client sends a request computed from its image, possibly to an outdated bucket; the servers forward it to the correct bucket, which answers and sends an IAM back to the client.
26 Known SDDSs (family tree, from 1993): DS classics lead to hash schemes (LH*, DDH, Breitbart & al.), 1-d tree schemes (RP*, Kroll & Widmayer, Breitbart & Vingralek), m-d tree schemes (k-RP*, dPi-tree, Nardelli-tree), s-availability schemes (LH*SA, LH*RS), disk-oriented SDLSA, LH*m and LH*g, and security-oriented variants. Bibliography: http://ceria.dauphine/SDDS-bibliograhie.html
27 LH* (a classic)
- Scalable distributed hash partitioning
– transparent for the applications
» unlike the current static schemes (e.g., DB2)
– generalizes the LH addressing schema
» used in many products: SQL Server, IIS, MS Exchange, FrontPage, Netscape suites, Berkeley DB Library, LH-Server, Unify...
- Typical load factor 70-90%
- In practice, at most 2 forwarding messages
– regardless of the size of the file
- In general, 1 message/insert and 2 messages/search on average
- 4 messages in the worst case
- Several variants are known
– LH*LH is the most studied
28 Overview of LH
- Extensible hash algorithm
– widely used, e.g.:
» Netscape browser (100M copies)
» LH-Server by AR (700K copies sold)
» MS FrontPage, Exchange, IIS
– taught in most DB and DS classes
– the address space expands
» to avoid overflows & access-performance deterioration
- The file has buckets with capacity b >> 1
- Hash by division h_i : c -> c mod 2^i * N provides the address h_i(c) of key c (see the sketch below)
- Buckets split through the replacement of h_i with h_(i+1), i = 0, 1, ...
- On average, b/2 keys move to the new bucket
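To make the hash-by-division family concrete, here is a minimal C++ sketch (our own naming, not the SDDS-2000 API), assuming N = 1 initial bucket as in the slides that follow:

```cpp
#include <cstdint>

// h_i(c) = c mod (2^i * N): the LH hash-by-division family.
// Replacing h_i by h_(i+1) at a split sends roughly half of a
// bucket's keys to the new bucket, since one more key bit is used.
constexpr uint64_t N = 1;   // initial number of buckets (assumption)

uint64_t h(unsigned i, uint64_t c) {
    return c % ((1ULL << i) * N);
}
```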
29 Overview of LH (cont.)
- Basically, a split occurs when some bucket m overflows
- One splits bucket n, pointed to by the split pointer n (a split sketch follows below)
– usually m ≠ n
- n evolves: 0; 0,1; 0,1,..,2; 0,1,..,3; 0,..,7; ...; 0,..,2^i * N; 0,..
- One consequence: no index
– an index is characteristic of the other EH (extensible hashing) schemes
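A hedged sketch of one split step as described above (our own in-memory layout, not the SDDS-2000 bucket format): bucket n is rehashed with h_(i+1), keys that no longer map to n move to the new bucket n + 2^i * N, and the split pointer advances, bumping i when a full turn completes.

```cpp
#include <cstdint>
#include <vector>

struct LHFile {
    static constexpr uint64_t N = 1;             // initial bucket count (assumption)
    std::vector<std::vector<uint64_t>> buckets;  // keys of each bucket
    unsigned i = 0;                              // file level
    uint64_t n = 0;                              // split pointer

    static uint64_t h(unsigned lvl, uint64_t c) { return c % ((1ULL << lvl) * N); }

    void split() {                               // split bucket n with h_(i+1)
        buckets.emplace_back();                  // new bucket gets address n + 2^i * N
        std::vector<uint64_t> keep;
        for (uint64_t c : buckets[n])
            (h(i + 1, c) == n ? keep : buckets.back()).push_back(c);
        buckets[n] = std::move(keep);
        if (++n == (1ULL << i) * N) { n = 0; ++i; }  // pointer evolution 0 .. 2^i*N - 1
    }
};
```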
30-37 LH File Evolution (animation). Initially one bucket (N = 1, b = 4, i = 0, n = 0) holds keys 35, 12, 7, 15, 24 under h_0. The overflow triggers a split with h_1: bucket 0 keeps 12, 24 and the new bucket 1 gets 35, 7, 15 (i = 1). Inserts of 32, 58 (bucket 0) and 21, 11 (bucket 1) follow; bucket 0 then splits under h_2 into bucket 0 (32, 12, 24) and bucket 2 (58). After 33 is inserted, bucket 1 splits under h_2 into bucket 1 (33, 21) and bucket 3 (11, 35, 7, 15); the split pointer returns to n = 0 and i becomes 2.
38 LH File Evolution (cont.)
- Etc.: one starts h_3, then h_4, ...
- The file can expand as much as needed
– without ever too many overflows
39 Addressing Algorithm
a <- h_i(c) ;
if n = 0 then exit ;
else if a < n then a <- h_(i+1)(c) ;
end
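The same rule in C++ (a sketch with our own naming, not the SDDS-2000 interface):

```cpp
#include <cstdint>

// LH addressing: compute a = h_i(c); if a falls before the split
// pointer n, the bucket has already split, so recompute with h_(i+1).
uint64_t lh_address(uint64_t c, unsigned i, uint64_t n, uint64_t N = 1) {
    uint64_t a = c % ((1ULL << i) * N);          // a <- h_i(c)
    if (n != 0 && a < n)
        a = c % ((1ULL << (i + 1)) * N);         // a <- h_(i+1)(c)
    return a;
}
```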
40 LH*
- Property of LH:
– given j = i or j = i + 1, key c is in bucket m iff h_j(c) = m
» verify it yourself
- Ideas for LH*:
– the LH addressing rule = the global rule for the LH* file
– every bucket is at a server
– the bucket level j is kept in the bucket header
– check the LH property when the key comes from a client
41-45 LH*: file structure and split (animation). The servers hold buckets 0..9; buckets 0, 1, 8, 9 have level j = 4 and buckets 2..7 have j = 3. The coordinator keeps n = 2, i = 3; two clients hold private images (n' = 0, i' = 0) and (n' = 3, i' = 2). A split of bucket 2 creates bucket 10 with j = 4, raises bucket 2's level to 4, and advances the coordinator to n = 3.
46-47 LH* Addressing Schema
- Client:
– computes the LH address m of c using its image
– sends c to bucket m
- Server:
– server a receiving key c (a = m in particular) computes (sketch below):
a' := h_j(c) ;
if a' = a then accept c ;
else a'' := h_(j-1)(c) ;
if a'' > a and a'' < a' then a' := a'' ;
send c to bucket a' ;
- Simple? See [LNS93] for the (long) proof.
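A sketch of the server-side check-and-forward step above, under the same naming assumptions as the earlier LH sketches (an illustration, not the SDDS-2000 source):

```cpp
#include <cstdint>

uint64_t hj(unsigned j, uint64_t c, uint64_t N = 1) { return c % ((1ULL << j) * N); }

// Server a, with bucket level j (j >= 1 assumed), receives key c.
// Returns the bucket that should get c next; returning a means "accept".
uint64_t lhstar_forward(uint64_t a, unsigned j, uint64_t c) {
    uint64_t a1 = hj(j, c);
    if (a1 == a) return a;                  // the key is ours: accept it
    uint64_t a2 = hj(j - 1, c);
    if (a2 > a && a2 < a1) a1 = a2;         // at most one more hop is ever needed
    return a1;                              // forward c to bucket a1
}
```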
48 Client Image Adjustment
- The IAM consists of the address a where the client sent c and of j(a)
– i' is the presumed i in the client's image
– n' is the presumed value of pointer n in the client's image
– initially, i' = n' = 0
if j > i' then i' := j - 1, n' := a + 1 ;
if n' >= 2^i' then n' := 0, i' := i' + 1 ;
- The algorithm guarantees that the client image is within the file [LNS93]
– if there are no file contractions (merges)
(a C++ sketch follows below)
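A sketch of this image adjustment on the client (our naming; the 2^i' test assumes N = 1 as in the examples above):

```cpp
#include <cstdint>

struct ClientImage { unsigned i = 0; uint64_t n = 0; };   // i', n' (initially 0)

// The IAM carries the address a the key was sent to and that bucket's level j.
void apply_iam(ClientImage& img, uint64_t a, unsigned j) {
    if (j > img.i) {                       // the image was behind the file
        img.i = j - 1;
        img.n = a + 1;
    }
    if (img.n >= (1ULL << img.i)) {        // assumes N = 1 initial bucket
        img.n = 0;
        ++img.i;
    }
}
```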
49-55 LH*: addressing (animation). A client whose image is (n' = 0, i' = 0) sends key 15, and later key 9, to bucket 0; the servers check the keys against their levels j and forward them to the correct buckets, and the IAMs adjust the client's image to (n' = 1, i' = 3).
56 Result
- The distributed file can grow to even the whole Internet, so that:
– every insert and search is done in at most four messages (IAM included)
– in general, an insert is done in one message and a search in two messages
– proof in [LNS 93]
57-60 Measured performance (charts): 10,000 inserts, global cost and the client's cost; inserts by two clients.
61 Parallel Queries
- A query Q for all buckets of file F, with independent local executions
– every bucket should get Q exactly once
- The basis for function shipping
– fundamental for high-performance DBMS applications
- Send mode:
– multicast
» not always possible or convenient
– unicast
» the client may not know all the servers
» servers have to forward the query
– how?
62 LH* Algorithm for Parallel Queries (unicast)
- The client sends Q to every bucket a in its image
- The message carrying Q has a message level j':
– initially j' = i' if a >= n', else j' = i' + 1
– bucket a (of level j) copies Q to all its children using (sketch below):
while j' < j do
j' := j' + 1 ;
forward (Q, j') to bucket a + 2^(j'-1) ;
endwhile
- Prove it!
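A sketch of the propagation loop (the forward callback is a placeholder for the actual messaging layer, which is not shown here):

```cpp
#include <cstdint>
#include <functional>

// Bucket a, of level j, received query Q with message level j_msg.
// It covers its "children" a + 2^(j'-1), incrementing j' at each forward.
void propagate(uint64_t a, unsigned j, unsigned j_msg,
               const std::function<void(uint64_t, unsigned)>& forward) {
    while (j_msg < j) {
        ++j_msg;
        forward(a + (1ULL << (j_msg - 1)), j_msg);   // forward (Q, j') to bucket a + 2^(j'-1)
    }
}
```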
63 Termination of Parallel Query (multicast or unicast)
- How does client C know that the last reply came?
- Deterministic solution (expensive):
– every bucket sends its j, its m, and the selected records if any
» m is its (logical) address
– the client terminates when it has every m fulfilling the condition:
» m = 0, 1, ..., 2^i + n, where i = min(j) over the replies and n = min(m) among the replies with j = i + 1
64 Termination of Parallel Query (multicast or unicast)
- Probabilistic termination (may need less messaging):
– all and only the buckets with selected records reply
– after each reply, C reinitialises a time-out T
– C terminates when T expires
- The practical choice of T is network- and query-dependent
– e.g., 5 times the average Ethernet retry time
» 1-2 ms?
– experiments needed
- Which termination is finally more useful in practice?
– an open problem
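A minimal sketch of the probabilistic termination loop, assuming a hypothetical receive(T) callback that blocks up to T for the next reply (not the SDDS-2000 API):

```cpp
#include <chrono>
#include <functional>
#include <optional>

struct Reply { /* selected records ... */ };

// receive(T) returns the next reply, or nothing once T expires with no reply.
void collect_replies(std::chrono::milliseconds T,
                     const std::function<std::optional<Reply>(std::chrono::milliseconds)>& receive) {
    while (auto r = receive(T)) {
        (void)r;   // process the selected records here
        // getting a reply re-arms the time-out: the next receive(T) waits a full T again
    }
    // T expired with no further reply: the scan is considered terminated
}
```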
65 LH* variants
- With/without load (factor) control
- With/without the (split) coordinator
– the former was discussed
– the latter is a token-passing schema
» the bucket with the token is the next to split
– it splits if an insert occurs and file overload is guessed
– several algorithms exist for the decision
– cascading splits are used
- See the talk on LH* at http://ceria.dauphine.fr/
66 RP* schemes
- Produce scalable distributed 1-d ordered files
– for range search
- Each bucket (server) has the unique range of keys it may contain
- The ranges partition the key space
- The ranges evolve dynamically through splits
– transparently for the application
- Use RAM m-ary trees at each server
– like B-trees
– optimized for RP* split efficiency
(a client-side sketch follows below)
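As an illustration of range partitioning on the client side, here is a hedged sketch (our own data layout, not the SDDS-2000 image format) that maps a key to the bucket whose range contains it:

```cpp
#include <map>
#include <string>

struct RangeImage {
    // maps the lower bound of each bucket's half-open key range to a bucket/server id
    std::map<std::string, int> lower_bound_to_bucket;

    int bucket_for(const std::string& key) const {
        auto it = lower_bound_to_bucket.upper_bound(key);  // first lower bound > key
        --it;                                              // the range containing key
        return it->second;
    }
};

// Usage sketch: ranges (-inf,"g") -> 0, ["g","p") -> 1, ["p",+inf) -> 2
// RangeImage img{{{"", 0}, {"g", 1}, {"p", 2}}};
// img.bucket_for("litwin");  // -> 1
```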
67 Current PDBMS technology (e.g., Non-Stop SQL)
- Static range partitioning
- Done manually by the DBA
- Requires good skills
- Not scalable
68 RP* schemes (diagram)
69 High-availability SDDS schemes
- Data remain available despite:
– any single-server failure & most two-server failures
– or any failure of up to n servers
– and some catastrophic failures
- n scales with the file size
– to offset the reliability decline that would otherwise occur
- Three principles for high-availability SDDS schemes are currently known:
– mirroring (LH*m)
– striping (LH*s)
– grouping (LH*g, LH*SA, LH*RS, RP*RS)
- They realize different performance trade-offs
70 LH*RS: Record Groups
- LH*RS records
– LH* data records & parity records
- Records with the same rank r in the bucket group form a record group
- Each record group gets n parity records
– computed using Reed-Solomon erasure-correcting codes
» additions and multiplications in Galois Fields
» see the Sigmod 2000 paper on the Web site for details
- r is the common key of these records
- Each group supports unavailability of up to n of its members
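For intuition only, the sketch below groups the records of rank r across the data buckets of a group and computes a single parity record. Real LH*RS uses Reed-Solomon codes over a Galois field (see the Sigmod 2000 paper); the plain XOR parity used here is a stand-in that tolerates only one unavailable bucket:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Parity record of rank r = XOR of the r-th record of every data bucket in the group.
std::string parity_for_rank(const std::vector<std::vector<std::string>>& group, size_t r) {
    std::string p;
    for (const auto& bucket : group) {
        if (r >= bucket.size()) continue;                      // bucket has no record of rank r
        if (p.size() < bucket[r].size()) p.resize(bucket[r].size(), '\0');
        for (size_t b = 0; b < bucket[r].size(); ++b) p[b] ^= bucket[r][b];
    }
    return p;
}
```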
71 LH*RS Record Groups (diagram: data records and their parity records)
72 LH*RS: Parity Management
- An insert of a data record with rank r creates or, usually, updates the parity records r
- An update of a data record with rank r updates the parity records r
- A split recreates parity records
– a data record usually changes its rank after the split
73 LH*RS Scalable availability
- Create 1 parity bucket per group until M = 2^(i1) buckets
- Then, at each split:
– add a 2nd parity bucket to each existing group
– create 2 parity buckets for new groups until 2^(i2) buckets
- etc.
74-78 LH*RS Scalable availability (animation: parity buckets being added as the file grows)
79-80 SDDS-2000: global architecture (diagram): applications run against the SDDS-2000 client, which talks to SDDS-2000 servers (SDDS data servers) over the network using UDP & TCP.
81 SDDS-2000: Client Architecture (RP*c)
- 2 modules: Send Module and Receive Module
- Multithreaded architecture
– threads: SendRequest, ReceiveRequest, AnalyzeResponse 1..4, GetRequest, ReturnResponse
- Synchronization queues
- Client images
- Flow control
82 SDDS-2000: Server Architecture (RP*c)
- Multithreaded architecture with synchronization queues
- Listen Thread for incoming requests
- SendAck Thread for flow control
- Work Threads for request processing, response sendout, and request forwarding
- UDP for shorter messages (< 64 KB), TCP/IP for longer data exchanges
- Several buckets of different SDDS files per server
(a minimal threading sketch follows below)
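A minimal, self-contained sketch of this thread layout (not the SDDS-2000 source): a listen thread pushes incoming requests into a synchronization queue, and work threads pop and process them; the real server would additionally send responses and acknowledgements and forward mis-addressed requests.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

struct Request { std::string payload; };

class SyncQueue {                      // one of the "synchronization queues"
    std::queue<Request> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(Request r) { { std::lock_guard<std::mutex> l(m_); q_.push(std::move(r)); } cv_.notify_one(); }
    Request pop() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [&]{ return !q_.empty(); });
        Request r = std::move(q_.front()); q_.pop(); return r;
    }
};

int main() {
    SyncQueue work;
    std::thread listener([&]{           // ListenThread: would read UDP datagrams (< 64 KB)
        for (int i = 0; i < 4; ++i) work.push({"request " + std::to_string(i)});
    });
    std::vector<std::thread> workers;
    for (int w = 0; w < 2; ++w)
        workers.emplace_back([&]{       // WorkThread: process, then respond or forward
            for (int i = 0; i < 2; ++i) (void)work.pop();
        });
    listener.join();
    for (auto& t : workers) t.join();
}
```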
83 AMOS-SDDS Architecture
- For database queries
– especially parallel scans
- Couples SDDS-2000 and AMOS-II:
– a RAM OR-DBMS
– AMOSQL declarative query language
» can be embedded into C and Lisp
– call-level interface (callin)
– external procedures (functions) (callout)
- See the AMOS-II talk & papers for more: http://www.dis.uu.se/~udbl/
84 AMOS-SDDS Architecture (cont.)
- SDDS is used as the distributed RAM storage manager
– the RP* scheme for scalable distributed range partitioning
– like in a B-tree, records are lexicographically ordered by their keys
– supports range queries efficiently
- AMOS-II provides a fast SDDS-based RAM OR-DBMS
- The callout capability realizes the AMOSQL object-relational feature usually called external or foreign functions
85-86 AMOS-SDDS Architecture (diagrams: AMOS-SDDS scalable distributed query processing)
87 AMOS-SDDS Server Query Processing
- E-strategy
– data stay external to AMOS
» within the SDDS bucket
– custom foreign functions perform the query
- I-strategy
– data are imported on the fly into AMOS-II
» perhaps with local index creation
– good for joins
– AMOS performs the query
- Which strategy is preferable?
– a good question
88 SD-AMOS
- The server storage manager is a full-scale DBMS
– AMOS-II in our case, since it is a RAM DBMS
– could be any DBMS
- SDDS-2000 provides the scalable distributed partitioning schema
- The server DBMS performs the splits
– when and how???
- The client manages scalable query decomposition & execution
– easier said than done
- The whole system generalizes PDBMS technology
– which offers static partitioning only
89 Scalability Analysis
- Theoretical
– to validate an SDDS
» see the papers
– to get an idea of system performance
» limited validity
- Experimental
– more accurate validation of design issues
– a practical necessity
– costs orders of magnitude more time and money
90 Experimental Configuration
- 6 machines: 700 MHz Pentium III
- 100 Mb/s Ethernet
- 150-byte records
91 LH* file creation (Ph.D. thesis of F. Bennour, 2000)
- Bucket size b = 5.000; flow control on
- Chart: time (ms) versus # of inserts and # of buckets, with and without splits
- LH* scalability is confirmed
- Performance is bound by the client's processing speed
92 LH* key search
- Chart: time (ms) versus file size in records and # of servers (2 to 5), for the actual client image and for a new client image with IAMs
- Performance is bound by the client's processing speed
93-95 LH*RS Experimental Performance (preliminary results)
- Charts: insert time during file creation (moving average) and total file creation time, versus the number of records
- Normal key search
– unaffected by the parity calculus
» about 0.3 ms per key search
- Degraded key search
– about 2 ms for the application
» 1.1 ms (k = 4) for the record recovery
» 1 ms for the client time-out and the coordinator action
- Bucket recovery at the spare
– 0.3 ms per record (k = 4)
99 Performance Analysis (RP*): Experimental Environment
- Six Pentium III 700 MHz machines
– Windows 2000
– 128 MB RAM, extended later to 256 MB RAM
– 100 Mb/s Ethernet
- Messages
– 180 bytes: 80 for the header, 100 for the record
– keys are random integers within some interval
– flow control: sliding window of 10 messages
- Index
– capacity of an internal node: 80 index elements
– capacity of a leaf: 100 records
100 Performance Analysis: File Creation
- Bucket capacity: 50.000 records
- 150.000 random inserts by a single client
- With flow control (FC) or without
- Charts: file creation time and average insert time
101 Discussion
- Creation time is almost linearly scalable
- Flow control is quite expensive
– losses without it were negligible
- Both schemes perform almost equally well
– RP*c slightly better, as one could expect
- Insert time about 30 times faster than for a disk file
- Insert time appears bound by the client speed
102 Performance Analysis: File Creation
- File created by 120.000 random inserts by 2 clients
- Without flow control
- Charts: file creation by two clients (total time and per insert); comparative file creation time by one or two clients
103 Discussion
- Performance improves
- Insert times appear bound by a server's speed
- More clients would not improve the performance of a server
104 Performance Analysis: Split Time
- Chart: split times versus bucket capacity
105 Discussion
- Roughly linear scalability as a function of bucket size
- Larger buckets are more efficient
- Splitting is very efficient
– reaching as little as 40 μs per record
106 Performance Analysis: Key Search
- A single client sends 100.000 successful random search requests
- Flow control: the client sends at most 10 requests without a reply
- Chart: search time (ms)
107 Performance Analysis: Key Search
- Same experiment: a single client sends 100.000 successful random search requests, with at most 10 outstanding requests
- Charts: total search time and search time per record
108 Discussion
- Single-search time is about 30 times faster than for a disk file
– about 350 μs per search
- Search throughput is more than 65 times faster than that of a disk file
– about 145 μs per search
- RP*N again appears surprisingly efficient with respect to RP*c for more buckets
109 Performance Analysis: Range Query
- Deterministic termination
- Parallel scan of the entire file, with all 100.000 records sent to the client
- Charts: range query total time and range query time per record
110 Discussion
- Range search also appears very efficient
– reaching 10 μs per record delivered
- More servers should further improve the efficiency
– the curves do not become flat yet
111 Range Query: Parallel Execution Strategies (study by M. Tsangou (Master's thesis) & Prof. Samba, U. Dakar)
- Chart: response time (ms) versus # of servers
– Sc. 3: 1 server at a time
– Sc. 1, 2: all servers together
– Sc. 1: single connection request per server
112 File Size Limits
- Bucket capacity: 751K records, 196 MB
- Number of inserts: 3M
- Flow control (FC) is necessary to limit the input queue at each server
113 File Size Limits
- Bucket capacity: 751K records, 196 MB
- Number of inserts: 3M
- GA: Global Average; MA: Moving Average
114 Related Works: Comparative Analysis (table; one competing result is marked as suspicious)
115 AMOS-SDDS
- Benchmark data
– table Pers (SS#, Name, City)
– size 20.000 to 300.000 tuples
– 50 cities
– random distribution
- Benchmark queries
– Join: SS# and Name of persons in the same city
» nested loop or local index
– Count & Join: count the couples in the same city
» to determine the result transfer time to the client
– Count(*) on Pers, Max(SS#) from Pers
- Measures
– scale-up & speed-up
– comparison to AMOS-II alone
116 Join: best time per tuple (chart)
- 20.000 tuples in Pers; 3,990,070 tuples produced
- AMOS-II alone: 13.5 (nested loop), 2.25 (index lookup); values on the slide: 14.4, 2.4, 1.6
117 Join & Count: best time per tuple, I-strategy (chart)
- 20.000 tuples in Pers; 3,990,070 tuples produced
- AMOS-II alone: 13.5 (nested loop), 2.25 (index lookup); values on the slide: 1.8, 1.0
118 Join: Speed-up, I-strategy (chart)
- 20.000 tuples in Pers; 3,990,070 tuples produced
- AMOS-II alone: 13.5 (nested loop), 2.25 (index lookup)
119 Count: Speed-up (chart)
- 100.000 tuples in Pers
- E-strategy wins
- AMOS-II alone: 280 ms; value on the slide: 341
120 Join Scale-up Performance
- The file scales to 300.000 tuples
- Spreading from 1 to 15 AMOS-SDDS servers
– transparently for the application!
- 3 servers per machine
– the poor man's configuration has only 5 server machines
- Results are extrapolated to 1 server per machine
– basically, the CPU component of the elapsed time is divided by 3
121 Join: Elapsed Time Scale-up (chart: AMOS-SDDS, I-strategy with index-lookup join)
122 Join: Elapsed Time Scale-up
File size: 20 000 | 60 000 | 100 000 | 160 000 | 200 000 | 240 000 | 300 000
# SDDS servers: 1 | 3 | 5 | 8 | 10 | 12 | 15
Result size: 3 990 070 | 35 970 600 | 99 951 670 | 255 924 270 | 399 906 670 | 575 889 600 | 899 865 000
Actual Join Time (s): 61 | 301 | 684 | 1817 | 2555 | 3901 | 5564
Join & Count (s): 51 | 185 | 335 | 986 | 1270 | 2022 | 2624
Join w. extrapolation (s): 61 | 301 | 684 | 1324 | 1920 | 2553 | 3815
Join & Count w. extrapolation (s): 51 | 185 | 335 | 498 | 640 | 681 | 881
AMOS-II (s): 46 | 430 | 1201 | 3106 | 4824 | 6979 | 10 933
Index creation time (s): 3 | 29 | 87 | 230 | 360 | 520 | 819
123 Join: Time per Tuple Scale-up (chart)
- Join with Count is flat!
- Better scalability than any current P-DBMS could provide
124 SD-AMOS: File Creation (chart: insert time (ms), global and moving averages, versus # of inserts and # of servers; b = 4.000)
- Flat & unexpectedly fast insert time
125 SD-AMOS: Large File Creation (chart: insert time (ms), global and moving averages, versus # of inserts and # of servers; b = 40.000)
- The flat & fast insert time remains
126 SD-AMOS: Large File Search (time per record)
- File of 300K records
- The client's maximum processing speed is reached
127 SD-AMOS: Very Large File Creation
- Bucket size: 750K records
- Max file size: 3M records
- Record size: 100 B
128 Conclusion
- SDDS-2000: a prototype SDDS manager for a Windows multicomputer
– several variants of LH* and RP*
– high availability
– scalable distributed database query processing: AMOS-SDDS & SD-AMOS
129 Conclusion
- The experimental performance of the SDDS schemes appears in line with the expectations
– record search & insert times in the range of a fraction of a millisecond
– about 30 to 100 times faster than disk-file access performance
– about ideal (linear) scalability, including the query processing
- The results prove the overall efficiency of the SDDS-2000 system
130 Current & Future Work
- SDDS-2000 implementation
- High availability through RS codes
– CERIA & U. Santa Clara (Prof. Th. Schwarz)
- Disk-based high-availability SDDSs
– CERIA & IBM Almaden (J. Menon)
- Parallel queries
– U. Dakar (Prof. S. Ndiaye) & CERIA
- Concurrency & transactions
– U. Dakar (Prof. T. Seck)
- Overall performance analysis
- SD-AMOS & SD-DBMS in general
– CERIA & U. Uppsala (Prof. T. Risch) & U. Dakar
- SD-SQL-Server?
- The extent depends basically on available funding
131 Credits
- SDDS-2000 implementation
– CERIA Ph.D. students
» F. Bennour (now post-doc) (SDDS-2000; LH*)
» A. Wan Diene, Y. Ndiaye (SDDS-2000 RP* & AMOS-SDDS & SD-AMOS)
» R. Moussa (RS parity subsystem)
– Master's student theses
» at CERIA and cooperating universities
– see the CERIA Web page: ceria.dauphine.fr
- Partial support for SDDS-2000 research
– HPL, IBM Research, Microsoft Research
132 Problems & Exercises
- Install SDDS-2000 and experiment with the interactive application. Comment on the experience in a few pages.
- Create your favorite application using the .h interfaces provided with the package.
- Comment in a few pages on the LH*RS goal and the way it works, on the basis of the Sigmod paper and the WDAS-2002 paper by Rim Moussa.
- Comment in a few pages on the strategies for scalable distributed hash joins according to the paper by D. Schneider & al. Your own?
- Can you propose how to deal with theta-joins?
- You should split a table under one SQL Server into two tables on two SQL Servers. You wish to generate RP*-like range partitioning on the key attribute(s), using standard SQL queries as much as possible.
– How can you find the record with the median (middle) key?
– What are the catalog tables where you can find the table size, the associated check constraints, indexes, triggers, stored procedures... to move to the new node?
– How will you move the records that should leave for the new node?
- Idem for a stored function under AMOS
- Idem for Oracle
- Idem for DB2
133 END
Thank you for your attention.
Witold Litwin, litwin@dauphine.fr, wlitwin@cs.berkeley.edu