
1 Lookup Tables: Fine-Grained Partitioning for Distributed Databases. Aubrey L. Tatarowicz #1, Carlo Curino #2, Evan P. C. Jones #3, Sam Madden #4 # Massachusetts Institute of Technology, USA

2  To scale a distributed OLTP DBMS, partition the data horizontally across nodes.  To be effective, the partitioning strategy must minimize the number of nodes involved in each query or transaction.  The most common strategy is to horizontally partition the database using hash or range partitioning.

3  Many-to-many relationships are hard to partition: for social networking workloads, simple partitioning schemes turn a large fraction of queries/transactions into distributed ones.  While queries on the partitioning attribute go to a single partition, queries on other attributes must be broadcast to all partitions.

4  Fine-grained partitioning: related individual tuples are co-located in the same partition.  Partition index: specifies which partitions contain tuples matching a given attribute value, without partitioning the data by those attributes.

5  To solve both the fine-grained partitioning and partition index problems, we introduce lookup tables.  Lookup tables map from a key to a set of partition ids that store the corresponding tuples.  Lookup tables are small enough that they can be cached in memory on database query routers, even for large databases.
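
As a rough sketch of the idea (the class and method names here are illustrative, not an API from the paper), a lookup table is just an in-memory map from a key value to the set of partition ids that hold matching tuples:

import java.util.*;

// Illustrative sketch only: a lookup table maps a key value to the ids of the
// partitions that store the matching tuples.
class LookupTable {
    private final Map<Long, Set<Integer>> keyToPartitions = new HashMap<>();

    // Record that tuples with this key are stored on the given partition.
    void put(long key, int partitionId) {
        keyToPartitions.computeIfAbsent(key, k -> new HashSet<>()).add(partitionId);
    }

    // Partitions to query for this key; null means the key is unknown and the
    // router must fall back to broadcasting the query.
    Set<Integer> lookup(long key) {
        return keyToPartitions.get(key);
    }
}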

6  Lookup tables must be stored compactly in RAM, to avoid adding disk accesses when processing queries.  Lookup tables must also be maintained efficiently in the presence of updates.

7  Applications interact with the database through a JDBC driver.  The system consists of two layers: Backend databases (each with an agent) Query routers, which hold the lookup tables and partitioning metadata.

8  The routers are given the network address of each backend, the schema, and the partitioning metadata when they are started.  Lookup tables are stored in memory and consulted to determine which backends should run each query.  Query routers then send the queries to the backend databases.  The result is excellent performance, providing 40% to 300% better throughput than hash or range partitioning.

9  LOOKUP TABLE QUERY PROCESSING  START-UP, UPDATES AND RECOVERY  STORAGE ALTERNATIVES  EXPERIMENTAL EVALUATION  CONCLUSION

10  When a router receives a query, the lookup tables tell it which backends store the referenced data. If the query references a column that uses a lookup table, the router consults its local copy of the table to determine where to send the query; if multiple backends are referenced, the query is rewritten and a separate query is sent to each backend.
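
A minimal routing sketch, assuming the LookupTable class sketched earlier and JDBC connections to the backends; query rewriting for multi-backend queries is omitted, and the method signature is an assumption, not the paper's actual router code:

import java.sql.*;
import java.util.*;

// Sketch of the routing decision: consult the lookup table if the query has a
// predicate on a lookup column, otherwise (or if the key is unknown) broadcast.
// 'key' is the value of the lookup-column predicate, or null if there is none.
class RouterSketch {
    List<ResultSet> route(String sql, Long key, LookupTable table,
                          Map<Integer, Connection> backends) throws SQLException {
        Set<Integer> targets = (key == null) ? null : table.lookup(key);
        if (targets == null) {
            targets = backends.keySet();          // broadcast to every backend
        }
        List<ResultSet> results = new ArrayList<>();
        for (int p : targets) {                   // one query per referenced backend
            results.add(backends.get(p).createStatement().executeQuery(sql));
        }
        return results;
    }
}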

11  We will use two tables:  Users (id, status)  Followers (source, destination), where source and destination are foreign keys referencing users.

12  Users want to get the status for all users they are following.  R=SELECT destination FROM followers WHERE source=x  SELECT * FROM users WHERE id IN (R)

13  Traditional hash partitioning: partition the users table by id, and the followers table by source.  Problem: the second query accesses several partitions, making it hard to scale the system by adding more machines.

14 [Figure: example users (id, status) and followers (source, destination) rows hash-partitioned across two nodes.]

15 [Figure: the same users and followers rows placed with a lookup table so that related tuples are co-located on the same node.]

16  Defining Lookup Tables  Query Planning

17  CREATE TABLE users ( id int, ..., PRIMARY KEY (id), PARTITION BY lookup(id) ON (part1, part2) DEFAULT NEW ON hash(id));  This says that users is partitioned with a lookup table on id; newly inserted ids default to hash partitioning.  ALTER TABLE users SET PARTITION=part2 WHERE id=27;  Places one or more users into a given partition.

18  ALTER TABLE followers PARTITION BY lookup(source) SAME AS users;  Specifies that the followers table should be partitioned in the same way as the users table: each followers tuple f is placed on the same partition as the users tuple u where u.id = f.source.  CREATE SECONDARY LOOKUP l_a ON users(name);  Defines a partition index: a lookup table l_a is maintained on users(name) without repartitioning the data.

19  Each router maintains a copy of the partitioning metadata. This metadata describes how each table is partitioned or replicated.  The router parses each query to extract the tables and attributes that are being accessed.  The goal is to push the execution of queries to the backend nodes, involving as few of them as possible.

20  When starting, each router knows the network address of each backend; this is part of the static configuration data.  The router then attempts to contact other routers to copy their lookup table.  As a last resort, it contacts each backend agent to obtain the latest copy of each lookup table subset.
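
A possible shape for this start-up sequence (RouterPeer and BackendAgent are hypothetical interfaces standing in for the router-to-router and router-to-agent protocols, which the slides do not specify in code):

// Hypothetical bootstrap sketch: prefer copying the table from a peer router,
// and only rebuild it from the backend agents if no peer is reachable.
LookupTable bootstrap(List<RouterPeer> peers, List<BackendAgent> agents) {
    for (RouterPeer peer : peers) {
        try {
            return peer.copyLookupTable();            // fast path: copy from another router
        } catch (Exception unreachable) {
            // this peer is down or unreachable; try the next one
        }
    }
    LookupTable table = new LookupTable();            // last resort: ask every backend agent
    for (BackendAgent agent : agents) {
        // latestLookupSubset() is assumed to return a Map<Long, Integer> of key -> partition
        agent.latestLookupSubset().forEach(table::put);
    }
    return table;
}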

21  To ensure correctness, the copy of the lookup table at each router is considered a cache that may not be up to date.  To keep the routers up to date, backends piggyback changes with query responses.  This is only a performance optimization, and is not required for correctness.

22  Lookup table keys are usually unique, which lets the router verify routing from query results.  If tuples are found, the existence of a tuple on a backend indicates that the query was routed correctly.  If no tuples are found, either the lookup table entry is stale or there is no lookup table entry for that key.
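
A hedged sketch of this check (the slides describe the behavior, not this code; the probe query, helper signature, and broadcast fallback are illustrative):

import java.sql.*;
import java.util.*;

// If the targeted backend returns no rows, the cached entry may be stale or
// missing: fall back to probing every backend, then repair the cached table.
class RoutingRepairSketch {
    int findAndRepair(String probeSql, long key, LookupTable table,
                      Map<Integer, Connection> backends) throws SQLException {
        Set<Integer> cached = table.lookup(key);
        if (cached != null) {
            for (int p : cached) {
                try (ResultSet rs = backends.get(p).createStatement().executeQuery(probeSql)) {
                    if (rs.next()) return p;          // tuple found: entry was correct
                }
            }
        }
        for (Map.Entry<Integer, Connection> e : backends.entrySet()) {  // broadcast fallback
            try (ResultSet rs = e.getValue().createStatement().executeQuery(probeSql)) {
                if (rs.next()) {
                    table.put(key, e.getKey());       // repair the stale/missing entry
                    return e.getKey();
                }
            }
        }
        return -1;                                    // the key is not stored anywhere
    }
}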

23  Lookup tables must be stored in RAM to avoid imposing a performance penalty.  Two implementations of lookup tables: Hash tables ◦ Support any data type and sparse key spaces, and hence are a good default choice. Arrays ◦ Work better for dense key spaces, but are not always an option because they require mostly-dense, countable key spaces.
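
For a dense, countable key space the array variant can be very simple; the sketch below assumes each key maps to a single partition and that partition ids fit in a byte, which is an illustration rather than the paper's exact layout:

import java.util.Arrays;

// Array-backed lookup table for a dense integer key space: the partition id of
// key k is stored at index k. One byte per key keeps hundreds of millions of
// entries within a few hundred megabytes of RAM.
class ArrayLookupTable {
    static final byte UNKNOWN = -1;
    private final byte[] partitionOf;

    ArrayLookupTable(int maxKey) {
        partitionOf = new byte[maxKey + 1];
        Arrays.fill(partitionOf, UNKNOWN);
    }

    void put(int key, int partitionId) {
        partitionOf[key] = (byte) partitionId;
    }

    // Returns UNKNOWN for keys with no entry; the router then broadcasts.
    int lookup(int key) {
        return (key >= 0 && key < partitionOf.length) ? partitionOf[key] : UNKNOWN;
    }
}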

24  Lookup Table Reuse: reuse the same lookup table in the router for tables with location dependencies, at the cost of slightly more complex metadata handling.  Compressed Tables: trade CPU time to reduce space; specifically, we used Huffman encoding.
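
Reuse itself can be as simple as registering two columns against one shared in-memory table; the registry map and the key-domain size below are illustrative assumptions, not the paper's metadata format:

import java.util.*;

// Location dependency (followers partitioned SAME AS users): both columns
// resolve through the same in-memory table, so only one copy is kept in RAM.
class LookupTableRegistrySketch {
    Map<String, ArrayLookupTable> registerSharedTables() {
        Map<String, ArrayLookupTable> tableForColumn = new HashMap<>();
        ArrayLookupTable usersById = new ArrayLookupTable(4_000_000);   // assumed key domain size
        tableForColumn.put("users.id", usersById);
        tableForColumn.put("followers.source", usersById);              // shared, not copied
        return tableForColumn;
    }
}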

25  Hybrid Partitioning: combine the fine-grained partitioning of a lookup table with the space-efficient representation of range or hash partitioning. The idea is to place “important” tuples in specific partitions, while treating the remaining tuples with a default policy.  To derive a hybrid partitioning, we use decision tree classifiers.

26  Partial Lookup Tables: trade performance for reduced memory by maintaining only the recently used part of a lookup table. This is effective if the data is accessed with skew. The basic approach is to allow each router to maintain its own least-recently-used lookup table over part of the data. If the id being accessed is not found in the table, the router falls back to a broadcast query and adds the discovered mapping to its current table.
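
A partial lookup table can be sketched with an access-ordered LinkedHashMap that evicts its least-recently-used entries once a capacity is exceeded; the capacity, key type, and single-partition values are illustrative choices, not taken from the paper:

import java.util.*;

// Partial lookup table: keeps only the most recently used entries. A miss means
// the router broadcasts the query and then records the discovered mapping.
class PartialLookupTable {
    private final int capacity;
    private final LinkedHashMap<Long, Integer> recent;

    PartialLookupTable(int capacity) {
        this.capacity = capacity;
        // accessOrder = true turns this into a least-recently-used cache.
        this.recent = new LinkedHashMap<Long, Integer>(capacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Integer> eldest) {
                return size() > PartialLookupTable.this.capacity;
            }
        };
    }

    Integer lookup(long key) {
        return recent.get(key);              // null => broadcast, then call record()
    }

    void record(long key, int partitionId) {
        recent.put(key, partitionId);
    }
}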

27  Backend nodes run Linux and MySQL.  The backend servers are older single-CPU, single-disk systems.  Query router is written in Java, and communicates with the backends using MySQL’s protocol via JDBC.  All machines were connected to the same gigabit Ethernet switch.  The network was not a bottleneck.

28  The Wikipedia data is partitioned using both lookup tables and hash/range partitioning.  The experimental dataset includes approximately 1.5 million entries in each of the revision and text tables, and occupies 36 GB of space in MySQL.  The benchmark uses the most common operation: fetching the current version of an article.

29

30  R = SELECT pid, latest FROM page WHERE title = 'world'  Z = SELECT rid, page, text_id FROM R, revision WHERE revision.page = R.pid AND revision.rid = R.latest  SELECT text.tid FROM text WHERE text.tid = Z.text_id

31  Partition page on title, revision on rid, and text on tid.  The first query is efficient and goes to a single partition — 1 message.  The join must be executed in two steps across all partitions (fetch page by pid, which queries all partitions, then fetch revision where rid = p.latest) — k+1 messages.  Finally, text can be fetched directly from one partition — 1 message.

32  A distributed transaction that accesses more than one partition must use two-phase commit; with the 2PC read-only optimization, this read-only distributed transaction can be committed with another broadcast to all partitions — k messages.  Total: 2k + 3 messages.

33  Partition page on pid, revision on page, and text on tid.  The first query goes everywhere — k messages.  The join is pushed down to a single partition — 1 message.  The final query goes to a single partition — 1 message.  Committing the distributed transaction adds another k messages, for a total of 2k + 2 messages.

34  Hash or range partition page on title — 1 message.  Build a lookup table on page.pid.  Co-locate revisions with their corresponding page by partitioning revision using the lookup table — 2 messages.  Create a lookup table on revision.text_id and partition text on text.tid = revision.text_id — 1 message.  A total of 4 messages.

35

36  The lookup keys are mostly dense integers (76 to 92% dense), so we use an array implementation of lookup tables.  We reuse lookup tables when there are location dependencies: one lookup table is shared for page.pid and revision.page, and a second for revision.text_id and text.tid.  We can map the 360 million tuples in the complete Wikipedia snapshot in less than 200 MB of memory, which easily fits in RAM.

37  The primary benefit of lookup tables is reducing the number of distributed queries and transactions.  The experiments examine the cost of distributed queries by scaling the number of backends and increasing the percentage of distributed queries.

38  Throughput with 1, 4, and 8 backends: as the percentage of distributed queries increases, throughput decreases.  The reason is that the per-query communication overhead is a significant cost.

39  The workload is partitioned across 1, 2, 4, and 8 backends, with both hash partitioning and lookup tables.

40  Shared-nothing distributed databases typically support only hash or range partitioning of the data.  Lookup tables can be used with all these systems, in conjunction with their existing partitioning support.

41  We use lookup tables as a type of secondary index for tables that are accessed via more than one attribute.  Bubba proposed Extended Range Declustering, where a secondary index on the non-partitioned attributes is created and distributed across the database nodes.  Our approach simply stores this secondary data in memory across all query routers, avoiding an additional round trip.

42  Previous work has argued that hard-to-partition applications containing many-to-many relationships can be partitioned effectively by allowing tuples to be placed in partitions based on their relationships.  Schism uses graph partitioning algorithms to derive the partitioning, but it does not discuss how to use the fine-grained partitioning it produces.

43  Using lookup tables, application developers can implement any partitioning scheme they desire, and can also create partition indexes that make it possible to efficiently route queries to just the partitions they need to access.  The paper presented a set of techniques to efficiently store and compress lookup tables, and to manage updates, inserts, and deletes to them.

44  With these applications, we showed that lookup tables with an appropriate partitioning scheme can achieve 40% to 300% better performance than either hash or range partitioning, and show greater potential for further scale-out.

45

