Lookup Tables: Fine-Grained Partitioning for Distributed Databases
Aubrey L. Tatarowicz, Carlo Curino, Evan P. C. Jones, Sam Madden (Massachusetts Institute of Technology, USA)

• To scale a distributed OLTP DBMS, partition the data horizontally across nodes.
• To be effective, the partitioning strategy must minimize the number of nodes involved in each query or transaction.
• The most common strategy is to horizontally partition the database using hash or range partitioning.

• Many-to-many relationships are hard to partition: for social-networking workloads, simple partitioning schemes turn a large fraction of queries and transactions into distributed ones.
• While queries on the partitioning attribute go to a single partition, queries on other attributes must be broadcast to all partitions.

• Fine-grained partitioning: related individual tuples are co-located in the same partition.
• Partition index: specifies which partitions contain tuples matching a given attribute value, without partitioning the data by that attribute.

• To support both fine-grained partitioning and partition indexes, we introduce lookup tables (sketched below).
• A lookup table maps a key to the set of partition ids that store the corresponding tuples.
• Lookup tables are small enough to be cached in memory on the database query routers, even for large databases.
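
A minimal sketch of the core data structure, not the paper's implementation: a lookup table maps a partitioning-key value to the set of partition ids holding matching tuples. Class and method names here are invented for illustration.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Maps a key value to the set of partition ids that store matching tuples.
    class LookupTable {
        private final Map<Long, Set<Integer>> keyToPartitions = new HashMap<>();

        // Record that tuples with this key live on the given partition.
        void put(long key, int partitionId) {
            keyToPartitions.computeIfAbsent(key, k -> new HashSet<>()).add(partitionId);
        }

        // Return the partitions to query for this key, or null if the key is unknown.
        Set<Integer> lookup(long key) {
            return keyToPartitions.get(key);
        }
    }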

• Lookup tables must be stored compactly in RAM, to avoid adding disk accesses when processing queries.
• Lookup tables must be maintained efficiently in the presence of updates.

• Applications interact with the system through a standard JDBC driver.
• The system consists of two layers:
  ◦ Backend databases (each with an agent)
  ◦ Query routers, which hold the lookup tables and partitioning metadata
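
Because the router speaks MySQL's wire protocol, an application connects to it exactly as it would to a single MySQL server. A hedged sketch follows; the host, credentials, and database name are placeholders, and the users(id, status) table is the running example introduced on a later slide.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class ClientExample {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://router-host:3306/appdb", "app_user", "secret");
                 PreparedStatement stmt = conn.prepareStatement(
                     "SELECT status FROM users WHERE id = ?")) {
                stmt.setLong(1, 42);          // the router resolves 42 via its lookup table
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("status"));
                    }
                }
            }
        }
    }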

• When started, the routers are given the network address of each backend, the schema, and the partitioning metadata.
• Lookup tables are stored in memory and consulted to determine which backends should run each query.
• Query routers then send the queries to the backend databases.
• This design delivers excellent performance, providing 40% to 300% better throughput than hash or range partitioning.

• LOOKUP TABLE QUERY PROCESSING
• START-UP, UPDATES AND RECOVERY
• STORAGE ALTERNATIVES
• EXPERIMENTAL EVALUATION
• CONCLUSION

• When the router receives a query, the lookup tables tell it which backends store the referenced data.
• If the query references a column that uses a lookup table, the router consults its local copy of the lookup table to determine where to send the query.
• If multiple backends are referenced, the router rewrites the query and sends a separate query to each backend (see the routing sketch below).
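
A hedged sketch of this dispatch logic for an equality predicate on a column that has a lookup table. It reuses the LookupTable sketch above; query rewriting and result merging are simplified away, and all names are illustrative.

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    class QueryRouter {
        interface Backend { List<String[]> execute(String sql); }   // one stub per backend DB

        private final LookupTable lookupTable;
        private final List<Backend> backends;

        QueryRouter(LookupTable lookupTable, List<Backend> backends) {
            this.lookupTable = lookupTable;
            this.backends = backends;
        }

        List<String[]> route(String sql, long keyValue) {
            Set<Integer> parts = lookupTable.lookup(keyValue);
            if (parts == null) {
                parts = allPartitions();                 // unknown key: broadcast
            }
            List<String[]> rows = new ArrayList<>();
            for (int p : parts) {                        // one (rewritten) query per backend
                rows.addAll(backends.get(p).execute(sql));
            }
            return rows;                                 // the router merges the partial results
        }

        private Set<Integer> allPartitions() {
            Set<Integer> all = new HashSet<>();
            for (int i = 0; i < backends.size(); i++) all.add(i);
            return all;
        }
    }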

• The running example uses two tables:
  ◦ users(id, status)
  ◦ followers(source, destination), where source and destination are foreign keys referencing users(id)

• A user wants to get the status of all users they are following:
  ◦ R = SELECT destination FROM followers WHERE source = x
  ◦ SELECT * FROM users WHERE id IN (R)

• Traditional hash partitioning:
  ◦ Partition the users table by id
  ◦ Partition the followers table by source
• Problem: the second query accesses several partitions, making it hard to scale the system by adding more machines (see the sketch below).
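
A small illustration of why the second query fans out: followers is hashed on source, so the first query hits one partition, but the followed ids returned in R scatter across partitions of users. The ids and partition count below are made up.

    public class HashFanout {
        public static void main(String[] args) {
            int k = 8;                                   // number of partitions
            long[] followedIds = {17, 203, 3589, 42};    // hypothetical result set R
            for (long id : followedIds) {
                int partition = Math.floorMod(Long.hashCode(id), k);
                System.out.println("users id " + id + " -> partition " + partition);
            }
            // With N followed users, the IN query touches up to min(N, k) partitions.
        }
    }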

[Diagram: users(id, status) and followers(source, destination) laid out across partitions under hash partitioning]

[Diagram: users(id, status) and followers(source, destination) co-located using lookup-table partitioning]

• Defining Lookup Tables
• Query Planning

• CREATE TABLE users (
      id int, ...,
      PRIMARY KEY (id),
      PARTITION BY lookup(id) ON (part1, part2) DEFAULT NEW ON hash(id));
  This says that users is partitioned with a lookup table on id; new tuples default to hash placement.
• ALTER TABLE users SET PARTITION = part2 WHERE id = 27;
  This places one or more users into a given partition.

• ALTER TABLE followers PARTITION BY lookup(source) SAME AS users;
  This specifies that the followers table is partitioned in the same way as the users table: each followers tuple f is placed on the same partition as the users tuple u where u.id = f.source.
• CREATE SECONDARY LOOKUP l_a ON users(name);
  This defines a partition index: a lookup table l_a on users.name is maintained, without repartitioning the data by that attribute.

• Each router maintains a copy of the partitioning metadata, which describes how each table is partitioned or replicated.
• The router parses each query to extract the tables and attributes being accessed.
• The goal is to push query execution down to the backend nodes, involving as few of them as possible.

• When starting, each router knows the network address of each backend; this is part of the static configuration data.
• The router then attempts to contact other routers to copy their lookup tables.
• As a last resort, it contacts each backend agent to obtain the latest copy of each lookup table subset.

• To ensure correctness, the copy of the lookup table at each router is treated as a cache that may not be up to date.
• To keep the routers up to date, backends piggyback lookup table changes on query responses.
• This piggybacking is only a performance optimization and is not required for correctness.

• Lookup tables are usually unique (one partition per key).
• If tuples are found, the existence of a tuple on the backend indicates that the query was routed correctly.
• Otherwise, either the lookup table entry is stale or there is no entry for that key (see the sketch below for one way to recover).
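
One possible recovery path, sketched under simplifying assumptions (the exact protocol is not spelled out on this slide): for a unique lookup table, a routed query that returns no rows means the entry is stale or missing, so the router falls back to a broadcast and repairs its cached copy.

    import java.util.Collections;
    import java.util.List;
    import java.util.Set;

    class StaleEntryHandler {
        interface Backend { List<String[]> queryByKey(long key); }

        private final LookupTable lookupTable;       // LookupTable sketch from earlier
        private final List<Backend> backends;

        StaleEntryHandler(LookupTable lookupTable, List<Backend> backends) {
            this.lookupTable = lookupTable;
            this.backends = backends;
        }

        List<String[]> fetch(long key) {
            Set<Integer> parts = lookupTable.lookup(key);
            if (parts != null) {
                for (int p : parts) {
                    List<String[]> rows = backends.get(p).queryByKey(key);
                    if (!rows.isEmpty()) return rows;            // routed correctly
                }
            }
            for (int p = 0; p < backends.size(); p++) {          // stale or missing entry: broadcast
                List<String[]> rows = backends.get(p).queryByKey(key);
                if (!rows.isEmpty()) {
                    lookupTable.put(key, p);   // repair the cache (a unique table would replace the old entry)
                    return rows;
                }
            }
            return Collections.emptyList();                      // the key does not exist anywhere
        }
    }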

• Lookup tables must be stored in RAM to avoid imposing a performance penalty.
• Two implementations of lookup tables (see the array sketch below):
  ◦ Hash tables: support any data type and sparse key spaces, and hence are a good default choice.
  ◦ Arrays: work better for dense key spaces, but are not always an option because they require mostly-dense, countable key spaces.
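
A sketch of the array alternative for dense, countable key spaces: the key is used directly as an index and each slot stores a single partition id. Using one byte per key (at most 127 partitions) is an illustrative choice, not a detail taken from the paper.

    import java.util.Arrays;

    class ArrayLookupTable {
        static final byte UNKNOWN = -1;              // no entry: fall back to broadcast
        private final byte[] partitionOfKey;

        ArrayLookupTable(int maxKey) {
            partitionOfKey = new byte[maxKey + 1];
            Arrays.fill(partitionOfKey, UNKNOWN);
        }

        void put(int key, byte partitionId) {
            partitionOfKey[key] = partitionId;
        }

        byte lookup(int key) {
            return (key >= 0 && key < partitionOfKey.length) ? partitionOfKey[key] : UNKNOWN;
        }
    }

Memory use here is proportional to the size of the key range rather than the number of entries, which is why arrays only pay off for mostly-dense key spaces.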

• Lookup table reuse: reuse the same lookup table in the router for tables with location dependencies, at the cost of slightly more complex metadata handling.
• Compressed tables: trade CPU time to reduce space; specifically, we use Huffman encoding.

• Hybrid partitioning: combine the fine-grained placement of a lookup table with the space-efficient representation of range or hash partitioning.
• The idea is to place “important” tuples in specific partitions, while handling the remaining tuples with a default policy (see the sketch below).
• To derive a hybrid partitioning, we use decision tree classifiers.
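
A sketch of the hybrid idea under simple assumptions: “important” keys get explicit lookup entries, and every other key falls through to a default hash rule. The paper derives which keys to treat specially with decision tree classifiers; here the split is just a map.

    import java.util.HashMap;
    import java.util.Map;

    class HybridLookupTable {
        private final Map<Long, Integer> pinned = new HashMap<>();  // explicit placements
        private final int numPartitions;

        HybridLookupTable(int numPartitions) {
            this.numPartitions = numPartitions;
        }

        void pin(long key, int partitionId) {
            pinned.put(key, partitionId);
        }

        int partitionFor(long key) {
            Integer p = pinned.get(key);
            return (p != null) ? p : Math.floorMod(Long.hashCode(key), numPartitions);
        }
    }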

• Partial lookup tables: trade memory for performance by maintaining only the recently used part of a lookup table; this is effective when data is accessed with skew.
• The basic approach is to let each router maintain its own least-recently-used lookup table over part of the data (see the sketch below).
• If the id being accessed is not found in the table, the router falls back to a broadcast query and adds the resulting mapping to its current table.
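
A sketch of a partial lookup table as a bounded least-recently-used cache of key-to-partition mappings. A miss means the router broadcasts the query and then records the discovered mapping; the capacity is an arbitrary parameter, not a value from the paper.

    import java.util.LinkedHashMap;
    import java.util.Map;

    class PartialLookupTable {
        private final LinkedHashMap<Long, Integer> cache;

        PartialLookupTable(final int capacity) {
            // accessOrder = true gives LRU iteration order; the eldest entry is
            // evicted once the cache grows past its capacity.
            this.cache = new LinkedHashMap<Long, Integer>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, Integer> eldest) {
                    return size() > capacity;
                }
            };
        }

        Integer lookup(long key) {                 // null => fall back to a broadcast query
            return cache.get(key);
        }

        void record(long key, int partitionId) {   // called after the broadcast finds the tuple
            cache.put(key, partitionId);
        }
    }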

• Backend nodes run Linux and MySQL.
• The backend servers are older single-CPU, single-disk machines.
• The query router is written in Java and communicates with the backends using MySQL's protocol via JDBC.
• All machines were connected to the same gigabit Ethernet switch; the network was not a bottleneck.

• We partition the Wikipedia dataset using both lookup tables and hash/range partitioning.
• The snapshot includes approximately 1.5 million entries in each of the revision and text tables, and occupies 36 GB of space in MySQL.
• We extracted the most common operation: fetching the current version of an article.

• R = SELECT pid FROM page WHERE title = 'world'
• Z = SELECT rid, page, text_id FROM R, revision WHERE revision.page = R.pid AND revision.rid = R.latest
• SELECT text.tid FROM text WHERE text.tid = Z.text_id

• Alternative 1 (hash/range): partition page on title, revision on rid, and text on tid.
• The first query is efficient and goes to a single partition: 1 message.
• The join must be executed in two steps across all partitions (fetch page by pid, which queries all partitions, then fetch revision where rid = p.latest): k + 1 messages.
• Finally, text can be fetched directly from one partition: 1 message.

• A distributed transaction that accesses more than one partition must use two-phase commit; with the 2PC read-only optimization, this read-only transaction can be committed with another broadcast to all partitions: k messages.
• Total: 2k + 3 messages.

• Alternative 2 (hash/range): partition page on pid, revision on page, and text on tid.
• The first query (by title) goes everywhere: k messages.
• The join is pushed down to a single partition: 1 message.
• The final query goes to a single partition: 1 message.
• With the k commit messages, this results in a total of 2k + 2 messages.

• Lookup-table plan: hash or range partition page on title: 1 message.
• Build a lookup table on page.pid and co-locate revisions with their corresponding page by partitioning revision using that lookup table: 2 messages.
• Create a lookup table on revision.text_id and partition text on text.tid = revision.text_id: 1 message.
• A total of 4 messages.
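
To make the comparison concrete: with k = 8 backends (the largest configuration in the experiments), the two hash/range plans cost 2k + 3 = 19 and 2k + 2 = 18 messages per article fetch, while the lookup-table plan needs only 4 messages regardless of k.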

• The lookup keys are mostly dense integers (76% to 92% dense), so we use the array implementation of lookup tables.
• We reuse lookup tables where there are location dependencies: one table is shared for page.pid and revision.page, and a second for revision.text_id and text.tid.
• With this, the 360 million tuples in the complete Wikipedia snapshot can be stored in less than 200 MB of memory, which easily fits in RAM.

• The primary benefit of lookup tables is reducing the number of distributed queries and transactions.
• To examine the cost of distributed queries, we:
  ◦ Scale the number of backends
  ◦ Increase the percentage of distributed queries

• We measure throughput with 1, 4, and 8 backends.
• As the percentage of distributed queries increases, throughput decreases.
• The reason is that the per-query communication overhead is a significant cost.

• We partitioned the Wikipedia dataset across 1, 2, 4, and 8 backends, with both hash partitioning and lookup tables.

• Shared-nothing distributed databases typically support only hash or range partitioning of the data.
• Lookup tables can be used with all of these systems, in conjunction with their existing partitioning support.

• We use lookup tables as a type of secondary index for tables that are accessed via more than one attribute.
• Bubba proposed Extended Range Declustering, where a secondary index on the non-partitioned attributes is created and distributed across the database nodes.
• Our approach simply stores this secondary data in memory across all query routers, avoiding an additional round trip.

• Previous work has argued that hard-to-partition applications containing many-to-many relationships can be partitioned effectively by placing tuples in partitions based on their relationships.
• Schism uses graph partitioning algorithms to derive such a partitioning, but does not discuss how to use the fine-grained partitioning it produces.

• Using lookup tables, application developers can implement any partitioning scheme they desire, and can also create partition indexes that make it possible to route queries efficiently to just the partitions they need to access.
• The paper presented a set of techniques to store and compress lookup tables efficiently, and to manage updates, inserts, and deletes to them.

• On these applications, we showed that lookup tables with an appropriate partitioning scheme achieve 40% to 300% better performance than either hash or range partitioning, with greater potential for further scale-out.