1 Large-scale Incremental Processing Using Distributed Transactions and Notifications Written By Daniel Peng and Frank Dabek Presented By Michael Over.

Presentation transcript:

1 Large-scale Incremental Processing Using Distributed Transactions and Notifications Written By Daniel Peng and Frank Dabek Presented By Michael Over

2 Abstract
Task: Updating an index of the web as documents are crawled
Requires continuously transforming a large repository of existing documents as new documents arrive
One example of a class of data processing tasks that transform a large repository of data via small, independent mutations

3 Abstract
These tasks lie in a gap between the capabilities of existing infrastructure
  Databases: cannot meet the storage/throughput requirements
  MapReduce: must create large batches for efficiency
Percolator
  A system for incrementally processing updates to a large data set
  Deployed to create the Google web search index
  Processes the same number of documents per day as before, while reducing the average age of documents in Google search results by 50%

4 Outline
Introduction
Design
  Bigtable
  Transactions
  Timestamps
  Notifications
Evaluation
Related Work
Conclusion and Future Work

5 Task
Task: Build an index of the web that can be used to answer search queries.
Approach:
  Crawl every page on the web and process each one
  Maintain a set of invariants: same content, link inversion
  Could be done using a series of MapReduce operations

6 Challenge
Challenge: Update the index after recrawling some small portion of the web.
Could we run MapReduce over just the recrawled pages?
  No, there are links between the new pages and the rest of the web
Could we run MapReduce over the entire repository?
  Yes, this is how Google’s web search index was produced prior to this work
  What are some effects of this?

7 Challenge
What about a DBMS?
  Cannot handle the sheer volume of data
What about distributed storage systems like Bigtable?
  Scalable, but does not provide tools to maintain data invariants in the face of concurrent updates
Ideally, the data processing system for the task of maintaining the web search index would be optimized for incremental processing and able to maintain invariants

8 Percolator
Provides the user with random access to a multi-petabyte repository
  Documents can be processed individually
Many concurrent threads, hence ACID-compliant transactions
Observers: invoked when a user-specified column changes
Designed specifically for incremental processing

9 Percolator
Google uses Percolator to prepare web pages for inclusion in the live web search index
Can now process documents as they are crawled
  Reducing the average document processing latency by a factor of 100
  Reducing the average age of a document appearing in a search result by nearly 50%

11 Design
Two main abstractions for performing incremental processing at large scale:
  ACID-compliant transactions over a random access repository
  Observers: a way to organize an incremental computation
A Percolator system consists of three binaries:
  A Percolator worker
  A Bigtable tablet server
  A GFS chunkserver

13 Bigtable Overview
Percolator is built on top of the Bigtable distributed storage system
Multi-dimensional sorted map
  Keys: (row, column, timestamp) tuples
Provides lookup and update operations on each row
Row transactions enable atomic read-modify-write operations on individual rows
Runs reliably on a large number of unreliable machines handling petabytes of data
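As a rough illustration of the data model on this slide, the sketch below keeps a map keyed by (row, column, timestamp) in memory. `VersionedMap` and its method names are hypothetical and are not Bigtable's API; the only point is that a read can ask for the newest version at or below a given timestamp, which is what Percolator's snapshot reads rely on.

```python
class VersionedMap:
    """Toy in-memory stand-in for a (row, column, timestamp) -> value map."""

    def __init__(self):
        # (row, column) -> {timestamp: value}
        self._cells = {}

    def write(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def read(self, row, column, max_timestamp=None):
        """Return the newest (timestamp, value) at or below max_timestamp, or None."""
        versions = self._cells.get((row, column), {})
        visible = [ts for ts in versions
                   if max_timestamp is None or ts <= max_timestamp]
        if not visible:
            return None
        newest = max(visible)
        return newest, versions[newest]


table = VersionedMap()
table.write("com.example/index.html", "contents", 5, "first crawl")
table.write("com.example/index.html", "contents", 7, "second crawl")
print(table.read("com.example/index.html", "contents", max_timestamp=6))
```

A reader pinned to timestamp 6 still sees the version written at 5, even though a newer version exists at 7.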

14 Bigtable Overview
A running Bigtable consists of a collection of tablet servers
Each tablet server is responsible for serving several tablets
Percolator maintains the gist of Bigtable’s interface
  Percolator’s API closely resembles Bigtable’s
Challenge: Provide the additional features of multirow transactions and the observer framework

16 Transactions
Percolator provides cross-row, cross-table transactions with ACID snapshot-isolation semantics
Stores multiple versions of each data item using Bigtable’s timestamp dimension
Provides snapshot isolation, which protects against write-write conflicts
Percolator must explicitly maintain locks
Example of a transaction involving bank accounts

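17 Transactions
Each cell lists timestamp: value; an entry in Bal:Write gives the timestamp of the committed data.
Key | Bal:Data | Bal:Lock | Bal:Write
Joe | 8: ; 7: $6; 6: ; 5: $2 | 8: ; 7: ; 6: ; 5: | 8: 7; 7: ; 6: 5; 5:
Bob | 8: ; 7: $6; 6: ; 5: $10 | 8: ; 7: ; 6: ; 5: | 8: 7; 7: ; 6: 5; 5:
Lock callouts on the slide: “I am Primary”, “Bob.bal”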
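The bank-account example can be sketched in a few dozen lines. What follows is a simplified, single-process approximation of a Percolator-style commit (tentative write plus lock at the start timestamp, then a write record at the commit timestamp), not the authors' implementation. `Store`, `Transaction` and `Conflict` are hypothetical names, the counter stands in for the timestamp oracle, and failure recovery and lock cleanup are left out. The balances ($10 and $2) and the $4 transfer are chosen to match the figures on the slide.

```python
import itertools


class Conflict(Exception):
    """Signals that a transaction must abort (hypothetical name)."""


class Store:
    """In-memory stand-in for one table with data, lock and write columns."""

    def __init__(self):
        self.data = {}   # (key, start_ts) -> value
        self.lock = {}   # key -> (start_ts, primary_key)
        self.write = {}  # (key, commit_ts) -> start_ts of the committed data
        self._clock = itertools.count(1)  # stand-in for the timestamp oracle

    def next_ts(self):
        return next(self._clock)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.next_ts()  # first oracle round trip
        self.pending = {}                # writes are buffered until commit

    def get(self, key):
        lock = self.store.lock.get(key)
        if lock is not None and lock[0] <= self.start_ts:
            # Simplification: abort instead of waiting or cleaning up the lock.
            raise Conflict(f"{key} is locked")
        commits = [(commit_ts, data_ts)
                   for (k, commit_ts), data_ts in self.store.write.items()
                   if k == key and commit_ts <= self.start_ts]
        if not commits:
            return None
        _, data_ts = max(commits)  # newest commit visible in this snapshot
        return self.store.data[(key, data_ts)]

    def set(self, key, value):
        self.pending[key] = value

    def _prewrite(self, key, primary):
        newer = any(k == key and commit_ts >= self.start_ts
                    for (k, commit_ts) in self.store.write)
        if newer or key in self.store.lock:
            raise Conflict(f"write-write conflict on {key}")
        self.store.data[(key, self.start_ts)] = self.pending[key]
        self.store.lock[key] = (self.start_ts, primary)

    def commit(self):
        keys = list(self.pending)
        if not keys:
            return
        primary, locked = keys[0], []
        try:
            for key in keys:
                self._prewrite(key, primary)
                locked.append(key)
        except Conflict:
            for key in locked:  # roll back tentative writes
                del self.store.lock[key]
                del self.store.data[(key, self.start_ts)]
            raise
        commit_ts = self.store.next_ts()  # second oracle round trip
        for key in keys:  # the primary (keys[0]) is committed first
            self.store.write[(key, commit_ts)] = self.start_ts
            del self.store.lock[key]


store = Store()
setup = Transaction(store)
setup.set("Bob", 10)
setup.set("Joe", 2)
setup.commit()

transfer = Transaction(store)
transfer.set("Bob", transfer.get("Bob") - 4)
transfer.set("Joe", transfer.get("Joe") + 4)
transfer.commit()
print(Transaction(store).get("Bob"), Transaction(store).get("Joe"))  # 6 6
```

A transaction that started before `transfer` committed keeps reading the old balances from its snapshot, and its own attempt to write Bob's balance raises `Conflict`. That is the write-write protection snapshot isolation provides.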

19 Timestamps
The timestamp oracle is a server that hands out timestamps in strictly increasing order
Every transaction requires contacting the timestamp oracle twice, so this server must scale well
For failure recovery, the timestamp oracle needs to write the highest allocated timestamp to disk before responding to a request
For efficiency, it batches writes and "pre-allocates" a whole block of timestamps
How many timestamps do you think Google’s timestamp oracle serves per second from 1 machine?
  Answer: 2,000,000 (2 million) per second
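The block pre-allocation idea can be sketched as follows. This is a minimal single-threaded illustration, assuming a hypothetical `TimestampOracle` class whose `_persist` method stands in for the disk write; a real oracle would also need locking and request batching over RPC.

```python
class TimestampOracle:
    """Hands out strictly increasing timestamps, persisting one limit per block."""

    def __init__(self, block_size=1000):
        self.block_size = block_size
        self._persisted_limit = 0  # stand-in for the value stored on disk
        self._next = 0
        self.disk_writes = 0

    def _persist(self, limit):
        # A real oracle would write `limit` to stable storage before returning.
        self._persisted_limit = limit
        self.disk_writes += 1

    def get_timestamp(self):
        if self._next >= self._persisted_limit:
            # Current block exhausted: reserve the next block with one write.
            self._persist(self._persisted_limit + self.block_size)
        timestamp = self._next
        self._next += 1
        return timestamp

    def recover(self):
        # After a crash, skip to the persisted limit so nothing is handed out twice.
        self._next = self._persisted_limit


oracle = TimestampOracle(block_size=100)
stamps = [oracle.get_timestamp() for _ in range(250)]
print(oracle.disk_writes)  # 3
```

Serving 250 timestamps costs three simulated disk writes instead of 250. The trade-off is that a restart skips the unused remainder of the current block, which is harmless because timestamps only need to increase.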

21 Notifications
Transactions let the user mutate the table while maintaining invariants, but users also need a way to trigger and run the transactions.
In Percolator, the user writes “observers” to be triggered by changes to the table
Percolator invokes the observer function after data is written to one of the columns registered by an observer

22 Notifications
Percolator applications are structured as a series of observers
Notifications are similar to database triggers or events in active databases, but unlike triggers they cannot be used to maintain data invariants
Percolator needs to efficiently find dirty cells with observers that need to be run
  To do so, it maintains a special “notify” Bigtable column, containing an entry for each dirty cell
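A small sketch may help picture how a chain of observers and a "notify" marker fit together. `ObservedTable`, `parse` and `count_words` are hypothetical names. The set of dirty cells plays the role of the notify column, and `run_pending` stands in for the workers that scan it; transactions and distribution are ignored here.

```python
class ObservedTable:
    """Toy table that records dirty cells and runs registered observers."""

    def __init__(self):
        self.cells = {}      # (row, column) -> value
        self.notify = set()  # dirty (row, column) cells, like the notify column
        self.observers = {}  # column -> list of observer functions

    def register(self, column, observer):
        self.observers.setdefault(column, []).append(observer)

    def write(self, row, column, value):
        self.cells[(row, column)] = value
        if column in self.observers:
            self.notify.add((row, column))  # mark the cell as dirty

    def run_pending(self):
        """Run observers for every dirty cell; return how many were invoked."""
        invoked = 0
        while self.notify:
            row, column = self.notify.pop()  # clear the mark before running
            for observer in self.observers[column]:
                observer(self, row, self.cells[(row, column)])
                invoked += 1
        return invoked


def parse(table, row, raw_text):
    # First observer: turns raw content into a list of words.
    table.write(row, "parsed", raw_text.lower().split())


def count_words(table, row, words):
    # Second observer: triggered by the column the first observer writes.
    table.write(row, "word_count", len(words))


table = ObservedTable()
table.register("raw", parse)
table.register("parsed", count_words)
table.write("doc1", "raw", "Hello Percolator World")
print(table.run_pending(), table.cells[("doc1", "word_count")])  # 2 3
```

Writing to the `raw` column triggers `parse`, whose write to `parsed` triggers `count_words`. In this toy version, that chaining is the sense in which an application is "a series of observers".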

24 Evaluation
Percolator lies somewhere in the performance space between MapReduce and DBMSs
Converting from MapReduce: Percolator was built to create Google’s large “base” index, a task previously done by MapReduce
With MapReduce, several billion documents were crawled each day and fed through a series of roughly 100 MapReduces, resulting in an index which answered user queries

25 Evaluation
Using MapReduce, each document spent 2-3 days being indexed before it could be returned as a search result
With Percolator, the same number of documents are crawled, but each document is sent through Percolator as it is crawled
The immediate advantage is a reduction in latency (the median document moves through over 100x faster than with MapReduce)

26 Evaluation
Percolator freed Google from needing to process the entire repository each time documents were indexed
  Therefore, they can increase the size of the repository (and have: it is now 3x its previous size)
Percolator is easier to operate: there are fewer moving parts, just tablet servers, Percolator workers, and chunkservers

27 Evaluation
Question: How do you think Percolator performs in comparison to MapReduce if:
  1% of the repository needs to be updated per hour?
  30% of the repository needs to be updated per hour?
  60% of the repository needs to be updated per hour?
  90% of the repository needs to be updated per hour?

28 Evaluation

29 Evaluation
Comparing Percolator versus “raw” Bigtable
Percolator introduces overhead relative to Bigtable: a factor of four on writes, due to 4 round trips:
  Percolator -> Timestamp Server -> Percolator -> Tentative Write -> Percolator -> Timestamp Server -> Percolator -> Commit -> Percolator

31 Related Work
Batch processing systems like MapReduce are well suited for efficiently transforming or analyzing an entire repository
DBMSs satisfy many of the requirements of an incremental system but do not scale like Percolator
Bigtable is a scalable, distributed, and fault-tolerant storage system, but is not designed to be a data transformation system
CloudTPS builds an ACID-compliant datastore on top of distributed storage but is intended to be a backend for a website (stronger focus on latency and partition tolerance than Percolator)

33 Conclusion and Future Work
Percolator has been deployed to produce Google’s web search index since April 2010
Its goal was to reduce the latency of indexing a single document with an acceptable increase in resource usage
Scaling the architecture costs a very significant 30-fold overhead compared to traditional database architectures
  How much of this is fundamental to distributed storage systems and how much could be optimized away?