Data Freeway : Scaling Out to Realtime Eric Hwang, Sam Rash

Presentation transcript:

Data Freeway : Scaling Out to Realtime Eric Hwang, Sam Rash

Agenda
» Data at Facebook
» Data Freeway System Overview
» Realtime Requirements
» Realtime Components
  › Calligraphus/Scribe
  › HDFS use case and modifications
  › Calligraphus: a ZooKeeper use case
  › ptail
  › Puma
» Future Work

Big Data, Big Applications / Data at Facebook
» Lots of data
  › More than 500 million active users
  › 50 million users update their statuses at least once each day
  › More than 1 billion photos uploaded each month
  › More than 1 billion pieces of content (web links, news stories, blog posts, notes, photos, etc.) shared each week
  › Data rate: over 7 GB/second
» Numerous products can leverage the data
  › Revenue related: Ads Targeting
  › Product/User Growth related: AYML, PYMK, etc.
  › Engineering/Operation related: Automatic Debugging
  › Puma: streaming queries

Data Freeway System Diagram

Realtime Requirements
  › Scalability: GBytes/second
  › Reliability: no single point of failure
  › Data loss SLA: at most 0.01% loss due to hardware failures, meaning at most 1 out of 10,000 machines can lose data
  › Delay of less than 10 sec for 99% of data (typically we see 2s)
  › Easy to use: as simple as ‘tail -f /var/log/my-log-file’

Scribe
» Scalable distributed logging framework
» Very easy to use: scribe_log(string category, string message)
» Mechanics:
  › Runs on every machine at Facebook
  › Built on top of Thrift
  › Collects log data into a set of destinations
  › Buffers data on local disk if the network is down
» History:
  › 2007: Started at Facebook
  › 2008 Oct: Open-sourced
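
The buffering behavior listed under Mechanics can be sketched roughly as follows. This is a minimal Python sketch, not Scribe's implementation: the class `BufferingLogger`, the `send` callback, and the in-memory queue standing in for local disk are all hypothetical. Only the `scribe_log(category, message)` call shape comes from the slide.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class BufferingLogger:
    """Hypothetical stand-in for a Scribe-like client; not the real Scribe API."""

    def __init__(self, send: Callable[[str, str], bool]) -> None:
        # `send` is assumed to return True on success and False when the
        # network is down.
        self._send = send
        # A real system would buffer on local disk; a deque keeps the sketch short.
        self._buffer: Deque[Tuple[str, str]] = deque()

    def scribe_log(self, category: str, message: str) -> None:
        """Queue the message, then try to deliver everything queued so far."""
        self._buffer.append((category, message))
        self.flush()

    def flush(self) -> None:
        # Drain in arrival order and stop at the first failure so that
        # message order is preserved across an outage.
        while self._buffer:
            category, message = self._buffer[0]
            if not self._send(category, message):
                return
            self._buffer.popleft()

    def pending(self) -> int:
        """Number of messages still waiting for the network."""
        return len(self._buffer)
```

Messages logged while `send` fails stay queued and are delivered, in order, on the next successful call.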

Calligraphus
» What
  › Scribe-compatible server written in Java
  › Emphasis on a modular, testable code base and on performance
» Why?
  › Extract a simpler design from the existing Scribe architecture
  › Cleaner integration with the Hadoop ecosystem: HDFS, ZooKeeper, HBase, Hive
» History
  › In production since November 2010
  › ZooKeeper integration since March 2011

HDFS : a different use case
» Message hub
  › Add concurrent reader support and sync
  › Writers + concurrent readers: a form of pub/sub model

HDFS : add Sync
» Sync
  › Implemented in 0.20 (HDFS-200)
      › partial chunks are flushed
      › blocks are persisted
  › Provides durability
  › Lowers write-to-read latency

HDFS : Concurrent Reads Overview
» Without changes, stock Hadoop 0.20 does not allow access to the block being written
» Realtime apps need to read the block being written in order to achieve < 10s latency

HDFS : Concurrent Reads Implementation
1. DFSClient asks Namenode for blocks and locations
2. DFSClient asks Datanode for length of block being written
3. DFSClient opens the last block
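
The three steps can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not HDFS client code: `BlockInfo`, `readable_length`, and the `datanode_lengths` dictionary are hypothetical stand-ins for the Namenode's block list and the Datanode's answer about the block being written.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BlockInfo:
    """Hypothetical view of one block as reported by a Namenode stand-in."""
    block_id: str
    length: int                 # length known to the Namenode stand-in
    under_construction: bool    # True for the block still being written


def readable_length(blocks: List[BlockInfo],
                    datanode_lengths: Dict[str, int]) -> int:
    """Total bytes a concurrent reader may read from the file right now."""
    total = 0
    for block in blocks:
        if block.under_construction:
            # Step 2: the length of the block being written is asked of the
            # Datanode stand-in rather than taken from the block list.
            total += datanode_lengths[block.block_id]
        else:
            # Finished blocks: the length from step 1 is used as-is.
            total += block.length
    return total
```

With one finished 64-byte block and 10 bytes written so far to the last block, the reader would see 74 readable bytes.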

HDFS : Checksum Problem
» Issue: data and checksum updates are not atomic for the last chunk
» 0.20-append fix:
  › Detect when data is out of sync with checksum using a visible length
  › Recompute checksum on the fly
» 0.22 fix
  › Last chunk data and checksum kept in memory for reads
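
The idea behind the 0.20-append fix can be sketched as below. This is an illustration only, not the HDFS code path: `read_last_chunk` is a hypothetical helper, CRC32 from Python's `zlib` stands in for the HDFS chunk checksum, and the mismatch test is one assumed way of detecting that data and checksum are out of sync.

```python
import zlib
from typing import Tuple


def read_last_chunk(data_on_disk: bytes,
                    stored_crc: int,
                    visible_length: int) -> Tuple[bytes, int, bool]:
    """Return (bytes served, checksum served, whether it was recomputed)."""
    # Serve only the bytes up to the visible length, even if more bytes
    # have already reached the disk.
    visible = data_on_disk[:visible_length]
    actual_crc = zlib.crc32(visible)
    if actual_crc == stored_crc:
        # Data and checksum agree; serve the stored checksum.
        return visible, stored_crc, False
    # The stored checksum covers a different byte range than the visible
    # data, so compute a fresh checksum for the reader on the fly.
    return visible, actual_crc, True
```

A reader that verifies the returned checksum against the returned bytes always succeeds, whether or not the stored checksum had caught up with the data.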

Calligraphus: Log Writer
[Diagram labels: Scribe categories (Category 1, Category 2, Category 3); Calligraphus Servers; HDFS]
How to persist to HDFS?

Calligraphus (Simple)
[Diagram labels: Scribe categories (Category 1, Category 2, Category 3); Calligraphus Servers; HDFS]
Number of categories x Number of servers = Total number of directories

Calligraphus (Stream Consolidation)
[Diagram labels: Scribe categories (Category 1, Category 2, Category 3); Calligraphus Servers; Router; Writer; ZooKeeper; HDFS]
Number of categories = Total number of directories
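
The directory counts for the simple and consolidated layouts can be compared with a short sketch. The path shapes (`/<category>/<server>` and `/<category>`) and the function names are assumptions made for illustration; the slides give only the counting formulas.

```python
from itertools import product
from typing import Iterable, Set


def simple_layout(categories: Iterable[str], servers: Iterable[str]) -> Set[str]:
    """Simple design: every server writes its own directory per category."""
    # Hypothetical path shape; the slides do not show the real layout.
    return {f"/{category}/{server}"
            for category, server in product(categories, servers)}


def consolidated_layout(categories: Iterable[str]) -> Set[str]:
    """Stream consolidation: one directory per category."""
    return {f"/{category}" for category in categories}
```

For example, 3 categories on 4 servers give 12 directories in the simple layout and 3 after consolidation.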

ZooKeeper: Distributed Map
» Design
  › ZooKeeper paths as tasks (e.g. /root/ / )
  › Canonical ZooKeeper leader elections under each bucket for bucket ownership
  › Independent load management: leaders can release tasks
  › Reader-side caches
  › Frequent sync with policy db
[Diagram labels: Root; A, B, C, D]
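
Per-bucket ownership through leader election can be sketched with an in-memory model. No real ZooKeeper client is used here, and the slide does not say which election recipe Calligraphus follows; the sketch assumes the common ZooKeeper recipe in which each candidate holds a sequence number and the lowest live number is the leader. `BucketElection` and `DistributedMap` are hypothetical names.

```python
from typing import Dict, Optional


class BucketElection:
    """In-memory stand-in for the election under one bucket path."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._candidates: Dict[int, str] = {}  # sequence number -> writer id

    def join(self, writer: str) -> int:
        """Register a candidate and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = writer
        return seq

    def leave(self, seq: int) -> None:
        """Models a writer failing or a leader releasing the task."""
        self._candidates.pop(seq, None)

    def leader(self) -> Optional[str]:
        """The candidate with the lowest live sequence number owns the bucket."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


class DistributedMap:
    """Maps each bucket to its current owner via one election per bucket."""

    def __init__(self) -> None:
        self._buckets: Dict[str, BucketElection] = {}

    def election(self, bucket: str) -> BucketElection:
        return self._buckets.setdefault(bucket, BucketElection())

    def owner(self, bucket: str) -> Optional[str]:
        election = self._buckets.get(bucket)
        return election.leader() if election else None
```

When the current leader leaves, the next candidate in sequence order becomes the owner without any central coordinator deciding it.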

ZooKeeper: Distributed Map
» Real-time Properties
  › Highly available
  › No centralized control
  › Fast mapping lookups
  › Quick failover for writer failures
  › Adapts to new categories and changing throughput

Distributed Map: Performance Summary
» Bootstrap (~3000 categories)
  › Full election participation in 30 seconds
  › Identify all election winners in 5-10 seconds
  › Stable mapping converges in about three minutes
» Election or failure response usually < 1 second
  › Worst case bounded in tens of seconds

Canonical Realtime Application
» Examples
  › Realtime search indexing
  › Site integrity: spam detection
  › Streaming metrics

Parallel Tailer
» Why?
  › Access data in 10 seconds or less
  › Data stream interface
» Command-line tool to tail the log
  › Easy to use: ptail -f cat1
  › Supports checkpoints: ptail -cp XXX cat1
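
Checkpointed tailing can be illustrated on a single local file. This is a sketch, not ptail: the slide does not describe ptail's checkpoint format, so the sketch assumes a checkpoint is simply a byte offset, and `read_from_checkpoint` is a hypothetical helper.

```python
from typing import List, Tuple


def read_from_checkpoint(path: str, checkpoint: int) -> Tuple[List[str], int]:
    """Return the complete lines after `checkpoint` and the new checkpoint."""
    with open(path, "rb") as f:
        f.seek(checkpoint)
        data = f.read()
    # Consume only complete lines; a partially written last line is left
    # for the next call so that it is never delivered in pieces.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, checkpoint + end
```

A consumer that stores the returned checkpoint after processing each batch can restart from it after a failure without skipping lines and without rereading lines it had already checkpointed.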

Canonical Realtime ptail Application

Puma Overview
» Realtime analytics platform
» Metrics
  › count, sum, unique count, average, percentile
» Uses ptail checkpointing for accurate calculations in the case of failure
» Puma nodes are sharded by keys in the input stream
» HBase for persistence
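
Sharding by key and a few of the listed metrics can be sketched as follows. This is not Puma's code: `shard_for`, `ShardAggregator`, and `process` are hypothetical names, CRC32 is an assumed choice of hash, the event shape `(key, value, user)` is made up for the example, and percentiles, checkpointing, and HBase persistence are left out.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard; crc32 is stable across processes, unlike hash()."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


class ShardAggregator:
    """Hypothetical per-shard state for count, sum, unique count, average."""

    def __init__(self) -> None:
        self.count: DefaultDict[str, int] = defaultdict(int)
        self.total: DefaultDict[str, float] = defaultdict(float)
        self.users: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, key: str, value: float, user: str) -> None:
        self.count[key] += 1
        self.total[key] += value
        self.users[key].add(user)

    def average(self, key: str) -> float:
        return self.total[key] / self.count[key]

    def unique_count(self, key: str) -> int:
        return len(self.users[key])


def process(events: Iterable[Tuple[str, float, str]],
            num_shards: int) -> List[ShardAggregator]:
    """Route each (key, value, user) event to the shard that owns its key."""
    shards = [ShardAggregator() for _ in range(num_shards)]
    for key, value, user in events:
        shards[shard_for(key, num_shards)].add(key, value, user)
    return shards
```

Because every event for a given key lands on the same shard, each shard can compute that key's metrics without consulting the others.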

Puma Write Path

Puma Read Path

Summary - Data Freeway
» Highlights:
  › Scalable: 4-5 GBytes/second
  › Reliable: no single point of failure; < 0.01% data loss with hardware failures
  › Realtime: delay < 10 sec (typically 2s)
» Open-Source
  › Scribe, HDFS
  › Calligraphus/Continuous Copier/Loader/ptail (pending)
» Applications
  › Realtime Analytics
  › Search/Feed
  › Spam Detection/Ads Click Prediction (in the future)

Future Work
» Puma
  › Enhance functionality: add application-level transactions on HBase
  › Streaming SQL interface
» Seekable compression format
  › For large categories, the files are MB
  › Need an efficient way to get to the end of the stream
  › Simple Seekable Format
      › container with compressed/uncompressed stream offsets
      › contains data segments which are independent virtual files
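
The seekable-container idea can be sketched with independently compressed segments and an offset index. This is an illustration of the concept only, not the Simple Seekable Format itself: the function names, the use of `zlib`, and keeping the index as a separate Python list are all assumptions.

```python
import zlib
from typing import List, Tuple

# Each index entry is (compressed_offset, uncompressed_offset) for one segment.
Index = List[Tuple[int, int]]


def write_segments(segments: List[bytes]) -> Tuple[bytes, Index]:
    """Compress each segment on its own and record where it starts."""
    container = bytearray()
    index: Index = []
    uncompressed_offset = 0
    for segment in segments:
        index.append((len(container), uncompressed_offset))
        container += zlib.compress(segment)
        uncompressed_offset += len(segment)
    return bytes(container), index


def read_segment(container: bytes, index: Index, i: int) -> bytes:
    """Decompress segment i without touching any other segment."""
    start = index[i][0]
    end = index[i + 1][0] if i + 1 < len(index) else len(container)
    return zlib.decompress(container[start:end])


def read_last_segment(container: bytes, index: Index) -> bytes:
    """Jump straight to the end of the stream using the index."""
    return read_segment(container, index, len(index) - 1)
```

Since segments are compressed independently, a reader that wants the end of the stream decompresses only the last segment rather than the whole file.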

Fin »Questions?