Go Stream
Matvey Arye, Princeton/Cloudflare
Albert Strasheim, Cloudflare

Awesome CDN service for websites big & small
Millions of requests a second at peak
24 data centers across the globe

Data Analysis
– Customer-facing analytics
– System health monitoring
– Security monitoring
=> Need global view

Functionality
Calculate aggregate functions on fast, big data
Aggregate across nodes (across datacenters)
Data stored at different time granularities

Storm & Rainbird

Basic Design Requirements
1. Reliability – exactly-once semantics
2. High data volumes

Our Environment
[Diagram: Sources → Stream processing → Storage]

Basic Programming Model
[Diagram: Op → Storage → Op → Storage → Op → Storage → Op – operators alternate with storage]

Existing Systems
S4 – the reliability model is not consistent
Storm – exactly-once semantics requires batching
Reliability only inside the stream processing system
What if a source goes down? The DB?

The Need For End-to-End Reliability
[Diagram: Source → Stream Processing → Storage]
When the source comes back up, where does it start sending data from?
If using something like Storm, you need additional reliability mechanisms

The Takeaway
Need end-to-end reliability – or – multiple reliability mechanisms
Reliability of the stream processing system alone is not enough

Design of Reliability
Avoid queuing when the destination has failed
– Rely on storage at the edges
– Minimize replication
Minimize edge cases
No specialized hardware

Big Design Decisions
End-to-end reliability
Only transient operator state

Recovering From Failure
Source: "I am starting a stream with you. What have you already seen from me?"
Storage: "I've seen [highest ID]."
Source: "Okie dokie. Here is all the new stuff."
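
A minimal sketch of this handshake in Go. All names here (Item, StorageClient, HighestIDSeen, ItemsAfter) are illustrative assumptions, not the actual Go Stream API:

type Item struct {
	ID   uint64
	Data []byte
}

type StorageClient interface {
	HighestIDSeen(stream string) (uint64, error) // "What have you already seen from me?"
	Append(stream string, it Item) error
}

type Source interface {
	ItemsAfter(id uint64) <-chan Item // replay everything after a given ID
}

// resumeStream restarts a stream after a failure: ask storage for the
// highest ID it has already seen, then send only the new items.
func resumeStream(stream string, store StorageClient, src Source) error {
	lastSeen, err := store.HighestIDSeen(stream)
	if err != nil {
		return err
	}
	for it := range src.ItemsAfter(lastSeen) { // "Here is all the new stuff."
		if err := store.Append(stream, it); err != nil {
			return err
		}
	}
	return nil
}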

Tracking what you have seen
Store an identifier for every item
Store one identifier for the highest number seen

Tracking what you have seen
Store an identifier for every item
– The answer to "what have I seen?" is huge
– Requires lots of storage for IDs
Store one identifier for the highest number seen
– Parallel processing of ordered data is tricky

Tension between:
– Parallelization (for high-volume data)
– Ordering
– Reliability

Go Makes This Easier
A language from Google designed for concurrency
[Diagram: goroutines, each running code, connected by channels]
Channels send data between goroutines
Most synchronization is done by passing data
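
A toy example of the model described above – goroutines that run code independently, synchronized only by passing data over a channel (not Go Stream code):

package main

import "fmt"

func main() {
	results := make(chan int)
	for i := 1; i <= 4; i++ {
		go func(n int) {
			results <- n * n // no locks: synchronization is the send itself
		}(i)
	}
	for i := 0; i < 4; i++ {
		fmt.Println(<-results) // receives block until some goroutine sends
	}
}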

Goroutine Scheduling
Channels are FIFO queues with a maximum capacity, so a goroutine can be in 4 states:
1. Executing code
2. Waiting for a thread to execute code
3. Blocking to receive data from a channel
4. Blocking to send data to a channel
The scheduler optimizes the assignment of goroutines to threads.
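
These states fall out of channel semantics. A toy illustration: a send on a full buffered channel blocks (state 4) until a receive frees a slot, and the scheduler parks the blocked goroutine in the meantime:

package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan int, 2) // FIFO queue, maximum capacity 2
	go func() {
		for i := 0; i < 3; i++ {
			ch <- i // the third send blocks: the buffer is full
			fmt.Println("sent", i)
		}
	}()
	time.Sleep(100 * time.Millisecond) // let the sender fill the buffer
	for i := 0; i < 3; i++ {
		fmt.Println("received", <-ch) // each receive may unblock the sender
	}
	time.Sleep(10 * time.Millisecond) // give the sender time to print
}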

Efficient Ordering Under The Hood
The source distributes input tuples to workers in a specific order.
Each worker has two output channels: a count channel (the count of output tuples for each input tuple) and a result channel (the actual result tuples).
Reading from each worker:
1. Read one tuple off the count channel; assign the count to X
2. Read X tuples off the result channel

Intuition behind the design
Multiple output channels allow each worker to write independently.
The count channel tells the reader how many tuples to expect.
The reader does not block except when a result is needed to satisfy ordering.
Judicious blocking lets the scheduler use blocking as a signal for which worker to schedule.
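
A sketch of this pattern in Go (illustrative, not the actual Go Stream implementation; process is a stand-in operator that emits a variable number of outputs per input):

package main

import "fmt"

type worker struct {
	in      chan int
	counts  chan int // count of output tuples for each input tuple
	results chan int // the actual result tuples
}

// process is a stand-in operator: input x yields x copies of x.
func process(x int) []int {
	out := make([]int, x)
	for i := range out {
		out[i] = x
	}
	return out
}

func newWorker() *worker {
	w := &worker{
		in:      make(chan int, 16),
		counts:  make(chan int, 16),
		results: make(chan int, 64),
	}
	go func() {
		for x := range w.in {
			outs := process(x)
			w.counts <- len(outs) // tell the reader how many to expect
			for _, o := range outs {
				w.results <- o
			}
		}
	}()
	return w
}

func main() {
	workers := []*worker{newWorker(), newWorker(), newWorker()}

	// The source distributes inputs to workers in a specific (round-robin) order.
	go func() {
		for i := 1; i <= 9; i++ {
			workers[i%len(workers)].in <- i
		}
		for _, w := range workers {
			close(w.in)
		}
	}()

	// The reader visits workers in the same order: one count, then exactly
	// that many results. Output order matches input order even though the
	// workers ran in parallel.
	for i := 1; i <= 9; i++ {
		w := workers[i%len(workers)]
		n := <-w.counts
		for j := 0; j < n; j++ {
			fmt.Print(<-w.results, " ")
		}
	}
	fmt.Println()
}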

Throughput does not suffer

The Big Picture – Reliability
Sources provide monotonically increasing IDs – per stream
The stream processor preserves ordering – per source-stream
A central DB maintains a mapping of: source-stream => highest ID processed
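
A sketch of how that mapping yields exactly-once behavior (CentralDB, HighWaterMark, and ApplyAndAdvance are hypothetical names, not the real schema; Item is as in the earlier handshake sketch). Because IDs are monotonic and ordering is preserved, anything at or below the stored high-water mark has already been processed and can be dropped:

type CentralDB interface {
	HighWaterMark(sourceStream string) (uint64, error)
	// ApplyAndAdvance writes the items and the new high-water mark in
	// one transaction, so a crash can never apply an item twice.
	ApplyAndAdvance(sourceStream string, items []Item, newMark uint64) error
}

func deliver(db CentralDB, sourceStream string, batch []Item) error {
	mark, err := db.HighWaterMark(sourceStream)
	if err != nil {
		return err
	}
	var fresh []Item
	for _, it := range batch {
		if it.ID > mark { // duplicates from a replay are skipped
			fresh = append(fresh, it)
		}
	}
	if len(fresh) == 0 {
		return nil
	}
	return db.ApplyAndAdvance(sourceStream, fresh, fresh[len(fresh)-1].ID)
}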

Functionality of the Stream Processor
Compression, serialization
Partitioning for distributed sinks
Bucketing – take individual records and construct aggregates
– Across source nodes
– Across time, with adjustable granularity
Batching – submitting many records at once to the DB
Bucketing and batching are all done with transient state
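
A sketch of the bucketing step under transient state (hypothetical types; the real processor also compresses, serializes, and partitions): records are folded in memory into per-(URL, time-bucket) aggregates, and the whole map is then submitted to the DB as one batch.

import "time"

type Record struct {
	URL     string
	Time    time.Time
	Latency time.Duration
}

type Agg struct {
	Count int
	Max   time.Duration
}

// bucket folds records into per-(URL, time-bucket) aggregates kept only
// in memory (transient state). Flushing the returned map to the DB in
// one write is the batching step.
func bucket(records []Record, granularity time.Duration) map[string]Agg {
	out := make(map[string]Agg)
	for _, r := range records {
		key := r.URL + "@" + r.Time.Truncate(granularity).String()
		a := out[key]
		a.Count++
		if r.Latency > a.Max {
			a.Max = r.Latency
		}
		out[key] = a
	}
	return out
}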

Where to get the code
Stable
Bleeding Edge

Data Model
Streaming OLAP-like cubes
Useful summaries of high-volume data

Cube Dimensions
[Diagram: a grid with URL columns (foo.com/r, foo.com/q, bar.com/n, bar.com/m) and Time rows (01:01:00, 01:01:01)]

Cube Aggregates
[Diagram: each cell holds aggregates such as (Count, Max); shown for URL bar.com/m at time 01:01:01]

Updating A Cube
[Diagram: Request #1 – bar.com/m, 01:01:00, latency 90 ms – arrives at the cube; the target cell holds (0,0)]

Map Request To Cell
[Diagram: the request maps to the cell (bar.com/m, 01:01:00), which holds (0,0)]

Update The Aggregates
[Diagram: the cell becomes (1,90) – count 1, max latency 90 ms]

Update In-Place
[Diagram: Request #2 – bar.com/m, 01:01:00, latency 50 ms – updates the same cell to (2,90): count 2, max still 90 ms]
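
The walkthrough above as a toy sketch: each cell is a (count, max) pair keyed by (URL, time bucket) and updated in place.

package main

import "fmt"

type Cell struct {
	Count int
	Max   int // max latency in ms
}

func (c *Cell) Update(latencyMS int) {
	c.Count++
	if latencyMS > c.Max {
		c.Max = latencyMS
	}
}

func main() {
	cube := map[[2]string]*Cell{} // key: {URL, time bucket}
	key := [2]string{"bar.com/m", "01:01:00"}
	cube[key] = &Cell{}  // (0,0)
	cube[key].Update(90) // request #1 -> (1,90)
	cube[key].Update(50) // request #2 -> (2,90): count grows, max stays
	fmt.Printf("(%d,%d)\n", cube[key].Count, cube[key].Max)
}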

Cube Slice
[Diagram: fixing a time range (01:01:00 … 01:01:59) across all URLs selects a slice of the cube]

Cube Rollup
[Diagram: aggregating along a dimension, e.g. URL: bar.com/* at Time 01:01:01 and URL: foo.com/* at Time 01:01:01]

Rich Structure
[Diagram: a cube over times 01:01:00 … 01:01:59 with example cells (5,90), (3,75), (8,199), (21,40) and rollup cells at several granularities]
Cell  URL        Time
A     bar.com/*  01:01:01
B     *
C     foo.com/*  01:01:01
D     foo.com/r  01:01:*
E     foo.com/*  01:01:*

Key Property
2 types of rollups:
1. Across dimensions
2. Across sources
We use the same aggregation function for both
– Powerful conceptual constraints
– Semantic properties preserved when changing the granularity of reporting
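
In code, the key property is that one merge function serves both kinds of rollup (a sketch reusing the (count, max) Cell from the earlier update example):

// merge combines two (count, max) aggregates. Because it is associative
// and commutative, the same function rolls a cube up across dimensions
// (e.g. everything under foo.com/*) and merges partial cubes arriving
// from different source nodes, preserving semantics at any granularity.
func merge(a, b Cell) Cell {
	m := a.Max
	if b.Max > m {
		m = b.Max
	}
	return Cell{Count: a.Count + b.Count, Max: m}
}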

Where to get the code
Stable
Bleeding Edge