COS 518: Advanced Computer Systems Lecture 12 Michael Freedman


Streaming. COS 518: Advanced Computer Systems, Lecture 12, Michael Freedman

What is streaming? Fast data! Fast processing! Lots of data!

Simple stream processing (single node): read data from socket, process, write output.

Examples: Stateless conversion (CtoF). Convert Celsius temperature to Fahrenheit. Stateless operation: emit (input * 9 / 5) + 32

Examples: Stateless filtering. A function can filter inputs: if (input > threshold) { emit input }
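
Both stateless examples can be written as plain per-record functions. A minimal Python sketch (the names c_to_f, filter_op, and THRESHOLD are illustrative, not from the slides):

    # Stateless operators: output depends only on the current input record.
    def c_to_f(celsius):
        return celsius * 9 / 5 + 32

    THRESHOLD = 100.0  # illustrative cutoff for the filter

    def filter_op(value):
        # Emit the record only if it passes the predicate; otherwise drop it.
        return [value] if value > THRESHOLD else []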

Examples: Stateful conversion (EWMA). Compute an EWMA of the Fahrenheit temperature: new_temp = α * CtoF(input) + (1 - α) * last_temp; last_temp = new_temp; emit new_temp
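
A minimal sketch of the EWMA operator as a small stateful class; the alpha default and the handling of the first record are illustrative choices, not from the slides:

    class EWMA:
        """Stateful operator: keeps last_temp across records."""
        def __init__(self, alpha=0.2):
            self.alpha = alpha
            self.last_temp = None

        def process(self, celsius):
            f = celsius * 9 / 5 + 32          # CtoF from the previous slide
            if self.last_temp is None:
                self.last_temp = f            # first record seeds the state
            else:
                self.last_temp = self.alpha * f + (1 - self.alpha) * self.last_temp
            return self.last_temp             # emit new_temp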

Examples: Aggregation (stateful). Avg: e.g., average value per window. A window can be defined by a number of elements (10) or by time (1s). Windows can be disjoint, i.e. "tumbling" (a new 5s window every 5s), or overlapping, i.e. "sliding" (a 5s window every 1s).
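
A minimal sketch of the count-based case: an average over the last 10 elements (a sliding count window). Time-based, tumbling, and sliding variants would bucket by timestamp instead; the class name is illustrative:

    from collections import deque

    class WindowedAvg:
        """Average over a sliding window of the last n elements."""
        def __init__(self, n=10):
            self.window = deque(maxlen=n)  # old elements fall out automatically

        def process(self, value):
            self.window.append(value)
            return sum(self.window) / len(self.window)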

Enter “BIG DATA”

The challenge of stream processing: large amounts of data to process in real time. Examples: social network trends (#trending); intrusion detection systems (networks, datacenters); sensors (detect earthquakes by correlating vibrations of millions of smartphones); fraud detection (Visa: 2,000 txn/sec on average, peak ~47,000/sec).

Scale "up". Tuple-by-tuple: input ← read; if (input > threshold) { emit input }. Micro-batch: inputs ← read; out = []; for input in inputs { if (input > threshold) { out.append(input) } }; emit out
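
The two loops above, written out as runnable Python; read_one, read_batch, and emit are hypothetical I/O functions standing in for the socket reads and writes:

    THRESHOLD = 100.0  # illustrative cutoff

    def tuple_by_tuple(read_one, emit):
        # One read and (possibly) one write per record: low latency per record.
        while True:
            value = read_one()
            if value > THRESHOLD:
                emit(value)

    def micro_batch(read_batch, emit):
        # One read and one write per batch: higher latency, higher throughput.
        while True:
            out = [v for v in read_batch() if v > THRESHOLD]
            emit(out)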

Scale "up". Tuple-by-tuple: lower latency, lower throughput. Micro-batch: higher latency, higher throughput. Why? Each read/write is a system call into the kernel: more cycles spent on kernel/application transitions (context switches), fewer cycles actually spent processing data.

Scale “out”

Stateless operations: trivially parallelized. (Diagram: several CtoF operators running independently in parallel.)

State complicates parallelization. Aggregations: need to join results across parallel computations. (Diagram: a single pipeline of CtoF, Filter, and Avg operators.)

State complicates parallelization. Aggregations: need to join results across parallel computations. (Diagram: three parallel CtoF / Filter / Sum+Cnt pipelines feeding a single downstream Avg operator.)
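
A sketch of why the diagram splits Avg into Sum and Cnt: averages of averages are not the global average, but (sum, count) pairs compose, so each parallel pipeline emits a partial pair and one downstream operator merges them. Function names are illustrative:

    def partial_aggregate(values):
        # Runs inside each parallel pipeline, after CtoF and Filter.
        return (sum(values), len(values))

    def merge_average(partials):
        # Runs at the single downstream Avg operator.
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        return total / count if count else None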

Parallelization complicates fault-tolerance. Aggregations: need to join results across parallel computations. (Diagram: the same three parallel pipelines feeding Avg; if one pipeline blocks, the downstream join blocks too.)

Can parallelize joins. Example: compute trending keywords. (Diagram: each worker computes Sum/key over its portion of the tweets; the partial sums are joined and sorted by a single Sort top-k operator, which blocks until all partial results arrive.)

Can parallelize joins. (Diagram: tweets are hash-partitioned by keyword, so each partition runs its own Sum/key and Sort top-k pipeline independently.)
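
A minimal sketch of the hash-partitioned version: each keyword is routed by hash to one partition, every partition keeps complete counts for its own keys, and the global top-k is just a merge of the per-partition top-k lists. The partition count and k are illustrative:

    from collections import Counter

    NUM_PARTITIONS = 3
    K = 10

    partitions = [Counter() for _ in range(NUM_PARTITIONS)]

    def route(keyword):
        # All occurrences of a keyword land in the same partition.
        return hash(keyword) % NUM_PARTITIONS

    def process(keyword):
        partitions[route(keyword)][keyword] += 1

    def top_k():
        # Partitions own disjoint keys, so any globally top-k key is in its
        # partition's local top-k; merging the local lists is sufficient.
        candidates = []
        for counter in partitions:
            candidates.extend(counter.most_common(K))
        return sorted(candidates, key=lambda kv: kv[1], reverse=True)[:K]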

Parallelization complicates fault-tolerance. (Diagram: the same hash-partitioned pipelines; a failure of any Sum/key or Sort top-k stage now affects the result.)

A Tale of Four Frameworks: record acknowledgement (Storm); micro-batches (Spark Streaming, Storm Trident); transactional updates (Google Cloud Dataflow); distributed snapshots (Flink).

Fault tolerance via record acknowledgement (Apache Storm, at-least-once semantics). Goal: ensure each input is "fully processed". Approach: DAG / tree edge tracking. Record edges created as a tuple is processed; wait for all edges to be marked done; inform the source of the data when complete, otherwise it resends the tuple. Challenge: "at least once" means operators can receive a tuple more than once and replay can be out-of-order, which the application needs to handle.
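
A toy sketch of the idea behind Storm's acker: each edge in a tuple's processing tree gets a random 64-bit id that is XORed into a per-root checksum once when created and once when acked; when the checksum returns to zero, every edge has been acked. This is a simplified model of the mechanism, not Storm's actual code:

    import random

    class AckTracker:
        """Tracks one spout tuple's processing tree via an XOR checksum."""
        def __init__(self):
            self.checksum = 0

        def new_edge(self):
            # Called when an operator emits a downstream tuple anchored to the root.
            edge_id = random.getrandbits(64)
            self.checksum ^= edge_id
            return edge_id

        def ack_edge(self, edge_id):
            # Called when an operator finishes processing that tuple.
            self.checksum ^= edge_id

        def fully_processed(self):
            # Zero means every created edge was also acked (with high probability).
            return self.checksum == 0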

Fault tolerance via distributed snapshots (Apache Flink). Rather than log each record for each operator, take system-wide snapshots. Snapshotting: determine a consistent snapshot of system-wide state (including in-flight records and operator state) and store it in durable storage. Recovery: restore the latest snapshot from durable storage, rewind the stream source to the snapshot point, and replay inputs. The algorithm is based on Chandy-Lamport distributed snapshots, but also captures the stream topology.

Fault tolerance via distributed snapshots (Apache Flink). Use markers (barriers) in the input data stream to tell downstream operators when to take a consistent snapshot.
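
A toy sketch of the barrier idea for a single-input operator: process records normally, and on seeing a barrier, persist the current state before forwarding the barrier downstream. Real Flink also aligns barriers across multiple input channels; snapshot_store is a hypothetical durable-storage interface:

    BARRIER = object()  # marker injected into the stream by the source

    class CountingOperator:
        def __init__(self, snapshot_store):
            self.count = 0                    # the operator's state
            self.snapshot_store = snapshot_store

        def on_record(self, record, emit):
            if record is BARRIER:
                # State is captured at a consistent point, then the barrier moves on.
                self.snapshot_store.save({"count": self.count})
                emit(BARRIER)
            else:
                self.count += 1
                emit(record)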

But another big issue: Streaming = unbounded data (Batch = bounded data)

Three major challenges. Consistency: historically, streaming systems were created to decrease latency and made many sacrifices (e.g., at-most-once processing). Throughput vs. latency: typically a trade-off. Time: a new challenge. We've covered consistency in detail; let's investigate time.

Our lives used to be easy…

New Concerns. Once data is unbounded: need sufficient capacity so processing speed >= arrival velocity (on average), and support for handling out-of-order data. The easiest thing to do: window by processing time (next slide).

Windowing by processing time is great: easy to implement and to verify correctness, and great for applications like filtering or monitoring.

What if we care about when events happen? If we associate event times, then items could now come out-of-order! (Why? Network delays and upstream buffering mean arrival order need not match event order.)

Time creates new wounds

This would be nice

But that's not the case, so we need tools. Windows: how should we group together data? Watermarks: how can we mark when the last piece of data in some window has arrived? Triggers: how can we initiate an early result? Accumulators: what do we do with the results (correct, modified, or retracted)? All topics covered in next week's readings!
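
A toy sketch tying the tools together for one tumbling event-time window: records are grouped by event time (windows), a watermark asserts that no more data at or before time t is expected, and a window's result is emitted (triggered) once the watermark passes its end. The 60-second window size and function names are illustrative:

    from collections import defaultdict

    WINDOW = 60  # seconds of event time per tumbling window

    windows = defaultdict(list)  # window start time -> accumulated values

    def on_record(event_time, value):
        start = (event_time // WINDOW) * WINDOW
        windows[start].append(value)

    def on_watermark(watermark, emit):
        # Trigger every window the watermark has passed; data arriving later
        # (late data) for these windows would need a separate policy
        # (e.g., modified or retracted results, per the accumulator question).
        for start in sorted(list(windows)):
            if start + WINDOW <= watermark:
                emit(start, sum(windows.pop(start)))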