COS 518: Advanced Computer Systems Lecture 12 Michael Freedman Streaming
What is streaming? Fast data! Fast processing! Lots of data!
Simple stream processing Single node Read data from socket Process Write output
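A minimal single-node sketch of that loop in Python (the host, port, and process() step are illustrative assumptions, not from the lecture):

    import socket

    def process(record):
        # Placeholder transformation; real logic goes here.
        return record.upper()

    # Read newline-delimited records from a socket, process, write output.
    with socket.create_connection(("localhost", 9999)) as sock:   # illustrative endpoint
        for line in sock.makefile("r"):
            print(process(line.rstrip("\n")))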
Examples: Stateless conversion CtoF Convert Celsius temperature to Fahrenheit Stateless operation: emit (input * 9 / 5) + 32
Examples: Stateless filtering Function can filter inputs if (input > threshold) { emit input }
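Both stateless operators compose naturally as generators; a sketch (the threshold value is an assumption):

    def ctof(stream):
        # Stateless: each output depends only on the current input.
        for celsius in stream:
            yield celsius * 9 / 5 + 32

    def threshold_filter(stream, threshold=80.0):   # illustrative threshold
        # Stateless: emit only inputs above the threshold.
        for x in stream:
            if x > threshold:
                yield x

    # Usage: chain operators like a pipeline.
    for f in threshold_filter(ctof(iter([20.0, 35.0, 10.0]))):
        print(f)   # 95.0 (35 C -> 95 F; the other readings are filtered out)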
Examples: Stateful conversion EWMA Compute EWMA of Fahrenheit temperature new_temp = α * CtoF(input) + (1 − α) * last_temp last_temp = new_temp emit new_temp
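A sketch of the EWMA operator; unlike CtoF, it must carry state (last_temp) across inputs. The alpha value and the choice to seed with the first reading are assumptions:

    def ewma(stream, alpha=0.5):   # illustrative smoothing factor
        last_temp = None
        for celsius in stream:
            fahrenheit = celsius * 9 / 5 + 32   # CtoF
            if last_temp is None:
                last_temp = fahrenheit          # seed with the first reading
            else:
                last_temp = alpha * fahrenheit + (1 - alpha) * last_temp
            yield last_temp                     # emit new_temp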
Examples: Aggregation (stateful) Avg E.g., average value per window Window can be # elements (10) or time (1s) Windows can be disjoint / “tumbling” (every 5s) Windows can be overlapping / “sliding” (5s window every 1s)
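Count-based sketches of both window styles (window sizes are illustrative; time-based windows would bucket by timestamp instead):

    from collections import deque

    def tumbling_avg(stream, size=10):
        # Disjoint windows: one output per `size` elements.
        window = []
        for x in stream:
            window.append(x)
            if len(window) == size:
                yield sum(window) / len(window)
                window = []

    def sliding_avg(stream, size=10):
        # Overlapping windows: average of the last `size` elements,
        # emitted on every new element.
        window = deque(maxlen=size)
        for x in stream:
            window.append(x)
            yield sum(window) / len(window)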
Enter “BIG DATA”
The challenge of stream processing Large amounts of data to process in real time Examples Social network trends (#trending) Intrusion detection systems (networks, datacenters) Sensors: Detect earthquakes by correlating vibrations of millions of smartphones Fraud detection Visa: 2000 txn / sec on average, peak ~47,000 / sec
Scale “up” Tuple-by-Tuple Micro-batch input ← read if (input > threshold) { emit input } inputs ← read out = [] for input in inputs { if (input > threshold) { out.append(input) } } emit out
Scale “up” Tuple-by-Tuple Micro-batch Lower Latency Lower Throughput Higher Latency Higher Throughput Why? Each read/write is a system call into the kernel. More cycles are spent on kernel/application transitions (context switches), and fewer are spent actually processing data.
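A runnable sketch of the micro-batch side, amortizing one write per batch (batch size and threshold are illustrative):

    import sys

    def micro_batches(f, batch_size=1000):   # illustrative batch size
        batch = []
        for line in f:
            batch.append(line)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch                        # flush the final partial batch

    for batch in micro_batches(sys.stdin):
        out = [x for x in batch if float(x) > 100.0]   # illustrative threshold
        sys.stdout.writelines(out)                     # one write per batch, not per tuple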
Scale “out”
Stateless operations: trivially parallelized [diagram: three parallel CtoF operators, each handling a share of the input]
State complicates parallelization Aggregations: Need to join results across parallel computations [diagram: pipeline CtoF → Avg → Filter]
State complicates parallelization Aggregations: Need to join results across parallel computations [diagram: three parallel branches (CtoF, Sum/Cnt, Filter) whose partial results join at a single Avg operator]
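Why the branches emit Sum and Cnt rather than a local average: averages don't merge, but (sum, count) pairs do. A sketch with illustrative values:

    # Each parallel branch emits a (sum, count) partial aggregate;
    # the join divides once, after combining.
    partials = [(510.0, 6), (320.0, 4), (95.0, 1)]   # illustrative values

    total_sum = sum(s for s, _ in partials)
    total_cnt = sum(c for _, c in partials)
    print(total_sum / total_cnt)   # global average over all 11 inputs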
Parallelization complicates fault-tolerance Aggregations: Need to join results across parallel computations [diagram: the same three branches, but one failed Sum/Cnt node blocks the Avg join]
Can parallelize joins Compute trending keywords E.g., each node computes Sum/key over a portion of tweets [diagram: parallel Sum/key operators over tweet portions feed one merging Sum/key → Sort top-k; a failed node blocks the merge]
Can parallelize joins [diagram: tweets hash-partitioned by key, so each partition runs its own Sum/key → Sort top-k independently, with no global merge]
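A sketch of that hash-partitioned layout: routing every occurrence of a keyword to the same worker makes per-key sums local (worker count and data are illustrative):

    from collections import Counter

    def partition(tweets, n_workers=3):   # illustrative worker count
        # Same keyword always hashes to the same worker, so no cross-worker join.
        shards = [[] for _ in range(n_workers)]
        for keyword in tweets:
            shards[hash(keyword) % n_workers].append(keyword)
        return shards

    tweets = ["cats", "dogs", "cats", "snow", "cats", "dogs"]
    for shard in partition(tweets):
        print(Counter(shard).most_common(2))   # per-partition Sum/key + Sort top-k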
Parallelization complicates fault-tolerance [diagram: the same hash-partitioned topology; a failed partition stalls results for its keys]
A Tale of Four Frameworks Record acknowledgement (Storm) Micro-batches (Spark Streaming, Storm Trident) Transactional updates (Google Cloud Dataflow) Distributed snapshots (Flink)
Fault tolerance via record acknowledgement (Apache Storm -- at least once semantics) Goal: Ensure each input is “fully processed” Approach: DAG / tree edge tracking Record edges created as the tuple is processed Wait for all edges to be marked done Inform the data source when complete; otherwise, the source resends the tuple Challenge: “at least once” means: Operators can receive a tuple > once Replay can be out-of-order ... the application needs to handle this.
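Storm tracks those edges compactly with XOR: each edge ID is XORed into a per-tuple checksum once when created and once when acked, so the checksum reaches zero exactly when every edge is done. A sketch of the idea, not Storm's actual API:

    import random

    class AckerSketch:
        # XOR is order-independent, so out-of-order acks still cancel.
        def __init__(self):
            self.pending = {}                  # tuple_id -> XOR of live edge IDs

        def edge_created(self, tuple_id, edge_id):
            self.pending[tuple_id] = self.pending.get(tuple_id, 0) ^ edge_id

        def edge_acked(self, tuple_id, edge_id):
            self.pending[tuple_id] ^= edge_id
            if self.pending[tuple_id] == 0:    # all edges created and acked
                del self.pending[tuple_id]
                print(f"tuple {tuple_id} fully processed; inform source")

    acker = AckerSketch()
    e1, e2 = random.getrandbits(64), random.getrandbits(64)
    acker.edge_created("t1", e1); acker.edge_created("t1", e2)
    acker.edge_acked("t1", e2)   # acks may arrive out of order
    acker.edge_acked("t1", e1)   # checksum reaches zero -> done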
Fault Tolerance via distributed snapshots (Apache Flink) Rather than log each record for each operator, take system-wide snapshots Snapshotting: Determine a consistent snapshot of system-wide state (includes in-flight records and operator state) Store state in durable storage Recovery: Restore the latest snapshot from durable storage Rewind the stream source to the snapshot point and replay inputs The algorithm is based on Chandy-Lamport distributed snapshots, but also captures the stream topology
Fault Tolerance via distributed snapshots (Apache Flink) Use markers (barriers) in the input data stream to tell downstream operators when to consistently snapshot
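A sketch of barrier alignment at an operator with multiple inputs: it snapshots its state only after the barrier has arrived on every input channel, holding back post-barrier records so the snapshot stays consistent. The storage and state details are assumptions:

    DURABLE_STORE = {}   # stand-in for real durable storage

    class OperatorSketch:
        def __init__(self, n_inputs):
            self.n_inputs = n_inputs
            self.seen = set()    # input channels whose barrier has arrived
            self.state = 0       # e.g., a running sum

        def on_record(self, channel, value):
            if channel in self.seen:
                return "buffer"  # hold post-barrier records until aligned
            self.state += value
            return "process"

        def on_barrier(self, channel, snapshot_id):
            self.seen.add(channel)
            if len(self.seen) == self.n_inputs:          # aligned on all inputs
                DURABLE_STORE[snapshot_id] = self.state  # consistent local snapshot
                self.seen.clear()   # forward barrier downstream, replay buffered records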
But another big issue: Streaming = unbounded data (Batch = bounded data)
Three major challenges Consistency: historically, streaming systems were created to decrease latency and made many sacrifices (e.g., at-most-once processing) Throughput vs. latency: typically a trade-off Time: a new challenge We’ve covered consistency in a lot of detail; let’s investigate time
Our lives used to be easy…
New Concerns Once data is unbounded, new concerns: Sufficient capacity so processing speed >= arrival velocity (on average) Support for handling out-of-order data Easiest thing to do: window by processing (arrival) time
Windowing by processing time is great Easy to implement and verify correctness Great for applications like filtering or monitoring
What if we care about when events happen? If we associate event times, then items could now arrive out-of-order! (why?)
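Why: records carry their own event timestamps, but network and processing delays can reorder arrival. A tiny sketch:

    # (event_time, value) pairs in arrival order; event times 1, 3, 2,
    # i.e., the third record was delayed in transit (values illustrative).
    arrivals = [(1, "a"), (3, "c"), (2, "b")]

    # Processing-time windows group by arrival order: a, c, b.
    # Event-time windows must reorder (or wait) to recover: a, b, c.
    print(sorted(arrivals))   # [(1, 'a'), (2, 'b'), (3, 'c')]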
Time creates new wounds
This would be nice
But that’s not the case, so we need tools Windows: how should we group together data? Watermarks: how can we mark when the last piece of data in some window has arrived? Triggers: how can we initiate an early result? Accumulators: what do we do with the results (correct, modified, or retracted)? All topics covered in next week’s readings (a small preview sketch follows)!
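As a preview of how windows and watermarks interact: a watermark asserts “no more records with event time ≤ t will arrive,” letting complete windows close and emit. Timestamps and window size below are illustrative:

    from collections import defaultdict

    WINDOW = 5                                   # 5-second event-time windows
    windows = defaultdict(list)                  # window start -> values

    def on_record(event_time, value):
        windows[(event_time // WINDOW) * WINDOW].append(value)

    def on_watermark(t):
        # Every window ending at or before the watermark is complete: emit it.
        for start in sorted(w for w in windows if w + WINDOW <= t):
            print(start, sum(windows.pop(start)))

    on_record(1, 10); on_record(3, 20); on_record(7, 5)
    on_watermark(6)    # closes window [0, 5): prints "0 30"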