COS 518: Advanced Computer Systems Lecture 11 Daniel Suo

Slides:



Advertisements
Similar presentations
1 SEDA: An Architecture for Well- Conditioned, Scalable Internet Services Matt Welsh, David Culler, and Eric Brewer Computer Science Division University.
Advertisements

Managed Availability works by implementing Probes, Monitors and Responders:  The Probe is the component that performs the simple test. It doesn’t care.
Embedded and Real Time Systems Lecture #2 David Andrews
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Note to other teachers and users of these.
The Financial Survival Guide to Retirement Retirement Basics.
Workflow Validation Kerstin Kleese van Dam Michela Taufer.
1 Testing Concurrent Programs Why Test?  Eliminate bugs?  Software Engineering vs Computer Science perspectives What properties are we testing for? 
Copyright © 2010, SAS Institute Inc. All rights reserved. SAS ® EasyAPI SafeMode A Usage Scenario.
AMOST Experimental Comparison of Code-Based and Model-Based Test Prioritization Bogdan Korel Computer Science Department Illinois Institute of Technology.
Transitioning to EPIC The Role of the Pharmacy Buyer.
LoadComplete Testing Tool. LoadComplete Testing Tool.
Test Isolation and Mocking Technion – Institute of Technology Author: Gal Lalouche © 1 Author: Gal Lalouche - Technion 2015 ©
Recruiting, Orienting and Managing NHD Judges. Step 1: Recruiting The Right Time The Right Places The Right Information The Right Follow-Up.
Test and Review chapter State the differences between archive and back-up data. Answer: Archive data is a copy of data which is no longer in regular.
Software Status  Last Software Workshop u Held at Fermilab just before Christmas. u Completed reconstruction testing: s MICE trackers and KEK tracker.
1 Legacy Code From Feathers, Ch 2 Steve Chenoweth, RHIT Right – Your basic Legacy, from Subaru, starting at $ 20,295, 24 city, 32 highway.
Network Computing Laboratory A programming framework for Stream Synthesizing Service.
Cpr E 308 Spring 2005 Process Scheduling Basic Question: Which process goes next? Personal Computers –Few processes, interactive, low response time Batch.
Introduction to Science.  Science: a system of knowledge based on facts or principles  Science is observing, studying, and experimenting to find the.
Name: Dr. Cathal Doyle Twitter: Website: cathaldoyle.comcathaldoyle.com.
Course Introduction David Ferry, Chris Gill Department of Computer Science and Engineering Washington University, St. Louis MO 1E81.
November 1, 2004 ElizabethGallas -- D0 Luminosity Db 1 D0 Luminosity Database: Checklist for Production Elizabeth Gallas Fermilab Computing Division /
1 WP2.2 Progress Update Deliverable Inertial System Platform Tyndall.
Week 1 Lecture 1 Slide 1 CP2028 Visual Basic Programming 2 “The VB Team” Copyright © University of Wolverhampton CP2028 Visual Basic Programming 2 v Week.
Commissioning and run coordination CMS week Commissioning plenary 28-February 2007 DQM and monitoring workshop MandateGoals.
Dato Confidential 1 Danny Bickson Co-Founder. Dato Confidential 2 Successful apps in 2015 must be intelligent Machine learning key to next-gen apps Recommenders.
Canadian Bioinformatics Workshops
Zach Tatlock / Winter 2016 CSE 331 Software Design and Implementation Lecture 24 Wrap Up.
Chapter 14 Part 1: Core Game Mechanics By Nolan Driessen.
Introduction to CSCI 1311 Dr. Mark C. Lewis
NASA’s Dangerous Mathematics: Black Holes and Dividing by Zero
Information Systems in Organizations 2
Preparing for Automation: Expanding Your Coverage
Streaming Analytics & CEP Two sides of the same coin?
The Future of Apache Flink®
Trading Timeliness and Accuracy in Geo-Distributed Streaming Analytics
CSE 374 Programming Concepts & Tools
Scaling Apache Flink® to very large State
Information Systems in Organizations 2
Data stream as an unbounded table
Algorithm Analysis CSE 2011 Winter September 2018.
Recall The Team Skills Analyzing the Problem (with 5 steps)
COS 518: Advanced Computer Systems Lecture 11 Michael Freedman
Information Systems in Organizations 2
Bojian Zheng CSCD70 Spring 2018
Information Systems in Organizations 2
COS 518: Advanced Computer Systems Lecture 12 Mike Freedman
This is the world that God made
Information Systems in Organizations 2
Information Systems in Organizations 2
Understanding prompts
Slides prepared by Samkit
Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing Zaharia, et al (2012)
Information Systems in Organizations 2
The Dataflow Model.
Coordinate Operations Standard
Information Systems in Organizations 2
Hatching a Group of Eggs
CSE451 Virtual Memory Paging Autumn 2002
CS 240 – Advanced Programming Concepts
Molecules and Compounds Notes
Chapter 14 Part 1: Core Game Mechanics By Nolan Driessen
Memory System Performance Chapter 3
BASEAL Relationships - 4
Data science laboratory (DSLAB)
COS 518: Advanced Computer Systems Lecture 12 Michael Freedman
CO4301 – Advanced Games Development Week 12 Using Trees
CREDIT BUILDER CARD How Does It Work?
SESSION 1 ANALYTICS AND TRENDS KIMBLE AUTUMN USER GROUP 2018
Information Systems in Organizations 2
Presentation transcript:

COS 518: Advanced Computer Systems Lecture 11 Daniel Suo Streaming COS 518: Advanced Computer Systems Lecture 11 Daniel Suo

What is streaming? Fast data! Fast processing! Lots of data!

Streaming = unbounded data (Batch = bounded data)

Other defns are somewhat misleading We can use batch frameworks for stream processing (how?) Batch frameworks can also handle scenarios historically covered by stream frameworks (e.g., low-latency, approximate)

Three major challenges Consistency: historically, streaming systems were created to decrease latency and made many sacrifices (at-most-processing, anyone?) Throughput vs. latency: typically a trade-off (why?) Time: as we will soon see, streaming introduces some new challenges We’ve covered consistency in a lot of detail, so let’s investigate time.

Our lives used to be easy…

…but if you give a data scientist some data… Once we move to unbounded data, we need new methods to process whether for sake of capacity (not enough machines) or availability (data doesn’t exist yet) Easiest thing to do:

Windowing by processing time is great Easy to implement and verify correctness Great for applications like filtering or monitoring

But what if we care about when events happen? If we associate event times, then items could now come out-of-order! (why?)

Time creates new wounds

This would be nice

But not the case, so we need tools Windows: how should we group together data? Watermarks: how can we mark when the last piece of data in some window has arrived? Triggers: how can we initiate an early result? Accumulators: what do we do with the results (correct, modified, or retracted)? All topics covered in next week’s readings!