An overview of Data Streaming


ICT Support for Adaptiveness and (Cyber)Security in the Smart Grid (DAT285B)
Vincenzo Gulisano, vinmas@chalmers.se (room 5107)
Chalmers University of Technology
2018-09-19

Agenda
- Introduction
- Motivation and System Model
- Sample Data Streaming application
- Evolution of Stream Processing Engines
- Challenges in the context of Smart Grids

Database vs. Data Streaming
Problem: James travels by car from A to B. His grandmother is worried and wants to know whether he exceeds the speed limit. How will the "database" and the "data streaming" grandmothers find out?

Database vs. Data Streaming
Database grandmother: record the start time at position A and the end time at position B, then compute
\[ \frac{\mathit{distance}(A, B)}{\text{End time} - \text{Start time}} \]

Database vs. Data Streaming
Database grandmother:
- First the data, then the query
- Precise result
- Need to store information

Database vs. Data Streaming
Data streaming grandmother:
- First the query, then the data
- "Continuous" result
- No need to store information
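
The contrast between the two grandmothers can be sketched in Python. This is an illustrative sketch, not part of the original slides: the function names, the one-dimensional positions in km, the times in hours, and the speed limit value are all assumptions.

```python
from typing import Iterable, Iterator, Tuple

SPEED_LIMIT_KMH = 90.0  # hypothetical speed limit, not from the slides


def database_grandmother(start_h: float, pos_a_km: float,
                         end_h: float, pos_b_km: float) -> float:
    """First the data, then the query: the stored trip is queried once.

    Returns distance(A, B) / (end time - start time) in km/h.
    """
    return abs(pos_b_km - pos_a_km) / (end_h - start_h)


def streaming_grandmother(
    readings: Iterable[Tuple[float, float]]
) -> Iterator[Tuple[float, bool]]:
    """First the query, then the data: the check is fixed up front.

    Each (time_h, speed_kmh) reading is evaluated on arrival and then
    forgotten, so nothing has to be stored.
    """
    for time_h, speed_kmh in readings:
        yield time_h, speed_kmh > SPEED_LIMIT_KMH


average_speed = database_grandmother(8.0, 0.0, 10.0, 160.0)
alerts = list(streaming_grandmother([(8.0, 70.0), (8.5, 110.0), (9.0, 80.0)]))
```

The batch function needs both endpoints before it can answer, while the generator produces one result per reading for as long as readings keep arriving.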

Motivation
Applications such as:
- Sensor networks
- Network traffic analysis
- Financial tickers
- Transaction log analysis
- Fraud detection
Require:
- Continuous processing of data streams
- Processing in a real-time fashion

Motivation
Store-then-process is not feasible:
- High-speed networks: nanoseconds to handle a packet
- ISP router: gigabytes of headers every hour, …
Data Streaming:
- In memory
- Bounded resources
- Efficient one-pass analysis
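
As a hedged illustration of "bounded resources" and "one-pass analysis", the sketch below keeps a running mean over an arbitrarily long stream using only two numbers of state. The function name and the idea of feeding it packet sizes are assumptions made for this example, not something the slides specify.

```python
from typing import Iterable, Iterator


def running_mean(values: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, one pass, O(1) memory."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        # Incremental update: no past value is stored.
        mean += (value - mean) / count
        yield mean


# Hypothetical packet sizes (bytes); a real stream would be unbounded.
means = list(running_mean([2.0, 4.0, 6.0]))
```

Each input is read once and discarded, which is what makes the approach workable when storing the whole stream is not an option.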

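Motivation
DBMS vs. DSMS
[Figure: in the DBMS, (1) data is stored on disk, (2) a query is submitted, and (3) query processing in main memory returns the query results. In the DSMS, a continuous query sits in main memory; data flows through it and query results flow out.]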

System Model
Data Stream: unbounded sequence of tuples
Example: Call Description Record (CDR)

Field        Type
Caller       text
Callee       text
Time (secs)  int
Price (€)    double

Sample tuples, ordered by time:

Caller  Callee  Time  Price (€)
A       B       8:00  3
C       D       8:20  7
A       E       8:35  6
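
A minimal sketch of how the CDR schema and its sample tuples might look in Python. The class and function names are hypothetical, and encoding the time as seconds since midnight is an assumption made to match the `int` type in the schema.

```python
from typing import Iterator, NamedTuple


class CDR(NamedTuple):
    caller: str
    callee: str
    time: int      # assumed: seconds since midnight
    price: float   # euros


def cdr_stream() -> Iterator[CDR]:
    """Yield the three sample tuples in timestamp order.

    A real data stream is unbounded; this generator stops only
    because the example data does.
    """
    yield CDR("A", "B", 8 * 3600, 3.0)             # 8:00
    yield CDR("C", "D", 8 * 3600 + 20 * 60, 7.0)   # 8:20
    yield CDR("A", "E", 8 * 3600 + 35 * 60, 6.0)   # 8:35


sample = list(cdr_stream())
```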

System Model
Operators:
- Stateless: 1 input tuple -> 1 output tuple
- Stateful: 1+ input tuple(s) -> 1 output tuple

System Model
Stateless Operators
- Map: transform the tuples' schema. Example: convert price € -> $
- Filter: discard / route tuples. Example: route depending on price
- Union: merge multiple streams (sharing the same schema). Example: merge CDRs from different sources
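
The three stateless operators can be sketched with Python generators. This is an illustration under assumptions, not the API of any particular engine: tuples are plain dicts, the exchange rate is made up, and Union is implemented as a timestamp-ordered merge that expects each input to be sorted by `time` already.

```python
import heapq
from typing import Callable, Dict, Iterable, Iterator

Tup = Dict[str, object]
EUR_TO_USD = 1.25  # hypothetical exchange rate


def map_op(stream: Iterable[Tup], fn: Callable[[Tup], Tup]) -> Iterator[Tup]:
    """Map: one output tuple per input tuple, possibly with a new schema."""
    for t in stream:
        yield fn(t)


def filter_op(stream: Iterable[Tup],
              predicate: Callable[[Tup], bool]) -> Iterator[Tup]:
    """Filter: forward a tuple only if the predicate holds."""
    for t in stream:
        if predicate(t):
            yield t


def union_op(*streams: Iterable[Tup]) -> Iterator[Tup]:
    """Union: lazily merge same-schema streams in timestamp order."""
    return heapq.merge(*streams, key=lambda t: t["time"])


def to_usd(t: Tup) -> Tup:
    """Replace the 'price' field (euros) with 'price_usd'."""
    out = {k: v for k, v in t.items() if k != "price"}
    out["price_usd"] = t["price"] * EUR_TO_USD
    return out


source_1 = [{"caller": "A", "callee": "B", "time": 28800, "price": 3.0},
            {"caller": "A", "callee": "E", "time": 30900, "price": 6.0}]
source_2 = [{"caller": "C", "callee": "D", "time": 30000, "price": 7.0}]

merged = union_op(source_1, source_2)
in_usd = map_op(merged, to_usd)
result = list(filter_op(in_usd, lambda t: t["price_usd"] > 5.0))
```

None of the operators keeps anything between tuples, which is what makes them stateless.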

System Model
Stateful Operators
- Aggregate: compute aggregate functions (group-by). Example: compute avg. call duration
- Equijoin: match tuples from 2 streams (equality predicate). Example: match CDRs with same price
- Cartesian Product: merge tuples from 2 streams (arbitrary predicate). Example: match CDRs with prices in the same range
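
A hedged sketch of two stateful operators follows. Both keep state across tuples; for simplicity that state grows without bound here, which is the problem windows are meant to solve. The function names, the dict-based tuples, and the `(side, tuple)` encoding of the two join inputs are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Tup = Dict[str, object]


def aggregate_avg(stream: Iterable[Tup], key: str,
                  value: str) -> Iterator[Tuple[object, float]]:
    """Group-by running average: emit (group, average so far) per input."""
    state: Dict[object, List[float]] = defaultdict(lambda: [0.0, 0])
    for t in stream:
        acc = state[t[key]]      # [sum, count] for this group
        acc[0] += t[value]
        acc[1] += 1
        yield t[key], acc[0] / acc[1]


def equijoin(tagged: Iterable[Tuple[str, Tup]],
             attr: str) -> Iterator[Tuple[Tup, Tup]]:
    """Symmetric hash join on `attr` over interleaved ("L"|"R", tuple) pairs.

    Every arriving tuple is stored on its own side and probed against
    the other side; output pairs are always (left tuple, right tuple).
    """
    seen = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, t in tagged:
        other = "R" if side == "L" else "L"
        seen[side][t[attr]].append(t)
        for match in seen[other].get(t[attr], []):
            yield (t, match) if side == "L" else (match, t)


calls = [{"caller": "A", "duration": 10},
         {"caller": "B", "duration": 30},
         {"caller": "A", "duration": 20}]
averages = list(aggregate_avg(calls, "caller", "duration"))

tagged = [("L", {"id": 1, "price": 3}), ("R", {"id": 2, "price": 3}),
          ("R", {"id": 3, "price": 7}), ("L", {"id": 4, "price": 7})]
pairs = [(l["id"], r["id"]) for l, r in equijoin(tagged, "price")]
```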

System Model
Infinite sequence of tuples / bounded memory -> windows
Example: 1-hour windows, here starting every 20 minutes along the time axis:
[8:00,9:00) [8:20,9:20) [8:40,9:40)

System Model
Infinite sequence of tuples / bounded memory -> windows
Example: count tuples, 1-hour windows
Tuples arrive at: 8:05, 8:15, 8:22, 8:45, 9:05
Windows: [8:00,9:00), [8:20,9:20)
Output: 4 (the four tuples that fall in [8:00,9:00))
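
The counting example can be reproduced with a small sliding-window sketch. It assumes timestamps in minutes since midnight, input sorted by timestamp, and an explicit `origin` for the first window; the function and parameter names are hypothetical. A window's count is emitted once a tuple arrives at or after the window's end.

```python
from typing import Dict, Iterable, Iterator, Tuple


def sliding_count(timestamps: Iterable[int], size: int, advance: int,
                  origin: int) -> Iterator[Tuple[int, int, int]]:
    """Count tuples per window [start, start + size).

    Windows start at origin, origin + advance, ... and an output
    (start, end, count) is produced when a window can no longer change.
    """
    counts: Dict[int, int] = {}   # open windows: start -> count
    for ts in timestamps:
        # Close every window that ends at or before the new timestamp.
        for start in sorted(counts):
            if start + size <= ts:
                yield start, start + size, counts.pop(start)
        # Add the tuple to every window that contains it.
        start = origin + ((ts - origin) // advance) * advance
        while start > ts - size and start >= origin:
            counts[start] = counts.get(start, 0) + 1
            start -= advance


# 8:05, 8:15, 8:22, 8:45, 9:05 as minutes since midnight
arrivals = [485, 495, 502, 525, 545]
outputs = list(sliding_count(arrivals, size=60, advance=20, origin=480))
```

With these inputs the only window closed so far is [8:00,9:00), reported as `(480, 540, 4)`; the window [8:20,9:20) is still open and holds three tuples. Memory is bounded by the number of open windows, not by the length of the stream.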

Continuous Query Example
Fraud detection: High Mobility
Spot mobile phones whose distance in space and time between two consecutive calls is suspicious.
[Figure: Phone X at 12:00, Phone X at 12:03: CLONED NUMBER!]

High Mobility Continuous Query (1/2)
- Input Stream: (Caller, Callee, Time, Duration, Price, Caller_Position, Callee_Position)
- Map, remove fields that are not needed: (Caller, Callee, Time, Duration, Caller_Position, Callee_Position)
- Map, create separate tuple for caller: (Phone number, Start time, End time, Position)
- Map, create separate tuple for callee: (Phone number, Start time, End time, Position)
- Union, merge tuples: (Phone number, Start time, End time, Position)

High Mobility Continuous Query (2/2)
- Union, merge tuples: (Phone number, Start time, End time, Position)
- Aggregate, for each consecutive pair of calls referring to the same number compute speed: (Phone number, Time, Speed)
  Window type: tuple based. Window size: 2. Window advance: 1.
- Filter, forward tuples with speed exceeding a given threshold: (Phone number, Time, Speed)
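
The whole High Mobility query can be sketched end to end in Python. Several details are assumptions because the slides do not fix them: positions are planar (x, y) coordinates in km, times are in seconds, speed is distance divided by the gap between the two call start times, and all class and function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple

Position = Tuple[float, float]  # assumed planar (x, y) in km


class Call(NamedTuple):
    caller: str
    callee: str
    time: float        # call start, seconds
    duration: float    # seconds
    price: float
    caller_position: Position
    callee_position: Position


class PhoneEvent(NamedTuple):
    phone: str
    start: float
    end: float
    position: Position


class SpeedReport(NamedTuple):
    phone: str
    time: float
    speed_kmh: float


def split_calls(calls: Iterable[Call]) -> Iterator[PhoneEvent]:
    """Maps + Union: drop unused fields, one event per caller and callee."""
    for c in calls:
        end = c.time + c.duration
        yield PhoneEvent(c.caller, c.time, end, c.caller_position)
        yield PhoneEvent(c.callee, c.time, end, c.callee_position)


def speeds(events: Iterable[PhoneEvent]) -> Iterator[SpeedReport]:
    """Aggregate per phone: tuple-based window, size 2, advance 1."""
    last: Dict[str, PhoneEvent] = {}
    for e in events:
        prev = last.get(e.phone)
        last[e.phone] = e          # slide the window by one tuple
        if prev is None:
            continue
        hours = (e.start - prev.start) / 3600.0
        if hours <= 0:
            continue               # cannot derive a speed from this pair
        km = math.hypot(e.position[0] - prev.position[0],
                        e.position[1] - prev.position[1])
        yield SpeedReport(e.phone, e.start, km / hours)


def high_mobility(calls: Iterable[Call],
                  threshold_kmh: float) -> Iterator[SpeedReport]:
    """Filter: forward only reports whose speed exceeds the threshold."""
    return (r for r in speeds(split_calls(calls))
            if r.speed_kmh > threshold_kmh)


calls = [
    Call("X", "Y", 0.0, 60.0, 1.0, (0.0, 0.0), (5.0, 5.0)),
    Call("X", "Z", 180.0, 30.0, 1.0, (300.0, 400.0), (1.0, 1.0)),
]
alerts = list(high_mobility(calls, threshold_kmh=1000.0))
```

In the sample data phone X appears 500 km away three minutes later, so it is the only number reported. The per-phone state is a single previous event, which is exactly the window of size 2 with advance 1.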

Centralized SPEs

Distributed SPEs
- Inter-operator parallelism

Parallel SPEs
- Intra-operator parallelism
- Over-provisioning or under-provisioning?

Elastic SPEs
- Scale up (+)

Elastic SPEs
- Scale down (-)

Challenges in the context of Smart Grids
Process energy consumption data:
- Build profiles and spot deviations
- Predictions / forecasts about consumption

Challenges in the context of Smart Grids
Process control events:
- Spot possible threats
- Monitor the devices' status

Challenges in the context of Smart Grids
How to process the information?
- Centralized

Challenges in the context of Smart Grids
How to process the information?
- Distributed (in-network aggregation)
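
One way to picture in-network aggregation is a tree of devices in which every node forwards a single partial aggregate instead of all raw readings. The sketch below is only an illustration: the topology, device names, readings, and the choice of a (sum, count) partial are assumptions, not taken from the slides.

```python
from typing import Dict, List, Tuple

Partial = Tuple[float, int]  # (sum of readings, number of readings)


def aggregate_subtree(node: str, readings: Dict[str, float],
                      children: Dict[str, List[str]]) -> Partial:
    """Combine a node's own reading with its children's partials.

    Each node sends one (sum, count) pair upward, so the root can
    compute the mean without seeing any individual reading.
    """
    if node in readings:
        total, count = readings[node], 1
    else:
        total, count = 0.0, 0   # forwarding-only node, no own reading
    for child in children.get(node, []):
        child_sum, child_count = aggregate_subtree(child, readings, children)
        total += child_sum
        count += child_count
    return total, count


# Hypothetical topology: a concentrator with two meters, one relaying a third.
children = {"concentrator": ["meter_a", "meter_b"], "meter_b": ["meter_c"]}
readings = {"meter_a": 2.0, "meter_b": 4.0, "meter_c": 6.0}  # e.g. kWh

total, count = aggregate_subtree("concentrator", readings, children)
mean_consumption = total / count
```

In the centralized alternative every raw reading would travel to the root; here each link carries one pair regardless of how many devices sit below it.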

Challenges in the context of Smart Grids
How to deal with constrained/limited resources?
- What if this device is running out of battery?

Questions?
