Peter R. Pietzuch THEMIS: Fairness in Data Stream Processing under Overload Evangelia Kalyvianaki City University London, UK 15041 Model-driven.

Presentation transcript:

THEMIS: Fairness in Data Stream Processing under Overload
Evangelia Kalyvianaki (City University London, UK), Marco Fiscato (Imperial College London, UK), Theodoros Salonidis (IBM Research, USA), Peter Pietzuch (Imperial College London, UK)
Model-driven Algorithms and Architectures for Self-Aware Computing Systems, Dagstuhl 2015

2 The Puzzle of Big Data: Real-Time Processing Engines in Data Centres. Queries overload data centre resources. How to efficiently allocate resources across clusters/engines?

3 Data Shedding. A well-known technique to handle transient overload conditions is to discard data. How to measure shedding across queries? How much data should we shed from queries? How to implement shedding in this distributed setup?

4 How to measure shedding across queries? Shedding data → reduced correctness → degraded performance. Different dropped data → different degrees of degradation. The Source Information Content (SIC) metric measures the contribution of data from sources to results. Figure example: degraded processing gives 11/6 < 3, the value for perfect processing. SIC is a data-stream-processing-aware metric. But can we have a metric that is operator- or query-aware?
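
The 11/6 < 3 comparison can be reproduced with a small sketch. It assumes, purely for illustration, that each source carries one unit of information per window, split equally among its tuples, and that a result's SIC is the sum of the shares that actually reached it. The function name `result_sic` and the tuple counts are hypothetical and are not taken from the slides.

```python
from fractions import Fraction

def result_sic(kept_per_source, total_per_source):
    """Toy SIC: each source contributes the fraction of its tuples that
    survived shedding; the result's SIC is the sum over all sources.
    (Assumed model for illustration, not the THEMIS definition.)"""
    return sum(Fraction(kept, total)
               for kept, total in zip(kept_per_source, total_per_source))

# Three sources, six tuples each in the window (hypothetical numbers).
perfect = result_sic([6, 6, 6], [6, 6, 6])   # nothing shed
degraded = result_sic([6, 3, 2], [6, 6, 6])  # 1 + 1/2 + 1/3

print(perfect, degraded, degraded < perfect)
```

Under this assumed model, perfect processing of three sources yields 3, while keeping all, half, and a third of the three sources' tuples yields 11/6.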

5 Fair Shedding for Equalising SIC Values. Each local shedder equalises the SIC values of its own queries. Global coordination is achieved with local informed shedding.
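
One way to picture a local shedder that equalises SIC values is a greedy loop that always admits the next tuple of the query whose accumulated SIC is currently lowest, until the node's capacity is used up. The sketch below is an illustrative assumption, not the THEMIS algorithm itself; `fair_shed`, the per-tuple SIC values, and the capacity are hypothetical.

```python
import heapq
from fractions import Fraction

def fair_shed(batches, capacity):
    """batches: dict mapping query name -> list of per-tuple SIC values
    waiting at this node. capacity: tuples the node can process this round.
    Returns a dict mapping query name -> number of tuples kept; the rest
    of each batch is shed."""
    kept = {q: 0 for q in batches}
    # Min-heap of (accumulated SIC, query): the poorest query is served first.
    heap = [(Fraction(0), q) for q in sorted(batches) if batches[q]]
    heapq.heapify(heap)
    while capacity > 0 and heap:
        sic, q = heapq.heappop(heap)
        sic += batches[q][kept[q]]      # admit this query's next tuple
        kept[q] += 1
        capacity -= 1
        if kept[q] < len(batches[q]):   # query still has tuples waiting
            heapq.heappush(heap, (sic, q))
    return kept

batches = {
    "q1": [Fraction(1, 10)] * 10,  # many low-value tuples
    "q2": [Fraction(1, 4)] * 4,    # few high-value tuples
}
print(fair_shed(batches, capacity=7))
```

With these numbers the node keeps five tuples of q1 and two of q2, so both queries end the round with an accumulated SIC of 1/2, even though they lose different numbers of tuples.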

6 SIC Fair Shedder. To address nodes' heterogeneity and workload variations, an online cost model estimates the time to process an average tuple. Could we build the system to be goal-aware?
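
A minimal sketch of such an online cost model follows. The slides do not say how the estimate is maintained; the exponentially weighted moving average used here, the class name `TupleCostModel`, and the smoothing factor `alpha` are assumptions made for illustration.

```python
class TupleCostModel:
    """Tracks the average time to process one tuple on this node.
    The EWMA smoothing is an assumed choice, not taken from the slides."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha      # weight given to the newest measurement
        self.avg_cost = None    # seconds per tuple; None until first sample

    def observe(self, elapsed_seconds, tuples_processed):
        """Fold one measurement (time spent, tuples handled) into the estimate."""
        if tuples_processed <= 0:
            return
        sample = elapsed_seconds / tuples_processed
        if self.avg_cost is None:
            self.avg_cost = sample
        else:
            self.avg_cost = self.alpha * sample + (1 - self.alpha) * self.avg_cost

    def capacity(self, interval_seconds):
        """Estimated number of tuples the node can process in the interval."""
        if self.avg_cost is None or self.avg_cost <= 0:
            return 0
        return int(interval_seconds / self.avg_cost)

model = TupleCostModel(alpha=0.5)
model.observe(0.5, 1)    # a slow tuple
model.observe(0.25, 1)   # a faster one; the estimate moves towards it
print(model.avg_cost, model.capacity(3.0))
```

Because the estimate is updated from local measurements, a slower node or a heavier workload lowers that node's capacity figure without any global configuration, which is the property the slide asks of the cost model.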

7 A self-aware autonomic system for data processing in real time. Systems already have (some) adaptation and (some) self-awareness, but could we extend this to (full) self-awareness? For example, can we build a self-aware system that performs fair data shedding for data stream processing, databases and filesystems under overload? Thank you! Questions?