Chapter 10: Stream-based Data Management Title: Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core Authors:

Presentation transcript:

Chapter 10: Stream-based Data Management Title: Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core Authors: Navendu Jain, Lisa Amini, et al.

Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core
Problem
–Problem statement
–Why is this problem important?
–Why is this problem hard?
Approaches
–Approach description, key concepts
–Contributions (novelty, improved)
–Assumptions

Problem Statement
Given
–Stream data and continuous queries in large-scale distributed environments
–A streaming data application (Linear Road)
–Stream processing middleware (Stream Processing Core, SPC)
Find
–Performance bottlenecks of streaming data applications
Objectives
–Understand the performance characteristics of the streaming data application
Constraints
–SPC is constantly overloaded with respect to the available resources.
–Processing elements are a mix of I/O-bound and CPU-bound.
–It is unrealistic for applications to store the full history of a stream in memory, so they are also memory-bound.

Why is this problem important?
High-volume, continuous data are ubiquitous.
–Text and transactional data
–Digital audio, video, and images
–Instant messages, network packet traces
–Sensor data
Stream processing applications have become important in the networking and database communities.

Why is this problem hard?
Stream data are
–Large in volume
–Produced at high data rates
–Generated by multiple distributed data sources
–Rapidly updated
Processing stream data requires
–Filtering
–Aggregation
–Correlation
A system supporting stream data processing applications should consider
–Scalability
–Latency
–Resource utilization

Novelty of Contribution
Related Work
–DataCutter, StreamIt: connections between applications are statically determined.
–TelegraphCQ, Aurora, Borealis, STREAM: support stream data manipulation from a database-centric perspective, but process streams of tuples individually (i.e., at small scale).
–Benchmarks: previous work on Linear Road did not report any performance numbers.
Contributions
–SPC supports dynamic application composition.
–Evaluate SPC using the Linear Road application in multiple distributed configurations, yielding a highly scalable implementation of the Linear Road application.
–Study how the streaming infrastructure supports large-scale continuous and historical queries, identifying performance bottlenecks and tuning them.

SPC Architecture
Publish-subscribe model
–Each processing element (PE) that consumes or produces stream data specifies the characteristics of its streams.
–SPC dynamically determines the stream connections by matching stream descriptors as new applications and new data sources join and leave the system.
Reusing streams
–Results in significant resource savings.
–Lets applications discover useful information over an ever-changing set of data sources.
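The descriptor matching described on this slide can be sketched roughly as follows. This is a minimal illustration, not SPC's actual API: `StreamDescriptor`, `StreamRegistry`, and the attribute-subset matching rule are all assumptions made for the example. The registry recomputes the connection set whenever a producer or consumer joins or leaves, and several consumers can share one published stream.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StreamDescriptor:
    """Hypothetical descriptor: a stream name plus a set of attribute tags."""
    name: str
    attributes: frozenset = field(default_factory=frozenset)


class StreamRegistry:
    """Hypothetical registry that matches producers to consumers by descriptor."""

    def __init__(self):
        self.published = {}      # descriptor -> producing PE name
        self.subscriptions = []  # (consuming PE name, required attributes)

    def publish(self, producer, descriptor):
        self.published[descriptor] = producer
        return self.connections()

    def subscribe(self, consumer, required):
        self.subscriptions.append((consumer, frozenset(required)))
        return self.connections()

    def withdraw(self, descriptor):
        # A data source leaving the system removes its connections.
        self.published.pop(descriptor, None)
        return self.connections()

    def connections(self):
        # Assumed matching rule: a consumer is connected to every stream
        # whose attributes include everything the consumer requires.
        return sorted(
            (producer, desc.name, consumer)
            for desc, producer in self.published.items()
            for consumer, required in self.subscriptions
            if required <= desc.attributes
        )
```

In this sketch, two PEs that subscribe to the same attributes are both connected to a single published stream, which is the stream reuse the slide credits with resource savings.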

Performance Challenges and Optimizations in SPC
Challenges
–PEs perform either a small amount of processing on large volumes of data or a large amount of processing on lower volumes of data, so the workload is a mix of I/O-bound and CPU-bound PEs.
–It is impossible to store the full stream history in memory, so PEs are also memory-bound.
Optimizations
–SDO filtering: SPC can filter out unwanted objects, saving resources.
–Events: PEs can subscribe to system events, so a PE can adapt its algorithm.
–Dynamic copies of PEs
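A rough sketch of the first two optimizations, filtering and event-driven adaptation, is given below. Everything in it is hypothetical: the `AdaptivePE` class, the event names `"overloaded"` and `"normal"`, and the choice to adapt by sampling every second object are assumptions for illustration, since the slide does not say which events SPC exposes or how a PE reacts. In SPC the filtering is done by the system; the sketch folds it into the PE only to stay short.

```python
class AdaptivePE:
    """Hypothetical PE that filters its input and sheds load on an overload event."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.sample_every = 1  # process every accepted object by default
        self._seen = 0         # count of objects that passed the filter

    def on_event(self, event):
        # Assumed event names; not taken from the SPC documentation.
        if event == "overloaded":
            self.sample_every = 2
        elif event == "normal":
            self.sample_every = 1

    def process(self, sdos):
        out = []
        for sdo in sdos:
            if not self.predicate(sdo):
                continue  # filtering: unwanted objects are dropped early
            self._seen += 1
            if self._seen % self.sample_every == 0:
                out.append(sdo)
        return out
```

With the default setting every object that passes the filter is processed; after an `"overloaded"` event only every second one is, trading accuracy for lower resource use.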

Linear Road Benchmark
–Simulates the traffic characteristics of a simple urban expressway system.
–The input to the benchmark is in stream data format.
–Requires a stream-based data management system (SDMS) to process a set of continuous and historical queries.
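To make the idea of a continuous query over such input concrete, here is a minimal sketch that maintains the average reported speed per expressway segment over a sliding time window. The report fields (`time`, `segment`, `speed`), the 300-second default window, and the `SegmentSpeedQuery` class are assumptions for illustration; the slide does not give the benchmark's schema or query definitions. The sketch also assumes reports arrive in time order.

```python
from collections import defaultdict, deque


class SegmentSpeedQuery:
    """Hypothetical continuous query: windowed average speed per segment."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.reports = defaultdict(deque)  # segment -> deque of (time, speed)

    def on_report(self, time, segment, speed):
        window = self.reports[segment]
        window.append((time, speed))
        # Evict reports that fell out of the window, so memory stays
        # bounded instead of growing with the full stream history.
        while window[0][0] <= time - self.window_seconds:
            window.popleft()
        return sum(s for _, s in window) / len(window)
```

Each incoming report updates the answer immediately, which is what distinguishes a continuous query from a historical one evaluated over stored data.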

Prototype Implementation
Design principles
–Modularity
–Data aggregation
–Network and data locality
–Flexible programming environment
Linear Road in SPC
–The figure shows the query network infrastructure comprising 15 PEs.

Experiments
–Input data increases over time as a stress test
–Scalability

Experiments
–Analyzing bottleneck PEs
–PE placement policy

Summary
Paper’s focus
–Understanding the performance characteristics of stream processing applications in a distributed setup
Ideas
–Design and implement the Linear Road benchmark on the SPC middleware.
–Identify the main performance bottlenecks to achieve scalability and low query response latency.
Contributions
–Demonstrate a scalable distributed implementation of Linear Road.
–Highlight the importance of addressing performance bottlenecks.
Analytical Validation
–Experiments
–Prototyping

Assumptions, Rewrite today
Assumptions
–The evaluation is restricted to SPC support for the Linear Road application, assuming that the design decisions and performance results are applicable to other streaming applications.
–The system is constantly overloaded with respect to the available resources.
–PEs are I/O-, CPU-, and memory-bound.
Rewrite today
–Apply the ideas to other types of streaming applications.
–Run more extensive experiments on performance tuning.