Multimedia Data Stream Management System

Multimedia Data Stream Management System By David Kleinman

Outline: Definition; Motivating Examples; Nine Requirements; Current Systems Comparison; Brief Overview of Current Stream Systems; Preview of My Project

What is it? A stream of multimedia data from a source (for example, a video camera). Queries are stored in the system (and a query may itself change). The system processes high volumes of data in real time.

Motivating Examples. Security: surveillance, crowd security, air security, burglary. Baby sitting. Traffic reports. Science: animal behavior, ocean. Sensors can be expensive, especially when applied across a large area. They are also difficult to install (e.g., in Washington, D.C., one may not have the right to install them on others' buildings). Sensors are not 100% dependable. Sensors can be disabled easily: they are often located on the ground and are easily accessible. A video camera can be placed in one safe location high above the ground. Sensors cannot detect some items that a video camera can detect.

Requirement #1 - Process Quickly. Low latency. Messages processed "in-stream". No storage needed to perform an operation. Active system. Avoid polling. Low latency: the system must be able to process messages without a costly storage operation in the processing path. Storage adds latency; for example, writing to a database requires a disk write. Passive systems wait to be told what to do by an application before beginning processing, so applications must continuously poll for conditions. Polling results in additional latency because, on average, half the polling interval is added to the processing delay. Active systems avoid this by having built-in event-driven (data-driven) processing capabilities.
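
The claim that polling adds, on average, half the polling interval can be checked with a small simulation. This is only an illustrative sketch and not part of the original slides: the function name `mean_polling_delay`, the uniform arrival model, and the 10-second interval are assumptions made for the example.

```python
import random

def mean_polling_delay(poll_interval, n_events=100_000, seed=0):
    """Estimate the average wait between an event and the next poll.

    Assumption: events arrive at uniformly random times, and a passive
    system only notices them at the next multiple of poll_interval.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_events):
        arrival = rng.uniform(0.0, 1000.0 * poll_interval)
        # Time remaining until the first poll tick at or after the arrival.
        total += (-arrival) % poll_interval
    return total / n_events

if __name__ == "__main__":
    # With a 10-second interval the estimate comes out close to 5 seconds.
    print(mean_polling_delay(10.0))
```

An active, event-driven system would handle each message on arrival, so this extra wait would not apply.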

Requirement #2 – Query using SigmaQL for Streams (StreamSigmaQL). Querying mechanism based on SQL. Express continuous streams of data. Window construct: time, frames, breakpoints. Merge operator. SQL has remained the most enduring standard database language for over 30 years because it is very good at expressing complex data transformations. It is based on a set of very powerful data processing primitives that do filtering, merging, correlation, and aggregation. SQL is also widely understood and used by programmers. The language should be easy to learn. Windows should be definable over time, number of frames, or breakpoints in other attributes of a message. Windows should be able to slide by a variable amount; depending on the slide amount, windows can be made disjoint or overlapping. A merge operator is needed to join multiple streams.
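
The relationship between window size and slide amount can be sketched as follows. This is a minimal illustration for windows defined over a number of frames only; the function name `count_windows` is hypothetical and this is not SigmaQL syntax.

```python
from collections import deque
from typing import Iterable, Iterator

def count_windows(stream: Iterable, size: int, slide: int) -> Iterator[list]:
    """Yield windows of `size` items, advancing by `slide` items each time.

    slide < size gives overlapping windows, slide == size gives disjoint
    windows, and slide > size skips items between windows.
    """
    if size <= 0 or slide <= 0:
        raise ValueError("size and slide must be positive")
    buf = deque()
    skip = 0  # items still to discard before the next window starts
    for item in stream:
        if skip > 0:
            skip -= 1
            continue
        buf.append(item)
        if len(buf) == size:
            yield list(buf)
            # Drop the items that fall out of the next window.
            for _ in range(min(slide, size)):
                buf.popleft()
            skip = max(0, slide - size)

if __name__ == "__main__":
    frames = range(6)
    print(list(count_windows(frames, size=3, slide=1)))  # overlapping
    print(list(count_windows(frames, size=3, slide=3)))  # disjoint
```

A time-based or breakpoint-based window would replace the `len(buf) == size` test with a condition on a timestamp or on another message attribute.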

Requirement #3 – Handle Imperfections. Data might be delayed, missing, or out of sequence. Time out individual calculations or computations. Challenges with out-of-order data. Mechanism for additional time. Networks aren't reliable. Suppose the system is computing the average number of people in ten rooms and the camera in one room is broken. You don't want the system to block waiting for a result that will never come, so a timeout mechanism is a must. Suppose you have a time window 9:00 – 9:01. Ordinarily, the window is closed once a timestamp greater than 9:01 is received. However, this assumes that data arrives in timestamp order, which is not always the case. To deal with out-of-order data, a mechanism must be provided that allows windows to stay open for an additional period of time.
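
The idea of keeping a window open for an additional period can be sketched as below. This is an assumed, simplified design that covers only the out-of-order case (not timeouts for missing sources): timestamps are seconds after 9:00, and the names `windowed_average` and `grace` are hypothetical.

```python
def windowed_average(events, window_start, window_end, grace):
    """Average the values whose timestamps fall in [window_start, window_end).

    events: iterable of (timestamp, value) in arrival order, which may
    differ from timestamp order. The window closes only when an event with
    timestamp >= window_end + grace arrives; in-window events that arrive
    after that point are counted as dropped.
    Returns (average or None, number of dropped late events).
    """
    total, count, dropped = 0.0, 0, 0
    closed = False
    for ts, value in events:
        in_window = window_start <= ts < window_end
        if closed:
            if in_window:
                dropped += 1
            continue
        if ts >= window_end + grace:
            closed = True
            continue
        if in_window:
            total += value
            count += 1
    average = total / count if count else None
    return average, dropped

if __name__ == "__main__":
    # Window 9:00-9:01 expressed as seconds 0..60; two readings arrive late.
    arrivals = [(10, 4), (61, 9), (58, 6), (70, 1), (59, 100)]
    print(windowed_average(arrivals, 0, 60, grace=0))  # closes at first ts >= 60
    print(windowed_average(arrivals, 0, 60, grace=5))  # tolerates 5 s of disorder
```

A larger grace period admits more late data but delays the result, so the value is a trade-off against Requirement #1.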

Requirement #4 – Generate Predictable Outcomes. Generate deterministic and repeatable results. Time-ordered, deterministic processing throughout the entire pipeline. Important for fault tolerance and recovery. A stream processing system must process time-series messages in a predictable manner to ensure that the results of processing are deterministic and repeatable. Example, a soundtrack with a movie: it is important that the frames of the movie and the sound (WAV) file are processed in the correct order. Time ordering is needed to guarantee correctness. Time ordering is also needed for fault tolerance and recovery, as replaying the same stream should yield the same results.
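
One way to obtain repeatable, time-ordered processing of a video stream and its soundtrack is to merge the streams by timestamp with a fixed tie-breaking rule. The sketch below is an assumption-laden illustration: the `(timestamp, source, payload)` message layout and the name `merge_by_time` are invented for the example.

```python
import heapq

def merge_by_time(*streams):
    """Merge streams of (timestamp, source, payload) tuples into one list.

    Each input stream must already be sorted by timestamp. Ties are broken
    by source name, so the output order does not depend on the order in
    which the streams are passed in or happen to arrive.
    """
    return list(heapq.merge(*streams, key=lambda m: (m[0], m[1])))

if __name__ == "__main__":
    video = [(0, "video", "f0"), (40, "video", "f1"), (80, "video", "f2")]
    audio = [(0, "audio", "a0"), (50, "audio", "a1")]
    # Replaying the same inputs yields the same merged order every time.
    print(merge_by_time(video, audio) == merge_by_time(audio, video))
```

Because the merged order is a function of the message contents alone, replaying the streams after a failure reproduces the same sequence.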

Requirement #5 – Integrate Stored and Streaming Data. Comparing present with past. Capability to efficiently store, access, and modify state information. A query may need to refer to stored pictures of known terrorists. Finding unusual activity requires gathering the usual activity patterns and comparing against them.
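
Comparing a live reading against stored history can be illustrated with a simple deviation test. The slides do not say how "unusual" is defined; the z-score rule, the threshold of 3, and the name `is_unusual` are assumptions made for this sketch.

```python
from statistics import mean, pstdev

def is_unusual(current, history, threshold=3.0):
    """Flag `current` if it deviates strongly from stored past values.

    history: stored measurements describing usual activity (for example,
    people counted per minute). Uses a z-score against the stored mean
    and population standard deviation.
    """
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # No variation in the stored data: any different value is unusual.
        return current != mu
    return abs(current - mu) / sigma > threshold

if __name__ == "__main__":
    stored_counts = [10, 12, 11, 9, 10, 8]   # past activity (stored data)
    print(is_unusual(11, stored_counts))     # typical live reading
    print(is_unusual(30, stored_counts))     # far outside the usual range
```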

Requirement #6 – Guarantee Data Safety. Must use a high-availability solution. A secondary system synchronizes with the primary frequently and takes over in case of failure. Mission-critical information needs a backup plan: a monitoring system cannot be allowed to fail.

Requirement #7 – Partition and Scale Automatically. Take advantage of distributed computing. Support multi-threading: takes advantage of multiple processors and avoids blocking. Load balance across machines: automatic and transparent.
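
One common way to spread a stream across machines without involving the application is to partition messages by a hash of a key. The slides do not name a partitioning scheme, so the following is only an assumed sketch; `partition_for`, `partition`, and the camera identifiers are hypothetical.

```python
import hashlib

def partition_for(key: str, n_workers: int) -> int:
    """Map a key to a worker index with a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_workers

def partition(messages, n_workers):
    """Distribute (key, payload) messages into one bucket per worker.

    All messages with the same key go to the same worker, so per-key
    ordering is preserved inside each bucket.
    """
    buckets = [[] for _ in range(n_workers)]
    for key, payload in messages:
        buckets[partition_for(key, n_workers)].append((key, payload))
    return buckets

if __name__ == "__main__":
    frames = [(f"camera-{i % 5}", f"frame-{i}") for i in range(20)]
    for worker, bucket in enumerate(partition(frames, n_workers=3)):
        print(worker, len(bucket))
```

Hash partitioning balances load only when keys are reasonably evenly distributed; a real system would also need to rebalance when machines are added or removed.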

Requirement #8 – Process and Respond Instantaneously. Needs to respond in real time. Highly optimized, minimal-overhead execution path. All system components must have high performance.

Requirement #9 - Adaptability. Change queries without restarting. Accept all types of multimedia streams. Allow custom configuration. Work with different systems' APIs.

DBMS. Widely used. Passive. Do not keep data moving. Use SQL, but are not equipped for streams. Difficult to handle out-of-order data. Difficulty with predictable outcomes. Incur latency with seamless integration. Widely used: due to their ability to reliably store large data sets and efficiently process human-initiated queries. Passive: they wait to be told what to do. Some have trigger mechanisms, but triggers are not scalable. Do not keep data moving: they require a write to disk and then an access, which is not real time. Difficult to handle imperfections: trigger systems have no obvious way to time out. Predictable outcomes are difficult because the systems are passive.

Rule Engine. Example: Prolog. Active. Handles imperfections. Trouble with seamless integration. A rule engine typically accepts condition/action pairs, written in if-then notation, and enforces a collection of rules.

Stream Processing Engine. Handles all the requirements. Not specifically designed to handle streams of multimedia or multimedia constraints.

Chart. Columns: DBMS, Rule Engine, SPE, MSPE. Rows: Keep data moving (No, Yes); SigmaQL; Handle imperfections (Difficult, Possible); Predictable outcome; High availability; Stored and streamed data; Distribution and scalability (Possible); Real time; Adaptability.

Aurora: a DSMS developed at Brandeis, Brown, and MIT

Aurora Query Network. QoS. Consists of operator boxes and connection points (storage points). Uses QoS graphs to determine the best path. Has built-in scheduling, optimization, and load shedding. Supports distributed environments.

Stream Management System (STREAM). Developed at Stanford. Uses synopses and queues.

Simple Query Plan. [Figure: queries Q1 and Q2 over Stream1, Stream2, and Stream3; join operators (⋈) with State1 to State4; a Scheduler.] Consists of queues, which connect producers and consumers. Synopses: tables at operators that store state. Has a scheduler.
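
The roles of queues, synopses, and a scheduler can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Stanford system's implementation: the `Operator` class, the round-robin policy, and the two example operators are invented for the example.

```python
from collections import deque

class Operator:
    """Consumes items from an input queue and appends results to an output queue."""

    def __init__(self, name, fn, inbox, outbox):
        self.name, self.fn = name, fn
        self.inbox, self.outbox = inbox, outbox
        self.synopsis = []  # per-operator state, kept between items

    def step(self):
        """Process one queued item; return False if there was nothing to do."""
        if not self.inbox:
            return False
        item = self.inbox.popleft()
        self.outbox.extend(self.fn(item, self.synopsis))
        return True

def run_round_robin(operators):
    """Toy scheduler: give each operator one step per pass until all are idle."""
    progressed = True
    while progressed:
        progressed = False
        for op in operators:
            if op.step():
                progressed = True

def keep_even(item, synopsis):
    # Stateless selection: pass even values through, drop the rest.
    return [item] if item % 2 == 0 else []

def running_sum(item, synopsis):
    # Stateful aggregate: the synopsis holds every value seen so far.
    synopsis.append(item)
    return [sum(synopsis)]

def demo():
    source, middle, sink = deque([1, 2, 3, 4, 6]), deque(), deque()
    plan = [Operator("select", keep_even, source, middle),
            Operator("sum", running_sum, middle, sink)]
    run_round_robin(plan)
    return list(sink)

if __name__ == "__main__":
    print(demo())
```

A production scheduler would choose which operator to run based on goals such as queue memory or latency rather than a fixed rotation.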

NiagaraCQ. Developed at Wisconsin. First DSMS. Uses a grouping strategy. Not as complete as the other two.

System Architecture

TelegraphCQ. Developed at Berkeley. SteM: storage point. Eddy: routes tuples. Good at handling multiple queries. Adaptive.

Adaptivity (Telegraph). [Figure: input streams R, S, and T; grouped filters on R.A and S.B; an Eddy; SteMs for the join R x S x T; output queues.] Runtime adaptivity. Multi-query optimization. Framework: implements arbitrary schemes.

My Project. Design a multimedia streaming database. Outline the specifications: the scheduling algorithm, the query structure, the operators, etc.