Yongluan Zhou University of Southern Denmark Ali Salehi, Karl Aberer EPFL, Switzerland - Manisha Mishra.

• Continuous queries over data streams typically produce large volumes of result streams. Delivering them from the processing server to the end users consumes a lot of bandwidth and should therefore be handled carefully.
• Existing systems often assume that the result streams are sent directly from the server to the users. Such an architecture cannot scale to a large population of users.

Naive Approach
• Each query produces one query result stream.
• Each query result stream gets a unique identifier.
• A user's subscription specifies that unique identifier (i.e. the user's data interest).
But this is inefficient. Reason: result streams with overlapping contents cause a large communication overhead.

Q1: SELECT S2.* FROM Station1 [Range 30 Minutes] S1, Station2 [Now] S2 WHERE S1.snowHeight > S2.snowHeight
Q2: SELECT S1.snowHeight, S1.timestamp, S2.snowHeight, S2.timestamp FROM Station1 [Range 1 Hour] S1, Station2 [Now] S2 WHERE S1.snowHeight > S2.snowHeight
Q3: SELECT S2.*, S1.snowHeight, S1.timestamp FROM Station1 [Range 1 Hour] S1, Station2 [Now] S2 WHERE S1.snowHeight > S2.snowHeight
[Figure: Non-Share delivery among nodes n1, n2, n3 and n4 for queries Q1 and Q2 and their result streams S1 and S2.]
n1 = node responsible for processing Q1 and Q2
n2 = neighboring broker of n1
n3 = node that posts query Q1
n4 = node that posts query Q2
In the figure, S1, S2 and S3 denote the result streams of Q1, Q2 and Q3 respectively; inside the query text, S1 and S2 are aliases for Station1 and Station2.
The overlapping contents of result streams S1 and S2 are transmitted twice over the link between n1 and n2.

[Figure: Share delivery among nodes n1, n2, n3 and n4 for queries Q1 and Q2, with result streams S3, S1 and S2.]
Query Reformulation Approach
• Relatively simple.
• Easy to implement.
• Implemented as middleware between the system's processing engine and the DPSS (more on this later).
• For a group of queries that have overlapping results, the system composes a new query Q, called the representative query, that contains all the queries in its group.
S3 contains both S1 and S2: we send one result stream S3 to n2 and split S3 into two separate streams S1 and S2 at node n2.

The whole system consists of a stream processing server, a DPSS infrastructure, and a number of end users.
Publish/subscribe (or pub/sub) is an asynchronous messaging paradigm in which senders (publishers) of messages are not programmed to send their messages to specific receivers (subscribers). A Distributed Publish/Subscribe System (DPSS) is a scalable data dissemination infrastructure, used here for efficient delivery of stream query results.
A subscription in the DPSS is a triple (S, P, F), where
S = a set of stream names, indicating the streams that are of interest to the subscriber;
P = a few selected attributes from the streams in S;
F = a set of filters over the streams in S.
Example subscriptions over the result stream S3:
P1: S = {S3}, P = {S2.*}, F = {-30 < S1.timestamp - S2.timestamp <= 0}
P2: S = {S3}, P = {S1.snowHeight, S1.timestamp, S2.snowHeight, S2.timestamp}, F = {}
Tuples that pass P1 are sent to n3 and those that pass P2 are sent to n4.
An SPJ query Q is assumed to contain the following components:
i. Strm(Q): the input streams of Q.
ii. Window(Q): the window predicates defined on the input streams. A window predicate w_i takes an input stream s_i and three non-negative integer parameters begin_i, interval_i and slide_i.
iii. Pred(Q): the predicates specified by Q, which appear in the WHERE clause of the SQL string.
iv. Attr(Q): the set of attributes selected by Q.
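
The way a broker could split the shared stream S3 with the two example subscriptions can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the `Subscription` class, the `project` helper, the tuple layout keyed by "alias.attribute", the assumption that Station2 carries only snowHeight and timestamp, and the use of minutes as the timestamp unit are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, float]  # one result tuple, keyed by "alias.attribute" (assumed layout)


def project(row: Row, attrs: List[str]) -> Row:
    """Keep only the requested attributes; "S2.*" selects every S2 attribute."""
    out: Row = {}
    for attr in attrs:
        if attr.endswith(".*"):
            prefix = attr[:-1]  # "S2.*" -> "S2."
            out.update({k: v for k, v in row.items() if k.startswith(prefix)})
        else:
            out[attr] = row[attr]
    return out


@dataclass
class Subscription:
    streams: Set[str]                                                   # S
    attrs: List[str]                                                    # P
    filters: List[Callable[[Row], bool]] = field(default_factory=list)  # F

    def match(self, stream: str, row: Row) -> Optional[Row]:
        """Return the projected tuple if it passes the subscription, else None."""
        if stream not in self.streams:
            return None
        if not all(f(row) for f in self.filters):
            return None
        return project(row, self.attrs)


# P1 recovers Q1's result (30-minute window) from S3; P2 recovers Q2's result.
p1 = Subscription({"S3"}, ["S2.*"],
                  [lambda r: -30 < r["S1.timestamp"] - r["S2.timestamp"] <= 0])
p2 = Subscription({"S3"},
                  ["S1.snowHeight", "S1.timestamp", "S2.snowHeight", "S2.timestamp"])

# Two hypothetical S3 tuples; timestamps are in minutes.
recent = {"S1.snowHeight": 12.0, "S1.timestamp": 100.0,
          "S2.snowHeight": 9.0, "S2.timestamp": 110.0}
older = {"S1.snowHeight": 12.0, "S1.timestamp": 60.0,
         "S2.snowHeight": 9.0, "S2.timestamp": 110.0}
```

With these tuples, `recent` passes both subscriptions, while `older` lies outside the 30-minute window and is forwarded only under P2.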

D1: A query $Q_1$ is contained by another query $Q_2$, denoted by $Q_1 \sqsubseteq Q_2$, if for all database instances $D$, $Q_1(D)$ is a subset of $Q_2(D)$, i.e. $Q_1(D) \subseteq Q_2(D)$, where $Q_i(D)$ is the result of evaluating $Q_i$ over $D$. $Q_1$ and $Q_2$ are equivalent if $Q_1 \sqsubseteq Q_2$ and $Q_2 \sqsubseteq Q_1$.
D2: A continuous stream query $Q_1$ is contained by another continuous stream query $Q_2$, denoted by $Q_1 \sqsubseteq Q_2$, if for all stream instances $S$, $Q_1(S,t) \subseteq Q_2(S,t)$ at any application time instance $t$. $Q_1$ and $Q_2$ are equivalent if $Q_1 \sqsubseteq Q_2$ and $Q_2 \sqsubseteq Q_1$.
A1: All the window predicates in a single query have a common sliding step and a common starting time.

L1: For a query with only a window-based join operation of two streams $s_1$ and $s_2$, with window sizes $\mathit{interval}_1$ and $\mathit{interval}_2$ respectively, a common sliding step $\mathit{slide}$ and a common query starting time $\mathit{begin}$, two tuples $t_1$ from $s_1$ and $t_2$ from $s_2$ can generate a join result tuple $t$ if and only if all the following conditions are true:
(1) they satisfy the join predicates;
(2) $-\mathit{interval}_1 \le t_1.ts - t_2.ts \le \mathit{interval}_2$;
(3) $t_1.ts > n_2 \cdot \mathit{slide} - \mathit{interval}_1$, where $n_2 = (t_2.ts - \mathit{begin})/\mathit{slide}$;
(4) $t_2.ts > n_1 \cdot \mathit{slide} - \mathit{interval}_2$, where $n_1 = (t_1.ts - \mathit{begin})/\mathit{slide}$.
[Figure: window join query Q between two streams s1 and s2, with begin(Q) = 0, slide(Q) = 10, interval1(Q) = 5 and interval2(Q) = 7, shown (a) at t = 20 and (b) at t = 30. Only the timestamps of the arrived tuples are drawn.]
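
The conditions of the lemma can be sketched as a small predicate. This is an illustration under stated assumptions: the function name `can_join` is hypothetical, the join predicate of condition (1) is passed in as a boolean, and the quotients n1 and n2 are rounded up to the next window evaluation, which the slide text does not state explicitly. The example values reuse the figure's parameters (begin = 0, slide = 10, interval1 = 5, interval2 = 7).

```python
import math


def can_join(t1_ts: int, t2_ts: int, begin: int, slide: int,
             interval1: int, interval2: int, predicate_ok: bool = True) -> bool:
    """Check whether t1 (from s1) and t2 (from s2) can produce a join result.

    Assumption: n1 and n2 are rounded up with math.ceil.
    """
    # (1) the tuples must satisfy the join predicates
    if not predicate_ok:
        return False
    # (2) the timestamps must lie within each other's window size
    if not (-interval1 <= t1_ts - t2_ts <= interval2):
        return False
    n2 = math.ceil((t2_ts - begin) / slide)
    n1 = math.ceil((t1_ts - begin) / slide)
    # (3) t1 has not expired when t2's window is evaluated
    cond3 = t1_ts > n2 * slide - interval1
    # (4) t2 has not expired when t1's window is evaluated
    cond4 = t2_ts > n1 * slide - interval2
    return cond3 and cond4


# Parameters taken from the figure caption.
BEGIN, SLIDE, INTERVAL1, INTERVAL2 = 0, 10, 5, 7

joins = can_join(18, 19, BEGIN, SLIDE, INTERVAL1, INTERVAL2)
expired = can_join(14, 17, BEGIN, SLIDE, INTERVAL1, INTERVAL2)
too_far = can_join(12, 19, BEGIN, SLIDE, INTERVAL1, INTERVAL2)
```

Under these assumptions the tuples at 18 and 19 join, the tuple at 14 has already left the s1 window by the evaluation at t = 20, and the pair (12, 19) fails condition (2).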

T1: A select-project-join (SPJ) continuous query $Q_1$ is contained by another SPJ continuous query $Q_2$ iff they satisfy either conditions $(1) \wedge (2) \wedge (3) \wedge (4)$ or conditions $(1) \wedge (2) \wedge (3) \wedge (5) \wedge (6)$:
(1) $Q_1^{\infty} \sqsubseteq Q_2^{\infty}$, where $Q_i^{\infty}$ is the query that results from setting all the window sizes of $Q_i$ to infinity;
(2) $\mathit{begin}(Q_1) \ge \mathit{begin}(Q_2)$;
(3) for all $i$, $\mathit{interval}_i(Q_1) \le \mathit{interval}_i(Q_2)$, where $\mathit{interval}_i(Q_j)$ is the window size of the $i$th stream involved in $Q_j$;
(4) for all $i$, $1 \le \mathit{slide}(Q_2) \le \max(1, \mathit{interval}_i(Q_2) - \mathit{interval}_i(Q_1))$, where $\mathit{slide}(Q_2)$ is the sliding step of $Q_2$;
(5) $\exists\, m \in [1, \infty)$ s.t. $\mathit{begin}(Q_2) + m \cdot \mathit{slide}(Q_2) - \mathit{begin}(Q_1) \le \min_i(\mathit{interval}_i(Q_2) - \mathit{interval}_i(Q_1))$;
(6) $\mathit{slide}(Q_1) = k \cdot \mathit{slide}(Q_2)$, where $k$ is a positive integer.
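
The window-related part of the first alternative, conditions (2) to (4), can be sketched as a direct check. This is only a partial illustration: condition (1) needs a separate containment test on the window-free queries and is not covered, and neither are conditions (5) and (6). The `WindowSpec` class and the function name are hypothetical, and the example assumes begin = 0 for both queries and slide(Q1) = 4.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowSpec:
    begin: int             # common starting time of the query's windows
    slide: int             # common sliding step
    intervals: List[int]   # window size of the i-th stream


def satisfies_conditions_2_to_4(q1: WindowSpec, q2: WindowSpec) -> bool:
    """Check conditions (2), (3) and (4) for "Q1 is contained by Q2"."""
    if len(q1.intervals) != len(q2.intervals):
        return False
    pairs = list(zip(q1.intervals, q2.intervals))
    cond2 = q1.begin >= q2.begin
    cond3 = all(i1 <= i2 for i1, i2 in pairs)
    cond4 = all(1 <= q2.slide <= max(1, i2 - i1) for i1, i2 in pairs)
    return cond2 and cond3 and cond4


# interval(Q1) = 4, interval(Q2) = 8, slide(Q2) = 4; the remaining values are assumed.
q1 = WindowSpec(begin=0, slide=4, intervals=[4])
q2 = WindowSpec(begin=0, slide=4, intervals=[8])
holds = satisfies_conditions_2_to_4(q1, q2)

# A larger sliding step for Q2 violates condition (4): 5 > max(1, 8 - 4).
q2_coarse = WindowSpec(begin=0, slide=5, intervals=[8])
fails = satisfies_conditions_2_to_4(q1, q2_coarse)
```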

[Figure: temporal relations defined by two queries on the same stream, shown for Q1 at t = 20 and Q2 at t = 22. The parameters are interval(Q1) = 4, interval(Q2) = 8 and slide(Q2) = 4.]
[Figure: temporal relations defined by two queries on the same stream, shown for Q1 at t = 19 and Q2 at t = 20, and for Q1 at t = 25 and Q2 at t = 26. The parameters are interval(Q1) = 5, interval(Q2) = 8, slide(Q1) = 6 and slide(Q2) =

Steps:
1. First, containment is checked, i.e. whether one query Qi contains the other query Qj.
2. Then the attribute selection lists of both queries are combined: Attr(Q) <- Attr(Q1) U Attr(Q2).
3. So that the results of Qj can be retrieved from those of Qi, the attribute list of the representative query Q is extended with the attributes that appear in the predicates: Attr(Q) <- Attr(Q) U {attributes in Pred(Q1) or Pred(Q2)}.
4. For queries that do not contain each other, the predicates, stream windows and selected attributes are first merged one after another.
5. Then theorem T1 is applied to check that all the conditions are satisfied.
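
Steps 2 and 3 amount to set unions over attribute names, which can be sketched as below. The function name `merge_attributes` is hypothetical, and the example expands S2.* to snowHeight and timestamp on the assumption that Station2 has only those two attributes.

```python
from typing import Iterable, List


def merge_attributes(attr_q1: Iterable[str], attr_q2: Iterable[str],
                     pred_attrs_q1: Iterable[str],
                     pred_attrs_q2: Iterable[str]) -> List[str]:
    """Build Attr(Q) of the representative query from two member queries."""
    # Step 2: Attr(Q) <- Attr(Q1) U Attr(Q2)
    merged = set(attr_q1) | set(attr_q2)
    # Step 3: add the attributes used in Pred(Q1) or Pred(Q2), so that each
    # member's result can later be filtered out of the shared result stream
    merged |= set(pred_attrs_q1) | set(pred_attrs_q2)
    return sorted(merged)


# Q1 and Q2 from the example; both use the predicate S1.snowHeight > S2.snowHeight.
attr_q1 = ["S2.snowHeight", "S2.timestamp"]  # S2.* expanded (assumed schema)
attr_q2 = ["S1.snowHeight", "S1.timestamp", "S2.snowHeight", "S2.timestamp"]
pred_attrs = ["S1.snowHeight", "S2.snowHeight"]

attr_q = merge_attributes(attr_q1, attr_q2, pred_attrs, pred_attrs)
```

Under the assumed schema, the merged list holds the same four attributes that Q3 selects with S2.*, S1.snowHeight and S1.timestamp.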

• First, initialize the subscription with the name of Q's result stream.
• Then add the selected attribute list and the predicates of Qi.
• After that, add filters that check whether the result tuples satisfy the window predicate defined by Qi. These filters are generated based on Lemma 1.

The stream processing server only keeps track of the next hop of the delivery path of the result streams. The benefit of query merging is estimated as
$\left(\sum_i C(Q_i) - C(Q)\right) \cdot l$
where
$C(Q)$ is the data rate (bits/sec) of the representative query $Q$'s result stream,
$C(Q_i)$ is the data rate of the member query $Q_i$'s result stream,
$l$ is the latency of the common first hop of all the member queries $Q_i$.
This implies that only queries with a common first hop are merged.
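
As a worked instance of the benefit estimate, assume two member queries with result rates of 100 and 150 bits/sec, a representative query with a rate of 180 bits/sec, and a first-hop latency of 0.5 sec. All four numbers are hypothetical and chosen only for illustration:

```latex
\[
\Big(\sum_i C(Q_i) - C(Q)\Big)\cdot l
  = \big((100 + 150) - 180\big)\cdot 0.5
  = 35
\]
```

A positive value means that the merged stream carries less data over the shared first hop than the separate streams would. If the representative rate reached 250 bits/sec or more, the estimate would be zero or negative and merging would not pay off.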

For efficient maintenance of the grouping, a multi-tree data structure is created. A query tree is built for each query group. The root of the tree is the representative query of the query group. Within the tree, a query Qi is an ancestor of another query Qj only if Qi contains Qj.
[Figure: an example query tree with nodes Q’, Q’1, Q2, Q3, Q4 and Q5.]
This tree represents a group of queries: Q2, Q3, Q4 and Q5. Q’1 is a derived query constructed by merging Q3 and Q4. Q5 is contained by Q2. Q’ is the representative query of the whole query group, derived by merging Q’1 and Q2.

Steps:
1. When a query is first submitted to the system, a query group is selected by a greedy algorithm, which estimates the benefit of merging the new query with each existing query group and selects the group with the highest benefit.
2. If no merging has a positive benefit, a new group is generated.
3. The new query is then added to the query tree of the selected group or of the newly generated group.
4. If the existing representative query of the group and the new query do not contain one another, a new representative query is generated for this query group.
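
The greedy selection in these steps can be sketched as follows. Several parts are assumptions made for illustration: the `QueryGroup` class, the `place_query` function, the caller-supplied `estimate_merged_rate` callback, and the choice to measure the benefit of joining a group as (C(representative) + C(new) - C(merged)) * l. Only groups that share the new query's first hop are considered.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class QueryGroup:
    first_hop: str                  # common first hop of the group's members
    rep_rate: float                 # data rate of the representative query
    members: List[str] = field(default_factory=list)


def place_query(name: str, rate: float, first_hop: str, latency: float,
                groups: List[QueryGroup],
                estimate_merged_rate: Callable[[QueryGroup, float], float]
                ) -> QueryGroup:
    """Add a new query to the most beneficial group, or open a new group."""
    best: Optional[QueryGroup] = None
    best_benefit, best_rate = 0.0, 0.0
    for group in groups:
        if group.first_hop != first_hop:
            continue  # only queries with a common first hop are merged
        merged = estimate_merged_rate(group, rate)
        benefit = (group.rep_rate + rate - merged) * latency
        if benefit > best_benefit:  # step 1: keep the highest positive benefit
            best, best_benefit, best_rate = group, benefit, merged
    if best is None:
        # Step 2: no merging has positive benefit, so open a new group.
        best = QueryGroup(first_hop, rate)
        groups.append(best)
    else:
        # Step 4 (simplified): the representative now has the merged rate.
        best.rep_rate = best_rate
    best.members.append(name)  # step 3
    return best


# Hypothetical groups and merged-rate estimates.
groups = [QueryGroup("n2", 100.0, ["Q1"]), QueryGroup("n2", 200.0, ["Q7"])]
merged_rates = {"Q1": 120.0, "Q7": 290.0}
estimate = lambda group, rate: merged_rates[group.members[0]]

chosen = place_query("Q2", 100.0, "n2", 1.0, groups, estimate)
lone = place_query("Q9", 50.0, "n5", 1.0, groups, estimate)
```

Here Q2 joins the first group (benefit 80 versus 10), while Q9 has no group with the same first hop and starts its own.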

Original queries: queries submitted by the users.
Derived queries: queries derived from query merging.
Steps:
1. The to-be-terminated query is removed from the tree and its children are transferred to its parent. If the to-be-terminated query is itself a root, a new root is generated.
2. If the parent of the to-be-terminated query is a derived query, the merging algorithm is run to rewrite the parent query to reflect the change.
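
The termination steps can be sketched on a simple tree structure. The `QueryNode` class and the `terminate` function are hypothetical names, and the sketch does not rewrite any query itself: it returns the derived parent that would need rewriting and, for a terminated root, the orphaned children from which a new root would have to be generated. The example rebuilds the query tree with Q', Q'1 and Q2 to Q5.

```python
from typing import List, Optional, Tuple


class QueryNode:
    def __init__(self, name: str, derived: bool = False) -> None:
        self.name = name
        self.derived = derived  # True for queries produced by merging
        self.parent: Optional["QueryNode"] = None
        self.children: List["QueryNode"] = []

    def add(self, child: "QueryNode") -> "QueryNode":
        child.parent = self
        self.children.append(child)
        return child


def terminate(node: QueryNode) -> Tuple[Optional[QueryNode], List[QueryNode]]:
    """Remove node; return (parent needing a rewrite, orphans needing a new root)."""
    parent = node.parent
    orphans = list(node.children)
    node.children = []
    if parent is None:
        # The terminated query was a root: the caller must generate a new root.
        for child in orphans:
            child.parent = None
        return None, orphans
    # Step 1: detach the node and hand its children to its parent.
    parent.children.remove(node)
    node.parent = None
    for child in orphans:
        parent.add(child)
    # Step 2: a derived parent has to be rewritten by the merging algorithm.
    return (parent if parent.derived else None), []


# Example tree: Q' = merge(Q'1, Q2), Q'1 = merge(Q3, Q4), Q2 contains Q5.
root = QueryNode("Q'", derived=True)
q1p = root.add(QueryNode("Q'1", derived=True))
q2 = root.add(QueryNode("Q2"))
q3 = q1p.add(QueryNode("Q3"))
q4 = q1p.add(QueryNode("Q4"))
q5 = q2.add(QueryNode("Q5"))

to_rewrite, new_root_inputs = terminate(q2)
```

After terminating Q2, Q5 hangs directly under Q' and Q' is reported as the derived query to rewrite.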

To enhance the system's scalability, a DPSS, a scalable and efficient communication paradigm, is employed to deliver query result streams. To fully exploit the message delivery sharing capability of a DPSS, queries are grouped and merged. To realize this approach, stream query containment theorems are studied first, and query merging and query grouping algorithms are then proposed on top of them.

Thank you!!