Scalable Distributed Stream Processing Presented by Ming Jiang.

Presentation transcript:


Centralized stream processing review

The distributed setting: a federation of participating nodes in different administrative domains. Collaboration between the different domains is required.

Two complementary efforts for this setting: Aurora* (intra-participant distribution) and Medusa (inter-participant distribution).

Three pieces to be shared: Aurora; an overlay network for communication; algorithms for high availability.

Three architectural issues: communications; load sharing; high availability in the presence of failures.

Communications: naming (participant, entity-name) and routing. 1. A data source or an administrator registers a schema and a stream. 2. When a data source (DS) produces an event, it labels
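
The naming and registration step could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Name` and `Catalog` classes, their method names, and the example schema are hypothetical and do not come from the slides, which only say that names are (participant, entity-name) pairs and that a schema and a stream are registered.

```python
from typing import NamedTuple


class Name(NamedTuple):
    """A global name: (participant, entity-name), as on the slide."""
    participant: str
    entity: str


class Catalog:
    """Hypothetical in-memory catalog of registered schemas and streams."""

    def __init__(self):
        self.schemas = {}   # Name -> tuple of field names
        self.streams = {}   # Name -> Name of the stream's schema

    def register_schema(self, name, fields):
        self.schemas[name] = tuple(fields)

    def register_stream(self, name, schema_name):
        # A stream can only refer to a schema that is already registered.
        if schema_name not in self.schemas:
            raise KeyError(f"unknown schema: {schema_name}")
        self.streams[name] = schema_name

    def schema_of(self, stream_name):
        return self.schemas[self.streams[stream_name]]


catalog = Catalog()
quote_schema = Name("brokerage", "quote_schema")
quotes = Name("brokerage", "quotes")
catalog.register_schema(quote_schema, ["symbol", "price"])
catalog.register_stream(quotes, quote_schema)
```

Qualifying every entity name with its participant keeps names unique across administrative domains without any central naming authority.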

Communications: message transport multiplexes all the message streams onto a single TCP connection. Remote definition is used because process migration is too complicated.
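
One way to picture multiplexing several message streams over one connection is to prefix each message with a stream id and a length. The sketch below is an assumption made for illustration: the 6-byte header layout and the function names are invented here and are not the wire format used by Aurora* or Medusa.

```python
import struct

# Hypothetical frame header: stream id (2 bytes) + payload length (4 bytes),
# network byte order, no padding.
HEADER = struct.Struct("!HI")


def encode_frame(stream_id: int, payload: bytes) -> bytes:
    """Wrap one message so it can share a byte stream with other streams."""
    return HEADER.pack(stream_id, len(payload)) + payload


def decode_frames(buffer: bytes):
    """Split received bytes into (stream_id, payload) pairs.

    Returns the complete frames plus any trailing bytes that belong to a
    frame that has not fully arrived yet.
    """
    frames = []
    offset = 0
    while offset + HEADER.size <= len(buffer):
        stream_id, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # incomplete frame, wait for more data
        frames.append((stream_id, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]


wire = encode_frame(1, b"tick") + encode_frame(2, b"alert")
frames, leftover = decode_frames(wire + b"\x00")
```

The receiver hands each payload to the consumer of its stream id, so many logical streams share the cost of a single TCP connection.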

Load management: repartitioning Aurora networks based on loads and resources, using box sliding and box splitting.

Box sliding: takes a box on the edge of a sub-network on one machine and shifts it to its neighbor. Example: upstream box sliding.
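
Box sliding can be illustrated with a toy placement map. The sketch assumes a linear chain of boxes and a dictionary from box name to machine name; `slide_box`, the box names, and the machine names are hypothetical and only model the idea of moving an edge box to the neighboring machine.

```python
def slide_box(placement, chain, box, direction):
    """Return a new placement with `box` moved to its neighbor's machine.

    placement: dict mapping box name -> machine name
    chain:     list of box names in data-flow order (upstream first)
    direction: "upstream" or "downstream"
    """
    i = chain.index(box)
    j = i - 1 if direction == "upstream" else i + 1
    if not 0 <= j < len(chain):
        raise ValueError("no neighbor in that direction")
    neighbor_machine = placement[chain[j]]
    if neighbor_machine == placement[box]:
        # The neighbor runs on the same machine, so the box is not on the edge.
        raise ValueError("box is not on the edge of its sub-network")
    new_placement = dict(placement)
    new_placement[box] = neighbor_machine
    return new_placement


chain = ["filter", "map", "aggregate"]
placement = {"filter": "m1", "map": "m2", "aggregate": "m2"}
# Upstream sliding: move "map" from m2 to the machine of its upstream neighbor.
after = slide_box(placement, chain, "map", "upstream")
```

Only boxes at the boundary between two machines are eligible, which keeps each machine's sub-network contiguous.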

Box splitting: creates a copy of a box, intended to run on a second machine, to offload work. A filter is needed to act as a router.

Box splitting with Tumble and Merge: box splitting has to be transparent.

Box splitting example: if the predicate in the filter is B < 3, machine A receives tuples 1, 2, 3, 4, 7 and machine B receives tuples 5, 6. [Figure: output of machine A, output of machine B, and the final result after merge.]
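
The transparency requirement can be checked on a small example: the merged output of the two copies should equal the output of the original, unsplit box. The sketch below is a simplification under stated assumptions: it uses a whole-stream count grouped by A rather than a windowed Tumble, the tuple values are invented so that B < 3 holds for tuples 1, 2, 3, 4 and 7, and all function names are hypothetical.

```python
from collections import Counter


def count_by_a(tuples):
    """The box being split: count tuples per value of attribute A."""
    return Counter(t["A"] for t in tuples)


def split_and_merge(tuples, predicate):
    """Route tuples with a filter, run one copy per machine, merge the results."""
    to_machine_a = [t for t in tuples if predicate(t)]
    to_machine_b = [t for t in tuples if not predicate(t)]
    partial_a = count_by_a(to_machine_a)
    partial_b = count_by_a(to_machine_b)
    # Merge step: add up the partial counts of each group.
    return partial_a + partial_b


# Invented input; tuples 1, 2, 3, 4 and 7 satisfy B < 3.
stream = [
    {"A": 1, "B": 2}, {"A": 1, "B": 1}, {"A": 2, "B": 2}, {"A": 2, "B": 1},
    {"A": 2, "B": 6}, {"A": 4, "B": 5}, {"A": 4, "B": 2},
]
merged = split_and_merge(stream, lambda t: t["B"] < 3)
```

Because the filter does not partition on the grouping attribute A, a group can be spread over both machines, which is why the merge has to combine partial results rather than simply concatenate them.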

Key partitioning challenges: choosing what to offload; choosing what to split; choosing filters; others…

High availability: utilize the push-based nature of the data flow.

Failure detection and recovery: 1. Each server periodically sends heartbeat messages to its upstream neighbors. 2. If any server does not reply within a predefined time, it is assumed to have failed. 3. A recovery phase is initiated that emulates the processing of the failed server (load shedding can be used).
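
The timeout rule in step 2 can be sketched as a small monitor that records when each neighbor last replied. The class name, method names, and the use of explicit timestamps instead of a real clock are assumptions made for illustration; the sketch covers detection only, not the recovery phase.

```python
class HeartbeatMonitor:
    """Tracks heartbeat replies and flags neighbors that stay silent too long."""

    def __init__(self, timeout):
        self.timeout = timeout   # the predefined time, in seconds
        self.last_reply = {}     # server name -> time of its last reply

    def record_reply(self, server, now):
        self.last_reply[server] = now

    def failed_servers(self, now):
        """Servers whose last reply is older than the timeout."""
        return sorted(
            server
            for server, seen in self.last_reply.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3.0)
monitor.record_reply("upstream-1", now=0.0)
monitor.record_reply("upstream-2", now=0.0)
monitor.record_reply("upstream-1", now=2.0)   # upstream-2 stays silent
suspects = monitor.failed_servers(now=4.0)
```

Passing the current time in as an argument keeps the example deterministic; a real deployment would read a monotonic clock and would then start recovery for every server in `suspects`.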

Thank you!