SPUD: A Distributed High Performance Publish-Subscribe Cluster
Uriel Peled and Tal Kol
Guided by Edward Bortnikov
Software Systems Laboratory, Faculty of Electrical Engineering, Technion

Project Goal
- Design and implement a general-purpose publish-subscribe server
- Push traditional implementations to global-scale performance demands:
  - 1 million concurrent clients
  - Millions of concurrent topics
  - High transaction rate
- Demonstrate the server's abilities with a fun client application

What is Pub/Sub?
[Diagram: clients subscribe to topic://traffic-jams/ayalon; when a publisher publishes "accident in hashalom" to that topic, the message is delivered to every subscriber.]
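To make the contract concrete, here is a minimal sketch of the client-side interface such a server exposes. The names (PubSubClient, Handler, subscribe/publish signatures) are illustrative assumptions, not SPUD's actual API.

```cpp
#include <functional>
#include <string>

// Hypothetical client-side pub/sub interface; names are illustrative only.
class PubSubClient {
public:
    using Handler = std::function<void(const std::string& topic,
                                       const std::string& payload)>;

    // Register interest in a topic, e.g. "topic://traffic-jams/ayalon".
    virtual void subscribe(const std::string& topic, Handler on_message) = 0;

    // Publish a payload; every current subscriber of the topic receives it.
    virtual void publish(const std::string& topic, const std::string& payload) = 0;

    virtual ~PubSubClient() = default;
};

// Usage (given some concrete implementation `client`):
//   client.subscribe("topic://traffic-jams/ayalon",
//                    [](const std::string&, const std::string& msg) {
//                        /* e.g. msg == "accident in hashalom" */ });
//   client.publish("topic://traffic-jams/ayalon", "accident in hashalom");
```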

What Can We Do With It? Collaborative Web Browsing

What Can We Do With It? Instant Messaging
[Diagram: one client sends "Hi buddy!", and the message is delivered to the other client through the pub/sub server.]

Seems Easy To Implement, But…
- "I'm behind a NAT, I can't connect!": not all client setups are server friendly
- "Server is too busy, try again later?!": 1 million concurrent clients is simply too much
- "The server is so slow!!!": service time grows exponentially with load
- "A server crashed, everything is lost!": single points of failure will eventually fail

Naïve Implementation (Example 1)
- Simple UDP for client-server communication
- No need for sessions, since we just send messages
- Very low cost per client
- Sounds perfect? Not quite: the client's NAT gets in the way (a minimal UDP sketch follows)
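For illustration, a minimal sketch of the naïve UDP approach over POSIX sockets: one datagram per message and no session state, which is why the cost per client is so low. The "PUB <topic> <payload>" wire format and the function name are invented for this sketch.

```cpp
// Hypothetical naïve UDP publisher: one datagram per message, no session state.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <string>

void publish_udp(const char* server_ip, uint16_t port,
                 const std::string& topic, const std::string& payload) {
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, server_ip, &addr.sin_addr);

    // Invented wire format; the server only needs the datagram, no handshake.
    std::string msg = "PUB " + topic + " " + payload;
    sendto(s, msg.data(), msg.size(), 0,
           reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    close(s);
}
```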

NAT Traversal
- UDP hole punching
  - The NAT accepts a UDP reply only for a short window (our measurements: on the order of seconds)
  - Keep the hole open with a UDP ping from each client every 15 s (see the keep-alive sketch below)
- Days-long TCP sessions
  - The NAT remembers current sessions for replies: if the WWW works, we should work
  - Dramatically increases the cost per client
- Our research: all the major IMs do exactly this
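The keep-alive mentioned above could look roughly like this (POSIX sockets, std::thread); the "PING" message is an invented wire format, and the 15-second period comes from the slide.

```cpp
// Hypothetical UDP keep-alive: ping every 15 s so the NAT keeps the mapping open.
#include <netinet/in.h>
#include <sys/socket.h>
#include <chrono>
#include <thread>

void keepalive_loop(int udp_socket, const sockaddr_in& server_addr) {
    const char ping[] = "PING";   // invented wire format
    for (;;) {
        sendto(udp_socket, ping, sizeof(ping), 0,
               reinterpret_cast<const sockaddr*>(&server_addr),
               sizeof(server_addr));
        std::this_thread::sleep_for(std::chrono::seconds(15));
    }
}

// Usage: std::thread(keepalive_loop, sock, addr).detach();
```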

Naïve Implementation (Example 2)
- Blocking I/O with one thread per client (a sketch follows)
- The basic model for most servers (the Java default)
- Traditional UNIX: fork for every client
- Sounds perfect? [Diagram: several groups of 500 clients connecting to the server]
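A sketch of the thread-per-client model (POSIX sockets plus std::thread); handle_client here is just a placeholder echo loop. Every connection gets its own thread, and therefore its own multi-megabyte stack, which is exactly what stops scaling.

```cpp
// Hypothetical blocking, thread-per-client server.
#include <sys/socket.h>
#include <unistd.h>
#include <thread>

void handle_client(int client_fd) {
    char buf[512];
    ssize_t n;
    while ((n = recv(client_fd, buf, sizeof(buf), 0)) > 0) {
        send(client_fd, buf, n, 0);   // placeholder: echo the bytes back
    }
    close(client_fd);
}

void serve_forever(int listen_fd) {
    for (;;) {
        int client_fd = accept(listen_fd, nullptr, nullptr);
        if (client_fd < 0) continue;
        // One dedicated thread (and one ~2 MB stack) per connection:
        std::thread(handle_client, client_fd).detach();
    }
}
```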

Network I/O Internals
- Blocking I/O: one thread per client
  - With a 2 MB stack, 1 GB of virtual address space is enough for only 512 threads (!)
- Non-blocking I/O: select
  - Linear fd searches are very slow
- Asynchronous I/O: completion ports (see the sketch after this list)
  - A thread pool handles request completions
  - Our measurements: 30,000 concurrent clients!
- What is the bottleneck then?
  - Number of locked pages (addressed with zero-byte receives)
  - TCP/IP kernel driver non-paged pool allocations
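The completion-port and zero-byte-receive ideas can be sketched as follows (Win32/Winsock). This assumes each accepted socket was already associated with the completion port, with the socket handle as the completion key, and that an initial zero-byte WSARecv was posted on it; it is a simplified illustration, not SPUD's code.

```cpp
#include <winsock2.h>
#include <windows.h>

// Simplified IOCP worker thread.
void worker_loop(HANDLE iocp) {
    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED* ov = nullptr;
        if (!GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE))
            continue;                         // error or closed connection
        SOCKET s = (SOCKET)key;

        // The zero-byte receive completed, so data is waiting in the kernel.
        // Only now do we supply a real buffer; no user pages were locked
        // while the connection sat idle.
        char buf[4096];
        int n = recv(s, buf, sizeof(buf), 0);
        if (n > 0) { /* hand buf[0..n) to the protocol layer */ }

        // Re-arm the zero-byte receive for the next message.
        ZeroMemory(ov, sizeof(*ov));
        WSABUF zero{0, nullptr};
        DWORD flags = 0;
        WSARecv(s, &zero, 1, nullptr, &flags,
                reinterpret_cast<LPWSAOVERLAPPED>(ov), nullptr);
    }
}
```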

Scalability
- Scale up: buy a bigger box
- Scale out: buy more boxes
- Which one to do? Both!
  - Push each box to its hardware maximum (thousands of servers is impractical)
  - Add the relevant kind of box as load increases: the Google way (cheap PC server farms)

Identify Our Load Factors
- Concurrent TCP clients
  - Scale up: async I/O, zero-byte receives, larger non-paged pool (NPP)
  - Scale out: dedicate boxes to handling clients => Connection Server (CS)
- High transaction throughput (topic load)
  - Scale up: software optimizations
  - Scale out: dedicate boxes to handling topics => Topic Server (TS)
- Design the cluster accordingly

Network Architecture

Client Load Balancing
[Diagram: a client asks the Client Load Balancer (CLB) for a CS; the CLB picks among CS1-CS3 based on user location and current CS client load, and hands the client CS2; the client then performs login, subscribe, and publish through CS2, which talks to the topic servers TS1 and TS2. A selection sketch follows.]
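A sketch of the selection rule shown in the figure: prefer a CS near the user, then break ties by current client load. The CsInfo structure and the region field are assumptions made for this illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical view the CLB keeps of each Connection Server.
struct CsInfo {
    std::string name;          // e.g. "CS2"
    std::string region;        // coarse user-location bucket
    uint32_t    client_count;  // currently connected clients
};

// Pick the least-loaded CS, preferring servers in the user's region.
const CsInfo* choose_cs(const std::vector<CsInfo>& servers,
                        const std::string& user_region) {
    const CsInfo* best = nullptr;
    for (const CsInfo& cs : servers) {
        if (!best) { best = &cs; continue; }
        bool cs_local   = (cs.region == user_region);
        bool best_local = (best->region == user_region);
        if (cs_local != best_local) {
            if (cs_local) best = &cs;                  // user location wins first
        } else if (cs.client_count < best->client_count) {
            best = &cs;                                // then CS client load
        }
    }
    return best;   // nullptr only if `servers` is empty
}
```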

Topic Load Balancing: Static
[Diagram: a "subscribe: traffic" request reaches the CS, which hashes the topic name and forwards the subscription to one of TS0-TS3 by hash % 4 (here % 4 = 1); the topic is handled by a room (Room 0) on the chosen TS. A hashing sketch follows.]
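The static scheme boils down to hashing the topic name modulo the number of topic servers; a minimal sketch, where std::hash stands in for whatever hash function is actually used.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Static topic -> TS mapping: every CS computes the same index with no
// coordination, e.g. with 4 topic servers it is hash(topic) % 4.
std::size_t topic_server_index(const std::string& topic, std::size_t num_ts) {
    return std::hash<std::string>{}(topic) % num_ts;
}

// Usage: topic_server_index("traffic", 4) always lands on the same TS,
// so all subscribers and publishers of "traffic" meet on one server.
```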

Topic Load Balancing: Dynamic
[Diagram: a subscribe arrives at the CS, which queries the rooms (Room 0-Room 2) for their current load; the replies come in (R0: 345K, R1: 278K, R2: 301K) and the CS forwards the subscribe to the least-loaded room, R1 at 278K on TS1, which handles it. A selection sketch follows.]
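Once the load reports are in, the dynamic scheme simply picks the least-loaded room; a sketch, where the RoomLoad structure and the "no reply yet" convention are assumptions for illustration.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical load report for a room (e.g. number of topics it hosts).
struct RoomLoad {
    int                     room_id;
    std::optional<uint64_t> topics;   // empty until the room has replied
};

// Route the subscribe to the least-loaded room among those that reported.
// Returns -1 if no report has arrived yet (caller may fall back to static hashing).
int choose_room(const std::vector<RoomLoad>& rooms) {
    int best = -1;
    uint64_t best_load = UINT64_MAX;
    for (const RoomLoad& r : rooms) {
        if (r.topics && *r.topics < best_load) {
            best = r.room_id;
            best_load = *r.topics;
        }
    }
    return best;   // with R0=345K, R1=278K, R2=301K this picks R1
}
```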

Performance Pitfalls
- Data copies
  - Single instance with reference counting (REF_BLOCK; see the sketch after this list)
  - Multi-buffer messages (MESSAGE: header, body, tail)
- Context switches
  - Flexible module-execution foundation (MODULE)
  - Thread pools sized to the number of processors
- Memory allocation
  - MM: custom memory pools (POOL, POOL_BLOCK)
  - Fine-grained locking, pre-allocation, batching, single-size blocks
- Lock contention
  - EVENT, MUTEX, RW_MUTEX, interlocked API
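To illustrate the single-instance idea behind REF_BLOCK, here is a minimal reference-counted payload block. It is not SPUD's REF_BLOCK; a real version would draw its memory from the custom pools listed above rather than new/delete.

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>

// Minimal reference-counted payload block: the message body is stored once
// and shared by every queue/connection that still needs it.
class RefBlock {
public:
    static RefBlock* create(const void* data, std::size_t len) {
        RefBlock* b = new RefBlock(len);
        std::memcpy(b->data_, data, len);
        return b;                              // starts with refcount == 1
    }
    void add_ref() { refs_.fetch_add(1, std::memory_order_relaxed); }
    void release() {
        if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete this;                       // last owner frees the block
    }
    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    explicit RefBlock(std::size_t len)
        : refs_(1), size_(len), data_(new char[len]) {}
    ~RefBlock() { delete[] data_; }

    std::atomic<int> refs_;
    std::size_t      size_;
    char*            data_;
};

// A publish to a topic with N subscribers calls add_ref() N times instead of
// copying the payload N times; each send path calls release() when done.
```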

Class Diagram (Application)

Class Diagram (TS, CS)

Stress Testing
- Measure publish-to-notify turnaround time
  - 1 ms resolution using the multimedia (MM) timer, average of 30 measurements
- Increasing client and/or topic load; several room topologies examined
- Results:
  - Exponential-like climb of turnaround time with load
  - Adding TSs: better times
  - Adding CSs: higher maximum client count, but turnaround time not improved