CS514: Intermediate Course in Operating Systems
Professor Ken Birman
Ben Atkin: TA
Lecture 24: Nov. 16

Solving real problems with real-time protocols
When does real-time matter?
–Air traffic application: we want rapid response when events occur
–Telecommunications application: switch requires real-time reactions to events that occur
Two categories of real-time
–We want an action to be predictably fast
–We want an action to occur before a deadline passes

Predictability
If this is our goal…
–Any well-behaved mechanism may be adequate
–But we should be careful about uncommon disruptive cases
For example, the cost of failure handling is often overlooked
The risk is that an infrequent scenario will be very costly when it occurs

Predictability: Examples
Probabilistic multicast protocol
–Very predictable if our desired latencies are larger than the expected convergence time
–Much less so if we seek latencies that bring us close to the expected latency of the protocol itself
Rule of thumb?
–Real-time doesn't mean "as fast as possible"; more often it means "slow and steady"!

Mixing issues
Telephone networks need a mixture of properties
–Real-time response
–High performance
–Stable behavior even when failures and recoveries occur
Can we use our tools to solve such a problem?

Friedman's SS7 experiment
Used Horus to emulate a telephone switching system
Idea is to control, in software, a telephone switch that handles "800 number" (toll-free) calls
Horus runs the "800 number database" on a cluster of processors next to the switch

IN coprocessor example
[diagram: a network of SS7 switches]

IN coprocessor example
[diagram: the same network of SS7 switches, now with a coprocessor attached to one switch]

Role of coprocessor
A simple database. Basically:
–Switch does a query: "How should I route a call to … from …?"
–Reply: "use output line 6"
–Time limit of 100ms on the transaction
Also runs a background protocol to update the database as things change, on a separate network…
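
To make the coprocessor's job concrete, here is a minimal Python sketch of this query/reply pattern with the 100ms budget. The table contents, the route_query name, and the deadline check are illustrative assumptions, not the actual switch or Horus interface.

import time

# Hypothetical in-memory "800 number" routing table; in the real system this
# lives on the query element (QE) processors and is kept current by the
# separate background update protocol.
EIGHT_HUNDRED_DB = {
    "800-555-0100": 6,   # toll-free number -> output line on the switch
    "800-555-0199": 2,
}

DEADLINE = 0.100         # the switch allows the coprocessor 100 ms per query

def route_query(dialed_number, start):
    """Answer "which output line should this call use?" within the deadline."""
    line = EIGHT_HUNDRED_DB.get(dialed_number)
    if time.monotonic() - start > DEADLINE:
        return None      # too late: the switch treats the call as dropped
    return line

start = time.monotonic()
print(route_query("800-555-0100", start))   # -> 6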

Goals for coprocessor
Right now, people use hardware fault-tolerant machines for this
–E.g. Stratus "pair and a spare"
–Mimics one computer but tolerates hardware failures
–Performance an issue…

Goals for coprocessor
What we want
–Scalability: ability to use a cluster of machines for the same task, with better performance when we use more nodes
–Fault-tolerance: a crash or recovery shouldn't disrupt the system
–Real-time response: must satisfy the 100ms limit at all times
Desired: "7 to 9-nines availability"
Downtime: any period when a series of requests might all be rejected

IN coprocessor example
[diagram: the SS7 switch queries the cluster through external adaptor (EA) processors, which talk to query element (QE) processors]
–Query Element (QE) processors do the number lookup (an in-memory database); goal: scalable memory without loss of processing performance as the number of nodes is increased
–The switch itself asks for help when a remote-number call is sensed
–External adaptor (EA) processors run the query protocol
–A primary-backup scheme, adapted using small Horus process groups, provides fault-tolerance with real-time guarantees
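
One way to picture the "partitioned" in-memory database: each EA can map a dialed number to the QE node that holds its record, for example by hashing. The hashing rule, node count, and function name below are illustrative assumptions; Friedman's actual partitioning scheme may differ.

import hashlib

NUM_QE_NODES = 4   # query element processors in the cluster (illustrative)

def qe_for_number(dialed_number, num_nodes=NUM_QE_NODES):
    """Map a toll-free number to the QE node holding its routing record."""
    digest = hashlib.md5(dialed_number.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes

# Each EA can compute the responsible QE locally, so adding nodes grows the
# aggregate in-memory database without introducing a central lookup step.
print(qe_for_number("800-555-0100"))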

Options?
A simple scheme:
–Organize nodes as groups of 2 processes
–Use virtual synchrony multicast for the query, for the response, and also for updates and membership tracking
–A bit like our ATC example…
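
The sketch below caricatures this first design in Python: a toy Group class stands in for a Horus process group (it is not the Horus API), and each call pays for two group multicasts on the critical path, one to the QE group and one back to the EA group.

# Toy, single-process caricature of the first design.  "Group" and "multicast"
# here are illustrative stand-ins for Horus process groups, not the Horus API.

class Group:
    def __init__(self, members):
        self.members = members

    def multicast(self, msg):
        # Deliver the message to every member; collect whatever each returns.
        return [member(msg) for member in self.members]

DB = {"800-555-0100": 6}                       # tiny stand-in for the 800-number database

# Query elements: organized as a group of 2, each doing the lookup in duplicate.
qe_group = Group([lambda q: DB.get(q), lambda q: DB.get(q)])

# External adaptors: each just records the reply it is handed.
ea_replies = []
ea_group = Group([ea_replies.append, ea_replies.append])

def handle_call(number):
    answers = qe_group.multicast(number)       # multicast #1: query to the QE group
    ea_group.multicast(answers[0])             # multicast #2: reply back to the EA group
    return ea_replies[-1]                      # an EA forwards this answer to the switch

# Two group multicasts per call sit on the critical path; in the experiment this
# design topped out around 600 queries per second.
print(handle_call("800-555-0100"))             # -> 6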

IN coprocessor example
[diagram] Step 1: Switch sees incoming request

IN coprocessor example
[diagram] Step 2: Switch waits while EA processes multicast the request to the group of query elements (the "partitioned" database)

IN coprocessor example
[diagram] Step 3: The query elements do the query in duplicate

IN coprocessor example
[diagram] Step 4: They reply to the group of EA processes

IN coprocessor example
[diagram] Step 6: EA processes reply to switch, which routes call

Results?
Terrible performance!
–Solution has 2 Horus multicasts on each critical path
–Experience: about 600 queries per second but no more
Also: slow to handle failures
–Freezes for as long as 6 seconds
Performance doesn't improve much with scale either

Next try?
Consider taking Horus off the critical path
Idea is to continue using Horus
–It manages groups
–And we use it for updates to the database and for partitioning the QE set
But no multicasts on the critical path
–Instead use a hand-coded scheme

Roy's hand-coded scheme
Queue up a set of requests from an EA to a QE
Periodically, sweep the set into a message and send it as a batch
Process the requests, also as a batch
Send the batch of replies back to the EA
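
A minimal sketch of the batching idea, assuming a single QE thread sweeping a shared queue; the 20ms sweep period, the queue layout, and the names (request_q, qe_sweeper) are illustrative, not Roy Friedman's actual code.

import queue
import threading
import time

BATCH_INTERVAL = 0.020            # sweep period (illustrative; sized so 50 ms is "long enough")
DB = {"800-555-0100": 6, "800-555-0199": 2}

request_q = queue.Queue()         # (call_id, number) pairs queued up by an EA for one QE
replies = {}                      # call_id -> output line, filled in batch by the QE

def qe_sweeper():
    """Periodically sweep queued requests into one batch, process the batch,
    and hand back a batch of replies (modeled here as a dict update)."""
    while True:
        time.sleep(BATCH_INTERVAL)
        batch = []
        while not request_q.empty():
            batch.append(request_q.get_nowait())
        if batch:                 # one message per sweep rather than one per call
            replies.update({call_id: DB.get(num) for call_id, num in batch})

threading.Thread(target=qe_sweeper, daemon=True).start()

request_q.put((1, "800-555-0100"))
request_q.put((2, "800-555-0199"))
time.sleep(3 * BATCH_INTERVAL)    # wait long enough for at least one sweep
print(replies)                    # e.g. {1: 6, 2: 2}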

Clever twists?
Split into a primary and a secondary EA for each request
–The secondary steps in if no reply is seen within 50ms
–Batch size is calculated so that 50ms should be "long enough"
Hand-optimized I/O and batching code…
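
The primary/secondary idea can be sketched with two concurrent attempts and a 50ms cutoff. The simulated delays and the ask_ea helper are hypothetical stand-ins for the real EA query path, not the actual protocol.

import concurrent.futures
import time

FAILOVER_TIMEOUT = 0.050          # the secondary EA steps in if no reply within 50 ms

def ask_ea(ea_name, number, delay, answer):
    """Stand-in for sending one query through an EA; 'delay' models its latency."""
    time.sleep(delay)
    return (ea_name, answer)

pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)

def query_with_failover(number):
    # Simulate a primary EA that is too slow (or has failed) for this request.
    primary = pool.submit(ask_ea, "primary", number, 0.200, 6)
    try:
        return primary.result(timeout=FAILOVER_TIMEOUT)
    except concurrent.futures.TimeoutError:
        # Duplicate replies are harmless: the switch only needs *an* answer
        # within its 100 ms budget.
        return pool.submit(ask_ea, "secondary", number, 0.010, 6).result()

print(query_with_failover("800-555-0100"))   # -> ('secondary', 6)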

Results?
–Able to sustain 22,000 emulated telephone calls per second
–Able to guarantee a response within 100ms, with no more than 3% of calls dropped (randomly)
–Performance is not hurt by a single failure or recovery while the switch is running
–Can keep the database in memory: memory size increases with the number of nodes in the cluster

Keys to success
Horus is doing the hard work of configuration management
–But the configuration is only "read" by code on the critical path
–Horus is not really in the performance-critical section of code
Also: need enough buffering space to keep running while a failure is sensed and reported

Coprocessors galore
The SS7 switch thinks of the scalable cluster as a coprocessor
But the coprocessor thinks of Horus as a sort of coprocessor
–It sits off to one side
–Reports membership changes
–But the "interface" is really just a shared memory segment

Same problem with Totem or CASD?
Can't use these technologies with a 100ms timeout! The basic delivery latency already exceeds 100ms
Could probably tune either protocol to this setup... but Friedman could probably double his performance too, by tuning Horus to the setup
Conclusion: real-time should be understood from the needs of the application, not from a specific theory

Other settings with a strong temporal element
Load balancing
–Idea is to track the load of a set of machines
–Can do this at an access point or in the client
–Then want to rebalance by issuing requests preferentially to less loaded servers
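
A small sketch of the rebalancing rule, assuming each server periodically reports a load figure. The inverse-load weighting and the names below are illustrative; it is one of many reasonable policies, not a specific system's algorithm.

import random

# Most recent load report from each server (e.g. requests in progress).  In the
# lecture's setting these summaries are gathered at an access point or by the client.
load_table = {"server-a": 12, "server-b": 3, "server-c": 7}

def pick_server(loads):
    """Prefer lightly loaded servers: weight each inversely to its reported load."""
    weights = {server: 1.0 / (1 + load) for server, load in loads.items()}
    total = sum(weights.values())
    r = random.uniform(0, total)
    for server, weight in weights.items():
        r -= weight
        if r <= 0:
            return server
    return server        # guard against floating-point round-off

print(pick_server(load_table))   # usually 'server-b', the most lightly loaded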

Load-balancing with an external adaptor
[diagram: an external adaptor (EA) sits between the clients and the servers]

Load-balancing on client
[diagram: the client receives a load summary and picks a lightly-loaded machine]

Load balancing in farms
Akamai widely cited
–They download the rarely-changing content from customer web sites
–Distribute this to their own web farm
–Then use a hacked DNS to redirect web accesses to a close-by, less-loaded machine
Real-time aspects?
–The data on which this is based needs to be fresh, or we'll send to the wrong server

Real-time in industry
Very common in factory settings
–At time t, start the assembly line
–Planning: from time t₀ to t₁, produce MIPS CPU chips on fab-unit 16…
–If the pressure rises too quickly, reduce the temperature
Often, we use real-time operating systems in support of such applications

Robotics, embedded systems
Many emerging applications involve coordination of action by many components
–E.g. robots that cooperate to construct something
Demand for real-time embedded systems technology will be widespread in industry
Little is understood about networks in such settings… a big opportunity

Future directions in real-time
–Expect GPS time sources to be common within five years
–Real-time tools like periodic process groups will also be readily available (members take actions in a temporally coordinated way)
–Increasing focus on predictable high performance rather than provable worst-case performance
–Increasing use of probabilistic techniques
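
A sketch of the "periodic process group" idea, assuming closely synchronized clocks (e.g. GPS-disciplined): each member independently computes the next period boundary from a shared epoch and acts then, so actions line up across the group. The names, the epoch, and the one-second period are illustrative.

import time

PERIOD = 1.0          # seconds between coordinated actions (illustrative)
EPOCH = 0.0           # agreed-on time origin; a GPS-disciplined clock would anchor this

def wait_for_next_period():
    """Sleep until the next multiple of PERIOD past EPOCH.  With closely
    synchronized clocks, every member wakes at (roughly) the same instant."""
    now = time.time()
    next_tick = EPOCH + PERIOD * (int((now - EPOCH) / PERIOD) + 1)
    time.sleep(next_tick - now)
    return next_tick

for _ in range(3):
    tick = wait_for_next_period()
    print("member acts at period boundary", tick)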

Future Directions
David Tennenhouse (MIT, then DARPA ITO, then MCI):
–Get real
–Get small
–Get moving!

Conclusions?
–Protocols like pbcast are potentially appealing in a subset of applications that are naturally probabilistic to begin with, and where we may have knowledge of expected load levels, etc.
–More traditional virtual synchrony protocols with strong consistency properties make more sense in standard networking settings
–There are many ways to combine temporal and logical properties

Ending on a thought question
–Distributed systems depend on many layers of software and hardware, and on many assumptions
–The new wave of embedded systems will demand real-time solutions!
–Are such systems ultimately probabilistic, or ultimately deterministic?
–Do current reliable systems converge towards deterministic behavior, or towards chaotic behavior?