An Approach to Measuring Large-Scale Distributed Systems
Jun Li, Peter Reiher, Gerald Popek, and Mark Yarvis (UCLA)
Geoffrey H. Kuenning (Harvey Mudd College)

2 How to Measure Internet-Scale Systems?
- Distributed systems have complex performance at large sizes
- Would like to measure & tune before deployment
- Biggest research testbeds are tiny relative to the Internet
- Only Internet-scale testbed is the Internet itself

3 Live Internet Measurement
- Difficult or impossible to get cooperation
- Difficult to control remote sites
- Extraneous noise in measurements

4 The Simulation Option
- Usually requires models of real software
- Expensive to develop
- Possible inaccuracy or bugs
- Must be validated against real system
- Simulation usually much slower than reality

5 Measuring Big Distributed Systems is Tough
- Only one really big testbed: the Internet
  - Can’t get enough participants
  - Too much noise for repeatable measurements
- Simulations don’t use the real software
  - Hard to validate
- Small testbeds don’t reveal scaling problems

6 Testbed Overloading
- Use real software
- Run multiple instances on one machine
- Virtual topology to simulate connectivity
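A minimal sketch of the overloading idea, assuming a hypothetical ring-shaped virtual topology and a round-robin placement of virtual nodes on physical machines (the slides specify neither). The function names are illustrative only. The point it shows is that connectivity is defined by the virtual links, so two instances sharing a machine can still be several virtual hops apart.

```python
from collections import deque

def place_round_robin(num_virtual, num_physical):
    """Map each virtual node id to a physical machine index (assumed policy)."""
    return {v: v % num_physical for v in range(num_virtual)}

def ring_topology(num_virtual):
    """Virtual topology: each node links to its two ring neighbours (assumed shape)."""
    return {v: {(v - 1) % num_virtual, (v + 1) % num_virtual}
            for v in range(num_virtual)}

def hop_count(topology, src, dst):
    """Breadth-first search over virtual links; physical placement is never consulted."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for nxt in topology[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # dst is not reachable from src

# 12 virtual nodes overloaded onto 3 physical machines (4 instances each).
placement = place_round_robin(num_virtual=12, num_physical=3)
topology = ring_topology(12)
hops = hop_count(topology, 0, 6)
```

Here virtual nodes 0 and 3 share a physical machine, yet the virtual topology still places them three hops apart.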

7 Characteristics of Overloading
- Allows greatly increased scale
- Works best when applications are lightweight
- Some (not all) measurements will differ

8 Effects of Overloading
- Some metrics unaffected
  - Hop count
  - Bytes transferred per (virtual) node
  - Storage cost
- Other metrics must be adjusted due to resource competition
  - CPU processing times
  - Latencies

9 Eliminating Interference
- Locking to avoid contention
- Characterize slowdown
- Divide and conquer

10 Locking to Avoid Contention
- Use central coordinator
- One process at a time initiates operation x
- Measure latency, bytes transferred, messages exchanged
- No contention because of serialization
- Works well for operations that are one-at-a-time in real world (e.g., join multicast group)
- Total run time increases
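The serialization described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' coordinator: the `Coordinator` class and the `join_group` stand-in operation are hypothetical, and threads in one process stand in for virtual nodes on one machine.

```python
import threading
import time

class Coordinator:
    """Central coordinator: lets one virtual node at a time run its operation."""

    def __init__(self):
        self._lock = threading.Lock()
        self.latencies = {}

    def run_exclusive(self, node_id, operation):
        with self._lock:  # serialize: no two measured operations overlap
            start = time.perf_counter()
            operation(node_id)
            # The timer starts after the lock is acquired, so time spent
            # waiting for a turn is not counted as latency.
            self.latencies[node_id] = time.perf_counter() - start

def join_group(node_id):
    """Hypothetical stand-in for the operation under test."""
    sum(range(10_000))  # a little CPU work

coordinator = Coordinator()
threads = [
    threading.Thread(target=coordinator.run_exclusive, args=(n, join_group))
    for n in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every operation waits for its turn, the whole run takes roughly the sum of the individual latencies, which is the increase in total run time noted on the slide.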

11 Slowdown Analysis
- Measure time for one logical node on a physical node
- Measure time for n logical nodes
- Develop slowdown factor as function of n
- Apply to measured results
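A minimal sketch of these four steps, assuming the slowdown factor is modeled as a straight line in n (the slides say only "function of n", so the linear form is an assumption) and using made-up timings. `fit_slowdown` and `adjust` are hypothetical helper names.

```python
def fit_slowdown(samples):
    """Fit s(n) = a + b*n by least squares to the points (n, t_n / t_1).

    samples maps n (logical nodes per physical node) to the measured time.
    It must contain n = 1 and at least one other value of n.
    """
    base = samples[1]
    points = [(n, t / base) for n, t in samples.items()]
    k = len(points)
    mean_n = sum(n for n, _ in points) / k
    mean_s = sum(s for _, s in points) / k
    var_n = sum((n - mean_n) ** 2 for n, _ in points)
    b = sum((n - mean_n) * (s - mean_s) for n, s in points) / var_n
    a = mean_s - b * mean_n
    return lambda n: a + b * n

def adjust(measured_time, n, slowdown):
    """Estimate the uncontended time from a time measured with n logical nodes."""
    return measured_time / slowdown(n)

# Hypothetical calibration timings in milliseconds, keyed by n.
samples = {1: 2.0, 2: 3.0, 4: 5.0, 8: 9.0}
slowdown = fit_slowdown(samples)

# A 17 ms measurement taken with 16 logical nodes per machine.
estimate = adjust(17.0, 16, slowdown)
```

With these invented numbers the fitted factor at n = 16 is about 8.5, so the 17 ms measurement is scaled back to roughly 2 ms. A different functional form may fit real data better.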

12 Divide and Conquer
- Divide task into components
  - Must be independent
  - No parallelism
  - Contention only at component boundaries
- Measure components individually in isolation
- Measure occurrences in full system & sum
- Resource contention now omitted from total

13 Divide-and-Conquer Example
- Components of dissemination latency in Revere
  - Local processing time
  - Kernel-space crossing
  - Transmission delay (per hop)
- Each component measured in isolation
- Sum multiplied by observed hop count
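The arithmetic in this example can be sketched as below. The component values are invented for illustration and are not Revere measurements; the `HopCosts` type and `dissemination_latency` function are hypothetical names. Transmission delay is treated as a supplied per-hop input rather than something timed on the overloaded testbed.

```python
from dataclasses import dataclass

@dataclass
class HopCosts:
    local_processing_ms: float  # measured in isolation
    kernel_crossing_ms: float   # measured in isolation
    transmission_ms: float      # per-hop delay, supplied as an input

    def per_hop(self):
        """Sum of the independent components for a single hop."""
        return (self.local_processing_ms
                + self.kernel_crossing_ms
                + self.transmission_ms)

def dissemination_latency(costs, hop_count):
    """Per-hop sum multiplied by the hop count observed in the full system."""
    return costs.per_hop() * hop_count

# Illustrative values only.
costs = HopCosts(local_processing_ms=1.5, kernel_crossing_ms=0.25,
                 transmission_ms=40.0)
latency_ms = dissemination_latency(costs, hop_count=6)
```

Since each component is timed on its own, contention among overloaded instances never enters the total; only the hop count comes from the full-scale run.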

14 Dissemination Latency
[Figure: a Revere node running on Java over the OS, between the previous hop and the next hop. Labeled delays: local processing time (measured), kernel-crossing time (measured), per-hop transmission latency (parameter).]

15 Measurement Environment
[Figure: Revere running on Java in user space, above the OS in kernel space. Delays: sum known times, multiply by hop count.]

16 Open Issues
- Measurement framework for arbitrary applications
- Scalability of locking approach

17 Conclusions
- Method for measuring much larger systems
- Used to measure Revere on 3000 virtual nodes
- Avoids drawbacks of other approaches
