Example: Rumor Performance Evaluation
Andy Wang
CIS 5930 Computer Systems Performance Analysis

Motivation
Optimistic peer replication is popular
–Intermittent connectivity
–Availability of replicas for concurrent updates
–Convergence and correctness for updates
Examples: Rumor, Coda, Ficus, Lotus Notes, Outlook Calendar, CVS

Background
Replication provides high availability
Optimistic replication allows immediate access to any replicated item, at the risk of permitting concurrent updates
A reconciliation process makes replicas consistent (two replicas at a time, peer-to-peer)

Background (Continued)
Conflicts occur when different replicas of the same file are updated after the previous reconciliation

Optimistic Replication Example
While connected, the two logs match:
–Log on desktop: 10:00 update, 10:25 update
–Log on portable: 10:00 update, 10:25 update
After disconnecting, each machine logs its own update:
–Log on desktop: 10:00 update, 10:25 update, 10:40 update
–Log on portable: 10:00 update, 10:25 update, 10:51 update

Example (Continued)
While disconnected:
–Log on desktop: 10:00 update, 10:25 update, 10:40 update
–Log on portable: 10:00 update, 10:25 update, 10:51 update
After reconnecting, run reconciliation: it detects a conflict and propagates updates, so both logs become:
–10:00 update, 10:25 update, 10:40 update, 10:51 update
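The timeline above can be sketched as a small conflict check: a conflict exists exactly when both replicas log updates after their last common reconciliation. This is a minimal illustration of the idea, not Rumor's actual mechanism (which compares version information rather than raw timestamps).

```python
def detect_conflict(log_a, log_b, last_recon):
    """Return True if both replicas were updated after the last
    reconciliation (timestamps here are minutes past 10:00)."""
    a_new = any(t > last_recon for t in log_a)
    b_new = any(t > last_recon for t in log_b)
    return a_new and b_new

# The disconnected scenario above: the last common state is 10:25,
# then the desktop updates at 10:40 and the portable at 10:51.
desktop = [0, 25, 40]
portable = [0, 25, 51]
print(detect_conflict(desktop, portable, last_recon=25))  # True

# No conflict if only one side changed after 10:25.
print(detect_conflict([0, 25], portable, last_recon=25))  # False
```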

Goal
Understand the cost characteristics of the reconciliation process for Rumor

Services
Reconciliation
–Exchange file system states
–Detect new and conflicting versions
  If possible, automatically resolve conflicts
  Else, prompt the user to resolve conflicts
–Propagate updates

Outcomes
Two reconciled replicas become consistent for all files and directories
Some files remain inconsistent and require the user to resolve conflicts

Metrics
Time
–Elapsed time: from the beginning to the completion of a reconciliation request
–User time (time spent using the CPU)
–System time (time spent in the kernel)
Failure rate
–Number of incomplete reconciliations and infinite loops (none observed)
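The three time metrics can be captured around a single call with the standard library alone; this is a sketch of the measurement, not the Perl harness the study actually used. `os.times()` reports the process's user CPU time and the kernel time spent on its behalf.

```python
import os
import time

def timed(fn, *args):
    """Run fn(*args) once and return (result, elapsed, user, system),
    all times in seconds. Elapsed is wall-clock time; user and system
    come from os.times() and may read 0 for very fast calls."""
    t0 = os.times()
    w0 = time.perf_counter()
    result = fn(*args)
    w1 = time.perf_counter()
    t1 = os.times()
    return result, w1 - w0, t1.user - t0.user, t1.system - t0.system

result, elapsed, user, system = timed(sum, range(1_000_000))
print(f"elapsed={elapsed:.4f}s user={user:.4f}s system={system:.4f}s")
```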

Metrics Not Measured
Disk access time
–Requires complex instrumentation (e.g., buffering, logging)
Network and memory resources
–Not heavily used
Correctness
–Difficult to evaluate

Monitor Implementation
[Diagram: the reconciliation process (Spool-to-dump, Recon, Scanner, Rfind, stored, Rrecon server), split between a Perl library and C++, timed by a top-level Perl time command]

Parameters
System parameters
–CPU (speed of local and remote servers)
–Disk (bandwidth, fragmentation level)
–Network (type, bandwidth, reliability)
–Memory (size, caching effects, speed)
–Operating system (type, version, VM management, etc.)

Parameters (Continued)
Workload parameters
–Number of replicas
–Number of files and directories
–Number of conflicts and updates
–Size of volumes (file size)

Workloads
Update characteristics extracted from Geoff Kuenning's traces
[Diagram: file accesses classified as read-only vs. read-write; read-write accesses as nonshared vs. shared; shared accesses as 2-way vs. 3+-way sharing; each category split into read and write accesses]

Experimental Settings
Machine model: Dell Latitude XP
CPU: x MHz
RAM: 36 MB
Ethernet: 10 Mb/s
Operating system: Linux 2.0.x
File system: ext3

Experimental Settings (Continued)
Should have documented the following as well:
–CPU: L1 and L2 cache sizes
–RAM: brand and type
–Disk: brand, model, capacity, RPM, and the size of the on-disk cache
–File system version

Experimental Design
2^5 full factorial design
Linear regression or multivariate linear regression to model major factors
Target: 95% confidence interval

2^5 r=5 Full Factorial Design
Number of replicas: 2 and 6
Number of files: 10 and 1,000
File size: 100 and 22,000 bytes
Number of directories: 10 and 100
Number of updates: 10 and 450
–Capped at 10 updates for 10 files
Number of conflicts: 0 /* typical */
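The five two-level factors above can be enumerated directly as a 2^5 design matrix. This is a sketch; in practice the cap on updates for small file counts would adjust some combinations, and each combination is then run with the chosen number of repetitions.

```python
from itertools import product

# Two levels per factor, as listed above (conflicts held at 0).
factors = {
    "replicas":    (2, 6),
    "files":       (10, 1000),
    "file_size":   (100, 22000),
    "directories": (10, 100),
    "updates":     (10, 450),
}

# All 2^5 = 32 treatment combinations.
design = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(design))  # 32

# With r=5 repetitions, the experiment needs 32 * 5 = 160 runs.
print(len(design) * 5)  # 160
```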

2^5 r=5 Full Factorial Analysis
Experimental errors < 3%

Variation of Effects
All major effects are significant at the 95% confidence level

Residuals vs. Predicted Time
Clusters are caused by the dominating effect of the number of files

Residuals vs. Experiment Numbers
Residuals show homoscedasticity, almost

Quantile-Quantile Plot
Residuals are normally distributed, almost
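A quick numeric stand-in for eyeballing a normal Q-Q plot is the correlation between the sorted residuals and the matching normal quantiles: values near 1 suggest approximate normality. This is a standard-library sketch; a real analysis would also plot the points.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_correlation(residuals):
    """Pearson correlation between sorted residuals and the
    corresponding standard-normal quantiles."""
    n = len(residuals)
    sample = sorted(residuals)
    theory = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    ms, mt = mean(sample), mean(theory)
    cov = sum((s - ms) * (t - mt) for s, t in zip(sample, theory))
    return cov / ((n - 1) * stdev(sample) * stdev(theory))

random.seed(1)
normal_resid = [random.gauss(0, 1) for _ in range(200)]
skewed_resid = [random.expovariate(1) for _ in range(200)]
print(qq_correlation(normal_resid))  # close to 1
print(qq_correlation(skewed_resid))  # noticeably lower
```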

Multivariate Regression
Number of replicas: 2
Number of files: 4 levels
File size: 22,000 bytes
Number of directories: 4 levels
Number of updates: 0
Number of conflicts: 0 /* typical */
Number of repetitions: 5 per data point

Multivariate Regression (Continued)
Experimental errors < 7%
All coefficients are significant
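A two-predictor least-squares fit of the kind used here can be written from scratch by solving the 3x3 normal equations. The data below are made-up numbers for illustration, not measurements from the study.

```python
def fit_two_factor(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2, solving the
    3x3 normal equations with Cramer's rule (fine at this size)."""
    n = len(y)
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    A = [[n,       sum(x1),     sum(x2)],
         [sum(x1), dot(x1, x1), dot(x1, x2)],
         [sum(x2), dot(x1, x2), dot(x2, x2)]]
    r = [sum(y), dot(x1, y), dot(x2, y)]
    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det3(A)
    def replace_col(j):
        return [[r[i] if k == j else A[i][k] for k in range(3)]
                for i in range(3)]
    return [det3(replace_col(j)) / d for j in range(3)]

# Hypothetical runs: reconciliation time vs. files and directories.
files  = [10, 10, 250, 250, 500, 500, 1000, 1000]
dirs   = [10, 100, 10, 100, 10, 100, 10, 100]
time_s = [1.1, 2.0, 11.3, 12.4, 21.8, 22.9, 43.0, 44.2]
b0, b1, b2 = fit_two_factor(files, dirs, time_s)
print(f"time = {b0:.3f} + {b1:.4f}*files + {b2:.4f}*dirs")
```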

Residuals vs. Predicted Time
Elapsed time shows a bi-modal trend
User time shows an exponential trend

Residuals vs. Experiment Numbers
Not so good for elapsed time and user time

Quantile-Quantile Plot
Residuals are not normally distributed for elapsed time and user time

Log Transform (User Time)
ANOVA tests failed miserably

Residual Analyses (User Time)
No indication that transforms can help…

Possible Explanations
i-node-related factors
–Number of files per directory block
–Crossing a block boundary may cause anomalies
Caching effects
–Reboot needed across experiments

Linear Regression
Number of files: 100, 150, 200, 250, 252, 253, 300, 350, 400, 450
–Tests for the boundary-crossing condition as the number of files exceeds one block
–Note that Rumor has hidden files
Number of repetitions: 5 per data point
Flush cache (reboot) before each run

Linear Regression (Continued)
R² > 80%
All coefficients are significant
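R² here is the fraction of variation explained by the regression. A minimal sketch of a one-factor fit and the R² computation follows; the timing numbers are invented for illustration, not taken from the study.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1, R^2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot

# Hypothetical reconciliation times (seconds) vs. number of files.
files = [100, 150, 200, 250, 300, 350, 400, 450]
secs  = [12.0, 16.5, 21.2, 26.0, 30.4, 35.1, 39.8, 44.6]
b0, b1, r2 = fit_line(files, secs)
print(f"secs = {b0:.3f} + {b1:.4f}*files, R^2 = {r2:.4f}")
```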

Residuals vs. Predicted Time
Elapsed time shows a bi-modal trend
User time shows an exponential trend

Residuals vs. Experiment Numbers
Elapsed time shows a rising bi-modal trend
–Randomization of experiments may help

Quantile-Quantile Plot
Error residuals for elapsed time are not normal
–Perhaps piecewise normal

Possible Explanations
i-node-related factors: No
Caching effects: No
Hidden factors: Maybe
Bugs: Maybe

Conclusion
Identified the number of files as the dominating factor in Rumor's running time
Observed the existence of an unknown factor in the Rumor performance model
