Example: Rumor Performance Evaluation
Andy Wang
CIS 5930 Computer Systems Performance Analysis

Motivation
Optimistic peer replication is popular
–Intermittent connectivity
–Availability of replicas for concurrent updates
–Convergence and correctness for updates
Examples: Rumor, Coda, Ficus, Lotus Notes, Outlook Calendar, CVS

Background
Replication provides high availability
Optimistic replication allows immediate access to any replicated item, at the risk of permitting concurrent updates
A reconciliation process makes replicas consistent (pairwise, i.e., two replicas at a time for peer-to-peer)

Background (Continued)
Conflicts occur when different replicas of the same file are updated independently since the previous reconciliation

Optimistic Replication Example
Connected:
–Log on Desktop: 10:00 Update, 10:25 Update
–Log on Portable: 10:00 Update, 10:25 Update
Disconnected:
–Log on Desktop: 10:00 Update, 10:25 Update, 10:40 Update
–Log on Portable: 10:00 Update, 10:25 Update, 10:51 Update

Example (Continued)
Disconnected:
–Log on Desktop: 10:00 Update, 10:25 Update, 10:40 Update
–Log on Portable: 10:00 Update, 10:25 Update, 10:51 Update
Connected: run reconciliation, detect a conflict, propagate updates
–Log on Desktop: 10:00 Update, 10:25 Update, 10:40 Update, 10:51 Update
–Log on Portable: 10:00 Update, 10:25 Update, 10:40 Update, 10:51 Update

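The Desktop/Portable example can be replayed in a few lines. This is a minimal Python sketch, assuming each replica keeps a per-file log of zero-padded "HH:MM" update times and that the last reconciliation happened at 10:25 (the slides imply this but do not state it). The helpers `updates_since` and `reconcile` are hypothetical names, not Rumor's code.

```python
# Sketch: replay the Desktop/Portable example as per-file update logs.
# Assumption: logs hold zero-padded "HH:MM" strings, which sort correctly as text.


def updates_since(log, last_sync):
    """Return the updates made after the last reconciliation point."""
    return [t for t in log if t > last_sync]


def reconcile(desktop, portable, last_sync):
    """Return (conflict, merged_log) for two replicas of the same file."""
    new_on_desktop = updates_since(desktop, last_sync)
    new_on_portable = updates_since(portable, last_sync)
    # Conflict: both replicas changed the file independently since the last sync.
    conflict = bool(new_on_desktop) and bool(new_on_portable)
    # Propagate updates: both replicas end up holding the union of the logs.
    merged = sorted(set(desktop) | set(portable))
    return conflict, merged


desktop = ["10:00", "10:25", "10:40"]
portable = ["10:00", "10:25", "10:51"]
conflict, merged = reconcile(desktop, portable, last_sync="10:25")
print(conflict)  # True
print(merged)    # ['10:00', '10:25', '10:40', '10:51']
```

Both replicas changed the file after 10:25, so a conflict is flagged; the merged log only illustrates propagation and does not decide which update should win.
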
Goal
Understand the cost characteristics of the reconciliation process for Rumor

Services
Reconciliation
–Exchange file system states
–Detect new and conflicting versions
  If possible, automatically resolve conflicts
  Else, prompt the user to resolve conflicts
–Propagate updates

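The reconciliation services can be sketched as one pass over the exchanged state. For illustration only, this assumes each replica summarizes its file system as a map from path to version vector (replica id to update count); the slides do not say how Rumor represents versions, and `compare` and `reconcile` are hypothetical names.

```python
# Sketch of one reconciliation pass from the local replica's point of view.
# Assumption: state = {path: {replica_id: update_count}} (version vectors).


def compare(a, b):
    """Order two version vectors: 'equal', 'newer', 'older', or 'concurrent'."""
    ids = set(a) | set(b)
    a_ge = all(a.get(i, 0) >= b.get(i, 0) for i in ids)
    b_ge = all(b.get(i, 0) >= a.get(i, 0) for i in ids)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "newer"
    if b_ge:
        return "older"
    return "concurrent"  # neither dominates: independent updates


def reconcile(local, remote):
    """Detect new and conflicting versions; return (files_to_pull, conflicts)."""
    to_pull, conflicts = [], []
    for path, remote_vv in remote.items():  # remote state from the exchange step
        if path not in local:
            to_pull.append(path)  # new file on the remote replica
            continue
        order = compare(local[path], remote_vv)
        if order == "older":
            to_pull.append(path)  # remote version strictly newer: propagate it
        elif order == "concurrent":
            conflicts.append(path)  # auto-resolve if possible, else ask the user
    return sorted(to_pull), sorted(conflicts)


local = {"a.txt": {"D": 2}, "b.txt": {"D": 1, "P": 1}, "c.txt": {"D": 3}}
remote = {"a.txt": {"D": 2}, "b.txt": {"D": 1, "P": 2},
          "c.txt": {"D": 2, "P": 1}, "d.txt": {"P": 1}}
print(reconcile(local, remote))  # (['b.txt', 'd.txt'], ['c.txt'])
```

Running the same pass with the roles swapped yields the files to push; paths in the conflict list correspond to the two outcomes on the next slide (auto-resolved, or left for the user).
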
Outcomes
–Two reconciled replicas become consistent for all files and directories
–Some files remain inconsistent and require the user to resolve conflicts

Metrics
Time
–Elapsed time: from the beginning to the completion of a reconciliation request
–User time (CPU time spent in user mode)
–System time (CPU time spent in the kernel)
Failure rate
–Number of incomplete reconciliations and infinite loops (none observed)

Metrics Not Measured
Disk access time
–Requires complex instrumentation, e.g., for buffering and logging
Network and memory resources
–Not heavily used
Correctness
–Difficult to evaluate

Monitor Implementation
[Figure: the Rumor reconciliation process (Recon, Scanner, Rfindstored, RreconServer, Spool-to-dump; Perl library and C++ components), measured as a whole by running the top-level Perl script under the time command]

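The monitor wraps the whole reconciliation in the `time` command. A rough Python analogue is below, assuming a POSIX system, where `os.times()` reports the CPU time of terminated, waited-for child processes (and their waited-for descendants). The command line is a placeholder, and counting a timeout as a suspected infinite loop is an assumption of this sketch. Like `time`, it sees only local processes, not CPU time on the remote replica.

```python
import os
import subprocess
import sys
import time


def time_run(cmd, timeout_s):
    """Run cmd once; return (elapsed, user, system, completed)."""
    before = os.times()
    start = time.perf_counter()
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s)
        completed = True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        completed = False  # nonzero exit, or suspected infinite loop
    elapsed = time.perf_counter() - start
    after = os.times()
    # CPU time of terminated children (POSIX; always 0 on Windows).
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return elapsed, user, system, completed


def measure(cmd, runs, timeout_s=600.0):
    """Repeat a run; return (samples from completed runs, failure rate)."""
    samples, failures = [], 0
    for _ in range(runs):
        elapsed, user, system, completed = time_run(cmd, timeout_s)
        if completed:
            samples.append((elapsed, user, system))
        else:
            failures += 1
    return samples, failures / runs


# Placeholder command; a real harness would start the reconciliation script here.
samples, failure_rate = measure([sys.executable, "-c", "pass"], runs=3)
print(failure_rate, samples)
```
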
Parameters
System parameters
–CPU (speed of local and remote servers)
–Disk (bandwidth, fragmentation level)
–Network (type, bandwidth, reliability)
–Memory (size, caching effects, speed)
–Operating system (type, version, VM management, etc.)

Parameters (Continued)
Workload parameters
–Number of replicas
–Number of files and directories
–Number of conflicts and updates
–Size of volumes (file size)

Workloads
Update characteristics extracted from Geoff Kuenning’s traces
[Figure: taxonomy of file accesses]
File access
–Read-only access
–Read-write access
  –Nonshared access: read access, write access
  –Shared access
    –2-way sharing: read access, write access
    –3+-way sharing: read access, write access

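The access taxonomy can be turned into a classifier over trace records. The record format `(path, client, op)` and the rule that "sharing" counts the distinct clients touching a file are assumptions of this sketch; the slides do not describe the format of Kuenning's traces, and `classify` and `count_accesses` are hypothetical names.

```python
from collections import defaultdict

# Assumed trace record: (path, client, op) with op in {"read", "write"}.


def classify(trace):
    """Map each file to its leaf category in the access taxonomy."""
    clients = defaultdict(set)
    writes = defaultdict(int)
    for path, client, op in trace:
        clients[path].add(client)
        if op == "write":
            writes[path] += 1
    categories = {}
    for path, who in clients.items():
        if writes[path] == 0:
            categories[path] = "read-only"
        elif len(who) == 1:
            categories[path] = "read-write/nonshared"
        elif len(who) == 2:
            categories[path] = "read-write/shared/2-way"
        else:
            categories[path] = "read-write/shared/3+-way"
    return categories


def count_accesses(trace):
    """Count read and write accesses under each category."""
    categories = classify(trace)
    counts = defaultdict(lambda: {"read": 0, "write": 0})
    for path, _client, op in trace:
        counts[categories[path]][op] += 1
    return dict(counts)


trace = [("a", "m1", "read"), ("a", "m1", "read"),
         ("b", "m1", "write"), ("b", "m1", "read"),
         ("c", "m1", "write"), ("c", "m2", "read"),
         ("d", "m1", "read"), ("d", "m2", "write"), ("d", "m3", "read")]
print(count_accesses(trace))
```
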
Experimental Settings
Machine model: Dell Latitude XP
CPU: x MHz
RAM: 36 MB
Ethernet: 10 Mb/s
Operating system: Linux 2.0.x
File system: ext3

Experimental Settings (Continued)
Should have documented the following as well
–CPU: L1 and L2 cache sizes
–RAM: brand and type
–Disk: brand, model, capacity, RPM, and on-disk cache size
–File system version

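Part of the missing configuration can be captured automatically at the start of each run. This sketch uses only Python's standard `platform` and `os` modules plus, on Linux, the `cache size` line of `/proc/cpuinfo`, which x86 kernels print as a single figure rather than the L1/L2 breakdown the slide asks for. RAM brand, disk model, RPM, and on-disk cache size still have to be recorded by hand.

```python
import os
import platform


def cpu_cache_size():
    """Return the 'cache size' field of /proc/cpuinfo, or None if unavailable."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("cache size"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass  # not Linux, or /proc not mounted
    return None


def describe_machine():
    """Collect the settings that can be read programmatically."""
    uname = platform.uname()
    return {
        "os": uname.system,
        "os_release": uname.release,  # kernel version string
        "architecture": uname.machine,
        "processor": uname.processor or "unknown",
        "cpu_count": os.cpu_count(),
        "cpu_cache": cpu_cache_size(),
    }


print(describe_machine())
```
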
Experimental Design
–Full factorial design
–Linear regression or multivariate linear regression to model major factors
–Target: 95% confidence intervals

2^5 × 5 Full Factorial Design
Number of replicas: 2 and 6
Number of files: 10 and 1,000
File size: 100 and 22,000 bytes
Number of directories: 10 and 100
Number of updates: 10 and 450
–Capped at 10 updates for 10 files
Number of conflicts: 0 /* typical */

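The design points can be generated mechanically, which also makes the update cap explicit. In this minimal sketch the factor names are hypothetical labels for the levels listed above, and applying the cap as `min(updates, 10)` whenever there are 10 files is this sketch's reading of the slide.

```python
from itertools import product

# Two levels per factor, as listed on the slide (names are labels for this sketch).
LEVELS = {
    "replicas": (2, 6),
    "files": (10, 1000),
    "file_size_bytes": (100, 22000),
    "directories": (10, 100),
    "updates": (10, 450),
}


def design_points():
    """All 2^5 combinations of factor levels, with the update cap applied."""
    points = []
    for combo in product(*LEVELS.values()):
        point = dict(zip(LEVELS, combo))
        if point["files"] == 10:
            # Updates are capped at 10 when there are only 10 files.
            point["updates"] = min(point["updates"], 10)
        point["conflicts"] = 0  # held at the typical value
        points.append(point)
    return points


points = design_points()
print(len(points))  # 32
```

Because of the cap, the 10-file half of the design has only one distinct update level, so the updates factor is not a clean two-level factor in that corner of the design.
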
2^5 × 5 Full Factorial Analysis
Experimental errors < 3%

Variation of Effects
All major effects are significant at the 95% confidence level

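Effects, their shares of variation, and their confidence intervals come from the standard sign-table computation for a 2^k r design (k two-level factors, r repetitions per run). Below is a minimal sketch; the run ordering and the caller-supplied t quantile are choices of this sketch, not details of the original analysis. An effect q is significant at 95% when its interval excludes zero, and its share of variation is 2^k r q^2 / SST.

```python
from itertools import product
from math import prod, sqrt


def effects_2kr(k, responses, t_crit):
    """Sign-table analysis of a 2^k r design.

    responses[i] holds the r observations of run i; runs are ordered as
    itertools.product((-1, 1), repeat=k), so the first factor varies slowest.
    t_crit is the two-sided t quantile with 2^k (r - 1) degrees of freedom.
    Returns (per-effect results, fraction of variation due to error).
    """
    runs = list(product((-1, 1), repeat=k))
    n, r = len(runs), len(responses[0])
    means = [sum(obs) / r for obs in responses]
    grand = sum(means) / n

    sse = sum((y - m) ** 2 for obs, m in zip(responses, means) for y in obs)
    sst = sum((y - grand) ** 2 for obs in responses for y in obs)
    s_q = sqrt(sse / (n * (r - 1))) / sqrt(n * r)  # std. deviation of every effect

    results = {}
    for mask in product((0, 1), repeat=k):
        # Sign-table column: product of the signs of the factors in this term.
        signs = [prod(run[j] for j in range(k) if mask[j]) for run in runs]
        q = sum(s * m for s, m in zip(signs, means)) / n
        label = "".join("ABCDEFGHIJ"[j] for j in range(k) if mask[j]) or "I"
        results[label] = {
            "effect": q,
            # Share of total variation explained (undefined for the mean q0).
            "variation": None if label == "I" else n * r * q * q / sst,
            "ci": (q - t_crit * s_q, q + t_crit * s_q),
        }
    return results, sse / sst


# Toy 2^2 design with r = 2 (made-up numbers): y = 10 + 3 xA - 2 xB + xA xB, +/- 0.5.
responses = [[v - 0.5, v + 0.5] for v in (10, 4, 14, 12)]
effects, error_share = effects_2kr(2, responses, t_crit=2.776)  # t(0.975, 4 dof)
for name, e in effects.items():
    significant = not (e["ci"][0] <= 0 <= e["ci"][1])
    print(name, round(e["effect"], 3), e["variation"], significant)
```

With 5 repetitions per run, a 2^5 design has 32 × 4 = 128 error degrees of freedom, for which the 95% two-sided t quantile is about 1.98.
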
Residuals vs. Predicted Time
Clusters are caused by the dominating effect of the number of files

Residuals vs. Experiment Numbers
Residuals are almost homoscedastic

Quantile-Quantile Plot
Residuals are almost normally distributed

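A normal Q-Q plot pairs sorted residuals with standard-normal quantiles. This sketch uses `statistics.NormalDist` (Python 3.8+) with (i - 0.5)/n plotting positions instead of a table approximation; the correlation of the plotted points is an added one-number summary of straightness, not something the slides report.

```python
from statistics import NormalDist, fmean


def qq_points(residuals):
    """Pair each sorted residual with the matching standard-normal quantile."""
    n = len(residuals)
    observed = sorted(residuals)
    # Plotting positions (i - 0.5) / n for i = 1..n.
    expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))


def qq_correlation(points):
    """Pearson correlation of Q-Q points; near 1 means a nearly straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

A bimodal set of residuals, such as the elapsed-time residuals described later, bends the plot into steps and lowers this correlation noticeably.
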
Multivariate Regression
Number of replicas: 2
Number of files: 4 levels
File size: 22,000 bytes
Number of directories: 4 levels
Number of updates: 0
Number of conflicts: 0 /* typical */
Number of repetitions: 5 per data point

Multivariate Regression
Experimental errors < 7%
All coefficients are significant

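A multivariate fit of time against the number of files and directories can be done with ordinary least squares. This dependency-free sketch solves the normal equations (X^T X) b = X^T y; `solve` and `fit_ols` are hypothetical names, and the slides do not give the four levels used for files and directories, so the test data below is made up. The returned residuals are what residual plots like those on the next slides are drawn from.

```python
def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        x[row] = (m[row][n] - sum(m[row][c] * x[c] for c in range(row + 1, n))) / m[row][row]
    return x


def fit_ols(predictors, y):
    """Fit y = b0 + b1 x1 + ... + bp xp; predictors[i] = [x1, ..., xp].

    Returns (coefficients, R^2, residuals).
    """
    X = [[1.0] + list(row) for row in predictors]  # leading 1 for the intercept
    p = len(X[0])
    xtx = [[sum(xi[u] * xi[v] for xi in X) for v in range(p)] for u in range(p)]
    xty = [sum(xi[u] * yi for xi, yi in zip(X, y)) for u in range(p)]
    coef = solve(xtx, xty)
    predicted = [sum(c * xv for c, xv in zip(coef, xi)) for xi in X]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    mean_y = sum(y) / len(y)
    sse = sum(e * e for e in residuals)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1 - sse / sst, residuals
```
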
Residuals vs. Predicted Time
Elapsed time shows a bimodal trend
User time shows an exponential trend

Residuals vs. Experiment Numbers
Not so good for elapsed time and user time

Quantile-Quantile Plot
Residuals are not normally distributed for elapsed time and user time

Log Transform (User Time)
ANOVA tests failed miserably

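An exponential trend in user time suggests modeling ln(y) instead of y. A minimal sketch, assuming all times are positive: regress ln(y) on x, then back-transform to y = a e^(bx). As the slide notes, this did not rescue the ANOVA here; the residuals of the transformed model still have to be checked like any other.

```python
from math import exp, log


def simple_fit(xs, ys):
    """Least-squares line y = b0 + b1 x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def fit_exponential(xs, ys):
    """Fit y = a * exp(b x) by regressing ln(y) on x; every y must be > 0."""
    b0, b1 = simple_fit(xs, [log(y) for y in ys])
    return exp(b0), b1
```
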
Residual Analyses (User Time)
No indication that transforms can help…

Possible Explanations
i-node-related factors
–Number of files per directory block
–Crossing a block boundary may cause anomalies
Caching effects
–Reboot needed between experiments

Linear Regression
Number of files: 100, 150, 200, 250, 252, 253, 300, 350, 400, 450
–Tests for the boundary-crossing condition as the number of files exceeds one block
–Note that Rumor has hidden files
Number of repetitions: 5 per data point
Flush the cache (reboot) before each run

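The block-boundary hypothesis can be made concrete with directory-entry arithmetic. This sketch assumes the classic linear ext2/ext3 directory layout (an 8-byte entry header plus the name, padded to a multiple of 4 bytes, entries never spanning blocks, and "." and ".." stored in the first block) with uniform name lengths; hashed (htree) directories behave differently, and the deck does not give the block size or file names used in the experiments.

```python
def rec_len(name_len):
    """Bytes used by one directory entry: 8-byte header + name, padded to 4."""
    return (8 + name_len + 3) // 4 * 4


def first_block_capacity(block_size, name_len):
    """Entries with name_len-byte names that fit beside "." and ".." in block 0."""
    return (block_size - rec_len(1) - rec_len(2)) // rec_len(name_len)


def blocks_needed(n_entries, block_size, name_len):
    """Directory blocks needed for n_entries names (linear layout)."""
    first = first_block_capacity(block_size, name_len)
    if n_entries <= first:
        return 1
    per_block = block_size // rec_len(name_len)
    extra = n_entries - first
    return 1 + (extra + per_block - 1) // per_block  # ceiling division


print(first_block_capacity(4096, 5))  # 254
print(blocks_needed(255, 4096, 5))    # 2
```

With 4096-byte blocks and 5-character names, 254 entries fit in the first block, and any hidden files count against that budget too. The numbers are illustrative and are not claimed to explain the 252/253 test points.
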
Linear Regression
R^2 > 80%
All coefficients are significant

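The R^2 and coefficient-significance claims follow from simple linear regression. A minimal sketch of the textbook formulas: SSE, SST, and the standard errors of b0 and b1 with n - 2 degrees of freedom. The t quantile is supplied by the caller, and the function names are hypothetical.

```python
from math import sqrt


def linear_regression(xs, ys):
    """Simple linear regression y = b0 + b1 x with R^2 and standard errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    s_e = sqrt(sse / (n - 2))  # standard deviation of errors, n - 2 dof
    return {
        "b0": b0,
        "b1": b1,
        "r2": 1 - sse / sst,
        "s_b0": s_e * sqrt(1 / n + mx * mx / sxx),
        "s_b1": s_e / sqrt(sxx),
    }


def significant(coef, std_err, t_crit):
    """True if the interval coef +/- t_crit * std_err excludes zero."""
    return abs(coef) > t_crit * std_err


fit = linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(fit)
print(significant(fit["b1"], fit["s_b1"], 3.182))  # t(0.975, 3 dof)
```

In this made-up example the slope is significant while the intercept is not, which is why the slide's claim that all coefficients are significant is worth checking coefficient by coefficient.
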
Residuals vs. Predicted Time
Elapsed time shows a bimodal trend
User time shows an exponential trend

Residuals vs. Experiment Numbers
Elapsed time shows a rising bimodal trend
–Randomizing the order of experiments may help

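Randomizing the run order means generating every (design point, repetition) pair and shuffling the list, so that drift over time is spread across factor levels instead of lining up with the number of files. A minimal sketch; the function name is hypothetical, and the seed exists only so the order can be recorded and replayed when plotting residuals against experiment number.

```python
import random


def randomized_schedule(design_points, repetitions, seed=None):
    """Return every (point, repetition) pair in a random order."""
    runs = [(point, rep) for point in design_points for rep in range(repetitions)]
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    rng.shuffle(runs)
    return runs


file_counts = [100, 150, 200, 250, 252, 253, 300, 350, 400, 450]
schedule = randomized_schedule(file_counts, repetitions=5, seed=1)
print(schedule[:5])
```
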
Quantile-Quantile Plot
Error residuals for elapsed time are not normal
–Perhaps piecewise normal

Possible Explanations
i-node-related factors: No
Caching effects: No
Hidden factors: Maybe
Bugs: Maybe

Conclusion
Identified the number of files as the dominating factor in Rumor's running time
Observed the existence of an unknown factor in the Rumor performance model
