Dependability where the Mobile World Meets the Enterprise World a

1 Dependability where the Mobile World Meets the Enterprise World a
Amiya Kumar Maji
Advisor: Prof. Saurabh Bagchi
Feb 27, 2015
School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana

2 Introduction Large Scale Internet Mobility
End-to-end services need dependability of both components

3 Summary of Contributions
Dependability of Smartphones
  Study of failures in Android and Symbian. Analyzed location of failure manifestation, bug fixes, and customizability-related failures. (ISSRE 2010)
  Evaluation of robustness of Android ICC. Designed and implemented our testing tool JarJarBinks, evaluated and analyzed crashes, and made suggestions for improving robustness. (DSN 2012)
Dependability of Cloud Applications
  Evaluated impact of performance interference in public (Amazon EC2) and private clouds. Mitigated performance interference by intelligent application reconfiguration. (Middleware 2014)
  Mitigated interference by two-level reconfiguration of web server clusters. Improves on the previous work by making the controller more agile and effective. (ICAC 2015, submitted)

4 Publications
A. K. Maji, K. Hao, S. Sultana, S. Bagchi. “Characterizing Failures in Mobile OSes: A Case Study with Android and Symbian,” in 21st International Symposium on Software Reliability Engineering, ISSRE 2010, November 1-4, 2010, San Jose, California.
A. K. Maji, F. A. Arshad, S. Bagchi, J. S. Rellermeyer. “An Empirical Study of the Robustness of Inter-component Communication in Android,” in 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012, June 25-28, 2012, Boston, MA.
A. K. Maji, S. Mitra, B. Zhou, S. Bagchi, A. Verma. “Mitigating Interference in Cloud Services by Middleware Reconfiguration,” in 15th International Middleware Conference, MIDDLEWARE 2014, December 8-12, 2014, Bordeaux, France.
A. K. Maji, S. Mitra, S. Bagchi. “ICE: An Integrated Configuration Engine for Interference Mitigation in Cloud Services,” in 12th International Conference on Autonomic Computing, ICAC 2015, July 7-10, 2015, Grenoble, France. (Under review)
Provisional application for patent

5 Agenda
Introduction
Contributions
Prelim Review: Dependability of Smartphones
  Study of failures in Android and Symbian
  Robustness testing of Android ICC
Dependability of Cloud Applications
  IC2: Mitigating interference by middleware reconfiguration
  ICE: Two-level configuration engine for WS clusters
Directions for Future Research
Summary

6 Part I (Prelim Review) Dependability of Smartphones

7 Study of Failures in Android and Symbian†
Analyzed 628 bugs in Android and 153 bugs in Symbian (Oct 2008-Nov 2009)
  Most bugs (> 90%) are permanent in nature.
  The majority of bugs are in the Android middleware, with fewer bugs in the kernel layer.
  Both platforms had a significant number of bugs in the Dev Tools, Web, Multimedia, and Build segments.
Analyzed 233 bug fixes in Android
  Presented a categorization of bug fixes.
  Only 22% of fixes required major code changes (> 10 lines).
  Customizability in Android has its cost (more bugs).
Bug density of Android (< 2.5 × 10^-4) is lower than that reported for Windows (2.66 × 10^-3) [Alhazmi 2007]
Question from Preliminary Exam:
  Q. How does Android compare against Linux in terms of bug density?
  A. Linux has a bug density of 1 × 10^-4 /LOC in Kernel version [Palix et al., “Faults in Linux: Ten Years Later,” ASPLOS 2011]
† Collaborators: Kangli Hao, Salmin Sultana
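
To put the three bug densities quoted on this slide side by side, the ratios can be worked out directly. This is only an arithmetic sketch: it assumes all three figures are in bugs per line of code (the slide states "/LOC" only for Linux), and the Android figure is an upper bound, so the ratios are bounds as well.

```python
# Densities quoted on the slide. The per-LOC unit is an assumption for
# Android and Windows; the slide states "/LOC" only for Linux.
ANDROID_UPPER_BOUND = 2.5e-4   # slide gives "< 2.5*10^-4", so an upper bound
WINDOWS = 2.66e-3              # [Alhazmi 2007]
LINUX = 1e-4                   # [Palix et al., ASPLOS 2011]


def ratio(a, b):
    """Return how many times larger density a is than density b."""
    return a / b


windows_vs_android = ratio(WINDOWS, ANDROID_UPPER_BOUND)
android_vs_linux = ratio(ANDROID_UPPER_BOUND, LINUX)

print(f"Windows / Android bound: {windows_vs_android:.2f}x")
print(f"Android bound / Linux:   {android_vs_linux:.2f}x")
```

Because the Android value is an upper bound, the first ratio is a lower bound on the gap to Windows and the second is an upper bound on the gap to Linux.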

8 Robustness Evaluation of Android ICC †
Presented JarJarBinks (JJB), a tool for evaluating ICC robustness in Android.
JJB tests the Intent handling capabilities of Android components by sending a large number of semi-valid Intents (explicit or implicit).
More than 6 million Intents were sent to 800+ Android components over a week.
We found that ~10% of Activities crashed with semi-valid Intents.
All crashes manifest as Exceptions in the runtime system.
NPE (NullPointerException) was the most prevalent in both Android 2.2 and 4.0.
Exception handling improved from 2.2 to 4.0 but is still a major concern.
Similar results with implicit Intents.
Components often crashed with valid Intents (since extra data is not captured in the Intent-filter definition).
† Collaborators: Fahad Arshad (Purdue), Jan S. Rellermeyer (IBM Research, Austin)
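
The idea of a semi-valid Intent can be illustrated with a small generator. This is a sketch under assumptions, not JarJarBinks itself: it reads "semi-valid" as "each field is individually well formed, but the action/data combination may be one the receiver does not expect", models Intents as plain dictionaries rather than Android objects, and uses a hypothetical component name.

```python
from itertools import product

# Well-formed values taken individually; pairing them freely yields
# combinations a receiver may not anticipate (e.g. DIAL with an http URI).
ACTIONS = [
    "android.intent.action.VIEW",
    "android.intent.action.DIAL",
    "android.intent.action.EDIT",
]
DATA_URIS = [
    "tel:123",
    "http://example.com",
    "content://contacts/people/1",
]


def semi_valid_intents(component, actions=ACTIONS, data_uris=DATA_URIS):
    """Yield one Intent-like dict per (action, data) pair for an explicit target."""
    for action, data in product(actions, data_uris):
        yield {"component": component, "action": action, "data": data}


# Hypothetical target component, for illustration only.
intents = list(semi_valid_intents("com.example.app/.MainActivity"))
print(len(intents), "intents generated; first:", intents[0])
```

A real campaign would deliver each generated Intent to the target component and record any exception raised; that delivery step depends on the Android runtime and is left out here.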

9 System Crash from User-level Application
3 Activities crashed Android-Runtime

10 Recommendation for Improving ICC Robustness
A. Intent Sub-typing
B. Checking input constraints
  Static (Java Annotations)
  Dynamic (Runtime)
C. Full input-validation
  Use domain specific languages (e.g. WSDL)
  Make Intent/Intent-filter descriptions more expressive

    class CallIntent extends Intent {
        String action = "ACTION_DIAL";
        telUri data;
        ComponentName cmp;
        getAction() { … };
        setData() { … };
        getData() { … };
        …
    }

Intents are effectively untyped; their application-level type is only determined by a String identifier and is not reflected by the Java type system.
There is no contract between receiver and sender, which requires the receiver to handle all input validation.
One way to handle this is a more explicit Intent format (Intent sub-typing) as opposed to the current flat type. Data can be represented by fields of the sub-type with getters and setters, thus allowing the Java compiler to do type checking at compile time.
Considering a handset where we have 200 Intent types, this implies an 80KB additional footprint for turning all these Intents into sub-types with 6 fields.
Static: Message format constraints when choosing sub-typing can be realized by using Java Annotations for compile-time checks.
Dynamic: Can be implemented at the Intent delivery mechanism or at the receivers.
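
The dynamic (runtime) check in recommendation B can be sketched independently of Android. The example below is illustrative only: Intents are modeled as dictionaries, and the schema format, the `validate_intent` helper, and the `number` extra are hypothetical names introduced here, not part of the Android API or of the original work.

```python
def validate_intent(intent, schema):
    """Return a list of constraint violations; an empty list means 'deliver'."""
    errors = []
    if intent.get("action") != schema["action"]:
        errors.append("unexpected action")
    # A missing or None extras bundle is treated as empty instead of being
    # dereferenced, which is the kind of case that surfaces as an NPE.
    extras = intent.get("extras") or {}
    for name, expected_type in schema["extras"].items():
        if name not in extras:
            errors.append(f"missing extra: {name}")
        elif not isinstance(extras[name], expected_type):
            errors.append(f"wrong type for extra: {name}")
    return errors


# Hypothetical declaration of what a dialer component expects.
DIAL_SCHEMA = {"action": "ACTION_DIAL", "extras": {"number": str}}

ok = validate_intent(
    {"action": "ACTION_DIAL", "extras": {"number": "555-0100"}}, DIAL_SCHEMA
)
bad = validate_intent({"action": "ACTION_DIAL", "extras": None}, DIAL_SCHEMA)
print(ok, bad)
```

Placing such a check in the delivery mechanism would reject malformed Intents before they reach a component, whereas placing it in each receiver leaves the decision to the application; the slide lists both options.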

12 Part II Dependability of Cloud Applications†
† Collaborators: Subrata Mitra, Bowen Zhou (Purdue), Akshat Verma (IBM Research, Delhi)

13 Running Web Applications in the Cloud
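[Figure: Host1 and Host2, each with a hypervisor running VMs (VM1, VM2, … VMn, VMm) that host web servers (WS1, WS2), application servers (App1 … Appm), and databases (DB1, DB2), connected to shared storage and network.]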

14 Imperfect Performance Isolation due to Shared Hardware Resources
Multi-core cache sharing: processors P1 and P2 each have their own L1 cache and share the L2 (last level) cache.
Other shared resources
  Memory bandwidth
  Network/IO
  Translation Lookaside Buffer (TLB)

15 Mitigating Performance Interference in Clouds
Performance of one VM suffers due to the activity of another co-located VM
Why does it happen?
  Low level hardware resources are not partitioned well
  Contention for cache, memory bandwidth, and network can degrade performance
Our experiments with Amazon EC2
  Performance of web servers can suffer drastically during interference
  Cloudsuite application benchmark
  m1.large VM instances (2 cores, 7.5GB)
  Run for 100 hours
[Figure: response time plots for EC2 and Private Cloud; annotations: Tail ~ 55 × median, Tail ~ 4 × median]

16 Remediation Techniques
Traditional techniques for remediation
  Better VM placement [Paragon ASPLOS2013]
  Hypervisor scheduling [QCloud Eurosys 2010]
  Dynamic live migration [Deepdive ATC2013]
  These require changes in the hypervisor, which is not feasible in a public cloud
Our approach
  Requirements
    Need user level control
    Fast response during interference
  Key idea: Reconfigure the application to handle a change in operating environment (interference)
  IC2: Interference-aware Cloud application Configuration

17 Solution Overview

18 IC2: Agenda
Performance Interference in Cloud
Our approach
Solution Overview
Interference vs. Middleware Parameters
Interference Detection
Configuration Controller
IC2 in Operation
Key Results

19 Interference vs. Middleware Parameters
Setup
  Servers: PowerEdge T320, Xeon E processor, 6(12) cores, 16GB memory
  Application: Cloudsuite (Olio, social media calendar)
  Middleware: Apache + Php-fpm
  Topology: Server 1 (KVM) hosts the web server and the interference VM; Server 2 hosts the database; Server 3 runs the clients

20 Interference vs. Middleware Parameters
Setup
  Middleware parameters
    Thread-pool parameters
      Apache: MaxClients
      Php-fpm: pm.max_children (PhpMaxChildren)
    Timeout parameter
      Apache: KeepaliveTimeout
  Interference
    Dcopy from BLAS (cache r+w)
    LLCProbe from Ristenpart CCS’12 (cache r)
    Varying sizes of Dcopy to create different levels of contention

21 Choice of Optimal Apache Parameters
Optimal MXC (MaxClients) changes with interference
Optimal KAT (KeepaliveTimeout) changes with interference
Both depend on the degree of interference
Need dynamic reconfiguration

22 Parameter Dependency
Parameter dependency changes with interference
  KAT = MXC / (#new_connections/sec) is no longer valid during interference
  With interference, we need a smaller MXC and a larger KAT
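
A worked instance of the rule of thumb on this slide shows why it stops holding under interference. The sketch below assumes KAT is in seconds and uses a hypothetical arrival rate of 100 new connections per second; MXC = 1100 is the value that appears elsewhere in the deck.

```python
def keepalive_timeout(max_clients, new_connections_per_sec):
    """Rule of thumb from the slide: KAT = MXC / (#new_connections/sec)."""
    if new_connections_per_sec <= 0:
        raise ValueError("arrival rate must be positive")
    return max_clients / new_connections_per_sec


# Hypothetical arrival rate; MXC values chosen for illustration.
RATE = 100.0
kat_default = keepalive_timeout(1100, RATE)   # 11.0 seconds
kat_smaller_mxc = keepalive_timeout(800, RATE)  # 8.0 seconds

print(kat_default, kat_smaller_mxc)
```

At a fixed arrival rate the formula ties a smaller MXC to a smaller KAT, while the slide reports that interference calls for a smaller MXC together with a larger KAT, so the two parameters can no longer be derived from each other this way.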

23 Optimal configuration values with interference
Observations
  Optimal configuration values with interference
    Optimal MXC decreases; KAT and PHP increase
  Server capacity with interference
    CPU saturates sooner with interference
    Lots of cache misses. CPI increases.

Idle CPU with different interferences (MXC=1100)
  No-Intf        17%
  Dcopy-15MB      7%
  Dcopy-1.5GB     1%

25 Solution Overview
Questions that we answer
  How to detect interference?
  Which parameters to reconfigure during interference?
  How to determine new parameter values?

26 Interference Detection
IC2 workflow: Interference Detection
  Use Decision Tree classifier
  In EC2, use system and application metrics to detect interference
  Load per operation (LPO) is a key indicator
  Challenge: Capture metrics variations with configuration changes
  More details on Decision Tree in paper

27 State Manager
In EC2, use buffer states to deal with transient interference/noisy data
Reconfigure only after two successive periods under interference
Also masks classifier errors
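
The "two successive periods" rule can be sketched as a small debouncing state machine. This is a minimal illustration, not the IC2 implementation: the class and method names are hypothetical, and resetting on a single interference-free period is an assumption made here for simplicity (the actual system may buffer the recovery path as well).

```python
class StateManager:
    """Fire a reconfiguration only after N successive periods under interference."""

    def __init__(self, required_periods=2):
        self.required_periods = required_periods
        self.streak = 0            # consecutive periods flagged as interference
        self.reconfigured = False  # whether we already fired for this episode

    def observe(self, interference_detected):
        """Feed one detection period; return True when reconfiguration should fire."""
        if interference_detected:
            self.streak += 1
        else:
            # Assumption: one clean period ends the episode.
            self.streak = 0
            self.reconfigured = False
        if self.streak >= self.required_periods and not self.reconfigured:
            self.reconfigured = True
            return True
        return False


manager = StateManager()
# One isolated detection (a transient or a classifier error) is ignored;
# the second of two successive detections triggers reconfiguration once.
decisions = [manager.observe(x) for x in [True, False, True, True, True, False]]
print(decisions)
```

The isolated first detection never triggers a reconfiguration, which is how a buffer state can mask both transient interference and occasional classifier errors.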

28 Configuration Controller
Choice of parameter driven by knowledge base
  Created from empirical results shown earlier
  Can be created by expert administrators
Our heuristic
  Decrease MXC based on proportional increase in LPO
  Increase KAT based on proportional increase in response time
  For PHP use two constant values (no-interference, interference)
Implementation
  Modified Apache to handle graceful parameter update
  Called httpd-online

A well-designed domain specific language can express any type of constraint and therefore permit full input validation including version checks. External DSLs are free-standing and independent of the host language. IDLs, for instance, are external DSLs. As a result, however, code written in the host language and the meta-data written in the DSL have to be developed independently and cannot easily be cross-validated by existing tools. Internal or embedded DSLs are themselves implemented in the host language and therefore agree much better with existing tools. They are, however, restricted to what the host language can express.
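
The controller heuristic listed on this slide (shrink MXC as LPO grows, stretch KAT as response time grows, switch PHP between two constants) can be sketched as follows. The exact scaling is an assumption: "proportional" is read here as dividing MXC by the LPO ratio and multiplying KAT by the response-time ratio. The function name, the dictionary layout, and the two PHP constants are hypothetical.

```python
def reconfigure(config, baseline, current, php_values=(16, 32)):
    """Return new parameter values given baseline and current measurements.

    php_values is (no-interference, interference); both numbers are
    placeholders, not values from the original work.
    """
    # Ratios are clamped at 1.0 so the sketch never loosens settings
    # when the measurements are better than the baseline.
    lpo_ratio = max(1.0, current["lpo"] / baseline["lpo"])
    rt_ratio = max(1.0, current["response_time"] / baseline["response_time"])
    interfering = lpo_ratio > 1.0
    return {
        "MaxClients": max(1, round(config["MaxClients"] / lpo_ratio)),
        "KeepAliveTimeout": max(1, round(config["KeepAliveTimeout"] * rt_ratio)),
        "pm.max_children": php_values[1] if interfering else php_values[0],
    }


current_config = {"MaxClients": 1100, "KeepAliveTimeout": 5}
baseline = {"lpo": 1.0, "response_time": 100.0}
under_interference = {"lpo": 2.0, "response_time": 160.0}

new_config = reconfigure(current_config, baseline, under_interference)
print(new_config)
```

With LPO doubled and response time up by 60%, the sketch halves MaxClients and lengthens KeepAliveTimeout, matching the direction reported earlier in the deck (smaller MXC, larger KAT and PHP under interference).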

29 Agenda: IC2
Performance Interference in Cloud
Our approach
Interference vs. Middleware Parameters
Solution Overview
Interference Detection
Configuration Controller
IC2 in Operation
Key Results
Conclusion

30 IC2 in Operation
Setup
  EC2 m1.large VMs
  Web server co-located with interference VM
  Periodic interference of varying intensity and type (LLCProbe, Dcopy)
  Private testbed VMs configured to match EC2 specifications
Metrics to consider
  Improvement in response time during interference
  Detection latency
  Detection accuracy

31 IC2 Improves Response Time
Httpd-online reduces overhead
  Effects of interference last longer in EC2
  The default Apache distribution has a high overhead of reconfiguration
  Httpd-online solves this
[Figure annotation: New values]

32 Results
IC2 improved response time by up to 40% in the private testbed and up to 29% in EC2 during interference
Median interference detection latency
  15 sec in private testbed
  20 sec in EC2 testbed
Classifier accuracy
  Interference detection showed 89% recall and 73% precision
  The majority of misclassifications are due to Interference or No-interference being detected as Transient
  Our labeling does not account for ambient interference

33 Summary: IC2
Interference causes severe performance degradation in the cloud
Optimal application configurations change during interference
Web services can mitigate the effects of interference by reconfiguration
We presented the design and implementation of IC2, which reconfigures web servers during interference
Our evaluations showed up to 40% reduction of response time in the private testbed and up to 29% reduction in EC2

35 ICE: An Integrated Configuration Engine for Interference Mitigation
Motivation
  IC2 improves response time by configuring WS parameters
  WS reconfiguration is costly and limited
  Use residual capacity in a WS cluster efficiently
Objectives
  Make reconfiguration (interference mitigation) faster
  Make existing load-balancers interference-aware
  Get better response time during interference (than IC2)
We use HAProxy as our baseline load-balancer

36 ICE Overview
Two-level reconfiguration
  1. Update load balancer weight
     Less overhead. More agile.
  2. Update middleware parameters
     Only for long interferences. Reduces overhead of idle threads.
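
The two levels described on this slide can be sketched as a simple decision rule: the cheap load-balancer update is applied first, and the costlier middleware update only when interference persists. The function name, the action labels, and the 30-second cutoff for a "long" interference are all hypothetical; the slide does not give a threshold.

```python
def plan_actions(interference_duration_s, long_threshold_s=30.0):
    """Return the reconfiguration steps to apply for an ongoing interference.

    long_threshold_s is a placeholder; the source does not state how long an
    interference must last before middleware parameters are updated.
    """
    actions = []
    if interference_duration_s > 0:
        # Level 1: cheap and agile, applied as soon as interference is seen.
        actions.append("update_load_balancer_weight")
        if interference_duration_s >= long_threshold_s:
            # Level 2: costlier, reserved for long-lasting interference.
            actions.append("update_middleware_parameters")
    return actions


print(plan_actions(0))
print(plan_actions(5))
print(plan_actions(120))
```

Ordering the levels this way keeps short interference episodes from paying the middleware reconfiguration cost at all.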

37 ICE Design
We use hardware counters for interference detection
  Faster detection
  Hypervisor access not required if counters are virtualized

38 ICE: Load Balancer Reconfiguration
Objective: Keep the WS VM’s CPU utilization below a threshold Uthres
If predicted CPU utilization is above the threshold, find a new request rate such that it goes below the threshold
Request rate (RPS) determines the server weight value in the load balancer configuration
Use the following empirical function for load estimation
[Equation lost in extraction; labeled terms: Predicted Util, Past Util, CPI, RPS; annotation: Indicator of Interference]
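
The search for a request rate that keeps predicted utilization under Uthres can be sketched with a bisection. The empirical load function itself is not recoverable from this transcript, so the predictor is passed in as a parameter; `example_predictor` below is a made-up linear form used only to exercise the code, and it is assumed to be non-decreasing in RPS. The mapping from RPS to a weight is also an assumption, scaled to HAProxy's 0 to 256 weight range.

```python
def max_safe_rps(predict_util, past_util, cpi, current_rps, u_thres, tol=0.5):
    """Largest rate in [0, current_rps] whose predicted utilization is <= u_thres.

    Assumes predict_util is non-decreasing in rps. Returns 0.0 if even a
    near-zero rate is predicted to exceed the threshold.
    """
    if predict_util(past_util, cpi, current_rps) <= u_thres:
        return float(current_rps)
    lo, hi = 0.0, float(current_rps)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predict_util(past_util, cpi, mid) <= u_thres:
            lo = mid
        else:
            hi = mid
    return lo


def example_predictor(past_util, cpi, rps):
    """Hypothetical stand-in for the lost empirical function."""
    return 0.2 * past_util + 0.1 * cpi * rps


def server_weight(safe_rps, nominal_rps, max_weight=256):
    """Scale a weight by the share of nominal load the server can still take."""
    return max(0, min(max_weight, round(max_weight * safe_rps / nominal_rps)))


safe = max_safe_rps(example_predictor, past_util=50.0, cpi=2.0,
                    current_rps=500, u_thres=70.0)
weight = server_weight(safe, nominal_rps=500)
print(safe, weight)
```

With the made-up predictor, utilization reaches the threshold at 300 requests per second, so the bisection settles just below that value and the weight drops to roughly 60% of the maximum.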

39 Evaluation
Experimental Setup
  Cloudsuite benchmark with different interferences
  We look at ICE with two different load balancer scheduling policies
    Weighted Round Robin (WRR or simply RR): shows performance of a static configuration
    Weighted Least Connection (WLC or simply LC): shows performance of an out-of-box dynamic load balancer

40 Response Time
[Figure: response time plots for Round Robin (RR) and Least Connection (LC); axis marks at 200ms and 400ms]
ICE improves response time in both RR and LC
LC (out-of-box) reduces the effect of interference significantly, but occasional spikes remain
ICE reduces the frequency of these spikes

41 Results
ICE improves median response time by up to 94% compared to a static configuration (RR)
ICE improves median response time by up to 39% compared to a dynamic load balancer (LC)
Median interference detection latency is 3 sec using ICE (15-20 sec for IC2)

42 ICE: Summary
The effect of interference can be mitigated by reducing load on the affected VM
We presented ICE for two-level configuration in WS clusters
ICE improves median RT by up to 94% compared to a static configuration and up to 39% compared to a dynamic out-of-box load balancer
Median interference detection latency is 3 sec

44 Directions for Future Research
Reliability with software evolution in Android
Enhance JJB by instrumenting ActivityManager
IC2: Automated generation of KB
  How to find which parameters to reconfigure in unknown applications?
ICE: Handling long-lasting sessions
  Move some sessions to other servers during interference

45 Summary of Contributions
Presented failure characterization of Android and Symbian
Robustness testing of Android ICC
  Designed and implemented JarJarBinks
  Analysis of crashes
  Suggestions for robust ICC
Mitigating interference in clouds
  Presented two solutions for handling interference without hypervisor modification
  IC2: mitigates interference by middleware reconfiguration
  ICE: mitigates interference by load-balancer and WS reconfiguration

46 Publications
A. K. Maji, K. Hao, S. Sultana, S. Bagchi. “Characterizing Failures in Mobile OSes: A Case Study with Android and Symbian,” in 21st International Symposium on Software Reliability Engineering, ISSRE 2010, November 1-4, 2010, San Jose, California. [*49]
A. K. Maji, F. A. Arshad, S. Bagchi, J. S. Rellermeyer. “An Empirical Study of the Robustness of Inter-component Communication in Android,” in 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012, June 25-28, 2012, Boston, MA. [*23]
A. K. Maji, S. Mitra, B. Zhou, S. Bagchi, A. Verma. “Mitigating Interference in Cloud Services by Middleware Reconfiguration,” in 15th International Middleware Conference, MIDDLEWARE 2014, December 8-12, 2014, Bordeaux, France.
A. K. Maji, S. Mitra, S. Bagchi. “ICE: An Integrated Configuration Engine for Interference Mitigation in Cloud Services,” in 12th International Conference on Autonomic Computing, ICAC 2015, July 7-10, 2015, Grenoble, France. (Under review)
[*] is Google Scholar Citations

47 Questions

48 Acknowledgements
Prof. Saurabh Bagchi
Committee members
Collaborators:
  Akshat Verma (IBM Research, MakeMyTrip)
  Jan S. Rellermeyer (IBM Research)
  Subrata Mitra (Purdue University)
  Fahad Arshad (Purdue University)
  Bowen Zhou (Purdue University)
  Kangli Hao (Purdue University, Samsung)
  Salmin Sultana (Purdue University, Intel Research)

49 Thank You!

50 Backup Slides

