Hotspot Detection in a Service Oriented Architecture
Pranay Anchuri, anchupa@cs.rpi.edu, http://cs.rpi.edu/~anchupa, Rensselaer Polytechnic Institute, Troy, NY
Roshan Sumbaly, roshan@coursera.org, Coursera, Mountain View, CA
Sam Shah, samshah@linkedin.com, LinkedIn, Mountain View, CA
Introduction
Largest professional network: 300M members from 200 countries; 2 new members per second.
Service Oriented Architecture
What is a Hotspot?
Hotspot: a service responsible for suboptimal performance of a user-facing functionality.
Performance measures:
● Latency
● Cost to serve
● Error rate
Who uses hotspot detection?
● Engineering teams: minimize latency for the user; increase the throughput of the servers.
● Operations teams: reduce the cost of serving user requests.
Goal
Data: Service Call Graphs
Service call metrics are logged into a central system.
The call graph structure is re-constructed from a random trace id.
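As a rough illustration of this step (not the paper's pipeline), reconstructing one call graph per request from logged records that share a trace id might look like the sketch below; the record fields trace_id and parent_id are assumptions, not the actual log schema.

```python
# A minimal sketch, under assumed field names, of re-constructing call graphs
# from logged service-call records grouped by a random trace id.
from collections import defaultdict

def build_call_graphs(records):
    """Group logged calls by trace id and link each call to its parent."""
    traces = defaultdict(list)
    for rec in records:
        traces[rec["trace_id"]].append(rec)

    graphs = {}
    for trace_id, calls in traces.items():
        children = defaultdict(list)   # parent call id -> child call records
        root = None
        for call in calls:
            if call["parent_id"] is None:
                root = call            # the user-facing (frontend) call
            else:
                children[call["parent_id"]].append(call)
        graphs[trace_id] = (root, children)
    return graphs
```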
Example of Service Call Graph
[Figure: a "Read profile" frontend request calling Content Service, Context Service, Content Service, Entitlements, and Visibility, with call annotations 3, 7, 12, 10, 11.]
Challenges in mining hotspots
Structure of call graphs
The structure of call graphs changes rapidly across requests: it depends on the member's attributes, A/B testing, and changes to the code base.
Over 90% of structures are unique for the most requested services.
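One way such a figure could be measured (a sketch, not taken from the paper) is to canonicalize each call graph's structure and count distinct shapes; the nested (service, children) representation below is an assumption made for illustration.

```python
# A minimal sketch: canonicalize call-graph structure and measure how often
# the same shape repeats across requests of one type.
def structure_key(service, children):
    """Recursively build a hashable canonical form of a call (sub)graph."""
    return (service, tuple(sorted(structure_key(c_service, c_children)
                                  for c_service, c_children in children)))

def fraction_unique(graphs):
    """graphs: list of (root_service, children) pairs in the nested form above."""
    keys = [structure_key(root, children) for root, children in graphs]
    return len(set(keys)) / len(keys) if keys else 0.0
```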
Asynchronous service calls
Calls A→B and A→C are:
● Serial: C is called after B returns to A.
● Parallel: B and C are called at the same time or within a brief time span.
Parallel service calls are particularly difficult to handle; the degree of parallelism is ~20 for some services.
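For intuition, classifying two sibling calls as serial or parallel from assumed [start, end] timestamps might look like this; the overlap test and the epsilon used for "brief time span" are illustrative assumptions, not the paper's exact rule.

```python
# A minimal sketch: label two calls issued by the same parent as parallel if
# their execution windows overlap or they start almost simultaneously.
def is_parallel(call_b, call_c, eps=0.01):
    b_start, b_end = call_b
    c_start, c_end = call_c
    overlap = b_start < c_end and c_start < b_end
    near_simultaneous = abs(b_start - c_start) <= eps
    return overlap or near_simultaneous

# Example: B = (1.0, 2.0) and C = (2.5, 3.0) are serial;
#          B = (1.0, 2.0) and C = (1.2, 1.8) are parallel.
```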
Related Work
● Hu et al. [SIGCOMM 04, INFOCOM 05]: tools to detect bottlenecks along network paths.
● Mann et al. [USENIX 11]: models to estimate latency as a function of RPC latencies.
Why don't existing methods work?
● The metric cannot be controlled as in bottleneck detection algorithms.
● Millions of small networks must be analyzed.
● Parallel service calls.
Our approach
Optimize and summarize approach
● Given call graphs
● Hotspots in each call graph
● Ranking hotspots
What are the top-k hotspots in a call graph?
Hotspots in a specific call graph, irrespective of other call graphs for the same type of request.
Key Idea
What are the k services that, had they already been optimized, would have led to the maximum reduction in the latency of the request? (Specific to a particular call graph.)
Quantifying impact of a service
What if a service were optimized by a factor θ? (Think after the fact.)
● Its internal computations are θ times faster.
● There is no effect on the overall latency if its parent is waiting on another service call to return.
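A minimal sketch of this local effect, assuming per-call [start, end] timestamps and non-overlapping child intervals: only the service's internal computation (its own duration minus time spent in child calls) shrinks by θ, while the time spent waiting on children is unchanged. This is an illustration, not the paper's exact model.

```python
# Duration of a service call if its internal computation were theta times faster.
def optimized_duration(start, end, child_intervals, theta):
    total = end - start
    # Time covered by child calls (assumed non-overlapping for simplicity).
    child_time = sum(c_end - c_start for c_start, c_end in child_intervals)
    self_time = max(total - child_time, 0.0)
    return child_time + self_time / theta

# Example: a call spanning [4, 11] with one child on [6, 9] has 4 units of
# self time; with theta = 2 its duration drops from 7 to 3 + 4/2 = 5.
```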
Example
[Figure: a call graph with service execution intervals [0,11], [0,3], [1,2], [1.3,1.6], [2.1,2.5], [4,11], [6,9], [7,8]; one service is made 2x faster and the effect of the 2x speedup is shown.]
Local effect of optimization
Negative example
[Figure: the same call graph with intervals [0,11], [0,3], [1,2], [1.3,1.6], [2.1,2.5], [4,11], [6,9], [7,8], where optimizing a service has no effect on the overall latency.]
Effect propagation (chain A → B → C)
Optimizing C reduces the run time of C; C returns to B earlier; B might return earlier; A might return earlier…
Propagation Assumption
A service propagates its effects to its parent only if doing so does not change the order of service calls made by the parent.
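A hedged sketch of one way to encode this assumption: shift every call the parent issued after the optimized child earlier by the saving, and propagate the saving only if the parent's calls keep their original start-time order. The specific test below is an illustrative choice, not the paper's exact procedure.

```python
# Saving propagated from an optimized child to its parent under the
# order-preserving assumption.
def propagated_saving(parent_calls, optimized_idx, delta):
    """parent_calls: list of (start, end) in the order the parent issued them."""
    original_order = list(range(len(parent_calls)))
    opt_end = parent_calls[optimized_idx][1]
    shifted = []
    for s, e in parent_calls:
        if s >= opt_end:                      # issued after the optimized child returned
            shifted.append((s - delta, e - delta))
        else:
            shifted.append((s, e))
    new_order = sorted(original_order, key=lambda i: shifted[i][0])
    return delta if new_order == original_order else 0.0
```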
Example
[Figure: the example call graph before and after optimization.]
Under the propagation assumption
Relaxation
A variation of the propagation assumption that allows a service to propagate fractional effects to its parent. This leads to a greedy algorithm.
Greedy algorithm to compute top-k hotspots
Given an optimization factor θ:
● Repeatedly select the service that has the maximum impact on the frontend service.
● Update the times after each selection.
● Stop after k iterations.
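A minimal sketch of this greedy loop; frontend_latency and apply_speedup are hypothetical helpers standing in for re-timing the call graph under the propagation rules above, not functions from the paper.

```python
# Greedy selection of the k services whose theta-speedup most reduces the
# latency of the frontend (root) service, re-timing the graph after each pick.
def top_k_hotspots(graph, services, theta, k, frontend_latency, apply_speedup):
    hotspots = []
    current = graph
    for _ in range(k):
        base = frontend_latency(current)
        # Impact of optimizing each remaining service by theta.
        impacts = {s: base - frontend_latency(apply_speedup(current, s, theta))
                   for s in services if s not in hotspots}
        if not impacts:
            break
        best = max(impacts, key=impacts.get)
        hotspots.append(best)
        current = apply_speedup(current, best, theta)  # update times after selection
    return hotspots
```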
Ranking hotspots
Rest of the paper
● Similar approach applied to the cost-of-request metric.
● Generalized framework for optimizing arbitrary metrics.
● Other ranking schemes.
Results
Dataset
Request type | Avg # of call graphs per day* | Avg # of service calls per request | Avg # of subcalls per service | Max # of parallel subcalls
Home    | 10.2 M | 16.90 | 1.88 | 9.02
Mailbox | 3.33 M | 23.31 | 1.9  | 8.88
Profile | 3.14 M | 17.31 | 1.86 | 11.04
Feed    | 1.75 M | 16.29 | 1.87 | 8.97
* Scaled down by a constant factor
vs. Baseline algorithm
User of the system
Impact of improvement factor
Consistency over a time period
Conclusions
● Defined hotspots in service oriented architectures.
● A framework to mine hotspots with respect to various performance metrics.
● Experiments on real-world, large-scale datasets.
Thanks!
Questions?