1
Retrieving Relevant Reports from a Customer Engagement Repository
Dharmesh Thakkar, Zhen Ming Jiang, Ahmed E. Hassan (School of Computing, Queen’s University, Canada)
Gilbert Hamann, Parminder Flora (Research In Motion (RIM), Canada)
2
Software Maintenance: Customer Support
3
Retrieving Relevant Reports
■ State of Practice:
–No systematic techniques to retrieve and use information for future engagements
–Keyword searching is limited: it depends on the analyst’s search skills and experience and on the peculiarities of the problem
4
Customer Support Problem Statement
■ We want to find customers with similar operational and problem profiles
■ We can reuse prior solutions and knowledge
[Figure: a new customer engagement is compared against other customers’ profiles, e.g. Heavy Email / Light Web / Light MDS, Light Email / Heavy Web / Light MDS, Heavy Email / Heavy Web / No MDS, Light Email / Light Web / Heavy MDS]
5
Using Logs for Customer Support
■ Execution logs are readily available and contain:
–Operational Profile: usage patterns (heavy users of email from the device or to the device, light users of calendar, etc.)
–Signature Profile: specific error line patterns (connection timeouts, database limits, messages queued up, etc.)
■ Find the most similar profile
6
Execution Logs
■ Contain a time-stamped sequence of events recorded at runtime
■ Readily available representatives of both feature executions and problems
Sample log lines:
Queuing new mail msgid=ABC threadid=XYZ
Instant message. Sending packet to client msgid=ABC threadid=XYZ
New meeting request msgid=ABC threadid=XYZ
Client established IMAP session emailid=ABC threadid=XYZ
Client disconnected. Cannot deliver msgid=ABC threadid=XYZ
New contact in address book emailid=ABC threadid=XYZ
User initiated appointment deletion emailid=ABC threadid=XYZ
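A later slide (Log Lines to Event Distribution) describes abstracting away dynamic information. As a minimal Python sketch (not the authors’ implementation), the sample lines above can be mapped to execution events by masking the values of fields such as msgid, emailid, and threadid:

```python
import re

# Sample execution log lines from the slide above (ABC/XYZ stand in for
# dynamic values in the original example).
log_lines = [
    "Queuing new mail msgid=ABC threadid=XYZ",
    "Instant message. Sending packet to client msgid=ABC threadid=XYZ",
    "New meeting request msgid=ABC threadid=XYZ",
    "Client established IMAP session emailid=ABC threadid=XYZ",
    "Client disconnected. Cannot deliver msgid=ABC threadid=XYZ",
    "New contact in address book emailid=ABC threadid=XYZ",
    "User initiated appointment deletion emailid=ABC threadid=XYZ",
]

def abstract_line(line: str) -> str:
    """Map a raw log line to its execution event by masking dynamic values,
    e.g. 'msgid=ABC' becomes 'msgid=?'."""
    return re.sub(r"=\S+", "=?", line)

events = [abstract_line(line) for line in log_lines]
for event in sorted(set(events)):
    print(event)
```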
7
Example
[Figure: a new customer engagement’s log profile is compared against the profiles of other customers in the repository]
8
Our Technique
9
Log Lines to Event Distribution
■ Remove dynamic information
–Example: given the two log lines “Open inbox user=A” and “Open inbox user=B”, map both lines to the event “Open inbox user=?”
■ Use event percentages to compare event logs of different running lengths without bias
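A minimal sketch of the event-percentage step, using hypothetical event names; normalizing counts to percentages is what removes the bias from logs of different running lengths:

```python
from collections import Counter

def event_distribution(events):
    """Convert a list of abstracted events into fractions of the total,
    so logs of different running lengths can be compared without bias."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: count / total for event, count in counts.items()}

# Hypothetical example: two logs of different lengths with similar usage.
log_a = ["Open inbox user=?"] * 80 + ["New meeting request msgid=? threadid=?"] * 20
log_b = ["Open inbox user=?"] * 40 + ["New meeting request msgid=? threadid=?"] * 10

print(event_distribution(log_a))  # 0.8 / 0.2
print(event_distribution(log_b))  # same percentages despite the shorter log
```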
10
Compare Event Distributions
■ Kullback-Leibler Divergence
■ Cosine Distance
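The slides name the two measures but not their exact formulation (for example, how missing events are smoothed or whether the K-L divergence is symmetrized). A minimal sketch under those caveats, operating on the percentage dictionaries from the previous step:

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """Kullback-Leibler divergence D(P || Q) between two event distributions
    given as {event: percentage} dicts. The epsilon smoothing for events
    missing from one log is an assumption, not stated in the slides."""
    events = set(p) | set(q)
    return sum(p.get(e, eps) * math.log(p.get(e, eps) / q.get(e, eps)) for e in events)

def cosine_distance(p, q):
    """Cosine distance (1 - cosine similarity) between two event distributions."""
    events = set(p) | set(q)
    dot = sum(p.get(e, 0.0) * q.get(e, 0.0) for e in events)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / (norm_p * norm_q)

# Hypothetical distributions: a new engagement vs. one repository log.
new_customer = {"Open inbox user=?": 0.7, "Queuing new mail msgid=? threadid=?": 0.3}
repository_log = {"Open inbox user=?": 0.5, "Queuing new mail msgid=? threadid=?": 0.5}
print(kl_divergence(new_customer, repository_log))
print(cosine_distance(new_customer, repository_log))
```

The repository log with the smallest distance to the new engagement is the most similar profile.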
11
Identify Signature Events
■ Signature events occur with a different frequency in one log file than the same events do in the other log files
–Example signature events: dropped connections, thread dumps, and full queues
■ A chi-square test identifies such events
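A minimal sketch of the chi-square check, assuming a 2x2 contingency table per event and a 0.05 significance level (the slides do not state the exact test setup); the event counts are hypothetical:

```python
from scipy.stats import chi2_contingency

def signature_events(target_counts, other_counts, alpha=0.05):
    """Flag events whose frequency in the target log differs significantly
    from their frequency across the other logs."""
    target_total = sum(target_counts.values())
    other_total = sum(other_counts.values())
    flagged = []
    for event in set(target_counts) | set(other_counts):
        a = target_counts.get(event, 0)   # this event in the target log
        b = target_total - a              # all other events in the target log
        c = other_counts.get(event, 0)    # this event in the other logs
        d = other_total - c               # all other events in the other logs
        chi2, p_value, _, _ = chi2_contingency([[a, b], [c, d]])
        if p_value < alpha:
            flagged.append((event, p_value))
    return flagged

# Hypothetical counts: "Connection timeout" is far more frequent in the target log.
target = {"Connection timeout": 40, "Queuing new mail msgid=? threadid=?": 480,
          "Open inbox user=?": 480}
others = {"Connection timeout": 5, "Queuing new mail msgid=? threadid=?": 5000,
          "Open inbox user=?": 4995}
print(signature_events(target, others))  # only "Connection timeout" is flagged
```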
12
Measuring Performance
■ Precision = 2/4 = 50% (in the example, 2 of the 4 retrieved log files are relevant); precision is 100% if all the retrieved log files are relevant
■ Recall = 2/3 = 67% (2 of the 3 relevant log files are retrieved); recall is 100% if all the relevant log files are retrieved
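A minimal sketch that reproduces the example numbers above; the log file names are hypothetical:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved log files that are relevant.
    Recall: fraction of relevant log files that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    return len(hits) / len(retrieved), len(hits) / len(relevant)

# The slide's example: 4 log files retrieved, 3 relevant, 2 in common.
retrieved = {"log1", "log2", "log3", "log4"}
relevant = {"log1", "log2", "log5"}
print(precision_recall(retrieved, relevant))  # (0.5, 0.666...)
```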
13
The Big Picture
14
Case Studies
■ Case Study I
–Dell DVD Store open source application
–Code instrumentation done for event logging
–Built the execution log repository by applying synthetic workloads, changing the workload parameters each time
■ Case Study II
–Globally deployed commercial application
–More than 500 unique execution events
15
Case Study Results
■ Dell DVD Store
–100% precision and recall for both operational profile based and signature profile based retrieval
■ Commercial Application
–100% precision and recall for signature profile based retrieval
–Results for operational profile based retrieval:

Experiment              | Count of Log Files | K-L Precision | K-L Recall | Cosine Precision | Cosine Recall
Single Feature Group    | 28                 | 67.71%        | 90.28%     | 67.71%           | 90.28%
Multiple Feature Groups | 28                 | 60.71%        | 80.95%     | 75.00%           | 100.00%
All Feature Groups      | 12                 | 72.92%        | 97.22%     | 62.50%           | 83.33%
Real World Log Files    | 12                 | 54.17%        | 72.22%     | 68.75%           | 91.67%
All the Log Files       | 80                 | 59.93%        | 79.90%     | 56.72%           | 75.62%
16
Sources of Errors
■ Events that do not correspond directly to a particular operational feature, such as idle time events, server health check events, and startup and shutdown events
■ Imbalance in the event logging
17
Imbalance in Event Logging
18
Related Work
■ Data mining techniques on textual information [Hui and Jha, 2000]
–Cons: limited results, depending on the analyst’s search skills and the peculiarities of the problem
■ Using customer usage data [Elbaum and Narla, 2004]
–Cons: customer usage data rarely exists
■ Clustering HTTP execution logs [Menascé, 1999]
–Cons: complex process, works only for HTTP logs
■ Software agent deployment to build operational profiles [Ramanujam et al., 2006]
–Cons: intrusive, complex, costly
19
Conclusion