Copyright © 2003 Avaya Inc. All rights reserved.
Avaya Interactive Dashboard (AID): An Interactive Tool for Mining Avaya Problem Ticket Database
Ziyang Wang, Department of Computer Science, New York University, USA
Amit Bagga, Ask Jeeves, Inc., USA
Outline
  Motivations and goals
  Application overview
  Functionalities
  Architecture
  Algorithms
  Implementation features
  Future directions
Avaya Problem Database (Maestro)
  Approximately 5 million records
    – Approx. 4 million alarms (reported via self-diagnostic tool)
    – Approx. 1 million tickets (reported by customers via phone call)
  Ticket records
    – Structured data fields
        Customer name, location, product, date of problem, etc.
    – Unstructured fields (each limited to 256 bytes)
        Problem description
        One or more notes fields
        Resolution description
Motivations and Goals
  Motivations
    – Structured fields are mined by traditional data mining algorithms
    – NSM and service engineers manually scanned text fields
        First restricted the result size with a database query (based on structured fields)
        Then manually scanned the resulting set of records to identify, track, and classify problems across customers, locations, and products
  Goals
    – Develop an interface that helps automate the text analysis done by NSM and service engineers
    – Provide advanced functionality to help them quickly and conveniently discover patterns and verify intuitions about problems
Application Overview
  Interactive Dashboard
    – A tool using search and data mining techniques
    – Find similar problems
    – Identify sub problems
    – Trace similar problems by customer and product
Overview: algorithms and implementation
  Interactive Dashboard
    – Programming languages: Java, Perl, C
    – Service model: sockets, client/server model
    – Database management: Oracle, JDBC
    – Relevance metric: TF*IDF
    – Clustering: hierarchical clustering
    – Web interface: Perl, CGI
Functionalities: Major ones
  Search relevant tickets
    – Helps find similar problems
    – Relevance score: the similarity of unstructured text data
    – Search constraints: product name, customer code, time, and severity of tickets
    – Top-level summary: ticket case ID, relevance score, ticket description
  Cluster relevant tickets
    – Groups similar tickets into clusters
    – Helps identify sub problems
    – Keyword expansion
    – Adaptive online search
Functionalities: Supporting ones
  Categorize a set of tickets
    – Categorized by product name, customer name, and location name
    – Provide a high-level summary
    – Discover similar problems by customers, products
  Retrieve detailed ticket information
    – Complete product/customer/location information, ticket resolution note, etc.
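The categorization step above can be pictured as a simple group-and-count over structured fields. The following is a minimal Python sketch, not the dashboard's own code (which the deck says is written in Java, Perl, and C); the record layout and field names are hypothetical and do not reflect the Maestro schema.

```python
from collections import Counter

def categorize(tickets, field):
    """Count tickets per distinct value of one structured field, most common first."""
    return Counter(t[field] for t in tickets).most_common()

# Hypothetical ticket records; the field names are illustrative only.
tickets = [
    {"case_id": 1, "product": "Paging", "customer": "ABC Corp", "location": "NJ"},
    {"case_id": 2, "product": "Paging", "customer": "ABC Corp", "location": "NY"},
    {"case_id": 3, "product": "Voicemail", "customer": "XYZ Inc", "location": "NJ"},
]

# One high-level summary per categorization field named on the slide.
summary = {f: categorize(tickets, f) for f in ("product", "customer", "location")}
for field, counts in summary.items():
    print(field, counts)
```

A summary like this makes it easy to see whether a set of similar tickets concentrates on one product or one customer.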
Functionalities (cont.)
  Accessibility
  [Diagram labels: Web portal; Relevant Tickets; Categorized Set; Clustered Relevant Tickets; Ticket Information]
Interactive Dashboard Architecture: Main Frame
  Main frame: application server infrastructure
    – 3-tier server architecture
    – Integrated central server: service provider and server logic organizer
  [Diagram labels: Database; Web Interface; Integrated Central Server; CGI; JDBC]
Architecture: Integrated Central Server
  [Diagram labels: Integrated Central Server; Server Socket Module; Query Engine; Database module; Text Analysis Module; Response Module; Incoming requests; Output results; Database]
Architecture: Text Analysis Module
  [Diagram labels: Text Analysis Module; Database module; Database; Stop words; Text Filter; Data module; Functional module; Clustering; Dictionary; Relevance Evaluator; Keywords/Sample; Unstructured data; TFIDF Module; Top Relevant Tickets; Response Module; Output Manager; Document Frequency; Categorizing]
Algorithm: TFIDF
  TFIDF: a similarity metric for text data
    – Text document view: a bag of words
    – Document representation: a vector
    – The similarity of two documents is the normalized inner product of their vectors (the cosine of the angle between the two vectors)
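The bag-of-words vector and cosine similarity described above can be sketched in a few lines of Python. This is an illustration only: the deck does not say which TF or IDF weighting the dashboard uses, so the `log(N / df)` form below is an assumption, and the document-frequency counts are made-up toy values.

```python
import math
from collections import Counter

def tfidf_vector(tokens, doc_freq, n_docs):
    """Map a bag of words to a sparse {term: tf * idf} vector.

    Assumes idf = log(N / df); terms missing from doc_freq are skipped.
    """
    tf = Counter(tokens)
    return {t: c * math.log(n_docs / doc_freq[t])
            for t, c in tf.items() if doc_freq.get(t)}

def cosine(u, v):
    """Normalized inner product of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy document frequencies over a hypothetical 3-document collection.
doc_freq = {"paging": 2, "music": 1, "power": 1, "csr": 3}
a = tfidf_vector(["paging", "music", "csr"], doc_freq, 3)
b = tfidf_vector(["paging", "power", "csr"], doc_freq, 3)
print(cosine(a, a), cosine(a, b))
```

Note that a term occurring in every document ("csr" here) gets weight zero under this IDF choice, so it does not contribute to the similarity.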
Algorithm: TFIDF (cont.)
  Issues
    – Document frequency
        Global vs. local
        Vocabulary: 100,708 terms after text filtering
        Solution: offline scan of database
    – Term frequency
        Online scan of ticket description
        Text filtering
    – Computing the similarity of ticket descriptions
        Searching relevant tickets: 1-to-N similarity
        Clustering: N-to-N similarity
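The split between an offline document-frequency scan and an online term-frequency scan, followed by a 1-to-N similarity search, might look like the Python sketch below. Everything here is an assumption made for illustration: the stop-word list, the tokenizer, the `log(N / df)` weighting, and the three toy descriptions are not taken from the dashboard.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"to", "on", "no", "the", "by", "at", "all"}  # illustrative list only

def text_filter(text):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def build_doc_freq(descriptions):
    """Offline pass: count how many descriptions each term appears in."""
    df = Counter()
    for text in descriptions:
        df.update(set(text_filter(text)))
    return df

def tfidf(tokens, df, n_docs):
    """Online pass: term frequency of one description weighted by log(N / df)."""
    tf = Counter(tokens)
    return {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df.get(t)}

def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query, descriptions, df, top_k=3):
    """1-to-N similarity: score the query against every description, best first."""
    n = len(descriptions)
    q = tfidf(text_filter(query), df, n)
    scored = [(cosine(q, tfidf(text_filter(d), df, n)), i)
              for i, d in enumerate(descriptions)]
    return sorted(scored, reverse=True)[:top_k]

# Toy descriptions, not real tickets.
descriptions = [
    "paging system no overhead music",
    "overhead music volume low",
    "voicemail login failure",
]
df = build_doc_freq(descriptions)          # done once, offline
results = search("paging system music", descriptions, df)
print(results)
```

Computing `df` once and reusing it keeps the per-query work limited to scanning the candidate descriptions, which is the point of the offline scan.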
Algorithm: Hierarchical Clustering (cont.)
  Hierarchical clustering
    – Similarity metric of data vector: TFIDF, Euclidean
    – Hierarchical clustering
        Step-by-step bottom-up cluster merging
        Merging criteria: complete linkage
        Cost: N-square (quadratic) performance
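Bottom-up merging with complete linkage can be sketched as follows in Python. This is a naive illustration written for clarity, not the C kernel the deck mentions, and it makes no attempt to match the stated cost; the stopping rule (`n_clusters`) and the similarity matrix are assumptions, since the slides do not say how the merging is stopped.

```python
def complete_linkage(sim, n_clusters):
    """Agglomerative clustering over a symmetric similarity matrix.

    Complete linkage: the similarity of two clusters is the MINIMUM pairwise
    similarity between their members; the most similar pair is merged first.
    """
    n_clusters = max(1, n_clusters)
    clusters = [[i] for i in range(len(sim))]   # start with one cluster per item
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = min(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link > best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters

# Hypothetical pairwise similarities (e.g. cosine scores) for four tickets.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
print(complete_linkage(sim, 2))
```

With these values, tickets 0 and 1 merge first, then 2 and 3, leaving two clusters.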
Implementation Features
  Integrated server is built like a web server whose backend is a database
    – Multi-threaded model
    – Stateless
  High-level SQL query processor
    – Maps multiple requests to a single database connection
        Loading the database driver and authentication are done only once
        Reduces the slow start of a database connection
        Issuing multiple JDBC SQL statements over one database connection schedules data transmission so that it looks like parallel retrieval
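The idea of serving many request threads from one database connection that is opened once can be illustrated with the Python sketch below. It is an analogy under stated assumptions, not the dashboard's JDBC code: SQLite stands in for Oracle, the `SharedConnection` class and the table layout are hypothetical, and a lock serializes statements here rather than interleaving them.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

class SharedConnection:
    """One connection, opened once; every request runs its own cursor on it."""

    def __init__(self, path=":memory:"):
        # check_same_thread=False lets worker threads reuse the connection;
        # the lock below keeps their statements from overlapping.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()

    def script(self, sql):
        with self._lock:
            self._conn.executescript(sql)

    def query(self, sql, params=()):
        with self._lock:
            cur = self._conn.cursor()
            try:
                cur.execute(sql, params)
                return cur.fetchall()
            finally:
                cur.close()

db = SharedConnection()  # the connection cost is paid once, at startup
db.script("""
    CREATE TABLE tickets (case_id INTEGER, product TEXT);
    INSERT INTO tickets VALUES (1, 'Paging'), (2, 'Paging'), (3, 'Voicemail');
""")

def handle_request(product):
    """Stateless handler: everything it needs arrives with the request."""
    return db.query(
        "SELECT case_id FROM tickets WHERE product = ? ORDER BY case_id", (product,)
    )

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, ["Paging", "Voicemail", "Paging"]))
print(results)
```

Because the handlers keep no state between calls, any worker thread can serve any request, which matches the stateless, multi-threaded model on the slide.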
Example
  PLAT Csr cld to report trouble on Paging System, TOOS. No overhead music, no MOH. DPO tech SEV 4 dispatch to diagnose.
  PLAT Csr cld to report trouble on Paging System, TOOS. No overhead music, no MOH. DPO tech SEV 4 dispatch to diagnose.
  PLAT Csr cld to report paging system TOOS, no overhead music at all. System has been reset by csr (power confirmed). DPO tech SEV 4 to diagnose.
  PLAT Csr cld to report trouble with overhead music, TOOS. Paging appears OK, but they cannot get music output. DPO tech SEV 4 dispatch to check volume levels.
  ……
  paging plat dpo tech music csr overhead cust check report
  tech dpo plat csr access assist speakers overhead diagnose x15255
  paging plat dpo tech sev power csr diagnose overhead carrier
  ……
Example
  PLAT Csr cld to report paging system TOOS, no overhead music at all. System has been reset by csr (power confirmed). DPO tech SEV 4 to diagnose.
  ……
  paging plat dpo tech music csr overhead cust check report
  Ticket #    Customer
              ABC Corp
              ABC Corp
              ABC Corp
Evaluation
  Hard, since there are 1 million tickets in the database
  Based on detailed feedback from 2 of the users:
    – Significantly improves productivity
    – Additional features identified that are in the process of being implemented
        Example: automatic identification of root cause, prediction of resolution code based on prior cases of similar problems
Future directions
  Search precision
    – Refine algorithms of relevance computation
    – Refine algorithms of clustering
    – Text filtering
  Search performance
    – Database organization
    – Java primitive functions
  Automatic classification of root cause of problems
    – Machine learning approach
  Prediction of resolution code
  Scalability
Implementation: features
  Abstract database SQL manager for parallel requests
    – Mapping parallel requests to a single database connection:
        Loading the database driver and authentication are done only once
        Reduces the slow start of a database connection
        Issuing multiple JDBC SQL statements over one database connection schedules data transmission so that it looks like parallel retrieval
    – Stateful abstract database connection manager
  Unified error message processor
    – Exception catching and re-throwing
    – Benefits
        Formats error messages as HTML text
        Keeps the database connection status consistent
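The catch, re-throw, and format-as-HTML pattern of the unified error processor could be sketched as below in Python. The `DashboardError` class, the `run_guarded` helper, and the HTML markup are all hypothetical names chosen for this illustration; the deck does not show the actual Java exception types.

```python
import html

class DashboardError(Exception):
    """Hypothetical unified error that carries an HTML-safe message."""

    def __init__(self, message):
        super().__init__(message)
        # Escape the text so it can be embedded directly in a web page.
        self.html = '<p class="error">' + html.escape(message) + "</p>"

def run_guarded(action):
    """Run one step; re-raise any failure as a DashboardError."""
    try:
        return action()
    except DashboardError:
        raise                      # already unified, pass it through unchanged
    except Exception as exc:
        # Wrap and re-throw, keeping the original exception as the cause.
        raise DashboardError(f"{type(exc).__name__}: {exc}") from exc

def failing_step():
    raise ValueError("bad <sql>")

captured = None
try:
    run_guarded(failing_step)
except DashboardError as err:
    captured = err
print(captured.html)
```

Funnelling every failure through one exception type means the web layer needs only one place to turn errors into page content.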
Implementation: features (cont.)
  Multiple system-dependent process interaction through the Java runtime
    – Kernel clustering module is written in C
        High performance for numerical computation
        Unix/Linux OS required
    – Communication of processes
    – I/O redirection
  Extensibility
    – Search space
    – Localized index engine
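Handing numerical work to an external process through redirected standard input and output might look like the following Python sketch. It is only an analogy for the Java-runtime-to-C arrangement on the slide: the child process here is a small Python one-liner standing in for the C clustering kernel, and the text protocol (space-separated numbers in, one number out) is an assumption.

```python
import subprocess
import sys

# Stand-in for the external kernel: reads numbers on stdin, writes their sum to stdout.
CHILD = [
    sys.executable,
    "-c",
    "import sys; print(sum(float(x) for x in sys.stdin.read().split()))",
]

def run_kernel(values):
    """Send values to the child via stdin and parse its stdout (I/O redirection)."""
    proc = subprocess.run(
        CHILD,
        input=" ".join(str(v) for v in values),
        capture_output=True,   # redirect the child's stdout and stderr into pipes
        text=True,
        check=True,            # raise CalledProcessError if the child exits non-zero
    )
    return float(proc.stdout.strip())

print(run_kernel([0.5, 0.25, 0.25]))
```

Keeping the exchange on plain pipes lets the parent and the kernel be written in different languages, at the price of serializing the data on every call.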