Improving DevOps and QA efficiency using machine learning and NLP methods Omer Sagi May 2018.

Slides:



Advertisements
Similar presentations
Introduction to Information Retrieval Introduction to Information Retrieval Lecture 7: Scoring and results assembly.
Advertisements

Chapter 5: Introduction to Information Retrieval
Introduction to Information Retrieval
COMP423 Intelligent Agents. Recommender systems Two approaches – Collaborative Filtering Based on feedback from other users who have rated a similar set.
ARNOLD SMEULDERS MARCEL WORRING SIMONE SANTINI AMARNATH GUPTA RAMESH JAIN PRESENTERS FATIH CAKIR MELIHCAN TURK Content-Based Image Retrieval at the End.
Personas and Scenarios - Reshmi. Personas  Fictional characters created to represent the different user types that might use a site, brand, or product.
Search Engines and Information Retrieval
Information Retrieval Ling573 NLP Systems and Applications April 26, 2011.
Information Retrieval Review
Ch 4: Information Retrieval and Text Mining
Hinrich Schütze and Christina Lioma
Evaluating the Performance of IR Sytems
Advance Information Retrieval Topics Hassan Bashiri.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Information retrieval: overview. Information Retrieval and Text Processing Huge literature dating back to the 1950’s! SIGIR/TREC - home for much of this.
Basic IR Concepts & Techniques ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Introduction to Machine Learning Approach Lecture 5.
Chapter 5: Information Retrieval and Web Search
1/16 Final project: Web Page Classification By: Xiaodong Wang Yanhua Wang Haitang Wang University of Cincinnati.
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Copyright R. Weber Machine Learning, Data Mining ISYS370 Dr. R. Weber.
Search Engines and Information Retrieval Chapter 1.
RuleML-2007, Orlando, Florida1 Towards Knowledge Extraction from Weblogs and Rule-based Semantic Querying Xi Bai, Jigui Sun, Haiyan Che, Jin.
Reyyan Yeniterzi Weakly-Supervised Discovery of Named Entities Using Web Search Queries Marius Pasca Google CIKM 2007.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Question Answering.  Goal  Automatically answer questions submitted by humans in a natural language form  Approaches  Rely on techniques from diverse.
Modern Information Retrieval: A Brief Overview By Amit Singhal Ranjan Dash.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Giorgos Giannopoulos (IMIS/”Athena” R.C and NTU Athens, Greece) Theodore Dalamagas (IMIS/”Athena” R.C., Greece) Timos Sellis (IMIS/”Athena” R.C and NTU.
NoteSearch - Find what you’re looking for. Prototype Team B.
Learningcomputer.com SQL Server 2008 – Profiling and Monitoring Tools.
Chapter 6: Information Retrieval and Web Search
1 Automatic Classification of Bookmarked Web Pages Chris Staff Second Talk February 2007.
Ranking in Information Retrieval Systems Prepared by: Mariam John CSE /23/2006.
IT Just Works ©2008 BigFix, Inc. Practical Guide to Relevance Ben Kus – 1/31/2008.
IR Homework #2 By J. H. Wang Mar. 31, Programming Exercise #2: Query Processing and Searching Goal: to search relevant documents for a given query.
 Examine two basic sources for implicit relevance feedback on the segment level for search personalization. Eye tracking Display time.
NetSearch: Googling Large-scale Network Management Data GROUP 2 MEMBERS SAMUEL LAWER WENBO HAN HUAN YAN PEI YAN SHREY YADAV SHUAI YU SHINE PANDITA.
How Do We Find Information?. Key Questions  What are we looking for?  How do we find it?  Why is it difficult? “A prudent question is one-half of wisdom”
Language Model in Turkish IR Melih Kandemir F. Melih Özbekoğlu Can Şardan Ömer S. Uğurlu.
Search Result Interface Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.
Information Retrieval CSE 8337 Spring 2007 Introduction/Overview Some Material for these slides obtained from: Modern Information Retrieval by Ricardo.
CIS 530 Lecture 2 From frequency to meaning: vector space models of semantics.
User Interfaces and Information Retrieval Dina Reitmeyer WIRED (i385d)
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
IR Homework #2 By J. H. Wang Apr. 13, Programming Exercise #2: Query Processing and Searching Goal: to search for relevant documents Input: a query.
CS791 - Technologies of Google Spring A Web­based Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.
COMP423 Intelligent Agents. Recommender systems Two approaches – Collaborative Filtering Based on feedback from other users who have rated a similar set.
Information Retrieval in Practice
Doron Orbach UCMDB Product Manager
Multi Node Label Routing – A layer 2.5 routing protocol
Using Social Media to Enhance Emergency Situation Awareness
Digital Video Library - Jacky Ma.
Information Organization: Overview
COMP791A: Statistical Language Processing
CIS 524 Possible Is Everything/tutorialrank.com
CIS 524 Education for Service/tutorialrank.com
Mining and Analyzing Data from Open Source Software Repository
Applying Key Phrase Extraction to aid Invalidity Search
موضوع پروژه : بازیابی اطلاعات Information Retrieval
Introduction Task: extracting relational facts from text
Lecture 8 Information Retrieval Introduction
Ying Dai Faculty of software and information science,
Ying Dai Faculty of software and information science,
Requirements Management
Ying Dai Faculty of software and information science,
Retrieval Utilities Relevance feedback Clustering
Information Organization: Overview
Topic: Semantic Text Mining
Presentation transcript:

Improving DevOps and QA efficiency using machine learning and NLP methods Omer Sagi May 2018

Outline Investigating failed tests Document similarity The developed search system Unsupervised similarity Supervised similarity learning Evaluation Challenges in moving to production

The business domain – Investigating failed tests

Investigating failed tests - overview New test design Failed test Test investigation The users of our solution are QA and Devops engineers. Let’s take Ben who is a QA engineer… His work is to think about software tests that would check software’s performance under different scenarios. If the test that Ben created runs and raises exceptions or errors that means the software doesn’t do what we expect it to do and Ben needs to investigate the failure reason or root cause. And this root cause can be tracked in the log-file that related to the test. Log-files virtually tells the story. They contain millions of messages that describe what happened at every millisecond. It is a very tedious task to go over these log-files searching for the root-cause.

Investigating failed tests – duplicated issues

Current state problem Investigators overlook duplicated issues while mistakenly mark other duplications Log filesystem AAA BBB Looks like a network outage CCC AAA CCC DDD FFF Network outage GUI issue AAA BBB CCC AAA Data unavailability Data loss

User Interface

Document similarity

Document search system - overview Query Processed terms Text processing Indexed knowledge base Ranked Document set Relevant document set Retrieved document set Ranking Similarity model

Documents as vectors The intuition: Since [D] is a vector we can find other nearby vectors (documents) Distance between vectors d1 and d2 can be captured by the cosine of the angle θ between them d1: new york times d2: new york post d3: los angeles times 𝐶𝑜𝑠𝑖𝑛𝑒 𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦 𝑑 1 , 𝑑 2 = 𝑑 1 ∗ 𝑑 2 𝑑 1 ∗ 𝑑 2 =0.66 𝐶𝑜𝑠𝑖𝑛𝑒 𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦 𝑑 2 , 𝑑 3 = 𝑑 2 ∗ 𝑑 3 𝑑 2 ∗ 𝑑 3 =0

The developed solution: log-files search system

Solution overview Log processing Indexed knowledge base Query Vector Log processing Indexed knowledge base Ranked issues Relevant log set Retrieved issues Ranking Similarity model

Log file vectorization Messages are analogous to words, defining the vector dimensionality In the following example, each ‘word’ is a concatenation of log type and event id Log processing Vector_dim: info_14 crit_150 info_23 error_25 Value: 2 1

The developed solution – important notes TF-IDF vectors Time range filtering Similarity-driven vs. Diversity-driven ranking

Results

Similarity learning … Issue 1 Issue 2 …. Pair vector Is duplicate? Machine-learning model

Extracting similarity vector Info similarity 1 6

Extracting similarity vector Info similarity Crit similarity 1 6

Extracting similarity vector Info similarity Crit similarity Panic 1 6 1 2

Extracting similarity vector Configuration properties Configuration properties Info similarity Crit similarity Panic Same Operation System Time difference (days) 1 6 1 2 True 12

Extracting similarity vector Info similarity Crit similarity Panic Same Operation System Time difference (days) Label: Is duplication 1 6 1 2 True 12

Results

Evaluation – top [k] precision

Evaluation – users feedback Devops team leader: “Towards the new product release, we ran about 250- 300 tests a day. Having the machine learning suggestions is one of the reasons we were able to investigate it in time. “

Challenges in moving to production

Real-time performance over time Production starting point

Challenges in moving to production Communicate the feature engineering process Similarity measures may change over time Strong features Vs. cheat features Performance issues