Combining Process Mining and Distributed Tracing to Improve Root Cause Analysis
Jochen Graeff (B.Sc.), 27.02.2017, Munich
Advisor: Martin Kleehaus


Agenda
- Motivation
- Introduction: Distributed Tracing and Process Mining
- Problem Statement and Research Questions
- Combining Process Mining and Distributed Tracing
- Architectural Sketch
- Working Approach
- Timeline
4. Viewing Root Cause Analysis from an Enterprise Architecture Perspective
© sebis

Motivation
The following scenario: a customer calls the service hotline of a car-sharing provider because the car he has just booked won't open. The support agent records the incident (including an approximate time and the customer ID). The incident is forwarded to an engineer, who has to find the root cause. Meanwhile, the support agent force-opens the car via the platform.
In the next step, the root cause analysis, the engineer would check the log files of the services he or she thinks might be affected for errors in the specified time frame. Since multiple services may be involved, it would probably take a lot of time to investigate the issue and forward an exhaustive analysis report to the responsible colleague.
- The success of the investigation depends on the logging infrastructure that is provided.
- The company may have a tool installed that helps to filter the logs.
Now imagine a multi-service architecture instead of a monolithic system, with multiple call stacks: a complex, large-scale distributed system. Adding logging capabilities to the code of every service would be very costly and is not feasible; as a result, logging in real life is often poorly maintained.
This is where distributed tracing comes into play. RabbitMQ is used to deliver traces to Zipkin.

Distributed Tracing
"Modern Internet services are often implemented as complex, large-scale distributed systems. These applications are constructed from collections of software modules that may be developed by different teams, perhaps in different programming languages, and could span many thousands of machines across multiple physical facilities. … Understanding system behavior in this context requires observing related activities across many different programs and machines."
(Sigelman et al. (Google), 2010. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure)
Challenge: it is hard to find out which services are affected.
Requirements:
- Low overhead
- Scalability
- Little code change
- Application-level transparency: programmers should not need to be aware of the tracing system

Distributed Tracing
The path taken through a simple serving system on behalf of user request X. The letter-labeled nodes represent processes in a distributed system. The figure shows the causal and temporal relationships between five spans in a trace; all spans of a specific trace also share a common trace id.
(Sigelman et al. (2010). Dapper, a Large-Scale Distributed Systems Tracing Infrastructure. Google Research)
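To make the span model concrete, here is a minimal Python sketch, assuming each span record carries a trace id, its own span id and the id of its parent span (None for the root), which is roughly how Dapper and Zipkin model spans. The span ids and the call pattern (A calls B and C; C calls D and E) are hypothetical. Grouping by trace id and following parent ids rebuilds the causal tree of a single request.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of a trace
    name: str


def build_trees(spans):
    """Group spans by trace id and index each span's children by parent id."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    trees = {}
    for trace_id, members in traces.items():
        children = defaultdict(list)
        root = None
        for span in members:
            if span.parent_id is None:
                root = span
            else:
                children[span.parent_id].append(span)
        trees[trace_id] = (root, children)
    return trees


def render(root, children, depth=0):
    """Return the tree as indented lines, each parent before its children (depth-first)."""
    lines = ["  " * depth + root.name]
    for child in children.get(root.span_id, []):
        lines.extend(render(child, children, depth + 1))
    return lines


# Hypothetical trace for user request X: A calls B and C, C calls D and E
spans = [
    Span("X", "1", None, "A"),
    Span("X", "2", "1", "B"),
    Span("X", "3", "1", "C"),
    Span("X", "4", "3", "D"),
    Span("X", "5", "3", "E"),
]
root, children = build_trees(spans)["X"]
print("\n".join(render(root, children)))
```

The same parent/child links are what lets a tracing backend draw the per-request timeline across services.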

Distributed Tracing

Is there a gap? Or put another way: what is already there?
[Figure: visualization and monitoring instance per layer. User layer: customer journey (Login, Show Car Details, Reserve Car, Show FAQ, Report Issue). Business layer: process discovery by the process analyst (Login, Show Details, Reserve Car, Show FAQ, Report Issue). Application layer: distributed tracing by the systems analyst over distributed (micro)services S1 to S9, with spans (Span ID 1, Span ID 2, Span ID 3).]

What is Process Mining?
"The idea of process mining is to discover, monitor and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs readily available in today's (information) systems." (IEEE CIS Task Force on Process Mining)
[Figure: example process with the activities Create PO, Start Production, Ship Delivery, Notify Customer, Delete PO, discovered from the event log below]
Event log:
TIMESTAMP | ACTIVITY | CASE ID
2016-03-05 22:38:41.868 | Create PO | #1234
2016-03-05 23:46:32.306 | Start Production | #5678
2016-03-05 23:47:42.321 | Notify Customer | #1234
2016-03-05 23:53:12.354 | Ship Delivery | #9012
... | ... | ...
There are three classes of process mining:
- Process Discovery
- Conformance Checking
- Extension
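Process discovery starts from exactly the three columns of the event log. A minimal Python sketch of a typical first step, using the example log above: group events per case, order them by timestamp, and count the directly-follows relation, which discovery algorithms such as the Alpha miner derive their ordering relations from. It illustrates the idea only and is not a description of how any particular tool (e.g. Celonis) computes its process graph.

```python
from collections import Counter
from datetime import datetime

# Example event log from the slide: (timestamp, activity, case id)
EVENT_LOG = [
    ("2016-03-05 22:38:41.868", "Create PO", "#1234"),
    ("2016-03-05 23:46:32.306", "Start Production", "#5678"),
    ("2016-03-05 23:47:42.321", "Notify Customer", "#1234"),
    ("2016-03-05 23:53:12.354", "Ship Delivery", "#9012"),
]


def parse_ts(ts):
    """Parse timestamps such as '2016-03-05 22:38:41.868' (%f accepts 1 to 6 digits)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


def traces_by_case(rows):
    """Return case id -> activity sequence, each sequence ordered by timestamp."""
    cases = {}
    for ts, activity, case_id in sorted(rows, key=lambda row: parse_ts(row[0])):
        cases.setdefault(case_id, []).append(activity)
    return cases


def directly_follows(cases):
    """Count how often activity a is immediately followed by activity b within a case."""
    counts = Counter()
    for activities in cases.values():
        for a, b in zip(activities, activities[1:]):
            counts[(a, b)] += 1
    return counts


cases = traces_by_case(EVENT_LOG)
dfg = directly_follows(cases)
print(cases)
print(dfg)
```

With only four events, the sketch finds a single directly-follows pair (Create PO followed by Notify Customer in case #1234); on a full log these counts become the weighted edges of the discovered process graph.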

Problem Statement
- Monitoring techniques already exist for the business and the application layer, but they are siloed.
- There is no connection between process failures in the business layer and system failures in the application layer.
- Therefore it is difficult to find correlations across the layers.
- Root cause analysis is a very time-consuming task.
For example: is the reason for a high cancellation rate perhaps a slow or even non-functioning service?
Kleehaus, M., Uludag, Ö., Matthes, F. (2017). Towards a multi-layer IT infrastructure monitoring approach based on enterprise architecture information. CSE 17, Hannover, Germany.
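The cancellation-rate question can be phrased as a join between the two layers. A minimal Python sketch, assuming each case has a business outcome (cancelled or not) from the process-mining side and a flag from the tracing side saying whether any span of that case recorded an error; the record type, field names and data are hypothetical. Comparing the cancellation rate of cases with and without a failing span hints at, but does not prove, a cross-layer cause.

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    case_id: str
    cancelled: bool       # business layer: did the customer cancel? (hypothetical field)
    had_span_error: bool  # application layer: did any span of the case fail? (hypothetical field)


def cancellation_rates(records):
    """Return (rate among cases with span errors, rate among cases without); None for an empty group."""
    def rate(group):
        return sum(r.cancelled for r in group) / len(group) if group else None

    with_error = [r for r in records if r.had_span_error]
    without_error = [r for r in records if not r.had_span_error]
    return rate(with_error), rate(without_error)


# Hypothetical cases
records = [
    CaseRecord("c1", cancelled=True, had_span_error=True),
    CaseRecord("c2", cancelled=True, had_span_error=True),
    CaseRecord("c3", cancelled=False, had_span_error=True),
    CaseRecord("c4", cancelled=False, had_span_error=False),
    CaseRecord("c5", cancelled=True, had_span_error=False),
    CaseRecord("c6", cancelled=False, had_span_error=False),
    CaseRecord("c7", cancelled=False, had_span_error=False),
]
print(cancellation_rates(records))
```

A large difference between the two rates would be a starting point for the engineer, narrowing the search to the services whose spans failed.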

Research Questions
- RQ1: How can a relationship between business activities and a distributed application architecture be established?
- RQ2: What data has to be extracted, and how does it have to be mapped to enable and store the relationship knowledge?
- RQ3: What is the state of the art of monitoring both the application and the business layer?
- RQ4: How can a root cause analysis across the two different layers be partially automated?

Process Mining + Distributed Tracing
Using log data from distributed tracing to generate business activities

Data required for business process mining:
| CASE ID | ACTIVITY | TIMESTAMP |
| 12382 | List available Cars | 07/06/17 12:45:32 |
|  | Reserve Car | 07/06/17 12:46:12 |
| 19816 | Book Car | 07/06/17 17:45:32 |
| 39273 |  | 07/06/17 18:22:12 |
| 83947 |  | 07/06/17 18:24:14 |

Data available from distributed tracing:
| TRACE ID | SESSION ID (through annotations) | SPAN NAME | REQUEST | TIMESTAMP |
| 12382 | 43212 | Service 1 | /car/list | 07/06/17 12:45:32 |
|  | 65433 | Service 4 |  | 07/06/17 12:46:12 |
| 19816 |  | Service 3 | /car/book | 07/06/17 17:45:32 |
| 39273 | 74689 | Service 2 |  | 07/06/17 18:22:12 |
| 83947 | 34686 |  |  | 07/06/17 18:24:14 |

How can we close the gap?
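Closing the gap amounts to a transformation from the tracing data to the event-log format. A minimal Python sketch, assuming (as in the slide's example, where the case ids coincide with the trace ids) that the trace id serves as case id, and that a hand-maintained mapping from REST request paths to business activities exists. The mapping entries, the day/month/year reading of the dates and the `case_key` switch are assumptions; a session id obtained through annotations could serve as case key instead when one case spans several traces. Spans without a case id or without a known activity are skipped.

```python
from datetime import datetime

# Hypothetical mapping from REST request paths to business activities
ACTIVITY_BY_REQUEST = {
    "/car/list": "List available Cars",
    "/car/book": "Book Car",
}

# Tracing records from the slide: (trace id, session id, span name, request, timestamp)
SPANS = [
    ("12382", "43212", "Service 1", "/car/list", "07/06/17 12:45:32"),
    (None, "65433", "Service 4", None, "07/06/17 12:46:12"),
    ("19816", None, "Service 3", "/car/book", "07/06/17 17:45:32"),
    ("39273", "74689", "Service 2", None, "07/06/17 18:22:12"),
    ("83947", "34686", None, None, "07/06/17 18:24:14"),
]

# Which tuple position is used as case id (hypothetical switch)
CASE_KEY_INDEX = {"trace_id": 0, "session_id": 1}


def to_event_log(records, case_key="trace_id"):
    """Turn tracing records into (case id, activity, timestamp) rows, oldest first."""
    rows = []
    for record in records:
        case_id = record[CASE_KEY_INDEX[case_key]]
        activity = ACTIVITY_BY_REQUEST.get(record[3])
        if case_id is None or activity is None:
            continue  # without a case id or a known activity the span cannot enter the log
        # Assumption: the slide's dates are day/month/year
        timestamp = datetime.strptime(record[4], "%d/%m/%y %H:%M:%S")
        rows.append((case_id, activity, timestamp))
    return sorted(rows, key=lambda row: row[2])


for row in to_event_log(SPANS):
    print(row)
```

Most spans drop out here because their request is unknown or no mapping exists, which is exactly the question of which REST calls define an activity.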

Architectural Sketch
[Figure: a sample microservice architecture with n services (S1 to S9) exchanging REST calls such as POST /reserveCar, GET /carDetails and GET /userInfo; the services report their spans to the Zipkin server (distributed tracing; open-sourced by Twitter). From this data the event log is created and analyzed with process mining in Celonis.]
Example event log:
Timestamp | Activity | ID
2016-03-05 22:38:41.868 | Create purchase order | #1234
2016-03-05 23:46:32.306 | Start production | #5678
2016-03-05 23:47:42.321 | Receive payment | #1234
2016-03-05 23:53:12.354 | Send email | #9012
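A minimal Python sketch of the extraction step, assuming spans are fetched from the Zipkin server's v2 HTTP API (`GET /api/v2/traces`), which returns a JSON list of traces, each a list of span objects with fields such as `traceId`, `id`, `parentId`, `name`, `timestamp` (epoch microseconds), `localEndpoint.serviceName` and `tags`. To stay self-contained, the sketch parses a hard-coded response instead of calling a server; the sample values, service names and the use of the `http.path` tag as the request are assumptions.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hard-coded stand-in for a /api/v2/traces response; all values are hypothetical
RESPONSE = """
[
  [
    {"traceId": "5af7183fb1d4cf5f", "id": "5af7183fb1d4cf5f", "name": "post",
     "timestamp": 1496839532000000, "duration": 41000,
     "localEndpoint": {"serviceName": "booking-service"},
     "tags": {"http.method": "POST", "http.path": "/reserveCar"}},
    {"traceId": "5af7183fb1d4cf5f", "id": "6b221d5bc9e6496c", "parentId": "5af7183fb1d4cf5f",
     "name": "get", "timestamp": 1496839532010000, "duration": 12000,
     "localEndpoint": {"serviceName": "user-service"},
     "tags": {"http.method": "GET", "http.path": "/userInfo"}}
  ]
]
"""


def flatten_traces(payload):
    """Turn the nested list-of-traces JSON into one flat record per span, oldest first."""
    records = []
    for trace in json.loads(payload):  # the response is a list of traces
        for span in trace:             # each trace is a list of spans
            tags = span.get("tags", {})
            records.append({
                "trace_id": span["traceId"],
                "span_id": span["id"],
                "parent_id": span.get("parentId"),  # absent on the root span
                "service": span.get("localEndpoint", {}).get("serviceName"),
                "request": tags.get("http.path"),
                # Zipkin v2 timestamps are microseconds since the Unix epoch
                "timestamp": EPOCH + timedelta(microseconds=span["timestamp"]),
            })
    return sorted(records, key=lambda record: record["timestamp"])


for record in flatten_traces(RESPONSE):
    print(record["service"], record["request"], record["timestamp"].isoformat())
```

Converting the microsecond timestamps with `timedelta` keeps them exact, which matters when ordering events that are only milliseconds apart.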

Working Approach
- Define activities & process boundaries: Which REST calls define an activity? Which process steps define a process?
- Extract data from Zipkin
- Create event log (data manipulation: create the previously defined activities)
- Visualize process in Celonis
- Connect Celonis with Zipkin
- Enhance visualization with further data
- Persist data in an RDBMS: link the SessionID to some sort of cases table (e.g. purchases, depending on the process context) to display relevant information
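For the persistence step, a minimal sketch using Python's built-in `sqlite3` module as a stand-in for the RDBMS. The table layout, column names, session ids and the `purchases` table are hypothetical; the point is only that storing the session id next to each event lets a join attach case data (here, a customer id and an amount) to the event log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_log (
        session_id TEXT NOT NULL,
        activity   TEXT NOT NULL,
        ts         TEXT NOT NULL  -- ISO 8601 string, so it sorts chronologically
    );
    CREATE TABLE purchases (      -- hypothetical cases table
        session_id  TEXT PRIMARY KEY,
        customer_id TEXT,
        amount_eur  REAL
    );
""")
conn.executemany(
    "INSERT INTO event_log VALUES (?, ?, ?)",
    [  # hypothetical events
        ("s1", "List available Cars", "2017-06-07T12:45:32"),
        ("s1", "Reserve Car", "2017-06-07T12:46:12"),
        ("s2", "List available Cars", "2017-06-07T18:22:12"),
    ],
)
conn.execute("INSERT INTO purchases VALUES ('s1', 'C-17', 24.5)")

# LEFT JOIN keeps the events of sessions that never led to a purchase
rows = conn.execute("""
    SELECT e.session_id, e.activity, e.ts, p.customer_id, p.amount_eur
    FROM event_log AS e
    LEFT JOIN purchases AS p ON p.session_id = e.session_id
    ORDER BY e.session_id, e.ts
""").fetchall()
for row in rows:
    print(row)
```

Sessions without a matching case row come back with `None` in the case columns, which keeps abandoned journeys visible in the analysis instead of silently dropping them.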

Timeline (February to August)
- Set up test microservice infrastructure
- Set up real* microservice infrastructure
- Event log generation
- Build connector
- Build analysis in process mining tool
- Enhance analysis with further (case) data
- Literature research
- Thesis writing
- 15.08: Submission
* High dependency on possible industry partners