Approved for Public Release; Distribution Unlimited. 14-0899 Richard F. Eng PRINCE2, PMP, CSQE, CRE, CQE 14 March 2014 Applying Process Mining to IT Big.

Slides:



Advertisements
Similar presentations
C6 Databases.
Advertisements

Mike Goffin and Wesley Shields Approved for Public Release; Distribution Unlimited. Case Number
4.1.5 System Management Background What is in System Management Resource control and scheduling Booting, reconfiguration, defining limits for resource.
An Introduction to Information Systems in Organizations
Managing Data Resources
What do Computer Scientists and Engineers do? CS101 Regular Lecture, Week 10.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
The Hierarchy of Data Bit (a binary digit): a circuit that is either on or off Byte: 8 bits Character: each byte represents a character; the basic building.
1 1 File Systems and Databases Chapter 1 The Worlds of Database Systems Prof. Sin-Min Lee Dept. of Computer Science.
CSE 4482, 2009 Session 21 Personal Information Protection and Electronic Documents Act Payment Card Industry standard Web Trust Sys Trust.
Maintaining and Updating Windows Server 2008
1 1 File Systems and Databases Chapter 1 Prof. Sin-Min Lee Dept. of Computer Science.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 17 Slide 1 Rapid software development.
NIST BIG DATA WG Reference Architecture Subgroup Meeting Agenda Co-chairs: Orit Levin (Microsoft) James Ketner (AT&T) Don Krapohl (Augmented Intelligence)
Project Communications Management J. S. Chou Assistant Professor.
Understanding Task Orientation Guidelines for a Successful Manual & Help System.
Copyright © 2014 Pearson Education, Inc. 1 It's what you learn after you know it all that counts. John Wooden Key Terms and Review (Chapter 6) Enhancing.
Ravi Sankar Technology Evangelist | Microsoft Corporation
What is Business Analysis Planning & Monitoring?
What is Business Intelligence? Business intelligence (BI) –Range of applications, practices, and technologies for the extraction, translation, integration,
Chapter 7 Requirement Modeling : Flow, Behaviour, Patterns And WebApps.
Integrated Capability Maturity Model (CMMI)
Enabling Organization-Decision Making
Chapter 5 Lecture 2. Principles of Information Systems2 Objectives Understand Data definition language (DDL) and data dictionary Learn about popular DBMSs.
1 - 1 Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved.
EBZ318 Deploying A Content Management Server 2002 Solution Case Study Daniel Kogan Program Manager Microsoft CMS / E-Biz server Group.
Business Analysis and Essential Competencies
Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved. Decision Support Systems Chapter 10.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 7 Slide 1 Requirements Engineering Processes.
I Information Systems Technology Ross Malaga 4 "Part I Understanding Information Systems Technology" Copyright © 2005 Prentice Hall, Inc. 4-1 DATABASE.
Lecturer: Gareth Jones. How does a relational database organise data? What are the principles of a database management system? What are the principal.
Data warehousing and online analytical processing- Ref Chap 4) By Asst Prof. Muhammad Amir Alam.
Data Warehousing Data Mining Privacy. Reading Bhavani Thuraisingham, Murat Kantarcioglu, and Srinivasan Iyer Extended RBAC-design and implementation.
C6 Databases. 2 Traditional file environment Data Redundancy and Inconsistency: –Data redundancy: The presence of duplicate data in multiple data files.
5 - 1 Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved.
Chapter 1 Foundations of Information Systems in Business.
Microsoft Reseach, CambridgeBrendan Murphy. Measuring System Behaviour in the field Brendan Murphy Microsoft Research Cambridge.
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
By N.Gopinath AP/CSE. There are 5 categories of Decision support tools, They are; 1. Reporting 2. Managed Query 3. Executive Information Systems 4. OLAP.
Chapter 1 An Introduction to Information Systems
NIST BIG DATA WG Reference Architecture Subgroup Draft Co-chairs: Orit Levin (Microsoft) James Ketner (AT&T) Don Krapohl (Augmented Intelligence) August.
Cryptography and Network Security Sixth Edition by William Stallings.
IT and Network Organization Ecommerce. IT and Network Organization OPTIMIZING INTERNAL COLLABORATIONS IN NETWORK ORGANIZATIONS.
© 2014 The MITRE Corporation. All rights Reserved. Roger Westman Principal Information Security Engineer September 29, 2014 Authorization.
Unit 17: SDLC. Systems Development Life Cycle Five Major Phases Plus Documentation throughout Plus Evaluation…
Data Resource Management Agenda What types of data are stored by organizations? How are different types of data stored? What are the potential problems.
Search Engine Optimization © HiTech Institute. All rights reserved. Slide 1 Click to edit Master title style What is Business Analysis Body of Knowledge?
Foundations of Information Systems in Business
Richard F. Eng PRINCE2, PMP, CSQE, CRE, CQE, SAFe Agilist Applying Machine Learning Techniques to Improve.
Software as a Service (SaaS) Fredrick Dande, MBA, PMP.
If you have a transaction processing system, John Meisenbacher
BIG DATA. The information and the ability to store, analyze, and predict based on that information that is delivering a competitive advantage.
Maintaining and Updating Windows Server 2008 Lesson 8.
CMMI Certification - By Global Certification Consultancy.
Managing Data Resources File Organization and databases for business information systems.
Unlock your Big Data with Analytics and BI on Office365 Brian Culver ● SharePoint Fest Seattle● BI102 ● August 18-20, 2015.
Big Data and Official Statistics: Philippine Context Erniel B. Barrios.
Serve as Director Funded by the Louisiana Department of Transportation and Development Developed LaCrash application to electronically capture crash.
Data Analytics 1 - THE HISTORY AND CONCEPTS OF DATA ANALYTICS
INFORMATION SYSTEM CATEGORIES
Fundamentals of Information Systems
7/23/ :49 AM © 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Data Warehouse.
Cloudy with a Chance of Data
Basic Concepts in Data Management
Microsoft Services Provider License Agreement Program reference card
Big DATA.
Presentation transcript:

Approved for Public Release; Distribution Unlimited Richard F. Eng PRINCE2, PMP, CSQE, CRE, CQE 14 March 2014 Applying Process Mining to IT Big Data The author’s affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE’s concurrence with, or support for, the positions, opinions, or viewpoints expressed by the author. © 2014 The MITRE Corporation. All rights reserved.

 Biography  Big Data Definitions  Process Mining  Quantitative Analysis Stages and Steps  Applying Process Mining  Lessons Learned | 2 || 2 | Agenda © 2014 The MITRE Corporation. All rights reserved.

Biography  Principal Software Systems Engineer at the MITRE Corporation  Over 20 years of industry experience in telecommunications and software systems  Areas of interests –Solving systems level problems –Applying quantitative methods to improve business, IT processes, and software quality  Education: –M.S. in Data Analytics, University of Maryland (Expected 2015) –MBA, Georgetown University –M.S. in Bioengineering, Brooklyn Polytechnic Institute –B.S. in Chemistry, Brooklyn Polytechnic Institute | 3 || 3 | © 2014 The MITRE Corporation. All rights reserved.

Big data is data which “exceed(s) the capacity or capability of current or conventional methods and systems.” In other words, the notion of “big” is relative to the current standard of computation. The National Institute of Standards and Technology “… the increasing size of data, the increasing rate at which it is produced and the increasing range of formats and representations employed. This report predated the term “big data” but proposed a three-fold definition encompassing the “three Vs”: Volume, Velocity and Variety. This idea has since become popular and sometimes includes a fourth V: Veracity, to cover questions of trust and uncertainty.” Gartner. In 2001, a Meta (now Gartner) report | 4 || 4 | Big Data Definitions © 2014 The MITRE Corporation. All rights reserved.

“Big data is the term increasingly used to describe the process of applying serious computing power—the latest in machine learning and artificial intelligence — to seriously massive and often highly complex sets of information.” Microsoft “Big data opportunities emerge in organizations generating a median of 300 terabytes of data a week. The most common forms of data analyzed in this way are business transactions stored in relational databases, followed by documents, , sensor data, blogs, and social media. Intel The Method for an Integrated Knowledge Environment open-source project. The MIKE project argues that big data is not a function of the size of a data set but its complexity. Consequently, it is the high degree of permutations and interactions within a data set that defines big data. | 5 || 5 | Big Data Definitions (continued) © 2014 The MITRE Corporation. All rights reserved.

 Data analytics technique  System agnostic  Examines large amounts of process data  Provides the ability to –Evaluate and understand actual process flows –Compare actual against expected process flows from a  Policy, procedures, and/or Information Technology perspective  Impact of visualizing actual process flows –Detection of process patterns –Identification of process inefficiencies –Identification of anomalous behavior | 6 || 6 | Process Mining © 2014 The MITRE Corporation. All rights reserved.

Frame the Problem Problem recognition Review previous results Solving the Problem Modeling and variable selection Data collection Data analysis Communicating and Acting On Results Results presentation and action | 7 || 7 | Quantitative Analysis 3 Stages and 6 Steps Source: Keeping up with the Quants: Your Guide to Understanding and Using Analytics – Davenport, T. and Kim, J. (2013). © 2014 The MITRE Corporation. All rights reserved.

 Problem recognition –Figure out where the problems are with the SYSTEM & fix them fast –E-Commerce site experiencing significant operational availability issues  Site performance degrades and crashes as more customers access services  No current documentation (e.g., Systems requirements, design specifications, applications software, test procedures, test scripts, etc.) –We were Agile. We didn’t have time to …  Review of previous results –Ad hoc problem solving huddles  It’s not my system. It must be your …  Try this, it should/might fix it … –No tracing of business processes to software applications to IT infrastructure –Manual review of system logs | 8 || 8 | Frame the Problem “Furious activity is no substitute for understanding.” - H. H. Williams © 2014 The MITRE Corporation. All rights reserved.

 Modeling and variable selection –Apply process mining to model end user transactions through the E- commerce site –Variables needed  Unique Transaction ID, Start and Stop Time Stamp, and Activity  Data collection –Gain access to system log files  Normal operations  During outages and low operational availability  Data analysis –Run process model –Review results –Provide feedback to stakeholders that own the processes, application software, and IT infrastructure | 9 || 9 | Solving the Problem © 2014 The MITRE Corporation. All rights reserved.

 Results presentation and action –Demonstrate process mining results to stakeholders –Capture process mining visualization as a video and to stakeholders so they can see the process bottlenecks and deviations –Explain the findings | 10 | Communicating and Acting on Results © 2014 The MITRE Corporation. All rights reserved.

 Modeling and variable selection –Process mining  Visualize normal and anomalous operations –Transaction process flows –Application process flows –IT Infrastructure process flows –Variables needed  Unique Transaction ID  Start and End Time Stamps  Business Process  Application Software  IT Infrastructure  IP Address | 11 | Applying Process Mining to IT Big Data © 2014 The MITRE Corporation. All rights reserved.

 Data collection –Talk with system administrators to capture log data –Explain the data fields and formats you want –Transform the log files into the format that the process mining system needs  Data analysis –Load the data into the process mining model –Run the process mining models –Visualize and analyze data  Transaction business process flows  Application software process flows  IT infrastructure process flows –Review the results | 12 | Applying Process Mining to IT Big Data (Continued) Data Collection Transform Data Run the Model & Data Analysis © 2014 The MITRE Corporation. All rights reserved.

 Communicate results –Provide feedback to stakeholders  Process owners  Application software team  IT Infrastructure team  Benefits –Visualization of the actual process flows –Comparison of normal and anomalous operations - rather than manually reviewing individual system logs –More rapidly identify spurious processes | 13 | Applying Process Mining to IT Big Data (Continued) © 2014 The MITRE Corporation. All rights reserved.

| 14 | Illustration of Process Mining Data DUMMY DATA © 2014 The MITRE Corporation. All rights reserved.

 Transaction Business Process Flow  Application Software Process Flow  IT Infrastructure and IP Address Process Flow | 15 | Process Mining Demo (Using DUMMY DATA) © 2014 The MITRE Corporation. All rights reserved.

 Good news –Process mining concept works  Bad news –Not all of the IT infrastructure was instrumented to collect log data –In some cases the log data was aggregate rather than per transaction –Some logs lacked unique transaction IDs –System time was not synchronized across the IT infrastructure –IaaS provider never contracted to provide system log data  Parting thoughts –Make sure the network and system time for all of your infrastructure are synchronized –Require all XaaS providers to provide system log data –Plan for process mining at the start of the project | 16 | Lessons Learned © 2014 The MITRE Corporation. All rights reserved.

 “The most difficult subjects can be explained to the most slow- witted man if he has not formed any idea of them already; but the simplest thing cannot be made clear to the most intelligent man if he is firmly persuaded that he knows already, without a shadow of doubt, what is laid before him.” – Leo Tolstoy | 17 | Should You Consider Process Mining? © 2014 The MITRE Corporation. All rights reserved.

 Richard F. Eng | 18 | Contact Information © 2014 The MITRE Corporation. All rights reserved.