Published by Coral Morgan. Modified over 8 years ago.
1
Distributed Process Discovery From Large Event Logs
Sergio Hernández de Mesa {shernandez@unizar.es, s.hernandez.de.Mesa@tue.nl}
Eindhoven, The Netherlands, 12th March, 2015
2
Outline
- Distributed Process Discovery
- A Framework for Distributed Computing
- Summary and Future Work
4
Distributed Process Discovery: Big Data
5
Distributed Process Discovery: The 3 V's of Big Data
6
Distributed Process Discovery: Big Data and process discovery
[Figure labels: XES logs, CSV files, data streams; offline analysis, real-time analysis; MB, GB, TB]
7
Distributed Process Discovery: Actual problem
8
Distributed Process Discovery: Performance improvement opportunities
Distribute/parallelize process discovery techniques
- Inductive Miner
- Alpha Miner
- Heuristics Miner
- …
Take advantage of HPC infrastructures and parallel programming models
- Clusters, grids and clouds
- MapReduce
9
Distributed Process Discovery: Execution scenarios
- No computing resources: "Classical" ProM
- No computing resources but money: Amazon Elastic MapReduce
- Hadoop Cluster: MapReduce model
- HPC infrastructure: Distributed approach
10
Distributed Process Discovery: MapReduce and Hadoop
MapReduce
- Programming model for data-oriented applications
- Proposed by Google
- Map: (k1, v1) -> list(k2, v2)
- Reduce: (k2, list(v2)) -> list(v3)
Hadoop
- Software for reliable, scalable and distributed computing
- Developed by Apache
- Core components: Hadoop Distributed File System (HDFS), Hadoop MapReduce, Hadoop YARN
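The Map and Reduce signatures above can be illustrated with a small in-memory sketch. It is not Hadoop code: `run_mapreduce`, `map_trace` and `reduce_counts` are hypothetical names, and the two-trace log is made-up data, used only to show how key-value pairs flow from Map through grouping to Reduce when counting directly-follows pairs.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for a MapReduce runtime (illustration only)."""
    groups = defaultdict(list)
    for k1, v1 in records:
        # Map: (k1, v1) -> list(k2, v2)
        for k2, v2 in mapper(k1, v1):
            # Shuffle: group all values that share the same key k2
            groups[k2].append(v2)
    # Reduce: (k2, list(v2)) -> list(v3)
    return {k2: reducer(k2, values) for k2, values in groups.items()}


def map_trace(case_id, activities):
    """Emit one ((a, b), 1) pair for each directly-follows relation."""
    return [((a, b), 1) for a, b in zip(activities, activities[1:])]


def reduce_counts(pair, ones):
    """Sum the occurrences of one directly-follows pair."""
    return [sum(ones)]


# Hypothetical toy log: (case id, ordered list of activity names)
log = [("case1", ["a", "b", "c"]), ("case2", ["a", "b", "d"])]
dfg = run_mapreduce(log, map_trace, reduce_counts)
```

Here the intermediate key k2 is the activity pair and v3 is its total frequency; a real Hadoop job would distribute the same two functions over many nodes.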
11
Distributed Process Discovery: Motivational example: Inductive Miner
Step 1: XES log -> Directly-Follows Graph (DFG)
Step 2: DFG -> Process Model
Optimized version of Step 1
- Reading data as a stream (SAXParser)
- HashMaps to efficiently count frequencies
Example XES log: 100 million traces (40 activities)
- Size: 218 GB
- Step 1 (XES to DFG): ~2-3 hours
- Step 2 (DFG to Process Model): a few seconds
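A minimal sketch of the streaming idea in Step 1, written in Python rather than the Java SAXParser named on the slide. It assumes that each event carries its activity name in a `<string key="concept:name" .../>` child, as is usual in XES; `DfgHandler`, `xes_to_dfg` and the sample log are hypothetical and only illustrate the technique, not the author's implementation.

```python
import io
import xml.sax
from collections import Counter


class DfgHandler(xml.sax.ContentHandler):
    """Counts directly-follows pairs while the log streams through."""

    def __init__(self):
        super().__init__()
        self.dfg = Counter()      # hash map: (activity, next activity) -> count
        self._in_event = False
        self._previous = None     # last activity seen in the current trace

    def startElement(self, name, attrs):
        if name == "trace":
            self._previous = None  # pairs never span two traces
        elif name == "event":
            self._in_event = True
        elif (name == "string" and self._in_event
              and attrs.get("key") == "concept:name"):
            activity = attrs.get("value")
            if self._previous is not None:
                self.dfg[(self._previous, activity)] += 1
            self._previous = activity

    def endElement(self, name):
        if name == "event":
            self._in_event = False


def xes_to_dfg(stream):
    """Parse an XES stream once, without loading the whole log in memory."""
    handler = DfgHandler()
    xml.sax.parse(stream, handler)
    return handler.dfg


# Made-up two-trace log used only for this example
SAMPLE = b"""<log>
  <trace>
    <string key="concept:name" value="case1"/>
    <event><string key="concept:name" value="a"/></event>
    <event><string key="concept:name" value="b"/></event>
    <event><string key="concept:name" value="c"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="case2"/>
    <event><string key="concept:name" value="a"/></event>
    <event><string key="concept:name" value="b"/></event>
    <event><string key="concept:name" value="b"/></event>
  </trace>
</log>"""

dfg = xes_to_dfg(io.BytesIO(SAMPLE))
```

Memory use depends on the number of distinct activity pairs rather than on the log size, which is why the approach suits a log with many traces but only 40 activities.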
12
Distributed Process Discovery: Computing DFG: Hadoop/MapReduce approach
[Diagram: XES logs stored in HDFS (Hadoop Distributed File System) are divided into Block 1, Block 2, …, Block N in the split phase; MAP 1, MAP 2, …, MAP N produce DFG 1, DFG 2, …, DFG N; REDUCE combines them into the FINAL DFG]
13
Distributed Process Discovery: Computing DFG: Distributed/HPC approach
[Diagram: XES logs are divided into XES sublog 1, XES sublog 2, …, XES sublog N; XES2DFG (MAP) produces DFG 1, DFG 2, …, DFG N; REDUCE_DFGS combines them into the FINAL DFG]
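The sublog pipeline above can be sketched as follows. The function names mirror the XES2DFG and REDUCE_DFGS labels of the diagram, but their bodies are assumptions, as is `split_log`; the sketch also assumes that the split keeps every trace whole, so that no directly-follows pair is lost at a sublog boundary. The traces are made-up data.

```python
from collections import Counter
from functools import reduce


def split_log(traces, n):
    """Distribute whole traces round-robin over n sublogs (assumed strategy)."""
    return [traces[i::n] for i in range(n)]


def xes2dfg(sublog):
    """MAP step: compute the partial DFG of one sublog."""
    dfg = Counter()
    for trace in sublog:
        dfg.update(zip(trace, trace[1:]))
    return dfg


def reduce_dfgs(left, right):
    """REDUCE step: merge two partial DFGs by adding their edge counts."""
    return left + right


# Hypothetical log: each trace is an ordered list of activity names
traces = [["a", "b", "c"], ["a", "c"], ["a", "b", "b", "c"], ["b", "c"]]

partial_dfgs = [xes2dfg(sublog) for sublog in split_log(traces, 2)]
final_dfg = reduce(reduce_dfgs, partial_dfgs, Counter())
```

Because adding counts is associative and commutative, the partial DFGs can be computed on different nodes and merged in any order; in this sketch the list comprehension stands in for the distributed MAP phase.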
15
A Framework for Distributed Computing: Scientific computing
16
A Framework for Distributed Computing: Heterogeneous Execution Environments
17
A Framework for Distributed Computing: Challenges of scientific computing in HPC
- Strong coupling between applications and execution environments
- Lifecycle management
- Using multiple computing infrastructures
18
A Framework for Distributed Computing: Framework architecture
19
A Framework for Distributed Computing: Framework operation
[Diagram labels: Amazon EC2 Mediator, HERMES Mediator, Message bus, HERMES, Meta-scheduler, Fault Management, User application, JSDL]
- Message ✓
- Selecting a computing infrastructure
- Job execution ✘
- Selecting fault handling policy: resubmission, alternative infrastructure, or aborting job execution
- Job execution ✓
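The fault-handling flow on this slide can be sketched as a small retry loop. Everything here is hypothetical: `Policy`, `run_job`, the callback signatures and the attempt limit are assumptions chosen for illustration and do not describe the framework's real interfaces.

```python
from enum import Enum


class Policy(Enum):
    RESUBMIT = "resubmission"
    ALTERNATIVE = "alternative infrastructure"
    ABORT = "aborting job execution"


def run_job(job, infrastructures, execute, choose_policy, max_attempts=5):
    """Run a job, applying the chosen fault handling policy on each failure.

    Returns (infrastructure used, result) or raises RuntimeError.
    """
    candidates = list(infrastructures)
    current = candidates[0]            # selecting a computing infrastructure
    for attempt in range(1, max_attempts + 1):
        try:
            return current, execute(job, current)
        except Exception as error:     # job execution failed
            policy = choose_policy(error, attempt)
            if policy is Policy.ABORT:
                raise RuntimeError(f"job aborted on {current}") from error
            if policy is Policy.ALTERNATIVE:
                candidates = [c for c in candidates if c != current]
                if not candidates:
                    raise RuntimeError("no alternative infrastructure") from error
                current = candidates[0]
            # Policy.RESUBMIT: loop again on the same infrastructure
    raise RuntimeError("attempt limit reached")


# Made-up scenario: the first infrastructure always fails
calls = []


def flaky_execute(job, infrastructure):
    calls.append(infrastructure)
    if infrastructure == "cluster":
        raise ConnectionError("cluster unavailable")
    return f"{job} finished"


def simple_policy(error, attempt):
    # Resubmit once, then move to another infrastructure
    return Policy.RESUBMIT if attempt == 1 else Policy.ALTERNATIVE


used, result = run_job("dfg-job", ["cluster", "ec2"], flaky_execute, simple_policy)
```

Passing the policy in as a callback keeps the decision (resubmit, switch, abort) separate from the execution loop, which matches the slide's separation between job execution and fault management.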
21
Summary and Future Work: Summary
[Figure labels: Inductive Miner, Alpha Miner, Heuristics Miner, …]
22
Summary and Future Work: Execution scenarios
New ProM plugin
- No computing resources: "Classical" ProM
- No computing resources but money: Amazon Elastic MapReduce
- Hadoop Server: MapReduce model
- HPC infrastructure: Distributed approach
23
Summary and Future Work: Solution approach
24
Summary and Future Work: Conclusions
Process discovery from large event logs
- "Sequential" way: time-consuming
- Solution approach: MapReduce and distributed computing
Current state
- Code developed for the distributed computation of DFGs
- Setting up a Hadoop cluster
Future work
- Integration with the distributed computing framework
- Development of a ProM plugin