DATA ENVELOPMENT ANALYSIS: A Tool for Data Mining and Analytics
Joe Zhu, School of Business, Worcester Polytechnic Institute, Worcester, MA 01609

What is DEA?
• DEA was first published in 1978
• It is a non-parametric approach to estimating production functions
• Thus, we have multiple inputs and multiple outputs (of a production function)
• DEA tries to identify the efficient units

What is DEA exactly?
• More than a production efficiency estimate
• It is a balanced benchmarking tool (Sherman and Zhu, 2013) that enables companies to benchmark and locate best practices that are not visible through other commonly used management methodologies
• It helps executives study the top-performing units, identify best practices, and transfer that knowledge throughout the organization to enhance performance; it also lets them test assumptions that might be counter-productive

A tool for benchmarking
• If one benchmarks the performance of computers, it is natural to consider different features (screen size and resolution, memory size, processor speed, hard disk size, and others)
• One would then have to classify these features into “inputs” and “outputs” in order to apply a proper DEA analysis
• However, these features may not actually represent inputs and outputs at all, in the standard notion of production

DEA - revisit
• Multiple inputs: the smaller the better
• Multiple outputs: the larger the better
• This gives a rule for classifying metrics
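To make the classification rule concrete, here is a minimal sketch for the simplest possible case: one "smaller is better" metric treated as the input and one "larger is better" metric treated as the output. The laptops, their numbers, and the function name `relative_efficiency` are all invented for illustration; with several inputs and outputs, DEA solves a linear program per unit instead of taking a single ratio.

```python
# Hypothetical laptops: price is "the smaller the better" (input),
# memory is "the larger the better" (output). All numbers are made up.
laptops = {
    "A": {"price": 1000.0, "memory_gb": 16.0},
    "B": {"price": 800.0, "memory_gb": 8.0},
    "C": {"price": 1500.0, "memory_gb": 32.0},
}


def relative_efficiency(units, input_key, output_key):
    """Output-per-input ratio of each unit, scaled so the best unit scores 1."""
    ratios = {name: u[output_key] / u[input_key] for name, u in units.items()}
    best = max(ratios.values())
    return {name: r / best for name, r in ratios.items()}


scores = relative_efficiency(laptops, "price", "memory_gb")
for name, score in sorted(scores.items()):
    print(name, round(score, 3))
```

In this toy data set, laptop C delivers the most memory per unit of price, so it defines best practice and the other two are scored relative to it.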

DMU
• The definition of a DMU (decision making unit) is generic and flexible
• Numerous applications are found in areas of finance, marketing, transportation, sports, accounting, energy, sustainability, fishery, insurance and others

(Relative) Efficiency
• The term ‘efficiency’ here represents best practice
• Under general benchmarking, it does not necessarily mean ‘production efficiency’
• We may refer to the DEA score as a form of ‘overall performance’ of an organization
• An example: measuring the quality of care in the case of treating heart-attack patients
• Some measures can be used in DEA to yield a composite measure of quality indicators:
  - Patients Given Aspirin at Arrival
  - Patients Given Beta Blocker at Discharge
  - etc.

Mathematical Model (dual form)
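The formulas on this slide did not survive in the transcript. For reference, a standard input-oriented, constant-returns-to-scale DEA model (usually called the CCR model) and its dual can be written as below. The notation is an assumption introduced here, not taken from the slide: n DMUs indexed by j, m inputs x_{ij}, s outputs y_{rj}, and DMU_o is the unit under evaluation. The model on the original slide may differ.

```latex
% Multiplier form for the DMU under evaluation, DMU_o
\begin{align*}
\max_{u,v}\quad & \sum_{r=1}^{s} u_r y_{ro} \\
\text{s.t.}\quad & \sum_{i=1}^{m} v_i x_{io} = 1, \\
& \sum_{r=1}^{s} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} \le 0, \quad j = 1,\dots,n, \\
& u_r \ge 0,\; v_i \ge 0.
\end{align*}

% Dual (envelopment form)
\begin{align*}
\min_{\theta,\lambda}\quad & \theta \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, \quad i = 1,\dots,m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, \quad r = 1,\dots,s, \\
& \lambda_j \ge 0, \quad j = 1,\dots,n.
\end{align*}
```

A DMU with optimal value 1 lies on the best-practice frontier; a value below 1 says its inputs could in principle be scaled down by that factor while a combination of peer DMUs still matches its outputs.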

Business Analytics by Data Envelopment Analysis (DEA)
• Descriptive Analytics: gain insight from historical data
• Predictive Analytics: forecasting
• Prescriptive Analytics: recommend decisions using optimization, simulation, etc.
• Decisive Analytics: support human decisions with visual analytics

DATA ENVELOPMENT ANALYSIS
• DEA is a DATA ANALYSIS tool
• Data Mining and Knowledge Discovery by DEA
• More than Relative Efficiency

Sample Size
• DEA is not a form of regression model
• It is meaningless to apply a sample size requirement to DEA
• If there are too many performance metrics relative to the number of DMUs, a significant portion of the DMUs is likely to be benchmarked as best practice with an efficiency ratio of 1
• One can use certain DEA approaches to reduce the number of best-practice DMUs

Regression analysis

Numerous Models/Approaches
One modification to DEA is called stratification. Stratification results in many efficiency frontiers. The first represents all DMUs with the highest efficiency, and so on down each stratified level until all DMUs have been included.
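The peeling idea behind stratification can be sketched as follows. This is an illustration under stated assumptions, not the model used in the talk: to stay free of an LP solver it identifies each frontier with a simple dominance check (in the spirit of a free disposal hull) rather than a convex DEA model, so the levels it produces can differ from those of an LP-based DEA stratification. The DMU data and the names `dominates` and `stratify` are hypothetical.

```python
def dominates(a, b):
    """True if a uses no more of every input and yields no less of every
    output than b, and is strictly better on at least one metric."""
    no_worse = (all(x <= y for x, y in zip(a["inputs"], b["inputs"]))
                and all(x >= y for x, y in zip(a["outputs"], b["outputs"])))
    strictly = (any(x < y for x, y in zip(a["inputs"], b["inputs"]))
                or any(x > y for x, y in zip(a["outputs"], b["outputs"])))
    return no_worse and strictly


def stratify(dmus):
    """Peel off successive frontiers until every DMU belongs to a level."""
    remaining = dict(dmus)
    levels = []
    while remaining:
        # A DMU is on the current frontier if no remaining DMU dominates it.
        frontier = [
            name for name, unit in remaining.items()
            if not any(dominates(other, unit)
                       for other_name, other in remaining.items()
                       if other_name != name)
        ]
        levels.append(sorted(frontier))
        for name in frontier:
            del remaining[name]
    return levels


# Made-up DMUs with one input and one output each.
dmus = {
    "A": {"inputs": [2.0], "outputs": [4.0]},
    "B": {"inputs": [3.0], "outputs": [6.0]},
    "C": {"inputs": [3.0], "outputs": [5.0]},
    "D": {"inputs": [4.0], "outputs": [4.0]},
    "E": {"inputs": [5.0], "outputs": [3.0]},
}
levels = stratify(dmus)
print(levels)
```

Each pass removes the current best-practice set and re-evaluates the rest, which is why every DMU eventually lands on exactly one level.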

Network Structure

Ship Block Manufacturing Process Performance Evaluation

Shipbuilding process
Business & Service Computing Laboratory
• The main processes of shipbuilding consist of several work stages: Design, Cutting & Forming, Assembly, Pre-Outfitting & Painting, Pre-Erection, Erection, Quay
• For effective ship construction, a ship is divided into properly sized blocks in the design stage
• All blocks are manufactured (or assembled) into the body of a ship

Management of the block manufacturing process (BMP)
• Effective block manufacturing process (BMP) management has been regarded as one of the most important issues in the shipbuilding industry
• A large ship usually needs more than 250 different blocks, each manufactured through a different process according to the ship’s type and size
• Many blocks are assembled into a ship, and each block has a complex manufacturing process
• Thus, effective and efficient BMP performance enables a reduction of the overall shipbuilding period and thereby the cost
• For example, if any one block includes unnecessary work stages, the related inefficient resource assignment or long queuing times in the storage yard will have a negative effect on the overall shipbuilding period and productivity
• For effective management of BMP performance, a practical and accurate performance evaluation method that considers various factors reflecting real manufacturing processes and situations is crucial

Practical difficulties in evaluating BMP performance
• For effective BMP management, shipbuilding companies have implemented production information systems (e.g. BAMS (Block Assembly Monitoring System) or RPMS (Real-time Progress Management System)…)
• But these systems focus only on work scheduling, process monitoring and work automation
• There are at least two practical difficulties in evaluating BMP performance:
  1) There are many block assembly types (e.g. Sub-assembly, Unit-assembly, and Grand-assembly…), and each assembly type is in turn classified into one of three form types (e.g. Small, Curved, and Large…)
  2) There are discrepancies between actual and planned work in the form of time gaps due to various problems (e.g. work delay, urgent work, and the convergence of blocks at the end of the process…). Generally, there is a 5~9 day delay between planned work and performed work

Goal of this research
• This research addresses the above two practical difficulties in evaluating BMP performance
• It proposes an integrated systematic approach to evaluate the performance of BMP in the shipbuilding industry by integrating process mining (PM) and DEA
• Workflow: database in the shipbuilding company → data extraction and data pre-processing → generation of block manufacturing processes (process mining) → performance evaluation of BMP (DEA) → guideline for improving the performance of underperforming BMPs

Proposed method

Proposed method: generation of BMPs
• Extract sample log data from the database based on the defined attributes
• A BMP is generated as an operations flow from the extracted log data
• Consider block ID 101, which includes three operations: C1, G9 and S6
  - We arrange these operations by end time in ascending order
  - The sequence of operations C1 → S6 → G9 is the BMP of block ID 101
• The generated BMPs are then subjected to performance evaluation
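The generation step for block ID 101 can be sketched in a few lines. The log layout (block ID, operation, end time), the timestamps, and the second block are hypothetical; the end times for block 101 were chosen only so that the result matches the sequence stated on the slide.

```python
from collections import defaultdict

# Hypothetical log rows: (block_id, operation, end_time).
# Timestamps are invented; ISO-like strings sort chronologically.
log = [
    ("101", "C1", "2013-03-01 17:00"),
    ("101", "G9", "2013-03-09 12:00"),
    ("101", "S6", "2013-03-04 09:30"),
    ("102", "S2", "2013-03-02 08:00"),
    ("102", "C1", "2013-03-01 10:00"),
]


def generate_bmps(rows):
    """Group operations by block and order them by ascending end time."""
    by_block = defaultdict(list)
    for block_id, operation, end_time in rows:
        by_block[block_id].append((end_time, operation))
    return {
        block_id: [operation for _, operation in sorted(ops)]
        for block_id, ops in by_block.items()
    }


bmps = generate_bmps(log)
print(" -> ".join(bmps["101"]))
```

Sorting on the end time alone is enough here because each operation appears once per block; a real event log would likely need a tie-breaking rule for operations that finish at the same moment.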

Proposed method: block clustering
• Generated BMPs are heterogeneous since there are many kinds of BMPs
• For a more accurate performance evaluation, our intention is to evaluate homogeneous BMPs
• Therefore, we classify BMPs into several peer groups by their similarity
• The similarity of BMPs is measured by the similarity index, which is calculated from two vectors:
  - Task vector: based on the presence or absence of the same operations in two BMPs
  - Transition vector: based on the sequential relationship of the operations in two BMPs
• The task vector and transition vector take values from 0 to 1, with values closer to 1 indicating that two BMPs are more similar
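The slides do not give the formula for the similarity index, so the following is only one plausible reading, with every formula an assumption: the task component is taken as the Jaccard overlap of the operations in two BMPs, the transition component as the Jaccard overlap of their directly-follows pairs, and the index as an equally weighted average of the two. The function names and the `weight` parameter are hypothetical.

```python
def jaccard(a, b):
    """Share of common elements between two collections, from 0 to 1."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def transitions(bmp):
    """Directly-follows pairs of a BMP, e.g. [C1, S6, G9] -> (C1,S6), (S6,G9)."""
    return list(zip(bmp, bmp[1:]))


def similarity_index(bmp1, bmp2, weight=0.5):
    """Assumed combination: weighted average of task and transition overlap."""
    task_sim = jaccard(bmp1, bmp2)
    transition_sim = jaccard(transitions(bmp1), transitions(bmp2))
    return weight * task_sim + (1 - weight) * transition_sim


p = ["C1", "S6", "G9"]
q = ["C1", "G9", "S6"]  # same operations, different order
print(similarity_index(p, p), similarity_index(p, q))
```

Both components stay between 0 and 1, as the slide describes. Two BMPs with the same operations in a different order score high on the task component but low on the transition component, which is the distinction the two vectors are meant to capture.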

Performance evaluation
• Each BMP is regarded as a DMU, and only BMPs in the same group are considered for performance evaluation
• Due to the nature of our performance metrics, we use a DEA model, developed recently by Lim & Zhu (2013), where some performance metrics have target levels
• In our case, the performance metrics are selected based on the extracted log data. We conducted a questionnaire survey of 30 shipbuilding operating experts to obtain information on which factors are most critical to BMP performance

Case study from a Korean shipbuilding company
Condition of experiment
• Two projects’ event logs exported from a Block Assembly Monitoring System (BAMS) were used
• Eighty-six blocks are generated from the log data, which are then classified into six clusters
• In general, production planners assign the work resources and establish the production scheduling based on the block types defined by the empirical knowledge of shipbuilding operating experts. We refer to these defined block types in deciding the number of clusters

Case study
• Clustering results include the number of blocks and the process characteristics of each cluster
• We aggregate all BMPs in cluster C5 to show a concrete instance of the clustering result
• The aggregated model of all BMPs in C5 represents BMPs performed in work shop #2

Case study
• The performance metrics are calculated and their descriptive statistics are listed

Case study
• The evaluation results are summarized (average performance scores of BMPs; performance scores of BMPs in C5)
• Five blocks (1XXX_622, 2XXX_509, 2XXX_622, 2XXX_631, 2XXX_642) are determined as the best practice, whereas the remaining 14 blocks are underperforming
• In particular, 1XXX_110 and 2XXX_110 are the most underperforming blocks
• Most of the best-practice blocks have the same BMP: Comp 101-‘C’ → Grand 201-‘P’ → Grand 202-‘3’ → Grand 203-‘3’ → Grand 301-‘3’

Case study
• We analyze the underperforming BMPs (blocks 2XXX_110 and 1XXX_110) from the operations execution and resources utilization perspectives
• For the analysis of an underperforming block from the operations execution perspective, we compare the planned operations flow, which is managed by production schedulers, with the actual operations flow of block 2XXX_110
• The actual operations flows for all best-practice blocks are the same as the planned operations flow
• On the other hand, the actual operations flows for the underperforming BMPs are different from the planned operations flows
• Grand 201-‘3’ was chosen at the worker's discretion for its similar operation characteristics
• Grand 201-‘P’ and Grand 201-‘3’ have very similar operation characteristics, but the work shop and items for these are different
• As a result, block 2XXX_110 might have incurred a longer waiting time and execution time

Conclusion
• We proposed an integrated approach to BMP performance evaluation in the shipbuilding industry by using process mining (PM) and DEA
• Through application of the proposed approach, we verified its effectiveness and practicality
• Shipbuilding operations experts, moreover, agreed that the provided guidelines can be valuable in establishing additional strategies for improving the performance and productivity of block manufacturing
• It can be said that this research makes a constructive contribution to practical block performance evaluation in the shipbuilding industry


United Network for Organ Sharing (UNOS)
• Many variables and observations related to lung and heart transplants
• Need for fair and accurate predictions of survival time and quality of life
• The ability of medical professionals to accurately predict the best donor/recipient pairings may be flawed or biased
• Variables contributing towards accurate predictions may be many, complex, and have poorly understood relationships
• Reduction of large datasets is important

Dataset
• Data concerning donor/recipient for lung/heart transplants
• Over 400 variables and 100,000+ observations → BIG DATA ANALYTICS
• 24 variables chosen by Oztekin et al. [2]
• Cleaning reduces the data to 12,744 observations

Variable        Explanation             Variable type
Donor Age       Years                   Cont.
Recipient Age   Years                   Cont.
ABO_MAT         ABO match level         Ordinal
EINT            Ethnicity match level   Binary
GINT            Gender match level      Binary
GTIME           Graft survival time     Cont.
Etc…

DEANN Methodology
1. Variables are chosen according to contribution
2. Data is preprocessed using DEA
3. ANN is trained
4. Predictions
• Metrics are chosen according to importance, with no need to be few in number
• Preprocessing with DEA allows better training of the ANN
• ANN is applicable for “fuzzy” situations

DEANN Methodology
12,744 records

DEA Results
• Stratification yielded 12 efficiency levels
• Individual levels yielded a higher correlation between the recipient functional status and the input variables when compared to consideration of many (or all) levels
• The ANN is trained on one or more of these levels using ten-fold cross validation
• DEA allows efficient observations to be utilized so that outlying transplants do not result in poor training of the ANN
• DEANN allows the ANN to be trained from efficient data, which will result in accurate predictions and faster training time
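For readers unfamiliar with ten-fold cross validation, the splitting step can be sketched as below. This covers only how the records of one efficiency level might be divided into folds; the ANN itself and its training loop are omitted, and the level size of 25 records, the seed, and the name `k_fold_indices` are hypothetical.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists; each record is tested exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical efficiency level containing 25 records.
splits = list(k_fold_indices(25, k=10))
for train, test in splits:
    # A model would be fitted on `train` and scored on `test` here.
    print(len(train), len(test))
```

Each of the ten rounds holds out a different fold for evaluation and trains on the remaining nine, so every record in the level contributes to both training and testing.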