Published by Pierce Nash. Modified over 9 years ago.
1
DATA MINING
Team #1: Kristen Durst, Mark Gillespie, Banan Mandura
University of Dayton, MBA 664
13 APR 09
2
Data Mining: Outline
Introduction
Applications / Issues
Products
Process
Techniques
Example
3
Introduction
Data Mining Definition
– Analysis of large amounts of digital data
– Identify unknown patterns, relationships
– Draw conclusions AND predict future
Data Mining Growth
– Increase in computer processing speed
– Decrease in cost of data storage
4
Introduction
High Level Process
– Summarize the Data
– Generate Predictive Model
– Verify the Model
Analyst Must Understand
– The business
– Data and its origins
– Analysis methods and results
– Value provided
5
Applications / Issues
Applications
– Telecommunications: cell phone contract turnover
– Credit Card: fraud identification
– Finance: corporate performance
– Retail: targeting products to customers
Legal and Ethical Issues
– Aggregation of data to track individual behavior
6
Data Mining Products
Angoss Software (www.angoss.com)
– Knowledge Seeker/Studio
– Strategy Builder
Infor Global Solutions (www.infor.com)
– Infor CRM Epiphany
Portrait Software (www.portraitsoftware.com)
SAS Institute (www.sas.com)
– SAS Enterprise Miner
– SAS Analytics
SPSS Inc (www.spss.com)
– Clementine
7
Angoss Knowledge Studio
8
SAS Institute
9
SPSS Inc.
10
Data Mining Process
No uniformly accepted practice
2002 www.KDnuggets.com survey
– SPSS CRISP-DM
– SAS SEMMA
11
Data Mining Process
SPSS CRISP-DM
– CRoss-Industry Standard Process for Data Mining
– Consortium: Daimler-Chrysler, SPSS, NCR
– Hierarchical Process
– Cyclical and Iterative
12
Data Mining Process
CRISP-DM
13
Data Mining Process
SAS SEMMA
– Model development is focus
– User defines problem, conditions data outside SEMMA
Sample – portion data, statistically
Explore – view, plot, subgroup
Modify – select, transform, update
Model – fit data, any technique
Assess – evaluate for usefulness
14
Data Mining Process
Common Steps in Any DM Process
– 1. Problem Definition
– 2. Data Collection
– 3. Data Review
– 4. Data Conditioning
– 5. Model Building
– 6. Model Evaluation
– 7. Documentation / Deployment
15
Data Mining Techniques
Statistical Methods (Sample Statistics, Linear Regression)
Nearest Neighbor Prediction
Neural Network
Clustering/Segmenting
Decision Tree
16
Statistical Methods
Sample Statistics
– Quick look at the data
– Ex: Minimum, Maximum, Mean, Median, Variance
Linear Regression
– Easy to apply and works well on simple problems
– Harder problems may need a more complex model built with a different method
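The sample statistics named on this slide can be computed in a few lines. The sketch below uses Python's standard `statistics` module; the `lead_times` values are hypothetical and are not taken from the presentation.

```python
import statistics

# Hypothetical lead times in days (illustrative only, not from the slides).
lead_times = [12, 15, 9, 22, 15, 18, 30]

summary = {
    "min": min(lead_times),
    "max": max(lead_times),
    "mean": statistics.mean(lead_times),
    "median": statistics.median(lead_times),
    # statistics.variance is the sample variance (divides by n - 1).
    "variance": statistics.variance(lead_times),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A quick look like this is usually enough to spot impossible values or a badly skewed variable before any model is built.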
17
Example: Linear Regression
[Plot; axes: Customer Income, Total Purchase Amount]
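A minimal sketch of a one-variable linear regression like the one plotted here, assuming customer income is the predictor and total purchase amount is the response. The function name `fit_line` and all data values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: income and purchase amount, both in thousands.
incomes = [30, 45, 60, 75, 90]
purchases = [1.1, 1.6, 2.2, 2.6, 3.3]

intercept, slope = fit_line(incomes, purchases)
predicted = intercept + slope * 50  # predicted purchase for an income of 50
print(f"purchase = {intercept:.3f} + {slope:.4f} * income")
print(f"predicted purchase at income 50: {predicted:.2f}")
```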
18
Nearest Neighbor Prediction
Easy to understand
Used for predicting
Works best with few predictor variables
Based on the idea that an item will behave the way the items “near” it behave
Can also show level of confidence in prediction
19
Example: Nearest Neighbor
[Plot: Product Sales by Population of City and Distance from Competitor; axes: Population of City, Distance from Competitor; points labeled A, B, and C, with one point marked U]
A: > 200 units
B: 100 – 200 units
C: < 100 units
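One way to sketch the nearest neighbor idea in code: classify a new city by a majority vote among its k closest known cities, and report the share of votes as a rough confidence level. The function `knn_predict`, the choice of k = 3, and the data points are assumptions for illustration. Both features are assumed to be already scaled to the range 0 to 1 so that neither dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of ((population, distance), label) pairs.

    Returns the majority label among the k nearest points and the
    fraction of those k points that voted for it.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k


# Hypothetical cities: (scaled population, scaled distance) and sales class.
cities = [
    ((0.90, 0.80), "A"), ((0.80, 0.90), "A"), ((0.85, 0.70), "A"),
    ((0.50, 0.50), "B"), ((0.45, 0.60), "B"),
    ((0.10, 0.20), "C"), ((0.20, 0.10), "C"),
]

label, confidence = knn_predict(cities, (0.6, 0.6), k=3)
print(label, round(confidence, 2))
```

A point surrounded only by one class gets a confidence of 1.0; a point near a class boundary gets a lower value.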
20
Neural Network
Contains input, hidden and output layers
Used when there are a large number of predictor variables
Model can be used again and again once confirmed successful
Can be hard to interpret
Extremely time consuming to format the data
21
Example: Neural Network
[Diagram: inputs Population of City and Distance from Competitor; weights W1 = .36, W2 = .64; output Product Sales Prediction; value shown: 0.736]
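The diagram can be read as a weighted sum of two inputs feeding an output unit. The sketch below shows that computation for a single unit with no hidden layer. The weights come from the slide; the input values, the zero bias, and the sigmoid activation are assumptions, since the slide does not state them. It is not meant to reproduce the 0.736 shown in the diagram.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


weights = [0.36, 0.64]  # W1 and W2 from the slide
inputs = [0.5, 0.8]     # hypothetical scaled inputs, not from the slide

prediction = neuron(inputs, weights)
print(f"prediction: {prediction:.3f}")
```

A full network stacks many such units into hidden layers and learns the weights from training data, which is part of why the result can be hard to interpret.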
22
Clustering/Segmenting
Not used for prediction
Forms groups whose members are very similar to each other and very different from other groups
Gives an overall view of the data
Can also flag potential problems when a point is an outlier
23
Example: Clustering/Segmenting
[Plot; axes: Dimension A, Dimension B; groups: < 40 years, >= 40 years; Red = Female, Blue = Male]
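The slides do not name a clustering algorithm. As one common choice, here is a minimal k-means sketch on two dimensions; the function `kmeans`, the starting centroids, and the data points are all hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers plotted on Dimension A and Dimension B.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centroids, final_clusters = kmeans(points, [(0, 0), (10, 10)])

for centroid, members in zip(final_centroids, final_clusters):
    print(centroid, members)
```

A point that sits far from every centroid is a candidate outlier worth checking by hand.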
24
Decision Trees
Uses categorical variables
Determines which variable causes the greatest “split” in the data
Easy to interpret
Not much data formatting
Can be used for many different situations
25
Example: Decision Trees
[Tree diagram: Change from original score]
Splits: Baseline < 3.75 / Baseline >= 3.75; F / M; Large body type / Small body type
Node values:
– .14, n = 115
– .58, n = 67
– -.46, n = 48
– -.63, n = 24
– -.29, n = 24
– .76, n = 51
– .47, n = 28
– 1.11, n = 23
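A sketch of how a tree might pick the variable that causes the greatest split, assuming a numeric target such as the change in score. For each categorical variable it measures how much the within-group squared error drops after splitting, then chooses the largest drop. The function names, the criterion, and the rows are illustrative assumptions; the slides do not say which criterion the original tool used.

```python
from collections import defaultdict


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_gain(rows, feature, target):
    """Reduction in squared error from grouping rows by one feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    total = sse([row[target] for row in rows])
    return total - sum(sse(group) for group in groups.values())


def best_split(rows, features, target):
    """Feature whose split reduces squared error the most."""
    return max(features, key=lambda f: split_gain(rows, f, target))


# Hypothetical records (not the data behind the slide).
rows = [
    {"baseline": "low", "gender": "F", "change": 0.9},
    {"baseline": "low", "gender": "M", "change": 0.7},
    {"baseline": "low", "gender": "F", "change": 0.8},
    {"baseline": "high", "gender": "M", "change": -0.4},
    {"baseline": "high", "gender": "F", "change": -0.6},
    {"baseline": "high", "gender": "M", "change": -0.3},
]

print(best_split(rows, ["baseline", "gender"], "change"))
```

Applying the same step again inside each resulting group builds the tree level by level.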
26
Data Mining Example
1. Problem Definition
Improve On-Time Delivery of New Products
27
Data Mining Example
2. Collect Data
Brainstorm Variation Sources
Data Collection Plan
28
Data Mining Example
3. Data Review
Data Segments
TOTAL LEAD TIME by Part Type: p < .05

Level       N     Mean    StDev   ----+---------+---------+---------+--
BRACKET     520   x6.76   x3.14   (--*-)
DUCT        138   x6.70   x0.40   (----*---)
MANIFOLD    44    x9.95   x4.68   (-------*-------)
TUBE        47    x3.60   x2.79   (------*-------)
                                  ----+---------+---------+---------+--
Pooled StDev = 68.47
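The pooled standard deviation at the bottom of this table combines the per-level standard deviations, weighting each by its degrees of freedom. A minimal sketch of that standard formula follows. The group sizes match the N column above, but the standard deviations are hypothetical placeholders because the leading digits in the table are masked, so the printed result is not expected to match 68.47.

```python
import math


def pooled_stdev(groups):
    """groups: list of (n, stdev) pairs, one per level.

    Pooled variance is the sum of (n - 1) * stdev^2 over all levels,
    divided by the total degrees of freedom.
    """
    numerator = sum((n - 1) * sd ** 2 for n, sd in groups)
    degrees_of_freedom = sum(n - 1 for n, _ in groups)
    return math.sqrt(numerator / degrees_of_freedom)


# N values from the table; standard deviations are HYPOTHETICAL.
groups = [(520, 60.0), (138, 70.0), (44, 80.0), (47, 90.0)]
print(round(pooled_stdev(groups), 2))
```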
29
Data Mining Example
5. Build Model
30
Data Mining Example
5. Build Model
SHIP-DUE = 7.97
    + 0.269*(MODEL_CR-DUE)
    + 0.173*(CR-ISS)
    + 0.704*(MAN_BOMC)
    + 0.748*(SCH_ST-MAN)
    + 0.862*(MOS_MOFIN)
[R^2A 4.4%] – {R^2A(1) 76.5%, R^2A(2) 68.0%}
Combined Model: 2 separate regressions
Design and Manufacturing – combined through a common term
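Once fitted, a model like this is applied by plugging measured intervals into the equation. The sketch below evaluates the combined equation using the intercept and coefficients from the slide. The function name `predict_ship_minus_due` and the input values are hypothetical, and the slide does not state the units of each term.

```python
# Intercept and coefficients copied from the slide's combined model.
INTERCEPT = 7.97
COEFFICIENTS = {
    "MODEL_CR-DUE": 0.269,
    "CR-ISS": 0.173,
    "MAN_BOMC": 0.704,
    "SCH_ST-MAN": 0.748,
    "MOS_MOFIN": 0.862,
}


def predict_ship_minus_due(intervals):
    """Evaluate SHIP-DUE for one part, given a value for every term."""
    return INTERCEPT + sum(
        coefficient * intervals[name]
        for name, coefficient in COEFFICIENTS.items()
    )


# Hypothetical interval measurements for a single part.
example_part = {
    "MODEL_CR-DUE": -20.0,
    "CR-ISS": 5.0,
    "MAN_BOMC": 3.0,
    "SCH_ST-MAN": 2.0,
    "MOS_MOFIN": 4.0,
}
print(round(predict_ship_minus_due(example_part), 2))
```

Reading the coefficients directly, each one-unit increase in a term changes the predicted SHIP-DUE by that term's coefficient, holding the others fixed.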
31
Data Mining Example
6. Model Evaluation
Model Accurately Reflects Delivery Distribution
32
Data Mining Example
7. Document / Deploy
Design Release Required for On Time Delivery Due Date
33
Data Mining Example
7. Document / Deploy
Update Planning and Automate Tracking
Requirements Plan Actual
34
Data Mining
Questions?