DATA MINING. Team #1: Kristen Durst, Mark Gillespie, Banan Mandura. University of Dayton, MBA 664, 13 APR 09.


1 DATA MINING
Team #1: Kristen Durst, Mark Gillespie, Banan Mandura
University of Dayton, MBA 664, 13 APR 09

2 Data Mining: Outline
Introduction
Applications / Issues
Products
Process
Techniques
Example

3 Introduction
Data Mining Definition
– Analysis of large amounts of digital data
– Identify unknown patterns, relationships
– Draw conclusions AND predict future
Data Mining Growth
– Increase in computer processing speed
– Decrease in cost of data storage

4 Introduction
High Level Process
– Summarize the Data
– Generate Predictive Model
– Verify the Model
Analyst Must Understand
– The business
– Data and its origins
– Analysis methods and results
– Value provided

5 Applications / Issues
Applications
– Telecommunications: cell phone contract turnover
– Credit Card: fraud identification
– Finance: corporate performance
– Retail: targeting products to customers
Legal and Ethical Issues
– Aggregation of data to track individual behavior

6 Data Mining Products
Angoss Software (www.angoss.com)
– Knowledge Seeker/Studio
– Strategy Builder
Infor Global Solutions (www.infor.com)
– Infor CRM Epiphany
Portrait Software (www.portraitsoftware.com)
SAS Institute (www.sas.com)
– SAS Enterprise Miner
– SAS Analytics
SPSS Inc (www.spss.com)
– Clementine

7 Angoss Knowledge Studio

8 SAS Institute

9 SPSS Inc.

10 Data Mining Process
No uniformly accepted practice
2002 www.KDnuggets.com survey
– SPSS CRISP-DM
– SAS SEMMA

11 Data Mining Process
SPSS CRISP-DM
– CRoss Industry Standard Process for Data Mining
– Consortium: Daimler-Chrysler, SPSS, NCR
– Hierarchical Process
– Cyclical and Iterative

12 Data Mining Process: CRISP-DM

13 Data Mining Process
SAS SEMMA
– Model development is focus
– User defines problem, conditions data outside SEMMA
Sample: portion data, statistically
Explore: view, plot, subgroup
Modify: select, transform, update
Model: fit data, any technique
Assess: evaluate for usefulness

14 Data Mining Process
Common Steps in Any DM Process
1. Problem Definition
2. Data Collection
3. Data Review
4. Data Conditioning
5. Model Building
6. Model Evaluation
7. Documentation / Deployment

15 Data Mining Techniques
Statistical Methods (Sample Statistics, Linear Regression)
Nearest Neighbor Prediction
Neural Network
Clustering/Segmenting
Decision Tree

16 Statistical Methods
Sample Statistics
– Quick look at the data
– Ex: Minimum, Maximum, Mean, Median, Variance
Linear Regression
– Easy to apply and works well for simple problems
– A harder problem may need a more complex model built with a different method
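The sample statistics named on this slide can be computed with Python's standard `statistics` module. This is a minimal sketch; the `summarize` helper and the `lead_times` values are hypothetical and invented for illustration, not taken from the presentation.

```python
import statistics


def summarize(values):
    """Return the quick-look statistics listed on the slide."""
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample variance (divides by n - 1), not population variance.
        "variance": statistics.variance(values),
    }


# Hypothetical lead times in days (illustrative data only).
lead_times = [4, 7, 9, 5, 10]
summary = summarize(lead_times)
print(summary)
```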

17 Example: Linear Regression
[Figure; axis labels: Customer Income, Total Purchase Amount]
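A simple linear regression of purchase amount on income could be sketched as below, using ordinary least squares for a single predictor. The function names and all data values are assumptions made for illustration; the slide's actual data is not available in the transcript.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted y for a new x on the fitted line."""
    return intercept + slope * x


# Hypothetical customers: income (thousands) and total purchase amount.
incomes = [30, 40, 50, 60, 70]
purchases = [200, 260, 310, 370, 410]

intercept, slope = fit_line(incomes, purchases)
print(intercept, slope, predict(intercept, slope, 55))
```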

18 Nearest Neighbor Prediction
Easy to understand
Used for predicting
Works best with few predictor variables
Based on the idea that an item will behave the way the items “near” it behave
Can also show level of confidence in prediction

19 Example: Nearest Neighbor
[Figure: Product Sales by Population of City and Distance from Competitor; axes: Distance from Competitor, Population of City; points labeled A, B, and C, plus one point labeled U]
A: > 200 units
B: 100 – 200 units
C: < 100 units
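The nearest neighbor idea can be sketched as a k-nearest-neighbor vote, where the share of agreeing neighbors serves as a rough confidence level. The `cities` coordinates, the choice of k = 3, and the use of Euclidean distance are all assumptions for illustration; the slide does not give numeric positions, and in practice the two features would need to be put on comparable scales first.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote of its k nearest training points.

    `training` is a list of ((distance_from_competitor, population), label).
    Returns (label, confidence), where confidence is the vote share.
    """
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    neighbors = by_distance[:k]
    votes = Counter(label for _, label in neighbors)
    label, count = votes.most_common(1)[0]
    return label, count / len(neighbors)


# Hypothetical cities on an arbitrary 0-10 scale (illustrative only).
cities = [
    ((1.0, 9.0), "A"), ((2.0, 8.0), "A"), ((1.5, 8.5), "A"),
    ((5.0, 5.0), "B"), ((5.5, 4.5), "B"), ((6.0, 5.5), "B"),
    ((9.0, 1.0), "C"), ((8.5, 2.0), "C"),
]

# An unclassified city, like the point labeled U on the slide.
unknown_city = (2.5, 7.5)
print(knn_predict(cities, unknown_city, k=3))
```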

20 Neural Network
Contains input, hidden and output layers
Used when there is a large number of predictor variables
Model can be used again and again once confirmed successful
Can be hard to interpret
Extremely time consuming to format the data

21 Example: Neural Network
[Figure: network diagram; inputs: Population of City, Distance from Competitor; weights: W1 = .36, W2 = .64; output: Product Sales Prediction, 0.736]
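A single network unit combining two inputs with the weights shown on the slide might look like the sketch below. Several things here are assumptions: which weight belongs to which input, the sigmoid activation, the absence of a bias term, and the scaled input values. The transcript does not give the inputs that produced 0.736, so this sketch does not attempt to reproduce that number.

```python
import math


def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_sales_score(population, distance, w1=0.36, w2=0.64):
    """Weighted sum of the two inputs passed through a sigmoid.

    Pairing w1 with population and w2 with distance is an assumption;
    both inputs are expected to be scaled to a small range beforehand.
    """
    return sigmoid(w1 * population + w2 * distance)


# Hypothetical scaled inputs (illustrative only).
score = predict_sales_score(population=0.8, distance=0.5)
print(score)
```

A full network as described on the previous slide would chain several such units through a hidden layer and learn the weights from data rather than fixing them by hand.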

22 Clustering/Segmenting
Not used for prediction
Forms groups that are very similar or very different
Gives an overall view of the data
Can also be used to identify potential problems if there is an outlier
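One common way to form such groups is k-means clustering, sketched below in plain Python. The slides do not name a specific clustering algorithm, so k-means is a swapped-in example; the points, the starting centroids, and the fixed iteration count are likewise assumptions for illustration.

```python
import math


def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: math.dist(point, centroids[i]),
        )
        clusters[nearest].append(point)
    return clusters


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [
            # Mean of each coordinate; keep the old centroid if a cluster is empty.
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)


# Hypothetical records described by two dimensions (illustrative only).
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```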

23 Example: Clustering/Segmenting
[Figure: cluster plot; axes: Dimension A, Dimension B; groups: < 40 years, >= 40 years; legend: Red = Female, Blue = Male]

24 Decision Trees
Uses categorical variables
Determines which variable causes the greatest “split” in the data
Easy to interpret
Not much data formatting
Can be used for many different situations
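Finding the variable that causes the greatest split can be sketched as follows: for each categorical variable, group the rows by its values and measure how much the spread of the target shrinks within the groups. Using the reduction in sum of squared errors as the split measure is an assumption (the slides do not say which criterion was used), and the rows below are invented for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, target, candidates):
    """Return (variable, reduction) for the variable whose grouping
    removes the most squared error from the target."""
    total = sse([row[target] for row in rows])
    best = None
    for variable in candidates:
        groups = {}
        for row in rows:
            groups.setdefault(row[variable], []).append(row[target])
        within = sum(sse(group) for group in groups.values())
        reduction = total - within
        if best is None or reduction > best[1]:
            best = (variable, reduction)
    return best


# Hypothetical records (illustrative only).
rows = [
    {"gender": "F", "body_type": "large", "change": 1.0},
    {"gender": "M", "body_type": "large", "change": 1.2},
    {"gender": "F", "body_type": "small", "change": 0.4},
    {"gender": "M", "body_type": "small", "change": 0.5},
]

split = best_split(rows, "change", ["gender", "body_type"])
print(split)
```

A full tree would repeat this search inside each resulting group until the groups become too small or too uniform to split further.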

25 Example: Decision Trees
[Figure: decision tree for change from original score; splits on Baseline (< 3.75, >= 3.75), gender (F, M), and body type (Large, Small)]
Node values as extracted: .14 (n = 115); .58 (n = 67); -.46 (n = 48); -.63 (n = 24); -.29 (n = 24); .76 (n = 51); .47 (n = 28); 1.11 (n = 23)

26 Data Mining Example
1. Problem Definition
Improve On-Time Delivery of New Products

27 Data Mining Example
2. Collect Data
Brainstorm Variation Sources
Data Collection Plan

28 Data Mining Example
3. Data Review
Data Segments
TOTAL LEAD TIME by Part Type: p < .05
Level      N    Mean    StDev
BRACKET    520  x6.76   x3.14
DUCT       138  x6.70   x0.40
MANIFOLD   44   x9.95   x4.68
TUBE       47   x3.60   x2.79
Pooled StDev = 68.47
[Figure: interval plot for each level]
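The per-level columns (N, Mean, StDev) and the pooled standard deviation in a table like this can be computed as sketched below. The pooled value uses the usual formula, the square root of the sum of (n_i - 1) * s_i^2 over the sum of (n_i - 1). The lead time values are invented for illustration, since the slide's underlying data is not in the transcript and some of its digits are masked.

```python
import math
import statistics


def level_summary(groups):
    """Return {level: (N, mean, sample standard deviation)}."""
    return {
        level: (len(values), statistics.mean(values), statistics.stdev(values))
        for level, values in groups.items()
    }


def pooled_stdev(groups):
    """Pooled standard deviation across all levels."""
    numerator = sum(
        (len(values) - 1) * statistics.variance(values)
        for values in groups.values()
    )
    degrees_of_freedom = sum(len(values) - 1 for values in groups.values())
    return math.sqrt(numerator / degrees_of_freedom)


# Hypothetical total lead times by part type (illustrative only).
lead_time_by_part = {
    "BRACKET": [60, 70, 80],
    "TUBE": [30, 40, 50],
}

table = level_summary(lead_time_by_part)
pooled = pooled_stdev(lead_time_by_part)
print(table)
print(pooled)
```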

29 Data Mining Example
5. Build Model

30 Data Mining Example
5. Build Model
SHIP-DUE = 7.97 + 0.269*(MODEL_CR-DUE) + 0.173*(CR-ISS) + 0.704*(MAN_BOMC) + 0.748*(SCH_ST-MAN) + 0.862*(MOS_MOFIN)
[R^2A 4.4%] – {R^2A(1) 76.5%, R^2A(2) 68.0%}
Combined Model: 2 separate regressions
Design and Manufacturing – combined through a common term
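The fitted equation on this slide can be evaluated for a given order as sketched below. The intercept, coefficients, and term names come from the slide; the function name, the dictionary layout, and the input values are hypothetical, and the units of each term are not stated in the transcript.

```python
INTERCEPT = 7.97

# Coefficients as given on the slide, keyed by the slide's term names.
COEFFICIENTS = {
    "MODEL_CR-DUE": 0.269,
    "CR-ISS": 0.173,
    "MAN_BOMC": 0.704,
    "SCH_ST-MAN": 0.748,
    "MOS_MOFIN": 0.862,
}


def predict_ship_minus_due(terms):
    """Evaluate SHIP-DUE for a dict holding a value for every model term."""
    return INTERCEPT + sum(
        coefficient * terms[name]
        for name, coefficient in COEFFICIENTS.items()
    )


# Hypothetical input values for one order (illustrative only).
example_order = {
    "MODEL_CR-DUE": -20.0,
    "CR-ISS": 5.0,
    "MAN_BOMC": 3.0,
    "SCH_ST-MAN": 2.0,
    "MOS_MOFIN": 4.0,
}

print(predict_ship_minus_due(example_order))
```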

31 Data Mining Example
6. Model Evaluation
Model Accurately Reflects Delivery Distribution

32 Data Mining Example
7. Document / Deploy
Design Release Required for On Time Delivery Due Date

33 Data Mining Example
7. Document / Deploy
Update Planning and Automate Tracking
Requirements Plan Actual

34 Data Mining
Questions?

