Published by Gabriella Jefferson
1
Exploring Data Mining Implementation
By Karim Hirji, IBM Canada
Chichang Jou, Tamkang University
2
Motivation
– Traditional statistical techniques could not scale to millions of records and thousands of variables
– Data mining emerged to address this scalability issue
– How should data mining be performed? This study (2001) reports information and experiences from applying the 5-stage data mining model proposed by Cabena et al.
3
5-Stage Data Mining Model (Cabena et al.)
1. Business objective determination
2. Data preparation
3. Data mining
4. Results analysis
5. Knowledge assimilation
4
Research Method
Case study (Benbasat et al.)
– Concerned with the larger question of developing a deeper understanding of “how” data mining should be done
– It was essential to find a company willing to participate in the study and to provide full access to the organization for the duration of the study
– Multiple data collection methods were used: archival records, documentation, interviews, and observation
5
The Participating Company
TAKCO is a mature North American fast-food retailer
Its Canadian headquarters is in Toronto
Interesting aspects of the fast-food industry:
– Consumer-driven
– Striving toward operational efficiency
– Extensive marketing analysis
6
Data Collection
Direct observation was the primary data collection method
– Comments were recorded and probed
– Notes were reviewed for content after each site visit
– Final data analysis was performed after qualitative data had been gathered from all site visits, structured by comparison with the Cabena model stage by stage
– 10 site visits in total, from July 1998 to November 1998
7
Members of the Data Mining Project
– A data mining specialist
– A project manager
– A senior director of strategic planning (the executive sponsor)
– A research supervisor
– A business analyst
– An end-user analyst
– A data architect
– A database administrator (DBA)
The executive sponsor and project manager decided that the entire team should be present during key project activities. Accordingly, the data mining activities were highly interactive.
8
Project Time Line
An enterprise-wide 30-gigabyte transaction data warehouse had been completed eight months earlier
Tool: IBM Intelligent Miner for Data
Functions: clustering, associations, prediction
9
Project Outcome
According to the executive sponsor:
– Not a failure: completed on time and within budget
– No completely new and unexpected results
10
First Visit
The executive sponsor and project manager discussed the final parameters of the DM project
– Candidate business problems were identified
– How much historical transaction data to mine
– A formal project number and budget were received
11
11 not to develop a production data mining application
12
Stage 1 (Business Objective Determination)
A workshop was held in September 1998 to identify the business problem to be mined
– Team members were introduced
– Roles and responsibilities were assigned
– A high-level project plan was developed
Extensive discussion of the original candidate business problems
– Immediate obstacle: ensuring that supporting data was readily available
– Input from the data architect and DBA was invaluable
2 of the 3 original business problems were replaced because of data issues
– The research supervisor played a dominant role in framing the business problems
13
Two Project Discontinuity Points in Stage 1
1. Anticipation
– Expectations about the potential to deliver novel and interesting findings
– Goal alignment was employed to provide focus and clarity
– Emphasis was placed on establishing, and reaching consensus on, realistic, measurable, and achievable business and project goals
– Agreed business goal: understanding the benefits of DM technology for enhancing business decision making
– Agreed project goal: demonstrating the potential of DM to provide new and valuable insights into a subset of existing production system data
14
Two Project Discontinuity Points in Stage 1
2. Anxiety/Apprehension
– Concerns about the nature of the data preparation stage and about potential bias and noise in the data set
– Given the data quality efforts already made in the data warehouse project, members were concerned about incorrect interpretation and improper transformation
– A data audit stage was added after data preparation to demonstrate the validity, reliability, consistency, completeness, and integrity of the resulting transformed data set
– This minimized the danger of automatically dismissing anomalous but potentially relevant DM results
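A data audit of this kind can be pictured as a set of rule checks run over the transformed data set. The following is a minimal Python sketch, not the study's actual procedure: the record layout (`Transaction` with `store_id`, `quantity`, `unit_price`, `amount`), the reference set of store IDs, and the tolerance are all hypothetical, chosen only to illustrate completeness, consistency, uniqueness, and referential-integrity checks.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record layout; field names are illustrative, not from the TAKCO study.
@dataclass
class Transaction:
    txn_id: int
    store_id: Optional[int]
    quantity: Optional[int]
    unit_price: Optional[float]
    amount: Optional[float]

def audit(transactions, known_store_ids, tol=0.01):
    """Return the offending txn_ids for each audit check."""
    findings = {"duplicate": [], "incomplete": [], "inconsistent": [], "orphaned": []}
    seen = set()
    for t in transactions:
        # Uniqueness: a transaction ID should appear only once.
        if t.txn_id in seen:
            findings["duplicate"].append(t.txn_id)
        seen.add(t.txn_id)
        # Completeness: every required field must be populated.
        if None in (t.store_id, t.quantity, t.unit_price, t.amount):
            findings["incomplete"].append(t.txn_id)
            continue  # remaining checks need all fields present
        # Consistency: line amount should equal quantity * unit price.
        if abs(t.quantity * t.unit_price - t.amount) > tol:
            findings["inconsistent"].append(t.txn_id)
        # Referential integrity: the store must exist in the reference table.
        if t.store_id not in known_store_ids:
            findings["orphaned"].append(t.txn_id)
    return findings

rows = [
    Transaction(1, 10, 2, 1.50, 3.00),    # clean
    Transaction(2, 10, 1, 2.00, 2.50),    # amount disagrees with quantity * price
    Transaction(3, 99, 1, 1.00, 1.00),    # unknown store
    Transaction(4, None, 1, 1.00, 1.00),  # missing store_id
    Transaction(1, 10, 2, 1.50, 3.00),    # repeated txn_id
]
report = audit(rows, known_store_ids={10, 11})
print(report)
```

Reporting findings rather than silently dropping rows keeps anomalous records visible to the team, which fits the stated aim of not dismissing anomalous but potentially relevant results.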
15
One Project Discontinuity Point in the Interactive Data Mining and Results Analysis Stage
3. Frustration
– OLAP was already used extensively to gain knowledge about product offerings and fast-food customer profiles
– Typical comment: “I already know that”
– Back-end data mining, involving data enrichment and additional DM algorithm runs, was introduced to increase the dimensionality of the data set with third-party demographic data
This proved effective in providing different and interesting analysis results
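The enrichment step amounts to a left join that appends external attributes to each record, so every record gains new columns (dimensions) for the mining algorithms to use. Below is a minimal Python sketch under stated assumptions: the join key `postal_code` and the demographic attribute names are hypothetical, since the slides do not say how the third-party data was matched to TAKCO's transactions.

```python
def enrich(records, demographics, key="postal_code"):
    """Left-join third-party demographic attributes onto each record.

    Assumption: records and demographics share a hypothetical join key
    (postal_code here). Unmatched records get None for every demographic
    attribute so that all rows end up with the same set of columns.
    """
    all_attrs = sorted({a for attrs in demographics.values() for a in attrs})
    enriched = []
    for rec in records:
        extra = demographics.get(rec.get(key), {})
        merged = dict(rec)  # copy so the original record is left unchanged
        for attr in all_attrs:
            merged[f"demo_{attr}"] = extra.get(attr)
        enriched.append(merged)
    return enriched

# Illustrative data only.
txns = [
    {"txn_id": 1, "postal_code": "M5V", "amount": 7.5},
    {"txn_id": 2, "postal_code": "X0A", "amount": 4.0},
]
demo = {"M5V": {"median_income": 85000, "avg_household_size": 1.9}}
rows = enrich(txns, demo)
for r in rows:
    print(r)
```

Each record grows from three to five attributes. Filling unmatched rows with `None` rather than omitting the columns keeps the data set rectangular, which most mining tools expect.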
16
Implications and Discussion
1. A DM project appears to follow a more elaborate set of stages than previously reported
2. Unlike in other work, data preparation in this study was not the most resource-intensive stage
3. Several important process aspects are relevant to the Interactive DM and Results Analysis stage:
– A DM briefing would have made the stage more efficient and effective, avoiding comments such as “I don’t understand what this means”
– The DM specialist worked as a facilitator
– Linking DM results with business strategy and using application software to perform sensitivity analysis
– Product combinations
– Importance of contextualizing the stage with business strategy
– Using a spreadsheet to perform sensitivity analysis