October 2-3, 2015, İSTANBUL Boğaziçi University Prof.Dr. M.Erdal Balaban Istanbul University Faculty of Business Administration Avcılar, Istanbul - TURKEY.

1 Project Management in Data Mining

2 PRESENTATION OUTLINE
- What is Data Mining?
- Data Mining Environment
- Decision Making Process
- CRISP-DM Methodology
- Phases of Data Mining Process
- Flowchart of Data Mining Process (Proposal)
- Conclusions
(2/17, October 2, 2015)

3 What is Data Mining?
- “Data mining is the process of discovering useful patterns and trends in large data sets.” (Larose, 2014)
- Data mining makes a difference in many areas, including health care, banking, finance, insurance, telecommunications, manufacturing, retail, market research, and the public sector.

4 Data Mining Environment
[Figure: Data Mining shown in relation to Database Technology, Statistics, Machine Learning, Information Science, Visualization, and Other Disciplines.]

5 Decision Making Process
DATA → INFORMATION → KNOWLEDGE → DECISIONS → ACTION

6 CRISP-DM Methodology
CRISP-DM (CRoss-Industry Standard Process for Data Mining; Shearer, 2000) focuses data mining on rapid model development and deployment to optimize decisions.

7 CRISP-DM  The Cross-Industry Standard Process for Data Mining (CRISP-DM) is the dominant data-mining process framework. It's an open standard; anyone may use it. The following list describes the various phases of the process. 7/17October 2, 2015

8 [Figure: Tasks (bold) and outputs (italic) of the CRISP-DM reference model]

9 Data Mining Phases (Proposal Flowchart)
[Figure: proposed flowchart running from Define Project, Data Gathering from Data Sources, Data Understanding & Data Selection, and Data Preprocessing (Data Preparation), through a "Supervised Learning?" decision (Yes: Training and Test Datasets with Classification Methods; No: Clustering Methods or Association Rules), to Selecting Algorithm & Model Building, Measuring Model Performance and Evaluation of Model Performance (Evaluate Model, with High/Low outcomes), Model Implementation, and Knowledge Representation & Decision. A "Crucial Phase!" marker highlights key steps.]

10 Planing for data mining project  Produce project plan: List the stages in the project, together with duration, resources required, and relations. Define the project Prepare data for data mining modeling Separate data into training and testing parts for performance evaluation Apply alternative algorithms to build model and evaluate the model’s performances Implement the model to generate knowledge and make a decision before action 10/17October 2, 2015
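The project plan above (stages with durations and dependencies) can be sketched as a forward-pass schedule, the first step of the critical path method. This is a minimal illustration under stated assumptions, not the author's procedure: the stage names, durations in days, and dependencies in `STAGES` are hypothetical, and resources are left out. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plan: stage -> (duration in days, list of predecessor stages).
STAGES = {
    "define_project": (5, []),
    "prepare_data": (15, ["define_project"]),
    "select_evaluation_method": (1, ["define_project"]),
    "split_data": (2, ["prepare_data", "select_evaluation_method"]),
    "build_and_evaluate_models": (10, ["split_data"]),
    "implement_model": (7, ["build_and_evaluate_models"]),
}


def schedule(stages):
    """Forward pass: earliest (start, finish) day for every stage."""
    graph = {name: set(preds) for name, (_, preds) in stages.items()}
    times = {}
    # static_order() yields each stage only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        duration, preds = stages[name]
        start = max((times[p][1] for p in preds), default=0)
        times[name] = (start, start + duration)
    return times


if __name__ == "__main__":
    plan = schedule(STAGES)
    for name, (start, finish) in plan.items():
        print(f"{name:26s} day {start:3d} -> day {finish:3d}")
    print("Estimated duration:", max(f for _, f in plan.values()), "days")
```

With these assumed numbers, selecting the evaluation method runs in parallel with data preparation, so data splitting cannot start before day 20. A fuller plan would also track resources and add a backward pass to find slack.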

11 Define project  Understand the project objectives and requirements on the first phase of data mining  List the assumptions made by the project and list the constrains on the project  Construct a cost-benefit analysis for the project 11/17October 2, 2015

12 Prepare data for data mining  Collect the data (or datasets),  Select data,  Explore data,  Clean the data,  Reformat data,  Transform data. 12/17October 2, 2015

13 Separate the dataset for performance evaluation
- Select the evaluation method:
  - Hold-out
  - Cross-validation (k-fold CV)
  - Bootstrapping
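The three evaluation methods differ in how record indices are assigned to training and test parts. A minimal sketch using only the standard library `random` module with a fixed seed for reproducibility; the 30% hold-out fraction and k = 5 are assumed defaults, not values from the presentation.

```python
import random


def hold_out(n, test_fraction=0.3, seed=0):
    """Hold-out: one random split into training and test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold(n, k=5, seed=0):
    """k-fold CV: each of the k folds serves once as the test part."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def bootstrap(n, seed=0):
    """Bootstrap: sample n indices with replacement; unsampled ones form the test part."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]  # out-of-bag records
    return train, test


if __name__ == "__main__":
    train, test = hold_out(10)
    print("hold-out:", len(train), "train /", len(test), "test")
    for fold, (train, test) in enumerate(k_fold(10, k=5)):
        print(f"fold {fold}: test indices {sorted(test)}")
    train, test = bootstrap(10)
    print("bootstrap: out-of-bag indices", test)
```

In the bootstrap version, records never drawn into the training sample (about 36.8% on average for large n) act as the test set.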

14 Apply alternative algorithms and select the best model
- There are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of the data.
- Classification algorithms:
  - k-Nearest Neighbour (kNN)
  - Naive Bayes
  - Logistic Regression
  - Decision Trees
  - Support Vector Machines
  - Artificial Neural Networks (ANNs)
- Clustering algorithms
- Association algorithms
- The generated models that meet the selected criteria become approved models.
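To illustrate selecting among candidate models, the sketch below implements kNN, one of the listed classification algorithms, and treats different values of k as candidates evaluated on a hold-out test set. The toy data, the candidate list, and the 0.8 accuracy threshold for "approved" models are hypothetical; in practice the candidates would usually be different algorithms and the criteria would come from the project definition.

```python
import math
from collections import Counter

# Hypothetical toy data: two classes in 2-D; (2, 3.5) is an atypical "B" point
# lying close to the "A" group.
TRAIN_X = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6), (7, 7), (2, 3.5)]
TRAIN_Y = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]
TEST_X = [(1.5, 1.5), (6.5, 6.5), (2, 3), (5, 5)]
TEST_Y = ["A", "B", "A", "B"]
APPROVAL_THRESHOLD = 0.8  # hypothetical selection criterion (test accuracy)


def knn_predict(train_x, train_y, x, k):
    """Majority label among the k training points closest to x (Euclidean)."""
    nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_x, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(k):
    hits = sum(knn_predict(TRAIN_X, TRAIN_Y, x, k) == y for x, y in zip(TEST_X, TEST_Y))
    return hits / len(TEST_Y)


def approved_models(candidate_ks):
    """Evaluate each candidate and keep those meeting the criterion."""
    results = {k: accuracy(k) for k in candidate_ks}
    approved = [k for k, acc in results.items() if acc >= APPROVAL_THRESHOLD]
    return results, approved


if __name__ == "__main__":
    results, approved = approved_models([1, 3, 5])
    print("accuracy by k:", results)
    print("approved k values:", approved)
```

On this toy data the atypical "B" point misleads k = 1 on the test point (2, 3), so only k = 3 and k = 5 pass the assumed threshold.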

15 Implement the model to make a decision  Creation of the model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data.  Apply the model within the organization’s decision making process and then activate. 15/17October 2, 2015

16 CONCLUSIONS
1. Data mining techniques are important for discovering knowledge that is meaningful and valuable for decision making.
2. A project management approach is important for successful data mining.
3. Each phase of the data mining process is important, but the most important phases are data preparation before modeling and evaluation of model performance after modeling. These crucial phases are usually disregarded or skipped in practice.
4. All phases and sub-operations should be planned and scheduled using project management methods for successful data mining.

17 Thank you very much for your attention.
Are there any questions or suggestions? mebalaban@gmail.com

