CRISP-DM Tommy Wei Cory Hutchinson ISDS 4180. Overview What is CRISP-DM (CRoss Industry Standard Process for Data Mining) Blueprint Phases and Tasks Summary.

Presentation transcript:

CRISP-DM
Tommy Wei, Cory Hutchinson
ISDS 4180

Overview
- What is CRISP-DM (CRoss Industry Standard Process for Data Mining)
- Blueprint
- Phases and Tasks
- Summary

CRISP-DM
- A guide or blueprint for how to conduct a data mining project
- Breaks down the life cycle of a data mining project into 6 phases
- Developed to give a standardized approach to data mining projects
- Intended to produce better, faster results from data mining

Why a Standard Process?
- There was a clear need for data mining, but no sense of direction as to how organizations should launch their own data mining projects
- Before CRISP-DM, data mining practice was very scattered
- A standard process encourages good habits and best practices
- It makes data mining reliable and repeatable, even for people with little data mining experience
- Monitoring and maintenance are easier

CRISP-DM Creation
- Created by 4 data mining veterans:
  - DaimlerChrysler
  - ISL
  - NCR
  - OHRA
- A special interest group (SIG) was created to develop a standard data mining process
  - An association of data mining enthusiasts came together, with input from a wide range of people
  - Data miners, data warehousing vendors, management consultants
- The group then refined and improved the model and ran live trials on data mining projects

CRISP-DM: Process Flow
- Data mining methodology
- For all businesses
- Complete outline
- Life cycle: 6 phases

CRISP-DM: 6 Phases
- Business understanding: understand the business objectives and goals, and how data mining can help achieve them
- Data understanding: start with a data set, increase familiarity with it, gain some insight and identify any data quality issues
- Data preparation: all activities needed to build the final data set that will be used by the modeling techniques
- Modeling: choose a modeling technique, create the test design, then build and test the model
- Evaluation: thoroughly evaluate the model and its results to see whether they meet the business objectives, and review the process
- Deployment: can be as simple as using the model to create a dashboard or report, or as broad as rolling out the data mining process across the entire organization

Phases and Tasks
- Business Understanding: Determine Business Objectives; Assess the Situation; Determine the Data Mining Goals; Produce a Project Plan
- Data Understanding: Collect the Initial Data; Describe the Data; Explore the Data; Verify Data Quality
- Data Preparation: Select Data; Clean Data; Construct Data; Integrate Data; Format Data
- Modeling: Select the Modeling Technique; Generate Test Design; Build the Model; Assess the Model
- Evaluation: Evaluate Results; Review Process; Determine Next Steps
- Deployment: Plan Deployment; Plan Monitoring and Maintenance; Produce Final Report

Phase 1: Business Understanding
Summary: Focuses on the project objectives and requirements from a business perspective, then converts that knowledge into a problem definition that can be solved with data mining. The result is a rough outline of what to do to achieve the objectives.

Phase 1: Business Understanding
- Determine the business objectives
  - Gain a deep understanding of what the client wants from a business perspective, and what they REALLY want accomplished
  - Understand any business-related questions associated with it
- Assess the situation
  - Build a more detailed understanding of the resources you need, as well as any constraints, potential obstacles and assumptions you might need to make
  - More specific details are worked out here
- Determine the data mining goals
  - Determine the data mining objectives that must be completed in order to achieve the business goal
  - Example business goal: increase our overall restaurant sales in the northeast and southeast regions of the US
  - Example data mining goal: predict how well people from those specific regions embrace our flavor of food, given data from several franchises over the past 3 years, demographic information, item prices, and other intangible factors such as culture, brand recognition, and reputation

Phase 1: Business Understanding
- Produce a project plan
  - Lay out the goals that the data miners need to reach in order to get closer to the business goals: what do the data miners have to achieve for the business goals to be met?
  - Example business goal: reduce the churn rate for our internet provider company
  - Example data mining goals:
    - Identify the characteristics of high-value customers based on the most recent 5 years of data
    - Identify which customers left after 1 year of service
    - Build a mathematical model (logistic regression) to determine which customers are most likely to leave within 3 years of service
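
The third churn goal names logistic regression, so a minimal sketch of what such a model could look like may help. Everything here is an assumption made for illustration: the tiny customer table is invented, the two features (months of service, support calls per month) and the function names are hypothetical, and the model is fitted by plain gradient descent in pure Python rather than by any particular data mining tool.

```python
import math

# Invented training data: (months_of_service, support_calls_per_month) -> churned (1) or stayed (0).
customers = [
    ((2.0, 5.0), 1), ((3.0, 4.0), 1), ((6.0, 4.0), 1), ((8.0, 3.0), 1),
    ((24.0, 1.0), 0), ((30.0, 0.0), 0), ((36.0, 1.0), 0), ((18.0, 2.0), 0),
]

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def scale(features):
    """Put both features on a similar scale so gradient descent behaves well."""
    months, calls = features
    return (months / 36.0, calls / 5.0)

def fit(rows, epochs=2000, learning_rate=0.5):
    """Fit logistic regression weights by batch gradient descent on the log loss."""
    weights = [0.0, 0.0]
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for features, label in rows:
            x = scale(features)
            error = sigmoid(weights[0] * x[0] + weights[1] * x[1] + bias) - label
            grad_w[0] += error * x[0]
            grad_w[1] += error * x[1]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def churn_probability(model, features):
    """Estimated probability that a customer with these features churns."""
    weights, bias = model
    x = scale(features)
    return sigmoid(weights[0] * x[0] + weights[1] * x[1] + bias)

model = fit(customers)
new_customer_risk = churn_probability(model, (4.0, 5.0))     # short tenure, many support calls
loyal_customer_risk = churn_probability(model, (32.0, 0.0))  # long tenure, no support calls
```

On this toy data the short-tenure customer receives a higher churn probability than the long-tenure one. In a real project the features would come out of the data understanding and data preparation phases.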

Phase 2: Data Understanding
Summary: Starts with some data already collected and proceeds with activities to become more familiar with the data set:
- Identify data quality problems
- Discover insights into the data
- Detect interesting subsets
- Extract hidden information

Phase 2: Data Understanding
- Collect the initial data
  - Acquire the data needed to complete the data mining goals and the project as a whole
  - Load the data, and possibly integrate it if you are taking data from multiple sources
- Describe the data
  - Examine the properties of the acquired data: do you have everything you need?
  - Examples: data formatting, quantity of data, number of records, fields within each table, datatype of each field
- Explore the data
  - Start to tackle the data mining questions using querying, visualization and reporting
  - Look at aggregations, relationships between data, and subsets of data
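
As a rough illustration of the "describe" and "explore" tasks, the sketch below summarizes a small in-memory extract: number of records, fields, datatypes seen per field, and one simple aggregation. The records, field names and the `describe` helper are hypothetical; real data would usually be loaded from files or a database.

```python
from collections import Counter

# Hypothetical extract: each record is a dict, as if loaded from a CSV file or a query.
records = [
    {"customer_id": 1, "region": "northeast", "signup_date": "2021-03-14", "monthly_spend": 42.5},
    {"customer_id": 2, "region": "southeast", "signup_date": "2020-11-02", "monthly_spend": 18.0},
    {"customer_id": 3, "region": "northeast", "signup_date": "2022-01-20", "monthly_spend": None},
]

def describe(rows):
    """Return the record count, the field names, and the Python types seen in each field."""
    fields = sorted({name for row in rows for name in row})
    types_per_field = {
        name: sorted({type(row.get(name)).__name__ for row in rows})
        for name in fields
    }
    return {"record_count": len(rows), "fields": fields, "types": types_per_field}

summary = describe(records)

# A simple exploration step: aggregate the number of records per region.
records_per_region = Counter(row["region"] for row in records)
```

A field whose type list contains `NoneType` alongside its expected type (here `monthly_spend`) is an early hint of missing values to follow up on when verifying data quality.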

Phase 2: Data Understanding
- Verify data quality
  - Examine the quality of the data: is everything you need there?
  - Are there any gaps or missing values?
  - Does the data make sense?
  - Is the spelling consistent?
  - Is anything ambiguous?
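
The quality questions above can be turned into simple automated checks. The following is only a sketch under assumed inputs: the `orders` records, the field names and the `quality_report` helper are made up for illustration. It counts missing values per field, flags repeated keys, and lists distinct spellings of a categorical field.

```python
def quality_report(rows, required_fields, key_field):
    """Count missing values per required field and collect keys that appear more than once."""
    missing = {field: 0 for field in required_fields}
    seen_keys = set()
    duplicate_keys = []
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            if value is None or value == "":
                missing[field] += 1
        key = row.get(key_field)
        if key in seen_keys:
            duplicate_keys.append(key)
        seen_keys.add(key)
    return {"missing": missing, "duplicate_keys": duplicate_keys}

# Hypothetical records containing a gap, a repeated key, and inconsistent capitalization.
orders = [
    {"order_id": 101, "region": "northeast", "amount": 25.0},
    {"order_id": 102, "region": "", "amount": 12.5},
    {"order_id": 102, "region": "southeast", "amount": None},
    {"order_id": 103, "region": "Southeast", "amount": 30.0},
]

report = quality_report(orders, ["region", "amount"], "order_id")

# Listing the distinct values exposes spelling or capitalization variants of the same category.
distinct_regions = sorted({row["region"] for row in orders if row["region"]})
```

Here the distinct region values include both "Southeast" and "southeast", the kind of ambiguity that would be resolved later during data cleaning.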

Phase 3: Data Preparation
Summary: 50% to 70% of the project time will be spent on this phase. It covers all the activities used to construct the final data set from the original raw data. Many steps are needed to prepare the data: selecting tables, records and attributes, performing conversions and transformations, and cleaning the data.

Phase 3: Data Preparation
- Select data
  - Decide on the data to be used for analysis
  - Define which attributes, records and tables are selected
  - Consider the data types and data volume that you want
  - Consider relevance to the data mining goals
- Clean data
  - Make sure data quality is at a high level
  - Remove corrupt, inaccurate, or duplicate data from the table, record or database
- Construct data
  - This is where you start preparing the final data set
  - Create derived attributes and new records, and transform and format data (dates, for example)
- Integrate data
  - This is where you combine information from multiple tables into one and create new records or values
  - This may involve joining multiple data sources
  - Perform mathematical calculations on the data and group it in a certain way

Phase 3: Data Preparation
- Format data
  - This is any extra formatting required for the data set to be accepted by the modeling tool
  - Examples: the layout of the data, removal of illegal characters
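
To make the clean, integrate and format tasks concrete, here is a small sketch that removes exact duplicates, joins order totals onto customers as a derived attribute, and rewrites dates in ISO format. The two tables, their field names, the US-style input date format and the helper names are all assumptions chosen for the example.

```python
from datetime import datetime

# Hypothetical source tables.
customers = [
    {"customer_id": 1, "signup": "03/14/2021"},
    {"customer_id": 2, "signup": "11/02/2020"},
    {"customer_id": 2, "signup": "11/02/2020"},  # exact duplicate to be removed
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 15.0},
]

def clean(rows):
    """Clean: drop exact duplicate records, keeping the first occurrence."""
    unique, seen = [], set()
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique

def integrate(customer_rows, order_rows):
    """Integrate and construct: add each customer's total order amount as a derived attribute."""
    totals = {}
    for order in order_rows:
        customer_id = order["customer_id"]
        totals[customer_id] = totals.get(customer_id, 0.0) + order["amount"]
    return [dict(row, total_spend=totals.get(row["customer_id"], 0.0)) for row in customer_rows]

def format_row(row):
    """Format: convert the signup date from MM/DD/YYYY to ISO 8601 for the modeling tool."""
    iso_date = datetime.strptime(row["signup"], "%m/%d/%Y").date().isoformat()
    return {"customer_id": row["customer_id"], "signup": iso_date, "total_spend": row["total_spend"]}

final_dataset = [format_row(row) for row in integrate(clean(customers), orders)]
```

Keeping each task in its own function mirrors the task list and makes it easy to rerun a single step when the phase is revisited.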

Phase 4: Modeling
Summary: Select a modeling technique for the finalized data set, based on the data mining goals and objectives. You will need to tune parameter settings to optimize results, and then compare results if you used several modeling techniques.

Phase 4: Modeling
- Select the modeling technique
  - Select the actual modeling technique you will use on your data set
  - Examples: decision trees, sequential patterns, linear/logistic regression, clustering, categorical analysis, segmentation
- Generate test design
  - Make sure you have a way to test the model's quality and validity
  - Build the model on a training data set, then run it on a separate test data set to measure its accuracy
  - Example: for categorical analysis, run the model on a test data set and compare its output to the real results. Did it categorize everything correctly? What was the error rate?
- Build the model
  - Run the model you built on the data set and look at the results
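
The test design described above (build on a training set, measure the error rate on a held-out test set) can be sketched as follows. The labeled records are invented, the deterministic every-fourth-record split is just one simple choice, and the 1-nearest-neighbour classifier is a stand-in chosen for brevity, not a technique named in the slides.

```python
# Hypothetical labeled records: (monthly_visits, average_spend) -> customer segment.
labeled = [
    ((1, 10.0), "low"), ((2, 12.0), "low"), ((1, 8.0), "low"), ((3, 15.0), "low"),
    ((9, 60.0), "high"), ((8, 55.0), "high"), ((10, 70.0), "high"), ((7, 50.0), "high"),
]

def split(rows, test_every=4):
    """Hold out every `test_every`-th record for testing; the rest is used for training."""
    train = [row for i, row in enumerate(rows) if (i + 1) % test_every != 0]
    test = [row for i, row in enumerate(rows) if (i + 1) % test_every == 0]
    return train, test

def predict(train, features):
    """1-nearest-neighbour: return the label of the closest training record."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(train, key=lambda row: squared_distance(row[0], features))
    return nearest[1]

def error_rate(train, test):
    """Fraction of held-out records whose predicted label differs from the real label."""
    wrong = sum(1 for features, label in test if predict(train, features) != label)
    return wrong / len(test)

train_set, test_set = split(labeled)
holdout_error = error_rate(train_set, test_set)
```

With real data a random or stratified split is more common; the point is only that the error rate is measured on records the model did not see while it was being built.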

Phase 4: Modeling
- Assess the model
  - Judge the success of the data mining model based on the results, the data mining success criteria, and the test design
  - Contact business analysts and domain experts to discuss the results in a business context and see whether they make sense
  - Consider whether it is a good model that can be given to others in the organization

Phase 5: Evaluation
Summary: Thoroughly evaluate the model. Review the steps that were executed to construct the model to make sure it properly aligns with the business objectives. Make sure all important business issues have been considered. At the end, you should decide whether or not to keep this data mining model and its results.

Phase 5: Evaluation
- Evaluate results
  - Assess whether the model and results meet the business requirements
  - Is there any reason at all that this data mining model is deficient? Did it give you everything you wanted?
  - Test the model multiple times in the real world
  - Document any challenges, useful tips, information and hints for future reference
- Review process
  - Did we build the model correctly?
  - Is there any important factor or task that we left out or overlooked?
- Determine next steps
  - Decide where to proceed next: move to deployment, run the model a few more times with new data sets, or set up new data mining projects
  - Include an analysis of remaining resources and budget when determining next steps

Phase 6: Deployment
Summary: In this phase, you determine how the results will be used: who will use them, and how often? The model and the knowledge gained need to be presented in a way that clients can understand and that lets other people throughout the organization run the model. Deployment can be as simple as producing a report or as involved as implementing a repeatable data mining process across the enterprise.

Phase 6: Deployment
- Plan deployment
  - Take the results and develop a strategy for how they will be distributed throughout the organization
- Plan monitoring and maintenance
  - Teach people how to independently operate and maintain the data mining model if it becomes part of the day-to-day business
  - Teach people how to correctly use the data mining results
- Produce final report
  - The project leader and team write up a final report
  - It can be a summary of the project and experiences
  - It can be a comprehensive presentation of the data mining results

Summary
CRISP-DM:
- A way to design a data mining model that is reliable and repeatable, even by people with little data mining experience
- Provides a uniform framework
- Flexible enough to account for differences in data, business problems and objectives