Lecture 2 - 1 DSCI 4520/5240 DATA MINING MYRAW Nonprofit donor data MYRAW Overview Determine who is likely to donate to a non-profit organization campaign.

Slides:



Advertisements
Similar presentations
The Software Infrastructure for Electronic Commerce Databases and Data Mining Lecture 4: An Introduction To Data Mining (II) Johannes Gehrke
Advertisements

Interval Heaps Complete binary tree. Each node (except possibly last one) has 2 elements. Last node has 1 or 2 elements. Let a and b be the elements in.
1 Data Mining: and Knowledge Acquizition — Chapter 5 — BIS /2014 Summer.
Brief introduction on Logistic Regression
Extension The General Linear Model with Categorical Predictors.
Classification: Definition Given a collection of records (training set ) –Each record contains a set of attributes, one of the attributes is the class.
Decision Tree.
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach,
Chapter 7 – Classification and Regression Trees
Chapter 7 – Classification and Regression Trees
Lecture Notes for Chapter 4 Introduction to Data Mining
Classification: Decision Trees, and Naïve Bayes etc. March 17, 2010 Adapted from Chapters 4 and 5 of the book Introduction to Data Mining by Tan, Steinbach,
Data Mining Techniques Outline
Final Project: Project 9 Part 1: Neural Networks Part 2: Overview of Classifiers Aparna S. Varde April 28, 2005 CS539: Machine Learning Course Instructor:
Neural Networks. R & G Chapter Feed-Forward Neural Networks otherwise known as The Multi-layer Perceptron or The Back-Propagation Neural Network.
Tree-based methods, neutral networks
Lecture 5 (Classification with Decision Trees)
Sociology 601: Class 1, September Syllabus Course website Objectives Prerequisites Text Homeworks Class time Exams Grading Schedule.
Data Mining CS 341, Spring 2007 Lecture 4: Data Mining Techniques (I)
Quantitative Data Analysis Definitions Examples of a data set Creating a data set Displaying and presenting data – frequency distributions Grouping and.
Chapter 5 Data mining : A Closer Look.
Decision Tree Models in Data Mining
May 21, 2009 Stata for micro targeting using C++ and ODBC Masahiko Aida.
Lecture 8 MARK2039 Summer 2006 George Brown College Wednesday 9-12.
Walter Hop Web-shop Order Prediction Using Machine Learning Master’s Thesis Computational Economics.
Data Mining Techniques
Copyright © 2006, SAS Institute Inc. All rights reserved. Predictive Modeling Concepts and Algorithms Russ Albright and David Duling SAS Institute.
Building And Interpreting Decision Trees in Enterprise Miner.
Slides for “Data Mining” by I. H. Witten and E. Frank.
by B. Zadrozny and C. Elkan
Introduction to Statistics What is Statistics? : Statistics is the sciences of conducting studies to collect, organize, summarize, analyze, and draw conclusions.
Mailing Campaign Model Nan Yang University of Central Florida 04/11/2008.
1 Data Mining Lecture 3: Decision Trees. 2 Classification: Definition l Given a collection of records (training set ) –Each record contains a set of attributes,
Introduction to Data Mining Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu.
Saturation Mail: The Cost Effective Way to Reach More Customers.
Graph Theory in Computer Science Greg Stoll November 22, 2008.
Probability & Statistics – Bell Ringer  Make a list of all the possible places where you encounter probability or statistics in your everyday life. 1.
Chapter 9 – Classification and Regression Trees
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
Modul 6: Classification. 2 Classification: Definition  Given a collection of records (training set ) Each record contains a set of attributes, one of.
Zhangxi Lin ISQS Texas Tech University Note: Most slides are from Decision Tree Modeling by SAS Lecture Notes 5 Auxiliary Uses of Trees.
Data Mining: Classification & Predication Hosam Al-Samarraie, PhD. Centre for Instructional Technology & Multimedia Universiti Sains Malaysia.
Business Intelligence and Decision Modeling Week 9 Customer Profiling Decision Trees (Part 2) CHAID CRT.
Lyne Guertin Census Data Processing and Estimation Section Social Survey Methods Division Methodology Branch, Statistics Canada UNECE April 28-30, 2014.
Assignment (1) WEEK 3.
Notes 1.3 (Part 1) An Overview of Statistics. What you will learn 1. How to design a statistical study 2. How to collect data by taking a census, using.
Use of Guest Speakers David A. Kravitz August 7, 2015 Academy of Management Annual Meeting PDW: Ramping up our game! How we can do a better job teaching.
Lecture 3 MARK2039 Winter 2006 George Brown College Wednesday 9-12.
Lesson 4 - Topics Creating new variables in the data step SAS Functions.
Lecture Notes for Chapter 4 Introduction to Data Mining
Copyright © 2001, SAS Institute Inc. All rights reserved. Data Mining Methods: Applications, Problems and Opportunities in the Public Sector John Stultz,
Chapter 17 Preparing Data for Mining. 2 Introduction Just as manufacturing and refining are about transformation of raw materials into finished products,
Monday, February 22,  The term analytics is often used interchangeably with:  Data science  Data mining  Knowledge discovery  Extracting useful.
Machine Learning in Practice Lecture 8 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
Classification Tree Interaction Detection. Use of decision trees Segmentation Stratification Prediction Data reduction and variable screening Interaction.
Nearest Neighbour and Clustering. Nearest Neighbour and clustering Clustering and nearest neighbour prediction technique was one of the oldest techniques.
Pattern Recognition Lecture 20: Data Mining 2 Dr. Richard Spillman Pacific Lutheran University.
EMPA Statistical Analysis
Lecture Notes for Chapter 2 Introduction to Data Mining
Intro to Machine Learning
Prediction as Data Mining Task
Introduction to Data Mining and Classification
BUS 308 HELPS Lessons in Excellence-- bus308help.com.
BUS 308 HELPS Perfect Education/ bus308helps.com.
BUS 308 HELPS Education for Service-- bus308helps.com.
Advanced Analytics Using Enterprise Miner
Machine Learning in Practice Lecture 17
Joni Nunnery and Helmut Schneider
Decision trees MARIO REGIN.
Data Mining CSCI 307, Spring 2019 Lecture 6
Presentation transcript:

Lecture DSCI 4520/5240 DATA MINING MYRAW Nonprofit donor data MYRAW Overview Determine who is likely to donate to a non-profit organization campaign and target them for donation solicitation

2 The Scenario A nonprofit organization wants to send greeting cards to lapsed donors to encourage them to make a new donation. The organization wants to build a model to predict those most likely to donate. The types of information available include personal information such as age, gender, and income past donation information such as average gift and time since first donation census tract information based on the donor’s address such as the percent of households with at least one member that works for the federal, state, or local government.

Lecture DSCI 4520/5240 DATA MINING MYRAW data

Lecture DSCI 4520/5240 DATA MINING MYRAW data

Lecture DSCI 4520/5240 DATA MINING Variable roles The first several variables (Age through FirstTime) have the measurement level interval because they are numeric in the data set and have more than 20 distinct levels. The model role for all interval variables is set to input by default. The variables Gender and Homeowner have the measurement level binary because they have only two different nonmissing levels. The model role for all binary variables is set to input by default.

Lecture DSCI 4520/5240 DATA MINING Variable roles The variable IDCode is listed as a nominal variable because it is a character variable with more than two nonmissing levels. Furthermore, because it is nominal and the number of distinct values is greater than 20, the IDCode variable has the model role ID. If IDCode had been stored as a number, it would have been assigned an interval measurement level and an input model role. The variable Income is listed as a nominal variable because it is a numeric variable with more than two but no more than ten distinct levels. All nominal variables are set to have the input model role. You will change the level to ordinal.

Lecture DSCI 4520/5240 DATA MINING Variable roles These variables do have useful information, however, and it is the way in which they are coded that makes them seem useless. Both variables contain the value Y for a person if the person has that condition (pet owner for Pets, computer owner for PCOwner) and a missing value otherwise. Decision trees handle missing values directly, so no data modification needs to be done for fitting a decision tree; however, neural networks and regression models ignore any observation with a missing value, so you will need to recode these variables to get at the desired information. For example, you can recode the missing values as a U, for unknown. You do this later using the Impute node.