Lecture 7 MARK2039 Summer 2006 George Brown College Wednesday 9-12.

2 Exam a) Acct ID, date of promotion, credit score, postal code b) Index account ID and make the DB relational

3 Exam 1) Col. A: mean=8086, median=120; Col. B: mean=15, median=15; Col. C: mean= median= 2) The normal distribution is B because its mean and median are the same. 3) Median, as it is not skewed by outliers.

4 Exam Str. A: std. dev.= , CI: .0361 <= .0380 <=  Str. B: std. dev.=.003, CI: .014 <= .02 <= .026 Do not use either strategy; continue with the existing strategy.
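The Strategy B figures on this slide are consistent with an interval of the response rate plus or minus two standard deviations. A minimal sketch of that arithmetic follows; the multiplier of 2 is inferred from the numbers shown rather than stated on the slide, and the function name is hypothetical.

```python
def two_sigma_interval(rate, std_dev, z=2.0):
    """Return (lower, upper) bounds of rate +/- z * std_dev.

    z=2.0 is an assumption inferred from the Strategy B numbers.
    """
    margin = z * std_dev
    return rate - margin, rate + margin


# Strategy B values taken from the slide: rate .02, std. dev. .003
lower, upper = two_sigma_interval(0.02, 0.003)
print(f"{lower:.3f} <= 0.020 <= {upper:.3f}")
```

With the slide's Strategy B inputs this reproduces the bounds .014 and .026.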

5 Exam

6 a) Cube b) Dimensions: product type, 1st digit of postal code, payment type. Measure: acct ID c) Give me a count of all customers who bought product A with cash. Determine the number of customers in each postal code, then determine the number of persons in each postal code from Stats Can data. Create a penetration index: number of customers / number of persons in the postal code. Rank postal codes by penetration index and use the ranked postal codes to target prospects.
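The penetration index described on this slide can be sketched in a few lines. The postal codes and counts below are made up for illustration, and the function name is hypothetical; only the formula (customers divided by persons, then rank) comes from the slide.

```python
def rank_by_penetration(customers_by_postal, population_by_postal):
    """Return (postal_code, penetration_index) pairs, highest index first."""
    ranked = []
    for postal, population in population_by_postal.items():
        if population <= 0:
            continue  # skip areas with no population count
        customers = customers_by_postal.get(postal, 0)
        ranked.append((postal, customers / population))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Illustrative (made-up) counts per postal code area
customers = {"M5J": 120, "H4B": 45, "K1A": 10}
population = {"M5J": 2000, "H4B": 1500, "K1A": 1000}

for postal, index in rank_by_penetration(customers, population):
    print(postal, f"{index:.3f}")
```

Postal codes at the top of the ranking are the ones where the company already has the deepest penetration.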

7 Exam Stats Can Census is richer as it has more records (50000 vs for taxfiler). The advantage of using Taxfiler data is that the data is more recent. a) Implementation b) Reducing costs c) Must be one to one in analytical file d) Standard deviation or variation

8 Exam Sample A: although its std. dev. is larger, if we look at std. dev. on a relative basis, comparing it to the range or magnitude of values in the sample, we observe a much tighter bound around A than around B. Legacy: billing or call detail files; external data such as Stats Can. Advantages of building a data mart: -Data is aggregated and summarized, so it is easier to use for analysis -Quicker processing -Easier interpretation, as the data deals solely with one functional area

9 Exam i) No, ii) Yes, iii) No, iv) No, v) Yes. Prom. date: interval, not useful, only one value. Prom. codes: nominal, not useful, too granular. Income: interval, not useful, too many missing values. Number of children: interval, useful, few missing values. Credit decile rank: ordinal, useful, 0 missing values.

10 Creating the Analytical File-Reviewing Data Dumps Initial dump of 1st few records

11 Creating the Analytical File-Reviewing Data Dumps Initial dump of 1st few records

12 Creating the Analytical File-Reviewing Data Dumps View of the Transaction File

13 Creating the Analytical File-Reviewing Data Dumps View of the Promo History File

14 Creating the Analytical File-Reviewing Data Dumps Using your marketing knowledge, give me examples of variables that we might create from the last three slides: –Slide 11 –Slide 12 –Slide 13 Slide 11: age, region of country, tenure. Slide 12: total amount, total amount for a given product, and recency of purchase. Slide 13: total promotions, total promotions by type, and recency of last promotion.
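As an illustration of the transaction-file variables listed above (total amount, total amount by product, recency of purchase), here is a minimal sketch. The record layout, the sample rows, and the function name are assumptions, since the actual file layout is not shown in the transcript.

```python
from datetime import date


def summarize_transactions(transactions, as_of):
    """Roll transaction rows up to one summary row per account.

    transactions: iterable of (account_id, product, amount, txn_date).
    This layout is hypothetical.
    """
    summary = {}
    for account_id, product, amount, txn_date in transactions:
        row = summary.setdefault(
            account_id,
            {"total_amount": 0.0, "amount_by_product": {}, "last_purchase": txn_date},
        )
        row["total_amount"] += amount
        by_product = row["amount_by_product"]
        by_product[product] = by_product.get(product, 0.0) + amount
        if txn_date > row["last_purchase"]:
            row["last_purchase"] = txn_date
    for row in summary.values():
        # Recency: days between the last purchase and the analysis date
        row["recency_days"] = (as_of - row["last_purchase"]).days
    return summary


# Made-up sample rows
sample = [
    (1, "A", 50.0, date(2006, 1, 15)),
    (1, "B", 25.0, date(2006, 3, 1)),
    (2, "A", 10.0, date(2005, 12, 1)),
]
print(summarize_transactions(sample, as_of=date(2006, 6, 1)))
```

The same roll-up pattern applies to the promotion history file, with promotion type in place of product.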

15 Creating the Analytical File-Data Hygiene and Cleansing Once the data has been dumped in order to view records, data hygiene and cleansing typically have to take place. Two key deliverables: –Clean name and address information –Standard rules for coding of data values

16 Creating the Analytical File-Data Hygiene and Cleansing Clean Name and Address Information –Market to the right individual –Create match keys

17 Creating the Analytical File-Name and Address Standardization Clean Name and Address Information –Market to the right individual –Create match keys –Name and address standardization
Source record: BankID; Name JONH SMITH JR.; Address1 123 WILLIAMS STRET; Address2 2ND FLOOR; Address3 TRT., O.N. M5G-1F3; Country CDN; UnIndivID
Parsed record: BankID; PreName FirstName Surname JONH SMITH JR. PostName; Street1 123 WILLIAMS STRET; Street2 2ND FLOOR; City TRT; Province O.N.; Postal Code M5G-1F3; Country CANADA; UnIndivID; Origin Bank

18 Creating the Analytical File-Name and Address Standardization DATA CLEANING: address correction, name parsing, genderizing, casing
Before cleaning: BankID; PreName FirstName Surname JONH SMITH JR. PostName; Street1 123 WILLIAMS STRET; Street2 2ND FLOOR; City TRT; Province O.N.; Postal Code M5G-1F3; Country CANADA; UnIndivID; Origin Bank
After cleaning: BankID; PreName Mr.; FirstName John; Surname Smith; PostName Jr.; Street Williams Street; Street2; City Toronto; Province ON; Postal Code M5G 1F3; Country Canada; UnIndivID; Origin Bank
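A minimal sketch of the casing and code-standardization steps named on this slide is shown below. It assumes the name has already been parsed into separate fields, and the lookup tables are small hypothetical stand-ins for real reference data. Spelling correction (for example JONH to John, or STRET to Street) and genderizing need reference tables and are not attempted here.

```python
import re

# Hypothetical lookup tables; real ones would come from reference data.
CITY_NAMES = {"TRT": "Toronto"}
PROVINCE_CODES = {"O.N.": "ON", "ONT": "ON", "ONTARIO": "ON"}
COUNTRY_NAMES = {"CDN": "Canada", "CANADA": "Canada"}


def proper_case(text):
    """Capitalize each word: '2ND FLOOR' -> '2nd Floor'."""
    return " ".join(word.capitalize() for word in text.split())


def standardize_postal_code(raw):
    """Strip punctuation and format as 'A1A 1A1' when six characters remain."""
    compact = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    if len(compact) == 6:
        return compact[:3] + " " + compact[3:]
    return compact


def standardize_record(record):
    """Return a cleaned copy of an already-parsed name/address record."""
    cleaned = dict(record)
    for field in ("FirstName", "Surname", "PostName", "Street1", "Street2"):
        cleaned[field] = proper_case(record.get(field, ""))
    city = record.get("City", "").strip().upper()
    cleaned["City"] = CITY_NAMES.get(city, proper_case(city))
    province = record.get("Province", "").strip().upper()
    cleaned["Province"] = PROVINCE_CODES.get(province, province)
    country = record.get("Country", "").strip().upper()
    cleaned["Country"] = COUNTRY_NAMES.get(country, proper_case(country))
    cleaned["PostalCode"] = standardize_postal_code(record.get("PostalCode", ""))
    return cleaned


raw = {
    "FirstName": "JONH", "Surname": "SMITH", "PostName": "JR.",
    "Street1": "123 WILLIAMS STRET", "Street2": "2ND FLOOR",
    "City": "TRT", "Province": "O.N.", "PostalCode": "M5G-1F3",
    "Country": "CANADA",
}
print(standardize_record(raw))
```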

19 Creating the Analytical File-Merge Purge of Names What are the reasons for creating unique customer match keys? –Generating a marketing list –Conducting analysis Should the match keys be the same for both scenarios? No: use tighter match keys when generating lists and looser match keys when conducting analysis. In what situations are match keys numeric? When matching files that involve only existing customer data.

20 Creating the Analytical File-Merge Purge of Names Common fields to use in creating match keys: first name; surname; unique individual ID; postal code; credit card number; Duns number for businesses; phone number. Unique IDs or number-type IDs are the preferred choice when creating match keys. Let’s take a closer look at creating match keys using name and address.

21 Creating the Analytical File-Merge Purge of Names Let’s take a look at 6 records and see what this means.

22 Creating the Analytical File-Merge Purge of Names Example: You have one record here: –Richard Boire-4628 Mayfair Ave. H4B2E5 –How would you use the above information for a backend analysis if I were a responder to an acquisition campaign? BOIREH4B2E5 –What if you were conducting analysis on me as an existing customer who responded to a cross-sell campaign? Need only customer ID –What if you wanted to send me a direct mail piece? BOIRERICHARDH4B2E54628MAYFAIR
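The two match keys in this example can be built with a short sketch like the one below. The function names are hypothetical, and taking only the first two tokens of the street address (number and street name) is an assumption chosen to reproduce the key shown on the slide.

```python
import re


def _squash(text):
    """Uppercase and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def analysis_key(surname, postal_code):
    """Looser key for backend analysis: surname + postal code."""
    return _squash(surname) + _squash(postal_code)


def mailing_key(first_name, surname, postal_code, street_address):
    """Tighter key for list generation: adds first name and street."""
    # Assumption: keep street number and street name, drop 'Ave.' etc.
    street_part = "".join(street_address.split()[:2])
    return (
        _squash(surname)
        + _squash(first_name)
        + _squash(postal_code)
        + _squash(street_part)
    )


print(analysis_key("Boire", "H4B2E5"))
print(mailing_key("Richard", "Boire", "H4B2E5", "4628 Mayfair Ave."))
```

The looser key tolerates first-name and address variations, which suits analysis; the tighter key reduces the chance of mailing the wrong individual.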

23 Creating the Analytical File-Data Standardization Refers to a process where values of a common variable from different files are mapped to the same value. Some common examples: SIC Code Industry Classification Table –Industry categories have a common set of codes Postal Code Variable –A postal code has 6 characters in the pattern alpha, numeric, alpha, numeric, alpha, numeric. The letters D, F, I, O, Q, and U are not used, and W and Z are not used as the first letter. Give me examples of bad postal codes vs. good postal codes. –D4B2E5, H442E6, etc. are bad postal codes. –M5J1A1, A1A1A3, etc. are good postal codes.
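The postal code rule above lends itself to a simple pattern check. The sketch below uses the list of valid first letters given on slide 25; excluding D, F, I, O, Q, and U from the other letter positions follows the usual Canada Post convention and is an assumption beyond what the slides state. The function name is hypothetical.

```python
import re

FIRST_LETTERS = "ABCEGHJKLMNPRSTVXY"      # valid first letters (slide 25)
OTHER_LETTERS = "ABCEGHJKLMNPRSTVWXYZ"    # assumption: D, F, I, O, Q, U excluded

POSTAL_PATTERN = re.compile(
    rf"[{FIRST_LETTERS}][0-9][{OTHER_LETTERS}][0-9][{OTHER_LETTERS}][0-9]"
)


def is_valid_postal_code(raw):
    """Check the alpha-numeric-alpha-numeric-alpha-numeric pattern."""
    compact = raw.replace(" ", "").replace("-", "").upper()
    return POSTAL_PATTERN.fullmatch(compact) is not None


for code in ("D4B2E5", "H442E6", "M5J1A1", "A1A1A3"):
    print(code, is_valid_postal_code(code))
```

A check like this only confirms the format; it does not confirm that the postal code actually exists.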

24 Creating the Analytical File- Data Standardization Here is an example of how disposition codes for telemarketing outcomes might be handled

25 Creating the Analytical File-Data Standardization Postal Code Standardization –Six-character code comprising alpha, numeric, alpha, numeric, alpha, numeric –1st letters: A,B,C,E,G,H,J,K,L,M,N,P,R,S,T,V,X,Y SIC (Standard Industrial Classification) –4-digit code used to classify all companies into a standard set of industries

26 Creating the Analytical File-Data Standardization Example: –You have been asked to build a retention model. You have two years’ worth of transaction data. Changes in the product category codes occurred six months ago. Key information that you would look at: income category, product category, transaction codes, transaction amount, postal code, transaction date, gender. What would you need to do? Map the old product category code definitions from prior to six months ago to the new product category code definitions.
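The mapping step in this example might look like the sketch below. The cut-over date, the old and new codes, and the function name are all hypothetical placeholders, since the slide does not give the actual code tables.

```python
from datetime import date

# Hypothetical cut-over date and code mapping, for illustration only.
CODE_CHANGE_DATE = date(2006, 1, 1)
OLD_TO_NEW = {"10": "A1", "20": "A2", "30": "B1"}


def harmonize_category(code, txn_date):
    """Return the product category expressed in the new code definitions."""
    if txn_date >= CODE_CHANGE_DATE:
        return code  # already recorded under the new definitions
    if code not in OLD_TO_NEW:
        # Surface unmapped codes instead of silently passing them through
        raise KeyError(f"no mapping for old category code {code!r}")
    return OLD_TO_NEW[code]


print(harmonize_category("10", date(2005, 9, 1)))   # old code, mapped
print(harmonize_category("A2", date(2006, 3, 1)))   # new code, unchanged
```

Raising on unmapped codes is a design choice: it forces a review of the mapping table before two years of history are combined.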

27 Creating the Analytical File-Geocoding Geocoding is the process that assigns a latitude-longitude coordinate to an address. Once a latitude-longitude coordinate is assigned, the address can be displayed on a map or used in a spatial search. Data miners often use these coordinates to calculate such things as “distance to the nearest store”.

28 GeoProfile [Figure: demographic analysis by store location, showing population count, age distribution, and average age]

29 Creating the Analytical File-What is Geocoding? Let’s look at a sample of what some data might look like. How do we use this data to create meaningful variables? -Using the Pythagorean theorem, where distance**2 = (difference in latitude)**2 + (difference in longitude)**2 between two points. This is extremely useful in calculating distance-type variables between a customer and a given location.

30 Creating the Analytical File-What is Geocoding? Example: –A retailer has the following information: name and address of its customers, address of its stores, Stats Can information. –As a marketer, how would you intelligently use this information? –Find the distance between a given customer and the nearest store. –Create a trading area around a given store. Find out which stores have the best penetration. At the same time, analyze these best-penetration stores and determine some key Stats Can attributes around them.
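A "distance to the nearest store" variable can be sketched with the Pythagorean approach from the previous slide. The version below converts degree differences to kilometres with a rough factor of 111 km per degree and scales longitude by the cosine of the latitude; those refinements, the coordinates, and the function names are assumptions, not part of the lecture.

```python
import math

KM_PER_DEGREE = 111.0  # rough length of one degree of latitude (assumption)


def planar_distance_km(lat1, lon1, lat2, lon2):
    """Pythagorean distance on a locally flat map, in kilometres."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    north_south = (lat2 - lat1) * KM_PER_DEGREE
    # Degrees of longitude shrink with latitude, hence the cosine factor
    east_west = (lon2 - lon1) * KM_PER_DEGREE * math.cos(mean_lat)
    return math.hypot(north_south, east_west)


def nearest_store(customer, stores):
    """Return (store_name, distance_km) for the closest store.

    customer: (lat, lon); stores: dict of name -> (lat, lon).
    """
    distances = (
        (name, planar_distance_km(customer[0], customer[1], lat, lon))
        for name, (lat, lon) in stores.items()
    )
    return min(distances, key=lambda pair: pair[1])


# Made-up coordinates for illustration
stores = {"Downtown": (43.65, -79.38), "Uptown": (43.77, -79.41)}
print(nearest_store((43.66, -79.39), stores))
```

This flat-map approximation is reasonable for the short distances in a store trading area; over long distances a great-circle formula would be more accurate.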

31 Customer Profiling Frequency Distribution The report below uses the first character of the postal code to assign customers to a region. For example, postal codes beginning with ‘G’, ‘H’, or ‘J’ represent the Quebec region.
Region | # of Customers | % of Total
Prairie Provinces | 25 M | 2.5%
Quebec | 100 M | 10%
Ontario | 350 M | 35%
West | 25 M | 2.5%
Missing Values | 500 M | 50%
Total | 1 MM | 100%
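A frequency distribution like this one can be produced by mapping the first character of each postal code to a region and counting. In the sketch below only the Quebec letters (G, H, J) come from the slide; the remaining letter assignments are assumptions, and anything unmapped or blank is counted as "Missing Values".

```python
from collections import Counter

# Only G, H, J -> Quebec is given on the slide; the rest is assumed.
REGION_BY_FIRST_LETTER = {
    "G": "Quebec", "H": "Quebec", "J": "Quebec",
    "K": "Ontario", "L": "Ontario", "M": "Ontario",
    "N": "Ontario", "P": "Ontario",
    "R": "Prairie Provinces", "S": "Prairie Provinces",
    "T": "Prairie Provinces",
    "V": "West",
}


def region_of(postal_code):
    """Map a postal code to a region using its first character."""
    first = (postal_code or "").strip()[:1].upper()
    return REGION_BY_FIRST_LETTER.get(first, "Missing Values")


def region_frequency(postal_codes):
    """Return {region: (count, percent_of_total)}."""
    counts = Counter(region_of(code) for code in postal_codes)
    total = sum(counts.values())
    return {
        region: (count, round(100.0 * count / total, 1))
        for region, count in counts.items()
    }


# Made-up sample of customer postal codes
sample = ["H4B2E5", "M5J1A1", "G1A1A1", "", None, "T2P1J9"]
print(region_frequency(sample))
```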

32 Frequency Distribution

33 Frequency Distribution

34 Creating Variables Example of source variables (source/raw file variables):  # in household  Income  Credit score  Total lifetime spend  Total number of promotions Example of derived variables:  Region of country  Total spend within certain period  Age  Tenure  Number of promotions in last year by campaign category

35 More Creations Other variables: –Total spend in certain time periods –Total spend by product category in certain time periods –Decline in spend, total and by product type –Trend variables related to spending and product category: median, mean, variation –Index variables: grouping of a variable into meaningful categories where category values are index values –Binary variables: yes/no type variables such as gender

36 Creating the Analytical File-Reviewing Data Dumps View of the Transaction File What kind of variables can be derived?

37 Creating Binary Groups

38 Creating Indices
# of Months Since Last Promotion | % of Customers | Response Rate | Index
1 | 16% | 2.50% |
 | % | 1.50% |
 | % | 3.75% |
 | % | 3.25% |
 | % | 6.00% |
 | % | 4.00% | 1.14
Average | 100% | 3.50% |
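The one index value that survives on this slide (1.14) matches the group's response rate divided by the average response rate, 4.00% / 3.50%. A minimal sketch of that calculation, applied to the response rates shown on the slide, follows; the function name is hypothetical.

```python
def response_index(segment_rate, average_rate):
    """Index = segment response rate relative to the overall average."""
    return round(segment_rate / average_rate, 2)


# Response rates (in %) as shown on the slide; 3.50 is the stated average
rates = [2.50, 1.50, 3.75, 3.25, 6.00, 4.00]
average = 3.50

for rate in rates:
    print(f"{rate:.2f}% -> index {response_index(rate, average):.2f}")
```

An index above 1 marks a group that responds better than average, and an index below 1 marks a group that responds worse.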

39 More Variable Creation What would you do here? Is there any trend? Given that there seems to be no trend or relationship between spend and response, it is highly unlikely that further information would be derived from this field.

40 More Variable Creation What would you do here? Here, this variable would in all likelihood be useful given its trend with response rate.

41 Stage 3 of Data Mining What stage are we at? –Application of data mining tools Give me some examples of what data miners would be doing in stage 3: –Data discovery: data audit/frequency distribution analysis, value segmentation –Models, profiles, etc. –Post-campaign analysis –Reporting, such as standard KBM (Key Business Measure) reports –Ad hoc reports Modelling and profiling represent some examples of what we might be doing in this stage.

42 Types of Predictive Models Examples-Discrete Models –Response Models: Cross Sell, Upsell, Acquisition –Attrition Models –Product Affinity Models –Risk Models

43 Types of Predictive Models Examples-Continuous Models –Profitability/Value Models –Spending Models What is the concept of the objective function or dependent variable? –This is the variable that we are trying to predict: response, bad credit, defection, spend, etc. –What we are trying to optimize essentially becomes our objective function.