Basics of Statistical Analysis
Basics of Analysis
The process of data analysis: Observation → (encode) → Data → (analysis) → Information
Example 1:
– Gift catalog marketer
– Mails 4 times a year to its customers
– Company has 1 million customers on its file
Example 1
The cataloger would like to know whether new customers buy more than old customers.
– Classify as a new customer anyone who bought for the first time within the last twelve months.
– The analyst takes a sample of 100,000 customers and notices the following.
Example 1
Of the 5,000 orders received in the last month:
– 3,000 (60%) were from new customers
– 2,000 (40%) were from old customers
So it looks as though the new customers are doing better.
Example 1
Is there a catch here?
Data at this gross level does not discriminate between customers within either group.
– A customer who bought within the last 11 days is treated exactly the same as a customer who bought within the last 11 months.
Example 1
Can we use some other variable to distinguish between old and new customers?
Answer: actual dollars spent.
What can we do with this variable?
– Find its mean and variation. We might find that the average purchase amount for old customers is two or three times larger than the average among new customers.
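To illustrate how order counts and dollars spent can tell different stories, here is a minimal Python sketch. The order records are invented for illustration (they are not from the cataloger's file), and the names `orders` and `by_group` are hypothetical.

```python
from statistics import mean

# Hypothetical (customer_type, dollars_spent) records, invented for illustration.
orders = [("new", 20), ("new", 30), ("new", 25), ("old", 70), ("old", 80)]

# Group the dollar amounts by customer type.
by_group = {}
for group, dollars in orders:
    by_group.setdefault(group, []).append(dollars)

# Compare each group's share of orders with its average purchase amount.
for group, amounts in by_group.items():
    share = len(amounts) / len(orders)
    print(f"{group}: {share:.0%} of orders, average ${mean(amounts):.2f}")
```

In this made-up data the new customers place most of the orders, yet the old customers spend three times as much per order, which is the kind of pattern the dollar variable can reveal.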
Numerical Summaries of Data
The two basic concepts are the center and the spread of the data.
Center of data:
– Mean, which is given by $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_1, \dots, x_n$ are the $n$ observations
– Median
– Mode
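Numerical Summaries of Data
Forms of variation, where $\bar{x}$ is the mean of the $n$ observations $x_i$:
– Sum of differences about the mean: $\sum_{i=1}^{n} (x_i - \bar{x})$, which is always zero
– Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ (the sample form; dividing by $n$ instead gives the population form)
– Standard deviation: square root of the variance, $s = \sqrt{s^2}$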
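The measures of center and spread can be computed with Python's standard `statistics` module. This is a minimal sketch: the purchase amounts are hypothetical, and it assumes the sample form of the variance (dividing by n - 1).

```python
import statistics

# Hypothetical purchase amounts in dollars (illustrative only).
purchases = [20, 35, 35, 50, 60]

mean = statistics.mean(purchases)      # center: arithmetic average
median = statistics.median(purchases)  # center: middle value when sorted
mode = statistics.mode(purchases)      # center: most frequent value

# Deviations about the mean always sum to zero, so on their own
# they cannot measure spread; squaring them fixes that.
deviation_sum = sum(x - mean for x in purchases)

variance = statistics.variance(purchases)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(purchases)      # square root of the sample variance

print(mean, median, mode, deviation_sum, variance, round(std_dev, 2))
```

For the population forms (dividing by n), `statistics.pvariance` and `statistics.pstdev` could be used instead.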
Confidence Intervals
– In the catalog example, the analyst wants to know the average purchase amount of customers.
– He draws two samples of 75 customers each and finds the means to be $68 and $122.
– Since the difference is large, he draws another 38 samples of 75 each.
– The mean of means of the 40 samples turns out to be $
– How confident should he be of this mean of means?
Confidence Intervals
– The analyst calculates the standard deviation of the sample means, called the standard error (SE). For our example, the SE is 12.91.
Basic premise for confidence intervals:
– 95 percent of the time, the true mean purchase amount lies within plus or minus 1.96 standard errors of the mean of the sample means.
CI = mean ± 1.96 × SE
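The interval calculation can be sketched in Python as follows. The function name `ci_from_sample_means` is hypothetical, and the list of sample means is invented apart from the $68 and $122 mentioned in the example, so the printed interval is illustrative only. The last lines use the SE of 12.91 quoted in the example to show the half-width of the interval.

```python
import statistics

Z_95 = 1.96  # multiplier for a 95 percent interval


def ci_from_sample_means(sample_means, z=Z_95):
    """Return (low, high): mean of the sample means plus or minus z standard errors.

    The standard error here is the standard deviation of the sample means.
    """
    center = statistics.mean(sample_means)
    se = statistics.stdev(sample_means)
    return center - z * se, center + z * se


# Hypothetical sample means in dollars. Only 68 and 122 come from the example;
# the other values are invented so that the sketch can run.
sample_means = [68, 122, 90, 95, 100]
low, high = ci_from_sample_means(sample_means)
print(f"95% CI: ${low:.2f} to ${high:.2f}")

# With the SE of 12.91 quoted in the example, the half-width of the interval is:
margin = Z_95 * 12.91
print(f"margin: ${margin:.2f}")
```

Whatever the mean of means turns out to be, an SE of 12.91 puts the 95 percent limits about $25.30 on either side of it.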
Confidence Intervals
However, if the CI is calculated from only one sample, then
standard error of the sample mean = standard deviation of the sample / √n, where n is the sample size.
Basic premise for confidence intervals with one sample:
– 95 percent of the time, the true mean lies within plus or minus 1.96 standard errors of the sample mean.
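For the one-sample case, a minimal Python sketch follows. It assumes the usual estimate of the standard error of a sample mean, the sample standard deviation divided by the square root of the sample size. The function name `ci_from_one_sample` and the purchase amounts are hypothetical.

```python
import math
import statistics


def ci_from_one_sample(sample, z=1.96):
    """Return (low, high) for the true mean, estimated from a single sample.

    Assumes SE = s / sqrt(n), where s is the sample standard deviation.
    """
    n = len(sample)
    sample_mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return sample_mean - z * se, sample_mean + z * se


# Hypothetical purchase amounts in dollars for one small sample.
sample = [40, 55, 60, 75, 80, 95, 110, 45, 70]
low, high = ci_from_one_sample(sample)
print(f"95% CI: ${low:.2f} to ${high:.2f}")
```

Because the SE shrinks with the square root of n, quadrupling the sample size roughly halves the width of the interval. For a sample as small as this one, a t multiplier would normally replace 1.96; the sketch keeps 1.96 to match the premise stated above.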
Example 2: Confidence Intervals for Response Rates
– You are the marketing analyst for Online Apparel Company.
– You want to run a promotion for all customers on your database.
– In the past you have run many such promotions.
– Historically you needed a 4% response for a promotion to break even.
– You want to test the viability of the current full-scale promotion by running a small test promotion.
© 2007 Prentice Hall
Example 2: Confidence Intervals for Response Rates
– Test 1,000 names selected at random from the full list.
– The test sample returns a 3.8% response.
– Construct the CI from the sample rate of 3.8% and n = 1,000: confidence interval = sample response ± 1.96 × SE.
– The SE is √(0.038 × 0.962 / 1000) ≈ 0.006, so the CI is 0.038 ± 1.96 × 0.006, or about (0.026, 0.050).
– In our case the CI is 2.6% to 5.0%. Since 4% lies inside this interval, the test result is consistent with a true response rate of 4%.
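The response-rate interval can be checked with a short Python sketch. It assumes the standard error of a proportion, √(p(1 − p)/n), and the normal-approximation multiplier 1.96. The function name `response_rate_ci` is hypothetical.

```python
import math


def response_rate_ci(rate, n, z=1.96):
    """Return (low, high, se) for a response rate observed on n test names.

    Assumes the normal approximation with SE = sqrt(rate * (1 - rate) / n).
    """
    se = math.sqrt(rate * (1 - rate) / n)
    return rate - z * se, rate + z * se, se


BREAK_EVEN = 0.04  # break-even response rate from the example

low, high, se = response_rate_ci(0.038, 1000)
print(f"SE = {se:.4f}, 95% CI = {low:.1%} to {high:.1%}")

# The test supports rolling out if the break-even rate is a plausible true rate.
break_even_plausible = low <= BREAK_EVEN <= high
print("break-even rate inside CI:", break_even_plausible)
```

With 1.96 standard errors the interval comes out at roughly 2.6% to 5.0%, so a true response rate of 4% remains plausible given a 3.8% test result.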
Example 2: Confidence Intervals for Response Rates
– So if the sample response rate is 3.8%, the true response rate may be 4%.
– What if the sample response rate were 5%?
– Regression towards the mean: the phenomenon of a test result being different from the true result.
– Give more thought to lists whose cutoff rates lie within the confidence interval.