Statistics for Marketing & Consumer Research, Copyright © 2008 Mario Mazzocchi. Chapter 17: Further advanced methods


Data mining
– Data mining is “the exploration of a large set of data with the aim of uncovering relationships between variables” (Oxford Dictionary of Statistics)
– It is also known as Knowledge Discovery in Databases (KDD)
– It makes extensive use of information technology, through the automation of data analysis procedures

Statistics and data mining
– Statistics is also exploited, but it is adapted to deal with (very) large data sets
– The statistical approaches favoured are computer-intensive methods
– Data mining merges statistics with other disciplines: computer science, machine learning, artificial intelligence, database technology, pattern recognition

Data warehousing
– The common denominator among the techniques is always the use of very large databases
– These databases are the outcome of data warehousing, which organizes all of the data available to a company into a common format, allows the integration of different data types, and allows analysis through data mining
– Organizing company information in data warehouses requires recognizing linkages between data which relate to the same objects, and a time dimension (to monitor changes)

Marketing applications
– A typical application is market basket analysis
– Customer purchasing patterns are discovered by looking at the databases of transactions in one or more stores of the same chain (e.g. through loyalty cards)
– The contents of the trolley are analyzed to detect repeated purchases and brand-switching behaviors
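Market basket analysis ultimately rests on counting co-occurrences across baskets. A minimal sketch, with invented products and transactions, computing the support and confidence of a hypothetical association rule {beer} → {crisps}:

```python
# Toy market basket data: each set is one shopper's trolley.
# Products and transactions are invented for illustration.
transactions = [
    {"beer", "crisps", "milk"},
    {"beer", "crisps"},
    {"milk", "bread"},
    {"beer", "milk"},
    {"beer", "crisps", "bread"},
]

n = len(transactions)
both = sum(1 for t in transactions if {"beer", "crisps"} <= t)
beer = sum(1 for t in transactions if "beer" in t)

support = both / n        # share of all baskets containing both items
confidence = both / beer  # share of beer baskets that also contain crisps
print(support, confidence)  # 0.6 0.75
```

On real loyalty-card databases the same counts are computed over millions of baskets, which is why the automation stressed above matters.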

Problems with data mining
Data mining is a complex and automated process, which faces several risks:
– Data sets may be contaminated (affected by error)
– Data may be affected by selection biases and non-independent observations
– Automated data analysis may find spurious relationships (as in spurious regression)

Steps for successful data mining
1. Data warehousing
2. Target data selection
3. Data cleaning
4. Preprocessing
5. Transformation and reduction
6. Data mining
7. Model selection (or combination)
8. Evaluation and interpretation
9. Consolidation and use of the extracted knowledge

Frequentist vs. Bayesian statistics: the frequentist paradigm
– Assumption: true and fixed population parameters exist, albeit unknown
– Statistics can exploit sampling to estimate these unknown parameters
– Observations are associated with probabilities: the probability of a given outcome of a random event can be proxied by the frequency of that outcome; the larger the sample, the closer the estimated probability is to the true probability
– Example: a linear regression model tries to estimate the true coefficients linking the explanatory variables to the dependent variable, using a sample of observations
– A key concept of the frequentist approach is the confidence interval: a range of values that contains the true and fixed value with a given confidence level
– The confidence level is nothing more than the frequency with which such an interval contains the true and fixed value across different random samples
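The frequency interpretation of the confidence level can be checked by simulation: draw many random samples from a population with a known, fixed mean and count how often the 95% interval covers it. A small sketch with illustrative parameters (known standard deviation, so the plain normal interval applies):

```python
# Repeated-sampling check of the 95% confidence level.
# true_mean is fixed and known only to the simulation.
import math
import random

random.seed(1)
true_mean, sigma, n, reps = 10.0, 2.0, 50, 2000
z = 1.96  # 95% critical value of the standard normal

covered = 0
for _ in range(reps):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    half = z * sigma / math.sqrt(n)  # half-width of the interval
    if xbar - half <= true_mean <= xbar + half:
        covered += 1

coverage = covered / reps
print(coverage)  # close to 0.95
```

The observed coverage fluctuates around 0.95, which is exactly the "frequency with which an interval contains the true and fixed value" described above.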

The Bayesian approach
– The unknown parameters in the population are not fixed, but are treated as random variables with their own probability distributions
– One is allowed to exploit knowledge or beliefs about the shape of the probability distribution which existed prior to estimation (the prior)
– Once data are collected, Bayesian methods exploit this information to update the prior; the final outcome is a posterior distribution which depends on both the data and the prior knowledge

Bayes rule
The estimation of the posterior distribution opens the way to Bayesian statistical operations and is based on the Bayes rule, which relates the probabilities of the outcomes of two random events in the following way:

P(A|B) = P(A,B) / P(B)

– P(A|B) is the probability that the first random event generates the outcome A when the second random event has generated the outcome B; it is the probability of A conditional on B
– P(A,B) is the joint probability that both outcomes A and B happen
– P(B) is the unconditional probability of the outcome B
– The Bayes theorem shows that P(A,B) can also be expressed as the product P(B|A)P(A), that is, the probability that B happens conditional on the outcome A times the probability of the outcome A
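A quick numerical check of the rule, with made-up probabilities for two events A and B:

```python
# Bayes rule: P(A|B) = P(A,B) / P(B), with P(A,B) = P(B|A) P(A).
# All probability values below are invented for illustration.
p_a = 0.3          # P(A), the prior probability of A
p_b_given_a = 0.5  # P(B|A)
p_b = 0.4          # P(B), the unconditional probability of B

p_joint = p_b_given_a * p_a  # P(A,B) = 0.15
p_a_given_b = p_joint / p_b  # P(A|B) by the Bayes rule
print(p_a_given_b)           # 0.375
```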

Bayes estimation
To understand the use of the Bayes rule, let the two random events be:
– the value of the unknown parameter (A), which in Bayesian statistics is determined by a random variable
– the available data (B), which are also the outcome of a random variable, since they were obtained through sampling
The Bayes theorem says that the probability of obtaining the parameter estimate A given the observed sample B (the posterior probability) can be computed through the Bayes rule as a function of the probability of observing sample B when the parameter estimate is A, and the unconditional probability of the parameter estimate A
The unconditional probability of the parameter estimate A is the prior probability

Use of the Bayes rule
The Bayes rule is very helpful when it is easier to estimate P(B|A) than P(A|B). If:
– the probability of observing the sample B conditional on the unknown parameter A can be computed,
– some prior information on the probability of the parameter A is available, and
– the unconditional probability of the sample B is known,
then it becomes possible to find the probability distribution of the parameter A conditional on the data, which is the final objective of estimation

Unconditional probability
The denominator of the Bayes rule can be rewritten as:

P(B) = Σ_j P(B|A_j) P(A_j)

which means that the unconditional probability of the sample B can be seen as the sum of the probabilities of the sample B conditional on all of the possible estimates A_j, each weighted by the probability of that estimate A_j

Estimation
Two elements have to be considered:
1) P(B|A) is the likelihood function of A, that is, the probability of a given set of observations depending on a set of parameters; it is generally known (frequentists use it in maximum likelihood methods as well)
2) the denominator of the Bayes rule is a constant, and it is generally not necessary to estimate it, so that estimation can be based on the following result:

P(A|B) ∝ P(B|A) P(A)

where the ∝ sign, which substitutes the equal sign, means that the left-hand side is proportional to the right-hand side
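The proportionality is harmless in practice because the posterior can be normalized after the fact. A sketch over a discrete grid of parameter values, with invented prior and likelihood values: multiplying likelihood by prior point-by-point and dividing by the total recovers exactly the constant that the Bayes-rule denominator would have supplied.

```python
# Unnormalized posterior = likelihood * prior, then normalize.
# Parameter values, prior, and likelihoods are invented for illustration.
params = [0.0, 0.5, 1.0]
prior = [0.25, 0.50, 0.25]       # P(A_j), sums to 1
likelihood = [0.1, 0.6, 0.2]     # P(data | A_j)

unnormalised = [l * p for l, p in zip(likelihood, prior)]
total = sum(unnormalised)        # plays the role of the denominator P(data)
posterior = [u / total for u in unnormalised]
print(posterior)                 # sums to 1, peaked at params[1]
```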

Example
– Estimation of a single regression coefficient in a bivariate regression: caviar expenditure (c) as a function of income (i)
– Data come from a random sample which generates a set of observations collected in the vector c (for simplicity, consider i as the observations of a fixed exogenous variable)
– The equation is c = βi

Frequentist estimation of the regression coefficient
– Start from some assumptions on the probability distribution of the data and the error term (e.g. a normal distribution)
– Get point estimates that are the most likely given the observed sample (e.g. maximum likelihood estimates)
– Since the sample is random, confidence intervals can be built for the coefficient estimate
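Under normal errors, the maximum likelihood estimate of β in c = βi coincides with least squares through the origin, β̂ = Σ(i·c) / Σ(i²). A sketch on synthetic data (sample size, income range, and noise level are all invented; the data are generated with β = 0.05):

```python
# Frequentist (ML / least squares) estimate of beta in c = beta * i,
# on synthetic data generated with a known true beta.
import random

random.seed(42)
true_beta = 0.05
income = [random.uniform(100, 1000) for _ in range(200)]
caviar = [true_beta * x + random.gauss(0, 1.0) for x in income]

# Closed-form ML estimate for regression through the origin
beta_hat = sum(x * y for x, y in zip(income, caviar)) / sum(x * x for x in income)
print(beta_hat)  # close to 0.05
```

Because the sample is random, β̂ varies from sample to sample; that sampling variability is what the frequentist confidence interval summarizes.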

Bayesian estimation
– Start with the assumption that caviar expenditure follows a Normal probability distribution around its mean, which is equal to βi
– Second, assume a prior distribution for β: Normal, with a given standard deviation, e.g. 0.02
– Consider the value β = 0.05: if the prior distribution holds, c should be normally distributed around 0.05i
– It now becomes necessary to evaluate the probability of getting the observed sample c given that β = 0.05
– Generate c* by multiplying i by 0.05

Bayesian estimation (cont.)
– Considering that c is a random sample from a normal distribution, one can get the likelihood of c* conditional on β = 0.05 using the known likelihood function
– The unconditional (prior) probability of β = 0.05 is also known, given that we have assumed that the prior distribution of β is normal, with mean 0.05 and standard deviation 0.02; it means that the probability of β = 0.05 (strictly, the prior density at that value) is about 20%
– With a computer, and given the prior distribution of β, one can compute the prior probabilities for all possible values of β and the probabilities of all possible values of c*
– Using a slightly different notation of the Bayes rule, which defines L(β|c) as the likelihood function of the sample c:

P(β|c) ∝ L(β|c) P(β)

where the left-hand side is the (unknown) posterior probability of β

Posterior distribution
– As mentioned, for any fixed value of β it is possible to compute the likelihood function and the unconditional (prior) probability
– Suppose that for β = 0.05 the likelihood of observing the collected data set is 10%. Then one may compute:

P(β = 0.05 | c) ∝ 0.10 × 0.20 = 0.02

– The above result does not mean that the posterior probability is 2%, since the relationship is one of proportionality (not equality)
– However, repeating the computation over the whole range of values for β allows one to compute the probability distribution of β conditional on the observed sample (the posterior distribution)
– This ultimately allows one to determine the most likely estimate of β; this estimate will differ from 0.05 unless we had an excellent prior
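The "repeat over the whole range of β" step can be sketched with a grid approximation: evaluate likelihood × prior at many candidate values of β, then normalize. The prior is the Normal(0.05, 0.02) assumed above; the synthetic data, noise level, and data-generating β = 0.06 are illustrative assumptions.

```python
# Grid approximation of the posterior of beta in c = beta * i.
import math
import random

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

random.seed(0)
income = [random.uniform(100, 1000) for _ in range(50)]
caviar = [0.06 * x + random.gauss(0, 5.0) for x in income]  # true beta = 0.06
noise_sd = 5.0

grid = [k * 0.0005 for k in range(401)]  # candidate beta values, 0.00 to 0.20
log_post = []
for b in grid:
    # log-likelihood of the sample given beta, plus log prior density
    ll = sum(math.log(normal_pdf(c, b * x, noise_sd))
             for c, x in zip(caviar, income))
    log_post.append(ll + math.log(normal_pdf(b, 0.05, 0.02)))

# normalize on the log scale for numerical stability, then exponentiate
m = max(log_post)
weights = [math.exp(p - m) for p in log_post]
total = sum(weights)
posterior = [w / total for w in weights]

# most likely value of beta under the posterior (maximum a posteriori)
beta_map = grid[max(range(len(grid)), key=lambda k: posterior[k])]
print(beta_map)
```

As the slide notes, the result differs from the prior mean 0.05: the data pull the posterior towards the value that actually generated them.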

Final output
– The posterior distribution might also differ from the normal distribution (although not in this case)
– From the posterior distribution it is possible to compute percentiles (see appendix); thus a 95% Bayesian confidence interval can be obtained by taking the values of β corresponding to the 2.5th and 97.5th percentiles of the posterior distribution
– The final result depends on the quality of the prior
– However, Bayesian statistics has extended these founding concepts considerably, and there are many ways to relax the reliance on the prior assumptions and to check the robustness of the results
– For example, non-informative priors do not assume particular knowledge of the parameters: they are uniformly distributed over the whole range of possible values
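Reading the 2.5th and 97.5th percentiles off a discrete posterior amounts to accumulating probability along the grid until the cumulative distribution crosses 2.5% and 97.5%. A sketch with an invented, triangular-shaped toy posterior:

```python
# 95% Bayesian interval from a discrete posterior via cumulative probabilities.
grid = [i / 100 for i in range(101)]   # parameter values 0.00 .. 1.00
raw = [min(g, 1 - g) for g in grid]    # toy posterior shape, peaked at 0.5
total = sum(raw)
posterior = [r / total for r in raw]   # normalized to sum to 1

cum = 0.0
lower = upper = None
for g, p in zip(grid, posterior):
    cum += p
    if lower is None and cum >= 0.025:
        lower = g  # 2.5th percentile
    if upper is None and cum >= 0.975:
        upper = g  # 97.5th percentile

print(lower, upper)
```

By the symmetry of this toy posterior the interval is symmetric around 0.5; a skewed posterior would give an asymmetric interval, which is one practical difference from the frequentist interval.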

Why Bayesian statistics are becoming so popular
– One of the reasons for the Bayesian comeback in the 21st century is the fact that the Bayes rule can be applied iteratively: the prior distribution can be updated as new data arrive
– Progress in automated computing power has led to excellent results in estimating complex models through Bayesian methods
– For example, modern Bayesian methods exploit the posterior distribution to generate a large number of draws, from which estimates are actually computed

Bayesian statistics and marketing
– In a recent article, Rossi and Allenby (2003) have explored the major role that Bayesian methods can play in marketing, with a long annotated list of applications: hypothesis testing with scanner data, extensions of conjoint analysis, Bayesian multidimensional scaling, the multinomial probit, and many other Bayesian alternatives to frequentist multivariate statistics