Some Preliminaries © 2007 Prentice Hall.

Basics of Analysis
The process of data analysis: Observation -> (Encode) -> Data -> (Analysis) -> Information.
Example 1: Gift Catalog Marketer. Mails 4 times a year to its customers. The company has 1 million customers on its file.

Example 1
The cataloger would like to know whether new customers buy more than old customers. Classify new customers as anyone who bought within the last twelve months. The analyst takes a sample of 100,000 customers and notices the following.

Example 1
5,000 orders were received in the last month: 3,000 (60%) were from new customers and 2,000 (40%) were from old customers. So it looks like the new customers are doing better.

Example 1
Is there a catch here? Data at this gross level do not discriminate between customers within either group. A customer who bought within the last 11 days is treated exactly the same as a customer who bought within the last 11 months.

Example 1
Can we use some other variable to distinguish between old and new customers? Answer: actual dollars spent. What can we do with this variable? Find its mean and variation. We might find that the average purchase amount for old customers is two or three times larger than the average among new customers.

Numerical Summaries of Data
The two basic concepts are the center and the spread of the data. Measures of the center of the data: the mean, which is given by $\bar{X} = \sum_{i=1}^{n} X_i / n$; the median; and the mode.

Numerical Summaries of Data
Forms of variation. Sum of differences about the mean: $\sum_{i=1}^{n} (X_i - \bar{X})$. Variance: the mean squared deviation about the mean. Standard deviation: the square root of the variance.

Confidence Intervals
In the catalog example, the analyst wants to know the average purchase amount of customers. He draws two samples of 75 customers each and finds the means to be $68 and $122. Since the difference is large, he draws another 38 samples of 75 each. The mean of the means of the 40 samples turns out to be $94.85. How confident should he be of this mean of means?

Confidence Intervals
The analyst calculates the standard deviation of the sample means, called the standard error (SE). It is 12.91. Basic premise for confidence intervals: 95 percent of the time, the true mean purchase amount lies within plus or minus 1.96 standard errors of the mean of the sample means. C.I. = Mean ± 1.96 × Standard Error.
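
The interval described on this slide can be worked through with a few lines of Python. This is only a sketch: the function name `confidence_interval` is a made-up helper, while the inputs (94.85 and 12.91) and the 1.96 multiplier come from the slides.

```python
def confidence_interval(center, standard_error, z=1.96):
    """Return (lower, upper) for center +/- z * standard_error."""
    margin = z * standard_error
    return center - margin, center + margin

# Values taken from the slides: mean of the 40 sample means and its standard error.
mean_of_means = 94.85
standard_error = 12.91

lower, upper = confidence_interval(mean_of_means, standard_error)
print(f"95% CI: ${lower:.2f} to ${upper:.2f}")
```

With these inputs the margin is 1.96 × 12.91, roughly 25.30, so the interval runs from about $69.55 to about $120.15.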

Confidence Intervals
However, if the CI is calculated with only one sample, then the standard error of the sample mean = standard deviation of the sample / √n, where n is the sample size. Basic premise for confidence intervals with one sample: 95 percent of the time, the true mean lies within plus or minus 1.96 standard errors of the sample mean.
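
A one-sample version might look like the sketch below. It assumes the usual estimate of the standard error, the sample standard deviation divided by the square root of the sample size. The purchase amounts are hypothetical and only illustrate the calculation; with a sample this small, a t multiplier would normally replace 1.96.

```python
import math
import statistics

def one_sample_ci(values, z=1.96):
    """Approximate 95% CI for the mean from a single sample.

    Assumption: SE = sample standard deviation / sqrt(n).
    """
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean - z * se, mean + z * se

# Hypothetical purchase amounts in dollars (not from the slides).
purchases = [52.0, 68.0, 75.0, 90.0, 110.0, 61.0, 84.0, 98.0]

lower, upper = one_sample_ci(purchases)
print(f"Sample mean: {statistics.mean(purchases):.2f}")
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f}")
```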

Example 2: Confidence Intervals for Response Rates
You are the marketing analyst for Online Apparel Company. You want to run a promotion for all customers on your database. In the past you have run many such promotions. Historically, you needed a 4.5% response for the promotions to break even. You want to test the viability of the current full-scale promotion by running a small test promotion.

Example 2: Confidence Intervals for Response Rates
Test 1,000 names selected at random from the full list. You construct a CI based on the required rate of 4.5% and n = 1,000. Confidence interval = expected response ± 1.96 × SE. The SE = .00655, and the CI is (.0322, .0578), that is, 3.22% to 5.78%. Thus any response between 3.22% and 5.78% is consistent with the hypothesis that the true response rate is 4.5%.
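
The slide does not show how the SE was obtained. The sketch below assumes the standard formula for the standard error of a proportion, sqrt(p(1 - p)/n), which reproduces the slide's figures up to rounding. The helper name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Return (se, lower, upper) for an expected response rate p and n names.

    Assumption: SE = sqrt(p * (1 - p) / n), the usual binomial formula.
    """
    se = math.sqrt(p * (1 - p) / n)
    return se, p - z * se, p + z * se

# Values from the slide: break-even rate 4.5%, test mailing of 1,000 names.
se, lower, upper = proportion_ci(0.045, 1000)
print(f"SE = {se:.5f}, CI = {lower:.2%} to {upper:.2%}")

# A test result is consistent with the 4.5% rate if it falls inside the interval.
for observed in (0.035, 0.05):
    print(observed, lower <= observed <= upper)
```

The computed SE is about 0.006556, which the slide reports as .00655, and the interval limits round to 3.22% and 5.78%.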

Example 2: Confidence Intervals for Response Rates
The test list is mailed and actually pulls in 3.5%. Since 3.5% lies within the confidence interval, the true response rate may still be 4.5%. What if the actual rate pulled in were 5%? Regression towards the mean: the phenomenon of a test result being different from the true result. Give more thought to lists whose cutoff rates lie within the confidence interval.

Chapter 15: Frequency Distribution and Cross-Tabulation

Chapter Outline
1) Frequency Distribution
2) Statistics Associated with Frequency Distribution: Measures of Location, Measures of Variability, Measures of Shape
3) Cross-Tabulations: Two Variable Case, Three Variable Case, General Comments on Cross-Tabulations
4) Statistics for Cross-Tabulation: Chi-Square

Table 15.1: Internet Usage Data

| Respondent Number | Sex | Familiarity | Internet Usage | Attitude Toward Internet | Attitude Toward Technology | Usage of Internet: Shopping | Usage of Internet: Banking |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 7 | 14 | 7 | 6 | 1 | 1 |
| 2 | 2 | 2 | 2 | 3 | 3 | 2 | 2 |
| 3 | 2 | 3 | 3 | 4 | 3 | 1 | 2 |
| 4 | 2 | 3 | 3 | 7 | 5 | 1 | 2 |
| 5 | 1 | 7 | 13 | 7 | 7 | 1 | 1 |
| 6 | 2 | 4 | 6 | 5 | 4 | 1 | 2 |
| 7 | 2 | 2 | 2 | 4 | 5 | 2 | 2 |
| 8 | 2 | 3 | 6 | 5 | 4 | 2 | 2 |
| 9 | 2 | 3 | 6 | 6 | 4 | 1 | 2 |
| 10 | 1 | 9 | 15 | 7 | 6 | 1 | 2 |
| 11 | 2 | 4 | 3 | 4 | 3 | 2 | 2 |
| 12 | 2 | 5 | 4 | 6 | 4 | 2 | 2 |
| 13 | 1 | 6 | 9 | 6 | 5 | 2 | 1 |
| 14 | 1 | 6 | 8 | 3 | 2 | 2 | 2 |
| 15 | 1 | 6 | 5 | 5 | 4 | 1 | 2 |
| 16 | 2 | 4 | 3 | 4 | 3 | 2 | 2 |
| 17 | 1 | 6 | 9 | 5 | 3 | 1 | 1 |
| 18 | 1 | 4 | 4 | 5 | 4 | 1 | 2 |
| 19 | 1 | 7 | 14 | 6 | 6 | 1 | 1 |
| 20 | 2 | 6 | 6 | 6 | 4 | 2 | 2 |
| 21 | 1 | 6 | 9 | 4 | 2 | 2 | 2 |
| 22 | 1 | 5 | 5 | 5 | 4 | 2 | 1 |
| 23 | 2 | 3 | 2 | 4 | 2 | 2 | 2 |
| 24 | 1 | 7 | 15 | 6 | 6 | 1 | 1 |
| 25 | 2 | 6 | 6 | 5 | 3 | 1 | 2 |
| 26 | 1 | 6 | 13 | 6 | 6 | 1 | 1 |
| 27 | 2 | 5 | 4 | 5 | 5 | 1 | 1 |
| 28 | 2 | 4 | 2 | 3 | 2 | 2 | 2 |
| 29 | 1 | 4 | 4 | 5 | 3 | 1 | 2 |
| 30 | 1 | 3 | 3 | 7 | 5 | 1 | 2 |

Frequency Distribution
In a frequency distribution, one variable is considered at a time. A frequency distribution for a variable produces a table of frequency counts, percentages, and cumulative percentages for all the values associated with that variable.
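
As a rough illustration, the counts, percentages, and cumulative percentages can be tabulated as follows. The `familiarity` list is the Familiarity column of Table 15.1, used exactly as recorded; the function name is a hypothetical helper, not something defined in the slides.

```python
from collections import Counter

# Familiarity column of Table 15.1, respondents 1 to 30, as recorded.
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

def frequency_distribution(values):
    """Return rows of (value, count, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100.0 * counts[value] / n
        cumulative += percent
        rows.append((value, counts[value], percent, cumulative))
    return rows

table = frequency_distribution(familiarity)
print(f"{'Value':>5} {'Count':>5} {'Pct':>7} {'Cum pct':>8}")
for value, count, percent, cumulative in table:
    print(f"{value:>5} {count:>5} {percent:>7.1f} {cumulative:>8.1f}")
```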

Table 15.2: Frequency Distribution of Familiarity with the Internet

Fig. 15.1: Frequency Histogram (axes: Familiarity, Frequency)

Statistics for Frequency Distribution: Measures of Location
The mean, or average value, is the most commonly used measure of central tendency. The mean, $\bar{X}$, is given by $\bar{X} = \sum_{i=1}^{n} X_i / n$, where $X_i$ = observed values of the variable X and n = number of observations (sample size). The mode is the value that occurs most frequently. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories.

Statistics for Frequency Distribution: Measures of Location
The median of a sample is the middle value when the data are arranged in ascending or descending order. If the number of data points is even, the median is the midpoint between the two middle values. The median is the 50th percentile.
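
The three measures of location can be computed with Python's standard `statistics` module. This sketch uses the Familiarity column of Table 15.1 with every value kept as recorded, including the single 9; if that value is a missing-data code in the original study, it would need to be excluded first, which is not done here.

```python
import statistics

# Familiarity column of Table 15.1, all 30 values as recorded (assumption:
# no value is treated as missing).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

mean = statistics.mean(familiarity)      # sum of values divided by n
median = statistics.median(familiarity)  # midpoint of the two middle values when n is even
mode = statistics.mode(familiarity)      # most frequent value

print(f"mean = {mean:.3f}, median = {median}, mode = {mode}")
```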

Statistics for Frequency Distribution: Measures of Variability
The range measures the spread of the data. The variance is the mean squared deviation from the mean. The variance can never be negative. The standard deviation is the square root of the variance. The coefficient of variation is the ratio of the standard deviation to the mean expressed as a percentage, and is a unitless measure of relative variability: $CV = s_x / \bar{X}$.
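
A short sketch of these measures, using the Internet Usage column of Table 15.1. One assumption to flag: `statistics.variance` divides by n - 1 (the sample variance), whereas the slide's wording "mean squared deviation" could also be read as dividing by n, for which `statistics.pvariance` would be the counterpart.

```python
import statistics

# Internet Usage column of Table 15.1, respondents 1 to 30.
usage = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]

data_range = max(usage) - min(usage)        # largest minus smallest value
mean = statistics.mean(usage)
variance = statistics.variance(usage)       # assumption: n - 1 divisor
std_dev = statistics.stdev(usage)           # square root of the variance
cv_percent = 100.0 * std_dev / mean         # coefficient of variation, in percent

print(f"range = {data_range}, mean = {mean:.2f}")
print(f"variance = {variance:.3f}, std dev = {std_dev:.3f}, CV = {cv_percent:.1f}%")
```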

Statistics for Frequency Distribution: Measures of Shape
Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other, that is, the tendency for one tail of the distribution to be heavier than the other. Kurtosis is a measure of the relative peakedness or flatness of the frequency distribution curve. The kurtosis of a normal distribution is zero. If kurtosis > 0, the distribution is more peaked than a normal distribution. If kurtosis < 0, the distribution is flatter than a normal distribution.
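
One common way to compute these shape measures is from central moments, as sketched below. The moment-based definitions are an assumption, since the slides give no formula, and statistical packages often apply small-sample corrections that give slightly different numbers. Kurtosis is computed in its "excess" form so that a normal distribution scores zero, matching the slide. Both data sets are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Third central moment scaled by the variance to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment scaled by the squared variance, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# Hypothetical data sets (not from the slides).
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print("symmetric:   ", skewness(symmetric), excess_kurtosis(symmetric))
print("right-tailed:", skewness(right_tailed), excess_kurtosis(right_tailed))
```

The symmetric set has zero skewness and negative excess kurtosis (flatter than normal), while the set with one large value has positive skewness.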

Fig. 15.2: Skewness of a Distribution. Panels (a) and (b) show a symmetric distribution and a skewed distribution, each marking the mean, median, and mode.

Cross-Tabulation
While a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously. Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values, e.g., Table 15.3.

Table 15.3: Gender and Internet Usage

| Internet Usage | Male | Female | Row Total |
|---|---|---|---|
| Light (1) | 5 | 10 | 15 |
| Heavy (2) | 10 | 5 | 15 |
| Column Total | 15 | 15 | 30 |

Two Variables Cross-Tabulation
Since two variables have been cross-classified, percentages could be computed either columnwise, based on column totals (Table 15.4), or rowwise, based on row totals (Table 15.5). The general rule is to compute the percentages in the direction of the independent variable, across the dependent variable. The correct way of calculating percentages is as shown in Table 15.4.
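
The columnwise calculation can be sketched as follows, treating gender as the independent variable. The light-usage counts (5 males, 10 females) appear in Table 15.3; the heavy-usage counts (10 males, 5 females) are an inference from the table totals and the sample size of 30, and the function name is a hypothetical helper.

```python
# Cell counts for Table 15.3. Heavy-usage counts are inferred from the
# totals (assumption), not read directly from the slide.
counts = {
    "Light": {"Male": 5, "Female": 10},
    "Heavy": {"Male": 10, "Female": 5},
}

def column_percentages(table):
    """Express each cell as a percentage of its column total."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        label: {c: 100.0 * row[c] / totals[c] for c in columns}
        for label, row in table.items()
    }

percentages = column_percentages(counts)
for label, row in percentages.items():
    print(label, {c: f"{p:.1f}%" for c, p in row.items()})
```

Read this way, about two thirds of males are heavy users, against about one third of females.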

Table 15.4: Internet Usage by Gender

Table 15.5: Gender by Internet Usage

Fig. 15.7: Introduction of a Third Variable in Cross-Tabulation. Flowchart boxes: Original Two Variables; Some Association between the Two Variables; No Association between the Two Variables; Introduce a Third Variable; Refined Association between the Two Variables; No Change in the Initial Pattern.

3 Variables Cross-Tab: Refine an Initial Relationship
As can be seen from Table 15.6, 52% of unmarried respondents fell in the high-purchase category, compared with 31% of married respondents. Do unmarried respondents purchase more fashion clothing? A third variable, the buyer's sex, was introduced. As shown in Table 15.7, 60% of unmarried females fell in the high-purchase category, compared with 25% of married females, while 40% of unmarried males did so, compared with 35% of married males. Unmarried respondents are more likely to fall in the high-purchase category than married ones, and this effect is much more pronounced for females than for males.

Table 15.6: Purchase of Fashion Clothing by Marital Status

| Purchase of Fashion Clothing | Married | Unmarried |
|---|---|---|
| High | 31% | 52% |
| Low | 69% | 48% |
| Column total | 100% | 100% |
| Number of respondents | 700 | 300 |

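Table 15.7: Purchase of Fashion Clothing by Marital Status and Gender

| Purchase of Fashion Clothing | Male, Married | Male, Not Married | Female, Married | Female, Not Married |
|---|---|---|---|---|
| High | 35% | 40% | 25% | 60% |
| Low | 65% | 60% | 75% | 40% |
| Column totals | 100% | 100% | 100% | 100% |
| Number of cases | 400 | 120 | 300 | 180 |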
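
As a consistency check, the subgroup percentages in Table 15.7 can be pooled over sex to recover the two-variable percentages in Table 15.6. This is only a sketch: the assignment of each percentage and case count to a subgroup follows the description of Table 15.7 in the slides, and the function name is hypothetical.

```python
# (sex, marital status) -> (share in high-purchase category, number of cases),
# as read from Table 15.7.
subgroups = {
    ("Male", "Married"): (0.35, 400),
    ("Male", "Unmarried"): (0.40, 120),
    ("Female", "Married"): (0.25, 300),
    ("Female", "Unmarried"): (0.60, 180),
}

def pooled_high_share(status):
    """Share of high purchasers for one marital status, pooled over sex."""
    high = sum(share * n for (_, s), (share, n) in subgroups.items() if s == status)
    total = sum(n for (_, s), (_, n) in subgroups.items() if s == status)
    return high / total

for status in ("Married", "Unmarried"):
    print(f"{status}: {pooled_high_share(status):.1%} high purchase")
```

Pooling gives roughly 31% for married and 52% for unmarried respondents, in line with Table 15.6, which shows how the aggregate table hides the much larger gap among females.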

3 Variables Cross-Tab: Initial Relationship was Spurious
Table 15.8 shows that 32% of those with college degrees own an expensive automobile, compared with 21% of those without. Income may also be a factor. In Table 15.9, when the data for the high-income and low-income groups are examined separately, the association between education and ownership of expensive automobiles disappears, indicating that the initial relationship observed between these two variables was spurious.

Table 15.8: Ownership of Expensive Automobiles by Education Level

| Own Expensive Automobile | College Degree | No College Degree |
|---|---|---|
| Yes | 32% | 21% |
| No | 68% | 79% |
| Column totals | 100% | 100% |
| Number of cases | 250 | 750 |

Table 15.9: Ownership of Expensive Automobiles by Education Level and Income Levels

3 Variables Cross-Tab: Reveal Suppressed Association
Table 15.10 shows no association between desire to travel abroad and age. In Table 15.11, sex was introduced as the third variable. Controlling for the effect of sex, the suppressed association between desire to travel abroad and age is revealed for the separate categories of males and females. Since the association between desire to travel abroad and age runs in the opposite direction for males and females, the relationship between these two variables is masked when the data are aggregated across sex as in Table 15.10.

Table 15.10: Desire to Travel Abroad by Age

Table 15.11: Desire to Travel Abroad by Age and Gender

Three Variables Cross-Tabulations: No Change in Initial Relationship
Consider the cross-tabulation of family size and the tendency to eat out frequently in fast-food restaurants as shown in Table 15.12. No association is observed. When income was introduced as a third variable in the analysis, Table 15.13 was obtained. Again, no association was observed.

Table 15.12: Eating Frequently in Fast-Food Restaurants by Family Size

Table 15.13: Eating Frequently in Fast-Food Restaurants by Family Size and Income

Statistics Associated with Cross-Tab: Chi-Square
H0: there is no association between the two variables. Use the chi-square statistic to test it. H0 will be rejected when the calculated value of the test statistic is greater than the critical value of the chi-square distribution.

Statistics Associated with Cross-Tab: Chi-Square
The chi-square statistic compares the observed cell frequencies ($f_o$) to the frequencies to be expected when there is no association between the variables ($f_e$). The expected frequency for each cell can be calculated by using a simple formula: $f_e = n_r n_c / n$, where $n_r$ = total number in the row, $n_c$ = total number in the column, and $n$ = total sample size.

Statistics for Cross-Tab: Chi-Square
From Table 3 in the Statistical Appendix, the probability of exceeding a chi-square value of 3.841 is 0.05. The calculated chi-square is 3.333. Since this is less than the critical value of 3.841, the null hypothesis cannot be rejected. Thus, the association is not statistically significant at the 0.05 level.
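
The calculation behind the 3.333 figure can be sketched as follows. Two assumptions are worth flagging: the statistic is taken to be the standard Pearson form, the sum over cells of (fo - fe)^2 / fe, which the slides do not spell out, and the heavy-usage cell counts (10 and 5) are inferred from the totals of Table 15.3. The critical value 3.841 is the one quoted on the slide.

```python
# Observed counts for Table 15.3: rows are Light and Heavy usage,
# columns are Male and Female. Heavy-row counts are inferred (assumption).
observed = [[5, 10],
            [10, 5]]

def chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected count under H0
            statistic += (f_o - f_e) ** 2 / f_e
    return statistic

statistic = chi_square(observed)
critical_value = 3.841  # quoted on the slide for the 0.05 level
reject_h0 = statistic > critical_value

print(f"chi-square = {statistic:.3f}, reject H0: {reject_h0}")
```

Every expected count is 7.5 here, so each cell contributes 6.25 / 7.5 and the four cells sum to about 3.333, below the critical value.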