Chapter 14 Basic Data Analysis
LEARNING OUTCOMES
After studying this chapter, you should be able to:
1. Know that analysis consists of summarizing, rearranging, ordering, or manipulating data
2. Create and interpret simple tabulation and cross-tabulation tables
3. Understand how cross-tabulations can reveal relationships
4. Perform basic data transformations
5. Define hypothesis and significance level
6. Discuss the steps in the hypothesis-testing procedure
7. Describe the factors that influence the choice of statistical methods to use for analysis
The Nature of Descriptive Analysis
Descriptive analysis: the elementary transformation of raw data in a way that describes basic characteristics such as central tendency, distribution, and variability.
Histogram: a graphical way of showing a frequency distribution in which the height of each bar corresponds to the observed frequency of the category.
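As a minimal sketch of descriptive analysis, the snippet below summarizes central tendency and variability for a small set of responses and prints a text histogram. The ratings are hypothetical, and the variable names are illustrative only.

```python
import statistics
from collections import Counter

# Hypothetical responses on a 1-5 rating scale (illustrative data only).
responses = [4, 5, 3, 4, 2, 5, 4, 3, 4, 1]

mean = statistics.mean(responses)        # central tendency
median = statistics.median(responses)    # central tendency
mode = statistics.mode(responses)        # most frequent value
std_dev = statistics.stdev(responses)    # variability (sample standard deviation)
value_range = max(responses) - min(responses)

print(f"mean={mean}, median={median}, mode={mode}, "
      f"std_dev={std_dev:.2f}, range={value_range}")

# Text histogram: the bar length equals the observed frequency of each category.
frequencies = Counter(responses)
for category in sorted(frequencies):
    print(f"{category} | {'#' * frequencies[category]}")
```

Which of these statistics is meaningful depends on the level of scale measurement; the mean and standard deviation assume at least interval data.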
EXHIBIT 14.1 Levels of Scale Measurement and Suggested Descriptive Statistics
Tabulation
Tabulation: the orderly arrangement of data in a table or other summary format showing the number of responses to each response category; tallying.
Frequency table: a table showing the different ways respondents answered a question.
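A frequency table can be sketched as a simple tally. The answers below are hypothetical, and `frequency_table` is an illustrative name rather than anything defined in the chapter.

```python
from collections import Counter

# Hypothetical answers to a yes/no survey question (illustrative data only).
answers = ["Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"]

counts = Counter(answers)          # tally the responses per category
total = sum(counts.values())

# Map each response category to (frequency, percentage of all responses).
frequency_table = {
    answer: (count, round(100 * count / total, 1))
    for answer, count in counts.items()
}

for answer, (count, percent) in frequency_table.items():
    print(f"{answer:<5}{count:>4}{percent:>7.1f}%")
```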
Cross-Tabulation
Cross-tabulation: addresses research questions involving relationships among multiple less-than-interval (nominal or ordinal) variables; results in a combined frequency table displaying one variable in rows and another variable in columns.
Contingency table: a data matrix that displays the frequency of some combination of responses to multiple variables.
Marginals: row and column totals in a contingency table, which are shown in its margins.
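The sketch below builds a small contingency table with its marginals from raw paired responses. The records are hypothetical and the names (`cells`, `row_totals`, `col_totals`) are illustrative.

```python
from collections import Counter

# Hypothetical (sex, answer) pairs, one per respondent (illustrative data only).
records = [
    ("Men", "Yes"), ("Men", "No"), ("Men", "No"), ("Men", "Yes"),
    ("Women", "Yes"), ("Women", "Yes"), ("Women", "No"), ("Women", "Yes"),
]

cells = Counter(records)                      # frequency of each combination
rows = sorted({row for row, _ in records})
cols = sorted({col for _, col in records})

# Marginals: the row and column totals shown in the table's margins.
row_totals = {r: sum(cells[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(cells[(r, c)] for r in rows) for c in cols}
n = len(records)

print(f"{'':<8}" + "".join(f"{c:>6}" for c in cols) + f"{'Total':>7}")
for r in rows:
    print(f"{r:<8}" + "".join(f"{cells[(r, c)]:>6}" for c in cols)
          + f"{row_totals[r]:>7}")
print(f"{'Total':<8}" + "".join(f"{col_totals[c]:>6}" for c in cols) + f"{n:>7}")
```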
EXHIBIT 14.2 Cross-Tabulation Tables from a Survey on Ethics in America From Roger Ricklefs, “Ethics in America,” The Wall Street Journal, October 31, 1983, pp. 33, 42; November 1, 1983, p. 33; November 2, 1983, p. 33; and November 3, 1983, pp. 33, 37.
EXHIBIT 14.3 Possible Cross-Tabulations of One Question
Cross-Tabulation (cont’d)
Percentage Cross-Tabulations
Statistical base: the number of respondents or observations (in a row or column) used as a basis for computing percentages.
Elaboration and Refinement
Elaboration analysis: an analysis of the basic cross-tabulation for each level of a variable not previously considered, such as subgroups of the sample.
Moderator variable: a third variable that changes the nature of a relationship between the original independent and dependent variables.
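To illustrate the statistical base, the sketch below converts a contingency table to row percentages, using each row total as the base. The counts are hypothetical and `row_percentages` is an illustrative helper.

```python
# Hypothetical contingency table: rows are groups, columns are answers.
table = {
    "Men": {"Yes": 20, "No": 30},
    "Women": {"Yes": 45, "No": 5},
}

def row_percentages(table):
    """Return each cell as a percentage of its row total."""
    result = {}
    for row, cells in table.items():
        base = sum(cells.values())  # statistical base for this row
        result[row] = {col: 100 * count / base for col, count in cells.items()}
    return result

row_pct = row_percentages(table)
for row, cells in row_pct.items():
    print(row, {col: f"{pct:.1f}%" for col, pct in cells.items()})
```

Using column totals as the base instead would answer a different question, so the choice of base should follow the direction of the relationship being examined.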
EXHIBIT 14.4 Cross-Tabulation of Marital Status, Sex, and Responses to the Question “Do You Shop at Target?”
Cross-Tabulation (cont’d)
How Many Cross-Tabulations?
Every possible response becomes a possible explanatory variable.
When hypotheses involve relationships between two categorical variables, cross-tabulations are the right tool for the job.
Data Transformation
Data transformation (also data conversion): the process of changing the data from their original form to a format suitable for performing a data analysis addressing research objectives.
Bimodal
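One common kind of data transformation is collapsing a detailed scale into fewer categories. The sketch below assumes a hypothetical 5-point agreement scale; the cut points and category labels are illustrative choices, not ones given in the chapter.

```python
# Hypothetical 5-point responses (1 = strongly disagree, 5 = strongly agree).
raw_scores = [1, 2, 5, 4, 3, 5, 1, 2]

def collapse(score):
    """Map a 5-point score onto three broader categories."""
    if score <= 2:
        return "disagree"
    if score == 3:
        return "neutral"
    return "agree"

collapsed = [collapse(score) for score in raw_scores]
print(collapsed)
```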
Calculating Rank Order
Ranking data can be summarized by performing a data transformation. The transformation involves multiplying the frequency by the ranking score for each choice, resulting in a new scale.
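The rank-order transformation can be sketched as follows. The destinations and frequencies are hypothetical placeholders, not the values from the exhibits, and the sketch assumes that rank 1 is the most preferred, so the lowest total score indicates the top choice.

```python
# Hypothetical data: for each choice, how many respondents gave it each rank.
rank_frequencies = {
    "City A": {1: 5, 2: 3, 3: 2},
    "City B": {1: 3, 2: 4, 3: 3},
    "City C": {1: 2, 2: 3, 3: 5},
}

# Multiply each frequency by its ranking score and sum per choice.
scores = {
    choice: sum(rank * freq for rank, freq in freqs.items())
    for choice, freqs in rank_frequencies.items()
}

# Assumption: rank 1 = most preferred, so a lower total means a higher overall rank.
overall_order = sorted(scores, key=scores.get)

for position, choice in enumerate(overall_order, start=1):
    print(position, choice, scores[choice])
```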
EXHIBIT 14.5 Executive Rankings of Potential Conference Destinations
EXHIBIT 14.6 Frequencies of Conference Destination Rankings
Tabular and Graphic Methods of Displaying Data
Tables: are useful for presenting numerical information; facilitate summarization and communication of data; can include stubheads and bannerheads that allow the reader to view several cross-tabulations at once.
Charts and graphs: translate information into visual forms so that relationships may be easily grasped; can increase the effectiveness of a well-designed presentation; can create strong visual impressions.
EXHIBIT 14.7 Table of Regional Airline Usage for Vacation/Pleasure by Income and Education Class
EXHIBIT 14.8 A Stubhead Format Table Allowing Several Cross-Tabulations to Be Included in a Single Table Confidence in Church/Organized Religion Question: I am going to read you a list of institutions in American society. Would you tell me how much confidence you, yourself, have in each one—a great deal, quite a lot, some, or very little? The Church or Organized Religion *Less than 1 percent. From “Confidence in Church/Organized Religion,” The Gallup Report 238, July 1985, p. 4.
EXHIBIT 14.9 The Basic Forms of Graphic Presentation
EXHIBIT 14.10 Line Graphs Highlighting Comparisons over Time Americans spend nine more hours at the office today than they did 27 years ago. Reprinted from “Time Off,” American Demographics, March 2001, p. 24.
Computer Programs for Analysis
Statistical Packages
Spreadsheets: Excel
Statistical software: SAS, SPSS (Statistical Package for the Social Sciences), MINITAB
EXHIBIT 14.11 The Basic Data Analysis Window
EXHIBIT 14.12 SAS Computer Output of Descriptive Statistics
EXHIBIT 14.13 SPSS Histogram Output
EXHIBIT 14.14 Examples of SPSS Output for Cross-Tabulation From Real Stats Real Easy: SPSS for Windows. Copyright © 1992, SPSS, Inc.
Univariate Statistics: Stating a Hypothesis
Hypothesis: an unproven proposition; a supposition that tentatively explains certain facts or phenomena; an assumption about the nature of the world.
Null hypothesis (H0): a statement about the status quo; no difference between sample and population.
Alternative hypothesis (H1): a statement that indicates the opposite of the null hypothesis.
The Hypothesis-Testing Procedure
Process
1. The specifically stated hypothesis is derived from the research objectives.
2. A sample is obtained and the relevant variable is measured.
3. The measured sample value is compared to the value either stated explicitly or implied in the hypothesis.
If the value is consistent with the hypothesis, the hypothesis is supported. If the value is not consistent with the hypothesis, the hypothesis is not supported.
Significance Levels and p-Values
Significance level: a critical probability associated with a statistical hypothesis test that indicates how likely it is that an inference supporting a difference between an observed value and some statistical expectation is true; the acceptable level of Type I error.
p-value: probability value, or the observed or computed significance level; p-values are compared to significance levels to test hypotheses. Lower p-values provide stronger evidence against the null hypothesis; higher p-values mean the data are more consistent with it.
Critical values: the values that lie exactly on the boundary of the region of rejection.
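The comparison of a p-value with a significance level can be sketched as a small decision rule. The function name `decide` and the default level of 0.05 are assumptions for illustration; the rule here rejects the null hypothesis when the p-value is below the significance level.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level (alpha)."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    # A p-value below alpha falls in the region of rejection.
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.03))               # below 0.05
print(decide(0.20))               # above 0.05
print(decide(0.03, alpha=0.01))   # same p-value, stricter significance level
```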
EXHIBIT 14.15 A Sampling Distribution of the Mean Assuming µ = 3.0
An Example of Hypothesis Testing
The null hypothesis: the mean is equal to 3.0 (H0: µ = 3.0)
The alternative hypothesis: the mean does not equal 3.0 (H1: µ ≠ 3.0)
An Example of Hypothesis Testing (cont’d)
EXHIBIT 14.16 A Hypothesis Test Using the Sampling Distribution of X̄ under the Hypothesis µ = 3.0
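A two-tailed test of the hypothesis µ = 3.0 can be sketched with a large-sample z-test. The sample mean, standard deviation, sample size, and the 0.05 significance level below are illustrative assumptions, not values recovered from the exhibits.

```python
import math
from statistics import NormalDist

mu_0 = 3.0           # hypothesized population mean (from the null hypothesis)
sample_mean = 3.78   # hypothetical sample result
sample_sd = 1.5      # hypothetical sample standard deviation
n = 225              # hypothetical sample size
alpha = 0.05         # assumed significance level

standard_error = sample_sd / math.sqrt(n)

# Critical values: the boundaries of the region of rejection around mu_0.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
lower = mu_0 - z_critical * standard_error
upper = mu_0 + z_critical * standard_error

# Observed z statistic and its two-tailed p-value.
z_observed = (sample_mean - mu_0) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_observed)))

reject_null = not (lower <= sample_mean <= upper)
print(f"critical values: {lower:.3f}, {upper:.3f}")
print(f"z = {z_observed:.2f}, reject H0: {reject_null}")
```

With these assumed figures the sample mean falls outside the critical values, so the null hypothesis would be rejected; a small sample would normally call for a t distribution rather than the normal approximation used here.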
The Chi-Square Test for Goodness of Fit
Chi-square (χ²) test: tests for statistical significance; is particularly appropriate for testing hypotheses about frequencies arranged in a frequency or contingency table.
Goodness-of-fit (GOF): a general term representing how well some computed table or matrix of values matches some population or predetermined table or matrix of the same size.
The Chi-Square Test for Goodness of Fit: The Process
1. Formulate the null hypothesis and determine the expected frequency of each answer.
2. Determine the appropriate significance level.
3. Calculate the χ² value, using the observed frequencies from the sample and the expected frequencies.
4. Make the statistical decision by comparing the calculated χ² value with the critical χ² value.
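The four steps above can be sketched for a one-way frequency table. The observed counts and the equal-split null hypothesis are hypothetical; the critical value 3.841 is the standard chi-square table value for 1 degree of freedom at the 0.05 level.

```python
# Hypothetical one-way frequency table: brand awareness among 100 respondents.
observed = {"Aware": 60, "Unaware": 40}

# Step 1: null hypothesis (assumed for illustration): both answers equally likely.
n = sum(observed.values())
expected = {category: n / len(observed) for category in observed}

# Step 2: significance level of 0.05; a one-way table has (categories - 1) d.f.
degrees_of_freedom = len(observed) - 1
critical_value = 3.841  # chi-square table value for 1 d.f. at alpha = 0.05

# Step 3: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (observed[c] - expected[c]) ** 2 / expected[c] for c in observed
)

# Step 4: compare the calculated value with the critical value.
reject_null = chi_square > critical_value
print(f"chi-square = {chi_square:.2f}, d.f. = {degrees_of_freedom}, "
      f"reject H0: {reject_null}")
```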
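Chi-Square Formula
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
For a cell in row i and column j of a contingency table, the expected frequency is
$$E_{ij} = \frac{R_i C_j}{n}$$
χ² = chi-square statistic
Oi = observed frequency in the ith cell
Ei = expected frequency in the ith cell
Ri = total observed frequency in the ith row
Cj = total observed frequency in the jth column
n = sample size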
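Degrees of Freedom (d.f.)
d.f. = (R − 1)(C − 1), where R is the number of rows and C is the number of columns
For a table with two rows and two columns: (R − 1)(C − 1) = (2 − 1)(2 − 1) = 1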
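For a contingency table, the expected frequency of each cell is usually taken as its row total times its column total divided by the sample size, which matches the symbols Ri, Cj, and n defined with the chi-square formula. The sketch below applies that to a hypothetical 2 × 2 table and computes the degrees of freedom.

```python
# Hypothetical 2 x 2 contingency table of observed frequencies.
observed = [
    [30, 20],
    [10, 40],
]

row_totals = [sum(row) for row in observed]            # R_i
col_totals = [sum(col) for col in zip(*observed)]      # C_j
n = sum(row_totals)                                    # sample size

# Expected frequency for each cell: R_i * C_j / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over every cell.
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)

# Degrees of freedom: (R - 1)(C - 1).
degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.3f}, d.f. = {degrees_of_freedom}")
```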
EXHIBIT 14.17 One-Way Frequency Table for Brand Awareness
EXHIBIT 14.18 Table Calculating the Chi-Square Statistic
Choosing the Appropriate Statistical Technique
Choosing the correct statistical technique requires considering: the number of variables involved; the level of scale measurement; the type of question to be answered.
Choice of statistical technique influences: the research design; the type of data collected.
EXHIBIT 14.19 Descriptive Statistics Permissible with Different Types of Measurement All statistics appropriate for lower-order scales (nominal is the lowest) are appropriate for higher-order scales (ratio is the highest).
EXHIBIT 14.20 Examples of Selecting the Appropriate Univariate Statistical Method
Key Terms and Concepts
Descriptive analysis; Histogram; Tabulation; Frequency table; Cross-tabulation; Contingency table; Marginals; Statistical base; Elaboration analysis; Moderator variable; Data transformation; Hypothesis; Null hypothesis; Alternative hypothesis; Significance level; Critical values; Chi-square (χ²) test