CHAPTER 1 INTRODUCTION
Prem Mann, Introductory Statistics, 7/E. Copyright © 2010 John Wiley & Sons. All rights reserved.
WHAT IS STATISTICS?
Data: any observations that have been collected.
Statistics: a group of methods used to collect, analyze, present, and interpret data and to make decisions.
Key Terms
Population: A population consists of all elements (individuals, items, or objects) whose characteristics are being studied. The population being studied is also called the target population. Equivalently, it is the entire category under consideration, or the complete set of elements being studied. The population size is usually denoted by a capital N.
Examples: every lawyer in the United States; all single women in the United States.
Key Terms
Sample: A portion of the population selected for study is referred to as a sample; equivalently, it is the portion of the population that is available, or to be made available, for analysis. A good sample is representative of the population. We will learn about probability samples and how they provide assurance that a sample is indeed representative. The sample size is denoted by a lowercase n.
Example: If a company manufactures one million laptops, it might take a sample of, say, 500 of them to test quality. The population size is N = 1,000,000 and the sample size is n = 500.
Census: A survey that includes every member of the population is called a census.
Sample survey: The technique of collecting information from a portion of the population is called a sample survey.
Figure 1.1 Population and Sample
TYPES OF STATISTICS
Descriptive statistics consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures. These statistics summarize a sample of numerical data in terms of averages and other measures for the purpose of description. Descriptive statistics, as opposed to inferential statistics, is not concerned with the theory and methodology for drawing inferences that extend beyond the particular set of data examined.
Example: A teacher who gives an exam to a class of, say, 35 students is interested in descriptive statistics to assess the performance of the class. What was the class average, the median grade, the standard deviation? The teacher is not interested in making inferences about some larger population.
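The teacher example can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical (the slides give no data), and the variable names are made up for this sketch. Because the class is the whole group of interest, the sketch assumes the population standard deviation is the appropriate one.

```python
import statistics

# Hypothetical exam scores for a small class (not from the slides).
grades = [72, 85, 90, 64, 78, 95, 83]

class_mean = statistics.mean(grades)      # arithmetic average
class_median = statistics.median(grades)  # middle value of the sorted scores

# Assumption: the class is the entire group being described, so the
# population standard deviation (pstdev) is used, not the sample one (stdev).
class_sd = statistics.pstdev(grades)

print(f"mean={class_mean}, median={class_median}, sd={class_sd:.2f}")
```

All three numbers describe only these seven scores; nothing is inferred about any larger group of students.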
TYPES OF STATISTICS
Example of inferential statistics from quality control: GE manufactures LED bulbs and wants to know how many are defective. Suppose one million bulbs a year are produced in its new plant in Staten Island. The company might sample, say, 500 bulbs to estimate the proportion of defectives, so N = 1,000,000 and n = 500. If 5 of the 500 bulbs tested are defective, the sample proportion of defectives is 5/500 = 0.01, or 1%. This statistic may be used to estimate the true proportion of defective bulbs (the population proportion). That is, the sample proportion is used to make an inference about the population proportion.
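The arithmetic in the bulb example can be written out as a short Python sketch. The numbers come from the example; the variable names and the projected count of defective bulbs are additions for illustration, and the projection is only a point estimate that assumes the 500 bulbs are representative of the year's production.

```python
N = 1_000_000            # population size: bulbs produced per year
n = 500                  # sample size: bulbs tested
defective_in_sample = 5  # defectives found among the tested bulbs

# Sample proportion of defectives (a statistic computed from the sample).
p_hat = defective_in_sample / n

# Point estimate of the number of defective bulbs in the population,
# assuming the sample is representative.
estimated_defectives = p_hat * N

print(f"sample proportion = {p_hat:.2%}")
print(f"estimated defective bulbs per year = {estimated_defectives:,.0f}")
```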
POPULATION VERSUS SAMPLE
A sample that represents the characteristics of the population as closely as possible is called a representative sample.
A sample drawn in such a way that each element of the population has a chance of being selected is called a random sample.
If all samples of the same size selected from a population have the same chance of being selected, the procedure is called simple random sampling, and such a sample is called a simple random sample.
A sample may be drawn with replacement or without replacement. In sampling with replacement, each selected element is returned to the population before the next selection, so it can be chosen more than once; in sampling without replacement, it is not returned.
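As a rough sketch of the two ways of drawing a sample, the following Python uses the standard `random` module. The population of numbered units, the sizes, and the fixed seed are assumptions made for illustration only.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

N = 1_000_000
n = 500
population = range(1, N + 1)  # hypothetical unit IDs 1..N

# Simple random sample without replacement: every set of n distinct
# units is equally likely, and no unit can appear twice.
sample_without = rng.sample(population, k=n)

# Sample with replacement: each draw is made from the full population,
# so the same unit may be selected more than once.
sample_with = rng.choices(population, k=n)

print(len(set(sample_without)), "distinct units without replacement")
print(len(set(sample_with)), "distinct units with replacement")
```

With a population this large relative to n, repeats under sampling with replacement are uncommon, but they are possible; without replacement they cannot occur.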
BASIC TERMS
An element or member of a sample or population is a specific subject or object (for example, a person, firm, item, state, or country) about which the information is collected.
A variable is a characteristic under study that assumes different values for different elements. In contrast to a variable, the value of a constant is fixed.
The value of a variable for an element is called an observation or measurement.
A data set is a collection of observations on one or more variables.
Table 1.1 Charitable Givings of Six Retailers in 2007
TYPES OF VARIABLES
Quantitative variables: discrete variables and continuous variables
Qualitative or categorical variables
TYPES OF VARIABLES
Qualitative variables: A variable that cannot assume a numerical value but can be classified into two or more nonnumeric categories is called a qualitative or categorical variable. Such variables result in categorical (non-numeric) responses; the data are also called nominal or categorical data.
Example: Sex (male, female).
TYPES OF VARIABLES
Quantitative variables: A variable that can be measured numerically is called a quantitative variable. The data collected on a quantitative variable are called quantitative data. Quantitative variables result in numerical responses and may be discrete or continuous.
Quantitative Variables
Discrete variables: A variable whose values are countable is called a discrete variable. In other words, a discrete variable can assume only certain values, with no intermediate values.
Example: How many courses have you taken at this college? ____
Quantitative Variables
Continuous variables: A variable that can assume any numerical value over a certain interval or intervals is called a continuous variable. Continuous data arise from a measuring process.
Example: How much do you weigh? ____
One way to determine whether data are continuous is to ask whether you can add several decimal places to the answer. For example, you may say you weigh 150 pounds but actually weigh 150.23568924567 pounds. On the other hand, if you have 2 children, you do not have 2.3217638 children.
Figure 1.2 Types of Variables
Cross-Section Data
Definition: Data collected on different elements at the same point in time or for the same period of time are called cross-section data.
Table 1.2 Charitable Givings of Six Retailers in 2007
Time-Series Data
Definition: Data collected on the same element for the same variable at different points in time or for different periods of time are called time-series data.
Table 1.3 Number of Movie Screens
SOURCES OF DATA
Data may be obtained from internal sources, external sources, or surveys and experiments.
Primary vs. Secondary Data
Primary data: data compiled by the researcher using techniques such as surveys, experiments, depth interviews, observation, and focus groups.
Types of surveys: Much data is obtained using surveys. Each survey type has advantages and disadvantages.
Mail: lowest response rate; usually the lowest cost.
Personally administered: the interviewer can probe; most costly; subject to interviewer effects (the interviewer might influence the response).
Telephone: fastest.
Web: fast and inexpensive.
Primary vs. Secondary Data
Secondary data: data that has been compiled or published elsewhere, e.g., census data. The trick is to find data that is useful, since the data was probably collected for some purpose other than solving the researcher’s problem at hand.
Advantages: It can be gathered quickly and inexpensively. It enables researchers to build on past research.
Problems: The data may be outdated. Definitions of terms may vary. Units of measurement may differ. The data may not be accurate (e.g., census undercount).
SUMMATION NOTATION
A sample of prices of five literary books: $75, $80, $35, $97, and $88.
Let x denote the variable price of a book:
Price of the first book = x1 = $75
Price of the second book = x2 = $80
…
Adding the prices of all five books gives
75 + 80 + 35 + 97 + 88 = x1 + x2 + x3 + x4 + x5 = Σx
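One way to read the notation is as a loop over the observations. The sketch below stores the five book prices in a Python list (the list name `prices` is just a choice for this illustration) and shows that the subscripted values x1, ..., x5 correspond to list positions 0 through 4.

```python
# Prices of the five books, in dollars, in the order given above.
prices = [75, 80, 35, 97, 88]

x1 = prices[0]  # price of the first book
x2 = prices[1]  # price of the second book

# Σx: add up every observation of the variable x.
sum_x = sum(prices)

print(x1, x2, sum_x)  # 75 80 375
```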
Example 1-1
Annual salaries (in thousands of dollars) of four workers are 75, 90, 125, and 61, respectively. Find
(a) ∑x (b) (∑x)² (c) ∑x²
Example 1-1: Solution
(a) ∑x = x1 + x2 + x3 + x4 = 75 + 90 + 125 + 61 = 351, that is, $351,000
(b) (∑x)² = (351)² = 123,201
(c) ∑x² = (75)² + (90)² + (125)² + (61)² = 5,625 + 8,100 + 15,625 + 3,721 = 33,071
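The three quantities are easy to confuse, so it may help to check them in Python. This is a small sketch using the salaries from Example 1-1; the variable names are chosen here for readability and are not part of the example.

```python
# Annual salaries in thousands of dollars (Example 1-1).
salaries = [75, 90, 125, 61]

sum_x = sum(salaries)                           # (a) Σx: add the values
square_of_sum = sum_x ** 2                      # (b) (Σx)²: add first, then square
sum_of_squares = sum(x ** 2 for x in salaries)  # (c) Σx²: square first, then add

print(sum_x, square_of_sum, sum_of_squares)  # 351 123201 33071
```

Parts (b) and (c) differ only in the order of squaring and adding, which is why (Σx)² and Σx² are generally not equal.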
Example 1-2
The following table lists four pairs of m and f values.
Compute the following: (a) Σm (b) Σf² (c) Σmf (d) Σm²f
Example 1-2: Solution
Table 1.4
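The same four sums can be sketched in Python for paired data. The m and f values below are hypothetical, chosen only for illustration; they are not the values from the table in Example 1-2, so the printed results are not the answers to that example.

```python
# Hypothetical (m, f) pairs, made up for this sketch.
m = [1, 2, 3, 4]
f = [2, 3, 1, 5]

sum_m = sum(m)                                          # (a) Σm
sum_f_sq = sum(fi ** 2 for fi in f)                     # (b) Σf²
sum_mf = sum(mi * fi for mi, fi in zip(m, f))           # (c) Σmf
sum_m_sq_f = sum(mi ** 2 * fi for mi, fi in zip(m, f))  # (d) Σm²f

print(sum_m, sum_f_sq, sum_mf, sum_m_sq_f)  # 10 39 31 103
```

In (c) and (d) the products are formed pair by pair before adding, which is what `zip` expresses here.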