Elementary Statistics Thirteenth Edition Chapter 1 Introduction to Statistics Copyright © 2018, 2014, 2012 Pearson Education, Inc. All Rights Reserved
Introduction to Statistics 1-1 Statistical and Critical Thinking 1-2 Types of Data 1-3 Collecting Sample Data
Key Concept A major use of statistics is to collect and use sample data to make conclusions about populations.
Parameter A parameter is a numerical measurement describing some characteristic of a population.
Statistic A statistic is a numerical measurement describing some characteristic of a sample.
Parameter and Statistic (Larson/Farber 4th ed.) Parameter: a number that describes a population characteristic. Example: the average age of all people in the United States. Statistic: a number that describes a sample characteristic. Example: the average age of people from a sample of three states.
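A minimal Python sketch of the distinction, using a synthetic population of ages (made-up data, not U.S. figures): the mean of the whole population is a parameter, while the mean of a random sample drawn from it is a statistic that estimates that parameter.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 10,000 ages between 0 and 90 (synthetic, not real data)
population = [random.randint(0, 90) for _ in range(10_000)]

# Parameter: a number describing the whole population
mu = statistics.mean(population)

# Statistic: a number describing a sample drawn from that population
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

Drawing a different sample of 100 would change `x_bar`, while `mu` stays fixed for a given population.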
Quantitative Data Quantitative (or numerical) data consist of numbers representing counts or measurements. Examples: the weights of supermodels; the ages of respondents.
Categorical Data Categorical (or qualitative or attribute) data consist of names or labels (not numbers that represent counts or measurements). Examples: the genders (male/female) of professional athletes; the shirt numbers on professional athletes' uniforms, which serve as substitutes for names.
Example: Classifying Data by Type The base prices of several vehicles are shown in the table. Which data are categorical (qualitative) data and which are quantitative data? (Source: Ford Motor Company)
Working with Quantitative Data Quantitative data can be further described by distinguishing between discrete and continuous types.
Discrete Data Discrete data result when the data values are quantitative and the number of values is finite, or “countable.” Example: The number of tosses of a coin before getting tails
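To make the coin example concrete, here is a small simulation sketch (the function name `tosses_before_tails` is hypothetical). Each result is a whole number 0, 1, 2, ..., so even though there is no fixed upper limit, the possible values can be counted, which is what makes the data discrete.

```python
import random
from collections import Counter

def tosses_before_tails(rng: random.Random) -> int:
    """Count the tosses of a fair coin that come up heads before the first tail."""
    count = 0
    while rng.random() < 0.5:  # a value below 0.5 is treated as heads
        count += 1
    return count

rng = random.Random(42)
results = [tosses_before_tails(rng) for _ in range(1_000)]

# Tally how often each count occurred; every key is a non-negative integer.
print(sorted(Counter(results).items()))
```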
Continuous Data Continuous (numerical) data result from infinitely many possible quantitative values, where the collection of values is not countable. Example: the lengths of distances from 0 cm to 12 cm (any value between 0 cm and 12 cm is possible).
Types of Random Variables Identify each random variable as discrete or continuous.
x = the number of people in a car. Discrete: you count the number of people in a car (0, 1, 2, 3, ...), so the possible values can be listed.
x = the gallons of gas bought in a week. Continuous: you measure the gallons of gas, so you cannot list the possible values. A continuous random variable can take on any value between two values.
x = the time it takes to drive from home to school. Continuous: you measure the amount of time, so the possible values cannot be listed.
x = the number of heads in three tosses of a coin. Discrete: you count the number of heads, so the possible values can be listed.
Note: discrete random variables are usually integers; however, shoe size is also a discrete random variable because its values can be listed. No shoe size occurs between 9 and 9½.
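For the last variable, the claim that "the possible numbers can be listed" can be checked directly by enumerating every outcome of three tosses. This short Python sketch lists all 2^3 = 8 outcomes and the number of heads in each.

```python
from itertools import product

# All 8 equally likely outcomes of three coin tosses, e.g. ('H', 'T', 'H')
outcomes = list(product("HT", repeat=3))

# x = number of heads in each outcome
heads = [outcome.count("H") for outcome in outcomes]

possible_values = sorted(set(heads))
print(possible_values)  # [0, 1, 2, 3]
```

The random variable can take only the four listed values 0, 1, 2, and 3, so it is discrete.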
Levels of Measurement Another way of classifying data is to use four levels of measurement: nominal, ordinal, interval, and ratio.
Nominal Level The nominal level of measurement is characterized by data that consist of names, labels, or categories only; the data cannot be arranged in some order (such as low to high). Examples: survey responses of yes, no, and undecided; colors of eyes, cars, or hair; names of TV channels (FOX, ABC, CWPhilly, CSN, etc.).
Ordinal Level The ordinal level of measurement involves data that can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless. Examples: course grades of A, B, C, D, or F; horse racing finishes of Win, Place, Show (1st, 2nd, 3rd place); rankings of TV shows (Nielsen ratings).
Levels of Measurement Nominal level of measurement: qualitative (categorical) data only; data are categorized using names, labels, or qualities; no mathematical computations can be made. Ordinal level of measurement: qualitative or quantitative data; data can be arranged in order; differences between data entries are not meaningful.
Example: Classifying Data by Level Two data sets are shown. Which data set consists of data at the nominal level? Which data set consists of data at the ordinal level? (Source: Nielsen Media Research)
Solution: Classifying Data by Level Ordinal level: the data set lists the ranks of five TV programs. The data can be ordered, but differences between ranks are not meaningful. Nominal level: the data set lists the call letters of each network affiliate. Call letters are names of network affiliates.
Interval Level The interval level of measurement involves data that can be arranged in order, and the differences between data values can be found and are meaningful. However, there is no natural zero starting point at which none of the quantity is present. Examples: the years 1000, 2000, 1776, and 1492; temperatures. Is 0˚F (used in the U.S.) the same as 0˚C (used in Canada)? No: 0˚F is about -17.8˚C. On either scale, zero does not mean “no heat,” so there is no natural zero.
Levels of Measurement Interval level of measurement: quantitative data; data can be ordered; differences between data entries are meaningful; zero represents a position on a scale (not an inherent zero: zero does not imply “none”).
Ratio Level At the ratio level of measurement, data can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point (where zero indicates that none of the quantity is present). Differences and ratios are both meaningful. Examples: class times of 50 minutes and 100 minutes (100 minutes is twice as long as 50 minutes); wait times at a fast-food restaurant at lunch. Zero really means zero.
Levels of Measurement Ratio level of measurement: similar to the interval level; a zero entry is an inherent zero (implies “none”); a ratio of two data values can be formed; one data value can be expressed as a multiple of another.
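The practical difference between the interval and ratio levels shows up when the unit changes. In this Python sketch (the temperatures of 50˚F and 100˚F are illustrative values, not from the slides), the ratio of two temperatures depends on the scale, because 0˚F and 0˚C are arbitrary points. The ratio of two class times stays the same whether it is measured in minutes or hours, because zero time really means no time.

```python
def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32) * 5 / 9

# Interval level: temperature (zero is a position on the scale, not "no heat")
low_f, high_f = 50.0, 100.0
low_c, high_c = fahrenheit_to_celsius(low_f), fahrenheit_to_celsius(high_f)
print(f"ratio in F: {high_f / low_f:.2f}")         # 2.00
print(f"ratio in C: {high_c / low_c:.2f}")         # 3.78, so "twice as hot" is not meaningful
print(f"difference in C: {high_c - low_c:.2f}")    # 27.78, which is exactly 50 F-degrees

# Ratio level: class times (zero minutes means no time at all)
short_min, long_min = 50.0, 100.0
short_hr, long_hr = short_min / 60, long_min / 60
print(f"ratio in minutes: {long_min / short_min:.2f}")  # 2.00
print(f"ratio in hours:   {long_hr / short_hr:.2f}")    # 2.00
```

Differences of temperatures still convert consistently (a 50˚F difference is always about 27.78˚C), which is why differences are meaningful at the interval level even though ratios are not.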
Example: Classifying Data by Level Two data sets are shown. Which data set consists of data at the interval level? Which data set consists of data at the ratio level? (Source: Major League Baseball)
Solution: Classifying Data by Level Interval level: quantitative data. You can find a difference between two dates, but a ratio does not make sense. Ratio level: you can find differences and write ratios.
Summary: Levels of Measurement
Nominal: categories only
Ordinal: categories with some order
Interval: differences but no natural zero point
Ratio: differences and a natural zero point
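Because each level keeps the properties of the level before it and adds one more, the summary can be encoded as an ordered scale. This is a hypothetical sketch: the names `Level` and `meaningful_operations` are illustrative, not from any library, and the operation labels paraphrase the summary.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered from least to most structure."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def meaningful_operations(level: Level) -> list[str]:
    """Return the operations that are meaningful at the given level."""
    ops = ["classify into categories"]
    if level >= Level.ORDINAL:
        ops.append("arrange in order")
    if level >= Level.INTERVAL:
        ops.append("find differences")
    if level >= Level.RATIO:
        ops.append("form ratios")
    return ops

for level in Level:
    print(f"{level.name:<8} {meaningful_operations(level)}")
```

Using an `IntEnum` lets the cumulative rule ("each level adds to the one below it") be written as simple comparisons.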
Big Data Big data refers to data sets so large and so complex that their analysis is beyond the capabilities of traditional software tools. Analysis of big data may require software simultaneously running in parallel on many different computers. Data science involves applications of statistics, computer science, and software engineering, along with some other relevant fields (such as sociology or finance).
Missing Data A data value is missing completely at random if the likelihood of its being missing is independent of its value or any of the other values in the data set. That is, any data value is just as likely to be missing as any other data value. A data value is missing not at random if the missing value is related to the reason that it is missing.
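Why the distinction matters can be seen in a small simulation. This Python sketch uses synthetic incomes (in thousands of dollars, made up for illustration) and assumes, hypothetically, that values above 75 go missing because high earners decline to answer. Values lost completely at random leave the mean roughly unchanged; values lost for a reason tied to their size bias it.

```python
import random
import statistics

random.seed(7)

# Hypothetical complete data: 1,000 incomes in thousands of dollars (synthetic)
incomes = [random.gauss(60, 15) for _ in range(1_000)]
true_mean = statistics.mean(incomes)

# Missing completely at random: every value has the same 30% chance of being lost.
mcar = [x for x in incomes if random.random() >= 0.30]

# Missing not at random: the largest values are the ones that go missing,
# so whether a value is missing depends on the value itself.
mnar = [x for x in incomes if x <= 75]

print(f"true mean: {true_mean:.1f}")
print(f"MCAR mean: {statistics.mean(mcar):.1f}")  # close to the true mean
print(f"MNAR mean: {statistics.mean(mnar):.1f}")  # noticeably lower
```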
Correcting for Missing Data Delete Cases: One very common method for dealing with missing data is to delete all subjects having any missing values. Impute Missing Values: We “impute” missing data values when we substitute values for them.
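Both corrections can be sketched on a tiny, hypothetical set of survey records, where `None` marks a missing value. For imputation the sketch substitutes the mean of the observed values for each variable; mean imputation is one simple choice among several, not the only way to impute.

```python
import statistics

# Hypothetical survey records; None marks a missing value.
records = [
    {"age": 25, "hours": 40},
    {"age": 31, "hours": None},
    {"age": None, "hours": 35},
    {"age": 47, "hours": 50},
    {"age": 38, "hours": 45},
]

# Delete cases: drop every record that has any missing value.
complete = [r for r in records if all(v is not None for v in r.values())]

def impute_mean(rows: list[dict]) -> list[dict]:
    """Replace each missing value with the mean of that variable's observed values."""
    means = {
        key: statistics.mean(r[key] for r in rows if r[key] is not None)
        for key in rows[0]
    }
    return [{k: (means[k] if v is None else v) for k, v in r.items()} for r in rows]

imputed = impute_mean(records)
print(len(complete))  # 3
print(imputed[1])     # {'age': 31, 'hours': 42.5}
print(imputed[2])     # {'age': 35.25, 'hours': 35}
```

Deleting cases shrinks the data from 5 records to 3, while mean imputation keeps all 5 records but fills the gaps with values that sit at the center of the data, which understates the true variation.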