Statistics: The Interpretation of Data 13.1 Organizing and Representing Data 13.2 Measuring the Center and Variation of Data 13.3 Statistical Inference
13.1 Organizing and Representing Data
VISUAL REPRESENTATIONS OF DATA DOT PLOTS – used to summarize relatively small sets of data [Figure: a data table and the dot plot made from it]
VISUAL REPRESENTATIONS OF DATA STEM AND LEAF PLOTS – also used to summarize relatively small sets of data [Figure: a data table and the stem and leaf plot made from it]
VISUAL REPRESENTATIONS OF DATA HISTOGRAMS – data is grouped into intervals
VISUAL REPRESENTATIONS OF DATA LINE GRAPHS – also called frequency polygons
VISUAL REPRESENTATIONS OF DATA BAR GRAPHS – used with categorical data, where the horizontal scale may be some nonnumerical attribute
VISUAL REPRESENTATIONS OF DATA PIE CHARTS – represent relative amounts of a whole [Figure: percent of each tax dollar expended by Mile High School District, by category]
VISUAL REPRESENTATIONS OF DATA PICTOGRAPHS – useful in comparing quantities
13.2 Measuring the Center and Variation of Data
MEASURES OF CENTRAL TENDENCY MEAN – the arithmetic mean, or average MEDIAN – the middle value in a collection when the values are arranged in order of increasing size MODE – the value that occurs most frequently in a collection of values
THE MEAN The mean, or average, of a collection of values is $\bar{x} = \frac{S}{n}$, where $S$ is the sum of the values and $n$ is the number of values.
THE MEAN A visual understanding using a data set of 7, 5, 7, 3, 8, and 6:
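The arithmetic behind this data set can be checked with a short sketch. The variable names are illustrative, and the "balance point" reading (deviations from the mean sum to zero) is one common way to picture the mean; it is an assumption about what the slide's figure shows.

```python
# Data set from the slide
values = [7, 5, 7, 3, 8, 6]

S = sum(values)   # sum of the values
n = len(values)   # number of values
mean = S / n      # the mean is S divided by n

# Deviations above and below the mean cancel out exactly,
# which is why the mean acts as a balance point for the data.
deviations = [v - mean for v in values]

print(S, n, mean)        # 36 6 6.0
print(sum(deviations))   # 0.0
```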
THE MEDIAN Let a collection of n data values be written in order of increasing size. If n is odd, the median is the middle value in the list. If n is even, the median is the average of the two middle values.
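THE MEDIAN Data set 1: 24, 25, 25, 27, 29, 31, 32, 34, 37 (n = 9 is odd, so the median is the middle value, 29) Data set 2: 42, 42, 43, 44, 44, 46, 47, 47, 47, 49 (n = 10 is even, so the median is the average of the two middle values, 44 and 46, which is 45)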
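The odd and even cases of the median rule can be sketched as a small function and applied to the two data sets. The function name `median` is illustrative, and the sketch assumes a non-empty list.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)          # arrange in order of increasing size
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # n odd: one middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average the two middle values

data_set_1 = [24, 25, 25, 27, 29, 31, 32, 34, 37]
data_set_2 = [42, 42, 43, 44, 44, 46, 47, 47, 47, 49]

print(median(data_set_1))   # 29
print(median(data_set_2))   # 45.0
```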
THE MODE A mode of a collection of values is a value that occurs the most frequently. If two or more values occur equally often and more frequently than all other values, there are two or more modes. Data Set: 42, 42, 43, 44, 44, 46, 47, 47, 47, 49 The mode of this data set is 47.
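Since a data set can have more than one mode, a sketch that returns every most-frequent value may help. The function name `modes` and the second, two-mode data set are made up for illustration.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most frequently, in increasing order."""
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())        # the largest frequency
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([42, 42, 43, 44, 44, 46, 47, 47, 47, 49]))   # [47]
print(modes([1, 1, 2, 2, 3]))                             # [1, 2]  (hypothetical data with two modes)
```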
MEASURES OF VARIABILITY RANGE – the difference between the smallest and largest data values QUARTILES – casually speaking, these values divide the data set into four sections, each of which contains, in increasing order, about ¼ of the data STANDARD DEVIATION – a measure of the typical deviation from the mean
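A rough sketch of the range and quartiles for a small data set follows. Textbooks and software use several slightly different quartile conventions. This sketch assumes the "median of each half" convention, leaving the overall median out of both halves when the number of values is odd. The function names are illustrative.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (lower quartile, median, upper quartile) using medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

data = [24, 25, 25, 27, 29, 31, 32, 34, 37]

data_range = max(data) - min(data)
print(data_range)         # 13
print(quartiles(data))    # (25.0, 29, 33.0)
```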
DEFINITION: OUTLIER An outlier is a value that "lies outside" (is much smaller or larger than) most of the other values in a set of data. E.g., Wayne Gretzky’s statistics
DEFINITION: STANDARD DEVIATION Standard deviation is a measure of how spread out numbers are around the mean.
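DEFINITION: STANDARD DEVIATION Let $x_1, x_2, \ldots, x_n$ be the values in a set of data and let $\bar{x}$ denote their mean. Then $s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}}$ is the standard deviation.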
Example 13.11 Computing a Standard Deviation. Compute the mean and standard deviation for this set of data: 35, 42, 61, 29, 39
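One way to work this example step by step is sketched below. It assumes the form of the standard deviation that divides the sum of squared deviations by n (the number of values); some texts and calculators divide by n - 1 instead, which gives a slightly larger result.

```python
import math

data = [35, 42, 61, 29, 39]

n = len(data)
mean = sum(data) / n                                  # 206 / 5

# Squared distance of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in data]

# Assumption: divide by n (not n - 1) before taking the square root
std_dev = math.sqrt(sum(squared_deviations) / n)

print(round(mean, 1))      # 41.2
print(round(std_dev, 2))   # 10.81
```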
13.3 Statistical Inference
TERMINOLOGY AND NOTATION A population is a particular set of objects about which one desires information. Mean of a population = $\mu$ Standard deviation of a population = $\sigma$ A sample is a subset of the population. Mean of a sample = $\bar{x}$ Standard deviation of a sample = $s$
DEFINITION: A RANDOM SAMPLE A random sample of size r is a subset of r individuals from the population chosen in such a way that every such subset has an equal chance of being chosen.
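As a small illustration, Python's standard `random.sample` draws r distinct individuals so that every subset of size r is equally likely. The population of 20 numbered individuals and the seed value are hypothetical, chosen only to make the sketch repeatable.

```python
import random

# Hypothetical population: 20 individuals labeled 1 through 20
population = list(range(1, 21))
r = 5   # sample size

# A fixed seed is used only so the example gives the same result on every run
rng = random.Random(13)

# Selection without replacement: no individual can appear twice
sample = rng.sample(population, r)
print(sample)
```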
THE NORMAL DISTRIBUTION
THE 68-95-99.7 RULE FOR NORMAL DISTRIBUTIONS For a population that has a normal distribution, about 68% of the population falls within 1 standard deviation of the mean, about 95% falls within 2 standard deviations of the mean, and about 99.7% falls within 3 standard deviations of the mean.
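The three percentages can be checked numerically with the `NormalDist` class from Python's standard `statistics` module. This is a sketch using the standard normal curve (mean 0, standard deviation 1); the same proportions hold for any normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Proportion of the population between k standard deviations below and above the mean
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {shares[k]:.1%}")
```

The printed values come out close to 68%, 95%, and 99.7%, which is where the rule gets its name.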
THE STANDARDIZED NORMAL CURVE
DEFINITION: PERCENTILE A number such that r percent of a sample or distribution is less than or equal to that number is called the r-th percentile. NOTE: Scoring at the 75th percentile on a test indicates that 75% of the students had a score less than or equal to yours, not necessarily that you got 75% of the problems correct.
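The difference between a percentile and a percent-correct score can be sketched with a small calculation. The function name `percentile_rank` and the class scores below are hypothetical, made up only for illustration.

```python
def percentile_rank(scores, score):
    """Percent of the scores that are less than or equal to the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

# Hypothetical test scores (percent of problems correct) for a class of 8 students
class_scores = [52, 58, 61, 64, 70, 73, 77, 85]

print(percentile_rank(class_scores, 73))   # 75.0
```

Here a student with 73% of the problems correct sits at the 75th percentile, because 6 of the 8 scores are at or below 73.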