6.1 What is Statistics? Definition: Statistics – the science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively evaluated. 3 Phases:
1. Collecting data
2. Analyzing data
3. Interpreting data
6.1 What is Statistics? Descriptive Statistics – summarize and describe a characteristic of a group (example: batting average). Inferential Statistics – used to estimate, infer, or conclude something about a larger group (example: polls). Sample – subset of the group of data available for analysis.
6.1 What is Statistics? Population – the entire set. Bias – favoring of certain outcomes over others. Census – collects data from all members of the population. Parameter – characteristic value of a population. Statistic – characteristic value of a sample.
6.2 Organizing Data Stem and Leaf Diagram: data – 35, 52, 37, 44, 51, 48, 45, 12
Stem | Leaves
1 | 2
2 |
3 | 5 7
4 | 4 5 8
5 | 1 2
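A minimal sketch of how a stem and leaf diagram could be built for this data, assuming two-digit values so that the tens digit is the stem and the ones digit is the leaf. The function name `stem_and_leaf` is illustrative only, not from the course material.

```python
from collections import defaultdict

data = [35, 52, 37, 44, 51, 48, 45, 12]

def stem_and_leaf(values):
    """Group values by tens digit (stem), with sorted ones digits as leaves."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    # Assumption: list every stem from smallest to largest, even if it has no leaves.
    return {stem: groups.get(stem, []) for stem in range(min(groups), max(groups) + 1)}

diagram = stem_and_leaf(data)
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Sorting before grouping keeps the leaves in increasing order within each stem, which makes the shape of the data easier to read.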
6.2 Organizing Data Frequency Table: data – 35, 52, 37, 44, 51, 48, 45, 12
Range | Frequency
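As a rough illustration, a frequency table for this data could be tallied as below. The class width of 10 (ranges 10-19, 20-29, and so on) is an assumption made for this sketch, and `frequency_table` is a hypothetical helper name. The relative frequency is each count divided by the number of observations.

```python
from collections import Counter

data = [35, 52, 37, 44, 51, 48, 45, 12]
CLASS_WIDTH = 10  # assumed class width; other widths are equally valid

def frequency_table(values, width):
    """Return (low, high, frequency) rows for equal-width classes."""
    counts = Counter((v // width) * width for v in values)
    lows = range(min(counts), max(counts) + width, width)
    return [(low, low + width - 1, counts.get(low, 0)) for low in lows]

table = frequency_table(data, CLASS_WIDTH)
n = len(data)
for low, high, freq in table:
    print(f"{low}-{high}: {freq}  (relative frequency {freq / n:.3f})")
```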
6.3 Displaying Data Ways to display data:
– Frequency histogram
– Relative frequency histogram
– Multiple bar graph
– Stacked bar graph
– Line graph
– Pie chart
6.3 Displaying Data Frequency Histogram
6.3 Displaying Data Relative Frequency Histogram
6.3 Displaying Data Multiple Bar Graph
6.3 Displaying Data Stacked Bar Graph
6.3 Displaying Data Line Graph
6.3 Displaying Data Pie Chart
6.4 Measures of Central Tendency Central Tendency – the propensity of data to be located or clustered about some point. Arithmetic Mean – sum of the values of all the observations divided by the total number of observations. For sample data, the mean is $\bar{x} = \frac{\sum x}{n}$, where n is the sample size.
6.4 Measures of Central Tendency For population data, the mean is $\mu = \frac{\sum x}{N}$, where N is the population size. Median – the middle value of a set of data when the data are arranged in ascending order.
6.4 Measures of Central Tendency Finding the median:
1. Arrange the data in increasing order or decreasing order.
2. Determine if n is even or odd.
   a. If n is odd, pick the middle value.
   b. If n is even, take the average of the two middle values.
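The mean and the median procedure can be sketched in a few lines. This is an illustrative version using the data set from section 6.2. The function names are hypothetical, and Python's built-in `statistics.mean` and `statistics.median` would give the same results.

```python
data = [35, 52, 37, 44, 51, 48, 45, 12]

def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)          # step 1: arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # step 2a: n is odd
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # step 2b: n is even

data_mean = mean(data)
data_median = median(data)
print(data_mean, data_median)
```

Here n = 8 is even, so the median is the average of the 4th and 5th sorted values, 44 and 45.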
6.4 Measures of Central Tendency Mode – the value or values that occur most frequently. Note: If all values occur with the same frequency, then there is no mode. Symmetric Distribution: the mean, median, and mode coincide.
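A small sketch of the mode rule, including the "no mode" case. The helper name `modes` is hypothetical. It assumes a non-empty list, and treating a data set with only one distinct value as having that value as its mode is a choice made for this sketch.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when every value is equally frequent."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all share the same frequency, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([35, 52, 37, 44, 51, 48, 45, 12]))  # every value occurs once
print(modes([1, 2, 2, 3]))
print(modes([1, 1, 2, 2, 3]))                   # two modes
```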
6.4 Measures of Central Tendency Distribution skewed to the left: typically mean < median < mode. Distribution skewed to the right: typically mode < median < mean.
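6.5 Measures of Variability Definition: The range of a set of n measurements $x_1, x_2, x_3, \ldots, x_n$ is the difference between the largest and the smallest measurements. Variance – the population variance is $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, where $\mu$ is the population mean and N is the population size.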
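6.5 Measures of Variability Problem with the variance: the units are the original units squared. Standard deviation – the population standard deviation $\sigma$ is the square root of the population variance. Sample variance – $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean; the sample standard deviation s is the square root of the sample variance.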
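The definitions can be checked numerically with a short sketch. It assumes the usual conventions: divide by N for a population and by n - 1 for a sample. The data set is the one from section 6.2 and the function names are illustrative.

```python
import math

data = [35, 52, 37, 44, 51, 48, 45, 12]

def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, dividing by n - 1."""
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)

sigma = math.sqrt(population_variance(data))   # population standard deviation
s = math.sqrt(sample_variance(data))           # sample standard deviation
print(round(sigma, 3), round(s, 3))
```

Taking the square root returns the measure to the original units, which is why the standard deviation is usually easier to interpret than the variance.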
6.5 Measures of Variability Shortcut formulas for $s^2$ and $\sigma^2$ are given on page 495 (provided with test). A shortcut formula for frequency data is given on page 499 (provided with test). Shortcut formulas are genuinely easier to calculate. Approximating the standard deviation: $s \approx R/4$, where R is the range.
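To get a feel for how rough the R/4 approximation is, the sketch below compares it with the exact sample standard deviation for the data set from section 6.2. It uses `statistics.stdev` from the Python standard library, which divides by n - 1.

```python
import statistics

data = [35, 52, 37, 44, 51, 48, 45, 12]

value_range = max(data) - min(data)   # R = largest - smallest
approx_s = value_range / 4            # range-based approximation of s
exact_s = statistics.stdev(data)      # sample standard deviation

print(f"R = {value_range}, R/4 = {approx_s}, s = {exact_s:.2f}")
```

For this small data set the approximation gives 10 while the exact value is about 13, so R/4 is best treated as a quick sanity check rather than a substitute for the formula.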
6.6 Measures of Relative Position pth percentile – for data arranged in increasing order, the value such that p% of the data are less than that value and (100 – p)% of the data are greater than that value.
6.6 Measures of Relative Position Z-scores – The sample z-score for a measure x is $z = \frac{x - \bar{x}}{s}$. The population z-score for a measure x is $z = \frac{x - \mu}{\sigma}$. The z-score represents the number of standard deviations that x lies away from the mean.
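A brief sketch of the sample z-score, applied to the data set from section 6.2. The helper `z_score` is a hypothetical name. The same function serves for the population z-score if the population mean and standard deviation are passed in instead.

```python
import statistics

data = [35, 52, 37, 44, 51, 48, 45, 12]

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

xbar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)     # sample standard deviation (n - 1 divisor)

z_max = z_score(52, xbar, s)   # largest observation
z_min = z_score(12, xbar, s)   # smallest observation
print(round(z_max, 2), round(z_min, 2))
```

A positive z-score means the value lies above the mean and a negative one means it lies below.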
6.7 Normal Distribution Definition: Standardizing – converting data to z-scores. Some empirical rules:
1. About 68% of data is within one standard deviation ($\sigma$) of the mean.
2. About 95% of data is within two standard deviations of the mean.
3. About 99.7% of data is within three standard deviations of the mean.
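The empirical rules can be checked against the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later) to compute the proportion of a normal distribution within k standard deviations of the mean.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Proportion within k standard deviations of the mean: P(-k < Z < k).
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, proportion in within.items():
    print(f"within {k} standard deviation(s): {proportion:.4f}")
```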
6.7 Normal Distribution The normal distribution looks like:
1. Bell-shaped
2. Symmetric
3. Mean = median = mode
6.7 Normal Distribution Definition: Standard normal distribution – normal distribution with $\sigma = 1$ and $\mu = 0$. The standard normal distribution table (page 511 or in appendix page 647) can be used to determine probabilities for a range of z-values.
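As an alternative to a printed table, probabilities for a range of z-values can be computed directly. This sketch uses the cumulative distribution function of `statistics.NormalDist`. The helper name `prob_between` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def prob_between(z_low, z_high):
    """P(z_low < Z < z_high) for a standard normal variable Z."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

p_below_1 = std_normal.cdf(1.0)        # area to the left of z = 1
p_middle = prob_between(-1.96, 1.96)   # central area between -1.96 and 1.96
print(round(p_below_1, 4), round(p_middle, 4))
```

Printed tables differ in whether they list the area to the left of z or the area between 0 and z, so a value read from a table may need a subtraction before it matches the output here.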
6.8 Confidence Intervals Central Limit Theorem: For a large sample size, the random variable $\bar{x}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$, where $\mu$ is the population mean of the x's and $\sigma$ is the population standard deviation of the x's.
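A small simulation can make the theorem concrete. The sketch below is an illustration under assumed settings, not part of the course material: the population is uniform on [0, 1), which has mean 0.5 and standard deviation sqrt(1/12), and the sample size of 36 and the 5000 repetitions are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

n = 36                   # assumed sample size
trials = 5000            # assumed number of repeated samples

mu = 0.5                 # mean of the uniform [0, 1) population
sigma = math.sqrt(1 / 12)  # its standard deviation

# Draw many samples and record each sample mean.
sample_means = [
    statistics.fmean([rng.random() for _ in range(n)])
    for _ in range(trials)
]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)
predicted_sd = sigma / math.sqrt(n)

print(f"mean of sample means: {observed_mean:.4f} (theory {mu})")
print(f"sd of sample means:   {observed_sd:.4f} (theory {predicted_sd:.4f})")
```

Even though the population is flat rather than bell-shaped, the sample means cluster around the population mean with a spread close to the predicted standard deviation.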
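6.8 Confidence Intervals Confidence interval for the population mean: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ – $\sigma$ may be replaced by s. Common levels of confidence ($n \geq 30$): Level of Confidence | $z_{\alpha/2}$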
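A hedged sketch of a large-sample confidence interval for the mean, assuming the usual form: sample mean plus or minus a critical z-value times s divided by the square root of n. The critical value is computed with `NormalDist.inv_cdf` rather than read from a table. The sample figures (mean 50, s = 12, n = 36) and the function names are made up for illustration.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Critical z-value leaving (1 - confidence) / 2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def confidence_interval(xbar, s, n, confidence):
    """Large-sample interval: xbar +/- z * s / sqrt(n)."""
    margin = z_critical(confidence) * s / math.sqrt(n)
    return xbar - margin, xbar + margin

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")

# Hypothetical sample: mean 50, standard deviation 12, size 36.
low, high = confidence_interval(50, 12, 36, 0.95)
print(f"95% interval: ({low:.2f}, {high:.2f})")
```

A higher confidence level gives a larger critical value and therefore a wider interval.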
6.8 Confidence Intervals Margin of Error: margin of error of an estimate of a sample proportion is given by:
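For illustration only, the sketch below uses one commonly taught form of the margin of error for a sample proportion, $E = z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p}) / n}$. Treating this as the intended formula is an assumption, and the course text may use a different or simplified version. The poll figures and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Assumed form: z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 50% of 1000 respondents favor a proposal.
moe = margin_of_error(0.5, 1000)
print(f"margin of error: {moe:.3f}")
```

Under this formula, quadrupling the sample size halves the margin of error.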
6.9 Regression and Correlation Scatter Plot – a plot of data consisting of 2 variables Linear Regression – modeling the data with the line that “best fits” – usually a “least squares” line or regression line Least Squares Line – is the line that minimizes the sum of the squared errors for a set of data points (formulas given on page 531 and shortcut formulas are on page 532 – formulas to be provided on test)
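A minimal sketch of fitting a least squares line y = a + bx, using the standard definitional formulas: the slope b is the sum of products of deviations divided by the sum of squared x deviations, and the intercept is the mean of y minus b times the mean of x. The data points and the function name are invented for illustration, and the formulas on the cited pages may be written in a different but equivalent form.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared errors."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)   # assumes the x values are not all equal
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = least_squares_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f}x")
```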
6.9 Regression and Correlation Correlation Coefficient r – a measure of the strength of the linear relationship between the 2 random variables x and y. Note: The closer the correlation is to 1 or –1, the stronger the relationship between the x and y variables. A correlation of zero means there is no evidence of a linear pattern.
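The correlation coefficient can be sketched with the standard definitional formula: the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The data points and the function name `correlation` are hypothetical, and the sketch assumes neither variable is constant.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)   # assumes s_xx and s_yy are nonzero

# Hypothetical data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_perfect = correlation(xs, [2 * x + 1 for x in xs])   # points exactly on a line
print(round(r, 4), round(r_perfect, 4))
```

Points that fall exactly on an upward-sloping line give r = 1, while the scattered points give a smaller positive value.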