AP Exam Thursday, May 12, 2016 Report at 11:30 to Prairieview
Exploring Data: Describing Patterns and Departures from Patterns We started by describing univariate (that means one variable) data graphically and numerically. Graphically: Dotplot Stemplot
More graphs… Cumulative Frequency Histogram You will need to read and interpret graphs and answer questions about them.
SOCS When asked to describe data based on a graph, focus on SOCS:
Shape: Mound-shaped, Skewed (left or right? negative or positive?), Bimodal, Unimodal, Uniform, approximately normal…
Outliers: Are there any potential outliers? If you are only asked to describe the graph, you don’t need to calculate; just mention any potential outliers.
Center: Where (approximately) is the median? The mean? Based on the shape of the data, which is the better choice for center?
Spread: Range, IQR
While a plot provides a nice visual description of a dataset, we often want a more detailed numeric summary of the center and spread.
Describing Center Mean Median or Q2
Measures of Variability: When describing the “spread” of a set of data, we can use:
Range: Max − Min
Interquartile Range: IQR = Q3 − Q1
Standard Deviation: s = √( Σ(x − x̄)² / (n − 1) )
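As a rough illustration of the center and spread measures above, here is a minimal Python sketch. The data set is invented for this example. The quartiles are taken as the medians of the lower and upper halves of the ordered data, which is one common convention; other conventions can give slightly different values.

```python
import statistics

# Hypothetical data set, invented for illustration only.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)
median = statistics.median(data)  # the median is also Q2

# Range: Max - Min
data_range = max(data) - min(data)

# Quartiles as medians of the lower and upper halves
# (the middle value is left out of both halves when n is odd).
ordered = sorted(data)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[-half:])
iqr = q3 - q1

# Sample standard deviation (divides by n - 1).
s = statistics.stdev(data)

print(mean, median, data_range, q1, q3, iqr, round(s, 3))
```

The mean and standard deviation are pulled toward extreme values, while the median and IQR are not, which is why the shape of the data guides the choice of summary.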
Analyzing Categorical Data Use a bar graph to display categorical data. Make sure the graph is labeled and the bars are equal width and evenly spaced. A pie chart may be used to display categorical data if the categories are parts of a whole.
Conditional Distribution Describes the values of a variable among individuals who have a specific value of another variable.
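A conditional distribution can be read off a two-way table by dividing each count in a row by that row's total. The sketch below assumes a made-up table; the group names, answers, and counts are hypothetical.

```python
# Hypothetical two-way table of counts: class level by survey answer.
counts = {
    "freshman": {"yes": 30, "no": 20},
    "senior": {"yes": 10, "no": 40},
}

def conditional_distribution(table, group):
    """Distribution of the answer among individuals in one group."""
    row = table[group]
    total = sum(row.values())
    return {answer: count / total for answer, count in row.items()}

freshman_dist = conditional_distribution(counts, "freshman")
senior_dist = conditional_distribution(counts, "senior")
print(freshman_dist)
print(senior_dist)
```

Comparing the two conditional distributions side by side is how you look for an association between the two categorical variables.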
Normal Distribution Many distributions of data and many statistical applications can be described by an approximately normal distribution. Symmetric, bell-shaped curve centered at the mean μ. Described as N(μ, σ). Empirical Rule: about 68% of data within 1σ of µ, 95% within 2σ of µ, 99.7% within 3σ of µ.
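The Empirical Rule percentages can be checked numerically. This is a small sketch using Python's standard-library NormalDist; the helper name area_within is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution, N(0, 1).
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The areas come out close to 0.68, 0.95, and 0.997, which are the rounded values the rule uses.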
Standardizing Data If a value does not fall exactly 1, 2, or 3 standard deviations from µ, we can standardize it using a z-score: z = (x − µ) / σ
You can find the percentile by finding the area to the left of the z-score. The total area under a distribution curve = 1. Use Table A to find percentiles or p-values.
To get area above z: normalcdf(z, 100, 0, 1)
To get area below z: normalcdf(-100, z, 0, 1)
Between z1 and z2: normalcdf(z1, z2, 0, 1)
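For practice away from the calculator, the same areas can be sketched in Python with the standard-library NormalDist. The function names and the example values (x = 130, µ = 100, σ = 15) are assumptions made for illustration; they are not from the slides.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def area_below(z):
    """Area to the left of z (the percentile, as a proportion)."""
    return STANDARD_NORMAL.cdf(z)

def area_above(z):
    """Area to the right of z."""
    return 1 - STANDARD_NORMAL.cdf(z)

def area_between(z_low, z_high):
    """Area between two z-scores."""
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)

# Hypothetical example values.
z = z_score(130, 100, 15)
print(z, round(area_below(z), 4), round(area_above(z), 4))
```

Because the total area is 1, the area above and the area below any z-score always add to 1.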
Assessing Normality Shape Normal Probability Plot: a roughly linear plot indicates an approximately normal distribution
Bivariate Data (that means 2 variables) The study of bivariate data is the study of the relationship between two quantitative variables.
• DOFS (Direction, Outliers, Form, Strength)
• Least Squares Regression Line
• Residuals (observed – predicted)
• Correlation (r) – Correlation Coefficient
• r² – Coefficient of Determination
Calculator Steps: Make sure Diagnostics are on. Enter data in L1, L2. STAT, CALC, LinReg(a+bx)
Correlation “r” We can describe the strength of a linear relationship with the Correlation Coefficient, r. −1 ≤ r ≤ 1. The closer r is to 1 or −1, the stronger the linear relationship between x and y. r alone is not enough to say there is a linear relationship between 2 variables.
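One way to see what r measures is to compute it by hand as the average product of z-scores, r = Σ(z_x · z_y) / (n − 1). The sketch below uses that formula; the data values are invented for illustration.

```python
import statistics

def correlation(xs, ys):
    """Correlation coefficient r as the sum of z-score products over n - 1."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    products = (
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    )
    return sum(products) / (n - 1)

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))
```

Because r is built from z-scores, it has no units and does not change when x or y is rescaled.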
Least Squares Regression Line When we observe a linear relationship between x and y, we often want to describe it with a “line of best fit” y = a + bx. We can find this line by performing least-squares regression. We can use the resulting equation to predict y-values for given x-values.
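The calculator does the least-squares arithmetic for you, but the formulas are short: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. Here is a minimal sketch with invented data; the function name is hypothetical.

```python
import statistics

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)

# Predicted y for x = 3, which is inside the observed x-values.
y_hat = a + b * 3
print(round(a, 3), round(b, 3), round(y_hat, 3))
```

The line always passes through the point (x̄, ȳ), which is why the prediction at x = 3 equals the mean of the y-values here.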
Assessing Fit If we hope to make useful predictions of y, we must assess whether or not the LSRL is indeed the best fit. If not, we may need to find a different model. Use the residual plot to help determine linearity. The residuals should be scattered with no obvious pattern or curvature.
Making Predictions If you are satisfied that the LSRL provides an appropriate model for predictions, you can use it to predict a y-hat for x’s within the observed range of x-values. Predictions for observed x-values can be assessed by noting the residual. Residual = observed y − predicted y = y − ŷ
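A short sketch of residuals and in-range prediction follows. The line coefficients (a = 2.2, b = 0.6) and the data are hypothetical values chosen for illustration, and the range check is just one possible way to guard against extrapolation.

```python
def predict(x, a, b, x_min, x_max):
    """Predict y-hat from y = a + b*x, only inside the observed x range."""
    if not x_min <= x <= x_max:
        raise ValueError("x is outside the observed range (extrapolation)")
    return a + b * x

def residuals(xs, ys, a, b):
    """Residual = observed y - predicted y, for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data and fitted line, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

res = residuals(xs, ys, a, b)
print([round(e, 2) for e in res])
print(round(predict(3.5, a, b, min(xs), max(xs)), 2))
```

A positive residual means the line underpredicted that point; a negative residual means it overpredicted.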
Bivariate Relationship – Non Linear If data is not best described by an LSRL, we may be able to find a Power or Exponential model that can be used for more accurate predictions. If (x, y) is non-linear, we can transform it to try to achieve a linear relationship.
Power Model: (ln x, ln y) or (log x, log y)
Exponential Model: (x, ln y) or (x, log y)
If the transformed data appears linear, we can find an LSRL and then transform back to the original terms of the data.
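To make the transform-and-back idea concrete, here is a sketch for the exponential case: fit a least-squares line to (x, ln y), then undo the logarithm to predict y. The data are invented and follow y = 3·2^x exactly, so the fit should recover those constants; the function names are hypothetical.

```python
import math
import statistics

def fit_exponential(xs, ys):
    """Fit ln(y) = a + b*x by least squares and return (a, b)."""
    log_ys = [math.log(y) for y in ys]
    x_bar = statistics.mean(xs)
    ly_bar = statistics.mean(log_ys)
    s_xy = sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, log_ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = ly_bar - b * x_bar
    return a, b

def predict_exponential(x, a, b):
    """Transform back: y-hat = e^(a + b*x)."""
    return math.exp(a + b * x)

# Hypothetical data that follow y = 3 * 2**x exactly.
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]
a, b = fit_exponential(xs, ys)
print(round(math.exp(a), 4), round(math.exp(b), 4))
print(round(predict_exponential(2, a, b), 4))
```

For a power model the same steps apply with ln x in place of x, and the back-transformed prediction is e^a · x^b.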
Sampling and Surveys Our goal in statistics is often to answer a question about a population using information from a sample. Observational Study vs. Experiment We must be sure the sample is representative of the population in question.
Observational Studies If you are performing an observational study, your sample can be obtained in a number of ways: Convenience, Cluster, Systematic, Simple Random Sample, Stratified Random Sample
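Three of these sampling methods can be sketched with Python's random module. The population (ID numbers 1 to 100), the two strata, the sample sizes, and the seed are all made up for this example.

```python
import random

def simple_random_sample(population, n, rng):
    """Every group of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_random_sample(strata, n_per_stratum, rng):
    """Take a separate simple random sample from each stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def systematic_sample(population, step, rng):
    """Pick a random start, then take every step-th individual."""
    start = rng.randrange(step)
    return population[start::step]

# Hypothetical population of 100 ID numbers split into two strata.
rng = random.Random(2016)  # seeded only so the example is repeatable
population = list(range(1, 101))
strata = {"grade 9": population[:50], "grade 12": population[50:]}

srs = simple_random_sample(population, 10, rng)
stratified = stratified_random_sample(strata, 5, rng)
systematic = systematic_sample(population, 10, rng)
print(srs)
print(stratified)
print(systematic)
```

The stratified sample guarantees five individuals from each group, which a simple random sample of the same size does not.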
Experimental Study In an experiment, we impose a treatment with the hopes of establishing a causal relationship. Experiments exhibit 3 Principles: Randomization, Control, Replication
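Randomization means chance, not the researcher, decides who gets which treatment. A minimal sketch of random assignment to two groups follows; the subject labels, group sizes, and seed are hypothetical.

```python
import random

def randomly_assign(subjects, rng):
    """Shuffle the subjects and split them into two equal-sized groups."""
    shuffled = list(subjects)  # copy so the original list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subjects; the seed is only there to make the run repeatable.
rng = random.Random(12)
subjects = [f"subject {i}" for i in range(1, 21)]
treatment_group, control_group = randomly_assign(subjects, rng)
print(treatment_group)
print(control_group)
```

Using ten subjects in each group, rather than one, reflects the replication principle, and the second group serves as the control for comparison.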
Experimental Designs Like Observational Studies, Experiments can take a number of different forms: Completely Controlled Randomized Comparative Experiment Blocked Matched Pairs
Scope of Inference