Review of Midterm math2200


Data
W’s
Subjects
Variables
–Categorical versus quantitative

One categorical variable
Graphs:
–Bar chart
–Pie chart
Numerical summary:
–Frequency table
–Relative frequency table

Two categorical variables
Conditional and marginal distribution
Graphs:
–Segmented bar charts
–Side-by-side bar charts
–Side-by-side pie charts
Numerical summary:
–Contingency table
–Table percentage, row percentage, column percentage

Problem 28 (page 148)
Is birth order related to major?
–What percent of these students are oldest or only children? (113/223 ≈ 50.7%)
–What percent of Humanities majors are oldest children? (15/43 ≈ 34.9%)
–What percent of oldest children are Humanities students? (15/113 ≈ 13.3%)
–What percent of the students are oldest children majoring in the Humanities? (15/223 ≈ 6.7%)
[Contingency table of Major (Math/Science, Agriculture, Humanities, Other, Total) by Birth Order (1, 2, 3, 4+, Total); cell counts not preserved.]

Problem 30 (page 148)
What is the marginal distribution of majors?
Math/Science   Agriculture   Humanities   Other        Total
57 (25.6%)     93 (41.7%)    43 (19.3%)   30 (13.5%)   223
What is the conditional distribution of majors for the oldest children?
Math/Science   Agriculture   Humanities   Other        Total
34 (30.1%)     52 (46.0%)    15 (13.3%)   12 (10.6%)   113
[Contingency table of Major (Math/Science, Agriculture, Humanities, Other, Total) by Birth Order (1, 2, 3, 4+, Total); cell counts not preserved.]
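
The percentages in Problems 28 and 30 differ only in which total goes in the denominator. A minimal Python sketch, using only the counts quoted on these slides (the variable and function names are illustrative, not from the course):

```python
# Counts copied from the slides for Problems 28 and 30.
majors = ["Math/Science", "Agriculture", "Humanities", "Other"]
row_totals = [57, 93, 43, 30]   # all students, by major
oldest = [34, 52, 15, 12]       # birth order 1 (oldest or only), by major

n_students = sum(row_totals)    # grand total of the table
n_oldest = sum(oldest)          # column total for birth order 1


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Marginal distribution: divide each row total by the grand total.
marginal = {m: pct(c, n_students) for m, c in zip(majors, row_totals)}

# Conditional distribution given "oldest": divide by the column total.
conditional_on_oldest = {m: pct(c, n_oldest) for m, c in zip(majors, oldest)}

# The four Problem 28 questions: same numerators, different denominators.
pct_oldest = pct(n_oldest, n_students)        # table percentage of a margin
pct_humanities_who_are_oldest = pct(15, 43)   # row percentage
pct_oldest_who_are_humanities = pct(15, n_oldest)  # column percentage
pct_oldest_and_humanities = pct(15, n_students)    # table percentage

print(marginal)
print(conditional_on_oldest)
```

Reading the question for the words "of ..." usually identifies the denominator: "of Humanities majors" points to the Humanities row total, "of oldest children" to the birth-order-1 column total.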

Simpson’s Paradox
Problem 3.38: Two delivery services
Delivery Service   Type of Service   Number of deliveries   Number of late packages   Overall percentage of late deliveries
Pack Rats          Regular           400                    12 (3%)                   5.60%
                   Overnight         100                    16 (16%)
Boxes R Us         Regular           100                    2 (2%)                    6%
                   Overnight         400                    28 (7%)
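
The reversal in this table can be checked directly. The sketch below recomputes the late rates from the counts on the slide; the dictionary layout and names are assumptions made for illustration.

```python
# (number of deliveries, number of late packages), copied from the slide.
deliveries = {
    "Pack Rats":  {"Regular": (400, 12), "Overnight": (100, 16)},
    "Boxes R Us": {"Regular": (100, 2),  "Overnight": (400, 28)},
}


def late_rate(n, late):
    """Percentage of deliveries that arrived late."""
    return 100 * late / n


# Late rate within each type of service.
by_service = {
    company: {service: late_rate(n, late) for service, (n, late) in types.items()}
    for company, types in deliveries.items()
}

# Overall late rate: pool the counts first, then divide.
overall = {
    company: late_rate(
        sum(n for n, _ in types.values()),
        sum(late for _, late in types.values()),
    )
    for company, types in deliveries.items()
}

print(by_service)
print(overall)
```

Boxes R Us has the lower late rate within each type of service, yet the higher overall rate, because most of its deliveries are overnight, the type with the higher late rates for both companies.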

One quantitative variable
Graphs:
–Histogram
–Boxplot
Qualitative summary:
–Number of modes
–Symmetric? Transformation?
–Outliers?
Numerical summary:
–Five-number summary
–Center: mean versus median
–Spread: SD versus IQR

Problem 32: Pay
The 1999 National Occupational Employment and Wage Estimates for Management Occupations
–For Chief Executives: mean = $48.67/hour, median = $52.08/hour
–For General and Operations Managers: mean = $31.69/hour, median = $27.23/hour
–Are these wage distributions likely to be symmetric, skewed to the left, or skewed to the right?
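
One way to reason about this question is that the mean is pulled toward the longer tail, so comparing mean and median hints at the direction of skew. The sketch below encodes that rule of thumb; it is a heuristic and not a test, and the function name `skew_hint` is made up for this example.

```python
def skew_hint(mean, median):
    """Rule of thumb only: the mean is pulled toward the longer tail."""
    if mean > median:
        return "skewed to the right"
    if mean < median:
        return "skewed to the left"
    return "roughly symmetric"


# (mean, median) hourly wages copied from the slide.
wages = {
    "Chief Executives": (48.67, 52.08),
    "General and Operations Managers": (31.69, 27.23),
}

hints = {job: skew_hint(mean, median) for job, (mean, median) in wages.items()}
print(hints)
```

A histogram of the wages would be needed to confirm the shape; the comparison only suggests which way to look.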

Shifting and rescaling
(x = the statistic changes)
                       shift   rescale
Location
  min                  x       x
  Q1                   x       x
  median               x       x
  Q3                   x       x
  max                  x       x
  mean                 x       x
Spread
  variance                     x
  standard deviation           x
  IQR                          x
  range                        x
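
The table can be verified numerically. The sketch below uses a small made-up data set (the values, the shift of 10 and the scale factor of 3 are all assumptions for illustration) and Python's standard `statistics` module. Quartile conventions differ between tools, so the exact Q1 and Q3 values may not match a TI-83, but the shift and rescale behaviour is the same.

```python
import statistics as st


def summary(xs):
    """Selected location and spread statistics for a list of numbers."""
    q1, med, q3 = st.quantiles(xs, n=4)  # one of several quartile conventions
    return {
        "min": min(xs), "Q1": q1, "median": med, "Q3": q3, "max": max(xs),
        "mean": st.mean(xs),
        "variance": st.variance(xs), "sd": st.stdev(xs),
        "IQR": q3 - q1, "range": max(xs) - min(xs),
    }


data = [2.0, 3.0, 5.0, 8.0, 13.0, 21.0]  # hypothetical values

base = summary(data)
after_shift = summary([x + 10 for x in data])    # add a constant
after_rescale = summary([x * 3 for x in data])   # multiply by a constant

for name in base:
    print(f"{name:9s} {base[name]:8.3f} {after_shift[name]:8.3f} {after_rescale[name]:8.3f}")
```

Shifting moves every location measure by the constant and leaves the spread measures alone. Rescaling multiplies the location measures, SD, IQR and range by the factor, and multiplies the variance by the factor squared.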

Problem 4.42: Job Growth
20 cities’ job growth rates predicted by Standard & Poor’s DRI in 1996
Are the mean and median very different? Which one is more appropriate?
–Mean (2.37%) or median (2.235%)?
–SD (0.425%) or IQR (0.515%)?
If we subtract from these growth rates the predicted U.S. average growth rate of 1.20%, how would this change the above summary statistics?
If we omit Las Vegas (growth rate = 3.72%) from the data, how would you expect the above summary statistics to change?
How should we summarize the distribution of the data?

One quantitative variable and one categorical variable
Comparing groups
–With histograms, boxplots, stem-and-leaf plots
–Transformation when spread is too different across groups

Normal model
Z-score and standard Normal
Nearly Normal Condition
–Normal probability plot
Four types of problems
–Given parameters and data values (or z-scores), ask for probabilities
–Given parameters and probabilities, ask for data values (or z-scores)
–Given probabilities and data values (or z-scores), ask for parameters
–Given probabilities, data values (or z-scores) and one parameter, ask for the other parameter

Problem 22: Winter Olympic 2002 speed skating
Top 25 men’s and 25 women’s 500-m speed skating times
–Mean = 73.46
–SD = 3.33
If the Normal model is appropriate, what percent of the times should be within 1.67 seconds of 73.46?
–Solution 1: 1.67 = 0.5*SD, so Normcdf(-0.5, 0.5, 0, 1)
–Solution 2: Normcdf(71.79, 75.13, 73.46, 3.33)
Both give about 38%. In the data, only 6% are within that range. Why are the percentages so different?
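
Both solutions can be reproduced without a calculator. The sketch below uses `NormalDist` from Python's standard library as a stand-in for the TI-83 `Normcdf`, taking the mean of 73.46 and SD of 3.33 from the slide; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 73.46, 3.33
half_width = 1.67  # "within 1.67 seconds of the mean"

# Solution 1: convert to z-scores (1.67 is about 0.5 SD) and use the
# standard Normal model.
standard = NormalDist(0, 1)
p_via_z = standard.cdf(0.5) - standard.cdf(-0.5)

# Solution 2: work directly in seconds with the Normal(73.46, 3.33) model.
times = NormalDist(mean, sd)
p_direct = times.cdf(mean + half_width) - times.cdf(mean - half_width)

print(f"via z-scores: {p_via_z:.4f}")
print(f"direct:       {p_direct:.4f}")
```

The two results differ slightly because 1.67/3.33 is not exactly 0.5. Either way, the model's prediction is far from the 6% observed in the data, which is the point of the slide's closing question.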

Problem 39: assembly time
Only 25% of the company’s customers succeeded in building the desk in under an hour
5% said it took them over 2 hours
Assume that consumer assembly time follows a Normal model
Mean = ?, SD = ?
–Z-score corresponding to 25%: (1 - mean)/SD = invNorm(0.25,0,1) = -0.674
–Z-score corresponding to 95%: (2 - mean)/SD = invNorm(0.95,0,1) = 1.645
Solving the two equations, we have mean = 1.29, SD = 0.43
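
The two z-score equations form a small linear system in the mean and SD. A minimal sketch of the solution, using `NormalDist.inv_cdf` from Python's standard library in place of `invNorm` (the variable names are illustrative):

```python
from statistics import NormalDist

standard = NormalDist(0, 1)
z25 = standard.inv_cdf(0.25)  # 25% finish in under 1 hour
z95 = standard.inv_cdf(0.95)  # 95% finish in under 2 hours

# Equations: (1 - mean)/sd = z25 and (2 - mean)/sd = z95.
# Subtracting the first from the second eliminates the mean:
#   (2 - 1)/sd = z95 - z25
sd = (2 - 1) / (z95 - z25)

# Substitute back into the first equation: mean = 1 - z25 * sd.
mean = 1 - z25 * sd

print(f"z25 = {z25:.3f}, z95 = {z95:.3f}")
print(f"mean = {mean:.2f} hours, SD = {sd:.2f} hours")
```

Subtracting the equations first is a convenient order because the unknown mean cancels, leaving one equation in the SD alone.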

Problem 39: assembly time (cont.)
Mean = 1.29, SD = 0.43
What assembly time should the company quote in order that 60% of customers succeed in finishing the desk by then?
–invNorm(0.6,1.29,0.43) ≈ 1.40 hours

Problem 39: assembly time (cont.)
Mean = 1.29, SD = 0.43
The company wishes to improve the one-hour success rate to 60%. If the SD stays the same, what new lower mean time does the company need to achieve?
–Z-score = invNorm(0.6,0,1)
–Z-score = (1 - mean)/SD
–Mean = 0.89
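
The two follow-up questions ask for different unknowns: a data value given both parameters, and then a parameter given a probability and a data value. A short sketch of both, again using the standard library's `NormalDist` as a stand-in for `invNorm` (names are illustrative):

```python
from statistics import NormalDist

mean, sd = 1.29, 0.43  # rounded values from the first part of Problem 39

# Time to quote so that 60% of customers finish by then:
# the 60th percentile of Normal(1.29, 0.43).
quote_time = NormalDist(mean, sd).inv_cdf(0.60)

# New mean so that 60% finish within 1 hour, with the SD unchanged:
# (1 - new_mean)/sd = z60, so new_mean = 1 - z60 * sd.
z60 = NormalDist(0, 1).inv_cdf(0.60)
new_mean = 1 - z60 * sd

print(f"quote about {quote_time:.2f} hours")
print(f"new mean about {new_mean:.2f} hours")
```

As a check, a Normal model with the new mean and the same SD should put 60% of its area below 1 hour.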

Correlation
Sign of r means?
The range of r?
X and Y are called uncorrelated if and only if r = 0
r(x,y) = r(y,x)
No units
Affected by shifting or rescaling X, Y or both?
Uncorrelated does NOT imply no association
Sensitive to outliers (does removing a point close to the line fitted through the scatterplot increase or decrease r?)
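
Several of these properties can be checked on a toy example. The sketch below computes r from its definition; the data values and the shift and scale constants are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]                # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                        # r(x,y) = r(y,x)
r_shifted_scaled = pearson_r([10 * v + 7 for v in x],
                             [0.5 * v - 3 for v in y])  # positive rescaling
r_flipped = pearson_r([-v for v in x], y)          # negative rescaling

# Uncorrelated but clearly associated: y is exactly x squared.
x_sym = [-2, -1, 0, 1, 2]
y_sq = [v * v for v in x_sym]
r_parabola = pearson_r(x_sym, y_sq)

print(r, r_swapped, r_shifted_scaled, r_flipped, r_parabola)
```

Shifting and rescaling by positive constants leave r unchanged, while multiplying one variable by a negative constant flips its sign. The parabola shows that r = 0 rules out a linear association only.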

Correlation: Review II 13, Page 264
What factor most explains differences in Fuel Efficiency among cars? Here’s a correlation matrix exploring that relationship for the car’s Weight, Horsepower, engine size (Displacement), and number of Cylinders.
[Correlation matrix of MPG, Weight, Horsepower, Displacement and Cylinders; only the entry 1.000 for MPG with itself is preserved.]
a) Which factor seems most strongly associated with Fuel Efficiency?
b) What does the negative correlation indicate?
c) Explain the meaning of R^2 for that relationship.

Matching r and scatterplots
Here are several scatterplots. The calculated correlations are 0.85, [value not preserved], 0.04 and [value not preserved]. Which is which?

Linear regression (least squares)
How to calculate the slope?
Given the slope and the standard deviations, how to calculate the correlation?
The line always goes through the point of means (mean of x, mean of y)
Residual = observed value - predicted value
–Overestimation (negative residual)
–Underestimation (positive residual)
Causal relationship?
How to interpret ?
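
The relationships on this slide can be illustrated with the standard least squares formulas: slope = r * (SD of y) / (SD of x) and intercept = (mean of y) - slope * (mean of x). The sketch below applies them to a small made-up data set; the data and variable names are assumptions for illustration.

```python
import statistics as st

x = [1, 2, 3, 4, 5]              # hypothetical predictor values
y = [2.0, 4.1, 5.9, 8.2, 9.8]    # hypothetical responses

n = len(x)
mean_x, mean_y = st.mean(x), st.mean(y)
sd_x, sd_y = st.stdev(x), st.stdev(y)

# Correlation as the average product of z-scores (dividing by n - 1).
r = sum(((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y)
        for xi, yi in zip(x, y)) / (n - 1)

slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # observed - predicted

# Going the other way: recover r from the slope and the two SDs.
r_from_slope = slope * sd_x / sd_y

print(f"r = {r:.4f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
print("residuals:", [round(e, 3) for e in residuals])
```

A negative residual means the line overestimated that observation, and a positive one means it underestimated. The residuals of a least squares line sum to zero, and plugging the mean of x into the equation returns the mean of y.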

Diagnostics of a Linear Model
1. Visual inspection: does the scatterplot satisfy the “Straight Enough Condition”? If it looks okay, proceed.
2. Regression: calculate the regression equation, r and R^2. (R^2 = r*r gives the fraction of the variation in y explained by the model.) If R^2 is tiny, say < 0.2, a linear model may not be a good choice.
3. Residuals: check the residual plot even when R^2 is large. Seeing a pattern is a bad sign. If the linear model is appropriate, the spread of the residuals should be about the same across the x-axis. (You can put either the predicted value or the x-variable on the x-axis.)
4. Re-expression: if a linear model is not appropriate for the data, consider re-expressing the data. Remember to repeat the diagnostics every time you fit a new linear model to the transformed data.
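
To see why step 3 matters even when R^2 is large, the sketch below fits a line to data that are exactly curved (y = x squared, a made-up example), inspects the residuals, and then repeats the fit after a square-root re-expression. The helper `fit_line` is hypothetical and written only for this illustration.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least squares fit; returns slope, intercept, residuals and R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sst = sum((y - my) ** 2 for y in ys)
    r_squared = 1 - sum(e * e for e in residuals) / sst
    return slope, intercept, residuals, r_squared


x = list(range(1, 9))
y = [v * v for v in x]  # hypothetical curved relationship

# Steps 2 and 3: R^2 is high, but the residuals are positive at both ends
# and negative in the middle, a clear pattern.
slope, intercept, residuals, r_squared = fit_line(x, y)
print(f"R^2 = {r_squared:.3f}")
print("residuals:", [round(e, 2) for e in residuals])

# Step 4: re-express y with a square root and repeat the diagnostics.
_, _, residuals_sqrt, r_squared_sqrt = fit_line(x, [sqrt(v) for v in y])
print(f"R^2 after re-expression = {r_squared_sqrt:.3f}")
print("residuals:", [round(e, 2) for e in residuals_sqrt])
```

In practice the residuals would be plotted against x or the predicted values; printing them is enough here to show the U-shaped pattern that a high R^2 alone would hide.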

Randomness
Simulation
–Simulation component?
–Response variable?
–Trial?
Example: Suppose the chance of passing the driver’s test is 34% the first time and 72% for the subsequent retests. Estimate the percentage of those tested who still do not have a driver’s license after two attempts.
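
The driver's test example can be simulated as follows. Here the component is a single test attempt, a trial is one person taking up to two attempts, and the response variable is whether that person is still unlicensed afterwards. The seed, the number of trials and the function name are arbitrary choices for this sketch.

```python
import random


def still_unlicensed_after_two(rng, p_first=0.34, p_retest=0.72):
    """One trial: True if the person fails both the first test and the retest."""
    if rng.random() < p_first:   # passes on the first attempt
        return False
    if rng.random() < p_retest:  # passes on the second attempt
        return False
    return True


rng = random.Random(2200)  # fixed seed so the run is repeatable
n_trials = 100_000

failures = sum(still_unlicensed_after_two(rng) for _ in range(n_trials))
estimate = failures / n_trials

# For comparison, assuming the two attempts are independent:
exact = (1 - 0.34) * (1 - 0.72)

print(f"simulated: {estimate:.3f}, exact: {exact:.4f}")
```

With this many trials the simulated proportion should land close to the exact value of about 18.5%; a classroom simulation with a random number table and far fewer trials would vary more.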

Check list
Graphs and plots: bar chart, pie chart, histogram, boxplot (modified boxplot on TI-83), normal probability plot, scatterplot, residual plot
–How to make?
–How to interpret?
Statistics: mean, median, min, max, range, quartiles, standard deviation, IQR, correlation coefficient
–How to calculate?
–How to interpret?
Model: Normal distribution, linear regression
–How to get the parameters?