Introductory Statistical Concepts. Disclaimer – I am not an expert SAS programmer. – Nothing that I say is confirmed or denied by Texas A&M University.

Presentation transcript:

Introductory Statistical Concepts

Disclaimer
– I am not an expert SAS programmer.
– Nothing that I say is confirmed or denied by Texas A&M University.

Why Are We Here?
Deming:
– To Learn
– To Have Fun
Question: Who was Deming?

Poll: What type of organization do you work for?
– Business
– Government
– Education
– Nonprofit
– Other

Purpose of These Lectures
A review of the statistical concepts used in most of the SAS Analytics Lecture Series. We will look at questions such as the following:
– What is the nature of statistical analyses?
– Why are population parameters so important?
– What is really being tested when you see a p-value?
– Why does regression handle missing data so well?
– What are residual analyses?

Descriptive Statistics

The Population (Very important concepts)
– Variable of Interest
– The Distribution
– Parameters: Mean, Median, Mode, Range, Variance, etc.

Learning Outcomes
You will learn
– basic statistical concepts
– the definition of mean, median, mode and standard deviation
– the difference between populations and samples
– the difference between parameters and estimates
– about confidence intervals
– how to test a statistical hypothesis
– how to run a regression analysis

Parameters
– Characteristics of the variable of interest
– They are how we describe the variable of interest
– Parameters are unknown

Parameters (Characteristics)
Central Tendency
– Mode
– Median
– Mean
Measures of Variability
– Range
– Variance
– Standard Deviation

Variability Change in the Data

What Is an Index?
How SUNNY is SUNNY? The UV Index

Air Quality Index
What Does It Mean?

Dow Jones Industrial Average Index
What does the index value really mean?
Which is “better”: a DJIA of 10,000 or a DJIA of 12,000?

Variability Index: A Simple One
– Find the largest value.
– Find the smallest value.
– Let Range = R = Largest - Smallest.

A More Complex Variation Index: The Standard Deviation
– Statisticians use this index to indicate variability.
– You will see it written as σ (for a population) or s (for a sample).
– It is widely available from SAS, Excel, and other statistical packages.

Details of the More Complex Index
Example
– Suppose that we observe the following three numbers: 1, 4, 7.
– The mean of these numbers is (1 + 4 + 7)/3 = 4.
– We now subtract the mean from each number, square the result, and add: (1-4)*(1-4) + (4-4)*(4-4) + (7-4)*(7-4) = 18.
– We divide by n - 1 = 2 and take the square root: The Standard Deviation = sqrt(18/2) = 3.
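
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `sample_standard_deviation` is made up for illustration, and the three numbers 1, 4, 7 are the ones implied by the squared differences shown on the slide. The simpler range index from the earlier slide is computed alongside for comparison.

```python
import math


def sample_standard_deviation(values):
    """Square root of the summed squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


numbers = [1, 4, 7]

# Simple variability index: Range = Largest - Smallest
value_range = max(numbers) - min(numbers)

# More complex variability index: the standard deviation
std_dev = sample_standard_deviation(numbers)

print("range =", value_range)
print("standard deviation =", std_dev)
```

Dividing by n - 1 rather than n is what SAS and Excel's sample standard deviation do by default, which is why the slide divides 18 by 2.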

What Does This Mean?
– By itself, it may be confusing to some.
– When comparing populations, we can use it to say which population varies the most.
– Let us look at an applet.

Using Graphs to Determine Variability
Box Plot

Distributions

Known Distribution
With a known distribution, we know the following:
– the shape
– the mean
– the variability (standard deviation)
– and/or some other information

Classical Distributions: Normal

Normal: Overlay

Classical Distributions: Uniform

Survey
The following are called parameters of the population:
– mean, median, mode
– variance, standard deviation, range, inter-quartile range (IQR)
In general, are these known or unknown?
– Known = yes (select using your seat indicator)
– Unknown = no (select using your seat indicator)

MPG: Histogram
Compare with “true” values!

Simulated Sample
In this example, we simulated taking a sample of size 1000 from one population of cars weighing 3000 pounds, whose mpg follows a normal distribution with mean=24 and standard deviation=1. You can practice this after class.
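
One way to practice this simulation after class is sketched below in Python. The lectures themselves use SAS, so this is a stand-in rather than the code behind the slide, and the seed value is an arbitrary assumption added only so that the run is repeatable.

```python
import random
import statistics

random.seed(0)  # arbitrary seed (an assumption), only for repeatability

POP_MEAN = 24.0     # "true" population mean mpg from the slide
POP_SD = 1.0        # "true" population standard deviation from the slide
SAMPLE_SIZE = 1000  # sample size from the slide

# Draw the simulated sample from a normal population
sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]

# Estimates computed from the sample, to compare with the "true" values
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

print(f"sample mean = {sample_mean:.3f} (true mean = {POP_MEAN})")
print(f"sample sd   = {sample_sd:.3f} (true sd = {POP_SD})")
```

The estimates will be close to, but not exactly equal to, the population values; that gap is the difference between a parameter and an estimate.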

Section 1.2 Populations and Samples

Objectives
– Understand the relationships between populations and samples, and between parameters and estimates.
– Look at an overview of hypothesis testing.

Population
Parameters: Mean, Variance, Median, Mode, Distribution, …

Example
Mpg of American-made cars that weigh between 2000 and 3500 pounds and were built in the 1970s.
– Parameters: mean, variance, and so on
– In general, we do not know the parameters.

Purpose of Statistical Analyses
– Estimate the parameters. (Make guesses.) Example: What is the population mean?
– Test hypotheses about the parameters. (Ask questions.) Example: Is the population mean = 30 mpg?

Role of Samples
Taking a sample of the population enables you to
– make estimates of the population parameters
– answer the questions about the population parameters.

Population and Sample
– Population parameters: Mean, Variance, Median, Mode, Distribution, …
– Sample: Sample mean, Sample variance, Sample S
– Inference: Estimates, Tests of hypotheses

Example: cars_american
This is a sample of American-made cars that weigh between 2000 and 3500 pounds and that were built in the 1970s. We are interested in the mpg. Use summary statistics to analyze the data.

Results of Summary Statistics

Results of Histogram

Sampling Distribution Applet: sampling_dist
This demonstration illustrates how to estimate and plot the sampling distribution of various statistics.

Demo: Sampling Distributions Applet

Confidence Intervals on the Population Mean
Level of Comfort
– 50%: {21.57 to 22.21}
– 95%: {20.96 to 22.82}
– 99.9%: {20.30 to 23.48}
What does this mean?
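
To see why the interval widens as the level of comfort rises, here is a minimal Python sketch. It rests on several assumptions: the sample is generated, not the cars data behind the slide; the helper name `mean_confidence_interval` is made up; and the interval uses a normal (z) approximation, whereas a statistical package would normally use the t distribution and give slightly wider intervals.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level):
    """Approximate confidence interval for the population mean (z-based)."""
    n = len(sample)
    center = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    # z multiplier that leaves (1 - level)/2 in each tail
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return center - z * standard_error, center + z * standard_error


random.seed(1)  # arbitrary seed (an assumption), only for repeatability
# Hypothetical mpg sample; NOT the data behind the slide
mpg_sample = [random.gauss(22.0, 3.0) for _ in range(40)]

for level in (0.50, 0.95, 0.999):
    low, high = mean_confidence_interval(mpg_sample, level)
    print(f"{level:.1%}: {{{low:.2f} to {high:.2f}}}")
```

All three intervals share the same center (the sample mean); only the multiplier changes, so asking for more comfort always costs a wider interval.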

Test That the Population Mean = 30 mpg
Use t-test → One Sample t-test
Requirements for running this test:
– Large n (n > 35)
– Or leftovers are normal
What is the p-value or sig value?

Testing Mean = 30

Conclusions of the Test
– Choose an alpha level, usually alpha=.05.
– If sig<alpha, then reject.
– Otherwise, fail to reject.
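
The decision rule can be sketched in Python as follows. Everything here is illustrative: the function name `one_sample_test` and the sample values are hypothetical, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large n (the slides suggest n > 35). A package such as SAS reports the exact t-based value.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_test(sample, hypothesized_mean, alpha=0.05):
    """Return (t statistic, approximate two-sided p-value, reject?)."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    t_stat = (mean(sample) - hypothesized_mean) / standard_error
    # Normal approximation to the two-sided p-value (an assumption)
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value, p_value < alpha


# Hypothetical mpg sample of size 36: the values 20..25, six times each
mpg_sample = [20 + (i % 6) for i in range(36)]

t_stat, p_value, reject = one_sample_test(mpg_sample, hypothesized_mean=30)
print(f"t = {t_stat:.2f}, sig = {p_value:.4f}")
print("reject" if reject else "fail to reject")
```

With this made-up sample the mean is 22.5, far from 30, so sig is well below alpha and the hypothesis that the population mean is 30 is rejected.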

Sig and p-values
When you see a sig value or p-value:
– You know that some hypothesis is being tested.
– You know whether or not the hypothesis is being rejected.
– You probably do not know what the hypothesis really is.
Ask yourself these questions:
– What are the population parameters being tested?
– How is what is being tested related to those parameters?

Requirements for Doing This Test
– Large n → n > 35
– Or leftovers are normally distributed. Use a histogram to check for normality.

Populations: Which Ones Are Similar?

Populations: Which Ones Are Similar?
Take samples.

Take Samples
Use the samples to answer this question: “Which populations are similar?”
Statistical translation: “Which populations are similar?” is the same as asking whether the following are the same:
– distribution?
– mean?
– variance?

Background/Requirements
Before we jump into the analysis, we must ask the following questions:
– How many populations are there?
– How many population parameters are we interested in and what are they?
– What tests do we want to do, and what are the requirements for doing those?
– Are we using everything we “know?”

Example
Suppose that we are interested in the mpg of American and European cars. How many populations are there?
– American Cars: Mpg, Distribution, Mean, Variance
– European Cars: Mpg, Distribution, Mean, Variance

Poll: How many populations are there?
– One - MPG
– Two - American and European
– Depends on the sample size

Parameters
Population 1: American Cars
Population 2: European Cars
Variable of interest: mpg
Distribution: Normal?
Mean:
Variance:

Analyses
1. We want to look at the distributions.
2. We want to estimate the parameters.
3. We want to answer these questions:
– Are the population means the same?
– Are the population variances the same?

Example: Our Data Set car_am_eu
Suppose that we are interested in the mpg of American and European cars.
– Sample of American Cars: Mpg, Distribution, Mean, Variance
– Sample of European Cars: Mpg, Distribution, Mean, Variance

Results from the Sample

Results

Box Plots: American, European

Histograms: American, European

Poll: Are the populations the same?
– Yes
– No

Conclusion Based on Sample Numbers and Graphs
– Easy: based on the samples, the populations are different (no statistical jargon).
– But I must have a p-value for my boss, for my paper, and so on.

Formal Tests
The classical approach to determining whether two populations are the same is to test whether the two population means are equal. But first we check whether the two population variances are equal.

Formal Tests
We use t-test → Two Sample.
Test 1, Test 2
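
The two checks can be sketched in Python. This is a rough illustration, not the SAS procedure: the function name `two_sample_statistics` and both samples are hypothetical, and only the test statistics are computed. Turning them into sig values requires the F and t distributions, which a statistical package supplies.

```python
import math
from statistics import mean, variance


def two_sample_statistics(sample_a, sample_b):
    """Return (variance ratio, pooled t statistic, unequal-variance t statistic)."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)

    # Check of equal variances: ratio of larger to smaller sample variance
    f_ratio = max(var_a, var_b) / min(var_a, var_b)

    diff = mean(sample_a) - mean(sample_b)

    # Test of equal means, assuming equal variances (pooled)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    t_pooled = diff / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

    # Test of equal means, NOT assuming equal variances
    t_unequal = diff / math.sqrt(var_a / n_a + var_b / n_b)

    return f_ratio, t_pooled, t_unequal


# Hypothetical mpg samples; NOT the car_am_eu data
american = [18, 20, 21, 19, 22, 20]
european = [26, 28, 25, 27, 29, 27]

f_ratio, t_pooled, t_unequal = two_sample_statistics(american, european)
print(f"variance ratio = {f_ratio:.2f}")
print(f"t (pooled) = {t_pooled:.3f}, t (unequal variances) = {t_unequal:.3f}")
```

A variance ratio near 1 supports using the pooled statistic; a ratio far from 1 points to the unequal-variance version.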

Section 1.3 Simple Linear Regression

Objectives
Identify the following:
– the population parameters
– the appropriate model
– number of populations sampled
– the correct hypotheses
– what should be tested for normality
– what “equal variances” means.

MPG Example
– Weight = 3000
– Weight = 2600
– Weight = 2900
– Weight = 2300
Take a sample of size 1 from each population!

Data
– We should be in deep trouble with one sample from each population.
– We have eight unknown population parameters. Can you name them?
– But what do we “know”?

Survey
Name the population parameters.

Essential Part and Leftovers
We want to “model” the data as follows:
MPG = Essential Part + Leftover
or
MPG = Mean + Leftover

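“Know” or Assumptions
– First, we “know” that all of the population variances are equal: $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma^2$.
– Second, each population mean is related to weight by the following: $\mu_i = a + b \cdot \text{weight}_i$.
The population means fall on a straight line!
How many unknowns are there now?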

Poll: How many unknowns are there?
– n

Graph

Observed, Essential Part, Leftover

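The Official Regression Model
$\text{MPG}_i = a + b \cdot \text{weight}_i + \varepsilon_i$
or
$y_i = a + b x_i + \varepsilon_i$
The errors $\varepsilon_i$ are “known” to be normal with mean 0 and variance $\sigma^2$.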

Main Assumptions
– The means of the populations fall on a straight line.
– All of the variances are equal ($\sigma^2$).
– The errors are “known” to be normal with mean 0 and variance $\sigma^2$.

Assumptions for Simple Linear Regression
Appendix A: This demonstration illustrates the fundamental concepts of simple linear regression.

Demo: Linear.doc

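How Can We Estimate the Unknown Parameters?
The Principle of Least Squares: write each leftover as $e_i = \text{MPG}_i - (a + b \cdot \text{weight}_i)$, or $e_i = y_i - (a + b x_i)$.
Now, choose a and b so that $\sum_i e_i^2$ is as small as possible,
or
Minimize $\sum_i (y_i - a - b x_i)^2$.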
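
For reference, this minimization has a standard closed-form solution, written here with $x_i$ for weight and $y_i$ for mpg. The notation ($\bar{x}$, $\bar{y}$ for the sample means, $S$ for the sum of squared leftovers) is an assumption for this sketch and may differ from the symbols on the original slide.

```latex
% Sum of squared leftovers to be minimized over a and b
S(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2

% Setting both partial derivatives to zero
\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
\qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0

% Solving the two equations gives the least squares estimates
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\qquad
a = \bar{y} - b \bar{x}
```

The first equation says that the leftovers sum to zero, which is why the fitted line always passes through the point $(\bar{x}, \bar{y})$.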

[Output slides: OUTPUT, OUTPUT_0 through OUTPUT_4]

Missing Values
Suppose that we want to estimate the mean mpg when weight=2500.
Predicted (Estimated) Mean MPG = a + b*weight, evaluated at weight=2500 with the estimated a and b
Why does this work?
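
A small Python sketch shows the idea. The four weights are the ones from the MPG Example slide, but the mpg values are invented for illustration (the fitted coefficients from the lecture's output are not reproduced in the transcript), and the function name `fit_line` is hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Least squares estimates (a, b) for the line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum_xy / sum_xx
    a = y_bar - b * x_bar
    return a, b


# Weights from the MPG Example slide; no car in the sample weighs 2500
weights = [2300, 2600, 2900, 3000]
# Hypothetical mpg values, invented so that they lie on a straight line
mpg = [30.0, 27.0, 24.0, 23.0]

a, b = fit_line(weights, mpg)

# Estimated mean mpg for a weight that was never observed
predicted = a + b * 2500
print(f"a = {a:.2f}, b = {b:.4f}")
print(f"predicted mean mpg at weight 2500 = {predicted:.2f}")
```

No observation at weight 2500 is needed, because the straight-line assumption ties the mean at 2500 to the same a and b that the observed weights determine.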

Survey
Can anyone explain why this works?

Conclusion
– Simple linear regression is very powerful.
– But it is based on assumptions (what we “know”).
– We need to check assumptions (residual analyses).