Stats 845 Applied Statistics

This course will cover:
1. Regression
   - Non-linear regression
   - Multiple regression
2. Analysis of variance and experimental design

The emphasis will be on:
1. Learning techniques through example.
2. Use of common statistical packages: SPSS, Minitab, SAS, SPlus.

What is Statistics? It is the major mathematical tool of scientific inference: the art of drawing conclusions from data that are, to some extent, corrupted by a component of random variation (random noise).

An analogy can be drawn between data affected by random components of variation and signals corrupted by noise.

Quite often sounds that are heard or received by some radio receiver can be thought of as signals with superimposed noise.

The objective in signal theory is to extract the signal from the received sound (i.e. remove the noise to the greatest extent possible). The same is true in data analysis.

Example A: Suppose we are comparing the effect of three different diets on weight loss.

An observation on weight loss can be thought of as being made up of two components:

1. A component due to the effect of the diet being applied to the subject (the signal).
2. A random component due to other factors affecting weight loss that are not considered, such as the initial weight, sex, and metabolic makeup of the subject (the random noise).

Note that random assignment of subjects to diets ensures that this second component is a random effect.
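The signal-plus-noise view of Example A can be sketched as a small simulation. This is only an illustration: the diet effects, the noise standard deviation, and the function name `simulate_diet_study` are assumptions made for the sketch and do not come from the course material.

```python
import random
import statistics

def simulate_diet_study(effects, n_per_diet, noise_sd, seed=0):
    """Simulate weight loss as (diet effect) + (subject-specific noise)."""
    rng = random.Random(seed)
    diets = list(effects)
    n_subjects = n_per_diet * len(diets)
    # Factors not considered (initial weight, sex, metabolism, ...) are
    # lumped into a single hypothetical noise term per subject.
    subject_noise = [rng.gauss(0.0, noise_sd) for _ in range(n_subjects)]
    # Random assignment: shuffle the subjects, then deal them out to the diets.
    order = list(range(n_subjects))
    rng.shuffle(order)
    data = {diet: [] for diet in diets}
    for position, subject in enumerate(order):
        diet = diets[position % len(diets)]
        data[diet].append(effects[diet] + subject_noise[subject])
    return data

# Hypothetical true mean weight losses (kg) for the three diets: the signal.
effects = {"A": 2.0, "B": 4.0, "C": 6.0}
data = simulate_diet_study(effects, n_per_diet=200, noise_sd=3.0)
for diet, losses in data.items():
    print(diet, round(statistics.mean(losses), 2))
```

Because subjects are assigned at random, the noise averages out within each diet group, so the group means land close to the assumed diet effects.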

Example B: In this example we are again comparing the effect of three diets on weight gain. Subjects are randomly divided into three groups, and diets are randomly assigned to the groups. Measurements of weight gain are taken at the following times after commencement of the diet:
- one month
- two months
- 6 months
- 1 year

In addition to the two factors, Time and Diet, affecting weight gain, there are two random sources of variation (noise):
- between-subject variation
- within-subject variation

This can be illustrated in a schematic fashion as follows:
[Schematic: the deterministic factors (Diet, Time) and the random noise (within-subject and between-subject variation) together determine the response (weight gain).]
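A minimal sketch of the structure in Example B, assuming a simple additive model (an assumption of this sketch, not something the slides state): each measurement is a diet effect plus a time effect plus a between-subject shift that stays fixed for a subject plus fresh within-subject noise. All numeric values and the name `simulate_repeated_measures` are hypothetical.

```python
import random

def simulate_repeated_measures(diet_effect, time_effect, subjects_per_diet,
                               between_sd, within_sd, seed=0):
    """Return rows of (subject_id, diet, month, weight_gain)."""
    rng = random.Random(seed)
    rows = []
    subject_id = 0
    for diet, d_eff in diet_effect.items():
        for _ in range(subjects_per_diet):
            # Between-subject variation: one draw per subject, shared by
            # all of that subject's measurements.
            subject_shift = rng.gauss(0.0, between_sd)
            for month, t_eff in time_effect.items():
                # Within-subject variation: a new draw for every measurement.
                noise = rng.gauss(0.0, within_sd)
                gain = d_eff + t_eff + subject_shift + noise
                rows.append((subject_id, diet, month, gain))
            subject_id += 1
    return rows

diet_effect = {"A": 0.0, "B": 1.0, "C": 2.0}        # hypothetical values
time_effect = {1: 0.5, 2: 1.0, 6: 2.0, 12: 3.0}     # months after start
rows = simulate_repeated_measures(diet_effect, time_effect,
                                  subjects_per_diet=5,
                                  between_sd=2.0, within_sd=0.5)
print(len(rows), "measurements on", len({r[0] for r in rows}), "subjects")
```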

Diagram (a circle of the following steps, labelled "Statistics"):
1. Questions arise about a phenomenon.
2. A decision is made to collect data.
3. A decision is made as to how to collect the data.
4. The data is collected.
5. The data is summarized and analyzed.
6. Conclusions are drawn from the analysis.

Notice the two points on the circle where statistics plays an important role:
1. The analysis of the collected data.
2. The design of a data collection procedure.

The analysis of the collected data. This of course is the traditional use of statistics. Note that if the data collection procedure is well thought out and well designed, the analysis step of the research project will be straightforward. Usually experimental designs are chosen with the statistical analysis already in mind. Thus the strategy for the analysis is usually decided upon when any study is designed.

It is a dangerous practice to select the form of analysis after the data has been collected (the choice may favour certain predetermined conclusions and therefore result in a considerable loss of objectivity). Sometimes, however, a decision to use a specific type of analysis has to be made after the data has been collected (because it was overlooked at the design stage).

The design of a data collection procedure. The importance of statistics is quite often ignored at this stage. It is important that the data collection procedure will eventually result in answers to the research questions, and in the most accurate answers for the resources available to the research team. Note that the success of a research project should not depend on the answers that it comes up with but on the accuracy of those answers. This is usually an indicator of a valuable research project.

Some definitions important to Statistics

A population: this is the complete collection of subjects (objects) that are of interest in the study. There may be (and frequently is) more than one population, in which case a major objective is comparison.

A case (elementary sampling unit): This is an individual unit (subject) of the population.

A variable: a measurement or type of measurement that is made on each individual case in the population.

Types of variables: Some variables may be measured on a numerical scale while others are measured on a categorical scale. The nature of the variables has a great influence on which analysis will be used.

For variables measured on a numerical scale the measurements will be numbers. Ex: Age, Weight, Systolic Blood Pressure.
For variables measured on a categorical scale the measurements will be categories. Ex: Sex, Religion, Heart Disease.

Types of variables In addition some variables are labeled as dependent variables and some variables are labeled as independent variables.

This usually depends on the objectives of the analysis. Dependent variables are output or response variables while the independent variables are the input variables or factors.

Usually one is interested in determining equations that describe how the dependent variables are affected by the independent variables

A sample: Is a subset of the population

Types of samples: different types of samples are determined by how the sample is selected.

Convenience samples: In a convenience sample the subjects that are most convenient to the researcher are selected as cases in the sample. This is not a very good procedure for inferential statistical analysis but is useful for exploratory preliminary work.

Quota samples: In quota samples subjects are chosen conveniently until quotas are met for different subgroups of the population. This also is useful for exploratory preliminary work.

Random samples: Random samples of a given size are selected in such a way that all possible samples of that size have the same probability of being selected.
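The defining property of a random sample (every possible sample of a given size is equally likely) can be checked empirically on a tiny population. The sketch below uses a hypothetical five-member population and Python's `random.sample`, which draws without replacement; the population labels, sample size, and number of draws are arbitrary choices for illustration.

```python
import itertools
import random
from collections import Counter

population = ["a", "b", "c", "d", "e"]   # hypothetical population of 5 cases
n = 2                                    # sample size

# Every possible sample of size n (order ignored).
all_samples = list(itertools.combinations(population, n))

rng = random.Random(1)
draws = 20_000
counts = Counter(
    tuple(sorted(rng.sample(population, n))) for _ in range(draws)
)

# Each possible sample should turn up with relative frequency
# close to 1 / len(all_samples).
for sample in all_samples:
    print(sample, round(counts[sample] / draws, 3))
```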

Convenience samples and quota samples are useful for preliminary studies. It is, however, difficult to assess the accuracy of estimates based on these types of sampling scheme. Sometimes, however, one has to be satisfied with a convenience sample and assume that it is equivalent to a random sample.

Some other definitions

A population statistic (parameter): Any quantity computed from the values of variables for the entire population.

A sample statistic: Any quantity computed from the values of variables for the cases in the sample.
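The distinction between a parameter and a sample statistic can be made concrete with a short sketch. The population of weights below is generated artificially (an assumption for illustration); in practice the population values, and hence the parameter, are usually unknown.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: weights (kg) of 10,000 individuals.
population = [rng.gauss(70.0, 12.0) for _ in range(10_000)]

# Parameter: computed from every case in the population.
population_mean = statistics.mean(population)

# Sample statistic: computed only from the cases in a random sample.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(round(population_mean, 2), round(sample_mean, 2))
```

The parameter is a fixed number, while the sample statistic changes each time a new sample is drawn.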

Statistical Decision Making

Almost all problems in statistics can be formulated as a problem of making a decision. That is, given some data observed from some phenomenon, a decision will have to be made about the phenomenon.

Decisions are generally broken into two types: Estimation decisions and Hypothesis Testing decisions.

Probability Theory plays a very important role in these decisions and the assessment of error made by these decisions

Definition: A random variable X is a numerical quantity that is determined by the outcome of a random experiment

Example : An individual is selected at random from a population and X = the weight of the individual

The probability distribution of a (continuous) random variable X is described by its probability density curve f(x), i.e. a curve which has the following properties:
1. f(x) is never negative: f(x) ≥ 0.
2. The total area under the curve f(x) is one.
3. The area under the curve f(x) between a and b is the probability that X lies between the two values.

Examples of some important Univariate distributions

1. The Normal distribution
A common probability density curve is the "Normal" density curve, which is symmetric and bell shaped.
Comment: If $\mu = 0$ and $\sigma = 1$ the distribution is called the standard normal distribution.
[Figure: Normal density curves with $\mu = 50$, $\sigma = 15$ and with $\mu = 70$, $\sigma = 20$]
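The three properties of a density curve listed earlier can be checked numerically for a Normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library with mean 50 and standard deviation 15 (one of the curves mentioned above); the trapezoid-rule helper `area` and the interval 35 to 65 are illustrative choices, not part of the course material.

```python
from statistics import NormalDist

def area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b (trapezoid rule)."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

X = NormalDist(mu=50, sigma=15)

# Property 1: the density is never negative (spot check on a grid).
non_negative = all(X.pdf(x) >= 0 for x in range(-100, 201))

# Property 2: the total area is one (8 standard deviations either side
# of the mean captures essentially all of it).
total_area = area(X.pdf, 50 - 8 * 15, 50 + 8 * 15)

# Property 3: the area between a and b is P(a < X < b).
p_numeric = area(X.pdf, 35, 65)
p_exact = X.cdf(65) - X.cdf(35)

print(non_negative, round(total_area, 6), round(p_numeric, 4), round(p_exact, 4))
```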

2. The Chi-squared distribution with $\nu$ degrees of freedom

Comment: If $z_1, z_2, \ldots, z_\nu$ are independent random variables, each having a standard normal distribution, then $U = z_1^2 + z_2^2 + \cdots + z_\nu^2$ has a chi-squared distribution with $\nu$ degrees of freedom.
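The construction of a chi-squared variable as a sum of squared standard normal variables can be sketched by simulation. The choice of 5 degrees of freedom and 20,000 draws is arbitrary; the comparison values (mean $\nu$, variance $2\nu$) are standard properties of the chi-squared distribution rather than something stated in the slides.

```python
import random
import statistics

def chi_squared_draw(rng, nu):
    """One draw of U = z_1^2 + ... + z_nu^2 with z_i standard normal."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

rng = random.Random(0)
nu = 5
draws = [chi_squared_draw(rng, nu) for _ in range(20_000)]

# A chi-squared variable with nu degrees of freedom has mean nu
# and variance 2 * nu, so these should be close to 5 and 10.
print(round(statistics.mean(draws), 2), round(statistics.variance(draws), 2))
```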

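3. The F distribution with $\nu_1$ degrees of freedom in the numerator and $\nu_2$ degrees of freedom in the denominator
$$f(x) = K\, x^{\frac{\nu_1}{2}-1}\left(1 + \frac{\nu_1}{\nu_2}\,x\right)^{-\frac{\nu_1+\nu_2}{2}} \quad \text{if } x \ge 0,$$
where
$$K = \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}}$$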

Comment: If $U_1$ and $U_2$ are independent random variables having chi-squared distributions with $\nu_1$ and $\nu_2$ degrees of freedom respectively, then $F = \dfrac{U_1/\nu_1}{U_2/\nu_2}$ has an F distribution with $\nu_1$ degrees of freedom in the numerator and $\nu_2$ degrees of freedom in the denominator.
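The F ratio can likewise be sketched by simulation: divide each of two independent chi-squared variables by its degrees of freedom and take the quotient. The degrees of freedom (4 and 20) are arbitrary, and the comparison value $\nu_2/(\nu_2 - 2)$ is the standard formula for the mean of an F distribution when $\nu_2 > 2$, quoted here as a known fact rather than from the slides.

```python
import random
import statistics

def chi_squared_draw(rng, nu):
    """Sum of nu squared standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

def f_draw(rng, nu1, nu2):
    """One draw of F = (U1 / nu1) / (U2 / nu2)."""
    u1 = chi_squared_draw(rng, nu1)
    u2 = chi_squared_draw(rng, nu2)
    return (u1 / nu1) / (u2 / nu2)

rng = random.Random(0)
nu1, nu2 = 4, 20
draws = [f_draw(rng, nu1, nu2) for _ in range(20_000)]

# The simulated mean should be close to nu2 / (nu2 - 2).
print(round(statistics.mean(draws), 3), round(nu2 / (nu2 - 2), 3))
```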

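4. The t distribution with $\nu$ degrees of freedom
$$f(x) = K\left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \quad \text{where } K = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)}$$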

Comment: If $z$ and $U$ are independent random variables, and $z$ has a standard normal distribution while $U$ has a chi-squared distribution with $\nu$ degrees of freedom, then $t = \dfrac{z}{\sqrt{U/\nu}}$ has a t distribution with $\nu$ degrees of freedom.
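A simulation sketch of the same construction: a standard normal draw divided by the square root of an independent chi-squared draw over its degrees of freedom. The choice of 10 degrees of freedom is arbitrary; the comparison values (mean 0, variance $\nu/(\nu-2)$ for $\nu > 2$) are standard properties of the t distribution, not taken from the slides.

```python
import math
import random
import statistics

def t_draw(rng, nu):
    """One draw of t = z / sqrt(U / nu), with z and U independent."""
    z = rng.gauss(0.0, 1.0)
    u = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / math.sqrt(u / nu)

rng = random.Random(0)
nu = 10
draws = [t_draw(rng, nu) for _ in range(20_000)]

# Expect a mean near 0 and a variance near nu / (nu - 2) = 1.25,
# i.e. somewhat more spread than the standard normal.
print(round(statistics.mean(draws), 3), round(statistics.variance(draws), 3))
```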

An applet showing critical values and tail probabilities for various distributions:
1. Standard Normal
2. t distribution
3. Chi-square distribution
4. Gamma distribution
5. F distribution

The Sampling distribution of a statistic

A random sample from a probability distribution with density function f(x) is a collection of n independent random variables, $x_1, x_2, \ldots, x_n$, each with the probability distribution described by f(x).

If, for example, we collect a random sample of individuals from a population and measure some variable X for each of those individuals, the n measurements $x_1, x_2, \ldots, x_n$ will form a set of n independent random variables with a probability distribution equivalent to the distribution of X across the population.

A statistic T is any quantity computed from the random observations $x_1, x_2, \ldots, x_n$.

Any statistic will necessarily also be a random variable and therefore will have a probability distribution described by some probability density function $f_T(t)$. This distribution is called the sampling distribution of the statistic T.

This distribution is very important if one is using this statistic in a statistical analysis. It is used to assess the accuracy of a statistic if it is used as an estimator. It is used to determine thresholds for acceptance and rejection if it is used for Hypothesis testing.

Some examples of Sampling distributions of statistics

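Distribution of the sample mean for a sample from a Normal population
Let $x_1, x_2, \ldots, x_n$ be a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Let
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .$$
Then $\bar{x}$ has a normal sampling distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.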
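The sampling distribution of the sample mean can be sketched by drawing many samples and looking at how their means vary. The population values (mean 50, standard deviation 15), the sample size 25, and the number of repetitions are assumptions chosen for illustration.

```python
import math
import random
import statistics

mu, sigma, n = 50.0, 15.0, 25     # hypothetical normal population, sample size
rng = random.Random(0)

def sample_mean(rng):
    """Mean of one random sample of size n from the normal population."""
    return sum(rng.gauss(mu, sigma) for _ in range(n)) / n

means = [sample_mean(rng) for _ in range(5_000)]

# The sample means should centre on mu, with standard deviation
# close to sigma / sqrt(n) = 3.
print(round(statistics.mean(means), 2),
      round(statistics.stdev(means), 2),
      round(sigma / math.sqrt(n), 2))
```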

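Distribution of the z statistic
Let $x_1, x_2, \ldots, x_n$ be a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Let
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{x} - \mu)}{\sigma} .$$
Then z has a standard normal distribution.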
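As a small worked example, the z statistic can be computed for a sample when the population mean and standard deviation are taken as known. The five data values and the population values (mean 50, standard deviation 15) are made up for illustration, and the helper name `z_statistic` is hypothetical.

```python
import math
from statistics import NormalDist

def z_statistic(sample, mu, sigma):
    """z = (sample mean - mu) / (sigma / sqrt(n))."""
    n = len(sample)
    x_bar = sum(sample) / n
    return (x_bar - mu) / (sigma / math.sqrt(n))

sample = [52, 48, 55, 61, 49]          # hypothetical measurements, mean 53
z = z_statistic(sample, mu=50, sigma=15)

# Since z is standard normal, tail probabilities come straight from
# the standard normal curve.
two_sided_tail = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 4), round(two_sided_tail, 4))
```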

Comment: Many statistics T have a normal distribution with mean $\mu_T$ and standard deviation $\sigma_T$. Then $z = \dfrac{T - \mu_T}{\sigma_T}$ will have a standard normal distribution.

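Distribution of the $\chi^2$ statistic for sample variance
Let $x_1, x_2, \ldots, x_n$ be a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Let
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \text{sample variance}$$
and
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} = \text{sample standard deviation.}$$

Let
$$\chi^2 = \frac{(n-1)\,s^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2} .$$
Then $\chi^2$ has a chi-squared distribution with $\nu = n-1$ degrees of freedom.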
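A simulation sketch of this result: compute $(n-1)s^2/\sigma^2$ for many samples and compare its average with $n-1$, the mean of a chi-squared variable with $n-1$ degrees of freedom (a standard fact). The population values and sample size are assumptions for illustration; `statistics.variance` computes the sample variance with divisor $n-1$.

```python
import random
import statistics

mu, sigma, n = 50.0, 15.0, 10     # hypothetical normal population, sample size
rng = random.Random(0)

def chi_squared_statistic(rng):
    """(n - 1) * s^2 / sigma^2 for one random sample of size n."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s_squared = statistics.variance(sample)   # divisor n - 1
    return (n - 1) * s_squared / sigma ** 2

values = [chi_squared_statistic(rng) for _ in range(5_000)]

# The average should be close to the degrees of freedom, n - 1 = 9.
print(round(statistics.mean(values), 2))
```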

The chi-squared distribution

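Distribution of the t statistic
Let $x_1, x_2, \ldots, x_n$ be a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Let
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{x} - \mu)}{s} .$$
Then t has Student's t distribution with $\nu = n-1$ degrees of freedom.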
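The t statistic differs from the z statistic only in replacing the population standard deviation with the sample standard deviation s. A small worked sketch follows; the data values and the hypothesised mean of 50 are made up, and the helper name `t_statistic` is hypothetical.

```python
import math
import statistics

def t_statistic(sample, mu):
    """Return (t, degrees of freedom) with t = (x_bar - mu) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)              # sample SD, divisor n - 1
    return (x_bar - mu) / (s / math.sqrt(n)), n - 1

sample = [52, 48, 55, 61, 49]                 # hypothetical measurements
t, df = t_statistic(sample, mu=50)
print(round(t, 4), df)
```

Here the sample mean is 53 and the sample variance is 110/4 = 27.5, so t = 3 / sqrt(27.5/5), with 4 degrees of freedom.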

Comment: Suppose an estimator T has a normal distribution with mean $\mu_T$ and standard deviation $\sigma_T$, and $s_T$ is an estimator of $\sigma_T$ based on $\nu$ degrees of freedom. Then $t = \dfrac{T - \mu_T}{s_T}$ will have Student's t distribution with $\nu$ degrees of freedom.