Analyzing Reliability and Validity in Outcomes Assessment (Part 2)
Robert Lingard and Deborah K. van Alphen, California State University, Northridge
April 15-17, 2010



Overview – Part 2
- Statistics: descriptive and inferential
- Software
- Perspective and assessment applications

Software Tools for Statistical Analysis
- MATLAB, with the Statistics Toolbox: "MATrix LABoratory," produced by The MathWorks, Inc. The complete package includes many math functions, visualization tools, and special-purpose toolboxes.
- Excel, with the Analysis ToolPak add-in: part of the Microsoft Office suite.
- Other choices: Minitab, SPSS, Mathematica, ...

Using MATLAB to Generate a Histogram
Data: a set of 95 scores, ranging from 1 to 10 (datlarge).
>> centers = 0 : 10;            % bin centers at 0, 1, ..., 10
>> hist(datlarge, centers)      % constructs a histogram with the bin centers specified in the vector centers
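A minimal, self-contained version of this example (a sketch only: the 95-score data set is not included with the slides, so a hypothetical stand-in is simulated here):

rng(0);                            % reproducible random numbers
datlarge = randi([1 10], 1, 95);   % hypothetical stand-in for the 95 scores (1 to 10)
centers = 0 : 10;                  % bin centers at 0, 1, ..., 10
hist(datlarge, centers)            % histogram with the specified bin centers
xlabel('Score'); ylabel('Number of students')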

Using MATLAB to Calculate Means and Standard Deviations
>> mean(datlarge)     % sample mean
>> std(datlarge)      % sample standard deviation; use for sample data
>> mle(datlarge)      % use for the mean and standard deviation of population data
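The difference between the two calls, as a brief sketch: std normalizes by n − 1 (the usual choice for sample data), while mle fits a normal distribution by maximum likelihood and returns a mean and a standard deviation normalized by n, matching the slide's note about population data.

xbar = mean(datlarge);      % sample mean
s    = std(datlarge);       % sample standard deviation, normalized by n-1
phat = mle(datlarge);       % maximum-likelihood fit of a normal distribution
mu_hat    = phat(1);        % estimated mean (same as xbar)
sigma_hat = phat(2);        % standard deviation normalized by n, equal to std(datlarge, 1)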

Assessment Situation: Reliability of Subjective Performance Data
Samples of students' writing were scored by two evaluators using the same rubric, with scores from one evaluator in set "dat" and scores from the other in set "dat2".
Scatter plot: the scores given to each sampled student by the two evaluators (a point such as (6, 4) represents one student's pair of scores, with evaluator #1 on the horizontal axis).
Correlating the scores:
>> R = corrcoef(dat, dat2)
A correlation coefficient of 0.93 between the two data sets is an indicator of the pair-wise reliability of the data from the two evaluators.
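A minimal sketch of the scatter plot and correlation computation, assuming the two evaluators' scores are in equal-length vectors dat and dat2 (variable names from the slide; the data sets themselves are not included):

plot(dat, dat2, 'o')                          % one point per student
xlabel('Evaluator #1 score'); ylabel('Evaluator #2 score')
R = corrcoef(dat, dat2);                      % 2-by-2 correlation matrix
r = R(1, 2)                                   % correlation coefficient (about 0.93 here)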

Assessment Situation: Validity of Test Results
A sample of students have scores in set "dat" on one test (assumed to be valid) and scores in set "dat2" on another test measuring the same learning outcome.
Scatter plot: the scores obtained by each sampled student on the two tests (a point such as (6, 4) represents one student's pair of scores, with the valid test on the horizontal axis).
Correlating the scores:
>> R = corrcoef(dat, dat2)
A correlation coefficient of 0.93 between the two data sets is an indicator of the validity of the results in dat2.

Overview
Statistics (descriptive and inferential); probability.

Random Variables (RVs)
Intuitive definition: a random variable is a variable whose value is determined by the outcome of an experiment. A random variable is often denoted by a capital bold letter, say X.
Example: the score on an embedded assessment test question.

Probability – Notation
P(...) denotes the probability of the event described in the parentheses.
Example: if X is the RV denoting the score on an embedded assessment test question, then P(X > 8) denotes the probability of a student obtaining a score greater than 8.

Probability Density Functions
The probability density function (pdf), f(x), of a random variable X describes how probability is spread over the real number line:
P(a < X < b) = area under f(x) between a and b.
Since the total probability is always 1, the total area under the pdf is 1.
Example: P(1 < X < 2) is the area under f(x) between x = 1 and x = 2.

Gaussian Random Variables
Definition: a random variable X is Gaussian (or normal), with mean µ and variance σ², if its probability density function is
f(x) = (1 / (σ√(2π))) exp(−(x − µ)² / (2σ²))
This is the familiar bell-shaped curve; the curve shown on the slide is the standard case with mean µ = 0 and standard deviation σ = 1.

Critical Values for Gaussian RVs (Leading to "Confidence Intervals")
Goal: to form an interval, centered at the mean µ, containing a specified amount of probability in the pdf of a Gaussian RV. How many σ's away from the mean do we need to go?
Notation: z* is the required number of standard deviations.
Example: for a Gaussian RV, find an interval of values, centered at the mean, that contains 90% of the probability in the pdf. The interval has the form [µ − z*σ, µ + z*σ].

Critical Values for Gaussian RVs (continued)
Claim: z* = 1.645 is the critical value required to contain 90% of the probability. For all Gaussian RVs, 90% of the probability is contained within 1.645 standard deviations of the mean, i.e., within the interval [µ − 1.645σ, µ + 1.645σ].

Commonly Used Critical Values
Some commonly used z* values and the corresponding amounts of probability captured:
z* value    probability captured
1.282       80%
1.645       90%
1.960       95%
2.576       99%
For example, 95% of the probability lies in the interval [µ − 1.96σ, µ + 1.96σ].
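These critical values can be reproduced with the inverse of the standard normal CDF (norminv, in the Statistics Toolbox); a brief sketch:

% z* for a central probability p is the (1 - (1-p)/2) quantile of the
% standard normal distribution.
p  = [0.80 0.90 0.95 0.99];        % desired central probabilities
zs = norminv(1 - (1 - p)/2)        % approximately 1.2816, 1.6449, 1.9600, 2.5758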

Inferential Statistics
Statistics divides into descriptive and inferential statistics; inferential statistics includes parameter estimation and hypothesis testing.
Given: random samples of data, and sample statistics (mean, median, ...).
Goals:
- determine (estimate) population parameters: mean, median, ...;
- quantify the confidence we have in the estimates;
- formulate hypotheses to interpret the population data.

Entire Population Distribution
Consider a normal population consisting of 10,000 academic scores. The mean, µ, and standard deviation, σ, of the population could be found in MATLAB (see the sketch below).
Problem: we don't want to collect and process this much data.
Solution: take a sample, and use the sample statistics to estimate the population parameters.
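A minimal sketch of that computation, assuming the 10,000 scores are already in a workspace vector named pop (the variable name comes from the slides; the data set itself is not included):

mu    = mean(pop);       % population mean
sigma = std(pop, 1);     % std(x, 1) normalizes by N, appropriate for a full population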

Point Estimates and Confidence Intervals
Each sample statistic is a point estimate of the corresponding population parameter.
A confidence interval:
- quantifies our level of confidence or belief in the point estimate;
- shows an interval of probable values for the population parameter;
- indicates the precision of the point estimate (a wide confidence interval means low precision; a narrow one means high precision).

Confidence Intervals and Levels
Confidence level (%): a measure of the degree of reliability of the confidence interval.
A confidence level of 95% implies that 95% of all samples (taken from the population) would produce an interval that includes the population parameter being estimated.
Intuitively, the higher the confidence level, the more likely it is that the population parameter lies within the interval.

Example: The Meaning of the 95% Confidence Interval About the Sample Mean
Suppose we take multiple samples from the population and obtain the sample mean and standard deviation of each. For example, take 100 samples, each of size 100, from the population "pop":
- Sample 1: calculate its sample mean and sample standard deviation s1
- ...
- Sample 100: calculate its sample mean and sample standard deviation s100
Then calculate the 95% confidence interval for each of the 100 sample means, and plot one horizontal line for each confidence interval (and thus for each sample).

The Meaning of a Confidence Interval (figure)
Population: normal, 10,000 scores, with mean µ and standard deviation σ.
Samples #1 through #100, each of size n = 100, each with its own sample mean and sample standard deviation (for example, s = 10.08 for sample #1 and s = 9.44 for sample #100).

95% Confidence Intervals (CIs) About the Mean for the Data Set "pop"
In the resulting figure, 95 out of the 100 confidence intervals include the population mean µ. The first sample, for instance, produced the interval CI = (47.7, 51.4).
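A rough sketch of the simulation behind this figure. The population vector pop is named on the slides but not included, so a hypothetical stand-in is generated here; the looping and counting logic is illustrative rather than taken from the original:

pop = 50 + 10*randn(1, 10000);           % hypothetical stand-in population of scores
mu  = mean(pop);                         % "true" population mean
covered = 0;
for k = 1:100
    x = randsample(pop, 100);            % sample of size 100, drawn without replacement
    [~, ~, ci] = normfit(x);             % 95% confidence interval about the sample mean
    covered = covered + (ci(1) <= mu && mu <= ci(2));
end
covered                                  % typically close to 95 of the 100 intervals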

Finding a Confidence Interval for the Mean
Applies to populations with a normal distribution, or to non-normal populations when the sample size is large (n ≥ 30).
The endpoints of the 95% confidence interval are
x̄ − 1.96·s/√n  and  x̄ + 1.96·s/√n   (*)
To find the 95% confidence interval for the mean from a given sample of size n:
- compute x̄ and s;
- determine the interval using the endpoint equations in (*) above.
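A brief sketch of the endpoint calculation in (*), applied to a generic sample vector x (a hypothetical variable). The normfit command on a later slide produces essentially the same interval; normfit uses t critical values, so its endpoints differ slightly for small samples.

n    = numel(x);
xbar = mean(x);
s    = std(x);                              % sample standard deviation
ci   = xbar + 1.96 * [-1 1] * s / sqrt(n)   % [lower, upper] endpoints of the 95% CI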

Confidence Intervals – Arbitrary Confidence Levels
To obtain a confidence interval with an arbitrary confidence level, return to the endpoint equations in (*) and replace 1.96 with the more general critical value z*:
x̄ − z*·s/√n  and  x̄ + z*·s/√n
Obtain z* from the table of critical values:
z* value    confidence level
1.282       80%
1.645       90%
1.960       95%
2.576       99%

Data Sets for Sample Calculations – Confidence Intervals
pop: a set of 10,000 scores, ranging from 10 to 90 (from which a sample is selected).
datlarge: a set of 95 scores, ranging from 1 to 10.

MATLAB Example: 95% Confidence Interval for datlarge
The normfit command finds the sample mean, the sample standard deviation, and the 95% confidence interval about the mean:
>> [xbar, s, xbarci] = normfit(datlarge)
xbar   = 5.99           % sample mean
s      = 1.94           % sample standard deviation
xbarci = (5.59, 6.39)   % 95% confidence interval for the mean

Confidence Interval: Effect of Sample Size
Let B be a bound on the acceptable estimation error, i.e., on the acceptable difference between the sample mean and the population mean.
To ensure that, for (confidence level)% of the samples, the difference between the sample mean and the population mean is no bigger than B, the required sample size is
n = (z*·s / B)²
where z* is the critical value associated with the desired confidence level.

Example: Selecting Sample Size (Based on the Data Set "pop")
Suppose we want 95% confidence that our estimate of the mean population score is within 2 points of the true mean population score.
Calculate n with s = 10, B = 2, and z* = 1.96 (see the sketch below).
Problem: s, the sample standard deviation, is not available until the sample is selected.
Alternative (rule of thumb): divide the range (largest possible value minus smallest possible value) by 4 to obtain a conservative estimate of s (usually too large).
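The arithmetic for this example, as a brief sketch (rounding up to the next whole student is an added step, not stated on the slide):

zstar = 1.96;  s = 10;  B = 2;
n = ceil((zstar * s / B)^2)    % (1.96 * 10 / 2)^2 = 96.04, so n = 97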

Selecting Sample Size – Required n Values (table)
The original slide tabulates the required sample size n for several combinations of the bound B and the sample standard deviation s, at the 90% and 80% confidence levels.

Verifying Alleged Improvements (Did We "Close the Loop"?)
In assessment, we may want to determine whether some change we have implemented has resulted in program improvement.
Example: find the difference in the mean scores on an assessment exam or question, one given before and one after a curriculum revision.
The statistic used as a point estimate of the difference in population means (µ1 − µ2) is the difference in the means of samples from the two populations, x̄1 − x̄2.

Sampling Distribution of the Difference of Two Means
If both population distributions are normal, the sampling distribution of x̄1 − x̄2 is normal.
If both sample sizes are large, the sampling distribution of x̄1 − x̄2 is approximately normal irrespective of the two population distributions (by the Central Limit Theorem).
The left and right endpoints of the confidence interval are
(x̄1 − x̄2) ± z*·√(s1²/n1 + s2²/n2)
where z* is chosen based on the desired confidence level.

Example: Verifying Alleged Improvements
Suppose we want to estimate the difference between two sets of test scores, one given before a curriculum change and one after. Find the point estimate for the difference and a 99% confidence interval.
datlarge (before): x̄1 = 5.99, s1 = 1.94
datlarge2 (after): x̄2 = 7.16, s2 = 1.81

Example: Verifying Alleged Improvement (continued)
Point estimate for µ2 − µ1: x̄2 − x̄1 = 7.16 − 5.99 = 1.17.
The 99% confidence interval for the difference in means follows from the endpoint equation on the previous slide (see the sketch below).
Because the entire interval is positive, we are 99% confident that "after" − "before" is positive, i.e., "after" > "before": an improvement.
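A brief sketch of the interval calculation, assuming the two score vectors are available in the workspace as datlarge and datlarge2 (the data sets themselves are not included with the slides):

zstar = 2.576;                                % critical value for a 99% confidence level
x1 = mean(datlarge);   s1 = std(datlarge);   n1 = numel(datlarge);
x2 = mean(datlarge2);  s2 = std(datlarge2);  n2 = numel(datlarge2);
d  = x2 - x1;                                 % point estimate: "after" minus "before"
se = sqrt(s1^2/n1 + s2^2/n2);                 % standard error of the difference
ci = d + zstar * [-1 1] * se                  % [lower, upper] endpoints of the 99% CI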

About the Assumptions
We have assumed either a normal population, or a non-normal population with a large sample size (n ≥ 30).
Different formulas are available:
- for confidence intervals for the mean and for the difference of two means when the underlying population is not normal and the sample size is small; and
- for confidence intervals about other statistics (sample proportion, sample median, etc.).

Overview of Statistics
Statistics divides into descriptive and inferential statistics; inferential statistics includes parameter estimation and hypothesis testing.

Definitions
Hypothesis: a statement about the parameters of one or more populations, e.g., "the mean of a set of scores for a certain population is greater than 75" (µ > 75).
Hypothesis testing: a method for deciding between two competing hypotheses (e.g., µ = 75 versus µ > 75) using information from a random sample.

The Null and Alternative Hypotheses
Null hypothesis (H0): the statement that the value of a population parameter is equal to some claimed value; the assertion that is initially assumed to be true (based on prior belief or measurement).
Alternative hypothesis (Ha): the statement that the population parameter has a value that differs from the one claimed by the null hypothesis H0.

Decision Making About the Hypotheses
Decision: to reject or accept H0.
If the data are consistent with H0, we conclude H0 is true; otherwise, we conclude H0 is false.
The data have to be very inconsistent with H0 (highly unlikely under H0) to warrant rejecting it.
Criminal trial analogy: H0 is "the defendant is innocent." The jury assumes the defendant is innocent until proven guilty, and rejects H0 only if there is compelling evidence against it.

How Unlikely Should the Data Be to Warrant Rejection of H0?
Significance level, α: our cut-off point for believing the data occurred just by chance, given H0. Often 5%, 1%, or 0.1%.
p-value: the probability that the observed sample data (or something even more extreme) could have occurred, given H0.
Coin-toss bet example, with H0: "the coin is fair" and α = 5%. The coin comes up heads five times in a row:
Toss 1: heads ($1), p = 1/2
Toss 2: heads ($2), p = 1/4
Toss 3: heads ($3), p = 1/8
Toss 4: heads ($4), p = 1/16
Toss 5: heads ($5), p = 1/32 ≈ 3%
Since p < α, reject H0: the coin is probably not fair.

Hypothesis Testing in Assessment
Situation: historically, test scores for a particular learning outcome indicate µ = 5. We "improve" the curriculum and administer a test on that outcome to a sample of students. We would like to claim that the score has improved, based on our new test results. The "new test data" is datlarge, a set of 95 scores ranging from 1 to 10.
Hypothesis test: H0: µ = 5; Ha: µ > 5 (a one-tailed test); α = 0.05.

Hypothesis Testing with MATLAB: ztest
ztest performs a hypothesis test on the mean of a normal population with known variance (or unknown variance if n is large).*
[h, p] = ztest(x, m, sigma, alpha, tail)
x = sample of data
m = mean µ under the null hypothesis
sigma = standard deviation of the population (use s if the population value is unknown)
tail = 'right' for a one-sided test (Ha: µ > µ0)
Output h = 1 means reject H0; output h = 0 means accept (fail to reject) H0.
Output p is the p-value for our data.
* ztest also allows you to output the confidence interval about the population mean.

Hypothesis Testing with MATLAB: ztest (continued)
Returning to our example. Hypotheses: H0: µ = 5; Ha: µ > 5; α = 0.05; tail = 'right', since Ha: µ > 5.
>> std(datlarge)
ans = 1.94
>> [h, p] = ztest(datlarge, 5, 1.94, .05, 'right')
h = 1    (reject H0: performance did improve)
p = a value on the order of 1e-007    (the probability of the sample data occurring if H0 were true)
A "significant improvement."
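When the population standard deviation is unknown and the sample is small, the analogous one-sample t-test is the usual alternative to ztest; a brief sketch using the Statistics Toolbox ttest function with the name-value syntax of current MATLAB releases (this alternative is not covered on the original slides):

% One-sided, one-sample t-test of H0: mu = 5 versus Ha: mu > 5, at alpha = 0.05.
% ttest estimates the standard deviation from the sample, so no sigma argument is needed.
[h, p] = ttest(datlarge, 5, 'Alpha', 0.05, 'Tail', 'right')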

Statistically Significant Improvements
A result is said to be statistically significant if it is not likely to have occurred by chance.
Since the p-value of a test is the probability that the data occurred by chance (given H0), the smaller the p-value, the greater the statistical significance.
Statistically significant at the 1% level: the p-value of the data is less than 0.01.
A small p-value indicates statistical significance: reject H0 (here, with α = 0.01).

Summary of Assessment Applications for Statistics – What Have We Shown?
Use of descriptive statistics and data visualization with histograms.
Verification of reliability via:
- high correlation between two sets of scores;
- lack of a significant difference between two sets of scores.
Verification of validity via:
- high correlation between scores obtained from a new instrument and those from an established standard;
- lack of a significant difference between scores obtained from a new instrument and those from an established standard.

Summary of Assessment Applications for Statistics (continued)
Simplifying program assessment with sampling:
- choosing an appropriate sample size for estimating the mean;
- finding confidence intervals for the (point) estimate.
Claiming improvement, using estimates of the difference in means:
- finding confidence intervals for the (point) estimate;
- determining whether the difference is significant.
Applications discussed in the context of test scores also occur in the context of surveys.

Questions??? & Contact Information
Robert Lingard and Deborah K. van Alphen
"Not everything that counts can be counted, and not everything that can be counted counts." (Sign hanging in Einstein's office at Princeton)
"98% of all statistics are made up." ~Author Unknown