Hien D Nguyen. Eleven Atlanta educators found guilty of participating in a conspiracy to cheat on student standardized tests and charged with racketeering.

Presentation transcript:

Hien D Nguyen

• Eleven Atlanta educators found guilty of participating in a conspiracy to cheat on student standardized tests and charged with racketeering (maximum sentence is 20 years)
• They were the last remaining defendants among the 35 people indicted in March 2013 in one of the largest American school cheating scandals
• The defendants consisted of former administrators, principals, and teachers
• Article Link: educators-convicted-in-cheating-scandal

• Pressure from state/local governments to improve student performance on high-stakes standardized assessments
• Pressure from bonuses or dismissal. In California, bonuses can be as high as $25,000 per teacher in schools with large test score gains. In charter schools, teachers are at-will employees, so there is pressure to show results!

 2010: At an elementary school outside of Houston, teachers distributed a detailed study guide based on the state science test. They tubed the exam to see the questions without breaking the seal  2009: At a charter school in Springfield, MA, the principal told teachers to look over students’ shoulders and point out wrong answers as they took the state tests  2009: the Georgian state school board investigated 191 schools on math/reading tests after computer scanners detected many classrooms in which wrong to right erasures were outside of statistical norm 4

• SCORE_c captures how well class c scores on the test, relative to how the same students have done on past standardized tests and will do on future tests: SCORE_c ∈ {low, high}
• ANSWERS_c measures how unusual the pattern of answers given by students in class c is (e.g., are there unusual blocks of answers, or an especially high degree of correlation across student responses): ANSWERS_c ∈ {typical, unusual}

• Suppose there are two types of classrooms: those in which teachers cheat, and those in which they do not
• Define CHEAT_c equal to one if cheating occurs in class c, and zero otherwise

• (A1) Had cheating classrooms not cheated, their distribution of the two outcome measures, SCORE and ANSWERS, would be identical to that of non-cheating classrooms
• (A2) Although cheating behavior is not directly observed, we assume that cheating increases the probability that a classroom will have a high average test score and an unusual pattern of answer strings:
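The inequalities that (A2) points to are not reproduced in the transcript. One way to write them that is consistent with the definitions of SCORE_c, ANSWERS_c and CHEAT_c above (a reconstruction, not the slide's own typesetting) is:

```latex
\begin{align*}
\Pr(\mathrm{SCORE}_c = \text{high} \mid \mathrm{CHEAT}_c = 1)
  &> \Pr(\mathrm{SCORE}_c = \text{high} \mid \mathrm{CHEAT}_c = 0) \\
\Pr(\mathrm{ANSWERS}_c = \text{unusual} \mid \mathrm{CHEAT}_c = 1)
  &> \Pr(\mathrm{ANSWERS}_c = \text{unusual} \mid \mathrm{CHEAT}_c = 0)
\end{align*}
```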

• (A3) Define S_nc as the probability that a non-cheating class has a high value of SCORE and A_nc as the probability that a non-cheating class has an unusual value for ANSWERS
• For purposes of exposition, assume that for non-cheating classrooms the two measures SCORE and ANSWERS are uncorrelated. It then follows that:
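The result referred to here is missing from the transcript. Because both measures are binary, being uncorrelated implies independence, so the joint probability for a non-cheating class should factor as follows (a reconstruction from the definitions of S_nc and A_nc):

```latex
\[
\Pr(\mathrm{SCORE}_c = \text{high},\ \mathrm{ANSWERS}_c = \text{unusual} \mid \mathrm{CHEAT}_c = 0)
  = S_{nc} \cdot A_{nc}
\]
```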

• (L1) The average fraction of high test scores among classes with typical answer strings provides an upper bound on the probability that non-cheating classrooms will have high test scores
• Similarly, the observed fraction of unusual answer strings among classes with low test score fluctuations is an upper bound on the probability that non-cheating classrooms will have unusual answer strings
• These values are upper bounds because some classrooms that have “low” test scores or “typical” answer strings may actually be cheaters that our methods fail to detect

• Denote the total number of classrooms as N and the total number of classrooms that have both high test scores and unusual answer strings as N_hu
• Then, a lower bound on the number of cheating classrooms is how many extra rooms there are with both high test scores and unusual answer strings, relative to the number that would be expected if no classrooms cheated:
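The equation itself did not survive in the transcript. Under (A3), the number of high-score, unusual-answer classrooms expected with no cheating is N · S_nc · A_nc, so the bound described above can be written as below. The symbol N_cheat is a label introduced here for illustration; the numbering (1) follows the later reference to "equation (1)".

```latex
\begin{equation}
N_{\text{cheat}} \;\ge\; N_{hu} - N \cdot S_{nc} \cdot A_{nc}
\tag{1}
\end{equation}
```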

• The estimate in equation (1) represents a lower bound on the number of cheating classrooms for two reasons
• First, some cheating classrooms will not be detected by our measures and so will not register as having high test scores and unusual strings
• Second, by L1, the probabilities of high test scores or unusual answer strings among non-cheating classes are upper bounds on the true values
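The bound can be sketched in a few lines of Python, assuming each classroom has already been flagged as high/low on SCORE and typical/unusual on ANSWERS. The names `Classroom` and `cheating_lower_bound` are hypothetical, the data are invented, and clamping the result at zero is a convenience the slides do not mention.

```python
from dataclasses import dataclass


@dataclass
class Classroom:
    high_score: bool       # SCORE_c == high
    unusual_answers: bool  # ANSWERS_c == unusual


def cheating_lower_bound(classes):
    """Lower bound on the number of cheating classrooms: N_hu - N * S_nc * A_nc."""
    n = len(classes)
    typical = [c for c in classes if not c.unusual_answers]
    low = [c for c in classes if not c.high_score]
    if not typical or not low:
        raise ValueError("need at least one typical-answer and one low-score classroom")
    # L1: observed fractions used as upper bounds on S_nc and A_nc
    s_nc = sum(c.high_score for c in typical) / len(typical)
    a_nc = sum(c.unusual_answers for c in low) / len(low)
    # Classrooms that are both high-score and unusual-answer
    n_hu = sum(c.high_score and c.unusual_answers for c in classes)
    expected_if_honest = n * s_nc * a_nc
    # Clamp at zero (an added convenience, not part of the slides)
    return max(0.0, n_hu - expected_if_honest)


# Invented data: 100 classrooms split across the four cells
classes = (
    [Classroom(True, True)] * 10
    + [Classroom(True, False)] * 10
    + [Classroom(False, True)] * 10
    + [Classroom(False, False)] * 70
)
bound = cheating_lower_bound(classes)
```

With these invented counts, S_nc and A_nc are each estimated at 10/80, so about 1.56 flagged classrooms would be expected by chance and the remainder of the 10 observed counts toward the bound.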

• Calculations like those in equation (1) provide the basis for our estimation of the number of cheating classrooms
• One important caveat to note is that we cannot identify any individual classroom as cheating or not cheating with perfect certainty. The probability that a class with high test score fluctuations and unusual answer strings is cheating is given by:
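The expression is missing from the transcript. A form consistent with the surrounding definitions, offered as a reconstruction rather than the slide's own equation, divides the excess number of flagged classrooms by the number flagged:

```latex
\[
\Pr(\mathrm{CHEAT}_c = 1 \mid \mathrm{SCORE}_c = \text{high},\ \mathrm{ANSWERS}_c = \text{unusual})
  = \frac{N_{hu} - N \cdot S_{nc} \cdot A_{nc}}{N_{hu}}
\]
```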

As the thresholds for what constitutes a “high” test score or “unusual” answer strings are made more stringent, the expected share of non-cheating classrooms that meet both criteria will decline and, consequently, our level of certainty rises that any particular classroom exhibiting these characteristics is cheating. In essence, raising these thresholds will decrease the number of false positives in our estimates.
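A small numeric sketch of this threshold effect, assuming the share of flagged classrooms that are cheating takes the form (N_hu − N · S_nc · A_nc) / N_hu. That form, the function name `share_cheating` and all numbers below are illustrative assumptions, not values from the slides.

```python
def share_cheating(n, n_hu, s_nc, a_nc):
    """Estimated share of flagged (high, unusual) classrooms that are cheating."""
    if n_hu <= 0:
        raise ValueError("n_hu must be positive")
    expected_honest = n * s_nc * a_nc  # flagged classrooms expected with no cheating
    return max(0.0, (n_hu - expected_honest) / n_hu)


# Hypothetical district of 1000 classrooms under two cutoffs.
# Loose cutoff: 20% of honest classes look "high" and 20% look "unusual".
loose = share_cheating(n=1000, n_hu=70, s_nc=0.20, a_nc=0.20)
# Strict cutoff: only 5% of honest classes clear each threshold.
strict = share_cheating(n=1000, n_hu=30, s_nc=0.05, a_nc=0.05)
```

In this invented example the stricter cutoff flags fewer classrooms, but a larger share of those flagged exceeds what chance alone would produce.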

• An obvious potential indicator of teacher cheating is a classroom that experiences unexpectedly large gains in test scores relative to how those same students tested in the previous year
• Since test score gains that result from cheating do not represent real gains in knowledge, there is no reason to expect the gains to be sustained on future exams taken by these students (unless, of course, next year’s teachers also cheat on behalf of the students)

• In practice, the choice of a cutoff for what represents an “unexpectedly” large test score gain or loss is somewhat arbitrary
• A simple approach is to rank each classroom’s average test score gains relative to all other classrooms in that same subject, grade, and year, and construct a statistic from rank_gain_cbt, the percentile rank for class c in subject b in year t
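The statistic itself is not reproduced in the transcript. The sketch below assumes one plausible form: the squared percentile rank of this year's gain plus the squared complement of next year's rank, so that a large gain followed by a large decline scores highest. That combination, the tie handling and the function names are assumptions made for illustration.

```python
def percentile_ranks(values):
    """Percentile rank in [0, 1]: share of the other classrooms with a strictly smaller value."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two classrooms to rank")
    return [sum(other < v for other in values) / (n - 1) for v in values]


def score_statistic(rank_gain_t, rank_gain_next):
    """Assumed form: rank_gain_t ** 2 + (1 - rank_gain_next) ** 2."""
    return rank_gain_t ** 2 + (1 - rank_gain_next) ** 2


# Invented average gains for three classrooms in one subject, grade, and year
gains_t = [2.0, 9.0, 4.0]      # gain in year t
gains_next = [3.0, -6.0, 1.0]  # gain of the same students in year t + 1

ranks_t = percentile_ranks(gains_t)
ranks_next = percentile_ranks(gains_next)
scores = [score_statistic(r, r_next) for r, r_next in zip(ranks_t, ranks_next)]
```

Squaring the ranks gives extra weight to classrooms at the extremes, so the second classroom (largest gain, then largest drop) stands out in this toy data.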

• Teacher cheating, particularly if accomplished by the teacher actually changing answers on test forms, is likely to leave a discernible trail in student answer strings
• There are four different measures of how suspicious a classroom’s answer strings are, each used in determining whether a classroom may be cheating

• The first measure focuses on the most unlikely block of identical answers given by students on consecutive questions
• The second measure of suspicious answer strings involves the overall degree of correlation in student answers across the test. When a teacher changes answers on test forms, it presumably increases the uniformity of student test forms across students in the class. This measure is meant to capture more general patterns of similarity in student responses that go beyond just identical blocks of answers
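The two measures above can be illustrated with deliberately crude stand-ins. The sketch finds the longest run of consecutive questions answered identically by at least `min_students` students, and the mean pairwise agreement rate across the class. Neither is the probability-based measure the slides describe; the function names, the `min_students` parameter and the answer strings are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def longest_shared_block(answers, min_students):
    """Length of the longest run of consecutive questions on which at least
    min_students students gave exactly the same answers."""
    n_questions = len(answers[0])
    best = 0
    for start in range(n_questions):
        for end in range(start + 1, n_questions + 1):
            blocks = Counter(a[start:end] for a in answers)
            if max(blocks.values()) >= min_students:
                best = max(best, end - start)
    return best


def mean_pairwise_agreement(answers):
    """Average, over all student pairs, of the share of questions answered identically."""
    rates = [
        sum(x == y for x, y in zip(a, b)) / len(a)
        for a, b in combinations(answers, 2)
    ]
    return sum(rates) / len(rates)


# Invented answer strings: one string per student, one letter per question
answers = ["ABCDA", "ABCDB", "CBCDA", "DACBB"]
block = longest_shared_block(answers, min_students=3)
agreement = mean_pairwise_agreement(answers)
```

A fuller treatment would weight each block by how unlikely it is given each student's ability and the question's difficulty, which these raw counts ignore.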

• The third indicator of potential cheating is a high variance in the degree of correlation across questions; that is, on some questions students’ answers are highly correlated, but on other questions they are not
• The fourth indicator compares the answers that students in one classroom give with those of other students in the system who take the identical test and get the exact same score; if students in a class systematically miss the easy questions while correctly answering the hard questions, this may be an indication of cheating

The overall measure of suspicious answer strings is constructed in a manner parallel to our measure of unusual test score fluctuations. Within a given subject, grade, and year, we rank classrooms on each of these four indicators, and then take the sum of squared ranks across the four measures:
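A minimal Python sketch of the sum of squared ranks, assuming percentile ranks in [0, 1] computed within one subject, grade, and year. The function names, the tie handling and the indicator values are illustrative assumptions.

```python
def percentile_ranks(values):
    """Percentile rank in [0, 1]: share of the other classrooms with a strictly smaller value."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two classrooms to rank")
    return [sum(other < v for other in values) / (n - 1) for v in values]


def answers_statistic(indicators):
    """ANSWERS per classroom: sum over the indicators of the squared percentile rank.

    indicators maps a classroom name to its list of indicator values
    (four per classroom in the setting described above).
    """
    names = list(indicators)
    # Transpose so that each column holds one indicator across all classrooms
    columns = list(zip(*(indicators[name] for name in names)))
    ranked = [percentile_ranks(list(column)) for column in columns]
    return {name: sum(r[i] ** 2 for r in ranked) for i, name in enumerate(names)}


# Invented values of the four indicators for three classrooms
indicators = {
    "A": [0.1, 0.2, 0.3, 0.4],
    "B": [0.9, 0.8, 0.7, 0.6],
    "C": [0.5, 0.5, 0.5, 0.5],
}
answers_scores = answers_statistic(indicators)
```

Classroom B ranks highest on every indicator in this toy data, so it receives the maximum possible value of 4.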

For more rigorous detail on the statistical model of teacher cheating and the analysis of data on specific cases, visit the following website: