Data Forensics: A Compare and Contrast Analysis of Multiple Methods Christie Plackner.

Presentation transcript:

Data Forensics: A Compare and Contrast Analysis of Multiple Methods Christie Plackner

Outlier Score
Applied to most of the methods
Statistical probabilities were transformed into a score of 0 to = statistically unusual

Erasure Analysis
Wrong-to-right (WR) erasure rate higher than expected from random events
The baseline for the erasure analysis is the state average
One sample t-test
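A minimal sketch of the one-sample t statistic behind the erasure analysis, assuming per-student WR erasure counts are available for a school and the state average is used as the baseline. The function name, the sample counts, and the state mean are all hypothetical and are not taken from the presentation.

```python
import math
import statistics


def one_sample_t(values, baseline):
    """Return (t statistic, degrees of freedom) for H0: mean(values) == baseline."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("zero variance: t statistic is undefined")
    t = (mean - baseline) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical wrong-to-right erasure counts for ten students in one school.
school_wr = [0, 1, 3, 2, 4, 1, 5, 2, 3, 4]
state_mean_wr = 1.2  # hypothetical state average, used as the baseline

t_stat, df = one_sample_t(school_wr, state_mean_wr)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A large positive t indicates a school whose WR erasure rate sits above the state baseline; converting t to a probability requires a t distribution, which is left out here to keep the sketch dependency-free.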

Scale Score Changes
Scale score changes statistically higher or lower than the previous year
Cohort and Non-cohort
One sample t-test

Performance Level Changes
Large changes in proportion in performance levels across years
Cohort and Non-cohort
Log odds ratio
–adjusted to accommodate small sample size
–z test
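One way to sketch a log odds ratio with a small-sample adjustment and a z test is shown below. The slide does not say which adjustment was used; adding 0.5 to every cell is an assumption made here for illustration, and the function name and counts are hypothetical.

```python
import math


def adjusted_log_odds_z(prof_now, n_now, prof_prev, n_prev):
    """Log odds ratio of reaching a performance level (current vs. previous year).

    Adds 0.5 to each cell of the 2x2 table (an ASSUMED small-sample
    adjustment) so that zero cells do not produce infinite odds.
    Returns (log odds ratio, z statistic).
    """
    a = prof_now + 0.5                # at or above the level, current year
    b = (n_now - prof_now) + 0.5      # below the level, current year
    c = prof_prev + 0.5               # at or above the level, previous year
    d = (n_prev - prof_prev) + 0.5    # below the level, previous year
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, log_or / se


# Hypothetical school: 15 of 20 students proficient this year, 5 of 20 last year.
log_or, z = adjusted_log_odds_z(15, 20, 5, 20)
print(f"log odds ratio = {log_or:.3f}, z = {z:.3f}")
```

The same function can be applied to cohort or non-cohort groups; only the way the two years of counts are assembled differs.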

Measurement Model Misfit
Performed better or worse than expected
Rasch residuals summed across operational items
Adjusted for unequal school sizes
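The idea of summing Rasch residuals can be sketched as follows, assuming student ability estimates and item difficulties are already available from a calibrated Rasch model. Dividing by the square root of the summed variance is an assumed stand-in for the school-size adjustment, which the slide does not specify; all names and data are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def school_misfit(responses, thetas, difficulties):
    """Sum observed-minus-expected residuals over students and items.

    responses[i][j] is 0 or 1 for student i on item j.
    Returns (summed residual, standardized residual). The standardization
    by the summed binomial variance is an ASSUMPTION used here so that
    schools of different sizes are comparable.
    """
    total_resid = 0.0
    total_var = 0.0
    for row, theta in zip(responses, thetas):
        for x, b in zip(row, difficulties):
            p = rasch_probability(theta, b)
            total_resid += x - p
            total_var += p * (1.0 - p)
    return total_resid, total_resid / math.sqrt(total_var)


# Hypothetical school: two students, two items, everyone answers correctly.
resid, z_misfit = school_misfit(
    responses=[[1, 1], [1, 1]],
    thetas=[0.0, 0.0],
    difficulties=[0.0, 0.0],
)
print(resid, z_misfit)
```

A positive value means the school performed better than the model expected and a negative value means worse, which matches the two directions named on the slide.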

Subject Regression
Large deviations from expected scores
Within year – reading and mathematics
Across year – cohort within a subject
One sample t-test
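A rough sketch of the within-year version: fit a line predicting mathematics from reading on statewide data, then test whether one school's residuals differ from zero with a one-sample t statistic. The exact regression specification is not given on the slide, so the simple one-predictor model, the function names, and the scores below are assumptions.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residual_t(x, y, slope, intercept):
    """t statistic for H0: the mean residual of these students is zero."""
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    n = len(resid)
    se = statistics.stdev(resid) / math.sqrt(n)
    return statistics.fmean(resid) / se, n - 1


# Hypothetical statewide reading and mathematics scores.
state_reading = [1, 2, 3, 4, 5]
state_math = [3, 5, 7, 9, 11]
slope, intercept = fit_line(state_reading, state_math)

# Hypothetical school whose mathematics scores exceed the prediction.
school_reading = [2, 4, 1, 3]
school_math = [6, 12, 5, 9]
t_school, df_school = residual_t(school_reading, school_math, slope, intercept)
print(f"t = {t_school:.2f} on {df_school} degrees of freedom")
```

For the across-year version the predictor would be the cohort's previous-year score in the same subject instead of the reading score.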

Modified Jacob and Levitt
Only method not resulting in a school receiving a score
Combination of two indicators:
–unexpected test score fluctuations across years using a cohort of students, and
–unexpected patterns in student answers
Modified application of Jacob and Levitt (2003)
–2 years of data
–Sample size

Principal Component Analysis
Does each method contribute to the overall explained variance?
Can the methods be reduced for a more efficient approach?

Multiple Methods
1. Erasure Analysis (mER)
2. Scale score changes using non-cohort groups (mSS)
3. Scale score changes using cohort groups (mSC)
4. Performance level changes using non-cohort groups (mPL)
5. Performance level changes using cohort groups (mPLC)
6. Model misfit using Rasch Residuals (mRR)
7. Across subject regression using reading scores to predict mathematics scores (mRG)
8. Within subject regression using a cohort’s previous year score to predict current score (mCR)
9. Index 1 of the Modified Jacob and Levitt evaluating score changes (mMJL1)
10. Index 2 of the Modified Jacob and Levitt evaluating answer sheet patterns (mMJL2)

Principal Component Analysis
Grade 4 mathematics exam
10 methods
[Table: Mean, Std. Deviation, and Analysis N for each method (mSS, mPL, mRG, mRR, mER, mSC, mPLC, mMJL1, mMJL2, mCR); values not preserved in the transcript]

Method Correlations
[Table: correlation matrix of the 10 methods (mSS, mPL, mRG, mRR, mER, mSC, mPLC, mMJL1, mMJL2, mCR); values not preserved in the transcript]

Principal Component Statistics
[Table: for each component, Initial Eigenvalues (Total, % of Variance, Cumulative %) and Extraction Sums of Squared Loadings (Total, % of Variance, Cumulative %); values not preserved in the transcript]

Scree Plot

Loading Matrix
[Table: component loadings for each method, in the order mCR, mSC, mPLC, mMJL2, mMJL1, mRG, mSS, mPL, mRR, mER; values not preserved in the transcript]

Simplified Loading Matrix
+/- greater than 1/2 the maximum value in the component
(+)/(-) is between ¼ to ½ the maximum
(column positions not preserved in the transcript)
mCR +(-)
mSC +(-)
mPLC +(-)
mMJL2 ++
mMJL1 +.+
mRG (+)++
mSS (+)_+
mPL _+
mRR (-)+
mER (+)+
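The simplification rule stated on this slide can be sketched as a small function. It assumes "maximum value in the component" means the largest absolute loading in that column and that a loading of exactly half the maximum falls in the bracketed band; the method names and loadings below are made up for illustration and are not the study's values.

```python
def simplify_loadings(loadings):
    """Convert numeric loadings to the +, -, (+), (-) notation.

    loadings maps a method name to its list of loadings, one per component.
    A loading above half of the column's largest absolute loading becomes
    '+' or '-'; one between a quarter and a half becomes '(+)' or '(-)';
    anything smaller is left blank.
    """
    n_comp = len(next(iter(loadings.values())))
    col_max = [max(abs(row[k]) for row in loadings.values()) for k in range(n_comp)]

    def symbol(value, biggest):
        ratio = abs(value) / biggest if biggest else 0.0
        sign = "+" if value >= 0 else "-"
        if ratio > 0.5:
            return sign
        if ratio >= 0.25:
            return f"({sign})"
        return ""

    return {
        method: [symbol(v, col_max[k]) for k, v in enumerate(row)]
        for method, row in loadings.items()
    }


# Hypothetical loadings for three methods on two components.
example = {"m1": [0.8, -0.1], "m2": [0.3, 0.6], "m3": [-0.5, 0.2]}
simplified = simplify_loadings(example)
for method, symbols in simplified.items():
    print(method, symbols)
```

Scaling by each column's own maximum keeps the notation comparable across components even though later components tend to have smaller loadings overall.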

Principal Component Statistics
[Table: for each component, Initial Eigenvalues (Total, % of Variance, Cumulative %) and Extraction Sums of Squared Loadings (Total, % of Variance, Cumulative %); values not preserved in the transcript]

Scree Plot

Reducing Variable Set
Determine how many components to retain
–Cumulative percentage of total variation
–Eigenvalues
–The scree plot

Method                      Number of Retained Components
90% Cumulative Variance     7
70% Cumulative Variance     4
Eigenvalue                  4
Scree Plot                  2
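Two of the retention rules above are mechanical and can be sketched directly from a list of eigenvalues. The eigenvalues below are invented for illustration and are not the study's values, and the cutoff of 1.0 for the eigenvalue rule is an assumed convention because the slide does not state the threshold. The scree plot rule relies on visual inspection and is not coded.

```python
def components_for_cumulative(eigenvalues, target):
    """Smallest number of components whose cumulative share of variance reaches target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, ev in enumerate(ordered, start=1):
        running += ev
        if running / total >= target:
            return k
    return len(ordered)


def components_by_eigenvalue(eigenvalues, cutoff=1.0):
    """Number of components whose eigenvalue exceeds the cutoff (ASSUMED to be 1.0)."""
    return sum(1 for ev in eigenvalues if ev > cutoff)


# Hypothetical eigenvalues for 10 standardized variables (they sum to 10).
eigs = [4.0, 2.5, 1.0, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.1]

keep_90 = components_for_cumulative(eigs, 0.90)
keep_70 = components_for_cumulative(eigs, 0.70)
keep_ev = components_by_eigenvalue(eigs)
print(keep_90, keep_70, keep_ev)
```

As in the table above, the rules generally disagree, so the number of retained components depends on which criterion is chosen.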

Reducing Variable Set
Select one method to represent a component
Selecting methods within components
–Positive selection
  Retain highest loading method with components
–Discarded principal components
  Remove highest loading method with
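The two selection strategies can be sketched as below. The slide text is cut off, so this follows a common reading of these procedures rather than the presenter's exact steps: positive selection keeps the highest-loading method on each retained component, while the discarded-components approach works backward from the last component and removes the highest-loading method on each component that is not retained. Function names and loadings are hypothetical.

```python
def positive_selection(loadings, n_keep):
    """For each of the first n_keep components, keep the not-yet-kept
    method with the largest absolute loading."""
    kept = []
    for k in range(n_keep):
        candidates = [m for m in loadings if m not in kept]
        kept.append(max(candidates, key=lambda m: abs(loadings[m][k])))
    return kept


def discarded_components_selection(loadings, n_keep):
    """Starting from the last component, drop the remaining method with the
    largest absolute loading on each discarded component; return the rest."""
    n_comp = len(next(iter(loadings.values())))
    remaining = list(loadings)
    for k in range(n_comp - 1, n_keep - 1, -1):
        drop = max(remaining, key=lambda m: abs(loadings[m][k]))
        remaining.remove(drop)
    return remaining


# Hypothetical loadings: three methods on three components.
demo = {
    "a": [0.9, 0.1, 0.2],
    "b": [0.8, -0.3, 0.6],
    "c": [0.2, 0.9, 0.1],
}
kept_positive = positive_selection(demo, n_keep=2)
kept_discard = discarded_components_selection(demo, n_keep=2)
print(kept_positive, kept_discard)
```

Both functions return the methods to retain, so their outputs can be compared directly for a given number of components.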

Reducing Variable Set

Selection Method          Positive Selection    Discarded Principal Components
Number of Components      4         2           4         2
mCR                       X         X           X         X
mSC
mPLC
mMJL2
mMJL1                     X         X           X         X
mRG
mSS                       X                     X
mPL
mRR                       X                     X
mER

Retained methods: Cohort regression*, Modified J&L, Index 1*, Non-cohort scale score change, Model misfit

Conclusion
All methods seem to account for variation in detecting test taking irregularities
Accounting for the most
–Cohort regression
–Cohort scale score change
–Cohort performance level change
Method reduction results the same

Discussion
Different component selection methodologies
Closer examination of variables
–Remove cohort regression or cohort scale score change
–Combine the J&L indexes
Remove erasures