Getting Started with Large Scale Datasets Dr. Joni M. Lakin Dr. Margaret Ross Dr. Yi Han.


Presentation Files Are Available: (Under “Conference materials and resources” at the bottom of the page)

Opening questions
How many of you primarily use SPSS for data analysis?
How many are comfortable using syntax (in SPSS or other programs)?
How many already have plans to use a specific dataset?
How many are just curious about what’s available?

What Data is Available? Dr. Yi Han

U.S. National Datasets NCES

U.S. National Datasets Restricted use licenses

International Datasets

PISA, PIAAC

Accessing Data and Getting Started Dr. Margaret Ross See PDFs

Key Issues in Working with Large Datasets Dr. Joni Lakin

Key issues
1. Statistical weighting in SPSS
2. Practical significance and large samples
3. Matrix sampling
4. Plausible values
SPSS skills that make working with large datasets easier:
5. Keeping and managing syntax
6. Merging datasets
7. Checking for duplicate cases
8. Missing data imputation

1. Statistical weighting in SPSS
Weights allow us to better approximate the full population.
If African American students are 18% of the population but 9% of my sample, I could weight each African American student 2.0 (so each observation counts twice in analyses) to get results that better reflect population-level effects.
Types of weights
  Scale weights = multiply observations to create a weighted sample the same size as the population
  Proportional weights = may be below 1, so that the weighted sample size stays the same as the actual sample size
Note: When reporting results, you can report the weighted sample size, but you should also report the unweighted sample size.
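
The arithmetic behind the weighting example can be sketched in a few lines of Python. This is a minimal illustration, not the procedure any particular dataset uses: the group sizes, scores, population size, and function names are all hypothetical, and real datasets ship with precomputed weight variables.

```python
def group_weight(population_share, sample_share):
    """Weight that makes a group's weighted share match its population share."""
    return population_share / sample_share


def rescale(weights, target_total):
    """Rescale weights so that they sum to target_total."""
    factor = target_total / sum(weights)
    return [w * factor for w in weights]


def weighted_mean(values, weights):
    """Mean of values in which each case counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample of 100 students: group A is 9% of the sample
# but 18% of the population, as in the slide's example.
n_a, n_b = 9, 91
w_a = group_weight(0.18, 0.09)  # each group A case counts twice
w_b = group_weight(0.82, 0.91)  # group B is slightly over-represented
base = [w_a] * n_a + [w_b] * n_b

# Hypothetical scores: group A students score 40, group B students score 50.
scores = [40.0] * n_a + [50.0] * n_b

population_size = 20_000  # assumed for illustration only
scale_weights = rescale(base, population_size)      # sum to the population size
proportional_weights = rescale(base, len(scores))   # sum to the sample size

unweighted_mean = sum(scores) / len(scores)
mean_scale = weighted_mean(scores, scale_weights)
mean_proportional = weighted_mean(scores, proportional_weights)

print(round(unweighted_mean, 2), round(mean_scale, 2), round(mean_proportional, 2))
```

Both kinds of weights give the same weighted mean (48.2 here, against an unweighted 49.1); what differs is the weighted N, which matters for significance tests in software that treats weights as case counts.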

Using weights
These “weight” values are already included in large datasets

[Tables: ELS:2002 Race, UNWEIGHTED and WEIGHTED. Columns: Freq., %. Rows: Amer. Indian/Alaska Native; Asian, Hawaii/Pac. Islander; Black or African American; Hispanic, no race specified; Hispanic, race specified; More than one race; White, non-Hispanic; Total. The cell values were not preserved in this transcript, apart from a fused fragment “130.8” in the unweighted Amer. Indian/Alaska Native row.]

2. Practical significance and large datasets
Because of large sample size, many negligible effects (and ALL correlations) will be significant
Must consider effect sizes and practical significance
[Table: ELS:2002 variables, Independent Samples Test. Columns: t, df, Sig. Rows: Math test score (Sig. <.001); Reading test score (Sig. <.001); Mathematics self-efficacy (Sig. <.001); English self-efficacy scale. The t and df values and the last row's Sig. value were not preserved in this transcript.]
Wow!! All significant!!

Practical significance and large datasets
Actually negligible differences for reading and small differences for math
[Table: ELS:2002 variables, Independent Samples Test. Columns: t, df, Sig., Cohen’s d. Rows: Math test score; Reading test score; Mathematics self-efficacy; English self-efficacy scale. The cell values were not preserved in this transcript; only “<” fragments of the Sig. column remain.]
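
To see why a significant t-test can go with a negligible effect, here is a small Python sketch that computes Cohen's d and the pooled-variance t statistic from summary statistics. The means, standard deviations, and group sizes are invented for illustration and are not ELS:2002 results.

```python
import math


def pooled_variance(sd1, n1, sd2, n2):
    """Pooled variance of two independent groups."""
    return ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    return (mean1 - mean2) / math.sqrt(pooled_variance(sd1, n1, sd2, n2))


def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic (equal variances assumed)."""
    se = math.sqrt(pooled_variance(sd1, n1, sd2, n2) * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se


# Hypothetical summary statistics: a half-point difference on a scale with
# SD = 10, with 8,000 students in each group.
d = cohens_d(50.5, 10.0, 8000, 50.0, 10.0, 8000)
t = t_statistic(50.5, 10.0, 8000, 50.0, 10.0, 8000)

print(f"d = {d:.3f}, t = {t:.2f}")
```

With these made-up numbers, t is above 3 (well past the usual 1.96 cutoff for large df) while d is only 0.05, far below the 0.2 that Cohen's convention labels "small".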

3. Matrix sampling (be aware of…)
Used in large-scale assessments when
  Large domain being sampled (e.g., world history)
  Need to cover many topics in limited time
  Individual estimates of the constructs are less important than aggregate estimates (state level achievement)
Usually requires IRT (item response theory) scoring methods to allow for comparable scores across examinees completing different items
Table from von Davier et al.,
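
As a rough picture of matrix sampling, the Python sketch below splits an item pool into blocks and builds booklets from rotating pairs of blocks, so that each student answers only part of the pool. The item names, block sizes, and rotation scheme are hypothetical and do not reproduce the design of any specific assessment.

```python
from collections import Counter


def build_booklets(blocks):
    """Pair each block with the next one (wrapping around) to form booklets."""
    k = len(blocks)
    return [blocks[i] + blocks[(i + 1) % k] for i in range(k)]


# Hypothetical pool of 8 items split into 4 blocks of 2 items each.
blocks = [["h1", "h2"], ["h3", "h4"], ["h5", "h6"], ["h7", "h8"]]
booklets = build_booklets(blocks)

# Students are assigned booklets in rotation, so all items get covered
# across the sample even though no student sees the full pool.
students = ["s1", "s2", "s3", "s4", "s5", "s6"]
assignment = {s: i % len(booklets) for i, s in enumerate(students)}

# Count how many booklets each item appears in.
item_counts = Counter(item for booklet in booklets for item in booklet)

for number, booklet in enumerate(booklets):
    print(number, booklet)
```

Because different students answer different items, raw number-correct scores are not directly comparable across booklets, which is why the slide points to IRT scoring.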

4. Plausible values
Can result from matrix sampling (with IRT models), bootstrapping, and missing data imputation
In matrix sampling, individual estimates of skills are less reliable, and plausible values capture this error variance better than single scores do
Results in multiple estimates of the student’s true score on the construct (these appear as multiple variables in the dataset)
Poor practice = averaging plausible values before analysis
  Produces biased estimates (von Davier et al., see notes)
Better practice = using methods that analyze the different estimates together and produce appropriate standard errors
  Refer to von Davier et al. link in notes
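
One common way to "analyze the different estimates together" is to run the analysis once per plausible value and then pool the results with the standard multiple-imputation combining rules (Rubin's rules). The slide does not name a specific method, so treat this Python sketch as an assumption; the estimates and sampling variances are invented, and in practice the sampling variances often come from the dataset's replicate weights.

```python
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Pool one estimate per plausible value into a single estimate and SE.

    estimates: the statistic (e.g., a group mean) computed separately
        with each plausible value.
    sampling_variances: the squared standard error from each of those runs.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # variance across plausible values
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance ** 0.5


# Hypothetical results from five runs, one per plausible value.
pv_estimates = [250.0, 252.0, 251.0, 249.0, 253.0]
pv_sampling_variances = [4.0, 4.0, 4.0, 4.0, 4.0]

estimate, standard_error = combine_plausible_values(
    pv_estimates, pv_sampling_variances
)
print(f"estimate = {estimate:.1f}, SE = {standard_error:.2f}")
```

The pooled standard error (about 2.65) is larger than the 2.0 that any single run reports, because it adds the variation between plausible values; analyzing one averaged score would hide that extra uncertainty.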

5. Keeping and managing syntax
From any command window, you can select “Paste” to place the command in a syntax window
Makes sure analyses start with the same data selections: sample weights, split files, selecting relevant cases
Good for keeping a record of computed and recoded variables

6. Merging datasets
Add cases = add more participants’ data
Add variables = add variables for the same participants from another dataset

Merging datasets: Adding variables
Have to exclude duplicate variables from one dataset
  Check that values are really identical (if not, change the variable name)
Use Key Variables to match cases
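
The logic of an "add variables" merge can be sketched outside SPSS. The Python below matches cases on a key variable, drops a duplicate variable when its values agree, and keeps a conflicting value under a new name so it can be inspected. The variable names (`stu_id`, `sex`, `math`, `efficacy`), the `_2` suffix, and the data are hypothetical; this is not how SPSS implements the merge.

```python
def add_variables(primary, secondary, key):
    """Add the variables in secondary to the matching cases in primary.

    Assumes each key value appears at most once in secondary.
    """
    lookup = {row[key]: row for row in secondary}
    merged = []
    for row in primary:
        combined = dict(row)
        for name, value in lookup.get(row[key], {}).items():
            if name == key:
                continue
            if name not in combined:
                combined[name] = value          # genuinely new variable
            elif combined[name] != value:
                combined[name + "_2"] = value   # conflict: keep under a new name
            # identical duplicate variable: nothing to add
        merged.append(combined)
    return merged


# Hypothetical files sharing the key variable "stu_id".
test_scores = [
    {"stu_id": 1, "sex": "F", "math": 52.1},
    {"stu_id": 2, "sex": "M", "math": 47.3},
]
survey = [
    {"stu_id": 1, "sex": "F", "efficacy": 3.2},
    {"stu_id": 2, "sex": "F", "efficacy": 2.8},  # "sex" disagrees for student 2
]

merged = add_variables(test_scores, survey, "stu_id")
for row in merged:
    print(row)
```

Student 2 ends up with both `sex` and `sex_2`, which flags the disagreement between the two files instead of silently overwriting one value.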

7. Checking for duplicate cases

Duplicate cases output
Will appear as a new variable “PrimaryLast”
Will need to decide how to handle duplicates on a case-by-case basis
Merging datasets incorrectly can result in duplicates
  If variables are identical, delete one
  If variables are different, check that identification variables are correct
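
The idea behind a “PrimaryLast”-style indicator can be sketched in Python: within each group of cases that share the same identifying values, the last case is flagged 1 and earlier matches are flagged 0. The coding (1 = unique or primary, 0 = duplicate) is my understanding of the SPSS default and should be treated as an assumption; the IDs and scores are hypothetical.

```python
def flag_primary_last(rows, key_vars):
    """Return 1 for the last case in each matching group, 0 for earlier ones."""
    last_index = {}
    for i, row in enumerate(rows):
        last_index[tuple(row[k] for k in key_vars)] = i
    return [
        1 if last_index[tuple(row[k] for k in key_vars)] == i else 0
        for i, row in enumerate(rows)
    ]


# Hypothetical file in which student 101 appears twice.
cases = [
    {"stu_id": 101, "math": 52.1},
    {"stu_id": 102, "math": 47.3},
    {"stu_id": 101, "math": 52.1},
    {"stu_id": 103, "math": 60.4},
]

flags = flag_primary_last(cases, ["stu_id"])
duplicates = [row for row, flag in zip(cases, flags) if flag == 0]

print(flags)
print(duplicates)
```

Listing the flagged cases next to their primary case makes it easier to tell whether they are exact copies (safe to delete one) or different records that were given the same ID.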

8. Missing data
Methods that bias results: mean substitution, listwise or pairwise deletion
Methods that can provide less biased estimates:
  Single imputation regression (better than the above, but restricts variability)
  Expectation-maximization (EM): best of the SPSS options, works well when data are missing at random
  Analyze → Missing Value Analysis
Be sure to read up on “missing completely at random”, “missing at random”, and “missing not at random”
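
A quick way to see one problem with mean substitution is to watch what it does to variability. In this Python sketch, with made-up scores, filling the missing values with the observed mean leaves the mean unchanged but shrinks the standard deviation, which in turn distorts correlations and standard errors.

```python
from statistics import mean, stdev

# Hypothetical data: 5 observed scores and 5 missing values.
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
n_missing = 5

# Mean substitution: every missing value is replaced by the observed mean.
filled = observed + [mean(observed)] * n_missing

sd_observed = stdev(observed)
sd_filled = stdev(filled)

print(f"mean: {mean(observed):.1f} -> {mean(filled):.1f}")
print(f"SD:   {sd_observed:.2f} -> {sd_filled:.2f}")
```

The imputed cases all sit exactly at the mean, so they add sample size without adding any spread, and the filled-in data look more precise than they really are.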

Other Resources Dr. Lakin

AERA Research Grants and Dissertation Grants
“The program seeks to stimulate research on U.S. education issues using data from the large-scale, national and international data sets supported by the National Center for Education Statistics (NCES), NSF, and other federal agencies, and to increase the number of education researchers using these data sets.”
Suggestions based on personal observations and the RFP:
  Must use a strong quasi-experimental design (Schneider et al., Estimating Causal Effects: Using Experimental and Observational Designs)
    Regression discontinuity, propensity score matching, etc.
  Bringing in new quantitative approaches from other fields is also very appealing (economics, epidemiology, etc.)
  Check past grants to see which datasets are “neglected” (more recent datasets are better)
  Ideas that involve multiple datasets in meaningful research are more successful
  Analyses of international datasets have been more successful recently

Other opportunities
IES Research Grants do fund secondary data analyses with Exploration grant goals (any subject area)
IES data training workshops
AERA annual meeting usually has data training events:
  PDC02: Analyzing NAEP Assessment Data with Plausible Values…
  PDC13: Advanced Analysis using Adult International Large Scale Assessment Databases
  PDC16: Using NAEP Data on the Web for Educational Policy Research
  Several on quantitative methods (including propensity scores)
AERA Institute on Statistical Analysis for Education Policy (summer)
IES/NCES hosts STATS-DC conferences and summer institutes to train researchers in using specific datasets

Q&A Presentation files are available from (Under “Conference materials and resources”)