International Study of Learning in Higher Education: China, Russia, and the US
Prashant Loyalka (Stanford University)
Elena Kardanova (Higher School of Economics, Moscow)
With a host of great collaborators (Lydia et al.)
AERA, 2016

Outline of presentation
- Brief background of the study
- Test development
- Preliminary results

- A major goal of university systems is to produce skilled graduates (Spellings, 2006; Alexander, 2000)
- Skilled graduates can contribute to productivity and innovation → higher economic growth (Goldin and Katz, 2008; Autor et al., 2003; Bresnahan et al., 2002; Bresnahan, 1999; Katz and Krueger, 1998)
- Failing to produce skilled graduates may hinder the capacity of countries to compete in the global knowledge economy → stifle growth (Hanushek and Woessmann, 2012; Hanushek and Woessmann, 2008)

What are the skills (competencies) that students are supposed to have learned during university?
- Academic skills such as math, science, language, and major-specific skills (Pascarella & Terenzini, 2004)
- Higher order thinking skills such as critical thinking, which US colleges and employers perceive as among the most important skills for college graduates to become effective contributors in the global workforce (ETS, 2013; AAC&U, 2011; Casner-Lotto and Barrington, 2006)

Although there is high and increasing interest from researchers and policymakers, few studies have examined whether students are learning these skills during university:
- A couple of US studies show students make modest gains in academic and higher order thinking skills (Pascarella et al., 2011; Arum and Roksa, 2011)
- There are very few international comparison studies, and those have limited representativeness (Zlatkin-Troitschanskaia et al., 2014)

Our project has 2 main goals:
1) Assess and compare university student skills (levels and gains) within and across countries
2) Examine which factors help students develop skills

How to fulfill these goals? By examining students in three of the world's largest economies: RUSSIA, the USA, and CHINA

We are in the process of extending the study to other countries (JAPAN, KOREA, INDIA, BRAZIL, SOUTH AFRICA) beyond RUSSIA, the USA, and CHINA

More specifically…
- Randomly select universities (and classes/students within each university) in each country, so the sample is nationally representative
- Focus on engineering majors (CS and EE)
- Assess skills over time:
  - academic skills (math, physics, basic computing)
  - major-specific skills (e.g. computer science)
  - higher order skills (ETS critical thinking & quantitative literacy)
- Survey students, professors, administrators
- Use quasi-experimental methods to examine factors leading to skill gains

3 stages of the project
Stage 1) Pilot Stage (Fall 2014): Developing and validating our assessments
- 10 institutions (~2,500 students) in China
- 10 institutions (~2,500 students) in Russia
- Institutions in the US (data collected by ETS)
Pilot is finished.

3 stages of the project
Stage 2) Baseline Stage (2015):
- Nationally representative (random) sample of 36 institutions (10K grade 1 and 3 CS and EE students) in China
- Nationally representative (random) sample of 34 institutions (5K grade 1 and 3 CS and EE students) in Russia
- Institutions in the US (data collected by ETS)
Baseline is finished.
Stage 3) Follow-up Stage (2016-2017):
- Same 36 institutions (10K grade 2 and 4 students) in China
- Same 34 institutions (5K grade 2 and 4 students) in Russia

Outline of presentation
- Brief background of the study
- Test development
- Preliminary results

Developing assessment instruments
Objective: to develop instruments that assess and compare EE and CS engineering students' skill gains in mathematics and physics within and across countries. These instruments should have the following properties:
- Be valid, reliable (have desirable psychometric properties), and fair
- Be horizontally and vertically scalable, to measure and compare skill levels and skill gains across and within countries
One of the major challenges we faced was to ensure the cross-national equivalence of measurements.

Steps for the development of valid and cross-nationally comparable assessment instruments:
1. Select comparable EE and CS majors across China, Russia, and the United States
2. Select content and sub-content areas in math and physics (with experts)
3. Collect and verify items (with experts)
4. Conduct a small-scale pilot study
5. Conduct a large pilot survey
6. Conduct a psychometric analysis

1. Selected comparable EE and CS majors across China, Russia, and the United States
Because EE/CS is divided into more specialized majors in China and Russia, we selected majors that have:
- consistent coursework/curricula across universities within each country
- common core curricula between Russia and China
- substantial overlap in coursework/curricula with EE/CS majors in the United States

2. Selected content and sub-content areas in math and physics (with experts)
We developed content maps in math and physics that contain:
- content areas taught in high school and in college in each country
- the relative weight of the content areas in each country's national curriculum
We interviewed 12 experts in each country (6 professors from elite institutions, 6 from non-elite institutions). The experts adjusted the content maps to reflect what EE/CS students learn in math & physics (by grade 1 and by grade 3).

3. Collected and verified items (with experts)
Step 1: We collected test items that fairly reflected the content areas in the content map
Step 2: Translation and back-translation
Step 3: To make sure the items were valid, relevant, clear, and of suitable difficulty, we interviewed the 12 experts from each country
Step 4: We analysed the consistency of expert ratings (Cronbach's alpha, the correlations between the ratings, multifaceted Item Response Theory); a sketch of one such check follows below
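Cronbach's alpha is a standard way to check whether expert ratings hang together. Below is a minimal sketch (not the study's actual analysis code) that computes alpha over a hypothetical items-by-experts rating matrix; the rating values are made up for illustration.

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Cronbach's alpha for an (items x raters) matrix.

    Rows are the rated units (test items); columns are the "parts"
    (here: expert raters) whose consistency we want to quantify.
    """
    k = ratings.shape[1]                         # number of raters
    rater_vars = ratings.var(axis=0, ddof=1)     # variance of each rater's scores
    total_var = ratings.sum(axis=1).var(ddof=1)  # variance of summed scores
    return k / (k - 1) * (1 - rater_vars.sum() / total_var)

# Hypothetical example: 5 test items rated for suitability by 4 experts.
ratings = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
    [4, 4, 3, 4],
], dtype=float)
print(f"alpha = {cronbach_alpha(ratings):.2f}")
```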

4. Conducted a small-scale pilot
We checked the small-scale pilot tests for language ambiguity, formatting, etc. We did this by giving the pilot test, in each country, to:
- 40 grade 1 students
- 40 grade 3 students
Based on the results of the expert evaluations and the small-scale pilot study, the instruments were prepared for a large-scale pilot.

5. Large-scale piloting
- Conducted at the end of October 2014
- 11 universities in China and 10 universities in Russia, both elite and non-elite, located in both large and small cities across each country
- 1,797 students in China and 1,802 students in Russia
- 45 MC items in each of 4 tests (math and physics for grades 1 and 3)
- Paper-and-pencil format; two 55-minute sessions (one for math and one for physics, in random order)
- We also gave students a short questionnaire asking about their background (gender, rural-urban status, age, etc.)

6. Ensuring psychometric quality and a common scale between grades and countries
- The dichotomous Rasch model (Wright and Stone, 1979) was used to conduct item analyses as well as tests of dimensionality and reliability, using Winsteps software (Linacre, 2011)
- Particular attention was paid to differential item functioning (DIF), to provide evidence concerning the cross-national comparability of the test results and to ascertain the possibility of creating a common scale between the two grades and across the two countries
- DIF: when test participants with the same ability level who belong to different groups (e.g. gender, country) have different chances of completing an item correctly
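For reference, the dichotomous Rasch model cited above gives the probability that person n answers item i correctly from a single ability parameter and a single difficulty parameter:

$$P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}$$

Because abilities and difficulties sit on the same logit scale, item difficulties calibrated in different groups can be compared directly, which is what makes the DIF and linking analyses below possible.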

Analytical approach: analyses
- Fit analysis: unweighted and weighted mean square statistics (Wright and Stone, 1979)
- DIF analysis: t-statistic, MH method, LR method (Smith, 2004, 2011; Zwick et al., 1999; Zumbo & Thomas, 1996)
- Dimensionality analysis: PCA of standardized residuals (Linacre, 1998; Smith, 2002; Ludlow, 1985)
- Reliability study: person reliability index, separation index (Wright and Stone, 1979; Smith, 2001)
- Linking of measures: simultaneous calibration and separate calibration (Wright & Bell, 1984; Wolfe, 2004)
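The unweighted ("outfit") and weighted ("infit") mean squares in the fit analysis above are built from standardized residuals; a common formulation, following the Rasch literature, is

$$z_{ni} = \frac{x_{ni} - P_{ni}}{\sqrt{P_{ni}(1 - P_{ni})}}, \qquad \mathrm{Outfit}_i = \frac{1}{N}\sum_{n=1}^{N} z_{ni}^2, \qquad \mathrm{Infit}_i = \frac{\sum_n W_{ni}\, z_{ni}^2}{\sum_n W_{ni}},$$

where $W_{ni} = P_{ni}(1 - P_{ni})$. Values near 1 indicate good fit; values well above 1 flag noisy or mis-fitting items.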

Two stages of the psychometric analysis
- First stage: the data for each grade were analyzed separately, to discover whether it would be possible to construct a common scale across countries within each grade
- Second stage: the data for the two grades were analyzed simultaneously, using common items included in both grades as a link, to determine whether it would be possible to place all the parameters for the two grades and the two countries on a common scale
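When forms are calibrated separately, a standard Rasch linking step shifts one form's item difficulties onto the other's scale by the mean difference over the common items. A minimal sketch with made-up difficulty values (not the study's actual estimates):

```python
import numpy as np

# Hypothetical difficulties (logits) of the same common items,
# estimated separately in a grade 1 and a grade 3 calibration.
b_grade1 = np.array([-0.8, -0.2, 0.4, 1.1])
b_grade3 = np.array([-1.3, -0.7, -0.1, 0.6])

# Mean-shift linking constant: how far grade 3's scale origin sits
# from grade 1's. Adding it maps grade 3 parameters onto grade 1's scale.
shift = (b_grade1 - b_grade3).mean()
b_grade3_linked = b_grade3 + shift

print(f"linking constant: {shift:.2f} logits")
print("linked grade 3 difficulties:", np.round(b_grade3_linked, 2))
```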

The grade 1 mathematics test: fit analysis
- 8 items of 45 were deleted (low discrimination and/or mis-fitting the model)
- For the rest of the analysis we consider the reduced set of 37 items for the grade 1 mathematics test

Cross-country DIF analysis (math test, grade 1): ETS approach
- 24 items are DIF-free
- 13 items show DIF: 7 items in favour of China, 6 items in favour of Russia
- This suggests the test is not entirely fair across countries
- Solution: we used the 24 DIF-free items for linking between the two countries; the 13 items displaying DIF were split and treated as unique items for Russia and unique items for China
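The ETS approach referenced here is conventionally based on the Mantel-Haenszel statistic: examinees are stratified by total score, a common odds ratio is estimated across strata, and items are classed A/B/C by the size of the MH delta. A sketch of that classification logic under simplified assumptions (synthetic counts, and ignoring the significance criterion that the full ETS rules also apply):

```python
import numpy as np

def mh_delta(strata):
    """ETS Mantel-Haenszel delta for one item.

    strata: list of (ref_right, ref_wrong, focal_right, focal_wrong)
    counts, one tuple per total-score stratum.
    """
    num = sum(r1 * f0 / (r1 + r0 + f1 + f0) for r1, r0, f1, f0 in strata)
    den = sum(r0 * f1 / (r1 + r0 + f1 + f0) for r1, r0, f1, f0 in strata)
    alpha_mh = num / den                 # common odds ratio across strata
    return -2.35 * np.log(alpha_mh)      # ETS delta metric

def ets_class(delta):
    """Rough ETS size classification by |delta| alone."""
    d = abs(delta)
    return "A (negligible)" if d < 1.0 else "B (moderate)" if d < 1.5 else "C (large)"

# Synthetic counts for one item across three score strata.
strata = [(40, 10, 30, 20), (60, 15, 45, 30), (80, 10, 70, 20)]
delta = mh_delta(strata)
print(f"MH delta = {delta:.2f} -> class {ets_class(delta)}")
```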

Psychometric analysis: math test, grade 1, Russia + China
- The person reliability is 0.85 (classical reliability α = 0.83)
- The person separation index is 2.39
- The test is essentially unidimensional
- The other tests showed similar results
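As a consistency check on these figures: in Rasch analysis the person separation index G and person reliability R are tied by $R = G^2/(1+G^2)$, so

$$G = \sqrt{\frac{R}{1-R}} = \sqrt{\frac{0.85}{0.15}} \approx 2.38,$$

which matches the reported separation index of 2.39 up to rounding.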

Summary of test development
- For each country and each grade we have constructed tests that are unidimensional, reliable, and fair
- We have constructed common scales across both countries for grade 1 and for grade 3
- We have constructed common scales across both grades for each country
- We have constructed a common scale for both countries and both grades, which gives us a basis for making international comparisons

Outline of presentation
- Brief background of the study
- Test development
- Preliminary results

BASELINE: Comparison of academic skills across countries (entering freshmen) [score distribution chart: China vs. Russia]

BASELINE: Comparison of academic skills across countries (juniors) [score distribution chart: China vs. Russia]
This is a lower-bound estimate of the difference, since Russia has dropouts (while China does not) and we can assume that dropouts are lower-achieving students.

Comparison of academic skills across countries
- China's entering freshmen have MUCH higher levels of math/physics skills than Russia's
- China's juniors have somewhat higher levels of math/physics skills than Russia's
- Our initial results also indicate that students in China make little progress in math/physics from their freshman to junior years, in contrast with students in Russia
- Results are consistent between the pilot and the baseline

BASELINE: Comparison of critical thinking skills across countries (grades 1 & 3) [score distribution chart: China vs. Russia]

How are we planning on using the assessment data to improve student learning?
An example from China

What is the impact of faculty behavior on student learning in universities?
One important question: if a professor spends more time on research, does this help or hurt student learning?
- On the one hand, it helps, because research strengthens teaching
- On the other hand, it hurts, because research takes time away from teaching

This is a topic of great interest … with no good answer
- Over 60 studies have tried to tie faculty research to student learning
- Analyses from ALL of these studies are CORRELATIONAL; no one has looked at the CAUSAL relationship
- Only one study looks at impacts on achievement
- In fact, this gap in the literature is so stark that multiple editorials have been published in Science and Nature pointing to the need for better data to answer this question

We sought to estimate a causal impact. How did we achieve this?
We conducted a quasi-experiment using data from China

We compare the learning outcomes of "twins":
- One twin with professors that DID spend a lot of time on research
- Another twin with professors that did NOT spend a lot of time on research

In fact, we took the analysis one step further…
We also observed learning differences for the same student across these two situations (controlling for outside factors):
- Situation A: the student DID have professors who spent a lot of time on research
- Situation B: the student did NOT have professors who spent a lot of time on research
A sketch of this within-student comparison follows below.
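Comparing the same student across courses taught by high- and low-research professors is, in regression terms, a student fixed-effects design. A minimal sketch of the within-student estimator, using hypothetical column names and made-up numbers rather than the study's data:

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per student x course, with the
# course professor's research-time share and the student's test score.
df = pd.DataFrame({
    "student_id":    [1, 1, 2, 2, 3, 3],
    "score":         [0.52, 0.48, 0.30, 0.41, 0.65, 0.60],
    "research_time": [0.35, 0.10, 0.40, 0.05, 0.30, 0.15],
})

# Within-student (fixed effects) transformation: demean score and
# research_time by student, sweeping out all stable student traits.
cols = ["score", "research_time"]
demeaned = df[cols] - df.groupby("student_id")[cols].transform("mean")

# Slope of the demeaned regression = fixed-effects estimate of the
# association between professor research time and student learning.
x = demeaned["research_time"].to_numpy()
y = demeaned["score"].to_numpy()
beta = (x @ y) / (x @ x)
print(f"within-student estimate: {beta:.3f}")
```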

How do we measure professor research commitment? Two measures:
- Research time: the proportion of time the professor devotes to research out of all working hours
- Publication intensity: the number of academic publications (books, journal articles, etc.) the professor publishes per year

RESEARCH TIME [results chart: students who take classes from professors that spend 35% of their time on research]

PUBLICATION INTENSITY [results chart: students who take classes from professors that publish 4 articles per year]

In sum, our study found that professor research time has a negative impact on student learning in China. This has major implications for reforming faculty incentives between teaching and research in Chinese universities.

Next steps
- We are analyzing the baseline data and will have more to report soon
- At the same time, we are preparing for the follow-up assessment/survey and extending the study to other countries

Thank you!