International Study of Learning in Higher Education: China, Russia, and the US
Prashant Loyalka (Stanford University), Elena Kardanova (Higher School of Economics, Moscow), with a host of great collaborators (Lydia et al.). AERA, 2016
Outline of presentation
Brief background of the study
Test development
Preliminary results
A major goal of university systems is to produce skilled graduates (Spellings, 2006; Alexander, 2000). Skilled graduates can contribute to productivity, innovation, and higher economic growth (Goldin and Katz, 2008; Autor et al., 2003; Bresnahan et al., 2002; Bresnahan, 1999; Katz and Krueger, 1998). Failing to produce skilled graduates may hinder the capacity of countries to compete in the global knowledge economy and may stifle growth (Hanushek and Woessmann, 2012; Hanushek and Woessmann, 2008).
What are the skills (competencies) that students are supposed to have learned during university?
Academic skills such as math, science, language, and major-specific skills (Pascarella & Terenzini, 2004). Higher-order thinking skills such as critical thinking, perceived by US colleges and employers as among the most important skills for college graduates to become effective contributors in the global workforce (ETS, 2013; AAC&U, 2011; Casner-Lotto and Barrington, 2006).
Although there is high and increasing interest from researchers and policymakers, few studies have examined whether students are learning these skills during university. A couple of US studies show students make modest gains in academic and higher-order thinking skills (Pascarella et al., 2011; Arum and Roksa, 2011). There are very few international comparison studies, and those have limited representativeness (Zlatkin-Troitschanskaia et al., 2014).
Our project - 2 main goals:
1) Assess and compare university student skills (levels and gains) within and across countries
2) Examine which factors help students develop skills
How to fulfill these goals?
Examining students in three of the world's largest economies: Russia, the US, and China
We are in the process of extending the study to other countries: Japan, Korea, India, Brazil, and South Africa, in addition to Russia, the US, and China.
More specifically…
- Randomly select universities (and classes/students within each university) in each country, so the sample is nationally representative
- Focus on engineering majors (CS and EE)
- Assess skills over time:
  - academic skills (math, physics, basic computing)
  - major-specific skills (e.g. computer science)
  - higher-order skills (ETS critical thinking & quantitative literacy)
- Survey students, professors, administrators
- Use quasi-experimental methods to examine factors leading to skill gains
3 stages of the project
Stage 1) Pilot Stage (Fall, 2014): Developing and validating our assessments
- 10 institutions (~2,500 students) in China
- 10 institutions (~2,500 students) in Russia
- Institutions in the US (data collected by ETS)
The pilot is finished.
3 stages of the project
Stage 2) Baseline Stage (2015):
- Nationally representative (random) sample of 36 institutions (10K grade 1 and 3 CS and EE students) in China
- Nationally representative (random) sample of 34 institutions (5K grade 1 and 3 CS and EE students) in Russia
- Institutions in the US (data collected by ETS)
Stage 3) Follow-up Stage ( ):
- Same 36 institutions (10K grade 2 and 4 students) in China
- Same 34 institutions (5K grade 2 and 4 students) in Russia
The baseline is finished.
Outline of presentation
Brief background of the study
Test development
Preliminary results
Developing assessment instruments
Objective: to develop instruments that assess and compare EE and CS engineering students' skill gains in mathematics and physics within and across countries. These instruments should have the following properties:
- Be valid, reliable (have desirable psychometric properties), and fair
- Be capable of horizontal and vertical scaling, to measure and compare skill levels and skill gains across and within countries
One of the major challenges we faced was ensuring the cross-national equivalence of measurements.
Steps for the development of valid and cross-nationally comparable assessment instruments:
1. Select comparable EE and CS majors across China, Russia, and the United States
2. Select content and sub-content areas in math and physics (with experts)
3. Collect and verify items (with experts)
4. Conduct a small-scale pilot study
5. Conduct a large pilot survey
6. Conduct a psychometric analysis
1. Selected comparable EE and CS majors across China, Russia, and the United States
Because EE/CS is divided into more specialized majors in China and Russia, we selected majors that have:
- consistent coursework/curricula across universities within each country
- common core curricula between Russia and China
- substantial overlap in coursework/curricula with EE/CS majors in the United States
2. Selected content and sub-content areas in math and physics (with experts)
We developed content maps in math and physics that contain:
- content areas taught in high school and in college in each country
- the relative weight of the content areas in each country's national curriculum
We interviewed 12 experts in each country (6 professors from elite institutions, 6 from non-elite institutions). The experts adjusted the content maps to reflect what EE/CS students learn in math & physics (by grade 1 and by grade 3).
3. Collected and verified items (with experts)
Step 1: We collected test items that fairly reflected the content areas in the content map.
Step 2: Translation and back-translation.
Step 3: To make sure the items were valid, relevant, clear, and of suitable difficulty, we interviewed the 12 experts from each country.
Step 4: We analysed the consistency of expert ratings (Cronbach's alpha, the correlations between the ratings, multifaceted Item Response Theory).
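The consistency check in Step 4 can be illustrated with a minimal sketch of Cronbach's alpha for a matrix of expert ratings. This is not the project's code: the rating matrix below is made up for demonstration, and the real analysis would use dedicated psychometric software.

```python
# Sketch: Cronbach's alpha for inter-rater consistency.
# Rows are rated test items, columns are experts; all numbers are made up.

def cronbach_alpha(ratings):
    """alpha = k/(k-1) * (1 - sum of rater variances / variance of row totals)."""
    k = len(ratings[0])   # number of raters (columns)

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Variance of each rater's scores across items
    rater_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Variance of the total score each item receives
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)

ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
print(round(cronbach_alpha(ratings), 3))  # high value = experts rate consistently
```

Values close to 1 indicate that the experts rank the items in a consistent way; low values would flag items (or raters) to revisit.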
4. Conducted a small-scale pilot
We checked the small-scale pilot tests for language ambiguity, formatting, etc. We did this by giving the pilot test to 40 grade 1 students and 40 grade 3 students (in each country). Based on the results of the expert evaluations and the small-scale pilot study, the instruments were prepared for a large-scale pilot.
5. Large-scale piloting (end of October 2014):
- 11 universities in China and 10 universities in Russia, both elite and non-elite, located in both large and small cities across each country
- 1,797 students in China and 1,802 students in Russia
- 45 MC items in each of 4 tests (math and physics for grades 1 and 3)
- Paper-and-pencil format
- Two 55-minute sessions (one for math and one for physics, in random order)
- We also gave students a short questionnaire asking about their background (gender, rural-urban status, age, etc.)
6. Ensuring psychometric quality and a common scale between grades and countries
The dichotomous Rasch model (Wright and Stone, 1979) was used to conduct item analyses as well as tests of dimensionality and reliability, using the Winsteps software (Linacre, 2011). Particular attention was paid to differential item functioning (DIF), to provide evidence on the cross-national comparability of the test results and to ascertain whether a common scale could be created between the two grades and across the two countries. DIF occurs when test takers with the same ability level who belong to different groups (e.g. gender, country) have different probabilities of answering an item correctly.
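The model and the DIF idea can be sketched in a few lines. This is an illustration only, with hypothetical parameter values in logits; it is not the Winsteps analysis the study actually ran.

```python
# Sketch of the dichotomous Rasch model:
#   P(correct) = exp(theta - b) / (1 + exp(theta - b))
# where theta is person ability and b is item difficulty, both in logits.
import math

def rasch_p(theta, b):
    """Probability a person of ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Under DIF, the same item is effectively harder for one group, so equally
# able students have different success probabilities (values are made up):
theta = 0.5                      # same ability in both groups
b_group_a, b_group_b = 0.0, 0.8  # hypothetical group-specific difficulties
print(round(rasch_p(theta, b_group_a), 3))  # ≈ 0.622
print(round(rasch_p(theta, b_group_b), 3))  # ≈ 0.426
```

A DIF-free item would have (approximately) the same difficulty parameter in both countries, which is what makes it usable as a linking item.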
Analytical approach
- Fit analysis: unweighted and weighted mean square statistics (Wright and Stone, 1979)
- DIF analysis: t-statistic, MH method, LR method (Smith, 2004, 2011; Zwick et al., 1999; Zumbo & Thomas, 1996)
- Dimensionality analysis: PCA of standardized residuals (Linacre, 1998; Smith, 2002; Ludlow, 1985)
- Reliability study: person reliability index, separation index (Wright and Stone, 1979; Smith, 2001)
- Linking of measures: simultaneous calibration and separate calibration (Wright & Bell, 1984; Wolfe, 2004)
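The fit analysis above rests on the standard Rasch mean-square statistics: the unweighted ("outfit") and information-weighted ("infit") mean squares, both expected to be near 1.0 for well-fitting items. The sketch below shows the formulas on made-up responses and model probabilities; it is not the study's code.

```python
# Sketch of Rasch item-fit statistics for one item.
def fit_statistics(responses, probs):
    """responses: 0/1 answers to one item; probs: model-expected P(correct).
    Returns (infit, outfit) mean squares; values near 1.0 indicate good fit."""
    residuals = [x - p for x, p in zip(responses, probs)]
    variances = [p * (1 - p) for p in probs]
    # Outfit: unweighted mean of squared standardized residuals
    outfit = sum(r * r / v for r, v in zip(residuals, variances)) / len(responses)
    # Infit: information-weighted mean square
    infit = sum(r * r for r in residuals) / sum(variances)
    return infit, outfit

responses = [1, 0, 1, 1, 0]
probs = [0.8, 0.3, 0.6, 0.7, 0.4]   # made-up model probabilities
infit, outfit = fit_statistics(responses, probs)
print(round(infit, 3), round(outfit, 3))
```

Items whose mean squares fall far from 1.0 (for example, the 8 items dropped from the grade 1 math test) misfit the model and are removed or revised before linking.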
Two stages of the psychometric analysis
First stage: the data for each grade were analyzed separately, to discover whether it would be possible to construct a common scale across countries within each grade.
Second stage: the data for the two grades were analyzed simultaneously, using common items included in both grades as a link, to determine whether it would be possible to place all the parameters for the two grades and the two countries on a common scale.
The grade 1 mathematics test: fit analysis
8 of the 45 items were deleted (low discrimination and/or misfit to the model). For the rest of the analysis, we consider the reduced set of 37 items for the grade 1 mathematics test.
Cross-country DIF analysis (Math test, grade 1): ETS approach
24 items are DIF-free; 13 items display DIF (7 in favour of China, 6 in favour of Russia). This suggests the test is not entirely fair across countries. Solution: we used the 24 DIF-free items for linking between the two countries; the 13 items displaying DIF were split and treated as unique items for Russia and unique items for China.
Psychometric analysis: math test, grade 1, Russia + China
The person reliability is 0.85 (classical reliability α = 0.83). The person separation index is 2.39. The test is essentially unidimensional. The other tests showed similar results.
Summary of test development
For each country and each grade we have constructed tests that are unidimensional, reliable, and fair. We have constructed common scales for both countries (for grade 1 and for grade 3), common scales for both grades within each country, and a common scale for both countries and both grades. This gives us a basis for making international comparisons.
Outline of presentation
Brief background of the study
Test development
Preliminary results
BASELINE: Comparison of academic skills across countries (entering freshmen)
[Chart comparing score distributions for China and Russia]
BASELINE Comparison of academic skills across countries (juniors)
[Chart comparing score distributions for China and Russia] This is a lower-bound estimate of the difference, since Russia has dropouts (while China does not) and we can assume that dropouts are lower-achieving students.
Comparison of academic skills across countries
China's entering freshmen have MUCH higher levels of math/physics skills than Russia's. China's juniors have somewhat higher levels of math/physics skills than Russia's. Our initial results also indicate that students in China make little progress in math/physics from their freshman to junior years, in contrast with students in Russia. The results are consistent between the pilot and the baseline.
BASELINE: Comparison of critical thinking skills across countries (grades 1 & 3)
[Chart comparing score distributions for China and Russia]
How are we planning on using the assessment data to improve student learning? An example from China
What is the impact of faculty behavior on student learning in universities?
One important question: If a professor spends more time on research, does this help or hurt student learning? On the one hand, it helps because research strengthens teaching On the other hand, it hurts because research takes time away from teaching
This is a topic of great interest … with no good answer
Over 60 studies have tried to tie faculty research to student learning. The analyses from ALL of these studies are CORRELATIONAL; no one has looked at the CAUSAL relationship, and only one study looks at impacts on achievement. In fact, this gap in the literature is so stark that multiple editorials have been published in Science and Nature pointing to the need for better data to answer this question.
We sought to estimate a causal impact
How did we achieve this? We conducted a quasi-experiment using data from China.
We compare the learning outcomes of “twins”
One twin with professors that DID spend a lot of time on research; another twin with professors that did NOT spend a lot of time on research.
In fact, we took the analysis one step further…
We also observed learning differences for the same student across these two situations (controlling for outside factors):
Situation A: the student DID have professors who spent a lot of time on research
Situation B: the student did NOT have professors who spent a lot of time on research
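The logic of this within-student comparison can be sketched as a simple differencing exercise. This is a toy illustration with made-up scores, not the study's estimation code: the actual analysis would use a student fixed-effects regression with controls, but differencing within student shows the core idea of removing all fixed student traits.

```python
# Toy sketch of a within-student comparison: each student is observed in
# two courses, one taught by a research-intensive professor (treat=1) and
# one not (treat=0). All scores below are made up.
students = {
    # student_id: [(treat, score), (treat, score)]
    "s1": [(1, 62.0), (0, 65.0)],
    "s2": [(1, 70.0), (0, 74.0)],
    "s3": [(1, 55.0), (0, 58.0)],
}

def within_student_effect(data):
    """Average (treated score - untreated score) across students.
    Differencing within student removes fixed student traits (ability,
    motivation), isolating the association with professor research time."""
    diffs = []
    for obs in data.values():
        treated = [s for t, s in obs if t == 1]
        control = [s for t, s in obs if t == 0]
        diffs.append(sum(treated) / len(treated) - sum(control) / len(control))
    return sum(diffs) / len(diffs)

print(within_student_effect(students))  # negative in this toy data
```

A negative average difference, as in this toy data, would correspond to the study's finding that more professor research time is associated with lower student learning.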
How do we measure professor research commitment?
Two measures of faculty research commitment:
- Research time: the proportion of time the professor devotes to research, out of all working hours
- Publication intensity: the number of academic publications (books, journal articles, etc.) the professor publishes per year
RESEARCH TIME
[Chart] Take classes from professors that spend 35% of time on research
PUBLICATION INTENSITY
[Chart] Take classes from professors that publish 4 articles per year
In sum, our study found that professor research time has a negative impact on student learning in China. This has major implications for reforming faculty incentives between teaching and research in Chinese universities.
Next steps: We are analyzing the baseline data and will have more to report soon. At the same time, we are preparing for the follow-up assessment/survey and extending the study to other countries.
Thank you!