Discussion Measuring Social-Emotional Learning at Scale: Early Evidence from California’s CORE Districts Martin West, Harvard Graduate School of Education.


1 Discussion
Measuring Social-Emotional Learning at Scale: Early Evidence from California’s CORE Districts
Martin West, Harvard Graduate School of Education
Patrick Kyllonen, Educational Testing Service, Princeton, NJ
Conference on Measuring and Assessing Skills 2017
Session: Self Reports or Modifications to Self Reports, 2
Research Network on the Determinants of Life Course Capabilities and Outcomes
Center for Economics of Human Development, The University of Chicago
March 2, 2017

2 CORE
First use of noncognitive measurement for accountability in the U.S.
Other countries are experimenting with noncognitive accountability measurement (e.g., Chile, Mexico, Brazil)
CORE requirements
  Measurable (reliable, valid)
  Actionable (schools can improve)
  Meaningful (changing them will change life outcomes)
Core measures: Why these?
  Self Management
  Growth Mindset
  Social Awareness
  Global Self-Efficacy

3 High Stakes and Low Stakes
High Stakes
  Advantages
    More resources
    More attention to results by stakeholders and policy makers
    Positive consequences, e.g., curriculum reform
  Disadvantages
    Negative consequences, “teaching to the test”
Low Stakes
  Disadvantages
    Fewer resources
    Interest within the classroom, but less interest outside the classroom
  Advantages
    Minimizes negative consequences
    Still can affect school culture (anecdotes from teachers regarding the Mission Skills Assessment and the language used to describe kids, and the focus on noncognitive skill development)

4 Self Management
Please answer how often you did the following during the past 30 days.
Response options: Almost never, Once in a while, Sometimes, Often, Almost all the time
During the past 30 days...
  I came to class prepared. (C)
  I remembered and followed directions.
  I got my work done right away instead of waiting until the last minute.
  I paid attention, even when there were distractions. (C, -N)
  I worked independently with focus.
  I stayed calm even when others bothered or criticized me. (N)
  I allowed others to speak without interruption. (A, -E)
  I was polite to adults and peers. (A)
  I kept my temper in check. (-N)
Multidimensional scales: lower reliability, harder to interpret. Is a low-scoring student Neurotic or Unconscientious? How do I improve if I don’t have the correct diagnosis?

5 Growth Mindset, Global Self-Efficacy
Growth Mindset
Please indicate how true each of the following statements is for you:
Response options: Not at all true, A little true, Somewhat true, Mostly true, Completely true
  My intelligence is something that I can’t change very much.
  Challenging myself won’t make me any smarter.
  There are some things I am not capable of learning.
  If I am not naturally smart in a subject, I will never do well in it.
Global Self-Efficacy
How confident are you about the following at school?
Response options: Not at all confident, A little confident, Somewhat confident, Mostly confident, Completely confident
  I can earn an A in my classes.
  I can do well on all my tests, even when they’re difficult.
  I can master the hardest topics in my classes.
  I can meet all the learning goals my teachers set.
Are response categories correctly ordered (e.g., somewhat vs. mostly; a little vs. somewhat)? Do all students understand the order? Do differences between categories reflect differences in trait level (sum scores assume they do)?
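The questions above about category order and spacing can be made concrete with a small sketch of how a sum score is typically formed. This is an illustration under stated assumptions, not CORE's actual scoring procedure: the 1 to 5 integer coding, the reverse-coding of the negatively worded Growth Mindset items, and the function name `sum_score` are all assumptions introduced here.

```python
# Response labels in their intended order (taken from the slide).
GROWTH_MINDSET_LABELS = [
    "Not at all true",
    "A little true",
    "Somewhat true",
    "Mostly true",
    "Completely true",
]


def sum_score(responses, labels, reverse=False):
    """Sum integer codes 1..len(labels) over a student's responses.

    Assumption: categories are coded as equally spaced integers.
    With reverse=True, codes are flipped so that a higher total means
    more of the trait (assumed here for negatively worded items).
    """
    codes = {label: i + 1 for i, label in enumerate(labels)}
    n = len(labels)
    total = 0
    for response in responses:
        code = codes[response]
        total += (n + 1 - code) if reverse else code
    return total


# Hypothetical student answering the four Growth Mindset items.
student = ["Not at all true", "A little true", "Somewhat true", "Mostly true"]
raw_total = sum_score(student, GROWTH_MINDSET_LABELS)
reversed_total = sum_score(student, GROWTH_MINDSET_LABELS, reverse=True)
```

Under this coding, moving one item from "A little true" to "Somewhat true" changes the total by exactly as much as moving from "Somewhat true" to "Mostly true". That equal-step property is the interval assumption the slide questions.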

6 Social Awareness
Please answer how often you did the following during the past 30 days.
During the past 30 days...
  How carefully did you listen to other people's point of view?
    Not carefully at all, Slightly carefully, Somewhat carefully, Quite carefully, Extremely carefully
  How much did you care about other people's feelings?
    Did not care at all, Cared a little bit, Cared somewhat, Cared quite a bit, Cared a tremendous amount
  How often did you compliment others’ accomplishments?
    Almost never, Once in a while, Sometimes, Often, Almost all the time
  How well did you get along with students who are different from you?
    Did not get along at all, Got along a little bit, Got along somewhat, Got along pretty well, Got along extremely well
  How clearly were you able to describe your feelings?
    Not at all clearly, Slightly clearly, Somewhat clearly, Quite clearly, Extremely clearly
  When others disagreed with you, how respectful were you of their views?
    Not at all respectful, Slightly respectful, Somewhat respectful, Quite respectful, Extremely respectful
  To what extent were you able to stand up for yourself without putting others down?
    Not at all, A little bit, Somewhat, Quite a bit, A tremendous amount
  To what extent were you able to disagree with others without starting an argument?
What does this mean? What would it mean to “improve” on this indicator, from “cared a little bit” to “cared somewhat”?

7 Items vs. scales
Items
  E.g., NAEP
  Clear what’s changing
  But there is measurement error
Scales
  E.g., PISA, CORE
  Reduces measurement error relative to the skill
  But it’s less clear what you’re measuring, particularly with violations of unidimensionality
SOURCE: NAEP TH GRADE MATHEMATICS QUESTIONNAIRE
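One way to see why a scale reduces measurement error relative to a single item is the Spearman-Brown formula, shown here as an illustration rather than as something the slides present. It assumes the k items are parallel measures of one skill (equal reliability, unidimensional), which is exactly the assumption the slide warns may be violated. The symbols \rho_1 (single-item reliability) and \rho_k (reliability of the k-item scale) are notation introduced for this sketch.

```latex
\rho_{k} = \frac{k\,\rho_{1}}{1 + (k - 1)\,\rho_{1}}
% Hypothetical example: \rho_1 = 0.30 and k = 4 items give
% \rho_4 = 1.2 / 1.9 \approx 0.63
```

When the items tap more than one dimension, the formula no longer applies as written, and the gain in reliability comes at the cost of knowing what the total score means.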

8 Single vs. Multiple Measurement Approaches
Single
  Economical
  Skill measurement confounded by measurement approach
Multiple
  Takes more time
  Or harder to collect the data (e.g., administrative records)
  Better measure of the latent skill, in principle

9 Anchoring Vignettes
Depend on the “quality” of the vignettes
  They have to be understandable (cognitive ability issues)
  They have to be brief (motivation issues)
  Non-parametric scoring should distribute recoded scores evenly
  With more than one vignette, there should be minimal ties and order violations (violations are correlated with proficiency)
  Cronbach’s alpha is not a good reliability coefficient: it’s an overestimate, because the AVs have a built-in dependency around the AV (M. von Davier)
  Can use test-retest, or a variance components approach using multiple vignettes
Depend on the quality of the scale
  Unidimensional, reliable
Ideally, should reduce reference group effects
For reporting on growth, imagine:
  In practice (data from a Latin American school study), students’ vignette ratings change from 4th to 5th grade
  % students rating selves higher than a particular vignette
    4th grade: 30%
    5th grade: 50%
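The points about non-parametric scoring, ties, and order violations can be sketched as follows. This is a minimal illustration of the commonly described non-parametric recoding, in which a self-rating is placed relative to J vignette ratings on a 1 to 2J+1 scale. It is not necessarily the scoring used in the studies discussed, and the function name `recode_self_rating` and the example ratings are hypothetical.

```python
def recode_self_rating(self_rating, vignette_ratings):
    """Place a self-rating relative to a respondent's own vignette ratings.

    vignette_ratings must be listed in the vignettes' intended order
    (lowest intended trait level first). The recoded scale runs from
    1 (below the first vignette) to 2J+1 (above the last vignette).
    Returns (low, high): equal when the placement is unambiguous, a wider
    interval when ties or order violations make it ambiguous.
    """
    z = list(vignette_ratings)
    n_vignettes = len(z)
    satisfied = []
    if self_rating < z[0]:
        satisfied.append(1)
    for j, z_j in enumerate(z, start=1):
        if self_rating == z_j:
            satisfied.append(2 * j)  # tied with vignette j
        if j < n_vignettes and z_j < self_rating < z[j]:
            satisfied.append(2 * j + 1)  # strictly between vignettes j and j+1
    if self_rating > z[-1]:
        satisfied.append(2 * n_vignettes + 1)
    return (min(satisfied), max(satisfied))


# Hypothetical respondents on a 1-5 rating scale with two vignettes.
consistent = recode_self_rating(3, [2, 4])       # vignettes rated in intended order
order_violation = recode_self_rating(3, [4, 2])  # vignettes rated in reversed order
tie = recode_self_rating(3, [3, 3])              # both vignettes rated the same
```

A respondent who rates the vignettes in the intended order gets a single recoded value, while ties and order violations widen the interval and so carry less information. This is why the slide asks for vignettes that produce few of them.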

10 Conclusions
Anchoring vignettes are still promising and can solve many problems (comparability, growth metric), but writing them is still an art
Using multiple measures unconfounds trait and method effects
Single items are sometimes useful for reporting (e.g., public opinion surveys)
  Whether the findings from a single item would be useful is a good diagnostic for whether the item should be administered (compare with cognitive items, often used in interpretation of scale scores)
Unidimensionality is highly desirable for score interpretation!
  E.g., differences between people, differences over time
Attention should be given to scale categories (category order assumption [a little vs. somewhat], interval scale property for sum scores)
High stakes and low stakes uses have pros and cons

