The New SAT Facts
November 2006
Wayne Camara & Amy Schmidt

Executive Summary
Purpose of Briefing: To provide an overview of recent research conducted on the new SAT.
Research and analysis were designed to meet three demands:
1. Provide baseline data concerning score use and other characteristics of the new SAT compared to the old SAT.
2. Respond to questions from stakeholders concerning the new SAT.
3. Develop a base of new knowledge on the Writing test and the impact of changes to the SAT.

Research to Date on the New SAT
- Score Change for PSAT/NMSQT Test Takers
- Construct Comparability and Continuity in the SAT
- Essay Reliability
- Effect of New SAT Length on Performance (Fatigue)
- Consequences of Adding Writing to K-12 Instruction
- Impact of Taking Advanced Math Courses on New SAT Math Items
- Standardized Differences on Ethnic Subgroups
- Relationship Between Essay Features and Essay Scores
- Effects of Short-Term Coaching on Writing Test Performance
- Discrepant Scores between CR and Writing

Score Change for PSAT/NMSQT (P/N) Test Takers (Oh, Wright, & Zanna, 2005)
- Analyzed score changes and repeater patterns for the P/N; results were used to develop the table of expected SAT score ranges for the P/N Score Report Plus.
- Based on test-takers who took both the 2003 and 2004 P/N, and those who took both the 2004 P/N and a spring 2005 (March, May, or June) SAT.

P/N Score Change Study Highlights
- On average, 2003 sophomores repeating the P/N as juniors improved their reading score by 3.3 points, their math score by about 4.4 points, and their writing score by about 4.1 points.
- On average, 2004 juniors taking the P/N received junior-year SAT scores that were 2.5 points higher in reading, 1.9 points higher in math, and 1.4 and 1.3 points higher in writing (MC and composite, respectively).
- The correlations between the old (2003) and new (2004) P/N scores ranged from .82 to .86 for the three subtests.
- The correlations between the new P/N and the new SAT ranged from .81 to .87.

P/N Score Change

Construct Comparability and Continuity in the SAT (Oh & Sathy, 2006)
- Study assessed whether the changes to the SAT had an impact on the constructs measured by the test.
- Results are based on factor analysis of data from a sample of students who took both the previous version and the new version of the SAT during the 2003 field trial.

Highlights of Results from Construct Comparability Study
Critical Reading
- Exploratory factor analysis revealed at least 2 distinct factors: one comprising sentence completion and analogy items, and one comprising critical reading and passage-based reading items.
- This finding suggests that construct continuity for the sentence completion and passage-based reading/critical reading item types is maintained in the new SAT.
- A 2-factor model without analogy items provided the best fit to the data.

Highlights of Results from Construct Comparability Study, Continued
Math
- Results suggested that the new math test is essentially unidimensional, as was the previous version.
- Tests of dimensionality revealed a small yet statistically reliable secondary factor related to geometry items in both the old and new SAT.

Score Equity Assessment of Transition from SAT I to the new SAT (Dorans, Cahn, Jiang, & Liu, 2006)
- Study assessed whether the changes in the College-Bound Seniors 2006 means were due to population shifts or to changes to the SAT.
- Using operational data from the first year of new SAT administration, Score Equity Assessment was used to estimate what the subgroup means would have been had the SAT not changed.

Highlights of Results from Score Equity Assessment Study, Continued
- Linkages between the new SAT and the old SAT were examined for population invariance across gender groups.
- Results suggested that the equating functions were invariant across gender groups, providing support for the comparability of scores from the old SAT to the new SAT.

Essay Reliability within the SAT Reasoning Test (Allspach & Walker, 2005)
- Study designed to estimate various forms of reliability associated with the SAT essay.
- 3,776 juniors from 35 high schools participated in the study.
- Four different essay prompts were used.
- Students wrote on two different essay prompts at two different times, about 2 weeks apart.
- Essays were read by raters trained in the same way as for operational SAT essay readings.

Essay Reliability Study
Types of reliability estimates:
- Single-rater (inter-rater) reliability: correlation between observed scores from 2 raters scoring the same essay. Represents consistency of any given rater in scoring an essay.
- Double-rater reliability: correlation between total essay scores from two pairs of raters scoring the same essay. Represents consistency in the scoring method itself when 2 raters are used.
- Observed essay reliability: correlation between examinees’ total scores on 2 different essays. Represents the proportion of true-score (writing ability) variance in the essay score.
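As a rough illustration of the single-rater estimate defined above, the sketch below computes a Pearson correlation between two raters' scores for the same essays. The scores are invented values on an assumed 1-6 per-rater scale, not data from the study, and `pearson` is a hypothetical helper name.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented scores from two raters reading the same eight essays (assumption).
rater_a = [4, 3, 5, 2, 4, 6, 3, 4]
rater_b = [4, 4, 5, 2, 3, 6, 3, 5]

single_rater_reliability = pearson(rater_a, rater_b)
print(round(single_rater_reliability, 2))
```

The same function applied to summed scores from two rater pairs, or to total scores on two different essays, would give the other two estimates.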

Highlights of Results from Essay Reliability Study
- The average single-rater reliability coefficient across the 4 prompts was approximately .79.
- The average double-rater reliability was about .88.
- The average observed essay reliability was about .67.
- 70% of scores between 6-8; 80% of scores between 6-9
- Reader agreement:
  - 56% exact
  - 96.5% +/- 1 pt
  - 3.5% > +/- 2 pts (go to third reader)
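The single-rater and double-rater figures above are roughly what the Spearman-Brown formula predicts when moving from one rater to two. The sketch below only illustrates that relationship; the source does not say the study's double-rater estimate was computed this way, and `spearman_brown` is a hypothetical helper name.

```python
def spearman_brown(reliability, k):
    """Projected reliability when the number of raters is multiplied by k."""
    return k * reliability / (1 + (k - 1) * reliability)

single_rater = 0.79  # average single-rater reliability reported above
projected_double = spearman_brown(single_rater, 2)
print(round(projected_double, 2))  # 0.88, close to the reported double-rater value
```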

Investigating the Effect of New SAT Test Length on the Performance of Regular SAT Examinees (Wang, 2006)
- Using data from the March 2005 SAT administration, the study examined test-taker performance on eight SAT sections that were presented to examinees in different orders and in different positions.
- The study looked at the average percent of items answered correctly and the average number of items omitted for different sections of the test.

If the increased length of the SAT caused test-taker fatigue, we would expect:
- The percent of items answered correctly to decrease for the later sections of the test, when the students would be feeling fatigue.
- The students’ omit rates to increase for later sections of the test.
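The two expectations above can be checked by tabulating percent correct and omit rate by section position. The sketch below uses invented response records and a hypothetical `summarize_by_position` helper; it is not the study's analysis code.

```python
from collections import defaultdict

def summarize_by_position(records):
    """Return {position: (percent_correct, omit_rate)} from item-level records.

    Each record is (section_position, outcome), where outcome is
    "correct", "incorrect", or "omit".
    """
    counts = defaultdict(lambda: {"correct": 0, "incorrect": 0, "omit": 0})
    for position, outcome in records:
        counts[position][outcome] += 1
    summary = {}
    for position, c in sorted(counts.items()):
        total = c["correct"] + c["incorrect"] + c["omit"]
        summary[position] = (100 * c["correct"] / total, 100 * c["omit"] / total)
    return summary

# Invented data: identical outcomes at an early (2) and a late (8) position.
records = (
    [(2, "correct")] * 6 + [(2, "incorrect")] * 3 + [(2, "omit")] * 1
    + [(8, "correct")] * 6 + [(8, "incorrect")] * 3 + [(8, "omit")] * 1
)
summary = summarize_by_position(records)
for position, (pct_correct, omit_rate) in summary.items():
    print(position, pct_correct, omit_rate)
```

Fatigue would show up as a lower percent correct or a higher omit rate at the later position; in this invented data the two positions are identical.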

The average percent of items correct was consistent throughout the entire test.
- The results were similar for gender, racial/ethnic, and language groups, and for different levels of ability as measured by total SAT score.

The average omit rate was NOT higher at the end of the test.
The average omit rate for the last 6 items was also NOT higher at the end of the test.

Summary of Fatigue Study Findings:
- Study conducted on the March 05 SAT and replicated on the Oct 06 administration.
- Results were also compared to the SAT I, and no changes were detected.
- On average, students got the same percent of items correct on later sections of the test as on earlier sections.
- On average, students did not omit a larger number of items on later sections of the test.
- These findings provide evidence that any fatigue that students may have felt did not impair their performance in any way.

The Impact of Taking Advanced Math Courses on Performance on the New SAT Math Items (Deng & Kobrin, 2006)
- Evaluated whether taking more advanced math courses in high school gives students an advantage on the new SAT items testing Algebra II content.
- Study analyzed new SAT field trial data.
- Computed standardized mean differences in average item performance on the old and new content across groups of students with various course-taking patterns.
- Ran DIF (differential item functioning) analyses to explore whether items functioned similarly for students of equal ability with different course-taking patterns.

Math Course-taking Study: Summary of Results
- Students who took one or more advanced courses scored higher than those who did not take any advanced course or just planned to do so.
- Students who planned to take one or more advanced courses scored higher than those who did not plan to take any advanced course.
- Items measuring the new content were more sensitive to the effects of taking advanced math courses than items that measure the old content.
- Several sub-content areas within Algebra II and Geometry had a large percentage of items showing DIF.

The Relationship Between Essay Features and Essay Scores (Kobrin, Deng, & Shaw)
This study investigated:
- the relationship between several features of SAT essay responses and essay scores
- whether essay scores are predictable from features of the prompt
- subgroup differences (racial/ethnic, gender, and language) in the frequency of essay response features and their correlation with essay scores

Essay Research Study: Phase I
- Phase I focused on essay length and scores.
- 2,820 essays were sampled from 6 different SAT forms (both east and west coast prompts) that were administered in March, May, and June of ’05.
- Examined the relationship between essay score and:
  - number of words
  - number of paragraphs
  - whether students reached the 2nd page
  - whether students wrote in first person (used the pronoun “I”)

Phase I Results: Correlation of Length with Essay and SAT-W Scores (Kobrin, Deng & Shaw, Under Review)
The range of correlations with essay scores across the six prompts was .57 to .68 for number of words and .27 to .38 for number of paragraphs.

More Phase I Results
Reaching the Second Page
- Students who reached the second page scored about 2 pts higher than those who did not.
- After controlling for number of words, this was reduced to less than one pt (.7).
Using First-Person
- About 50% of students used first-person.
- The mean score for students using first-person was 6.9 compared to 7.3 for students not using first-person.
- There was substantial variation across prompts in the use of first-person responses.
- Some prompts appeared more conducive to a first-person response than others, but the voice used appeared to have very little impact on essay score.

Effects of Short-Term Coaching on Standardized Writing Tests (Hardison & Sackett, 2006)
- Can coaching increase scores on the SAT essay?
- Does that coaching increase scores only on the specific essay, or does it also increase the test-taker’s actual writing ability that the test is intended to measure?

Methods for Short-Term Coaching Study
- Six Ph.D. students were hired to develop coaching strategies for a training program similar to those offered by test-prep companies.
- 50 first-year college students participated in a 9-hour training program (training group); 49 students did not receive training (control group).
- Both groups completed pretest and posttest essays from CLEP.
- Participants also completed two additional essays developed to mimic writing tasks that a student might encounter in a college setting.

Results of Short-Term Coaching Study
- After controlling for ability (using ACT scores), students receiving training did indeed score significantly higher on the essay.
- Coaching was particularly effective for those with lower writing performance, but actually led to a decrease in scores for high performers.
- Coaching also produced significant improvement in performance on the generalizability tests when compared to the control group.
- Results suggest that SAT essays may be susceptible to coaching, but score inflation may reflect at least some improvement in overall writing ability.

Students with discrepant CR and W scores
- Correlation between SAT CR and W is about .84.
- 100,000 students had a significant discrepancy between scores.
- Of these, 50% had a CR score that was 1 SD greater than W (63% male); 50% had a W score that was 1 SD greater than CR (63% female).
- No significant difference among students in HSGPA.
- Results by ethnicity and best language not significant:
  - Whites > CR; Asians > Writing
  - English Speakers > CR; ELL > Writing

Update on New SAT Scores: College Bound Seniors 2006

What comes next? Research planned or in progress…
- Impact of SES on SAT & College Success
- Validity Study of the SAT Reasoning Test
- Consequential Validity of SAT Writing
- New SAT/ACT Concordance
- Evaluating Formula Scoring vs Right Scoring
- Placement validity of Math and Writing tests

Comparison of 2006 College-Bound Seniors with Previous Cohorts
Sub-group | 2004 | 2005 | 2006
Gender:
Male | 47 46
Female | 53 54
Race/Ethnicity:
No Response | 19 | 10 | 9
American Indian or Alaskan Native | 1 | 1 | 1
Asian or Pacific Islander | 8 | 9 | 9
Black | 10
Mexican or Mexican American | 4 | 5 | 4
Puerto Rican | 1 | 1 | 1
Other Hispanic or Latino | 3 | 4 | 5
White | 51 56
Other | 3 | 4 | 4
Best Language:
No Response | 16 | 7 | 6
English | 74 | 82 | 83
English and Another | 7 | 8 | 8
Another Language | 2 | 3 | 3

2004, 2005, & 2006 College-Bound Seniors
Measure | 2004 | 2005 | 2006 | ’04 to ’06 Change | ’05 to ’06 Change
Highest Verbal | 515.2 | 516.2 | 511.8 | -3.4 | -4.4
Highest Math | 525.7 | 527.2 | 525.8 | 0.1 | -1.4
Highest Composite | 1040.9 | 1043.4 | 1037.6 | -3.3 | -5.8
Latest Verbal | 507.9 | 508.4 | 503.5 | -4.4 | -4.9
Latest Math | 518.0 | 519.8 | 517.8 | -0.2 | -2.0
Latest Composite | 1025.9 | 1028.2 | 1021.3 | -4.6 | -6.9
Highest Composite (single admin) | 1035.4 | 1037.8 | 1031.9 | -3.5 | -5.9

Major Changes in College-Bound Seniors Cohort Scores
- 1973: -10 pts (7V, 3M)
- 1975: -16 pts (9V, 7M)
- 1985: +8 pts (5V, 3M)
- 1990: -5 pts (4V, 1M)
- 1995: +7 pts (5V, 2M)
- 2003: +6 pts (3V, 3M)
Math has not dropped 2 pts in 1 year since 1978.
The last time Verbal dropped more than 1 pt was:
- 2002: -2 pts
- 1990: -4 pts

Retesting Patterns and Scores
Measure | 2004 | 2005 | 2006 | 04-06 Diff | 05-06 Diff
1-time test takers
N | 636,655 | 645,629 | 682,005 | 45,350 | 36,376
% | 44.9 | 43.8 | 46.5 | 1.6 | 2.7
Mean Verbal | 485.3 | 485.4 | 480.3 | -5.0 | -5.1
Mean Math | 490.3 | 490.8 | 487.6 | -2.7 | -3.2
2-time test-takers
N | 542,589 | 563,028 | 545,173 | 2,584 | -17,855
% | 38.2 37.2
Mean Verbal | 526.4 | 527.1 | 521.0 | -5.4 | -6.1
Mean Math | 535.8 | 537.9 | 535.6 | -0.2 | -2.3
3-time test-takers
N | 195,215 | 216,883 | 187,194 | -8,021 | -29,689
% | 13.8 | 14.7 | 12.8 | | -1.9
Mean Verbal | 528.0 | 527.4 | 531.1 | 3.1 | 3.7
Mean Math | 551.4 | 551.9 | 561.5 | 10.1 | 9.6

Overall Retesting Changes on SAT 2004-06
Measure | 04 | 05 | 06 | 05-04 | 06-05 | 06-04
Total Students | 1,429,007 | 1,475,623 | 1,464,744 | 46,616 (3.3%) | -10,879 (-.7%) | 35,737 (2.5%)
Total Tests* | 2,492,683 | 2,630,388 | 2,547,367 | +137,705 (5.5%) | -83,021 (-3.2%) | +54,684 (2.2%)
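The change columns in this table can be reproduced from the yearly totals. The sketch below assumes each percentage is taken relative to the earlier year, which is consistent with the reported figures; `change` is a hypothetical helper name.

```python
def change(old, new):
    """Return (absolute change, percent change relative to old, 1 decimal)."""
    diff = new - old
    return diff, round(100 * diff / old, 1)

# Yearly totals read from the table (keys are the two-digit years).
students = {"04": 1_429_007, "05": 1_475_623, "06": 1_464_744}
tests = {"04": 2_492_683, "05": 2_630_388, "06": 2_547_367}

for label, counts in (("Total Students", students), ("Total Tests", tests)):
    print(label,
          change(counts["04"], counts["05"]),   # 05-04
          change(counts["05"], counts["06"]),   # 06-05
          change(counts["04"], counts["06"]))   # 06-04
```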

CB Srs Score Changes and Subgroups
- CR -5 (males -8, females -3); largest drop since 1994
- Math -2 (males -2, females -2)
- Underrepresented minorities show overall gains
- Income < $20k: CR +2, M +1
- Non English Speaking: CR +5, M +2
- Private school: CR -11, M -4
- Non-Response rate (no change in %) but large decrease in scores
- Score gaps decrease:
  - In CR among all ethnic minorities except Other Hispanic (no change)
  - In Math for Asian, Black, and Puerto Rican subgroups
- Score decline evident in first SAT taken:
  - First SAT in 05 (CR 498, M 507.9); in 06 (CR 494.8, M 508.5)
  - 06-05 (CR -3.2, M +.6)
  - No difference in age at testing between first-time test takers (mean 17.2 yrs)
- HSGPA increases in 06: .03 increase in GPA from 05 to 06; mean is 3.33 (with 43% of students having a HSGPA > A-)

Ethnic Differences Slightly Reduced in CR (Effect Sizes)
Group | 2004 | 2005 | 2006 Critical Reading | 2006 Writing
Asian | -.01 | .03 | .06 | .14
Black | -.7 | -.66 | -.61 | -.63
Mexican Am. | -.51 | -.49 | -.43 | -.41
Puerto Rican | -.46 | -.42 | -.39 | -.45
Latin Am. | -.42 -.4 -.43
White | .18 .21 .2
Females | -.07 -.03 .1

Ethnic Differences Slightly Reduced in Math (Effect Sizes)
Group | 2004 | 2005 | 2006
Asian | .52
Black | -.8 -.77
Mexican Am. | -.53 | -.5 | -.46
Puerto Rican | -.58 | -.55 | -.54
Latin Am. | -.46 | -.44 | -.48
White | .11 | .14 | .16
Females | -.32 -.3
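The effect sizes in the two tables above are standardized differences. The source does not state the exact formula, so the sketch below assumes a common definition: the subgroup mean minus a reference mean, divided by a standard deviation. All numbers are invented, and `standardized_difference` is a hypothetical helper name.

```python
def standardized_difference(group_mean, reference_mean, sd):
    """Difference between two means expressed in standard deviation units."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (group_mean - reference_mean) / sd

# Invented values for illustration only (not from the briefing).
d = standardized_difference(group_mean=480.0, reference_mean=520.0, sd=100.0)
print(d)  # -0.4
```

Under this definition, a value moving toward zero across years (for example from -.51 to -.43) means the subgroup mean moved closer to the reference mean.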

Students who take a Core Curriculum or More Significantly Outperform Those Taking Less than a Core Curriculum
[Chart panels: Number of Students; SAT Scores]

Core vs Non-Core
Core = 4 yrs of English, 3 yrs of Math (with Algebra), 3 yrs of Science, 3 yrs of Social Studies.
Group | 2005 N (%) | 2005 CR | 2005 M | 2006 N (%) | 2006 CR | 2006 M | 06-05 N (%) | CR (06-05) | M (06-05)
Core + | 909,049 (77.3) | 522 | 530 | 903,452 (77.0) | 519 | 531 | -5,597 (-0.3) | -3 | +1
Core - | 267,278 (22.7) | 476 | 488 | 270,728 (23.0) | 470 | 483 | +3,450 (+0.3) | -6 | -5
