Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 1 All Papers for this Session are available at

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 2 Uncertainty, Prediction and Teacher Feedback using an Online System that Teaches as it Assesses Brian W. Junker Thanks to Neil Heffernan, Ken Koedinger, Mingyu Feng, Beth Ayers, Nathaniel Anozie, Zach Pardos, and many others Funding from US Department of Education, National Science Foundation (NSF), Office of Naval Research, Spencer Foundation, and the US Army

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 3 The ASSISTments Project Web-based 8th-grade mathematics tutoring system ASSIST with, and ASSESS, progress toward the Massachusetts Comprehensive Assessment System exam (MCAS) –Guide students through problem solving with released MCAS items –Predict students’ MCAS scores at end of year –Provide feedback to teachers (what to teach next?) (Generalize to other states…) Over 50 workers at Carnegie Mellon, Worcester Polytechnic Institute, Carnegie Learning, and the Worcester Public Schools

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 4 The ASSISTment Tutor Main Items: Released MCAS or “morphs”. Incorrect Main → “Scaffold” Items –“One-step” breakdowns of main task –Buggy feedback, hints on request, etc. All items coded by transfer model (Q-matrix) for knowledge components (KC’s) Student records contain responses, timing data, bugs/hints, etc. System tracks students through time, provides teacher reports per student & per class. –Predict MCAS Scores –KC Feedback: learned/not-learned, etc.
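To make the Q-matrix coding concrete, here is a minimal sketch of how an item-by-KC transfer model can be represented; the items, KC names, and entries are invented for illustration, not the actual ASSISTment coding.

```python
import numpy as np

# Hypothetical Q-matrix: rows = items, columns = knowledge components (KCs).
# A 1 means the item requires that KC; real ASSISTment transfer models
# code items against up to 106 KCs.
kcs = ["congruence", "equation-solving", "perimeter"]
Q = np.array([
    [1, 0, 0],   # main item 1 requires congruence only
    [1, 1, 0],   # main item 2 requires congruence and equation-solving
    [0, 1, 1],   # a scaffold requires equation-solving and perimeter
])

# KCs touched by item 2 (row index 1):
print([kc for kc, q in zip(kcs, Q[1]) if q == 1])
```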

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 5 This talk draws on two recent reports Predicting MCAS Scores – Summary/Review: Junker, B. W. (2006). "Using on-line tutoring records to predict end-of-year exam scores: experience with the ASSISTments project and MCAS 8th grade mathematics". To appear in Lissitz, R. W. (Ed.), Assessing and modeling cognitive development in school: intellectual growth and standard settings. Maple Grove, MN: JAM Press. KC Feedback – Some Current Progress: Anozie, N. & Junker, B. W. (2007). "Investigating the utility of a conjunctive model in Q matrix assessment using monthly student records in an online tutoring system". Paper to be presented at the Annual Meeting of the National Council on Measurement in Education, April 12, 2007, Chicago IL (K4; Thursday 8:15-10:15, Intercontinental Seville East). (These and all papers for this session are available at

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 6 Challenges: Predicting MCAS The exact content of the MCAS exam is not known until months after it is given. The ASSISTments themselves are ongoing throughout the school year as students learn (from teachers, from ASSISTment interactions, etc.).

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 7 Methods: Predicting MCAS Regression approaches [Feng et al., 2006; Anozie & Junker, 2006; Ayers & Junker, 2006/2007]: –Percent Correct on Main Questions –Percent Correct on Scaffold Questions –Rasch proficiency on Main Questions –Online metrics (efficiency and help-seeking; e.g. Campione et al., 1985; Grigorenko & Sternberg, 1998) –Both end-of-year and “month-by-month” models Bayes Net (DINA Model) approaches: –Predicting KC-coded MCAS questions from Bayes Nets (DINA model) applied to ASSISTments [Pardos et al., 2006] –Regression on number of KC’s mastered in DINA model [Anozie, 2006] HLM-style growth curve models –At the KC level [Feng, Heffernan & Koedinger, 2006] –At the total score level [Feng, Heffernan, Mani & Heffernan, 2006]

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 8 Results: Predicting MCAS
Predictors | df | CV-MAD | CV-RMSE | Remarks
PctCorrMain | | | | months, main questions only
#Skills of 77 learned (DINA) | | | | months, mains and scaffolds
Rasch Proficiency | | | | months, main questions only
PctCorrMain + 4 metrics | | | | months; 5 summaries each month
Rasch Profic + 5 metrics | | | | months, main questions only
Feng et al. (in press) estimate a best-possible MAD of about 6 points (11% of the 54-pt raw score) from split-half experiments with MCAS. Ayers & Junker (2007) reliability calculation suggests approximate bounds 1.05 ≤ MAD ≤ 6.53.

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 9 Conclusions: Predicting MCAS Tradeoff: –Greater model complexity (DINA) can help [Pardos et al., 2006; Anozie, 2006]; –Accounting for question difficulty (Rasch), plus online metrics, does as well [Ayers & Junker, 2007] Limits of what we can accomplish for prediction –MCAS reliability ≈ 0.91 –Typical ASSISTments reliability ≈ 0.81 –If ASSISTments were perfectly reliable, the approximate bound on MAD would be cut in half (3.40)

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 10 Goal: KC Feedback Providing feedback on –individual students –groups of students Current teacher report: For each skill, report percent correct on all items for which that skill is hardest. Can we do better?

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 11 Challenges: KC Feedback Different transfer models are used and expected by different stakeholders: –The MCAS itself is scaled using unidimensional IRT / Pct Correct –Description and design of the MCAS is based on Five-strand model of mathematics (Number & Operations, Algebra, Geometry, Measurement, Data Analysis & Probability) 39 “learning standards” nested within the five strands. –ASSISTment researchers have developed a transfer model involving up to 106 KC’s (WPI-106, Pardos et al., 2006) nested within the 39 learning standards Scaffolding can be designed as optimal measures of single KC’s; or as optimal tutoring aids –When more than one transfer model is involved, scaffolds fail to line up with at least one of them! Different students work through ASSISTments at different rates

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 12 Methods: KC Feedback Conjunctive binary-skills Bayes Net (Macready & Dayton, 1977; Haertel, 1989; Maris, 1999; Junker & Sijtsma DINA, 2001; etc.) [Diagram: skill nodes P(Congruence), P(Equation-Solving), P(Perimeter) feed conjunctive gates for the questions; each question’s table gives P(correct) = 1 − s_i when its gate is true and g_i when it is false.] Pardos et al. (2006): tend to prefer more KC’s for prediction; Anozie & Junker (2007): inference about individual KC’s (106 KC’s)
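As a rough illustration of the conjunctive gate tables above, the following sketch computes the DINA probability of a correct response: 1 − s_i when every required KC is mastered, g_i otherwise. All names and values are illustrative.

```python
import numpy as np

def dina_p_correct(alpha, q_row, guess, slip):
    """P(correct) for one item under the DINA model.

    alpha : 0/1 vector of KC mastery for one student
    q_row : 0/1 Q-matrix row for the item (KCs the item requires)
    guess, slip : the item's g_i and s_i parameters
    """
    gate = np.all(alpha >= q_row)   # conjunctive gate: all required KCs mastered
    return (1 - slip) if gate else guess

alpha = np.array([1, 1, 0])         # mastered KC1 and KC2, not KC3
q_row = np.array([1, 1, 0])         # item requires KC1 and KC2
print(dina_p_correct(alpha, q_row, guess=0.10, slip=0.05))   # -> 0.95
```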

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 13 Results: KC Feedback Average percent of KC’s mastered: 30-40%. A February dip reflects a recording error for main questions. Can also break this down by individual KC (next slide).

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 14 Results: KC Feedback

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 15 Results: KC Feedback Prediction based on ‘ideal response’ (P[guess] = P[slip] = 0). Split-half cross-validation accuracy: 68-73%. Enough to help teachers decide what to teach next.

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 16 Digression: Question & Transfer Model Characteristics Main Item: “Which graph contains the points in the table?” (with an X-Y table of points) Scaffolds: 1. Quadrant of (-2,-3)? 2. Quadrant of (-1,-1)? 3. Quadrant of (1,3)? 4. [Repeat main] [Figure: posterior boxplots for the guess and slip parameters of these items]

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 17 Conclusions Different transfer models for different purposes seem necessary. For unidimensional prediction, unidimensional IRT, augmented with “assistance metrics”, works well –Account for question difficulty, help-seeking behavior –We are close to best-possible prediction error A finer grained model like the DINA model is needed for individual and group diagnostics –Individual diagnosis uncertainty can be large –Group diagnosis seems good enough to help teachers decide what to teach next –Scaffold questions: teaching or assessment?

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 18 Future Work Transfer model / KC’s –Discovering and improving the transfer model? –Different transfer models for different purposes – “play together”? Experimental design to improve KC inferences Account for learning over time –Prior distributions for skills based on past performance? –Markov Learning Model for each skill? Compare with crediting/blaming hardest KC –Accuracy of inference? –Speed of computation?

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 19

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 20 RMSE and MAD bounds (Ayers & Junker, 2007): Let … Then … And …

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 21 Dynamic Models: Anozie and Junker (2006) Adding more months of data helps more than adding more metrics. The first 5 online metrics were retained for the final model(s).

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 22 Full Set of Online Metrics

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 23 Dynamic Models: Anozie and Junker (2006) Look at changing influence of online metrics on MCAS prediction over time –Compute monthly summaries of all online metrics (not just %-correct) –Build linear prediction model for each month, using all current and previous months’ summaries To enhance interpretation, variable selection –by metric, not by monthly summary –include/exclude metrics simultaneously in all monthly models
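A hedged sketch of the month-by-month modeling loop described on this slide: for each month, fit a linear prediction model on the current and all previous months’ metric summaries. The use of pandas/scikit-learn and the data layout are assumptions for illustration, not the analysis code of Anozie and Junker.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def monthly_models(summaries, mcas):
    """summaries: dict month -> DataFrame of per-student metric summaries
       mcas: Series of end-of-year raw MCAS scores, indexed by student."""
    models = {}
    months = sorted(summaries)
    for i, m in enumerate(months):
        # stack the current and all previous months' summaries side by side
        X = pd.concat([summaries[k].add_suffix(f"_{k}") for k in months[: i + 1]], axis=1)
        X, y = X.align(mcas, join="inner", axis=0)   # keep students with both X and MCAS
        models[m] = LinearRegression().fit(X, y)
    return models
```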

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 24 KC’s in DINA analysis

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 25 Results: KC Feedback Top shows posterior CI’s for one skill; middle and bottom are ‘sample sizes’. More data or consistent evidence → smaller CI. Less data, or inconsistent evidence → larger CI. Experimental Design? How many questions? Which skills? Etc.

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 26

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 27 Methods: KC Feedback Pardos et al. (2006) first tried DINA for MCAS prediction –Compared the 1-KC, 5-KC, 39-KC and 106-KC models –Found 39 KC’s did best; 106 KC’s 2nd best Anozie & Junker (2007) apply DINA with an eye toward feedback to teachers, etc.

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 28 Static Prediction Models Feng et al. (2006 & to appear): –Online testing metrics Percent correct on main/scaffold/both items “assistance score” = (errors+hints)/(number of scaffolds) Time spent on (in-)correct answers etc. –Compare paper & pencil pre/post benchmark tests Ayers and Junker (2006): –Rasch & LLTM (linear decomps of item difficulty) –Augmented with online testing metrics Pardos et al. (2006); Anozie (2006): –Binary-skills conjunctive Bayes nets –DINA models (Junker & Sijtsma, 2001; Maris, 1999; etc.)

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 29 The ASSISTment Architectures Extensible Tutor Architecture –Scalable from simple pseudo-tutors with few users to model-tracing tutors with thousands of users –Curriculum Unit: Items organized into multiple curricula; Sections within curriculum: Linear, Random, Experimental, etc. –Problem & Tutoring Strategy Units: Task organization & user interaction (e.g. main item & scaffolds, interface widgets, …); Task components mapped to multiple transfer models –Logging Unit: Fine-grained human-computer interaction trace; Abstracting/coarsening mechanisms Web-based Item Builder –Used by classroom teachers to develop content –Support for building curricula, mapping tasks to transfer models, etc. Relational Database and Network Architecture supports –User Reports (e.g., students, teachers, coaches, administrators) –Research Data Analysis See Razzaq et al. (to appear) for an overview.

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 30 Two Assessment Goals To predict end-of-year MCAS scores To provide feedback to teachers (what to teach next?) But there are some complications…

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 31 Data Tutoring tasks –493 main items –1216 scaffold items Students –912 eighth-graders in two middle schools Skills Models (Transfer Models / Q Matrices) –1 “Proficiency”: Unidimensional IRT –5 MCAS “strands”: Number/Operations, Algebra, Geometry, Measurement, Data/Probability –39 MCAS learning standards: nested in the strands –77 active skills: “WPI April 2005” (106 potential)

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 32 Static Models: Feng et al. (2006 & to appear) What is related to raw MCAS (0-54 pts)? P&P pre/post benchmark tests; Online metrics: –Pct Correct on Mains –Pct Correct on Scaffolds –Seconds Spent on Incorrect Scaffolds –Avg Number of Scaffolds per Minute –Number of Hints Plus Incorrect Main Items –etc. All annual summaries.
Predictor | Corr
P & P Tests
SEP-TEST | 0.75
MARCH-TEST | 0.41
ASSISTment Online Metrics
MAIN_PERCENT_CORRECT | 0.75
MAIN_COUNT | 0.47
TOTAL_MINUTES | 0.26
PERCENT_CORRECT | 0.76
QUESTION_COUNT | 0.20
HINT_REQUEST_COUNT |
AVG_HINT_REQUEST |
HINT_COUNT |
AVG_HINT_COUNT |
BOTTOM_OUT_HINT_COUNT |
AVG_BOTTOM_HINT |
ATTEMPT_COUNT | 0.08
AVG_ATTEMPT |
AVG_QUESTION_TIME |
AVG_ITEM_TIME | -0.39

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 33 Static Models: Feng et al. (2006 & to appear) Stepwise linear regression, Mean Abs Deviation. Within-sample MAD ≈ 5.5; Raw MCAS = 0-54, so Within-sample Pct Err = MAD/54 = 10.25% (uses Sept P&P Test).
Predictor | Coefficient
(Const) | 26.04
Sept_Test | 0.64
Pct_Correct_All | 24.21
Avg_Attempts |
Avg_Hint_Reqs | -2.28
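For concreteness, a sketch that applies the reported coefficients to one student’s annual summaries. The argument names and units are assumptions, and the Avg_Attempts term is omitted because its coefficient is not recoverable from the slide.

```python
def predict_mcas(sept_test, pct_correct_all, avg_hint_reqs):
    """Apply the reported stepwise-regression fit (0-54 raw MCAS scale).

    Units assumed: sept_test = raw September benchmark score,
    pct_correct_all = proportion correct (0-1), avg_hint_reqs = hints per item.
    The Avg_Attempts term is omitted (coefficient not recoverable above).
    """
    return (26.04
            + 0.64 * sept_test
            + 24.21 * pct_correct_all
            - 2.28 * avg_hint_reqs)

# e.g. a student scoring 20 on the September test, 50% correct overall,
# averaging 2 hint requests per item:
print(predict_mcas(sept_test=20, pct_correct_all=0.50, avg_hint_reqs=2))
```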

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 34 Static Models: Ayers & Junker (2006) Compared two IRT models on ASSISTment main questions: –Rasch model for 354 main questions –LLTM: a constrained Rasch model that decomposes main question difficulty by skills in the WPI April transfer model (77 skills) Replace “Percent Correct” with IRT proficiency score in linear predictions of MCAS
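For reference, the conventional forms of the two models being compared (standard notation; the exact parameterization in Ayers & Junker may differ):

```latex
% Rasch: student proficiency \theta_i, item difficulty b_j
P(X_{ij}=1 \mid \theta_i) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}

% LLTM: item difficulty constrained to a linear decomposition over
% Q-matrix entries q_{jk} and skill difficulty parameters \beta_k
b_j = \sum_{k} q_{jk}\,\beta_k
```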

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 35 Static Models: Ayers & Junker (2006) Rasch fits much better than LLTM –ΔBIC = -3,300 –Δdf = +277 Attributable to –Transfer model? –Linear decomp of item difficulties? Residual and difficulty plots suggest transfer model fixes.

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 36 Static Models: Ayers & Junker (2006) Focus on Rasch; predict MCAS with a linear model in proficiency (θ) and the online metrics (Y). 10-fold cross-validation vs. 54-pt raw MCAS:
Predictors | Variables | CV-MAD | CV % Error
% Corr Main | | |
θ (proficiency) | | |
θ + 5 Online Metrics | | |
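A minimal sketch of the 10-fold cross-validated MAD computation behind the table, assuming a plain linear model on the chosen predictors; the feature layout and use of scikit-learn are hypothetical, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def cv_mad(X, y, n_splits=10, seed=0):
    """10-fold cross-validated mean absolute deviation on the 0-54 MCAS scale.

    X : numpy array of predictors, e.g. columns [theta, metric_1, ..., metric_5]
    y : numpy array of raw MCAS scores (0-54)
    """
    errs = []
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        model = LinearRegression().fit(X[train], y[train])
        errs.append(np.abs(model.predict(X[test]) - y[test]))
    mad = np.mean(np.concatenate(errs))
    return mad, 100 * mad / 54   # CV-MAD and CV % error
```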

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 37 Static Models: Pardos et al. (2006) Compared nested versions of binary-skills models (coded on both ASSISTments and MCAS): g_i = 0.10 and s_i = 0.05 for all items; prior mastery probability 0.5 for all skills. Inferred skills from ASSISTments; computed expected score for a 30-item MCAS subset.
Model | Mean Absolute Deviation (MAD) | % Error (30 items)
39 MCAS standards | |
Skills (WPI Apr) | |
MCAS strands | |
Binary Skill | |
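The expected-score step can be sketched as follows: with the fixed g_i = 0.10 and s_i = 0.05 from this slide, a student’s expected score on the KC-coded 30-item MCAS subset is the sum of per-item DINA correct probabilities given the inferred mastery vector. Inputs are illustrative.

```python
import numpy as np

def expected_mcas_score(alpha, Q, guess=0.10, slip=0.05):
    """Expected number correct on a KC-coded item set under DINA.

    alpha : 0/1 vector of inferred KC mastery for one student
    Q     : items x KCs binary matrix (e.g. 30 x K for the MCAS subset)
    """
    gates = np.all(alpha >= Q, axis=1)            # per-item conjunctive gates
    p_correct = np.where(gates, 1 - slip, guess)  # 1 - s_i if mastered, g_i otherwise
    return p_correct.sum()
```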

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 38 Static Models: Anozie (2006) Focused on 77 active skills in the WPI April Model. Estimated the skill-mastery, guess (g_i), and slip (s_i) parameters using flexible priors. Predicted full raw 54-pt MCAS score as a linear function of the (expected) number of skills learned.
Months of Data | CV-MAD | CV % Err

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 39 Dynamic Prediction Models Razzaq et al. (to appear): evidence of learning over time Feng et al. (to appear): student or item covariates plus linear growth curves (a la Singer & Willett, 2003) Anozie and Junker (2006): changing influence of online metrics over time

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 40 Dynamic Models: Razzaq et al. (to appear) The ASSISTment system is sensitive to learning. Not clear what the source of learning is here…

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 41 Dynamic Models: Feng et al. (to appear) Growth-Curve Model I: Overall Learning. Growth-Curve Model II: Learning in Strands. School was a better predictor (BIC) than Class or Teacher, possibly because School demographics dominate the intercept. Sept_Test is a good predictor of baseline proficiency. Baseline and learning rates varied by Strand.
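A generic two-level growth-curve form in the Singer & Willett style, shown only to indicate the model family; the actual covariates and strand-level structure in Feng et al.’s Models I and II may differ.

```latex
% Level 1 (within student): performance of student i at time t
Y_{it} = \pi_{0i} + \pi_{1i}\,\mathrm{time}_{it} + \varepsilon_{it}

% Level 2 (between students): baseline and learning rate
\pi_{0i} = \gamma_{00} + \gamma_{01}\,\mathrm{SeptTest}_i + \gamma_{02}\,\mathrm{School}_i + u_{0i}
\pi_{1i} = \gamma_{10} + \gamma_{11}\,\mathrm{School}_i + u_{1i}
```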

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 42 Dynamic Models: Anozie and Junker (2006)

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 43 Dynamic Models: Anozie and Junker (2006) Recent main question performance dominates – proficiency?

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 44 Dynamic Models: Anozie and Junker (2006) Older performance on scaffolds similar to recent – learning?

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 45 Summary of Prediction Models
Model | Variables | CV-MAD | CV % Error | CV-RMSE
PctCorrMain | | | |
#Skills of 77 learned | 1? | | |
Rasch Proficiency | 1? | | |
PctCorrMain + 4 metrics | 35 (= 5 x 7) | | |
Rasch Profic + 5 metrics | 6? | | |
Feng et al. (in press) compute the split-half MAD of the MCAS and estimate ideal % Error ~ 11%, or MAD ~ 6 points. Ayers & Junker (2006) compute reliabilities of the ASSISTment sets seen by all students and estimate upper and lower bounds for the optimal MAD: 0.67 ≤ MAD ≤ 5.21.

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 46 New Directions We have some real evidence of learning –We are not yet modeling individual student learning Current teacher report: For each skill, report percent correct on all items for which that skill is hardest. –Can we do better? Approaches now getting underway: –Learning curve models –Knowledge-tracing models

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 47 New Directions: Cen, Koedinger & Junker (2005) Inspired by Draney, Pirolli & Wilson (1995) –Logistic regression for successful skill uses –Random intercept (baseline proficiency) –Fixed effects for skill and skill*opportunity Difficulty factor: skill but not skill*opportunity. Learning factor: skill and skill*opportunity –Part of DataShop (at …). Feng et al. (to appear) fit similar logistic growth curve models to ASSISTment items
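The learning-factors family described here is commonly written as a logistic model with a student intercept and skill and skill-by-opportunity terms; the notation below is a standard sketch, not Cen et al.’s exact specification.

```latex
% theta_i: student baseline proficiency (random intercept)
% beta_k:  skill (difficulty factor); gamma_k: learning rate per opportunity
% q_{jk}:  Q-matrix entry; T_{ik}: prior opportunities of student i on skill k
\mathrm{logit}\,P(Y_{ij}=1) = \theta_i + \sum_{k} q_{jk}\left(\beta_k + \gamma_k\,T_{ik}\right)
```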

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 48 New Directions: Knowledge Tracing Combine knowledge tracing approach of Corbett, Anderson and O’Brien (1995) with DINA model of Junker and Sijtsma (2001) Each skill represented by a two state (unlearned/learned) Markov process with absorbing state at “learned”. Can locate time during school year when each skill is learned. Work just getting underway (Jiang & Junker).
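To illustrate the two-state Markov idea, here is a standard knowledge-tracing style update for a single skill: a posterior update from the observed response, followed by a learning transition into the absorbing “learned” state. It is a generic sketch with made-up parameter values, not the Jiang & Junker implementation.

```python
def kt_update(p_learned, correct, guess, slip, p_transit):
    """One knowledge-tracing step for a single skill.

    p_learned : prior P(skill in learned state) before this response
    correct   : whether the observed response was correct
    p_transit : P(unlearned -> learned) after the practice opportunity
    """
    if correct:
        evidence = p_learned * (1 - slip)
        posterior = evidence / (evidence + (1 - p_learned) * guess)
    else:
        evidence = p_learned * slip
        posterior = evidence / (evidence + (1 - p_learned) * (1 - guess))
    # absorbing "learned" state: once learned, the skill stays learned
    return posterior + (1 - posterior) * p_transit

p = 0.3
for obs in [1, 0, 1, 1]:                     # a short response sequence
    p = kt_update(p, obs, guess=0.2, slip=0.1, p_transit=0.15)
    print(round(p, 3))
```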

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 49 Discussion ASSISTment system –Great testbed for online cognitive modeling and prediction technologies –Didn’t mention reporting and “gaming detection” technologies –Teachers positive, students impressed Ready-Fire-Aim –Important! Got system up and running, lots of user feedback & buy-in –But… e.g. lack of control over content and content rollout (content balance vs. MCAS?) –Given this, perhaps only crude methods are needed/possible for MCAS prediction?

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 50 Discussion Multiple skill codings for different purposes –Exam prediction vs. teacher feedback; state to state. Scaffolds –Dependence between scaffolds and main items –Forced-scaffolding: main right → scaffolds right –Content sometimes skills-based, sometimes tutorial We are now building some true one-skill decomps to investigate stability of skills across items Student learning over time –Clearly evidence of that! –Some experiments not shown here suggest modest but significant value-added for ASSISTments –Starting to model learning, time-to-mastery, etc.

Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 51 References
Anozie, N. (2006). Investigating the utility of a conjunctive model in Q-matrix assessment using monthly student records in an online tutoring system. Proposal submitted to the National Council on Measurement in Education 2007 Annual Meeting.
Anozie, N.O. & Junker, B. W. (2006). Predicting end-of-year accountability assessment scores from monthly student records in an online tutoring system. American Association for Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston, MA.
Ayers, E. & Junker, B.W. (2006). Do skills combine additively to predict task difficulty in eighth-grade mathematics? American Association for Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston, MA.
Ayers, E. & Junker, B. W. (2006). IRT modeling of tutor performance to predict end of year exam scores. Working paper.
Corbett, A. T., Anderson, J. R., & O'Brien, A. T. (1995). Student modeling in the ACT programming tutor. Chapter 2 in P. Nichols, S. Chipman, & R. Brennan (Eds.), Cognitively Diagnostic Assessment. Hillsdale, NJ: Erlbaum.
Draney, K. L., Pirolli, P., & Wilson, M. (1995). A measurement model for a complex cognitive skill. In P. Nichols, S. Chipman, & R. Brennan (Eds.), Cognitively Diagnostic Assessment. Hillsdale, NJ: Erlbaum.
Feng, M., Heffernan, N. T., & Koedinger, K. R. (2006). Predicting state test scores better with intelligent tutoring systems: developing metrics to measure assistance required. In Ikeda, Ashley & Chan (Eds.), Proceedings of the Eighth International Conference on Intelligent Tutoring Systems. Berlin: Springer-Verlag.
Feng, M., Heffernan, N., Mani, M., & Heffernan, C. (2006). Using mixed effects modeling to compare different grain-sized skill models. AAAI-06 Workshop on Educational Data Mining, Boston, MA.
Feng, M., Heffernan, N. T., & Koedinger, K. R. (in press). Addressing the testing challenge with a web-based E-assessment system that tutors as it assesses. Proceedings of the 15th Annual World Wide Web Conference. New York: ACM Press (anticipated).
Hao, C., Koedinger, K., & Junker, B. (2005). Automating cognitive model improvement by A* search and logistic regression. In Technical Report (WS-05-02) of the AAAI-05 Workshop on Educational Data Mining, Pittsburgh.
Junker, B.W. & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement 25.
Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika 64.
Pardos, Z. A., Heffernan, N. T., Anderson, B., & Heffernan, C. L. (2006). Using fine grained skill models to fit student performance with Bayesian networks. Workshop in Educational Data Mining held at the Eighth International Conference on Intelligent Tutoring Systems. Taiwan.
Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N.T., Koedinger, K. R., Junker, B., Ritter, S., Knight, A., Aniszczyk, C., Choksey, S., Livak, T., Mercado, E., Turner, T.E., Upalekar, R., Walonoski, J.A., Macasek, M.A., & Rasmussen, K.P. (2005). The Assistment Project: Blending assessment and assisting. In C.K. Looi, G. McCalla, B. Bredeweg, & J. Breuker (Eds.), Proceedings of the 12th Artificial Intelligence in Education. Amsterdam: IOS Press.
Razzaq, L., Feng, M., Heffernan, N. T., Koedinger, K. R., Junker, B., Nuzzo-Jones, G., Macasek, N., Rasmussen, K. P., Turner, T. E. & Walonoski, J. (to appear). A web-based authoring tool for intelligent tutors: blending assessment and instructional assistance. In Nedjah, N., et al. (Eds.), Intelligent Educational Machines, Intelligent Systems Engineering Book Series (see …).
Singer, J. D. & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Occurrence. New York: Oxford University Press.
Websites: