Worcester Polytechnic Institute
Towards Assessing Students' Fine Grained Knowledge: Using an Intelligent Tutor for Assessing
Mingyu Feng, August 18th, 2009
Ph.D. Dissertation Committee: Prof. Neil T. Heffernan (WPI), Prof. Carolina Ruiz (WPI), Prof. Joseph E. Beck (WPI), Prof. Kenneth R. Koedinger (CMU)
Motivation – the need
- Concerns about poor student performance on new state tests
  - High-stakes standards-based tests are required by the No Child Left Behind (NCLB) Act
  - Student performance is not satisfactory (Massachusetts, 2003: 20% failed 10th grade math on the first try; Worcester)
- Secondary teachers are asked to be data-driven
  - MCAS test reports
  - Formative assessment and practice tests, provided by the Northwest Evaluation Association, Measured Progress, Pearson Assessments, etc.
Motivation – problem I: Formative assessment takes time away from instruction
- NCLB or NCLU (No Child Left Untested)?
- Every hour spent assessing students is an hour lost from instruction
- Limited classroom time compels teachers to make a choice
Motivation – problem II: Performance reports are not satisfactory
- Teachers want more frequent and more detailed reports
Confrey, J., Valenzuela, A., & Ortiz, A. (2002). Recommendation to the Texas State Board of Education on the Setting of TAKS Standards: A Call to Responsible Action. http://www.syrce.org/State_Board.htm
Main Contributions
- Improved assessment by taking into account how much assistance students need (WWW'06; ITS'06; EDM'08; UMUAI Journal'09, nominated for the James Chen award)
- Established a way to track and predict performance longitudinally over multiple years (WWW'06; EDM'08)
- Rigorously evaluated the effectiveness of skill models of various granularities (AAAI'06 EDM Workshop; TICL'07; IEEE Journal'09)
- Used a data mining approach to evaluate the effectiveness of individual tutoring content (AIED'09)
- Used data mining to refine existing skill models (EDM'09; in preparation)
- Developed an online reporting system deployed and used by real teachers (AIED'05; Book Chapter'07; TICL Journal'06; JILR Journal'07)
Roadmap
- Motivation
- Contributions
- Background: ASSISTments
- Using a tutoring system as an assessor
  - Dynamic assessment
  - Longitudinal modeling
  - Cognitive diagnostic modeling
- Conclusion & general implications
ASSISTments System
A web-based tutoring system that assists students in learning mathematics and gives teachers assessment of their students' progress.
- Teachers like ASSISTments
- Students like ASSISTments
An ASSISTment
We break multi-step items (the original question) into scaffolding questions.
- Attempt: a student takes an action to answer a question
- Response: the correctness of the student's answer (1/0)
- Hint messages: given on demand; suggest what step to do next
- Buggy message: a context-sensitive feedback message triggered by an anticipated wrong answer
- Skill: a piece of knowledge required to answer a question
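For concreteness, a minimal sketch of how one might represent this structure in code; the class and field names are hypothetical, not the actual ASSISTments schema:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ScaffoldingQuestion:
    skill: str                 # the single skill this scaffold assesses
    hints: List[str]           # on-demand hint messages; the last is the bottom-out hint
    buggy_messages: List[str]  # context-sensitive feedback for anticipated wrong answers

@dataclass
class Assistment:
    original_question: str                # the multi-step item
    scaffolds: List[ScaffoldingQuestion]  # one scaffold per step

@dataclass
class Attempt:
    student_id: str
    question_id: str
    response: int        # correctness of the student's answer (1/0)
    hint_requests: int
    seconds: float
```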
Facts about ASSISTments
- 5000+ students have used the system regularly
- More than 10 million data records collected
- Other features: learning experiments, authoring tools, account and class management toolkit, …
- The dissertation uses data from about 1000 students who used ASSISTments during 2004–2006
AIED'05: Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N.T., Koedinger, K.R., Junker, B., Ritter, S., Knight, A., Aniszczyk, C., Choksey, S., Livak, T., Mercado, E., Turner, T.E., Upalekar, R., Walonoski, J.A., Macasek, M.A., & Rasmussen, K.P. (2005). The Assistment Project: Blending Assessment and Assisting. In C.K. Looi, G. McCalla, B. Bredeweg, & J. Breuker (Eds.), Proceedings of the 12th International Conference on Artificial Intelligence in Education, pp. 555-562. Amsterdam: IOS Press.
Book Chapter: Razzaq, L., Feng, M., Heffernan, N., Koedinger, K., Nuzzo-Jones, G., Junker, B., Macasek, M., Rasmussen, K., Turner, T., & Walonoski, J. (2007). Blending Assessment and Instructional Assistance. In Nedjah, Mourelle, Borges, & Almeida (Eds.), Intelligent Educational Machines, Intelligent Systems Engineering Book Series, pp. 23-49. Springer Berlin / Heidelberg.
Roadmap
- Motivation
- Contributions
- Background: ASSISTments
- Using a tutoring system as an assessor
  - Dynamic assessment
  - Longitudinal modeling
  - Cognitive diagnostic modeling
- Conclusion & general implications
A Grade Book Report
Where does this score come from?
JILR Journal: Feng, M., & Heffernan, N. (2007). Towards Live Informing and Automatic Analyzing of Student Learning: Reporting in the Assistment System. Journal of Interactive Learning Research, 18(2), pp. 207-230. Chesapeake, VA: AACE.
TICL Journal: Feng, M., & Heffernan, N.T. (2006). Informing Teachers Live about Student Learning: Reporting in the Assistment System. Technology, Instruction, Cognition, and Learning Journal, Vol. 3. Old City Publishing, Philadelphia, PA.
Automated Assessment
Big idea: use the data collected while a student works in ASSISTments to assess that student.
- Lots of types of data available (the last screen just used % correct on original questions)
- Lots of other possible measures
- Why should we be more complicated?
A Grade Book Report – limitations, and the responses to them
- Static – does not distinguish "Tom" and "Jack" → dynamic assessment
- Average – ignores development over time → longitudinal modeling
- Uninformative – not informative for classroom instruction → cognitive diagnostic assessment
Dynamic Assessment – the idea
Dynamic testing began before computerized testing (Brown, Bryant, & Campione, 1983).
Brown, A. L., Bryant, N.R., & Campione, J. C. (1983). Preschool children's learning and transfer of matrices problems: Potential for improvement. Paper presented at the Society for Research in Child Development meetings, Detroit.
Dynamic vs. Static Assessment
Developing dynamic testing metrics:
- # attempts
- # minutes to come up with an answer; # minutes to complete an ASSISTment
- # hint requests; # hint-before-attempt requests; # bottom-out hints
- % correct on scaffolds
- # problems solved
"Static" measure: correct/wrong on original questions
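As an illustration, a sketch of deriving such per-student metrics from attempt-level logs with pandas; the file and column names are hypothetical:

```python
import pandas as pd

# Hypothetical log format: one row per attempt, with columns
# student, item, response (1/0), hints, bottom_out, seconds, is_original
logs = pd.read_csv("assistment_logs.csv")

metrics = logs.groupby("student").agg(
    total_attempts=("response", "size"),
    avg_seconds_per_question=("seconds", "mean"),
    hint_requests=("hints", "sum"),
    bottom_out_hints=("bottom_out", "sum"),
    problems_solved=("item", "nunique"),
)
# % correct on scaffolds needs a filter first:
metrics["scaffold_pct_correct"] = (
    logs[~logs["is_original"]].groupby("student")["response"].mean()
)
```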
Dynamic Assessment – data
2004-2005 data:
- Sept. 2004 – May 2005; 391 students
- Online data: 267 minutes (sd = 79); 9 days; 147 items (sd = 60)
- 8th grade MCAS scores (May 2005)
2005-2006 data:
- Sept. 2005 – May 2006; 616 students
- Online data: 196 minutes (sd = 76); 6 days; 88 items (sd = 42)
- 8th grade MCAS scores (May 2006)
Dynamic Assessment – modeling
Three linear stepwise regression models predicting the MCAS score:
- The standard test model: 1-parameter IRT proficiency estimate (one-parameter item response theory model)
- The assistance model: all online metrics
- The mixed model: 1-parameter IRT proficiency estimate + all online metrics
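A minimal sketch of how the three models could be built with forward stepwise selection, using BIC as the entry criterion; the dissertation's exact stepwise procedure and variable names may differ:

```python
import numpy as np
import statsmodels.api as sm

def forward_stepwise(X, y):
    """Greedy forward selection: repeatedly add the predictor that
    lowers BIC the most; stop when no addition helps."""
    remaining, chosen, best_bic = list(X.columns), [], np.inf
    while remaining:
        bic, col = min((sm.OLS(y, sm.add_constant(X[chosen + [c]])).fit().bic, c)
                       for c in remaining)
        if bic >= best_bic:
            break
        best_bic = bic
        chosen.append(col)
        remaining.remove(col)
    return sm.OLS(y, sm.add_constant(X[chosen])).fit()

# The three models of the slide (column names hypothetical):
# standard_test = forward_stepwise(data[["irt_proficiency"]], mcas)
# assistance    = forward_stepwise(data[online_metric_cols], mcas)
# mixed         = forward_stepwise(data[["irt_proficiency"] + online_metric_cols], mcas)
```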
Dynamic Assessment – evaluation
Bayesian Information Criterion (BIC):
- Widely used model selection criterion
- Guards against overfitting by adding a penalty term for the number of parameters
- Formula: BIC = k·ln(n) − 2·ln(L̂), where k is the number of parameters, n the number of observations, and L̂ the maximized likelihood
- Prefer the model with lower BIC
Mean Absolute Deviation (MAD):
- Cross-validated prediction error: MAD = (1/n) Σ |predicted − actual|
- Prefer the model with lower MAD
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111-163.
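For reference, minimal implementations of the two gauges: the linear-Gaussian form of BIC, and MAD computed on cross-validated predictions. This is an illustrative sketch, not the dissertation's evaluation code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

def bic_linear(n, k, rss):
    """BIC = k*ln(n) - 2*ln(max likelihood); for a Gaussian linear model this
    reduces to n*ln(rss/n) + k*ln(n) up to a constant. Lower is better."""
    return n * np.log(rss / n) + k * np.log(n)

def cross_validated_mad(X, y, folds=5):
    """Mean absolute deviation of out-of-fold predictions. Lower is better."""
    preds = cross_val_predict(LinearRegression(), X, y, cv=folds)
    return float(np.mean(np.abs(preds - y)))
```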
Dynamic Assessment – results

Model                     MAD    BIC    Correlation with 2005 8th grade MCAS
The standard test model   6.40   -295   0.733
The assistance model      5.46   -402   0.821
The mixed model           5.04   -450   0.841

The assistance model beats the standard test model (p = 0.001), and the mixed model improves further (p = 0.001).
Dynamic Assessment – what variables are important?
Dynamic Assessment – robustness
- See whether the model generalizes
- Test the model on the other year's data
Compare Models from Two Years
Which metrics are stable across years?

Metric                     2004-2005   2005-2006
(Constant)                 32.41       43.284
IRT_Proficiency_Estimate   26.83       2.944
Scaffold_Percent_Correct   20.427      21.327
Avg_Question_Time          -0.17       -0.102
Avg_Attempt                -10.5
Avg_Hint_Request                       -3.217
Question_Count                         0.072
Avg_Item_Time                          0.045
Total_Attempt                          -0.044
Dynamic Assessment – conclusion
- ASSISTments data enables us to assess students more accurately
- The relative success of the assistance model over the standard test model highlights the power of the dynamic measures
Feng, M., Heffernan, N.T., & Koedinger, K.R. (2006a). Addressing the Testing Challenge with a Web-Based E-Assessment System that Tutors as it Assesses. In Proceedings of the 15th International World Wide Web Conference, pp. 307-316. New York, NY: ACM Press. Best Student Paper Nominee.
Feng, M., Heffernan, N.T., & Koedinger, K.R. (2009). Addressing the assessment challenge in an online system that tutors as it assesses. User Modeling and User-Adapted Interaction: The Journal of Personalization Research (UMUAI), 19(3).
Roadmap
- Motivation
- Contributions
- Background: ASSISTments
- Using a tutoring system as an assessor
  - Dynamic assessment
  - Longitudinal modeling
  - Cognitive diagnostic modeling
- Conclusion & general implications
Can we have our cake and eat it, too?
Most large standardized tests are unidimensional or low-dimensional. Yet teachers need fine-grained diagnostic reports (Militello, Sireci, & Schweid, 2008; Wylie & Ciofalo, 2008; Stiggins, 2005). Can we have our cake and eat it, too?
Militello, M., Sireci, S., & Schweid, J. (2008). Intent, purpose, and fit: An examination of formative assessment systems in school districts. Paper presented at the American Educational Research Association, New York City, NY.
Wylie, E. C., & Ciofalo, J. (2008). Supporting teachers' use of individual diagnostic items. Teachers College Record. Retrieved October 13, 2008, from http://www.tcrecord.org/PrintContent.asp?ContentID=15363
Stiggins, R. (2005). From formative assessment to assessment FOR learning: A path to success in standards-based schools. Phi Delta Kappan, 87(4), 324-328.
Cognitive Diagnostic Assessment
- McCalla & Greer (1994) pointed out that the ability to represent and reason about knowledge at various levels of detail is important for robust tutoring.
- Gierl, Wang, & Zhou (2008) proposed that one direction for future research is to better understand how to select an appropriate grain size or level of analysis.
- Can we use MCAS test results to help select the right grain-sized model from a series of models of different granularities?
McCalla, G. I., & Greer, J. E. (1994). Granularity-based reasoning and belief revision in student models. In Greer, J. E., & McCalla, G. I. (Eds.), Student Modeling: The Key to Individualized Knowledge-Based Instruction, pp. 39-62. Springer-Verlag, Berlin.
Gierl, M.J., Wang, C., & Zhou, J. (2008). Using the attribute hierarchy method to make diagnostic inferences about examinees' cognitive skills in Algebra on the SAT. Journal of Technology, Learning, and Assessment, 6(6).
Building Skill Models
(A hierarchy diagram relating skill models of four granularities:)
- WPI-1: Math, as a single skill
- WPI-5: the five MCAS strands: Patterns, Relations, and Algebra; Geometry; Measurement; Number Sense and Operations; Data Analysis, Statistics and Probability
- WPI-39: finer skills, e.g., using-measurement-formulas-and-techniques, setting-up-and-solving-equation, understanding-pattern, understanding-data-presentation-techniques, understanding-and-applying-congruence-and-similarity, converting-from-one-measure-to-another, understanding-number-representations, …
- WPI-78: the finest grain, e.g., ordering-fractions, equation-solving, equation-concept, inducing-function, plot-graph, xy-graph, congruence, similar-triangles, perimeter, area, circle-graph, unit-conversion, equivalent-fractions-decimals-percents, …
Cognitive Diagnostic Assessment – data
2004-2005 data: Sept. 2004 – May 2005; 447 students; online data: 7.3 days, 87 items (sd = 35); item-level responses from the 8th grade MCAS test (May 2005)
2005-2006 data: Sept. 2005 – May 2006; 474 students; online data: 5 days, 51 items (sd = 24); item-level responses from the 8th grade MCAS test (May 2006)
All online and MCAS items have been tagged with all four skill models.
Cognitive Diagnostic Assessment – modeling
Fit a mixed-effects logistic regression model, a longitudinal model (e.g., Singer & Willett, 2003), then predict the MCAS score:
- Extrapolate the fitted model in time to the month of the MCAS test
- Obtain the probability of getting each MCAS question correct, based upon the skill tagging of the MCAS item
- Sum up the probabilities to get the total score
Where:
- X_ijkt is the 0/1 response of student i on question j tapping skill k in month t
- t is the elapsed month in the study: 0 for September, 1 for October, and so on
- β_0k and β_1k: fixed effects for the baseline and rate of change in the probability of correctly answering a question tapping skill k
- β_00 and β_10: the group-average incoming knowledge level and rate of change
- β_0 and β_1: the student's own baseline level of achievement and rate of change
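The slide's equation image did not survive extraction; a plausible reconstruction from the variable definitions above (a sketch, not necessarily the exact form used in the dissertation) is:

```latex
\operatorname{logit} P(X_{ijkt} = 1)
  = (\beta_{00} + \beta_{0k} + \beta_{0})
  + (\beta_{10} + \beta_{1k} + \beta_{1})\, t
```

with the student effects β_0, β_1 treated as random and the skill effects β_0k, β_1k as fixed.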
How do I Evaluate Models? (04-05 data)

Student   Real MCAS   WPI-1   WPI-5   WPI-39   WPI-78
Mary      25.00       23.31   22.85   22.18    20.47
Tom       32.00       29.66   29.15   28.67    27.13
…
Sue       29.00       28.46   28.23   27.85    26.26
Dick      28.00       27.41   26.70   26.12    24.30
Harry     22.00       23.33   22.58   22.02    20.14

The absolute differences |real − predicted| per student (e.g., Mary: 1.69, 2.15, 2.82, 4.53) are averaged into:
MAD       4.42    4.37    4.22    4.11
%Error    13.00%  12.85%  12.41%  12.09%
Models are compared with a paired two-sample t-test.
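To illustrate the prediction procedure, a minimal sketch (hypothetical function and data names, not the dissertation's actual code) that sums per-item success probabilities into an expected MCAS score:

```python
import numpy as np

def predict_mcas_score(p_correct, item_skills, test_month):
    """Expected total score: sum of P(correct) over the MCAS items,
    where p_correct(skill, month) comes from the fitted longitudinal
    model extrapolated to the month of the test."""
    return sum(p_correct(skill, test_month) for skill in item_skills.values())

# mad = np.mean(np.abs(predicted_scores - real_scores))  # the evaluation gauge
```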
Comparing Models of Different Granularities

04-05 data   WPI-1    WPI-5    WPI-39   WPI-78
MAD          4.42     4.37     4.22     4.11
%Error       13.00%   12.85%   12.41%   12.09%

05-06 data   WPI-1    WPI-5    WPI-39   WPI-78
MAD          6.58     6.51     4.83     4.99
%Error       19.37%   19.14%   15.10%   14.70%

Paired comparisons on 04-05 data: p = 0.21, p < 0.001, p = 0.006; on 05-06 data: p < 0.001, p = 0.03. For reference, the 1-parameter IRT model scores MAD 4.67 / %Error 13.70% and MAD 4.36 / %Error 12.83% (p = 0.10).
The Effect of Scaffolding – hypothesis
- Using only original questions makes it hard to decide which skill to "blame"
- Scaffolding questions aid diagnosis by directly assessing a single skill
Hypotheses:
- Using responses to scaffolding questions will improve prediction accuracy
- Scaffolding questions are more useful for fine-grained models
The Effect of Scaffolding – results (%Error)

04-05 data   Only original questions   Original + scaffolding questions
WPI-1        14.91%                    13.00%
WPI-5        14.06%                    12.85%
WPI-39       15.29%                    12.41%
WPI-78       17.75%                    12.09%

05-06 data   Only original questions   Original + scaffolding questions
WPI-1        20.05%                    19.37%
WPI-5        19.88%                    19.14%
WPI-39       18.68%                    15.10%
WPI-78       16.91%                    14.70%
Cognitive Diagnostic Assessment – usage
Results are presented in a nested structure of different granularities to serve a variety of stakeholders.
Cognitive Diagnostic Assessment – conclusion
- Fine-grained models do the best job of estimating student skill level overall
  - Not necessarily the best for all consumers (e.g., principals)
  - Need the ability to diagnose (e.g., scaffolding questions)
- Scaffolding questions help improve overall prediction accuracy and are more useful for fine-grained models
Feng, M., Heffernan, N.T., Mani, M., & Heffernan, C. (2006). Using Mixed-Effects Modeling to Compare Different Grain-Sized Skill Models. In Beck, J., Aimeur, E., & Barnes, T. (Eds.), Educational Data Mining: Papers from the AAAI Workshop, pp. 57-66. Menlo Park, CA: AAAI Press.
Feng, M., Heffernan, N., Heffernan, C., & Mani, M. (2009). Using mixed-effects modeling to analyze different grain-sized skill models. IEEE Transactions on Learning Technologies, Special Issue on Real-World Applications of Intelligent Tutoring Systems. (Featured article of the issue)
Pardos, Z., Feng, M., Heffernan, N.T., & Heffernan-Lindquist, C. (2007). Analyzing fine-grained skill models using Bayesian and mixed-effects methods. In Luckin & Koedinger (Eds.), Proceedings of the 13th Conference on Artificial Intelligence in Education, pp. 626-628. Amsterdam, Netherlands: IOS Press.
Future Work – Skill Model Refinement
- We found that WPI-78 predicts a state test better than less fine-grained models
- However, WPI-78 may contain mis-taggings
  - Expert-built models are subject to the risk of "expert blind spot"
  - Ours was a best guess from a 7-hour coding session
- A best-guess model should be iteratively tested and refined
Skill Model Refinement – approaches
- Having human experts manually update hand-crafted models means reviewing (1,000+ items) × (100+ skills): not practical to do often
- Data mining can help, by flagging:
  - Skills or items with high residuals
  - Skills consistently over-predicted or under-predicted
  - "Un-learned" skills (i.e., negative slopes from mixed-effects models)
Feng, M., Heffernan, N., Beck, J., & Koedinger, K. (2008). Can we predict which groups of questions students will learn from? In Beck & Baker (Eds.), Proceedings of the 1st International Conference on Educational Data Mining. Montreal, 2008.
Skill Model Refinement – searching for better models automatically
Learning Factor Analysis (LFA) (Koedinger & Junker, 1999), a semi-automated method with three parts:
- Difficulty factors associated with problems: humans identify difficulty factors through task analysis
- A combinatorial search space formed by applying operators (add, split, merge) to the base model: automated methods search for better models based upon the factors
- A statistical model that evaluates how well a candidate model fits the data
Can we increase the efficiency of LFA?
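A toy sketch of the LFA "split" step under stated assumptions: candidate models are logistic regressions over 0/1 skill and factor indicator columns and are scored by BIC. The real LFA search also applies the add and merge operators and uses its own statistical model:

```python
import numpy as np
import statsmodels.api as sm

def bic_logit(X, y):
    """Fit a logistic model of response correctness and return its BIC."""
    res = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    return len(res.params) * np.log(len(y)) - 2 * res.llf

def try_splits(X, y, skill, factors):
    """LFA 'split' operator: cross one skill with each candidate difficulty
    factor, keep whichever variant lowers BIC versus the base model."""
    best_factor, best_bic = None, bic_logit(X, y)
    for f in factors:
        Xf = X.copy()
        Xf[skill + "_x_" + f] = Xf[skill] * Xf[f]  # the skill split by the factor
        b = bic_logit(Xf, y)
        if b < best_bic:
            best_factor, best_bic = f, b
    return best_factor, best_bic
```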
Suggesting Difficulty Factors
- Some items in a random sequence cause significantly less learning than others
- Hypothesis: problems that "don't help" students learn might be teaching a different skill(s)
- Create factor tables, e.g.:

Skill         Factor
Circle-area   High
Circle-area   High
Circle-area   High
Circle-area   Low

- Preliminary results show some validity
Feng, M., Heffernan, N., & Beck, J. (2009). Using learning decomposition to analyze instructional effectiveness in the ASSISTment system. In Dimitrova, Mizoguchi, du Boulay, & Graesser (Eds.), Proceedings of the 14th International Conference on Artificial Intelligence in Education (AIED-2009). Amsterdam, Netherlands: IOS Press. Brighton, UK.
Roadmap
- Motivation
- Contributions
- Background: ASSISTments
- Using a tutoring system as an assessor
  - Dynamic assessment
  - Longitudinal modeling
  - Cognitive diagnostic modeling
- Conclusion & general implications
Conclusion of the Dissertation
The dissertation establishes novel methods to better assess students in tutoring systems:
- Assess students better by analyzing their learning behaviors while they use the tutor
- Assess students longitudinally by tracking learning over time
- Assess students diagnostically by modeling fine-grained skills
Comments from the Education Secretary
Secretary of Education Arne Duncan weighed in (in February 2009) on the NCLB Act and called for continuous assessment: "Duncan says he is concerned about overtesting but he thinks states could solve the problem by developing better tests. He also wants to help them develop better data management systems that help teachers track individual student progress. 'If you have great assessments and real-time data for teachers and parents that say these are [the student's] strengths and weaknesses, that's a real healthy thing,' he says."
Ramírez, E., & Clark, K. (Feb. 2009). What Arne Duncan Thinks of No Child Left Behind: The new education secretary talks about the controversial law and financial aid forms. (Electronic version) Retrieved March 8, 2009, from http://www.usnews.com/articles/education/2009/02/05/what-arne-duncan-thinks-of-no-child-left-behind.html
General Implications
- Continuous assessment systems are possible to build (we built one)
- Save classroom instruction time by assessing students during tutoring
- Track individual progress and help stakeholders get student performance information
- Provide teachers with fine-grained, cognitively diagnostic feedback so they can be "data-driven"
A metaphor for this shift
Businesses don't close down periodically to take inventory of stock any more: bar codes and automatic checkout let business continue non-stop while yielding richer information.
Pellegrino, J. W., Chudowsky, N., & Glaser, R. (Eds.) (2001). Knowing What Students Know. Committee on the Foundations of Assessment, Board on Testing and Assessment, Center for Education, National Research Council. (p. 284)
Acknowledgements
My advisor Neil Heffernan; committee members Ken Koedinger, Carolina Ruiz, and Joe Beck; the ASSISTment team; my family; and many more…
Thanks! Questions?
Backup slides
Motivation – problem III: The "moving target" problem
- Testing and instruction have been separate fields of research with their own goals
- Psychometric theory assumes a fixed target for measurement
- An ITS wants student ability to "move"
More Contributions
- Working systems: www.ASSISTment.org
- A reporting system that gives cognitive diagnostic reports to teachers in a timely fashion
- An easy approach to detecting the effectiveness of individual tutoring content
AIED'05: Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N.T., Koedinger, K.R., Junker, B., Ritter, S., Knight, A., Aniszczyk, C., Choksey, S., Livak, T., Mercado, E., Turner, T.E., Upalekar, R., Walonoski, J.A., Macasek, M.A., & Rasmussen, K.P. (2005). The Assistment Project: Blending Assessment and Assisting. In C.K. Looi, G. McCalla, B. Bredeweg, & J. Breuker (Eds.), Proceedings of the 12th International Conference on Artificial Intelligence in Education, pp. 555-562. Amsterdam: IOS Press.
Book Chapter: Razzaq, L., Feng, M., Heffernan, N., Koedinger, K., Nuzzo-Jones, G., Junker, B., Macasek, M., Rasmussen, K., Turner, T., & Walonoski, J. (2007). Blending Assessment and Instructional Assistance. In Nedjah, Mourelle, Borges, & Almeida (Eds.), Intelligent Educational Machines, Intelligent Systems Engineering Book Series, pp. 23-49. Springer Berlin / Heidelberg.
JILR Journal: Feng, M., & Heffernan, N. (2007). Towards Live Informing and Automatic Analyzing of Student Learning: Reporting in the Assistment System. Journal of Interactive Learning Research, 18(2), pp. 207-230. Chesapeake, VA: AACE.
TICL Journal: Feng, M., & Heffernan, N.T. (2006). Informing Teachers Live about Student Learning: Reporting in the Assistment System. Technology, Instruction, Cognition, and Learning Journal, Vol. 3. Old City Publishing, Philadelphia, PA.
AIED'09: Feng, M., Heffernan, N.T., & Beck, J. (2009). Using learning decomposition to analyze instructional effectiveness in the ASSISTment system. In Dimitrova, Mizoguchi, du Boulay, & Graesser (Eds.), Proceedings of the 14th International Conference on Artificial Intelligence in Education (AIED-2009), pp. 523-530. Amsterdam, Netherlands: IOS Press.
Evidence
(Report screenshot: 62%, 50%, 37%)
Evidence
1. Congruence
2. Perimeter
3. Equation-Solving
Terminology
- MCAS
- Item/question/problem
- Response
- Original question
- Scaffolding question
- Hint message
- Bottom-out hint
- Buggy message
- Attempt
- Skill/knowledge component
- Skill model/cognitive model/Q-matrix
- Single-mapping model
- Multi-mapping model
The reporting system
I developed the first reporting system for ASSISTments in 2004; it is online, live, and gives detailed feedback at a grain size suitable for guiding instruction.
The grade book
"It's spooky; he's watching everything we do." – a student
Identifying difficult steps
Informing hard skills
Linear Regression Model
- An approach to modeling the relationship between a dependent variable (y) and one or more independent variables (X); y depends linearly on X
- How does linear regression work? By minimizing the sum of squared residuals
- (Example of linear regression with one independent variable)
- Stepwise regression: forward, backward, or a combination
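A two-line worked example of the sum-of-squares minimization, on illustrative data:

```python
import numpy as np

# One predictor plus an intercept; minimize ||y - Xb||^2 in closed form.
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # ~[0.15, 1.94]: fitted intercept and slope
```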
1-Parameter IRT Model
Item response theory (IRT) relates the probability of an examinee's response to a test item to an underlying ability through a logistic function. The 1-PL IRT model:
P(X_ni = 1) = exp(β_n − δ_i) / (1 + exp(β_n − δ_i))
where β_n is the ability of person n and δ_i is the difficulty of item i. I used BILOG-MG to fit the model and obtain estimates of student ability and item difficulty.
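The dissertation used BILOG-MG; purely as an illustration of the 1-PL model, here is a sketch that computes the response probability and recovers an ability estimate by maximum likelihood when item difficulties are known:

```python
import numpy as np
from scipy.optimize import brentq

def p_correct(beta, delta):
    """1-PL (Rasch): P(X=1) = exp(beta - delta) / (1 + exp(beta - delta))."""
    return 1.0 / (1.0 + np.exp(-(beta - delta)))

def estimate_ability(responses, difficulties):
    """MLE of ability beta given known difficulties: the likelihood equation
    sets the expected raw score equal to the observed raw score."""
    observed = sum(responses)
    def score_gap(beta):
        return sum(p_correct(beta, d) for d in difficulties) - observed
    return brentq(score_gap, -8.0, 8.0)  # assumes 0 < observed < number of items

# estimate_ability([1, 1, 0, 1], [-0.5, 0.2, 1.1, 0.0])  # -> about 1.4
```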
Dynamic assessment – the models
(Slides showing the fitted models)
Dynamic assessment – validation
Longitudinal Modeling – data
Average % correct on original questions over time (fake data, for illustration). What does our real data look like?
Longitudinal Modeling – methodology
What do we get from (linear) mixed-effects models?
- The average population trajectory for the specified group, described by two parameters, an intercept γ_00 and a slope γ_10; the average estimated score for the group at time j is γ_00 + γ_10·t_j
- One trajectory for every single student: each student gets two parameters, an intercept deviation ζ_0i and a slope deviation ζ_1i, to vary from the group average; the estimated score for student i at time j is (γ_00 + ζ_0i) + (γ_10 + ζ_1i)·t_j
Singer, J. D., & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Occurrence. Oxford University Press, New York.
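A sketch of fitting such a random-intercept, random-slope model in Python with statsmodels; the dissertation's analyses used other tools, and the file and column names here are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per student per month,
# with columns student, month, score
data = pd.read_csv("monthly_scores.csv")

# Fixed effects give the group trajectory; re_formula="~month" adds a
# random intercept and a random slope for every student.
model = smf.mixedlm("score ~ month", data,
                    groups=data["student"], re_formula="~month")
result = model.fit()
print(result.params)  # group intercept and slope, plus variance components
```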
Longitudinal Modeling – results
BIC: Bayesian Information Criterion (the lower, the better)
Feng, M., Heffernan, N.T., & Koedinger, K.R. (2006a). Addressing the Testing Challenge with a Web-Based E-Assessment System that Tutors as it Assesses. In Proceedings of the 15th International World Wide Web Conference, pp. 307-316. New York, NY: ACM Press. Best Student Paper Nominee.
Feng, M., Heffernan, N.T., & Koedinger, K.R. (2006b). Predicting State Test Scores Better with Intelligent Tutoring Systems: Developing Metrics to Measure Assistance Required. In Ikeda, Ashley, & Chan (Eds.), Proceedings of the 8th International Conference on Intelligent Tutoring Systems, pp. 31-40. Berlin: Springer-Verlag.
Mixed-Effects Models
- Individuals in the population are assumed to have their own subject-specific mean response trajectories over time
- The mean response is modeled as a combination of population characteristics (fixed effects) and subject-specific effects unique to a particular individual (random effects)
- It is possible to predict how individual response trajectories change over time
- Flexible in accommodating imbalance in longitudinal data
- Methodological requirements: 1) three or more waves of data; 2) an outcome (dependent) variable whose values change systematically over time; 3) a sensible metric for time, the fundamental predictor in a longitudinal study
Sample longitudinal data
Comparison of Approaches
Ayers & Junker (2006):
- Estimate student proficiency using a 1-PL IRT model
- LLTM (linear logistic test model): main-question difficulty decomposed into K skills
- The 1-PL IRT model fits dramatically better
- Only main questions used; additive, non-temporal; fit with WinBUGS
Comparison of Approaches
Pardos et al. (2006):
- Conjunctive Bayes nets
- Non-temporal; scaffolding used
- Bayes Net Toolbox (Murphy, 2001)
DINA model (Anozie, 2006)
Comparison of Approaches
Feng, Heffernan, Mani, & Heffernan (2006):
- Logistic mixed-effects model (generalized linear mixed-effects model, GLMM); temporal
- X_ijkt is the 0/1 response of student i on question j tapping KC k in month t, where t is the elapsed month in the study; β_0k and β_1k are the fixed effects for the baseline and rate of change in the probability of correctly answering a question tapping KC k
- Fit with the R lme4 library
Comparison of Approaches
Comparing to the LLTM in Ayers & Junker (2006):
- Student proficiency depends on time; question difficulty depends on KC and time
- Assign only the most difficult skill instead of the full Q-matrix mapping of multiple skills as in LLTM
- Scaffolding used to gain identifiability
- Ayers & Junker (2006) use regression to predict MCAS after obtaining an estimate of student ability θ (MAD = 10.93%); there is no such regression step in my work: logit(p = 1) = θ − 0, and estimated score = full score × p
- Higher MAD, but provides diagnostic information
Comparison of Approaches
Comparing to Bayes nets and conjunctive models:
- Bayes nets: probabilistic reasoning, conjunctive; GLMM: linear learning, max-difficulty reduction
- GLMM is computationally much easier and faster, and the results are still comparable:
  - GLMM is better than Bayes nets when WPI-1 or WPI-5 is used
  - GLMM is comparable with Bayes nets when WPI-39 or WPI-78 is used (WPI-39: GLMM 12.41%, Bayes 12.05%; WPI-78: GLMM 12.09%, Bayes 13.75%)
Cognitive Diagnostic Assessment – BIC results

BIC                  WPI-1      WPI-5      WPI-39     WPI-78
04-05 data           173445.2   170359.9   170581.7   165711.4
(drop to next model)    3085       -222       4870
05-06 data           39210.57   39174.29   54696.4    54299.54
(drop to next model)      36     -15522        399

- The numbers of data points differ across models: items tagged with more than one skill are duplicated in the data, so finer-grained models have more multi-mappings and thus more data points (and higher BIC)
- WPI-5 is better than WPI-1; WPI-78 is better than WPI-39
- I therefore calculate MAD as the evaluation gauge instead
Analyzing Instructional Effectiveness
Detect the relative instructional effectiveness among items in the same GLOP using learning decomposition.
(Example data table: for each student and opportunity t1–t4, the item seen, the number of prior encounters, and the 0/1 correctness, e.g., student Tom across four attempts.)
Feng, M., Heffernan, N., & Beck, J. (2009). Using learning decomposition to analyze instructional effectiveness in the ASSISTment system. In Dimitrova, Mizoguchi, du Boulay, & Graesser (Eds.), Proceedings of the 14th International Conference on Artificial Intelligence in Education (AIED-2009). Amsterdam, Netherlands: IOS Press. Brighton, UK.
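A sketch of the learning-decomposition idea: fit an exponential performance curve in which prior encounters of different item types get different weights, so the fitted weight beta tells how much one type of practice is worth relative to another. The data here is toy data, and the paper's exact model form may differ:

```python
import numpy as np
from scipy.optimize import curve_fit

def decomposed_error(X, A, b, beta):
    """Error rate decays exponentially with weighted prior practice;
    beta is the value of one encounter of the 'target' item type
    relative to an encounter of any other item in the GLOP."""
    n_other, n_target = X
    return A * np.exp(-b * (n_other + beta * n_target))

# Toy data: prior encounters of each type, and 1 = error, 0 = correct
n_other  = np.array([0, 1, 2, 3, 0, 1, 2, 3], dtype=float)
n_target = np.array([0, 0, 1, 1, 1, 2, 2, 3], dtype=float)
errors   = np.array([1, 1, 0, 1, 0, 0, 0, 0], dtype=float)

(A, b, beta), _ = curve_fit(decomposed_error, (n_other, n_target), errors,
                            p0=[1.0, 0.3, 1.0], maxfev=10000)
print(beta)  # beta > 1 would suggest the target items teach more per encounter
```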
Searching Results
Among 38 GLOPs, LFA found significantly better models for 12. Shall I be happy? As a "sanity" check, compare against randomly assigned factor tables:

#items in GLOP (#GLOPs)   Learning-suggested factors   Random factor table
2 (11)                    5                            5
3 (5)
4 (7)                     3                            1
5-11 (15)                 4 (5, 6, 8, 9)               1 (5)

Further work is needed:
- Quantitatively measure whether and how the data analysis results can help subject-matter experts
- Explore the automatic factor-assigning approach on more data from other systems
- Contrast with human experts as a controlled condition
Guess which item is the most difficult one?

                    1-skill model   2-skill model
Log likelihood      -532.6          -524
BIC                 1,079.2         1,065.99
Num of skills       1               2
Num of parameters   2               4
Coefficients        1.099, 0.137    1.841, 0.100; -0.927, 0.055

Item ID   Square-root   Factor-High
894       1             0
41        1             1
4673      1             1
117       1             1