A Machine Learning Approach for Automatic Student Model Discovery
Nan Li, Noboru Matsuda, William Cohen, and Kenneth Koedinger
Computer Science Department, Carnegie Mellon University
2. Student Model
- A set of knowledge components (KCs)
- Encoded in intelligent tutors to model how students solve problems
- Example: what to do next on problems like 3x = 12
- A key factor behind instructional decisions in automated tutoring systems
3. Student Model Construction
- Traditional methods: structured interviews, think-aloud protocols, rational analysis
  - Require expert input; highly subjective
- Previous automated methods: learning factor analysis (LFA)
  - Requires expert input; highly subjective; stays within the search space of human-provided factors
- Proposed approach: use a machine-learning agent, SimStudent, to acquire knowledge
  - 1 production rule acquired => 1 KC in the student model (Q matrix); independent of human-provided factors (see the sketch below)
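To make the rule-to-KC mapping concrete, here is a minimal sketch (not from the paper) of a Q matrix: a binary matrix whose rows are problem-solving steps and whose columns are KCs, with a 1 wherever a step is labeled with a KC. The KC and step names are illustrative, borrowed from later slides.

```python
# Minimal Q-matrix sketch: rows = steps, columns = KCs,
# Q[s, k] = 1 when step s is labeled with KC k.
import numpy as np

kcs = ["simSt-divide", "simSt-divide-1", "divide-typein"]  # illustrative KC names
steps = ["-3x = 6", "-x = 3", "divide typein"]             # illustrative steps

Q = np.zeros((len(steps), len(kcs)), dtype=int)
Q[0, kcs.index("simSt-divide")] = 1    # -3x = 6 exercises simSt-divide
Q[1, kcs.index("simSt-divide-1")] = 1  # -x = 3 exercises simSt-divide-1
Q[2, kcs.index("divide-typein")] = 1   # the typein step exercises divide-typein
print(Q)
```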
4. A Brief Review of SimStudent
- A machine-learning agent that acquires production rules from examples and problem-solving experience, given a set of feature predicates and functions
5. Production Rules
- Skill divide (e.g., -3x = 6)
  - What: left side (-3x), right side (6)
  - When: left side (-3x) does not have a constant term
  - => How: get-coefficient (-3) of left side (-3x); divide both sides by the coefficient
- Each production rule is associated with one KC
- Each step (e.g., -3x = 6) is labeled with one KC, decided by the production applied to that step
- The original model required strong domain-specific operators, like get-coefficient
- It does not differentiate important distinctions in learning (e.g., -x = 3 vs. -3x = 6)
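As a rough illustration of the What/When/How structure, here is a minimal sketch in Python; the field types and the divide instance are assumptions for illustration, not SimStudent's actual representation.

```python
# Sketch of a production rule's three parts; SimStudent's real
# representation is richer (feature predicates, operator functions).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ProductionRule:
    name: str                      # skill / KC label, e.g. "divide"
    what: List[str]                # information to retrieve, e.g. both sides
    when: Callable[[str], bool]    # precondition on the retrieved elements
    how: List[str]                 # operator sequence producing the next step

divide = ProductionRule(
    name="divide",
    what=["left side", "right side"],
    when=lambda left: "constant term" not in left,   # placeholder test
    how=["get-coefficient(left)", "divide-both-sides(coefficient)"],
)
```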
6. Deep Feature Learning
- Expert vs. novice (Chi et al., 1981)
  - Example: what's the coefficient of -3x?
  - An expert uses deep functional features to reply -3
  - A novice may use shallow perceptual features to reply 3
- Model deep feature learning using machine learning techniques
- Integrate the acquired knowledge into SimStudent learning
  - Removes the dependence on strong operators and splits KCs into finer grain sizes
7. Feature Recognition as PCFG Induction
- Underlying structure in the problem ↔ grammar
- Feature ↔ non-terminal symbol in a grammar rule
- Feature learning task ↔ grammar induction
- Student errors ↔ incorrect parsings
8. Learning Problem
- Input: a set of feature recognition records, each consisting of
  - an original problem (e.g., -3x)
  - the feature to be recognized (e.g., -3 in -3x)
- Output:
  - a probabilistic context-free grammar (PCFG)
  - a non-terminal symbol in a grammar rule that represents the target feature (illustrated below)
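To illustrate, the sketch below (using NLTK, which is an assumption for illustration; the paper does not use NLTK) parses -3x with a hand-written PCFG in which the SignedNumber non-terminal spans the coefficient -3:

```python
# Hand-written PCFG for illustration; the learned grammar and its
# probabilities would come from the induction algorithm, not from us.
from nltk import PCFG
from nltk.parse import ViterbiParser

grammar = PCFG.fromstring("""
    Expression -> SignedNumber Variable [1.0]
    SignedNumber -> Sign Number [1.0]
    Sign -> '-' [1.0]
    Number -> '3' [1.0]
    Variable -> 'x' [1.0]
""")

parser = ViterbiParser(grammar)
for tree in parser.parse(list("-3x")):  # tokens: ['-', '3', 'x']
    tree.pretty_print()  # the SignedNumber subtree covers "-3", the target feature
```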
9. A Two-Step PCFG Learning Algorithm
- Greedy structure hypothesizer (GSH):
  - Hypothesizes grammar rules in a bottom-up fashion
  - Creates non-terminal symbols for frequently occurring sequences, e.g., - and 3, SignedNumber and Variable
- Viterbi training phase:
  - Refines rule probabilities: sequences that occur more frequently get higher probabilities
  - Generalizes the inside-outside algorithm (Lari & Young, 1990)
- A schematic sketch of the two steps follows
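The sketch below is schematic only, under assumed helpers (hypothesize_rules standing in for GSH, viterbi_parse for the most-probable-parse step); see Li et al. (2009) for the actual algorithm:

```python
# Schematic only: hypothesize_rules and viterbi_parse are assumed
# helpers, and rules are assumed to expose .lhs and .prob attributes.
from collections import defaultdict

def learn_pcfg(sequences, iterations=20):
    grammar = hypothesize_rules(sequences)  # Step 1: GSH proposes rules bottom-up

    for _ in range(iterations):             # Step 2: Viterbi training
        counts = defaultdict(int)
        for seq in sequences:
            for rule in viterbi_parse(grammar, seq).rules_used():
                counts[rule] += 1           # count rule uses in the single best parse
        for rule in grammar.rules:          # renormalize probabilities per LHS
            lhs_total = sum(counts[r] for r in grammar.rules if r.lhs == rule.lhs)
            if lhs_total:
                rule.prob = counts[rule] / lhs_total
    return grammar
```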
10. Example of Production Rules Before and After Integration
- Extends the "What" part of the production rule
- Original: skill divide (e.g., -3x = 6)
  - What: left side (-3x), right side (6)
  - When: left side (-3x) does not have a constant term
  - => How: get-coefficient (-3) of left side (-3x); divide both sides by the coefficient (-3)
- Extended: skill divide (e.g., -3x = 6)
  - What: left side (-3, -3x), right side (6)
  - When: left side (-3x) does not have a constant term
  - => How: divide both sides by the coefficient (-3)
- Fewer operators; eliminates the need for domain-specific operators
12. Experiment Method
- SimStudent vs. a human-generated model: code real student data
  - 71 students used a Carnegie Learning Algebra I Tutor on equation solving
- SimStudent:
  - Tutored by a Carnegie Learning Algebra I Tutor
  - Coded each step by the applicable production rule
  - Used the human-generated coding when no production applied
- Human-generated model: coded manually based on expertise
13. Human-Generated vs. SimStudent KCs

  KC category                              | Human-generated | SimStudent | Comment
  -----------------------------------------|-----------------|------------|-----------------------------
  Total # of KCs                           | 12              | 21         |
  # of basic arithmetic operation KCs      | 4               | 13         | Split into finer grain sizes based on different problem forms
  # of typein KCs                          | 4               | 4          | Approximately the same
  # of other transformation operation KCs  | 4               | 4          | Approximately the same
  (e.g., combine like terms)
14. How Well the Two Models Fit Real Student Data
- Used the Additive Factor Model (AFM), an instance of logistic regression that
  - uses each student, each KC, and the KC-by-opportunity interaction as independent variables
  - to predict the probability of a student making an error on a specific step
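For reference, AFM in its standard form (Cen, Koedinger, & Junker, 2006), written here for the probability p_ij that student i gets step j correct (an error is the complement):

```latex
\ln\frac{p_{ij}}{1 - p_{ij}}
  = \theta_i
  + \sum_{k} q_{jk}\,\beta_k
  + \sum_{k} q_{jk}\,\gamma_k\, T_{ik}
```

where θ_i is student i's proficiency, β_k the easiness of KC k, γ_k its learning rate, T_ik the number of practice opportunities student i has had on KC k, and q_jk the Q-matrix entry indicating whether step j exercises KC k.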
15. An Example of a Split in Division
- Human-generated model: one KC, divide, covers both Ax = B and -x = A
- SimStudent: simSt-divide covers Ax = B; simSt-divide-1 covers -x = A
16. Production Rules for Division
- Skill simSt-divide (e.g., -3x = 6)
  - What: left side (-3, -3x), right side (6)
  - When: left side (-3x) does not have a constant term
  - How: divide both sides by the coefficient (-3)
- Skill simSt-divide-1 (e.g., -x = 3)
  - What: left side (-x), right side (3)
  - When: left side (-x) is of the form -v
  - How: generate one (1), then divide both sides by -1
17. An Example Without a Split in Divide-Typein
- Human-generated model: divide-typein
- SimStudent: simSt-divide-typein (maps one-to-one; no split)
18. SimStudent vs. SimStudent + Feature Learning
- SimStudent:
  - Needs strong operators
  - Constructs student models similar to the human-generated model
- Extended SimStudent:
  - Only requires weak operators
  - Splits KCs into finer grain sizes based on different parse trees
- Does extended SimStudent produce a KC model that better fits student learning data?
19. Results
- Model fit compared by AIC and cross-validated RMSE (values shown in the original table)
- Significance tests:
  - SimStudent outperforms the human-generated model in 4260 out of 6494 steps
  - SimStudent outperforms the human-generated model across 20 runs of cross validation (p < 0.001)
20. Summary
- Presented an innovative application of a machine-learning agent, SimStudent, for the automatic discovery of student models
- Showed that a SimStudent-generated student model was a better predictor of real student learning behavior than a human-generated model
21. Future Studies
- Test generality on other datasets in DataShop
- Apply the proposed approach in other domains: stoichiometry, fraction addition
23. An Example in Algebra (figure)
26. A Computational Model of Deep Feature Learning
- Extends a PCFG learning algorithm (Li et al., 2009) with:
  - Feature learning
  - Stronger prior knowledge: transfer learning using prior knowledge
28. Feature Learning
- Build the most probable parse trees for all observation sequences
- Select the non-terminal symbol that matches the most training records as the target feature (sketched below)
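A minimal sketch of the selection step, assuming parse-tree nodes expose their symbol and the substring they span (viterbi_parse, nonterminal_nodes, and covered_text are assumed helpers):

```python
from collections import Counter

def select_feature_symbol(records, grammar):
    """records: (problem, feature) pairs, e.g. ("-3x", "-3")."""
    scores = Counter()
    for problem, feature in records:
        tree = viterbi_parse(grammar, problem)   # most probable parse
        for node in tree.nonterminal_nodes():
            if node.covered_text() == feature:   # subtree spans the feature
                scores[node.symbol] += 1
    return scores.most_common(1)[0][0]           # symbol matching the most records
```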
29. Transfer Learning Using Prior Knowledge
- GSH phase: build parse trees based on previously acquired grammar rules, then call the original GSH
- Viterbi training: add rule frequencies from previous tasks to the current task (sketched below)
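A minimal sketch of the count-transfer idea: Viterbi-training counts for the current task are seeded with rule frequencies carried over from earlier tasks (the function and argument names are assumptions):

```python
def transfer_counts(current_counts, prior_counts):
    # Seed with frequencies remembered from previous tasks, then add
    # counts observed on the current task; renormalize per LHS as usual.
    merged = dict(prior_counts)
    for rule, n in current_counts.items():
        merged[rule] = merged.get(rule, 0) + n
    return merged
```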