Published by Clifton Long. Modified over 9 years ago.
1
Educational data mining overview & Introduction to Exploratory Data Analysis with DataShop
Ken Koedinger
CMU Director of PSLC
Professor of Human-Computer Interaction & Psychology
Carnegie Mellon University
2
Overview
- DataShop Overview
  - Logging model
  - DataShop Features
- Quantitative models of learning curves
  - Power law, logistic regression
  - Contrasting KC models
- Exploratory Data Analysis Exercise (start)
- Knowledge Component Model Editing
3
Logging & Storage Models
- Education technologies are “instrumented” to produce log data
- We encourage a standard log format
  - XML format generalized from Ritter & Koedinger (1995)
- Also convert log data from other formats
4
Relational Database -- complex!
5
Example activity generating “click stream” data
Geometry Cognitive Tutor: “Making Cans” problem
- Find the area of scrap metal left over after removing a circular area (the end of a can) from a metal square.
- Student enters values in worksheet
- Tutor provides feedback & instruction
- Records student’s actions & tutor responses
- Logs stored in files on school server or database at Carnegie Learning
- Later imported into DataShop
6
DataShop logging model
Main constructs:
- Context message: the student, problem, and session with the tutor
- Tool message: represents an action in the tool performed by a student or tutor
- Tutor message: represents a tutor’s response to a student action
7
DataShop XML format: Context message
- Dataset name: Geometry Hampton 2005-2006
- Course unit: PACT-AREA
- Course section: PACT-AREA-6
- Problem: MAKING-CANS
8
DataShop XML format: Tool & Tutor Messages
- Tool message: (POG-AREA QUESTION2) INPUT-CELL-VALUE 200.96
- Tutor message: … [as above] … CORRECT
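The three message types can be pictured as plain records that share identifiers. The following is only a sketch of that idea in Python, not the DataShop XML schema: the class names, the field names (`context_id`, `transaction_id`, `selection`, `action`, `input`, `evaluation`), the ids, and the student id `S01` are all assumptions made for illustration. Only the values themselves come from the slides.

```python
from dataclasses import dataclass

# NOTE: field names and ids are illustrative assumptions, not the real XML schema.

@dataclass
class ContextMessage:
    context_id: str   # hypothetical id linking later messages to this context
    student: str
    dataset: str
    unit: str
    section: str
    problem: str

@dataclass
class ToolMessage:
    context_id: str
    transaction_id: str  # hypothetical id shared with the tutor's response
    selection: str
    action: str
    input: str

@dataclass
class TutorMessage:
    context_id: str
    transaction_id: str
    evaluation: str

context = ContextMessage("ctx-1", "S01", "Geometry Hampton 2005-2006",
                         "PACT-AREA", "PACT-AREA-6", "MAKING-CANS")
tool = ToolMessage("ctx-1", "tx-1", "(POG-AREA QUESTION2)",
                   "INPUT-CELL-VALUE", "200.96")
tutor = TutorMessage("ctx-1", "tx-1", "CORRECT")

def pair_messages(tools, tutors):
    """Match each tool message with the tutor response sharing its transaction id."""
    responses = {t.transaction_id: t for t in tutors}
    return [(tool_msg, responses.get(tool_msg.transaction_id)) for tool_msg in tools]

pairs = pair_messages([tool], [tutor])
```

A tool message with no matching tutor message is paired with `None`, which is one way to represent tools that log actions without tutor feedback.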
9
Example Stored Transactions
- Student interactions (or transactions) are stored in a relational database, can be exported as table
- Example: Student S01 on Making-Cans problem
10
Transactions
Info for each transaction
- student(s), session, time, problem, problem step, attempt number, student action, tutor response, number of hints, knowledge component code
Logging of on-line tools (e.g., a virtual lab) does not include tutor response
11
Step & Transaction Definitions
- A problem-solving activity typically involves many tool & tutor messages.
- “Steps” represent completion of possible subgoals or pieces of a problem solution
- “Transactions” are attempts at a step or requests for instructional help
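To make the step/transaction distinction concrete, here is a minimal Python sketch that rolls several transactions up into one record per student-step. The row layout, the step names, and the outcome labels `INCORRECT`, `HINT`, and `CORRECT` are assumptions for illustration, not the DataShop export format.

```python
# Hypothetical transaction rows, in time order: (student, problem, step, outcome).
transactions = [
    ("S01", "MAKING-CANS", "step-1", "INCORRECT"),
    ("S01", "MAKING-CANS", "step-1", "HINT"),
    ("S01", "MAKING-CANS", "step-1", "CORRECT"),
    ("S01", "MAKING-CANS", "step-2", "CORRECT"),
]

def roll_up(rows):
    """Aggregate transactions into one record per (student, problem, step)."""
    steps = {}
    for student, problem, step, outcome in rows:
        key = (student, problem, step)
        if key not in steps:
            # The first transaction on a step fixes the first-attempt outcome.
            steps[key] = {"first_attempt": outcome, "incorrects": 0, "hints": 0}
        if outcome == "INCORRECT":
            steps[key]["incorrects"] += 1
        elif outcome == "HINT":
            steps[key]["hints"] += 1
    return steps

step_table = roll_up(transactions)
```

Here the four transactions collapse into two steps: the first needed one incorrect attempt and one hint before it was completed, the second was correct on the first attempt.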
12
Example: data aggregated by student-step
13
Overview
- DataShop Overview
  - Logging model
  - DataShop Features
- Quantitative models of learning curves
  - Power law, logistic regression
  - Contrasting KC models
- Exploratory Data Analysis Exercise (start)
- Knowledge Component Model Editing
14
DataShop Analysis Tools
- Dataset Info
- Performance Profiler
- Learning Curve
- Error Report
- Export
- Sample Selector
15
Dataset Info
- Meta data for given dataset
- PIs get ‘edit’ privileges, others must request it
- Papers and Files storage
- Dataset Metrics
- Problem Breakdown table
16
Performance Profiler
Multipurpose tool to help identify areas that are too hard or easy
Aggregate by
- Step
- Problem
- KC
- Dataset Level
View measures of
- Error Rate
- Assistance Score
- Avg # Hints
- Avg # Incorrect
- Residual Error Rate
17
Learning Curve
- Visualizes changes in student performance over time
- View by KC or Student, Assistance Score or Error Rate
- Time is represented on the x-axis as ‘opportunity’, or the # of times a student (or students) had an opportunity to demonstrate a KC
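The ‘opportunity’ axis can be computed by counting, per student and KC, how many times that KC has come up so far. The sketch below assumes a hypothetical list of student-step rows in time order; the KC name `circle-area` and the row layout are made up for illustration.

```python
from collections import defaultdict

# Hypothetical student-step rows in time order: (student, kc, first_attempt_correct).
step_rows = [
    ("S01", "circle-area", False),
    ("S02", "circle-area", False),
    ("S01", "circle-area", True),
    ("S02", "circle-area", False),
    ("S01", "circle-area", True),
    ("S02", "circle-area", True),
]

def learning_curve(rows):
    """Return {opportunity number: error rate}, averaged over students and KCs."""
    seen = defaultdict(int)     # opportunities so far, per (student, kc)
    errors = defaultdict(int)   # first-attempt errors at each opportunity
    totals = defaultdict(int)   # observations at each opportunity
    for student, kc, correct in rows:
        seen[(student, kc)] += 1
        opportunity = seen[(student, kc)]
        totals[opportunity] += 1
        if not correct:
            errors[opportunity] += 1
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}

curve = learning_curve(step_rows)
```

Plotting `curve` with opportunity on the x-axis and error rate on the y-axis gives the kind of curve the tool displays.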
18
Error Report
- Provides a breakdown of problem information (by step) for fine-grained analysis of problem-solving behavior
- Attempts are categorized by student
- View by Problem or KC
19
Sample Selector
- Easily create a sample/filter to view a smaller subset of data
- Shared (only owner can edit) and private samples
Filter by
- Condition
- Dataset Level
- Problem
- School
- Student
- Tutor Transaction
20
Export
- Two types of export available
  - By Transaction
  - By Step
- Anonymous, tab-delimited file
- Easy to import into Excel!
- You can also export the Problem Breakdown table and LFA values!
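Because the export is tab-delimited, it can also be read outside Excel. A minimal Python sketch using the standard `csv` module follows; the column names and the sample row are hypothetical and will differ from the headers in a real export.

```python
import csv
import io

# Hypothetical two-line export; real column headers will differ.
sample_export = (
    "Student\tProblem\tStep\tOutcome\n"
    "S01\tMAKING-CANS\tstep-1\tCORRECT\n"
)

def read_export(handle):
    """Read a tab-delimited export into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle, delimiter="\t"))

export_rows = read_export(io.StringIO(sample_export))
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.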
21
Help/Documentation
- Extensive documentation with examples
- Contextual by tool/report
- http://learnlab.web.cmu.edu/datashop/help
- Glossary of common terms, tied in with PSLC Theory wiki
22
New Features
- Manage Knowledge Component models
  - Create, Modify & Delete KC models within DataShop
- Addition of Latency Curves to Learning Curve Reporting
  - Time to Correct
  - Assistance Time
- Problem Rollup & Export
- Enhanced Contextual Help
23
Overview
- DataShop Overview
  - Logging model
  - DataShop Features
- Quantitative models of learning curves
  - Power law, logistic regression
  - Contrasting KC models
- Exploratory Data Analysis Exercise (start)
- Knowledge Component Model Editing
24
Recall learning curve story
- Without decomposition, using just a single “Geometry” KC, no smooth learning curve.
- But with decomposition, 12 KCs for area concepts, a smooth learning curve.
Upshot: A decomposed KC model fits learning & transfer data better than a “faculty theory” of mind
25
Learning curve analysis
The Power Law of Learning (Newell & Rosenbloom, 1993)
$Y = a X^b$
- Y: error rate
- X: opportunities to practice a skill
- a: error rate on 1st opportunity
- b: learning rate
After the log transformation
- “a” is the “intercept” or starting point of the learning curve
- “b” is the “slope” or steepness of the learning curve
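Taking logs of the power law gives a straight line, ln Y = ln a + b ln X, so a and b can be estimated with ordinary least squares on the log-log data. The sketch below does this in plain Python on synthetic error rates generated from made-up values a = 0.5 and b = -0.4; it is an illustration of the transformation, not the fitting procedure DataShop uses.

```python
import math

def fit_power_law(opportunities, error_rates):
    """Least-squares line through (ln X, ln Y); returns (a, b) for Y = a * X**b.

    Error rates must be strictly positive, since their logarithm is taken.
    """
    xs = [math.log(x) for x in opportunities]
    ys = [math.log(y) for y in error_rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                         # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)     # intercept, mapped back from ln a
    return a, b

# Synthetic data from assumed parameters a = 0.5, b = -0.4.
opps = [1, 2, 3, 4, 5, 6]
errs = [0.5 * x ** -0.4 for x in opps]
a_hat, b_hat = fit_power_law(opps, errs)
```

A negative fitted b corresponds to an error rate that falls with practice.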
26
More sophisticated learning curve model
Generalized Power Law to fit learning curves
- Logistic regression (Draney, Wilson, Pirolli, 1995)
Assumptions
- Different students may initially know more or less => use an intercept parameter for each student
- Students learn at the same rate => no slope parameters for each student
- Some productions may be more known than others => use an intercept parameter for each production
- Some productions are easier to learn than others => use a slope parameter for each production
These assumptions are reflected in a detailed math model …
27
More sophisticated learning curve model
Probability of getting a step correct (p) is proportional to:
- if student i performed this step ($X_i$), add overall “smarts” of that student ($\alpha_i$)
- if skill j is needed for this step ($Y_j$), add easiness of that skill ($\beta_j$)
  and add the product of the number of opportunities to learn ($T_j$) & the amount gained for each opportunity ($\gamma_j$)
Use logistic regression because response is discrete (correct or not)
- Probability (p) is transformed by “log odds”
- “Stretched out” with “s curve” to not bump up against 0 or 1
(Related to “Item Response Theory”, behind standardized tests …)
28
Different representation, same model
Predicts whether student is correct depending on knowledge & practice
Additive Factor Model (Draney et al., 1995; Cen, Koedinger, & Junker, 2006)
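One way to write the model described in words above as a single equation is sketched here. The indicator and count symbols $X_i$, $Y_j$, $T_j$ follow the verbal description; the Greek parameter names $\alpha_i$ (student intercept), $\beta_j$ (skill easiness), and $\gamma_j$ (gain per opportunity) are an assumption about notation.

```latex
\ln\!\left(\frac{p}{1-p}\right)
  = \sum_i \alpha_i X_i
  + \sum_j \beta_j Y_j
  + \sum_j \gamma_j Y_j T_j
```

The left-hand side is the log odds of a correct response, so solving for p gives the “s curve” $p = 1 / (1 + e^{-z})$, where z is the right-hand side.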
29
The Q Matrix
How to represent relationship between knowledge components and student tasks?
- Tasks also called items, questions, problems, or steps (in problems)
Q-Matrix (Tatsuoka, 1983)

Item / KC | Add | Sub | Mul | Div
2*8       | 0   | 0   | 1   | 0
2*8 - 3   | 0   | 1   | 1   | 0

- 2*8 is a single-KC item
- 2*8 - 3 is a conjunctive-KC item, involves two KCs
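A Q-matrix is easy to hold as a mapping from item to a 0/1 row over the KCs. The Python sketch below stores the two rows from this slide and then, as an illustration only, plugs them into an additive log-odds prediction of the kind described on the preceding slides. The function names and every parameter value (student intercept, easiness, gain, opportunity counts) are made up.

```python
import math

KCS = ["Add", "Sub", "Mul", "Div"]

# Rows taken from the slide: 1 means the item requires that KC.
Q = {
    "2*8":     [0, 0, 1, 0],
    "2*8 - 3": [0, 1, 1, 0],
}

def kcs_for(item):
    """List the KCs an item requires according to the Q-matrix."""
    return [kc for kc, used in zip(KCS, Q[item]) if used]

def p_correct(item, student_intercept, easiness, gain, opportunities):
    """Additive log-odds prediction: student term plus, for each required KC,
    its easiness and its gain times the number of prior opportunities."""
    logit = student_intercept
    for kc in kcs_for(item):
        logit += easiness[kc] + gain[kc] * opportunities[kc]
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical parameter values, chosen only to exercise the function.
easiness = {"Add": 0.0, "Sub": -0.5, "Mul": 0.5, "Div": 0.0}
gain = {"Add": 0.1, "Sub": 0.1, "Mul": 0.1, "Div": 0.1}
opportunities = {"Add": 0, "Sub": 0, "Mul": 3, "Div": 0}

p_single = p_correct("2*8", 0.0, easiness, gain, opportunities)
p_conjunctive = p_correct("2*8 - 3", 0.0, easiness, gain, opportunities)
```

With these made-up values the conjunctive item is predicted to be harder than the single-KC item, because the unpracticed, less easy Sub KC lowers its log odds.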
30
Model Evaluation
How to compare cognitive models?
- A good model minimizes prediction risk by balancing fit with data & complexity (Wasserman 2005)
Compare BIC for the cognitive models
- BIC is “Bayesian Information Criterion”
- BIC = -2*log-likelihood + numPar * log(numOb)
- Better (lower) BIC == better prediction of data the model hasn’t seen
- Mimics cross validation, but is faster to compute
31
Model Title | LL     | BIC   | numPar
G           | -2,175 | 4,566 | 26
Original    | -1,911 | 4,271 | 54
Item        | -1,720 | 5,554 | 254

Data: the Geometry Area Unit
24 students, 230 items, 15 KCs
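The comparison can be sketched in a few lines of Python using the formula BIC = -2*log-likelihood + numPar * log(numOb). The log-likelihoods and parameter counts below are the ones reported for the three models; the number of observations is not given in the slides, so `NUM_OB` is a hypothetical placeholder, and the natural logarithm is assumed.

```python
import math

def bic(log_likelihood, num_par, num_ob):
    """BIC = -2 * log-likelihood + numPar * ln(numOb); lower is better."""
    return -2 * log_likelihood + num_par * math.log(num_ob)

# (log-likelihood, number of parameters) as reported for each model.
models = {
    "G": (-2175, 26),
    "Original": (-1911, 54),
    "Item": (-1720, 254),
}

NUM_OB = 4000  # hypothetical number of observations, not stated in the slides

scores = {name: bic(ll, k, NUM_OB) for name, (ll, k) in models.items()}
best_model = min(scores, key=scores.get)
```

The Item model has the best raw fit (highest log-likelihood), but its 254 parameters are penalized heavily, so under this assumed `NUM_OB` the Original model comes out with the lowest BIC.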
32
Learning curve contrast in Physics dataset …
33
Not a smooth learning curve -> this knowledge component model is wrong. Does not capture genuine student difficulties.
34
More detailed cognitive model yields smoother learning curve. Better tracks nature of student difficulties & transfer
(Few observations after 10 opportunities yield noisy data)
35
Best BIC (parsimonious fit) for Default (original) KC model
- Better than simpler Single-KC model
- And better than more complex Unique-step (IRT) model
36
Overview
- DataShop Overview
  - Logging model
  - DataShop Features
- Quantitative models of learning curves
  - Power law, logistic regression
  - Contrasting KC models
- Exploratory Data Analysis Exercise (start)
- Knowledge Component Model Editing
37
Exploratory Data Analysis Exercise
Goals:
1) Get familiar with data
2) Learn/practice Excel skills
Tasks:
1) create a “step table”
2) graph learning curves
38
TWO_CIRCLES_IN_SQUARE problem: Initial screen
39
TWO_CIRCLES_IN_SQUARE problem: An error a few steps later
40
TWO_CIRCLES_IN_SQUARE problem: Student follows hint & completes problem
41
Exported File Loaded into Excel
42
See the exercise handout … we will do some of it in the next session
43
Overview
- DataShop Overview
  - Logging model
  - DataShop Features
- Quantitative models of learning curves
  - Power law, logistic regression
  - Contrasting KC models
- Exploratory Data Analysis Exercise (start)
- Knowledge Component Model Editing
44
DataShop Demo
- Examples of exercise
- KC model editing
45
END