Data mining with DataShop Ken Koedinger CMU Director of PSLC Professor of Human-Computer Interaction & Psychology Carnegie Mellon University Ryan S.J.d.

Presentation transcript:

Data mining with DataShop Ken Koedinger CMU Director of PSLC Professor of Human-Computer Interaction & Psychology Carnegie Mellon University Ryan S.J.d. Baker PSLC/HCII Carnegie Mellon University

Overview Motivation for educational data mining DataShop Learning curves to improve cognitive models Past project example Conclusion Next

What is educational data mining? “The area of scientific inquiry centered around the development of methods for making discoveries within the unique kinds of data that come from educational settings, and using those methods to better understand students and the settings which they learn in.” (Baker, under review)

What is educational data mining? More informally: using “large” data sets to answer educational and psychological questions  What “large” means is always changing Developing methods or algorithms to aid in discovery

What is educational data mining? One popular data source is “instrumented” computer tutors  Fine grained, longitudinal, often across contexts Other data sources  Records of online courses (e.g. WebCAT)  District or university-level student records Example:

Educational Data Mining is a hot topic! 2008: First International Conference on Educational Data Mining 2008: Launch of Journal of Educational Data Mining 2009: Second International Conference on Educational Data Mining  Submissions due in March

Data Mining Questions & Methods How can we reliably model student knowledge or achievement?  Bayesian Knowledge Tracing Simple type of “Bayes Net”, getting less simple all the time  Item Response Theory (IRT) Basis for standardized tests, SAT, GRE, TIMSS… Version of “logistic regression” Many variations & generalizations …  See slides of Brian Junker’s EDM08 invited talk
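As a rough illustration of the Bayesian Knowledge Tracing idea named on this slide, the sketch below shows the standard two-step update: condition the probability that a skill is known on one observed response, then allow for learning at that opportunity. The function names and the parameter values (slip, guess, learn, initial knowledge) are assumptions chosen for illustration; they are not taken from the slides or from any DataShop dataset.

```python
def bkt_update(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One BKT step: Bayes update on the observed response, then a learning transition.

    All default parameter values are illustrative assumptions.
    """
    if correct:
        # A correct answer comes from knowing (and not slipping) or from guessing.
        num = p_known * (1 - p_slip)
        den = num + (1 - p_known) * p_guess
    else:
        # An incorrect answer comes from slipping or from not knowing (and not guessing).
        num = p_known * p_slip
        den = num + (1 - p_known) * (1 - p_guess)
    posterior = num / den
    # Chance that an unknown skill becomes known at this opportunity.
    return posterior + (1 - posterior) * p_learn


def trace(responses, p_init=0.3, **params):
    """Return the running knowledge estimate after each response in order."""
    estimates = []
    p = p_init
    for r in responses:
        p = bkt_update(p, r, **params)
        estimates.append(p)
    return estimates
```

Calling `trace([False, True, True])` gives one estimate per opportunity. In practice the four parameters would be fit per knowledge component from data rather than fixed by hand.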

Data Mining Questions & Methods What’s the nature of knowledge students are learning? How can we discover cognitive models of student learning?  Learning Factors Analysis (LFA) Extends IRT to account for learning Search algorithm: Discover cognitive model(s) that capture how student learning transfers over tasks over time  Rule space, knowledge space, …

Data Mining Questions & Methods How can we model students, beyond just what they know?  Models of Choices: Metacognitive & Motivational  Help-seeking  Gaming the System  Off-Task Behavior  Self-explanation Affect Involves prediction methods such as classification, regression (not just linear regression)

Data Mining Questions & Methods What features of a tutor lead to the most learning?  Learning Decomposition Explores different rates of learning due to different forms of pedagogical support Close relative of Learning Factors Analysis

Data Mining Questions & Methods How to extract reliable inferences about causal mechanisms from correlations in data?  Causal modeling using Tetrad

Data Mining Questions & Methods And one generally useful tool for figuring out what’s going on, in any of these cases: Exploratory data analysis Summary & visualization tools in DataShop Tools in Excel Clustering algorithms Visualization packages

Overview Motivation for educational data mining DataShop Learning curves to improve cognitive models Past project example Conclusion Next

Find DataShop at learnlab.org/datashop

Video Intro of DataShop … View here:

DataShop – Dataset Tabs Public datasets that you can view only. Private datasets you can’t view; contact us and the PI to get access. Datasets you can view or edit; you have to be a project member or PI for the dataset to appear here.

Analysis Tools Dataset Info Performance Profiler Learning Curve Error Report Export Sample Selector

Dataset Info Meta data for given dataset. PIs get ‘edit’ privileges, others must request it. Papers and Files storage. Dataset Metrics. Problem Breakdown table.

Performance Profiler Multipurpose tool to help identify areas that are too hard or easy. Aggregate by: Step, Problem, KC, Dataset Level. View measures of: Error Rate, Assistance Score, Avg # Hints, Avg # Incorrect, Residual Error Rate.

Learning Curve Visualizes changes in student performance over time. View by KC or Student, Assistance Score or Error Rate. Time is represented on the x-axis as ‘opportunity’, or the # of times a student (or students) had an opportunity to demonstrate a KC.
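To make the ‘opportunity’ axis concrete, here is a minimal sketch of how an error-rate learning curve could be computed from step records. The input layout, a chronological sequence of (student, KC, first-attempt-correct) tuples, and the function name are assumptions for illustration; this is not DataShop's implementation.

```python
from collections import defaultdict


def learning_curve(steps):
    """Compute error rate per opportunity number.

    steps: iterable of (student, kc, first_attempt_correct) tuples in
    chronological order (an assumed layout, not a DataShop format).
    Returns a dict mapping opportunity number (1, 2, ...) to error rate.
    """
    seen = defaultdict(int)    # opportunities so far per (student, kc)
    errors = defaultdict(int)  # incorrect first attempts per opportunity
    totals = defaultdict(int)  # observations per opportunity
    for student, kc, correct in steps:
        seen[(student, kc)] += 1
        opp = seen[(student, kc)]
        totals[opp] += 1
        if not correct:
            errors[opp] += 1
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}
```

Each student's first encounter with a KC counts as opportunity 1, the second as opportunity 2, and so on, so the curve averages across students at the same point in practice. Later opportunities usually have fewer observations, which is why the tail of a curve tends to be noisy.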

Error Report Provides a breakdown of problem information (by step) for fine-grained analysis of problem-solving behavior. Attempts are categorized by student. View by Problem or KC.

Export Two types of export available By Transaction By Step Anonymous, tab-delimited file Easy to import into Excel! You can also export the Problem Breakdown table and LFA values!
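Because the export is a tab-delimited file, it can also be read with a few lines of Python instead of Excel. The sketch below uses only the standard library; the column names (`student`, `step`, `first_attempt`) and the sample rows are made up for illustration and are not the actual DataShop export headers.

```python
import csv
import io


def read_export(handle):
    """Read a tab-delimited export into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical two-row sample standing in for an exported file.
sample = (
    "student\tstep\tfirst_attempt\n"
    "s1\tstep-a\tcorrect\n"
    "s1\tstep-b\thint\n"
)
rows = read_export(io.StringIO(sample))
```

For a real file, pass `open(path, newline="")` in place of the `io.StringIO` object and use whatever header names the export actually contains.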

Sample Selector Easily create a sample/filter to view a smaller subset of data. Filter by: Condition, Dataset Level, Problem, School, Student, Tutor Transaction. Shared (only owner can edit) and private samples.

Help/Documentation Extensive documentation with examples. Contextual by tool/report. Glossary of common terms, tied in with PSLC Theory wiki.

New Features Manage Knowledge Component models  Create, Modify & Delete KC models within DataShop Addition of Latency Curves to Learning Curve Reporting  Time to Correct  Assistance Time Problem Rollup & Export Enhanced Contextual Help

Overview Motivation for educational data mining DataShop Learning curves to improve cognitive models Past project example Conclusion Next

Cognitive Modeling Challenge Premise: High quality instructional design requires a high quality cognitive model of student thinking Problem: Creating such a Cognitive Model is hard to get right  Hard to program, but more importantly …  A high quality cognitive model requires a deep understanding of student thinking  Cognitive models created by intuition are often wrong (e.g., Koedinger & Nathan, 2004)

Significance of improving a cognitive model A better cognitive model means better:  Assessment  Instructional feedback & hints (model tracing)  Activity selection & pacing (knowledge tracing) Better cognitive models advance basic cognitive science

Using student data to build better cognitive models Cognitive Task Analysis methods  Think alouds, Difficulty Factors Assessment (General lecture Tuesday)  Peer collaboration dialog analysis (TagHelper track)  Data mining of student interactions with on-line tutors (DataShop track)

Knowledge components are the “germ theory” of transfer Germs are hidden elements that carry disease from one agent to another Knowledge components are hidden elements that carry learning experiences from one situation to another -- they account for transfer

DataShop Supports Theory Integration Makes micro theory concrete Knowledge decomposability hypothesis  Acquisition of academic competencies can be decomposed into units, called knowledge components, that yield predictions about student task performance & the transfer of learning. Not obviously true  “learning, cognition, knowing, and context are irreducibly co-constituted and cannot be treated as isolated entities or processes” (Barab & Squire, 2004)

Learning curves show performance changes over time Learning curves:  Student data  Statistical model fit (blue line) Based on micro level analysis:  learning event opportunities  Averaged across knowledge components

Not a smooth learning curve -> this knowledge component model is wrong. Does not capture genuine student difficulties.

This more specific knowledge component (KC) model (2 KCs) is also wrong -- still no smooth drop in error rate.

Ah! Now we get a smoother learning curve. A more specific decomposition (12 KCs) better tracks the nature of student difficulties & transfer from one problem situation to another. (The rise near the end is due to fewer observations, biased toward poorer students.)

Summary: KC model as “germ theory” Without decomposition, using just a single “Geometry” KC, no smooth learning curve. But with decomposition, 12 KCs for area concepts, a smooth learning curve. Upshot: A decomposed KC model fits learning & transfer data better than a “faculty theory” of mind

Overview Motivation for educational data mining DataShop Learning curves to improve cognitive models Past project example Conclusion Next

Past Project Example Rafferty (Stanford) & Yudelson (Pitt) Analyzed a data set from Geometry Applied Learning Factors Analysis (LFA) Driving questions:  Are students learning at the same rate as assumed in prior LFA models?  Do we need different cognitive models (KC models) to account for low-achieving vs. high-achieving students?

A Statistical Model for Learning Curves Predicts whether student is correct depending on knowledge & practice Additive Factor Model (Draney, et al. 1995, Cen, Koedinger, Junker, 2006) Learning rate is different for different skills, but not for different students
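The Additive Factor Model is usually written as a logistic regression in which the log-odds of a correct response is the student's proficiency plus, for each KC involved in the step, that KC's easiness and its learning rate multiplied by the number of prior practice opportunities. The sketch below follows that common formulation; the function and parameter names and the example values are assumptions, not fitted results.

```python
import math


def afm_probability(theta, betas, gammas, opportunities, kcs):
    """Predicted probability of a correct response under an AFM-style model.

    theta: student proficiency (one value per student).
    betas: dict of KC -> easiness.
    gammas: dict of KC -> learning rate (per KC, shared by all students).
    opportunities: dict of KC -> prior practice opportunities for this student.
    kcs: the KCs required by the current step.
    """
    logit = theta + sum(betas[k] + gammas[k] * opportunities[k] for k in kcs)
    return 1.0 / (1.0 + math.exp(-logit))


# Illustrative values only: same student and KC, before and after 5 opportunities.
p_before = afm_probability(0.0, {"area": 0.0}, {"area": 0.2}, {"area": 0}, ["area"])
p_after = afm_probability(0.0, {"area": 0.0}, {"area": 0.2}, {"area": 5}, ["area"])
```

Note that `gammas` is indexed by KC only, which is how the model encodes a learning rate that differs across skills but not across students.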

Low-Start High-Learn (LSHL) group has a faster learning rate than other groups of students

Rafferty & Yudelson Results 2 Is it “faster” learning or “different” learning?  Fit with a more compact model is better for low start high learn  Students with an apparent faster learning rate are learning a more “compact”, general and transferable domain model Resulted in best Young Researcher Track paper at AIED07

Overview Motivation for educational data mining DataShop Learning curves to improve cognitive models Past project example Conclusion Next

Lots of interesting questions to be addressed with Ed Data Mining!! Assessment questions  Can on-line embedded assessment replace standardized tests?  Can assessment be accurate if students are learning during test? Learning theory questions  What are the “elements of transfer” in human learning?  Is learning rate driven by student variability or content variability?  Can conceptual change be tracked & better understood? Instructional questions  What instructional moves yield the greatest increases in learning?  Can we replace ANOVA with learning curve comparison to better evaluate learning experiments? Metacognition & motivation questions  Can student affect & motivation be detected in on-line click stream data?  Can student metacognitive & self-regulated learning strategies be detected in on-line click stream data?

Data Mining-Data Shop Offerings Data Mining Track: Tues 9:15 Using DataShop for Exploratory Data Analysis Tues 1:30 Learning from learning curves Item Response Theory Learning Factors Analysis Wed 9:30 Discovery with Models General lecture: Tues 3:30 Educational Data Mining Bayesian models of knowledge tracing Causal models with Tetrad

Questions?

Extra slides …

Sample tutor interactions (from 1997 version) that generated Geometry Area data set used in example of learning curves …

TWO_CIRCLES_IN_SQUARE problem: Initial screen

TWO_CIRCLES_IN_SQUARE problem: An error a few steps later

TWO_CIRCLES_IN_SQUARE problem: Student follows hint & completes prob

Learning curve contrast in Physics dataset …

Not a smooth learning curve -> this knowledge component model is wrong. Does not capture genuine student difficulties.

More detailed cognitive model yields smoother learning curve. Better tracks nature of student difficulties & transfer (Few observations after 10 opportunities yield noisy data)

Best BIC (parsimonious fit) for Default (original) KC model Better than simpler Single-KC model And better than more complex Unique-step (IRT) model
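For readers unfamiliar with BIC, the sketch below shows the standard definition, k·ln(n) − 2·ln(L), where lower values indicate a better trade-off between fit and number of parameters. The log-likelihoods, parameter counts, and observation count used here are invented purely to show the mechanics; they are not values from the Physics dataset or from DataShop.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: lower is better (more parsimonious fit)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Made-up numbers for three hypothetical KC models fit to the same 5000 observations.
candidates = {
    "single-kc": bic(log_likelihood=-2900.0, n_params=52, n_obs=5000),
    "default": bic(log_likelihood=-2600.0, n_params=80, n_obs=5000),
    "unique-step": bic(log_likelihood=-2500.0, n_params=400, n_obs=5000),
}
best = min(candidates, key=candidates.get)
```

With these invented inputs the middle model wins: it fits better than the simplest model, while the most complex model's extra parameters cost more than its improved likelihood gains. That is the same kind of trade-off the slide describes.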