Data mining with DataShop Ken Koedinger CMU Director of PSLC Professor of Human-Computer Interaction & Psychology Carnegie Mellon University.

Presentation transcript:

Data mining with DataShop Ken Koedinger CMU Director of PSLC Professor of Human-Computer Interaction & Psychology Carnegie Mellon University

“Knowledge components are the germ of transfer” Goal of the week: What does Ken mean by this?

Overview
- Motivation for data mining
  - Better understanding of students => better instructional design
- Exploratory Data Analysis
  - Data Shop demo, Excel
- Learning curves & Learning Factors Analysis
- Example project from last summer

Data Mining Questions & Methods
- What is going on with student learning & performance?
  - Exploratory data analysis
    - Summary & visualization tools in DataShop
    - Tools in Excel: Auto filter, Pivot Tables, Solver
- How to reliably model student achievement?
  - Item Response Theory (IRT)
    - Basis for standardized tests: SAT, GRE, TIMSS…
    - A version of “logistic regression”
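To make the "version of logistic regression" point concrete, here is a minimal sketch of the simplest IRT form, the one-parameter (Rasch) model. The slide does not say which IRT variant is meant, so the choice of the Rasch model, the function name, and the parameter names are assumptions for illustration.

```python
import math

def rasch_probability(student_ability: float, item_difficulty: float) -> float:
    """Probability of a correct response under a one-parameter (Rasch) IRT model.

    Both arguments are on the logit scale; this is an illustrative sketch,
    not the specific model used in the talk.
    """
    # Logistic function of the gap between ability and difficulty.
    return 1.0 / (1.0 + math.exp(-(student_ability - item_difficulty)))

# A student whose ability equals the item's difficulty has an even chance.
print(rasch_probability(0.0, 0.0))  # 0.5
# A stronger student on the same item is more likely to answer correctly.
print(rasch_probability(1.5, 0.0) > 0.5)  # True
```

Fitting such a model to data amounts to a logistic regression with one coefficient per student and one per item.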

Data Mining Questions & Methods 2
- What’s the nature of the knowledge students are learning? How can we discover cognitive models of student learning that fit their learning curves?
  - Learning Factors Analysis (LFA)
    - Extends IRT to account for learning
    - Search algorithm: discover cognitive model(s) that capture how student learning transfers over tasks over time
- What features of a tutor lead to the most learning?
  - Learning Decomposition
    - Extends LFA to explore different rates of learning due to different forms of instruction
- How to extract reliable inferences about causal mechanisms from correlations in data?
  - Causal modeling using Tetrad
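One way to read "extends IRT to account for learning" is to add, for each knowledge component (KC), an easiness term and a learning-rate term multiplied by the number of prior practice opportunities. The sketch below follows that additive form, which is commonly associated with LFA; the slide does not give the formula, so the exact form, the function name, and the parameter values are assumptions.

```python
import math

def afm_probability(student_ability, kc_easiness, kc_learning_rate,
                    prior_opportunities, kcs_in_step):
    """P(correct) for one student on one step under an additive-factors-style model.

    logit = student ability + sum over the step's KCs of
            (KC easiness + KC learning rate * prior opportunities on that KC)
    """
    logit = student_ability
    for kc in kcs_in_step:
        logit += kc_easiness[kc] + kc_learning_rate[kc] * prior_opportunities.get(kc, 0)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical parameters for a single KC.
kc_easiness = {"rectangle-area": -1.0}
kc_learning_rate = {"rectangle-area": 0.3}

# Predicted success rises with each prior opportunity to practice the KC.
for n in range(4):
    p = afm_probability(0.0, kc_easiness, kc_learning_rate,
                        {"rectangle-area": n}, ["rectangle-area"])
    print(n, round(p, 3))
```

With every learning rate set to zero, the model reduces to an IRT-style logistic model with no learning.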

Overview
- Motivation for data mining
  - Better understanding of students => better instructional design
- Exploratory Data Analysis (next)
  - Demo: DataShop, Excel
- Learning curves & Learning Factors Analysis
- Example project from last summer

Data Shop Demo …

Before going to DataShop, let’s look at a tutor (1997 version!) that generated the example data set we’ll look at

TWO_CIRCLES_IN_SQUARE problem: Initial screen

TWO_CIRCLES_IN_SQUARE problem: An error a few steps later

TWO_CIRCLES_IN_SQUARE problem: Student follows hint & completes problem

How to get to the DataShop: Go to & click …

PSLC’s DataShop
- Researchers get data access, visualizations, statistical tools
- Learning curves track student learning over time
- Discover what concepts & skills students need help with

PSLC’s DataShop
- Learning curves reveal over- and under-practiced knowledge components
- Rectangle-area has an initial low error rate, but is practiced often
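A learning curve of the kind described above can be sketched as the error rate at each successive practice opportunity, averaged over students. The code below is an illustration only: the record layout `(student, kc, correct)` and the function name are assumptions, not DataShop's own format or implementation.

```python
from collections import defaultdict

def learning_curve(steps):
    """Return {kc: [error rate at opportunity 1, 2, ...]}.

    `steps` is an iterable of (student, kc, correct) tuples in time order;
    this layout is a hypothetical stand-in for real log data.
    """
    seen = defaultdict(int)  # (student, kc) -> opportunities so far
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # kc -> opportunity -> [errors, attempts]
    for student, kc, correct in steps:
        seen[(student, kc)] += 1
        cell = totals[kc][seen[(student, kc)]]
        cell[0] += 0 if correct else 1
        cell[1] += 1
    return {
        kc: [by_opp[o][0] / by_opp[o][1] for o in sorted(by_opp)]
        for kc, by_opp in totals.items()
    }

# Two students, two opportunities each on one KC.
steps = [
    ("s1", "rectangle-area", False),
    ("s2", "rectangle-area", True),
    ("s1", "rectangle-area", True),
    ("s2", "rectangle-area", True),
]
print(learning_curve(steps))  # {'rectangle-area': [0.5, 0.0]}
```

A KC whose curve starts low and stays flat over many opportunities would be a candidate for "over-practiced" in the sense used on the slide.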

Other DataShop Features
- Error Reports
  - Identify misconceptions by looking for common student errors
  - When do students ask for hints?
  - Are there alternative correct strategies?
- Performance Profiler
- Export Data
  - Get all or part of the data in a tab-delimited file
  - Use your favorite analysis tools …

Exported File Loaded into Excel
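Besides Excel, a tab-delimited export can be read with any scripting language. The sketch below uses Python's standard `csv` module on an in-memory sample; the column names (`student`, `problem`, `step`, `outcome`) and the outcome labels are hypothetical, so check them against the header row of the actual export.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for an exported file; a real export has its own columns.
sample_export = (
    "student\tproblem\tstep\toutcome\n"
    "s1\tTWO_CIRCLES_IN_SQUARE\tcircle-area\tINCORRECT\n"
    "s1\tTWO_CIRCLES_IN_SQUARE\tcircle-area\tHINT\n"
    "s1\tTWO_CIRCLES_IN_SQUARE\tcircle-area\tCORRECT\n"
    "s2\tTWO_CIRCLES_IN_SQUARE\tcircle-area\tCORRECT\n"
)

def count_outcomes(tab_delimited_text):
    """Count how often each value appears in the (assumed) 'outcome' column."""
    reader = csv.DictReader(io.StringIO(tab_delimited_text), delimiter="\t")
    return Counter(row["outcome"] for row in reader)

print(count_outcomes(sample_export))
```

For a file on disk, the same `csv.DictReader` call works on a file object opened with `open(path, newline="")`.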

Overview
- Motivation for data mining
  - Better understanding of students => better instructional design
- Exploratory Data Analysis
  - Data Shop demo, Excel
- Learning curves & Learning Factors Analysis (next)
- Example project from last summer

Cognitive Model drives behavior of intelligent tutor systems …
Problem: 3(2x - 5) = 9
Possible student steps: 6x - 15 = 9 | 2x - 5 = 3 | 6x - 5 = 9
Cognitive Model: expert component of intelligent tutors that models how students solve problems
- If goal is solve a(bx+c) = d, then rewrite as abx + ac = d
- If goal is solve a(bx+c) = d, then rewrite as abx + c = d
- If goal is solve a(bx+c) = d, then rewrite as bx+c = d/a
Model Tracing: Follows student through their individual approach to a problem -> context-sensitive instruction
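The three production rules above can be sketched as a small matcher that compares a student's rewrite of a(bx + c) = d with what each rule would produce. This is a toy illustration of model tracing, not the tutor's actual implementation; the function name, the tuple encoding of an equation, and the rule labels are assumptions.

```python
from fractions import Fraction

def trace_step(a, b, c, d, student_step):
    """Match a student's rewrite of a(bx + c) = d against the three rules.

    `student_step` is (coefficient of x, constant term, right-hand side).
    Assumes integer inputs and a != 0.
    """
    rules = {
        "correct: distribute a": (a * b, a * c, d),
        "bug: a applied to first term only": (a * b, c, d),
        "correct: divide both sides by a": (b, c, Fraction(d, a)),
    }
    for label, expected in rules.items():
        if tuple(student_step) == expected:
            return label
    return "unrecognized step"

# 3(2x - 5) = 9, so a=3, b=2, c=-5, d=9.
print(trace_step(3, 2, -5, 9, (6, -15, 9)))  # correct: distribute a
print(trace_step(3, 2, -5, 9, (6, -5, 9)))   # bug: a applied to first term only
print(trace_step(3, 2, -5, 9, (2, -5, 3)))   # correct: divide both sides by a
```

Because each student step is matched to a rule, the tutor can respond in a way that depends on which path, correct or buggy, the student took.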

Cognitive Model drives behavior of intelligent tutor systems …
Problem: 3(2x - 5) = 9
Possible student steps: 6x - 15 = 9 | 2x - 5 = 3 | 6x - 5 = 9
Cognitive Model: expert component of intelligent tutors that models how students solve problems
- If goal is solve a(bx+c) = d, then rewrite as abx + ac = d
- If goal is solve a(bx+c) = d, then rewrite as abx + c = d
Model Tracing: Follows student through their individual approach to a problem -> context-sensitive instruction
- Hint message: “Distribute a across the parentheses.”
- Bug message: “You need to multiply c by a also.”
Knowledge Tracing: Assesses student's knowledge growth -> individualized activity selection and pacing
- Known? = 85% chance
- Known? = 45%
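The "Known?" percentages can be read as running estimates that a student has mastered a rule. One standard way to maintain such an estimate is a Bayesian Knowledge Tracing update, sketched below; the slide does not give the update rule, so this formulation and the slip, guess, and learn parameter values are assumptions.

```python
def bkt_update(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One Bayesian Knowledge Tracing step (illustrative parameter values).

    First condition the estimate on the observed response, then allow for
    learning during the practice opportunity.
    """
    if correct:
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    return posterior + (1 - posterior) * p_learn

# Starting from a 45% estimate, a correct step raises it and an error lowers it.
print(bkt_update(0.45, correct=True) > 0.45)   # True
print(bkt_update(0.45, correct=False) < 0.45)  # True
```

A tutor can use such an estimate to decide when a student has had enough practice on a rule, which is the "individualized activity selection and pacing" the slide mentions.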

Cognitive Modeling Challenge
Problem: Intelligent Tutoring Systems depend on a Cognitive Model, which is hard to get right
- Hard to program, but more importantly …
- A high-quality cognitive model requires a deep understanding of student thinking
- Cognitive models created by intuition are often wrong (e.g., Koedinger & Nathan, 2004)

Significance of improving a cognitive model
A better cognitive model means:
- better feedback & hints (model tracing)
- better problem selection & pacing (knowledge tracing)
Making cognitive models better advances basic cognitive science

How can we use student data to build better cognitive models?
Cognitive Task Analysis methods
- Think alouds, Difficulty Factors Assessment (general lecture Tuesday)
- Peer collaboration dialog analysis (TagHelper track)
- Newer: Data mining of student interactions with on-line tutors

Back to DataShop to illustrate

Use log data to test alternative knowledge representations
- Which “knowledge component” (KC) analysis is correct is an empirical question!
- Log data from tutors provide the evidence to compare different KC analyses
  - Find which “germ” accounts for student learning behaviors

Not a smooth learning curve -> this knowledge component model is wrong. Does not capture genuine student difficulties.

This more specific knowledge component (KC) model (2 KCs) is also wrong -- still no smooth drop in error rate.

Ah! Now we are getting a smooth learning curve. This even more specific decomposition (12 KCs) better tracks the nature of student difficulties & transfer from one problem situation to another.
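The slides judge smoothness by eye. One rough way to put a number on it is to fit a power law, error = a * opportunity^(-b), and check how well it fits. Both the power-law form and the use of R² as a smoothness score are assumptions made for this sketch, not the method used in the talk, and the error rates below are made up.

```python
import math

def power_law_fit(error_rates):
    """Least-squares fit of log(error) = log(a) - b * log(opportunity).

    Returns (a, b, r_squared). Opportunities with zero error are skipped
    because their logarithm is undefined; at least two points must remain.
    """
    points = [(math.log(i), math.log(e))
              for i, e in enumerate(error_rates, start=1) if e > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return math.exp(intercept), -slope, r_squared

# Made-up curves: one decreasing smoothly, one jagged.
smooth = [0.5 * i ** -0.7 for i in range(1, 7)]
jagged = [0.30, 0.50, 0.20, 0.45, 0.25, 0.40]
print(power_law_fit(smooth)[2] > power_law_fit(jagged)[2])  # True
```

Under this reading, a KC model whose curves fit the power law more closely would be preferred. The ordering of the two made-up curves here is by construction and says nothing about real data.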

Overview
- Motivation for data mining
  - Better understanding of students => better instructional design
- Exploratory Data Analysis
  - Demo: DataShop, Excel
- Learning curves & Learning Factors Analysis
- Example project from last summer (next)

Example project from 2006
Rafferty (Stanford) & Yudelson (U Pitt)
- Analyzed a data set from Geometry
- Applied Learning Factors Analysis (LFA)
- Driving questions:
  - Are students learning at the same rate as assumed in prior LFA models?
  - Do we need different cognitive models (KC models) to account for low-achieving vs. high-achieving students?

Rafferty & Yudelson Results 1 Different student learning rates? Yes

Rafferty & Yudelson Results 2
Is it “faster” learning or “different” learning?
- Fit with a more compact model is better for low pre for high learn
- Students with an apparently faster learning rate are learning a more “compact”, general and transferable domain model
(Became the basis of Anna Rafferty’s master’s thesis)

Data Mining-Data Shop Offerings Tomorrow
Lectures in 3501 Newell-Simon Hall, activities here (Wean 5202)
1. Educational data mining overview & introduction to using the DataShop
   - Follow-up activities: Exercise in using DataShop for exploratory data analysis. Use tutor/course that generated target data set. Begin data export, data scrubbing, exploratory data analysis
2. Learning from learning curves: Item Response Theory, Learning Factors Analysis
3. Other data mining techniques: Learning decomposition, causal models with Tetrad
   - Define metrics to address driving question, begin analysis

Questions?

What’s next?
Tomorrow:
- Do you know which offerings you will go to tomorrow?
- Any conflicts -- two you want to go to that are at the same time?

END