Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 April 4, 2012.

Slides:



Advertisements
Similar presentations
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 12, 2012.
Advertisements

Knowledge Inference: Advanced BKT Week 4 Video 5.
Knowledge Engineering Week 3 Video 5. Knowledge Engineering  Where your model is created by a smart human being, rather than an exhaustive computer.
HUDM4122 Probability and Statistical Inference March 30, 2015.
Meta-Cognition, Motivation, and Affect PSY504 Spring term, 2011 January 26, 2010.
Discovery with Models Week 8 Video 1. Discovery with Models: The Big Idea  A model of a phenomenon is developed  Via  Prediction  Clustering  Knowledge.
Data Synchronization and Grain-Sizes Week 3 Video 2.
Ryan S.J.d. Baker Adam B. Goldstein Neil T. Heffernan Detecting the Moment of Learning.
Ground Truth for Behavior Detection Week 3 Video 1.
Correlation AND EXPERIMENTAL DESIGN
Special Topics in Educational Data Mining HUDK50199 Spring term, 2013 April 1, 2012.
Chapter 3 Producing Data 1. During most of this semester we go about statistics as if we already have data to work with. This is okay, but a little misleading.
Meta-Cognition, Motivation, and Affect PSY504 Spring term, 2011 April 13, 2011.
BACKGROUND RESEARCH QUESTIONS  Does the time parents spend with children differ according to parents’ occupation?  Do occupational differences remain.
How Science Works Glossary AS Level. Accuracy An accurate measurement is one which is close to the true value.
Today Concepts underlying inferential statistics
Educational Data Mining Overview John Stamper PSLC Summer School /25/2011 1PSLC Summer School 2011.
RESEARCH METHODS IN EDUCATIONAL PSYCHOLOGY
Chapter 7 Correlational Research Gay, Mills, and Airasian
Science and Engineering Practices
Educational Data Mining and DataShop John Stamper Carnegie Mellon University 1 9/12/2012 PSLC Corporate Partner Meeting 2012.
Experiments and Observational Studies.  A study at a high school in California compared academic performance of music students with that of non-music.
International Conference on Enhancement and Innovation in Higher Education Crowne Plaza Hotel, Glasgow 9-11 June 2015 Welcome.
Redesign of Beginning and Intermediate Algebra using ALEKS Lessons Learned Cheryl J. McAllister Laurie W. Overmann Southeast Missouri State University.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 February 6, 2012.
Slide 9-1 © 1999 South-Western Publishing McDaniel Gates Contemporary Marketing Research, 4e Understanding Measurement Carl McDaniel, Jr. Roger Gates Slides.
Case Study – San Pedro Week 1, Video 6. Case Study of Classification  San Pedro, M.O.Z., Baker, R.S.J.d., Bowers, A.J., Heffernan, N.T. (2013) Predicting.
Data Annotation for Classification. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 February 13, 2012.
The Psychology of the Person Chapter 2 Research Naomi Wagner, Ph.D Lecture Outlines Based on Burger, 8 th edition.
Political Research and Statistics 8/26/2013. Readings Bring your cd's and a flash drive to class on Thursday Pollack Textbook – Introduction – Ch: 10.
Gaming the System in the CALL Classroom Peter Gobel Kyoto Sangyo University
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 April 2, 2012.
Chapter 2 AP Psychology Outline
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 April 26, 2012.
Educational Data Mining: Discovery with Models Ryan S.J.d. Baker PSLC/HCII Carnegie Mellon University Ken Koedinger CMU Director of PSLC Professor of Human-Computer.
Advanced BKT February 11, Classic BKT Not learned Two Learning Parameters p(L 0 )Probability the skill is already known before the first opportunity.
Experiment Basics: Variables Psych 231: Research Methods in Psychology.
Spring 2011 Tutor Training Modern Learning Theories and Tutoring Designed and Presented by Tem Fuller.
TEACHER EFFECTIVENESS INITIATIVE VALUE-ADDED TRAINING Value-Added Research Center (VARC)
Stat 112 Notes 9 Today: –Multicollinearity (Chapter 4.6) –Multiple regression and causal inference.
Learning Analytics: Process & Theory March 24, 2014.
Issues concerning the interpretation of statistical significance tests.
A Comparison of General v. Specific Measures of Achievement Goal Orientation Lisa Baranik, Kenneth Barron, Sara Finney, and Donna Sundre Motivation Research.
Question paper 1997.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 16, 2012.
Statistics for the Social Sciences Psychology 340 Spring 2010 Introductions & Review of some basic research methods.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 13, 2013.
URBDP 591 I Lecture 4: Research Question Objectives How do we define a research question? What is a testable hypothesis? How do we test an hypothesis?
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
Feature Engineering Studio September 9, Welcome to Feature Engineering Studio Design studio-style course teaching how to distill and engineer features.
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 6, 2013.
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 January 25, 2012.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 April 9, 2012.
Welcome! Please arrange yourselves in groups of 6 so that group members represent: A mix of grade levels A mix of schools 1.
Core Methods in Educational Data Mining
Multivariate Analysis - Introduction
Big Data, Education, and Society
Core Methods in Educational Data Mining
Big Data, Education, and Society
Big Data, Education, and Society
Core Methods in Educational Data Mining
Core Methods in Educational Data Mining
DESIGN OF EXPERIMENTS by R. C. Baker
Core Methods in Educational Data Mining
Presentation transcript:

Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 April 4, 2012

Today’s Class Discovery with Models

Discovery with Models: The Big Idea A model of a phenomenon is developed Via – Prediction – Clustering – Knowledge Engineering This model is then used as a component in another analysis

Can be used in Prediction The created model’s predictions are used as predictor variables in predicting a new variable E.g. Classification, Regression

Can be used in Relationship Mining The relationships between the created model’s predictions and additional variables are studied This can enable a researcher to study the relationship between a complex latent construct and a wide variety of observable constructs E.g. Correlation mining, Association Rule Mining

“Increasingly Important…” Baker & Yacef (2009) argued that Discovery with Models is a key emerging area of EDM I think that’s true, although it has been a bit slower to emerge than I might have expected

Example From the reading

Baker & Gowda (2010) Models of gaming the system, off-task behavior, and carelessness developed Applied to entire year of data from urban, rural, and suburban schools Differences in behaviors between different schools studied

Students Used Cognitive Tutor Geometry during school year Approximately 2 days a week Tutor chosen by teachers – e.g. not all classrooms used software However, software used by representative population in each school – e.g. not just gifted or special-needs students 3 schools in Southwestern Pennsylvania 10

Urban school Suburban school Rural school % African-American100%<1%2% % White0%98%97% % Hispanic0%<1% Free or Reduced-Price Lunch 99%4%Not Reported % Proficient on state math exam 20%77%Not Reported Median household income in community $26,621$60,307$32,206 % Children under poverty line 30.8%2.5%18.4% 11

Urban school Suburban school Rural school Number of students using software Average time using software 51 hrs.35 hrs.9 hrs. Total number of actions within software 245, , ,875 12

Amount of usage Not consistent between schools Clear potential confound Hard to resolve Leaving confound in place is a natural bias (matches how software genuinely used in each setting) Setting arbitrary cut-offs on minimum time used and eliminating students, or only looking at the first N minutes of tutor usage, just changes what the selection bias is 13

To address these confounds Data was analyzed in two ways – Using all data (more ecologically valid) – Using a time-slice consisting of the 3 rd -8 th hours (minutes ) of each student’s usage of the software (more controlled) This time-slice will not be as representative of the usage in each school, but avoids the confound of total usage time 3 rd -8 th hours were selected, because the initial 2 hours likely represent interface learning, which are not representative of overall tutor use 14

Off-task behavior assessed using Baker’s (2007) Latent Response Model machine- learned detector of off-task behavior – Trained using data from students using a Cognitive Tutor for Middle School Mathematics in several suburban schools Age range was moderately older in this study than in original training data But off-task behavior is similar in nature within the two populations: ceasing to use the software for a significant period of time without seeking help from teacher or software – Validated to generalize to new students and across Cognitive Tutor lessons

Gaming assessed using Baker & de Carvalho’s (2008) Latent Response Model machine-learned detector of gaming the system – Trained using data from age-similar population of students using Algebra Cognitive Tutor in suburban schools – Approach validated to generalize between students and between Cognitive Tutor lessons (Baker, Corbett, Roll, & Koedinger, 2008) – Predicts robust learning in college Genetics (Baker, Gowda, & Corbett, 2011)

Carelessness assessed using Baker, Cobett, & Aleven’s (2008) machine- learned detector of contextual slip – Trained and cross-validated using data from year- long use of same Geometry Cognitive Tutor in suburban schools – Detectors transfer from USA to Philippines and vice-versa (San Pedro et al., 2011)

Application of Models Each model was applied to the full data set and time-slice data set from each school 18

% Off-Task Urban school Suburban school Rural school Full Data Set34.1% (18.0%) 15.4% (20.7%) 20.4% (13.3%) Hours % (22.8%) 16.5% (27.5%) 21.0% (16.5%)

% Off-Task Urban school Suburban school Rural school Full Data Set34.1% (18.0%) 15.4% (20.7%) 20.4% (13.3%) Hours % (22.8%) 16.5% (27.5%) 21.0% (16.5%) All differences in color statistically significant at p<0.05, using Tukey’s HSD

% Off-Task Urban school Suburban school Rural school Full Data Set34.1% (18.0%) 15.4% (20.7%) 20.4% (13.3%) Hours % (22.8%) 16.5% (27.5%) 21.0% (16.5%) All differences in color statistically significant at p<0.05, using Tukey’s HSD

% Off-Task Urban school Suburban school Rural school Full Data Set34.1% (18.0%) 15.4% (20.7%) 20.4% (13.3%) Hours % (22.8%) 16.5% (27.5%) 21.0% (16.5%) All purple columns statistically significantly different at p<0.05

Overall Model Goodness Full data set: School explains 6.4% of variance in time off-task Hours 3-8: School explains 1.2% of variance in time off-task

% Gaming the System Urban school Suburban school Rural school Full Data Set7.4% (2.2%) 6.9% (3.1%) 6.6% (1.7%) Hours % (1.9%) 5.9% (7.3%) 6.4% (2.2%)

% Gaming the System Urban school Suburban school Rural school Full Data Set7.4% (2.2%) 6.9% (3.1%) 6.6% (1.7%) Hours % (1.9%) 5.9% (7.3%) 6.4% (2.2%) All differences in color statistically significant at p<0.05, using Tukey’s HSD

% Gaming the System Urban school Suburban school Rural school Full Data Set7.4% (2.2%) 6.9% (3.1%) 6.6% (1.7%) Hours % (1.9%) 5.9% (7.3%) 6.4% (2.2%) Note that the relationship in gaming flips between data sets (but the change is all in the urban school)

% Gaming the System Urban school Suburban school Rural school Full Data Set7.4% (2.2%) 6.9% (3.1%) 6.6% (1.7%) Hours % (1.9%) 5.9% (7.3%) 6.4% (2.2%) All purple columns statistically significantly different at p<0.05

Overall Model Goodness Full data set: School explains 1.1% of variance in gaming Hours 3-8: School explains 1.7% of variance in gaming Student, tutor lesson, and problem all found to predict significantly larger proportion of variance (Baker, 2007; Baker et al, 2009; Muldner et al, 2010)

Slip Probability Urban school Suburban school Rural school Full Data Set0.50 (0.07) 0.32 (0.11) 0.27 (0.13) Hours (0.08) 0.44 (0.17) 0.33 (0.18)

Slip Probability Urban school Suburban school Rural school Full Data Set0.50 (0.07) 0.32 (0.11) 0.27 (0.13) Hours (0.08) 0.44 (0.17) 0.33 (0.18) All differences in color statistically significant at p<0.05, using Tukey’s HSD

Slip Probability Urban school Suburban school Rural school Full Data Set0.50 (0.07) 0.32 (0.11) 0.27 (0.13) Hours (0.08) 0.44 (0.17) 0.33 (0.18) All differences in color statistically significant at p<0.05, using Tukey’s HSD

Slip Probability Urban school Suburban school Rural school Full Data Set0.50 (0.07) 0.32 (0.11) 0.27 (0.13) Hours (0.08) 0.44 (0.17) 0.33 (0.18) All purple columns statistically significantly different at p<0.05; slip probabilities systematically higher in hours 3-8

Overall Model Goodness Full data set: School explains 16.5% of variance in slip probability Hours 3-8: School explains 11.1% of variance in slip probability

Time-slices agreed More off-task behavior and carelessness in the urban school than in the other two schools

Comments Significant poverty in both urban and rural schools Some factor other than simply socio-economic status explains higher frequency of off-task behavior and carelessness in urban school

Potential hypotheses Differences in teacher expertise (urban teachers in USA have higher turnover and are usually less experienced) Differences in schools’ facilities, equipment (e.g. computers), and physical environment Differences in students’ cultural backgrounds Your hypothesis welcome!

Time-slices disagreed On how much gaming the system occurred in the urban school Less gaming earlier in the year, more gaming later in the year Greater novelty effect in urban students?

Another interesting finding Carelessness dropped significantly more over the course of the school year in the suburban and rural schools than in the urban school Some factor caused the suburban and rural students to become more diligent during the school year – This didn’t happen in the urban school – Not clear why not

Obvious limitations and flaws? In the application of discovery with models

Obvious limitations and flaws: How fatal are they? One school per category Different usage pattern in different schools Detectors only formally validated for suburban students Your ideas?

Discovery with Models: Here There Be Monsters

“Rar.”

Discovery with Models: Here There Be Monsters It’s really easy to do something badly wrong, for some types of “Discovery with Models” analyses No warnings when you do

Think Validity Validity is always important for model creation Doubly-important for discovery with models – Discovery with Models almost always involves applying model to new data – How confident are you that your model will apply to the new data?

Challenges to Valid Application What are some challenges to valid application of a model within a discovery with models analysis?

Challenges to Valid Application Is model valid for population? Is model valid for all tutor lessons? (or other differences) Is model valid for setting of use? (classroom versus homework?) Is the model valid in the first place? (especially important for knowledge engineered models)

That said Part of the point of discovery with models is to conduct analyses that would simply be impossible without the model Many types of measures are used outside the context where they were tested – Questionnaires in particular The key is to find a balance between paralysis and validity – And be honest about what you did so that others can replicate and disagree

Example Different types of validity concerns

The WPI-ASU squabble

Baker (2007) used a machine-learned detector of gaming the system to determine whether gaming the system is better predicted by student or tutor lesson The detector was validated for a different population than the research population (middle school versus high schools) The detector was validated for new lessons in 4 cases, but not for full set

The WPI-ASU squabble Muldner et al. (2011) used a knowledge- engineered detector of gaming the system to determine whether gaming the system is better predicted by student or tutor lesson The detector was not validated, but trusted because of face validity

The WPI-ASU squabble Baker (2007) found that gaming better predicted by lesson Muldner et al. (2011) found that gaming better predicted by student Which one should we trust?

The WPI-ASU squabble Recent unpublished research by the two groups working together Found that a human coder’s labeling of clips where the two detectors disagree… Achieved K=0.17 with the ML detector And K= with the KE detector

The WPI-ASU squabble Which one should we trust? (Neither?)

ML and KE don’t always disagree In Baker et al. (2008) Baker and colleagues conducted two studies correlating ML detectors of gaming to learner characteristics questionnaires Walonoski and Heffernan conducted one study correlating KE detector of gaming to learner characteristics questionnaires Very similar overall results!

Converging evidence increases our trust in the methods!

Models on top of Models

Another area of Discovery with Models is composing models out of other models Examples: – Models of Gaming the System and Help-Seeking use Bayesian Knowledge-Tracing models as components (Baker et al., 2004, 2008a, 2008b; Aleven et al., 2004, 2006) – Models of Preparation for Future Learning use models of Gaming the System as components (Baker et al., 2011) – Models of Affect use models of Off-Task Behavior as components (Baker et al., in press)

Models on top of Models When I talk about this, people often worry about building a model on top of imperfect models Will the error “pile up”?

Models on top of Models But is this really a risk? If the final model successfully predicts the final construct, do we care if the model it uses internally is imperfect? What would be the costs?

What are some Other examples of discovery with models analyses?

Examples Sweet, can you tell us about San Pedro, M.O.C., Baker, R.S.J.d., Rodrigo, M.M. (2011) The Relationship between Carelessness and Affect in a Cognitive Tutor. Proceedings of the 4th bi-annual International Conference on Affective Computing and Intelligent Interaction.

Examples Mike Wixon, can you tell us about Hershkovitz, A., Wixon, M., Baker, R.S.J.d., Gobert, J., Sao Pedro, M. (2011) Carelessness and Goal Orientation in a Science Microworld. Poster paper. Proceedings of the 15th International Conference on Artificial Intelligence in Education,

Examples Mike Sao Pedro, can you tell us about Sao Pedro, M.A., Gobert, J., Baker, R.S.J.d. (in press) The Development and Transfer of Data Collection Inquiry Skills across Physical Science Microworlds. To be presented at the American Educational Research Association Conference.

Comments? Questions?

Asgn. 9 Questions? Comments?

Next Class Wednesday, April 9 3pm-5pm AK232 Missing Data and Imputation Readings Schafer, J.L., Graham, J.W. (2002) Missing Data: Our View of the State of the Art. Psychological Methods, 7 (2), Assignments Due: 9. IMPUTATION

The End