Berkeley Parlab

A Comparison of Error Metrics for Learning Model Parameters in Bayesian Knowledge Tracing

Asif Dhanani*, Seung Yeon Lee*, Phitchaya Mangpo Phothilimthana*, and Zachary Pardos
University of California, Berkeley
* Asif Dhanani, Seung Yeon Lee, and Phitchaya Mangpo Phothilimthana contributed equally to this work and are listed alphabetically.

1. INTRODUCTION

The choice of error metric is essential for learning the model parameters of Bayesian Knowledge Tracing (BKT): prior, learn, guess, and slip. Common metrics are:
- log-likelihood (LL)
- root-mean-squared error (RMSE)
- area under the ROC curve (AUC)

Pardos and Yudelson [1] compare different error metrics to investigate which estimates the moment of learning most accurately. Our work extends this comparison by looking more closely at the relationship between the three popular error metrics LL, RMSE, and AUC, and in particular at how they relate to one another near the ground truth point. To assess whether LL, RMSE, or AUC is the best error metric to use in parameter search for the BKT model, we synthesized 26 datasets by simulating student responses from diverse, known ground truth parameter values.

[1] Z. A. Pardos and M. V. Yudelson. Towards moment of learning accuracy. In Proceedings of the 1st AIED Workshop on Simulated Learners.

2. CORRELATIONS TO THE GROUND TRUTH

Methodology
- Evaluated LL, RMSE, and AUC at every point of the prior/learn/guess/slip parameter space, sampled at a 0.05 interval.
- Computed the correlations between the values produced by each error metric and the Euclidean distances from the corresponding parameter points to the ground truth.
- Tested whether the correlation for any one error metric is significantly stronger than for the others.

Results
The average LL, RMSE, and AUC correlations were , , and , respectively. A one-tailed paired t-test revealed RMSE to be statistically significantly better than both LL and AUC.

One-tailed paired t-test statistics:
Comparison    ∆ of correlations    t    p-value
RMSE > LL                               <<
RMSE > AUC
LL > AUC

Correlation comparisons of error metrics:
Comparison    Number of datasets
RMSE > LL     26
RMSE < LL     0
RMSE > AUC    18
RMSE < AUC    8
LL > AUC      15
LL < AUC      11

Figure: Values calculated by different error metrics vs. distances to the ground truth.

3. DISTRIBUTION OF VALUES

Methodology
- Visualized the LL and -RMSE values of all points over the two-dimensional guess/slip space at a 0.02 interval, fixing the prior and learn parameters to their ground truth values.
- In the heat maps, red represents low values and blue represents high values; the white dots mark the ground truth.

Results
- The gradient of the LL heat map is very high at the beginning (far from the ground truth) and very low at the end (close to the ground truth).
- In the -RMSE heat map, the change in the gradient is low. Additionally, the darkest blue region in the -RMSE heat map is smaller than that in the LL heat map. This suggests that we may be able to narrow down the region around the ground truth more precisely with RMSE.

Figure: Heat maps (panels: LL, -RMSE, AUC).

4. DIRECT COMPARISON: LL AND RMSE

Ground truth parameters of the illustrated dataset: prior = 0.564, learn = 0.8, guess = 0.35, and slip = 0.4.

Methodology
- Plotted LL values against RMSE values.
- Colored each data point by its distance to the ground truth, using the same color range as in the previous section.

Results
- LL values and RMSE values correlate logarithmically.
- A secondary curve, the "hook", is observed in varying sizes among the datasets.
- At a fixed LL value with varied RMSE values, most points in the hook have higher -RMSE values and are closer to the ground truth than the points in the main curve. The same pattern is not seen for a fixed RMSE value with varied LL values.
- After the main curve and the hook converge, RMSE and LL give similar estimates of the ground truth. For the portion of the graph before this point, however, RMSE is a better predictor of the ground truth values.

5. CONCLUSION

We found that RMSE serves as the strongest indicator metric for evaluating the closeness of estimated parameters to the true parameters in the BKT model. RMSE has a significantly higher correlation with the distance from the ground truth on average than both LL and AUC, and it is notably better when the estimated parameter values are not very close to the ground truth. The effectiveness of teaching systems that operate without human supervision relies on their ability to infer the implicit knowledge states of students. We hope that our work can help advance the parameter learning algorithms used in the knowledge tracing model, which in turn can make these teaching systems more effective.
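The evaluation pipeline described above (simulate responses from known ground truth BKT parameters, predict with candidate parameters via the standard BKT forward update, score the predictions, and correlate metric values with the Euclidean distance to the ground truth) can be sketched roughly as follows. This is a minimal, stdlib-only illustration under our own assumptions, not the authors' code: the function names, the student and item counts, and the coarse guess/slip grid (with prior and learn held at the ground truth, as in Section 3) are all ours, and AUC is omitted for brevity.

```python
import math
import random

def simulate_responses(prior, learn, guess, slip, n_students=200, n_items=10, seed=0):
    """Sample binary responses from known ground-truth BKT parameters."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_students):
        known = rng.random() < prior                  # latent knowledge state
        seq = []
        for _ in range(n_items):
            p = (1 - slip) if known else guess
            seq.append(1 if rng.random() < p else 0)
            if not known and rng.random() < learn:    # learning transition
                known = True
        data.append(seq)
    return data

def predict(prior, learn, guess, slip, seq):
    """Predicted P(correct) before each response, with Bayesian updates."""
    L, preds = prior, []
    for obs in seq:
        p_correct = L * (1 - slip) + (1 - L) * guess
        preds.append(p_correct)
        # Posterior P(known | observed response), then the learn transition.
        post = L * (1 - slip) / p_correct if obs else L * slip / (1 - p_correct)
        L = post + (1 - post) * learn
    return preds

def rmse(params, data):
    se, n = 0.0, 0
    for seq in data:
        for p, obs in zip(predict(*params, seq), seq):
            se += (obs - p) ** 2
            n += 1
    return math.sqrt(se / n)

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

# Toy run: grid over guess/slip with prior/learn fixed at the truth, then
# correlate -RMSE with the Euclidean distance to the ground truth point.
truth = (0.564, 0.8, 0.35, 0.4)             # the Section 4 example dataset
data = simulate_responses(*truth)
grid = [i * 0.05 for i in range(1, 10)]     # coarse 0.05 grid on (0, 0.5)
neg_rmses, dists = [], []
for g in grid:
    for s in grid:
        candidate = (truth[0], truth[1], g, s)
        neg_rmses.append(-rmse(candidate, data))
        dists.append(math.hypot(g - truth[2], s - truth[3]))
r = pearson(neg_rmses, dists)               # expected to be strongly negative
```

The prediction step is the standard BKT forward update: predict P(L)(1 - slip) + (1 - P(L)) * guess, revise P(L) by Bayes' rule on the observed response, then apply the learn transition. A negative `r` here corresponds to the poster's correlation analysis: the higher a candidate's -RMSE, the closer it tends to be to the ground truth.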