The European Conference on e-Learning, 2017/10


On Predicting Student Performance Using Low-Rank Matrix Factorization Techniques
Stephan Lorenze, University of Copenhagen

Outline: Question, Analysis, DataSet

Purpose: Accurate prediction of students' abilities, so that teachers can provide remedial support to weak students and recommend appropriate tasks to excellent students.

Question: How does a student perform on an unseen quiz? We predict the scores of students in the quiz system of the Clio Online learning platform, casting the task as a matrix completion problem.

A conclusion: We conclude that since the active students in the platform perform very similarly and the current version of the data set is very sparse, a very low-rank approximation can capture enough information.

Assumption: there is a small number of latent features revealing the students' and tasks' preferences. This assumption is natural and has been widely used in research and application work on educational data (Barnes 2005; Desmarais 2011; van der Linden and Hambleton 2010; Bydzovska 2016). It is worth noting that there is prior work using matrix factorization for predicting student performance (Desmarais 2011; Elbadrawy et al. 2016; Thai-Nghe et al. 2010). Since matrix factorization in general requires iterative methods (Koren, Bell and Volinsky 2009; Lee and Seung 2000; Srebro and Jaakkola 2003) to solve the non-convex optimization, the initialization stage is essential in the design of successful methods.

Further: If we do not restrict the number of degrees of freedom in the completed matrix, the problem is underdetermined, since we can assign arbitrary values to the missing entries. Therefore, one often tries to estimate a low-rank matrix from the sparse, observed matrix, and we can cast the matrix completion problem as weighted low-rank matrix factorization (LRMF).

Therefore we use the weighted low-rank matrix factorization (LRMF), where the weights are 1 for observed entries and 0 for missing entries.

Objective functions: the standard weighted LRMF minimizes ||W ∘ (A − UV^T)||_F^2 over U and V, where W is the 0/1 observation mask; the weighted non-negative LRMF minimizes the same objective under the constraint that all entries of U and V are non-negative.
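As a minimal sketch (not the authors' code), the two objectives can be evaluated as follows; the toy matrices and the helper name `weighted_lrmf_objective` are hypothetical:

```python
import numpy as np

# Hypothetical small example: A holds scores, W masks observed entries
# (1 = observed, 0 = missing), as on the previous slide.
rng = np.random.default_rng(0)
n, m, k = 6, 4, 2                       # students, quizzes, rank
A = rng.random((n, m))                  # score matrix (only masked entries matter)
W = (rng.random((n, m)) < 0.5).astype(float)

def weighted_lrmf_objective(A, W, U, V):
    """Standard weighted LRMF objective: || W * (A - U V^T) ||_F^2."""
    return np.sum(W * (A - U @ V.T) ** 2)

U = rng.random((n, k))
V = rng.random((m, k))
standard = weighted_lrmf_objective(A, W, U, V)

# The non-negative variant uses the same objective but constrains U, V >= 0;
# here U and V are non-negative by construction already.
nonneg = weighted_lrmf_objective(A, W, np.abs(U), np.abs(V))
```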

Contribution: We view the problem of predicting the scores of students from their partially observed scores as the weighted LRMF problem. We study the well-known Expectation-Maximization (EM) procedure (EM algorithm n.d.) for solving it. We investigate the non-negative constrained problem, i.e. all entries of the estimated low-rank matrices U, V are non-negative, and make use of the EM method for solving this constrained problem as well. Since the behaviour of the EM method is sensitive to the initial values, we propose using the singular value decomposition (SVD) (Golub and Loan 1999) and the non-negative double SVD (Boutsidis and Gallopoulos 2008) as the initialization stages for the standard weighted and non-negative weighted LRMF, respectively. Experimental results show that the proposed initialization methods lead to fast convergence for both the constrained and unconstrained problems.
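One common EM-style scheme for weighted LRMF, which matches our reading of this slide (the function name and toy data are hypothetical, and the authors' implementation may differ), alternates between imputing missing entries with the current low-rank estimate and refitting a truncated SVD:

```python
import numpy as np

def em_weighted_lrmf(A, W, k, n_iter=50):
    """EM-style weighted LRMF (sketch): E-step fills missing entries with
    the current rank-k estimate; M-step refits a truncated SVD to the
    completed matrix. Initialized by SVD of the mean-filled matrix."""
    X = np.where(W > 0, A, A[W > 0].mean())    # missing entries start at the observed mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :k] * s[:k]) @ Vt[:k]        # best rank-k fit to X (M-step)
        X = np.where(W > 0, A, L)              # keep observed, impute missing (E-step)
    return L

# Hypothetical toy data: an exactly rank-1 matrix with ~30% of entries hidden.
rng = np.random.default_rng(1)
A = np.outer(rng.random(8), rng.random(5))
W = (rng.random(A.shape) < 0.7).astype(float)
L = em_weighted_lrmf(A, W, k=1)
```

Each iteration cannot increase the weighted objective, which is why a good initialization mainly affects how fast (and to which local optimum) the procedure converges.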

Contribution: We implement and measure the performance of the proposed methods on new real-life data from the Clio Online learning platform. We compare the mean squared error of the EM method with a simple baseline approach based on the global mean score and student/quiz bias. Surprisingly, the advanced EM method is only slightly better than or comparable to the baseline approach. We visualize the eigenvalue spectrum of a dense subset of the data set to explain our interesting findings. We conclude that since the active students in the platform perform very similarly and the current version of the data set is very sparse, a very low-rank approximation can capture enough information. This means that the simple baseline approach (rank at most 2) achieves similar performance compared to other advanced methods. We believe that by restricting the quiz data set, e.g. only including quizzes with a time limit, students will behave differently and the advanced EM methods might improve the prediction accuracy.
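The baseline can be sketched as follows; this is a minimal reading of "global mean score and student/quiz bias" (the function name and the exact bias estimation are our assumptions, with hypothetical toy numbers):

```python
import numpy as np

def baseline_predict(A, W):
    """Baseline sketch: global mean score plus a per-student and a per-quiz
    bias, each estimated as the mean deviation on observed entries.
    The prediction mu + b_i + b_j is a matrix of rank at most 2."""
    mu = A[W > 0].mean()
    row_cnt = W.sum(axis=1, keepdims=True)
    col_cnt = W.sum(axis=0, keepdims=True)
    # np.where / np.maximum guard against students or quizzes with no answers.
    b_student = np.where(row_cnt > 0,
                         (W * (A - mu)).sum(axis=1, keepdims=True) / np.maximum(row_cnt, 1),
                         0.0)
    b_quiz = np.where(col_cnt > 0,
                      (W * (A - mu - b_student)).sum(axis=0, keepdims=True) / np.maximum(col_cnt, 1),
                      0.0)
    return mu + b_student + b_quiz

# Toy check (hypothetical numbers): if scores are exactly mean + centered
# biases and everything is observed, the baseline recovers them.
mu, bi, bj = 0.5, np.array([0.1, -0.1]), np.array([0.2, 0.0, -0.2])
A = mu + bi[:, None] + bj[None, :]
P = baseline_predict(A, np.ones_like(A))
```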

DATA: We use data from the Danish online learning platform Clio Online (Clio Online n.d.). Their platform includes texts, videos, quizzes, exercises, and more, spanning several different elementary school subjects; we study the performance of students on the quizzes. All repeated attempts at the same quiz are removed, so that only the first attempt remains. The data consists of a list of tuples (i, j, v).
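For illustration (toy numbers, not the real data), the tuple list maps to a score matrix A and an observation mask W; we assume v is the score student i obtained on quiz j:

```python
import numpy as np

# Hypothetical tuples (i, j, v): student i scored v on quiz j.
tuples = [(0, 1, 0.8), (0, 2, 0.5), (1, 0, 0.9), (2, 2, 0.7)]
n, m = 3, 3   # number of students and quizzes in this toy example

A = np.zeros((n, m))   # scores (0 where missing; W tells them apart)
W = np.zeros((n, m))   # observation mask
for i, j, v in tuples:
    A[i, j] = v        # first attempts only, so each (i, j) appears at most once
    W[i, j] = 1.0
```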

Process the DataSet: What we get is a large, sparse matrix. So we extract a subset of the data such that for each student s_i the set contains at least 15 tuples for that student, and for each quiz q_j the set contains at least 15 tuples for that quiz. This means that each student/quiz has at least 15 answers. The resulting data set contains data for n = 1141 students and m = 245 quizzes.