Building a predictive model to enhance students' self-driven engagement
Moletsane Moletsane
T: +27(0)51 401 9111 | info@ufs.ac.za | www.ufs.ac.za


Overview
- Introduction: motivation for a sensitivity tool
- Data: criteria for inclusion of variables; variables used
- Modelling: the Random Forest modelling process; evaluation of the model
- The "what-if" tool

What is student engagement?
Student engagement measures provide information about:
- What students do: the time and energy devoted to educationally purposeful activities
- What institutions do: using effective educational practices to induce students to do the right things
With the aim of channelling student energy towards the activities that matter.

What do we learn from SE surveys?
Having reflected on what student engagement is, it is important to explore what these measures tell us about the quality of teaching and learning. In the absence of reliable indicators of actual student learning, SE surveys serve as "process indicators or proxies for student learning outcomes" (Banta, Pike & Hansen, 2009; Kuh, 2009).

Is SE data shared with students?
There is little use of student engagement data by students themselves, and similarly little by technology committees/groups in the institutions (NSSE, 2014). http://www.jsu.edu/oira/reports_pdf/National_Survey_of_Student_Engagement.pdf
"…this principle establishes the need to establish what information students will need in order to make more informed decisions regarding their learning journeys as basis for all collection, analysis and use of student data."
"The collection, analysis and use of student data therefore needs to primarily reflect the interests, values and priorities of students."

How can we best share SE data with students?
In a manner that:
- Guides students' effective educational behaviours and encourages more informed decisions about their learning
- Reflects students' interests
- Does not violate students' privacy
- Is user friendly

How can we best share SE data?
Possible methods include:
- Creating an annual report for students
- Releasing snippets of data at certain time intervals (social media, posters, email, SMS)
- Publishing SE articles in varsity magazines
- Using SE data during the advising process
- Providing students with aggregated data
- A web-based prediction tool that implements a model built on SE data

What is the prediction tool?
- A prediction model (we use a machine learning technique for the predictive modelling)
- Implemented in a web interface (built in the R environment)
- Making reactive predictions from students' inputs on the tool
This allows students to:
- Explore which educational behaviours lead to a higher chance of success, encouraging more informed decisions about their learning
- Ask "what-if" questions, and then find answers
- Obtain explanations of the reactive predictions
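The "what-if" interaction can be sketched as a function that re-scores a student's inputs after changing one behaviour. This is a minimal illustration with hypothetical feature names and synthetic data, written in Python with scikit-learn (the actual tool was built in R):

```python
# Illustrative "what-if" helper (hypothetical names and data): given a
# fitted classifier and a student's current inputs, change one
# engagement behaviour and compare the predicted chances of success.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def what_if(model, student, feature_names, feature, new_value):
    """Return (current, hypothetical) probabilities of success."""
    current = np.asarray(student, dtype=float).reshape(1, -1)
    hypothetical = current.copy()
    hypothetical[0, feature_names.index(feature)] = new_value
    p_now = model.predict_proba(current)[0, 1]
    p_new = model.predict_proba(hypothetical)[0, 1]
    return p_now, p_new

# Synthetic example: success depends on the (hypothetical) "study_hours".
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))
y = (X[:, 0] > 5).astype(int)
names = ["study_hours", "social_hours"]
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
p_now, p_new = what_if(model, [3.0, 4.0], names, "study_hours", 8.0)
```

A reactive web front end would simply call such a function each time the student moves a slider.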

What data do we have?
- Student engagement data: UFS data from 2013 to 2016
- Biographical data
- Institutional data: students' outcomes (e.g. we use the proportion of modules passed) and students' credit and module load

Should we include all the data?
Biographical data: since we intend to share the tool with students, we believe biographical data (e.g. race, disability or gender) may be interpreted in a prejudiced manner.
Non-actionable data: for the purpose of the tool, some non-actionable variables (e.g. faculty, residence status) were excluded from the prediction model despite being modest predictors.

SASSE data
UFS data from 2013 to 2016 comprises 6213 respondents and 190 variables. Only 4602 of the observations could be matched to the institutional data.
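The drop from 6213 survey responses to 4602 matched observations comes from joining the survey to institutional records. A hedged sketch of that matching step, with hypothetical column names and toy data (pandas shown for illustration; the original pipeline was in R):

```python
# Hypothetical sketch: match survey respondents to institutional records
# on a shared student identifier; an inner join keeps only students
# present in both sources, which is why matched rows < survey rows.
import pandas as pd

survey = pd.DataFrame({"student_id": [1, 2, 3, 4],
                       "engagement_score": [55, 70, 62, 80]})
records = pd.DataFrame({"student_id": [2, 3, 5],
                        "modules_passed_prop": [0.8, 0.5, 0.9]})

matched = survey.merge(records, on="student_id", how="inner")
```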

How do we choose which variables to use?
Variable importance: the machine learning technique we use has a built-in variable selection method. Based on cross-validation principles, it ranks the variables by the loss of accuracy the model suffers when it is fit without that variable. From the top-ranking variables, we select the 8 most predictive variables for our model.

How do we choose which variables to use?
From the top-ranking variables, we select the best 5 to expose in our interface.
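The ranking idea described above — score the model, disturb one variable, and measure the accuracy lost — can be sketched with permutation importance. This is an illustrative Python/scikit-learn stand-in (the original model used the random forest's built-in importance in R), on synthetic data where only the first two features carry signal:

```python
# Hedged sketch of importance ranking: permute one feature at a time on
# held-out data and rank features by the resulting drop in accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # features 0 and 1 matter

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=10,
                                random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first
```

One would then keep the top-ranked variables (8 for the model, 5 for the interface, per the slides).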

Which variables are most important?
MeanDecreaseGini measures variable importance using the Gini impurity index that the algorithm uses to choose splits during training. A common misconception is that this metric refers to the Gini coefficient used for assessing model performance (which is closely related to the AUC); it does not.
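To make the distinction concrete, the Gini *impurity* behind MeanDecreaseGini is a node-purity measure, not a performance measure. A minimal sketch:

```python
# Gini impurity of a set of class labels: 1 - sum_k p_k^2.
# A pure node scores 0; a 50/50 binary node scores 0.5.
def gini_impurity(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

MeanDecreaseGini for a variable is the total reduction in this impurity over all splits that use the variable, averaged over the trees in the forest.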

Algorithm
For k = 1 to K:
1. Draw a bootstrap sample of size n from the data.
2. Grow a random forest tree on the bootstrapped data by repeating, at each node:
   a. select m variables at random from the p variables;
   b. pick the best split among the m variables;
   c. split the node into two child nodes.
Output the ensemble of K trees; make the final prediction by majority vote over the ensemble.
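The steps above can be sketched compactly using scikit-learn's single-tree learner as a building block (an illustrative Python stand-in; parameter values are arbitrary):

```python
# Minimal random forest following the slide's algorithm: K bootstrap
# samples of size n, each tree restricted to m random variables per
# split (max_features=m), final prediction by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_fit(X, y, K=25, m=2, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    n = len(X)
    for _ in range(K):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of size n
        tree = DecisionTreeClassifier(
            max_features=m, random_state=int(rng.integers(1 << 30)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def random_forest_predict(trees, X):
    votes = np.stack([t.predict(X) for t in trees])  # shape (K, n_samples)
    # Majority vote over the ensemble (binary 0/1 labels assumed).
    return (votes.mean(axis=0) > 0.5).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0).astype(int)
forest = random_forest_fit(X, y)
preds = random_forest_predict(forest, X)
```

In practice one would use a packaged implementation (e.g. R's randomForest or scikit-learn's RandomForestClassifier), which also provides the importance measures discussed earlier.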

Overview of the random forest model
(Diagram) The training data is resampled into Sample 1 … Sample k; the learning algorithm grows Classifier 1 … Classifier k, one per sample; the combined classifiers then produce a single prediction for new data.

Model results
Prediction with all (177) variables (overall error rate 20.97%):
- False positive rate = 20.8%
- False negative rate = 21.08%
Prediction with the selected (8) variables (overall error rate 23.64%):
- False positive rate = 24.3%
- False negative rate = 23.5%
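For reference, the quoted rates come from the 2x2 confusion matrix of predicted versus actual outcomes. A small sketch with made-up counts (the slide's actual confusion matrices are not reproduced here):

```python
# False positive rate = FP / (FP + TN): share of actual negatives
# predicted positive. False negative rate = FN / (FN + TP): share of
# actual positives predicted negative.
def error_rates(tp, fp, fn, tn):
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    return fpr, fnr

# Hypothetical counts purely for illustration.
fpr, fnr = error_rates(tp=80, fp=10, fn=20, tn=90)
```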

The tool (Part 1 of 2)

The tool (Part 2 of 2)

Thank you T: +27(0)51 401 9111 | info@ufs.ac.za | www.ufs.ac.za