Analysis on Accelerated Learning Cohorts


Analysis on 2013-2018 Accelerated Learning Cohorts
Benjamin Brown, Grace Rusth
The Office of Educational Partnerships and Outreach, Oregon Institute of Technology
Contact Benjamin: benjaminbrown.cpe@gmail.com

Project Goals

Data Analysis
- Generate functional statistics to assist Oregon Tech's Strategic Enrollment Management division in targeted recruitment of current high school non-degree-seeking students

Machine Learning Algorithm
- Find a usable machine learning prediction model that is reasonably accurate (greater than 75% prediction accuracy)
- Emphasize accuracy in predicting who will matriculate over who will not matriculate

Overview

- Data gathered for the 2013-2018 cohorts
- 22716 samples with 12 provided features and 1 generated feature
- The dataset includes students who have started the dual credit programs in the last 4 years but have not yet graduated
- All students in the 2013-2015 cohorts should have graduated

Data Analysis
- Ran statistical analysis on the data provided by Oregon Tech's Office of Institutional Research using Excel functions, charts, and graphs to aid in explanation

Machine Learning Algorithm
- Programmed in Python using the scipy, numpy, pandas, sklearn, graphviz, and matplotlib modules
- Ran five different machine learning algorithms to compare accuracies and determine the best model for prediction

10-Fold CV Score Comparison

Model                            Mean     Standard Deviation
Logistic Regression              0.7705   0.01706
Linear Discriminant Analysis     0.7404   0.01956
KNN (k = 5)                      0.7564   0.01475
Support Vector Classification    0.7803   0.01984
Binary Decision Tree             0.794    0.01602
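A comparison like the one above could be produced with scikit-learn's cross_val_score. The following is only a minimal sketch: the cohort data is not available here, so a synthetic dataset from make_classification stands in for it, and the model settings (sklearn defaults apart from k = 5 for KNN) are assumptions rather than the authors' configuration. The scores it prints will not match the slide.

```python
# Sketch only: synthetic data stands in for the cohort dataset (assumption).
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# 13 features mirrors "12 provided + 1 generated"; the values are random.
X, y = make_classification(n_samples=600, n_features=13, random_state=0)

# The five model families named on the slide, with assumed settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    "KNN (k = 5)": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Classification": SVC(),
    "Binary Decision Tree": DecisionTreeClassifier(random_state=0),
}

cv_summary = {}
for name, model in models.items():
    # cv=10 runs 10-fold cross-validation and returns one accuracy per fold.
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    cv_summary[name] = (scores.mean(), scores.std())
    print(f"{name}: mean={scores.mean():.4f}, std={scores.std():.5f}")
```

Reporting the standard deviation alongside the mean, as the slide does, shows how much each model's accuracy varies from fold to fold.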

Methods

Data Analysis
- By subject comparisons
- By school comparisons
- 2013-2015 vs 2016-2018 cohort comparisons
- Full dataset comparisons

Binary Decision Tree
- A tree depth of 5 is optimal with this dataset to avoid overfitting
- Final model predicts 2016-2018 cohort matriculations from a decision tree trained on the 2013-2015 cohort
- Validated by splitting the 2013-2015 cohort before training the model

Assumptions:
- Matriculation can be predicted.
- All of the included variables (12 given variables: Term, Prefix, Credits, Student Type, Gender, High School, Metro Area) can be used to assist in predicting matriculation.
- The 2013-2015 cohorts have all had ample time to graduate.
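The decision tree workflow described above (hold out part of the older cohort, fit a depth-5 tree, then predict for the newer cohort) might look roughly like the sketch below. Everything about the data is an assumption: synthetic samples replace the real cohorts, the 75/25 split ratio is invented for illustration, and names such as X_old and X_new are hypothetical.

```python
# Sketch only: synthetic data and an assumed 75/25 split (not from the slides).
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# One synthetic dataset, sliced into an "older" labeled cohort and a
# "newer" cohort whose outcomes are treated as unknown.
X, y = make_classification(n_samples=1500, n_features=13,
                           weights=[0.9], random_state=0)
X_old, y_old = X[:1000], y[:1000]   # stand-in for the 2013-2015 cohort
X_new = X[1000:]                    # stand-in for the 2016-2018 cohort

# Split the older cohort before training so validation data stays unseen.
X_train, X_val, y_train, y_val = train_test_split(
    X_old, y_old, test_size=0.25, stratify=y_old, random_state=0)

# max_depth=5 caps tree growth, matching the depth the slide reports.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, tree.predict(X_val))
new_predictions = tree.predict(X_new)
print(f"validation accuracy: {val_accuracy:.4f}")
print(f"predicted matriculants in newer cohort: {int(new_predictions.sum())}")
```

Capping the depth is one simple way to limit overfitting; the slide's choice of 5 is specific to the authors' dataset and would need to be re-tuned for other data.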

Machine Learning Results

Decision Tree Predictions with Validation Set

Accuracy score: 0.9156

Confusion matrix (Actual = Rows, Predicted = Columns):

              Predicted No   Predicted Yes
Actual No     14634          947
Actual Yes    425            246

Classification report:

       Precision   Recall   F1-score   Total
No     0.97        0.94     0.96       15581
Yes    0.21        0.37     0.26       671

- Precision: correct / column total (correctly predicted Yes or No / total predicted Yes or No)
- Recall: correct / row total (correctly predicted Yes or No / actual Yes or No)
- F1: harmonic mean of precision and recall (2*precision*recall/(precision+recall))

10-Fold Cross Validation Accuracy

Min: 0.7644
Max: 0.8173
Mean: 0.794
Standard Deviation: 0.01602
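As a worked check of the definitions above, the reported accuracy, precision, recall, and F1 values can be recomputed directly from the confusion matrix. The sketch below uses only the four counts given on the slide; the helper name class_metrics is hypothetical.

```python
# Confusion matrix from the slide: rows = actual, columns = predicted,
# class order [No, Yes].
cm = [[14634, 947],
      [425, 246]]

def class_metrics(cm, k):
    """Return (precision, recall, f1, total) for class index k."""
    correct = cm[k][k]
    predicted_total = sum(row[k] for row in cm)  # column total
    actual_total = sum(cm[k])                    # row total
    precision = correct / predicted_total
    recall = correct / actual_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, actual_total

# Accuracy is the diagonal (correct predictions) over all samples.
accuracy = (cm[0][0] + cm[1][1]) / sum(sum(row) for row in cm)
no_metrics = class_metrics(cm, 0)
yes_metrics = class_metrics(cm, 1)

print(f"accuracy: {accuracy:.4f}")  # 0.9156
for label, (p, r, f1, total) in (("No", no_metrics), ("Yes", yes_metrics)):
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} total={total}")
```

The recomputed values agree with the slide: 0.97 / 0.94 / 0.96 for "No" and 0.21 / 0.37 / 0.26 for "Yes".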

Final Decision Tree

Results

Data Analysis
- Schools geographically close to Oregon Tech's main campus (within ~106 miles) have higher matriculation rates
- Students who take more specialized classes (CST/EE/MFG/etc. over MATH/ENG/WRI/etc.) are more likely to matriculate to Oregon Tech
- Students who matriculate take more credits than those who do not (+4 credits on average)

Binary Decision Tree
- Determined that matriculation can be predicted with acceptable accuracy using decision trees
- Garnered interest from administrators for further applications of machine learning algorithms within Oregon Tech

References and Acknowledgements

- Idea originated through collaboration between Grace Rusth and Benjamin Brown
- Data retrieved by Oregon Tech's Office of Institutional Research
- Machine learning taught by Dr. Rosanna Overholser, Assistant Professor, Oregon Tech
- Advice and support from Joseph Reid, Associate Professor, Oregon Tech