Admission Prediction System

Admission Prediction System
Guided by: Prof. Meiliu Lu
Presented by: Aaishwary Vadodariya, Anand Rawat, Jaidipkumar Patel, Jay Bibodi

Overview
- Problem Statement
- Goals
- Data Overview
- Data Issues
- Data Pre-processing
- Model Implementation
- Demonstration
- Statistical Results & Visual Analysis
- Future Enhancement
- References

Problem Statement
Problem 1:
- Aragon is an international student who wants to pursue his Master's degree in the US.
- He knows the requirements of each college he wants to apply to.
- He has taken all his exams and is now ready to apply.
Problem 2:
- The University of Gondor has close to 1000 applicants for admission.
- If each application takes 5 hours to review manually, the whole set would take approximately 5000 hours.
- This can be avoided by using data on previous admits and rejects.

Goals
- University Selection: find the probability that a student will be admitted to a university before applying.
- Student Selection: develop a model based on previous years' data on students who were admitted to or rejected by a particular university.

Data
- University dataset, for determining the university's decision: 1686 rows with 18 columns.
- Student dataset, for determining a student's probability of admission: 10 datasets, each containing 50 to 200 records.
- Attributes: Work Experience, GRE Score, TOEFL Score, Undergrad University, Name of Student, Result, Major, etc.
- Data source: Facebook community.

Data Issues
- Noisy: containing errors and outliers.
- Unformatted: incompatible data types.
- Inconsistent: containing discrepancies in codes and names.
- Data quality: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data.
- Performance: deteriorates without pre-processing.
- Data skewness.

Data Pre-Processing
- Data Cleaning: raw data -> technically correct data -> consistent data
- Feature Scaling
- Statistical Results

Details
Student Selection:
- The model is built on the Result, GRE, AWA, TOEFL and Percentage columns.
- Missing AWA and TOEFL values are replaced with the mean of the column.
- Categorical data is converted to numeric values.
- Records with no percentage are ignored.
University Selection:
- The model that gives a student's probability of admission to a university is built on the GRE, AWA, TOEFL and Percentage columns.
- Pre-processing is the same as above, except for the second point.
Feature scaling is applied to every column used in the models except the Result column.
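The pre-processing steps for the Student Selection model can be sketched as follows. This is a minimal Python illustration, not the presenters' code (the slides name R packages). The field names `gre`, `awa`, `toefl`, `percentage` and `result`, the sample records, the "Admit"/"Reject" labels, and the use of standardization (zero mean, unit variance) as the scaling method are all assumptions, since the slides do not specify them.

```python
from statistics import mean, pstdev

FEATURES = ["gre", "awa", "toefl", "percentage"]  # hypothetical column names


def preprocess(records):
    """Clean and scale applicant records (a list of dicts); None marks a missing value."""
    # Ignore records where the percentage is not present.
    rows = [dict(r) for r in records if r.get("percentage") is not None]

    # Replace missing AWA and TOEFL values with the column mean.
    for col in ("awa", "toefl"):
        col_mean = mean(r[col] for r in rows if r[col] is not None)
        for r in rows:
            if r[col] is None:
                r[col] = col_mean

    # Change the categorical result to a numeric value (assumed: 1 = admit, 0 = reject).
    for r in rows:
        r["result"] = 1 if r["result"] == "Admit" else 0

    # Scale every feature column; the result column is left untouched.
    for col in FEATURES:
        values = [r[col] for r in rows]
        mu, sigma = mean(values), pstdev(values)
        for r in rows:
            r[col] = (r[col] - mu) / sigma if sigma else 0.0
    return rows


# Made-up sample records, for illustration only.
sample = [
    {"gre": 320, "awa": 4.0, "toefl": 110, "percentage": 78.0, "result": "Admit"},
    {"gre": 300, "awa": None, "toefl": 95, "percentage": 65.0, "result": "Reject"},
    {"gre": 310, "awa": 3.0, "toefl": None, "percentage": None, "result": "Reject"},
    {"gre": 315, "awa": 3.5, "toefl": 105, "percentage": 70.0, "result": "Admit"},
]
cleaned = preprocess(sample)
```

The third sample record is dropped because it has no percentage, and the missing AWA score in the second record is filled with the mean of the remaining AWA scores before scaling.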

Models

Model Implementation
Models and the R packages used to implement them:
- Naïve Bayes: e1071
- SVM Linear: e1071
- SVM Kernel: e1071
- Decision Tree: tree
- Random Forest: randomForest

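University Selection Model
The student data is passed to each of ten models (Model 1, Model 2, Model 3, ..., Model 10), and each model produces its own prediction (Prediction 1, Prediction 2, Prediction 3, ..., Prediction 10).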
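The structure of this slide, one student's data fed to several models with one prediction per model, can be sketched in Python as follows. The stand-in models, the university names, and the scoring rules are hypothetical placeholders rather than the presenters' trained models; the sketch assumes one model per university and only illustrates collecting and ranking the predictions.

```python
def predict_all(student, models):
    """Run one student's features through every model and rank the results."""
    predictions = {name: model(student) for name, model in models.items()}
    # Highest predicted admit probability first.
    return sorted(predictions.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stand-ins for trained models: each maps features to a probability.
models = {
    "University A": lambda s: min(1.0, s["gre"] / 340),
    "University B": lambda s: min(1.0, s["toefl"] / 120),
    "University C": lambda s: 0.5,
}

student = {"gre": 306, "toefl": 114}  # made-up applicant
ranking = predict_all(student, models)
for name, probability in ranking:
    print(f"{name}: {probability:.2f}")
```

Keeping the models in a dictionary keyed by university means adding an eleventh university only requires adding one more entry.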

Demonstration

Statistical Results & Visual Analysis

University Selection
Probability for a student to get an admit in the university before applying to it:

University        Predicted probability
MTU               0.96610169
Clemson           0.90909091
NE_Boston         0.82608696
ASU               0.82352941
IITchicago        0.80000000
RIT               0.76923077
UTD               0.21296296
UTA               0.18867925
UNC               0.18421053
U_southern_cal    0.08163265

Naïve Bayes
[Figure: probability chart using Naïve Bayes]

Student Selection
Training: past years' data (admits and rejects) -> pre-processing techniques -> machine learning models
Prediction: new applicants -> models -> predictions (admits or rejects)

Naïve Bayes
Confusion matrix:
    67     6
    18   108
Error rate = 12.06%

SVM-Linear
Confusion matrix:
    69     4
    21   105
Error rate = 12.56%

SVM-Kernel
Confusion matrix:
    63    10
    16   110
Error rate = 13.06%

Decision Tree

Decision Tree
Confusion matrix:
    59    14
     8   118
Error rate = 11.05%

Random Forest
Number of trees vs. error rate:
- The optimal number of trees lies between 60 and 100; we chose 70.
Legend:
- 0: reject error
- 1: accept error
- OOB: out-of-bag error
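Out-of-bag error relies on the fact that each tree in a random forest is grown on a bootstrap sample, so the records left out of that sample can act as a test set for that tree. The Python sketch below shows only the sampling step, not the randomForest package itself; the record count of 200 and the fixed seed are assumptions made for illustration.

```python
import random


def bootstrap_split(n_records, rng):
    """Draw a bootstrap sample of record indices and return (in_bag, out_of_bag)."""
    # Sample n_records indices with replacement: these train one tree.
    in_bag = [rng.randrange(n_records) for _ in range(n_records)]
    # Records never drawn are out-of-bag for this tree and can be used to test it.
    out_of_bag = sorted(set(range(n_records)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_split(200, rng)
print(f"{len(out_of_bag)} of 200 records are out-of-bag for this tree")
```

For large samples, about 36.8% of the records (1/e) are out-of-bag for any single tree, so every record ends up being tested by a sizable share of the trees without a separate hold-out set.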

Random Forest
Confusion matrix:
    62    11
    10   116
Error rate = 10.55%
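Each reported error rate is the off-diagonal (misclassified) count divided by the total number of records. The short Python check below uses the counts from the slides; the function name and dictionary layout are illustrative, and the slides do not state which axis holds the actual classes and which the predicted ones (this does not affect the error rate).

```python
def error_rate(matrix):
    """Misclassified records (off-diagonal cells) divided by all records."""
    (a, b), (c, d) = matrix
    return (b + c) / (a + b + c + d)


# Counts as reported on the slides.
confusion = {
    "Naive Bayes": [[67, 6], [18, 108]],
    "SVM-Linear": [[69, 4], [21, 105]],
    "SVM-Kernel": [[63, 10], [16, 110]],
    "Decision Tree": [[59, 14], [8, 118]],
    "Random Forest": [[62, 11], [10, 116]],
}

rates = {name: error_rate(m) for name, m in confusion.items()}
best = min(rates, key=rates.get)
for name, rate in rates.items():
    print(f"{name}: {rate:.4f}")
```

All five matrices sum to 199 records, and Random Forest has the lowest error rate of the five, with 21 of 199 records misclassified.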

Demonstration

Learnings
- Data pre-processing is vital to the accuracy of the models.
- Appropriate machine learning techniques and algorithms must be chosen to model the system.
- Graphical representation of the data provides useful insights and can lead to better models.
- The scope must be defined with respect to the dataset.

Future Enhancement
- Create the model with additional parameters such as work experience, technical papers written, and the content of letters of recommendation.
- Create a model based on the graph of admitted vs. enrolled students of previous years to predict the increase or decrease in cutoff scores among applicants.
- Compare different universities based on applied vs. admitted data.

References
- Discussion paper: An introduction to data cleaning with R. Statistics Netherlands, Henri Faasdreef 312, 2492 JP The Hague, www.cbs.nl
- A meta-analysis of research in Random Forest for Classification. Published in: Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), 2016. Date of conference: 30 Nov.-2 Dec. 2016. Publisher: IEEE.
Web links:
- https://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf
- https://cran.r-project.org/web/packages/e1071/e1071.pdf
- https://www.usnews.com/education

Any questions?

Fin.