Nurissaidah Ulinnuha
Introduction
Student academic performance ( ): Logistic Regression, Naïve Bayesian, Artificial Neural Network
Student academic performance 2011: Random Forest Decision Tree
Artificial Neural Network
Strengths: ANNs are useful in several areas, including pattern recognition, classification, forecasting, and process control. They are robust to noisy datasets.
Limitations: ANNs do not have parametric statistical properties (e.g. they do not have individual coefficient or model significance tests based on the t and F distributions). An ANN may converge to a local instead of the global minimum, thereby providing a non-optimal fit to the data.
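The local-minimum limitation can be illustrated with a minimal sketch. It is not a neural network: it runs plain gradient descent on a hypothetical one-dimensional loss, chosen here only because it has one local and one global minimum. The function names, learning rate, and step count are all assumptions made for illustration.

```python
def loss(x):
    # Hypothetical toy loss with a global minimum near x = -1.30
    # and a shallower local minimum near x = 1.13.
    return x**4 - 3 * x**2 + x


def gradient(x):
    # Derivative of the toy loss above.
    return 4 * x**3 - 6 * x + 1


def descend(x, learning_rate=0.01, steps=2000):
    # Plain gradient descent: repeatedly step against the gradient.
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


right_start = descend(2.0)   # settles in the local minimum
left_start = descend(-2.0)   # settles in the global minimum
print(right_start, loss(right_start))
print(left_start, loss(left_start))
```

Both runs stop at a point where the gradient is zero, but only the run started at -2.0 reaches the lower loss. The outcome depends on the starting point, which is the situation an ANN faces with its initial weights.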
Logistic Regression
Strengths: LR provides information about the significance of each predictor. It makes no assumption about the normality of the dataset.
Limitations: LR in its standard (binary) form works only with a binary criterion variable.
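A minimal sketch of why the criterion variable is binary: the model outputs a single probability through the sigmoid function, which is then thresholded into one of two classes. The training loop, the toy data, and all names below are assumptions for illustration, not the configuration used in this study.

```python
import math


def sigmoid(z):
    # Maps any real number to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    # Gradient descent on the average log-loss for one predictor.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


def predict(w, b, x, threshold=0.5):
    # The single probability is cut into exactly two classes.
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Toy, centered predictor values with a binary criterion (0 or 1).
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(predict(w, b, 2.5), predict(w, b, -2.5))
```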
Naïve Bayesian
Strengths: Naïve Bayesian requires less training data than other classification methods.
Limitations: The dataset should satisfy the independence assumption (predictors are conditionally independent given the class).
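The independence assumption shows up directly in how the classifier scores a record: the class prior is multiplied by one probability per attribute, each estimated separately. The sketch below assumes categorical attributes and Laplace smoothing; the toy rows and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    # Count classes, and count each (attribute, value, class) combination.
    class_counts = Counter(labels)
    feature_counts = Counter()
    values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, value, label)] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def predict_nb(model, row):
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # class prior
        for i, value in enumerate(row):
            # Independence assumption: per-attribute probabilities are
            # simply multiplied. The +1 terms are Laplace smoothing.
            score *= (feature_counts[(i, value, label)] + 1) / (count + len(values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy rows of two 0/1 attributes with hypothetical class labels.
rows = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]
labels = ["A", "A", "B", "B", "A", "B"]
model = train_nb(rows, labels)
print(predict_nb(model, (1, 1)), predict_nb(model, (0, 0)))
```

If two attributes are strongly correlated, their evidence is effectively counted twice in the product, which is why the assumption matters.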
Random Forest Decision Tree
Strengths: Random Forest runs efficiently on large databases. It can handle thousands of input variables without variable deletion. It gives estimates of which variables are important in the classification. It has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing. It can be used for classification, clustering, and outlier detection.
Limitations: Random Forests have been observed to overfit on some datasets with noisy classification/regression tasks. Unlike single decision trees, the classifications made by Random Forests are difficult for humans to interpret.
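The core idea behind Random Forest can be sketched as bootstrap sampling plus majority voting over many trees. The sketch below is a much-simplified stand-in, not the full algorithm: each "tree" is a one-split decision stump on one randomly chosen attribute. All names, the toy data, and the tree count are assumptions for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    # Pick the threshold on one attribute that misclassifies the fewest rows.
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
        right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        feature = rng.randrange(n_features)            # random attribute for this stump
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], feature))
    return forest


def predict_forest(forest, row):
    # Every stump votes; the majority class wins.
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Toy rows (two numeric attributes) with hypothetical class labels.
rows = [(3.0, 1), (3.1, 1), (3.2, 2), (3.7, 8), (3.8, 9), (3.9, 9)]
labels = ["B", "B", "B", "A", "A", "A"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (3.95, 9)), predict_forest(forest, (2.9, 1)))
```

Because the final answer is a vote over many randomized trees, no single tree explains the prediction, which reflects the interpretability limitation noted above.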
Mukta Paliwal and Usha Kumar
Title: Academic performance of business school graduates using neural network and statistical techniques.
Overview: This research compares ANN with several statistical techniques. Paliwal and Kumar conclude that neural networks outperform regression analysis on prediction problems, whereas their performance is comparable to logistic regression and discriminant analysis on classification problems.
J. Zimmerman
Title: Predicting graduate-level performance from undergraduate achievements
Result: This research predicts graduate-level performance using a random forest decision tree. It shows that random forest can not only classify but also indicate the importance of each variable.
Raw data
Graduation data of Informatics Engineering master's students, ITS ( )
Preprocess (165 records)
Filter out records with null values.
Convert all attributes to numeric values.
Convert the class attribute to a nominal value.
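The first two preprocessing steps can be sketched as follows. This is an illustration under assumptions, not the procedure actually run in Weka: the field names and raw text values are hypothetical, while the 0/1 codes follow the variable table in this deck.

```python
# Hypothetical raw text values mapped to the 0/1 codes from the variable table.
ENCODINGS = {
    "gender": {"woman": 0, "man": 1},
    "marital_status": {"not married": 0, "married": 1},
    "work_status": {"not work": 0, "work": 1},
}


def preprocess(records):
    cleaned = []
    for record in records:
        # Step 1: skip any record that contains a null (missing) value.
        if any(value is None or value == "" for value in record.values()):
            continue
        # Step 2: convert every attribute to a number.
        row = {}
        for name, value in record.items():
            if name in ENCODINGS:
                row[name] = ENCODINGS[name][value]
            else:
                row[name] = float(value)
        cleaned.append(row)
    return cleaned


raw = [
    {"gender": "man", "marital_status": "married", "scholar_gpa": "3.4"},
    {"gender": "woman", "marital_status": None, "scholar_gpa": "3.1"},
    {"gender": "woman", "marital_status": "not married", "scholar_gpa": "3.8"},
]
print(preprocess(raw))
```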
Dataset
Graduation data of Informatics Engineering master's students, ITS ( )
| No | Variable Name | Information | Value |
|----|---------------|-------------|-------|
| 1 | Marital Status | Marital status when entering the master's program | 0 = not married, 1 = married |
| 2 | Gender | Gender of the master's student | 0 = woman, 1 = man |
| 3 | Scholar University | Rating of the undergraduate (scholar) university on a scale of 1-10, from the Webometrics survey | 10 = first group of 35 in the ranking, 9 = second group of 35, etc. |
| 4 | Period of Study | Time taken to complete the study | Numeric (2-4 years) |
| 5 | Work Status | Work status when entering the master's program | 0 = not working, 1 = working |
| 6 | Scholar GPA | GPA from the undergraduate (scholar) degree | Numeric (0-4) |
| 7 | Age (new student) | Age when entering the master's program | Numeric |

Information of Dataset
Features: 7 features and 104 records
Class
A: GPA > 3.5
B: GPA <= 3.5
Tools: Weka
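The class rule can be written as a one-line function. The function name is hypothetical; the 3.5 cut-off and the A/B labels come from the slide.

```python
def gpa_class(gpa):
    # A: GPA strictly above 3.5; B: GPA of 3.5 or below.
    return "A" if gpa > 3.5 else "B"


print([gpa_class(g) for g in (3.2, 3.5, 3.75)])
```

Note that a GPA of exactly 3.5 falls in class B, because the rule for A is a strict inequality.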
Discussion
The composition of the training data influences classifier performance.
Random Forest overfits on some datasets.
For datasets with few features, Random Forest is not more accurate than the other methods.
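One way to vary the training data composition is to change the share of records used for training. The sketch below assumes that reading of "composition"; the split fractions, the seed, and the function name are hypothetical, and the 104 placeholder records only stand in for the dataset size reported earlier.

```python
import random


def split(records, train_fraction, seed=0):
    # Shuffle a copy, then cut it into a training part and a test part.
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(104))  # placeholder for 104 records
for fraction in (0.5, 0.6, 0.7, 0.8):
    train, test = split(records, fraction)
    print(fraction, len(train), len(test))
```

Each classifier could then be trained and evaluated on every split, so that differences caused by the split can be separated from differences caused by the method.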
Future Works
Discard unimportant dataset attributes using Principal Component Analysis.
Find a method to solve the overfitting problem of the Random Forest Decision Tree.
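As a rough sketch of the first future-work item, the code below finds the first principal component of a small table by power iteration on the covariance matrix. It is an illustration under assumptions: the function name and toy rows are hypothetical, and the starting vector is assumed not to be orthogonal to the first component (otherwise the normalization step would fail).

```python
import math


def first_principal_component(rows, iterations=200):
    n, d = len(rows), len(rows[0])
    # Center each attribute on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the attributes.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy rows where the second attribute is exactly twice the first.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
print(first_principal_component(rows))
```

The weights in the returned vector show how strongly each attribute contributes to the direction of greatest variance, which is one input for deciding which attributes matter least.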