Nurissaidah Ulinnuha. Introduction: Student academic performance (1990-2010): Logistic Regression, Naïve Bayesian, Artificial Neural Network. Student Academic.

Nurissaidah Ulinnuha

Introduction
Student academic performance (1990-2010): Logistic Regression, Naïve Bayesian, Artificial Neural Network.
Student academic performance (2011): Random Forest Decision Tree.

Artificial Neural Network
Superiority: ANNs are useful in several application areas, including pattern recognition, classification, forecasting, and process control. They are robust to noisy datasets.

Limitation: ANNs do not have parametric statistical properties (e.g., they have no individual coefficient or model significance tests based on the t and F distributions). An ANN may converge to a local minimum instead of the global minimum, thereby providing a non-optimal fit to the data.
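The local-minimum limitation can be illustrated without a full network. The sketch below is only an analogy: it runs plain gradient descent on a hypothetical one-dimensional loss, f(w) = w^4 - 3w^2 + w, which has two minima. The function, learning rate, and starting points are assumptions chosen for illustration and do not come from the presentation.

```python
def loss(w):
    """Hypothetical non-convex loss with one local and one global minimum."""
    return w**4 - 3 * w**2 + w

def gradient(w):
    """Derivative of the loss above."""
    return 4 * w**3 - 6 * w + 1

def descend(start, learning_rate=0.01, steps=2000):
    """Plain gradient descent from a given starting weight."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

# Two different initializations end in two different minima.
w_right = descend(2.0)   # settles in the shallower (local) minimum
w_left = descend(-2.0)   # settles in the deeper (global) minimum
print(w_right, loss(w_right))
print(w_left, loss(w_left))
```

Which minimum is reached depends only on the starting point, which is why neural network training is often repeated from several random initializations.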

Logistic Regression
Superiority: LR provides information about the significance of each predictor. It makes no assumption about the normality of the dataset.

Limitation: Standard (binary) logistic regression only works with a binary criterion variable.
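The binary restriction follows from the form of the model, which maps a weighted sum of predictors to a single probability between 0 and 1. A minimal sketch, assuming made-up weights and a 0.5 decision threshold (neither is taken from the presentation):

```python
import math

def predict_probability(features, weights, bias):
    """Logistic model: sigmoid of a linear combination of the predictors."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into one of exactly two outcomes (1 or 0)."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0

# Hypothetical weights for two predictors, for illustration only.
print(predict_probability([1.0, 2.0], [0.8, -0.3], 0.1))
print(predict_class([1.0, 2.0], [0.8, -0.3], 0.1))
```

Because the output is a single probability, the model separates two outcomes; handling more than two classes requires an extension such as multinomial logistic regression.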

Naïve Bayesian
Superiority: Naïve Bayes requires less training data than other classification methods.
Limitation: The dataset should satisfy the independence assumption, i.e. the attributes are assumed to be independent of each other given the class.
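The independence assumption is what lets the classifier score a class as the class prior multiplied by one conditional probability per attribute. The sketch below is a minimal categorical Naïve Bayes with Laplace smoothing; the toy rows, the two binary attributes, and the function names are hypothetical and are not the study's data or the Weka implementation.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies per class."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict(model, row, n_values=2):
    """Pick the class maximizing P(c) * product over i of P(x_i | c)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total
        for i, value in enumerate(row):
            # Laplace smoothing; n_values = number of distinct values per attribute
            score *= (value_counts[(i, label)][value] + 1) / (count + n_values)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy data (made up): two binary attributes, classes "A" and "B".
rows = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]
labels = ["A", "A", "A", "B", "B", "B"]
model = train_naive_bayes(rows, labels)
print(predict(model, (0, 0)))
print(predict(model, (1, 1)))
```

If two attributes are strongly correlated, their evidence is effectively counted twice in the product, which is why violating the assumption can distort the predicted probabilities.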

Random Forest Decision Tree
Superiority:
Random Forest runs efficiently on large databases.
Random Forest can handle thousands of input variables without variable deletion.
Random Forest gives estimates of which variables are important in the classification.
Random Forest has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.
Random Forest can be used for classification, clustering, and outlier detection.

Limitation: Random Forests have been observed to overfit on some datasets with noisy classification or regression tasks. Unlike single decision trees, the classifications made by Random Forests are difficult for humans to interpret.

Mukta Paliwal and Usha Kumar
Title: Academic performance of business school graduates using neural network and statistical techniques.
Overview: This research compares ANN with several statistical techniques. Paliwal and Kumar conclude that neural networks outperform regression analysis on the prediction problem, whereas on the classification problem their performance is comparable to that of logistic regression and discriminant analysis.

J. Zimmerman
Title: Predicting graduate-level performance from undergraduate achievements.
Result: This research predicts graduate-level performance using a random forest decision tree. It shows that random forest can not only perform classification but also explain the significance of each variable.

Raw data
Graduation data of Informatics Engineering Magister (master's) students, ITS ( )

Preprocess (165 records)
Filter out records with null values.
Convert all attributes to numeric values.
Convert the class attribute to a nominal value.
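The three preprocessing steps can be sketched in a few lines. This is an illustration only: the presentation used Weka, and the attribute names, the encoding table, and the sample records below are hypothetical.

```python
def preprocess(records, encodings, class_name="class"):
    """Drop records with nulls, encode attributes as numbers, keep class nominal."""
    cleaned = []
    for record in records:
        # Step 1: filter out records that contain a null value
        if any(value is None for value in record.values()):
            continue
        row = {}
        for name, value in record.items():
            if name == class_name:
                # Step 3: the class attribute stays nominal (a label, not a number)
                row[name] = str(value)
            elif name in encodings:
                # Step 2a: map categorical text to a numeric code
                row[name] = encodings[name][value]
            else:
                # Step 2b: parse everything else as a number
                row[name] = float(value)
        cleaned.append(row)
    return cleaned

# Hypothetical raw records and encoding table, for illustration only.
encodings = {"gender": {"woman": 0, "man": 1}}
raw = [
    {"gender": "woman", "age": "24", "class": "A"},
    {"gender": "man", "age": None, "class": "B"},   # dropped: null value
    {"gender": "man", "age": "31", "class": "B"},
]
print(preprocess(raw, encodings))
```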

Dataset
Graduation data of Informatics Engineering Magister (master's) students, ITS ( )

No | Variable Name | Information | Value
1 | Marital Status | Marital status when entering the master's program | 0 = not married, 1 = married
2 | Gender | Gender of the master's student | 0 = woman, 1 = man
3 | Scholar University | Rating of the university on a scale of 1-10, from the Webometrics survey | 10 = first 35 ranks, 9 = second 35 ranks, etc.
4 | Period of Study | Time period of study | Nominal (2-4 years)
5 | Work Status | Work status when entering the master's program | 0 = not working, 1 = working
6 | Scholar GPA | GPA at the scholar (undergraduate) level | Nominal (0-4)
7 | Age (new student) | Age when entering the master's program | Nominal

Information of Dataset
Features: 7 features and 104 records

Class
A: GPA > 3.5
B: GPA <= 3.5

Tools
Weka
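The class rule amounts to a single threshold on GPA. A minimal sketch, where the function name is hypothetical and the threshold is taken from the slide:

```python
def gpa_class(gpa, threshold=3.5):
    """Label a student 'A' if GPA is strictly above the threshold, else 'B'."""
    return "A" if gpa > threshold else "B"

# A GPA of exactly 3.5 falls in class B, because the rule for A is strict.
for gpa in (3.2, 3.5, 3.8):
    print(gpa, gpa_class(gpa))
```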

Discussion
The composition of the training data influences classifier performance.
Random Forest overfits on some datasets.
Random Forest is not more accurate than the other methods on datasets with few features.

Future Works
Discard unimportant attributes from the dataset using Principal Component Analysis.
Find a method to address the overfitting problem of the Random Forest Decision Tree.