MBTI Personality Predictor using ML

Presentation transcript:

MBTI Personality Predictor using ML
Salman Ahmed, Andy Sin, Ashrarul Haq Sifat
Instructor: Dr. Bert Huang, Virginia Tech, 12/13/2017

What is MBTI? The Myers-Briggs Type Indicator

Motivation
Freeform writing allows a great degree of personal expression
Neuroscientific background
Applications in research, business, entertainment, and more

Objectives
Identify a correlation between writing styles and psychological personalities
Evaluate the accuracy of the MBTI predictor
Convert the textual representation of freeform writing into a feature representation
Explore state-of-the-art techniques for this prediction task
Assess whether more complex machine learning models (RNNs, ConvNets, etc.) are necessary in this area

Prior Work
Big Five Personality Inventory and MBTI
Web crawlers used to collect data; an SVM model with an estimated accuracy of 80%
MBTI: close relation of brain neurons to written communication; a short-term-memory-based recurrent neural network achieved 37% accuracy

Dataset
MBTI dataset from Kaggle
Not balanced: possibility of bias
8765 examples, roughly 1500 words each

Data Cleaning
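The slide gives no detail on the cleaning step, but a minimal pass over the Kaggle MBTI posts (which are "|||"-separated and contain URLs) might look like the sketch below. The exact rules the authors used are not stated, so these regexes are assumptions.

```python
import re

def clean_posts(raw: str) -> str:
    """Clean one MBTI example: posts arrive as one '|||'-separated string."""
    text = raw.replace("|||", " ")             # join the user's posts
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-zA-Z ]", " ", text)    # keep letters only
    text = re.sub(r"\s+", " ", text)           # collapse whitespace
    return text.strip().lower()

print(clean_posts("Hello!|||See http://example.com|||MBTI is fun :)"))
# -> hello see mbti is fun
```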

Traditional Model
Naïve Bayes – count method: tried to see whether learning works for a basic model
Multi-Layer Perceptron – vector representation: Gensim Word2Vec embeddings; each example turned into a 32-dimensional vector, giving a matrix of dimension 8765 x 32
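The slides do not show how each example became a single 32-dimensional vector; one common approach (assumed here) is to average the document's word vectors. The sketch below uses a toy random embedding table in place of the trained Gensim Word2Vec model (where the lookup would be `model.wv[word]`).

```python
import numpy as np

DIM = 32  # matches the 32-dimensional vectors on the slide

# Toy embedding table; in the project this would come from a trained
# gensim Word2Vec model rather than random vectors.
rng = np.random.default_rng(0)
embeddings = {w: rng.standard_normal(DIM) for w in ["infj", "loves", "writing"]}

def doc_vector(tokens):
    """Average the word vectors of the in-vocabulary tokens of one document."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(DIM)
    return np.mean(vecs, axis=0)

v = doc_vector(["infj", "loves", "writing", "unknownword"])
print(v.shape)  # (32,)
```

Stacking one such vector per example yields the examples x 32 matrix fed to the MLP.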

Improved Model
Principal Component Analysis
CountVectorizer with maximum number of features: 5000
Normalized TF or TF-IDF representation


Multinomial Naïve Bayes with TF-IDF and CountVectorizer
Logistic Regression with TF-IDF and CountVectorizer
Multi-Layer Perceptron with TF-IDF and CountVectorizer
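The CountVectorizer and TF-IDF names suggest scikit-learn, though the slides never name the library. A hedged sketch of one of these pipelines on toy stand-in data (the texts and labels below are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-in for the cleaned MBTI posts and their 16-way type labels.
texts = ["i love quiet reflection", "parties energize me a lot",
         "i plan everything ahead", "i decide at the last minute"]
labels = ["INFJ", "ESFP", "INTJ", "ESFP"]

pipeline = Pipeline([
    ("counts", CountVectorizer(max_features=5000)),  # the slide's feature cap
    ("tfidf", TfidfTransformer()),                   # normalized TF-IDF
    ("clf", MultinomialNB()),  # swap in LogisticRegression or MLPClassifier
])
pipeline.fit(texts, labels)
preds = pipeline.predict(texts)
print(preds)
```

Replacing the final step gives the Logistic Regression and Multi-Layer Perceptron variants while reusing the same feature extraction.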

Results
Naïve Bayes (basic counting): 19% accuracy

MLP (Word2Vec): 22% accuracy

Multinomial Naïve Bayes: 53% accuracy

Logistic Regression: 64% accuracy

MLP (CountVectorizer and TF-IDF): 48% accuracy

Comparison of Models

Model                                                   Accuracy
Naïve Bayes (basic counting)                                 19%
Multilayer Perceptron (Word2Vec)                             22%
Multinomial Naïve Bayes (CountVectorizer and TF-IDF)         53%
Logistic Regression (CountVectorizer and TF-IDF)             64%
Multilayer Perceptron (CountVectorizer and TF-IDF)           48%

Summary

Conclusion