Juweek Adolphe, Zhaoyu Li, Ressi Miranda, Dr. Shang

Presentation transcript:

Mid-Term Report
Juweek Adolphe, Zhaoyu Li, Ressi Miranda, Dr. Shang

Outline (Edited)
- Learning Experience
  - Machine Learning
  - Sentiment Analysis
- Project
- Results

Learning Experience
- Machine Learning Algorithms
  - Naive Bayes (probability)
  - Support Vector Machine (SVM)
  - Stochastic Gradient Descent
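To make the "probability" note on Naive Bayes concrete, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. It is not the project's code: the toy documents, the function names `train_multinomial_nb` and `predict`, and the Laplace smoothing value `alpha=1.0` are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods from tokenized docs."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(labels)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        # Denominator: all word occurrences in the class plus smoothing mass.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_like, vocab


def predict(tokens, model):
    """Return the class with the highest posterior log score; unknown words are skipped."""
    log_prior, log_like, vocab = model
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy data, not the project's data set.
docs = [
    ["great", "movie"],
    ["loved", "great", "acting"],
    ["terrible", "movie"],
    ["boring", "terrible", "plot"],
]
labels = ["pos", "pos", "neg", "neg"]
model = train_multinomial_nb(docs, labels)
print(predict(["great", "plot"], model))
```

The classifier multiplies a class prior by per-word likelihoods (added in log space here), which is why word counts alone are enough to train it.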

Learning Experience
- Sentiment Analysis: classify text into polarity categories
  - Naive Bayes: Bernoulli
  - Naive Bayes: Multinomial
  - Stochastic Gradient Descent
- TF-IDF (term frequency-inverse document frequency)
- Chi-Square Test
- Stochastic Gradient Descent was used because it works faster and more efficiently.
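The two feature tools named on this slide can be sketched in a few lines of plain Python. This assumes the textbook definitions (raw term count times log of inverse document frequency, and the chi-square statistic of a 2x2 term/class table); libraries such as scikit-learn use smoothed or differently computed variants, so the numbers will not match theirs exactly. The corpus and function names are hypothetical.

```python
import math


def tf_idf(term, doc, corpus):
    """Textbook TF-IDF: raw term count times log(N / document frequency)."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df) if df else 0.0


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: docs in the class that contain the term
    b: docs outside the class that contain the term
    c: docs in the class without the term
    d: docs outside the class without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Hypothetical corpus: the first two documents are positive, the last two negative.
corpus = [
    ["great", "movie"],
    ["great", "acting"],
    ["terrible", "movie"],
    ["terrible", "plot"],
]

print(tf_idf("great", corpus[0], corpus))   # weight of "great" in the first document
print(chi_square(2, 0, 0, 2))               # "great" vs. the positive class
print(chi_square(1, 1, 1, 1))               # "movie" vs. the positive class
```

A term that appears only in one class ("great") gets a high chi-square score, while a term spread evenly across classes ("movie") scores zero, which is the basis for keeping or removing features.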

Why?
- Improve the accuracy of the algorithms
- Hope to get better results, even by a little bit

Scheme/Project
- Let's make a comparison between the different algorithms
  - Comparing the algorithms' accuracies
  - Changing up the feature extraction

Methodology
- Extract features
- Make a feature vector
- Select features
- Remove features
- Train algorithm
- Test algorithm
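The methodology steps map naturally onto a scikit-learn `Pipeline`. The sketch below is one possible arrangement, assuming scikit-learn is the library in use (the slides name `MultinomialNB` and `BernoulliNB` but do not show code). The toy texts, the step names, and `k=10` are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data; the project's actual data set is not shown in the slides.
train_texts = [
    "great movie with great acting",
    "loved the plot and the cast",
    "wonderful and moving story",
    "terrible movie with boring plot",
    "hated the acting and the script",
    "dull and painful story",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
test_texts = ["great story", "boring script"]

pipeline = Pipeline([
    # Extract features and build the feature vector (unigrams and bigrams).
    ("vectorize", TfidfVectorizer(ngram_range=(1, 2))),
    # Select the 10 highest-scoring features by chi-squared and drop the rest.
    ("select", SelectKBest(chi2, k=10)),
    # Train the classifier on the reduced feature vectors.
    ("classify", MultinomialNB()),
])

pipeline.fit(train_texts, train_labels)       # train
predictions = pipeline.predict(test_texts)    # test
print(list(predictions))
```

If `HashingVectorizer` is swapped in for the vectorizer step, it needs `alternate_sign=False`, because both `chi2` and `MultinomialNB` require non-negative feature values.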

Issues
- Long time to train and cross-validate different Pipelines
- Formatting of code prevented inclusion of alternative classifiers (KNearestNeighbors, DecisionTree)
- Data set format might not be reliable (already processed)
- Accuracy rates lower than expected
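One way around the code-structure issue is to build the pipeline from a small factory function so that any classifier can be dropped in. The sketch below assumes scikit-learn, and assumes that "KNearestNeighbors" and "DecisionTree" refer to its `KNeighborsClassifier` and `DecisionTreeClassifier`. The helper `build_pipeline`, the toy data, and the settings `n_neighbors=1` and `cv=2` are hypothetical choices that keep the example small.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data, four documents per class.
texts = [
    "great movie with great acting",
    "loved the plot and the cast",
    "wonderful and moving story",
    "great cast and wonderful script",
    "terrible movie with boring plot",
    "hated the acting and the script",
    "dull and painful story",
    "boring cast and terrible story",
]
labels = ["pos"] * 4 + ["neg"] * 4


def build_pipeline(classifier):
    """Wrap any scikit-learn classifier behind the same feature extraction step."""
    return make_pipeline(CountVectorizer(), classifier)


classifiers = {
    "MultinomialNB": MultinomialNB(),
    "BernoulliNB": BernoulliNB(),
    "KNeighbors": KNeighborsClassifier(n_neighbors=1),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}

# Mean cross-validated accuracy per classifier, using identical features.
scores = {
    name: cross_val_score(build_pipeline(clf), texts, labels, cv=2).mean()
    for name, clf in classifiers.items()
}
for name, score in scores.items():
    print(f"{name:15s} {score:.3f}")
```

`cross_val_score` also accepts an `n_jobs` argument to run folds in parallel, which may help with the long cross-validation times.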

Results

No Chi-Squared
Classifier     Tfidf/Bi     Tfidf/Uni    Count/Bi     Count/Uni    Hash/Bi      Hash/Uni
MultinomialNB  0.550637716  0.550101526  0.55132977   0.550564977  0.548096016  0.549712898
BernoulliNB    0.550633557, 0.548104329 (columns not marked in the transcript)
SVM            0.51090564 (column not marked in the transcript)

Chi-Squared Implemented
Classifier     Tfidf/Bi     Tfidf/Uni    Count/Bi     Count/Uni    Hash/Bi      Hash/Uni
MultinomialNB  0.541179586  0.540986305  0.542239491  0.541505867  0.548867048  0.549660941
BernoulliNB    0.541210758, 0.541809294, 0.550138938 (columns not marked in the transcript)
SVM            0.51090564 (column not marked in the transcript)

Findings
- MultinomialNB and BernoulliNB outperformed SGD (about 0.55 vs. 0.51 accuracy)
- Chi-squared generally reduces accuracy (30%)
- Highest overall accuracy was Count/Multinomial/Uni+Bi (about 0.551)
- No consistent correlation between accuracy differences and the use of unigrams vs. bigrams
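The chi-squared finding can be checked directly against the MultinomialNB rows of the Results slide. The short script below only does arithmetic on those reported numbers; the variable names are illustrative.

```python
columns = ["Tfidf/Bi", "Tfidf/Uni", "Count/Bi", "Count/Uni", "Hash/Bi", "Hash/Uni"]

# MultinomialNB accuracies as reported on the Results slide.
no_chi2 = [0.550637716, 0.550101526, 0.55132977, 0.550564977, 0.548096016, 0.549712898]
with_chi2 = [0.541179586, 0.540986305, 0.542239491, 0.541505867, 0.548867048, 0.549660941]

# Change in accuracy when chi-squared selection is added (negative = worse).
deltas = {col: after - before for col, before, after in zip(columns, no_chi2, with_chi2)}
for col, delta in deltas.items():
    print(f"{col:10s} {delta:+.6f}")

decreased = [col for col, delta in deltas.items() if delta < 0]

# Best MultinomialNB configuration across both settings.
candidates = [(acc, "no chi2", col) for acc, col in zip(no_chi2, columns)]
candidates += [(acc, "chi2", col) for acc, col in zip(with_chi2, columns)]
best = max(candidates)
print("decreased:", decreased)
print("best:", best)
```

On these numbers, five of the six configurations lose accuracy with chi-squared selection: roughly 0.009 for the Tfidf and Count variants, while the Hash variants move by less than 0.001.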

What does this mean?
- We do not know
- The classifier can stand to be more accurate
- Experiments with additional datasets/algorithms have to be completed first
- Overall goal: scale to Big Data level

Future Work
- Figure out what makes our classifier less accurate than the standard
  - No improvement
- Moving away from the previous project
  - Previous projects were reinventing the wheel
- Implementing Naive Bayes in MapReduce
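Naive Bayes suits MapReduce because training reduces to counting. The sketch below simulates the map, shuffle, and reduce phases in a single Python process; it is not Hadoop code and not the project's implementation. The key layout (`"DOC"` and `"WORD"` tags) and the toy records are assumptions for illustration.

```python
from collections import defaultdict
from itertools import chain


def mapper(record):
    """Emit one count per document (for class priors) and per word (for likelihoods)."""
    label, text = record
    yield ("DOC", label), 1
    for word in text.split():
        yield ("WORD", label, word), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Sum the counts for one key."""
    return key, sum(values)


# Hypothetical labeled records: (label, text).
records = [
    ("pos", "great movie"),
    ("pos", "great acting"),
    ("neg", "terrible movie"),
]

mapped = chain.from_iterable(mapper(r) for r in records)
counts = dict(reducer(key, values) for key, values in shuffle(mapped).items())

for key, count in sorted(counts.items()):
    print(key, count)
```

The resulting document and word counts per class are the sufficient statistics for a multinomial Naive Bayes model, so the priors and smoothed likelihoods can be computed from them in a final step.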

Demo of Text Classification