Automated Personality Classification


Automated Personality Classification
A. KARTELJ and V. FILIPOVIC, School of Mathematics, University of Belgrade, Serbia
V. MILUTINOVIC, School of Electrical Engineering, University of Belgrade, Serbia

Agenda
- Problem overview
- Classification of the existing solutions
- Presentation of the existing solutions
- Comparison of the solutions
- Work in progress: Bayesian Structure Learning for the APC
- Future work: Video Based APC
- Conclusions
3.10.2012 MULTI 2012

Problem Overview

The Big 5 Model
- Openness to experience (inventive/curious vs. consistent/cautious). Appreciation for art, emotion, adventure, unusual ideas, curiosity, and variety of experience. Openness reflects the degree of intellectual curiosity, creativity, and a preference for novelty and variety. Some disagreement remains about how to interpret the openness factor, which is sometimes called "intellect" rather than openness to experience.
- Conscientiousness (efficient/organized vs. easy-going/careless). A tendency to show self-discipline, act dutifully, and aim for achievement; planned rather than spontaneous behavior; organized and dependable.
- Extraversion (outgoing/energetic vs. solitary/reserved). Energy, positive emotions, surgency, assertiveness, sociability, talkativeness, and the tendency to seek stimulation in the company of others.
- Agreeableness (friendly/compassionate vs. cold/unkind). A tendency to be compassionate and cooperative rather than suspicious and antagonistic towards others.
- Neuroticism (sensitive/nervous vs. secure/confident). The tendency to experience unpleasant emotions easily, such as anger, anxiety, depression, or vulnerability. Neuroticism also refers to the degree of emotional stability and impulse control, and is sometimes referred to by its low pole, "emotional stability".

The Steps in Our Research
1. Survey paper (under review at ACM CSUR)
2. Research paper: A new APC model based on Bayesian structure learning (in progress)
3. Real-purpose application of the APC model from step 2
4. Go to step 3

Elements of APC
- Corpus: essay, weblog, email, news group, Twitter counts...
- Personality measurement: questionnaire (internet and written). We are searching for an alternative!
- Model: stylistic analysis, linguistic features, machine learning techniques

Applications
- Social networks: friend suggestions, dating sites (finding compatible partners)
- YouTube, TripAdvisor, Google, eBay: personality-based recommendations
- Customer targeting, advertisement
- Other usages: police, anti-terrorism, etc.

Mining People’s Characteristics
- Authorship: who is the author of an unsigned piece of text?
- Gender: is the author male or female?
- Mood, emotions: which emotions are conveyed through the text?
- Opinion: mining opinion from text (positive, negative, …)
- Personality

Classification of Solutions
- The C1 criterion separates solutions by type of conversation (1 = self-reflexive, N = continuous).
- The C2 criterion separates solutions by approach (TD = top-down, DD = data-driven, HY = hybrid).

Linguistic Styles: Language Use as an Individual Difference Pennebaker and King [1999]

LIWC and MRC Features

Feature                     Type   Example
Anger words                 LIWC   Hate, kill
Metaphysical issues         LIWC   God, heaven, coffin
Physical state / function   LIWC   Ache, breast, sleep
Inclusive words             LIWC   With, and, include
Social processes            LIWC   Talk, us, friend
Family members              LIWC   Mom, brother, cousin
Past tense verbs            LIWC   Walked, were, had
References to friends       LIWC   Pal, buddy, coworker
Imagery of words            MRC    Low: future, peace; High: table, car
Syllables per word          MRC    Low: a; High: uncompromisingly
Concreteness                MRC    Low: patience, candor; High: ship
Frequency of use            MRC    Low: duly, nudity; High: he, the

The LIWC dictionary is part of the text analysis framework LIWC (Linguistic Inquiry and Word Count), developed by Pennebaker et al. [2001]. LIWC categorizes words into meaningful psychological categories. Coltheart [1981] proposed the MRC, a psycholinguistic database of words categorized by various linguistic features, such as imagery, concreteness, and frequency of usage.
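As a rough illustration of how dictionary-based features of this kind can be computed, the sketch below counts category hits per token. The lexicon is a toy one built only from the example words listed above, and the names `TOY_LEXICON` and `category_rates` are hypothetical; the actual LIWC dictionary and software are much larger and are not reproduced here.

```python
import re
from collections import Counter

# Toy lexicon assembled from the example words on the slide (an assumption for
# illustration only; this is not the real LIWC dictionary).
TOY_LEXICON = {
    "anger": {"hate", "kill"},
    "inclusive": {"with", "and", "include"},
    "social": {"talk", "us", "friend"},
    "family": {"mom", "brother", "cousin"},
    "friends": {"pal", "buddy", "coworker"},
}


def category_rates(text, lexicon=TOY_LEXICON):
    """Return, for each category, the fraction of tokens that fall into it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    total = len(tokens) or 1  # guard against division by zero on empty text
    return {category: counts[category] / total for category in lexicon}


rates = category_rates("My brother and my buddy talk with us.")
```

The resulting per-category rates could then serve as the feature vector passed to a classifier or regression model.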

What Are They Blogging About? Personality, Topic and Motivation in Blogs Gill et al. [2009]

Taking Care of the Linguistic Features of Extraversion Gill and Oberlander [2002]

Personality Based Latent Friendship Mining Wang et al. [2009]

A Comparative Evaluation of Personality Estimation Algorithms for the TWIN Recommender System Roshchina et al. [2011]

Predicting Personality with Social Media Golbeck et al. [2011]

Our Twitter Profiles, Our Selves: Predicting Personality with Twitter Quercia et al. [2011]

Comparison of the Solutions

Columns: Paper, Input, Corpus, Features, Algorithm, Soft., Cit., I, S, A, R
(I = implementation cost, S = scalability, A = availability, R = reliability)

[Pennebaker and King 1999] Input: text; Corpus: essays; Features: LIWC; Algorithm: correlations; Soft.: n/a; Cit.: 455; I/S/A/R: H M
[Mairesse et al. 2007] Input: text, speech; Features: LIWC, MRC; Algorithm: C4.5, NB, SMO, M5’; Soft.: Weka; Cit.: 99
[Gill et al. 2009] Corpus: weblogs (14.8words); Algorithm: linear regression; Cit.: 26
[Yarkoni 2010] Corpus: weblogs (100K words); Cit.: 21
[Gill and Oberlander 2002] Corpus: emails (105 students); Features: bigrams; Algorithm: bigram analysis; Cit.: 49; I/S/A/R: L
[Nowson et al. 2005] Corpus: weblogs (410K words); Features: word list; Cit.: 48
[Oberlander 2006] Corpus: weblogs (410K words); Features: N-grams; Algorithm: NB, SMO; Cit.: 53
[Wang et al. 2009] Input: text; Corpus: weblogs (200 pairs); Features: lexical freq., TFIDF; Algorithm: logistic regression; Soft.: Minitab; Cit.: 1
[Iacobelli et al. 2011] Corpus: weblogs (3000); Features: LIWC, bigrams; Algorithm: SVM, SMO, NB..
[Argamon et al. 2005] Features: word list, conj.; Algorithm: SMO; Cit.: 38
[Argamon et al. 2007] Soft.: Weka, ATMan; Cit.: 45
[Mairesse and Walker 2006] Input/Corpus: text, conv. extracts 96 persons (≈100Kwords); Features: LIWC, MRC, utterance…; Algorithm: RankBoost; Cit.: 22
[Rigby and Hassan 2007] Corpus: mail. lists (140K emails); Algorithm: C4.5; Soft.: Weka, SPSS; Cit.: 30
[Roshchina et al. 2011] Corpus: TripAdvisor reviews; Features: LIWC, MRC; Algorithm: Linear, M5, SVM; Cit.: 2
[Quercia et al. 2011] Input: meta; Corpus: 335 Twitter users; Features: Twitter counts; Algorithm: M5’ rules; Cit.: 5
[Golbeck et al. 2011] Input: text, meta; Corpus: 279 FB users; Features: 5 classes (161 in total); Algorithm: M5’ rules, Gaussian processes; Cit.: 12
[Celli 2012] Corpus: 1065 posts; Features: 22 ling. features; Algorithm: majority-based classification

Naive Bayes Classifier Naive Bayes, Oberlander [2006]
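For readers unfamiliar with the classifier named on this slide, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is a generic textbook formulation, not the setup used by Oberlander [2006]; the training sentences and the "high"/"low" labels are invented for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents: list of (tokens, label) pairs. Returns the count tables."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in documents:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(tokens, model):
    """Return the label with the highest log posterior for the given tokens."""
    label_counts, word_counts, vocabulary = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_label / n_docs)  # log prior
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood of the token.
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data: two texts per (hypothetical) extraversion class.
model = train_naive_bayes([
    ("party friends talk fun".split(), "high"),
    ("talk party people".split(), "high"),
    ("alone quiet book".split(), "low"),
    ("quiet home alone read".split(), "low"),
])
```

The "naive" part is the assumption that tokens are conditionally independent given the class, which is why the per-token log likelihoods are simply summed.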

Naive Bayes and Bayesian Network

Bayesian Network for the APC

Bayesian Network Structure Learning
1. Obtain corpus (training set T)
2. Fit T to an appropriate network structure by:
   - ILP formulation + solver (CPLEX, Gurobi…) on smaller instances
   - Applying a metaheuristic on larger instances
3. Validate quality of the metaheuristic approach
4. Compare obtained APC accuracy with other approaches
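The fitting step above can be illustrated with a small score-based search. The sketch below is an assumption-laden stand-in: it uses the BIC score (the slides do not say which score is used) and a plain greedy edge-addition search in place of both the ILP formulation and the metaheuristic. The data set is synthetic and all function names are hypothetical.

```python
import math
from collections import Counter


def local_bic(data, child, parents):
    """BIC contribution of one discrete node given a tuple of parent indices."""
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_totals = Counter(tuple(row[p] for p in parents) for row in data)
    log_likelihood = sum(
        count * math.log(count / parent_totals[config])
        for (config, _), count in joint.items()
    )
    child_states = len({row[child] for row in data})
    parent_configs = 1
    for p in parents:
        parent_configs *= len({row[p] for row in data})
    n_params = (child_states - 1) * parent_configs
    return log_likelihood - 0.5 * n_params * math.log(len(data))


def is_ancestor(parents, candidate, node):
    """True if `candidate` is `node` itself or one of its ancestors."""
    stack, seen = [node], set()
    while stack:
        current = stack.pop()
        if current == candidate:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def greedy_structure_search(data, n_vars):
    """Repeatedly add the single edge u -> v with the largest BIC gain."""
    parents = {v: set() for v in range(n_vars)}
    scores = {v: local_bic(data, v, ()) for v in range(n_vars)}
    while True:
        best_gain, best_edge = 1e-9, None
        for u in range(n_vars):
            for v in range(n_vars):
                # Skip self-loops, existing edges, and edges that would close a cycle.
                if u == v or u in parents[v] or is_ancestor(parents, v, u):
                    continue
                candidate = tuple(sorted(parents[v] | {u}))
                gain = local_bic(data, v, candidate) - scores[v]
                if gain > best_gain:
                    best_gain, best_edge = gain, (u, v)
        if best_edge is None:
            return parents
        u, v = best_edge
        parents[v].add(u)
        scores[v] += best_gain


# Synthetic data: variable 1 copies variable 0, variable 2 is independent of both.
rows = [(i % 2, i % 2, (i // 2) % 2) for i in range(40)]
learned = greedy_structure_search(rows, 3)
```

Because BIC is score-equivalent, the edge between variables 0 and 1 could equally point the other way; the direction found here simply reflects the iteration order. A metaheuristic would typically also consider edge removals and reversals to escape local optima.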

Other Ideas
- Games with a purpose (GWAP)
- Clustering personality characteristics

Packing everything together: Video Based APC

Conclusions
- Classification of the existing solutions (survey paper)
- Filling the gaps inside the classification tree
- Introducing Bayesian Structure Learning for the APC
- Utilizing metaheuristics in dealing with high dimensionality
- APC potential: social networks, recommender systems, and expert systems

THANK YOU!
Aleksandar Kartelj, kartelj@matf.bg.ac.rs
Vladimir Filipovic, vladaf@matf.bg.ac.rs
Veljko Milutinovic, vm@etf.bg.ac.rs