Automated Personality Classification


1 Automated Personality Classification
A. KARTELJ and V. FILIPOVIC School of Mathematics, University of Belgrade, Serbia and V. MILUTINOVIC School of Electrical Engineering, University of Belgrade, Serbia

2 Agenda
Problem overview
Classification of the existing solutions
Presentation of the existing solutions
Comparison of the solutions
Work in progress: Bayesian Structure Learning for the APC
Future work: Video Based APC
Conclusions
MULTI 2012

3 Problem Overview

4 The Big 5 Model
Openness to experience – (inventive/curious vs. consistent/cautious). Appreciation for art, emotion, adventure, unusual ideas, curiosity, and variety of experience. Openness reflects the degree of intellectual curiosity, creativity, and a preference for novelty and variety. Some disagreement remains about how to interpret the openness factor, which is sometimes called "intellect" rather than openness to experience.
Conscientiousness – (efficient/organized vs. easy-going/careless). A tendency to show self-discipline, act dutifully, and aim for achievement; planned rather than spontaneous behavior; organized and dependable.
Extraversion – (outgoing/energetic vs. solitary/reserved). Energy, positive emotions, surgency, assertiveness, sociability, talkativeness, and the tendency to seek stimulation in the company of others.
Agreeableness – (friendly/compassionate vs. cold/unkind). A tendency to be compassionate and cooperative rather than suspicious and antagonistic towards others.
Neuroticism – (sensitive/nervous vs. secure/confident). The tendency to experience unpleasant emotions easily, such as anger, anxiety, depression, or vulnerability. Neuroticism also refers to the degree of emotional stability and impulse control, and is sometimes referred to by its low pole – "emotional stability".

5 The Steps in Our Research
1. Survey paper (under review at ACM CSUR)
2. Research paper: A new APC model based on Bayesian structure learning (in progress)
3. Real-purpose application of the APC model from step 2
4. Go to step 3

6 Elements of APC
Corpus: Essay, weblog, e-mail, news group, Twitter counts...
Personality measurement: Questionnaire (internet and written). We are searching for an alternative!
Model: Stylistic analysis, linguistic features, machine learning techniques

7 Applications
Social networks – friend suggestions, dating sites (finding compatible partners)
YouTube, TripAdvisor, Google, eBay – personality based recommendations
Customer targeting, advertisement
Other usages – police, anti-terrorism etc.

8 Mining People’s Characteristics
Authorship – who is the author of an unsigned piece of text?
Gender – is the author male or female?
Mood, emotions – which emotions are conveyed through the text?
Opinion – mining opinion from text (positive, negative, …)
Personality

9 Classification of Solutions
C1 criterion separates solutions by type of conversation (1 = self-reflexive, N = continuous)
C2 criterion separates solutions by approach (TD = top-down, DD = data-driven, or HY = hybrid)

10 Linguistic Styles: Language Use as an Individual Difference Pennebaker and King [1999]

11 LIWC and MRC Features
Feature                      Type   Example
Anger words                  LIWC   Hate, kill
Metaphysical issues          LIWC   God, heaven, coffin
Physical state / function    LIWC   Ache, breast, sleep
Inclusive words              LIWC   With, and, include
Social processes             LIWC   Talk, us, friend
Family members               LIWC   Mom, brother, cousin
Past tense verbs             LIWC   Walked, were, had
References to friends        LIWC   Pal, buddy, coworker
Imagery of words             MRC    Low: future, peace – High: table, car
Syllables per word           MRC    Low: a – High: uncompromisingly
Concreteness                 MRC    Low: patience, candor – High: ship
Frequency of use             MRC    Low: duly, nudity – High: he, the
The LIWC dictionary is part of the text analysis framework LIWC (Linguistic Inquiry and Word Count), developed by Pennebaker et al. [2001]. LIWC categorizes words into meaningful psychological categories. Coltheart [1981] proposed the MRC, a psycholinguistic database of words categorized by various linguistic features, such as imagery, concreteness, frequency of usage, etc.
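As a rough illustration of how LIWC-style category features can be turned into numbers, the sketch below counts how often words from a few categories occur in a text. The tiny lexicon is hypothetical: it only reuses the example words from the table above, whereas the real LIWC dictionary is far larger and is not reproduced here. The function name and the tokenizer are also assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon built only from the example words on the slide.
CATEGORIES = {
    "anger": {"hate", "kill"},
    "inclusive": {"with", "and", "include"},
    "social": {"talk", "us", "friend"},
    "family": {"mom", "brother", "cousin"},
}

def category_frequencies(text):
    """Return, per category, the share of tokens in `text` that belong to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in CATEGORIES.items():
            if token in words:
                counts[name] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {name: counts[name] / total for name in CATEGORIES}

sample = "I talk with my mom and my brother"
freqs = category_frequencies(sample)
print(freqs)
```

Relative frequencies (rather than raw counts) are used so that texts of different lengths stay comparable; a feature vector like this could then be passed to any of the learning algorithms discussed in the surveyed papers.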

12 What Are They Blogging About? Personality, Topic and Motivation in Blogs Gill et al. [2009]

13 Taking Care of the Linguistic Features of Extraversion Gill and Oberlander [2002]

14 Personality Based Latent Friendship Mining Wang et al. [2009]

15 A Comparative Evaluation of Personality Estimation Algorithms for the TWIN Recommender System Roshchina et al. [2011]

16 Predicting Personality with Social Media Golbeck et al. [2011]

17 Our Twitter Profiles, Our Selves: Predicting Personality with Twitter Quercia et al. [2011]

18 Comparison of the Solutions
Paper | Input | Corpus | Features | Algorithm | Soft. | Cit. | I | S | A | R
[Pennebaker and King 1999] text essays LIWC correlations n/a 455 H M
[Mairesse et al. 2007] text, speech LIWC, MRC C4.5, NB, SMO, M5’ Weka 99
[Gill et al. 2009] weblogs (14.8words) linear regression 26
[Yarkoni 2010] weblogs (100K words) 21
[Gill and Oberlander 2002] e-mails (105 students) bigrams bigram analysis 49 L
[Nowson et al. 2005] weblogs (410K words) word list 48
[Oberlander 2006] weblogs (410K words) N-grams NB, SMO 53
[Wang et al. 2009] text, weblogs (200 pairs) lexical freq., TFIDF logistic regression Minitab 1
[Iacobelli et al. 2011] weblogs (3000) LIWC, bigrams, SVM, SMO, NB..
[Argamon et al. 2005] word list, conj. SMO 38
[Argamon et al. 2007] Weka, ATMan 45
[Mairesse and Walker 2006] text, conv. extracts 96 persons (≈100K words) LIWC, MRC, utterance… RankBoost 22
[Rigby and Hassan 2007] mail. lists (140K e-mails) C4.5 Weka, SPSS 30
[Roshchina et al. 2011] TripAdvisor reviews LIWC, MRC Linear, M5, SVM 2
[Quercia et al. 2011] meta 335 Twitter users Twitter counts M5’ rules 5
[Golbeck et al. 2011] text, meta 279 FB users 5 classes (161 in total) M5’ rules, Gaussian processes 12
[Celli 2012] 1065 posts 22 ling. features majority-based classification
I – implementation cost, S – scalability, A – availability, R – reliability

19 Naive Bayes Classifier
Naive Bayes, Oberlander [2006]
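To make the idea of a Naive Bayes text classifier concrete, here is a minimal multinomial Naive Bayes sketch with Laplace (add-one) smoothing. It is not the setup used by Oberlander [2006]; the toy documents, the "high"/"low" extraversion labels, and all function names are invented for illustration only.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens):
    """Return the class with the highest log-posterior for `tokens`."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                # Laplace-smoothed log likelihood of the word given the class
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, made-up training data (not taken from any of the cited corpora).
docs = [
    ["party", "friends", "fun"],
    ["talk", "friends", "party"],
    ["quiet", "book", "alone"],
    ["alone", "read", "quiet"],
]
labels = ["high", "high", "low", "low"]
model = train_naive_bayes(docs, labels)
print(predict(model, ["friends", "party"]))
print(predict(model, ["quiet", "alone"]))
```

The "naive" part is the assumption that words are conditionally independent given the class, which is what allows the per-word log likelihoods to be summed.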

20 Naive Bayes and Bayesian Network

21 Bayesian Network for the APC

22 Bayesian Network Structure Learning
Obtain corpus (training set T)
Fit T to an appropriate network structure by:
  ILP formulation + solver (CPLEX, Gurobi…) on smaller instances
  Applying a metaheuristic on larger instances
Validate quality of the metaheuristic approach
Compare obtained APC accuracy with other approaches
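The search over network structures can be sketched roughly as follows. This is not the authors' ILP formulation or their metaheuristic: it is a plain greedy hill climb that only adds edges, standing in for the heuristic search on larger instances. The BIC-style score, the parameter count based on observed parent configurations, the toy data, and every name in the code are assumptions made for this example.

```python
import math
from collections import Counter
from itertools import permutations

def family_score(data, child, parents):
    """BIC-style score of one node given a parent tuple (discrete data)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    child_states = len({row[child] for row in data})
    # Simplification: count only parent configurations observed in the data.
    n_params = (child_states - 1) * len(parent_counts)
    return loglik - 0.5 * math.log(n) * n_params

def creates_cycle(parents, child, new_parent):
    """True if the edge new_parent -> child would close a directed cycle."""
    stack, seen = [new_parent], set()
    while stack:  # walk up the ancestors of new_parent
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def hill_climb(data, variables):
    """Greedily add the single best-scoring edge until nothing improves."""
    parents = {v: set() for v in variables}
    scores = {v: family_score(data, v, ()) for v in variables}
    while True:
        best = None
        for u, v in permutations(variables, 2):  # candidate edge u -> v
            if u in parents[v] or creates_cycle(parents, v, u):
                continue
            new = family_score(data, v, tuple(sorted(parents[v] | {u})))
            gain = new - scores[v]
            if gain > 1e-9 and (best is None or gain > best[0]):
                best = (gain, u, v, new)
        if best is None:
            return parents
        _, u, v, new = best
        parents[v].add(u)
        scores[v] = new

# Toy training set T: y always copies x, z is independent of both.
data = [{"x": x, "y": x, "z": z} for x in (0, 1) for z in (0, 1) for _ in range(10)]
learned = hill_climb(data, ["x", "y", "z"])
edges = {(u, v) for v, ps in learned.items() for u in ps}
print(edges)
```

On this toy data the search links x and y and leaves z unconnected, because the penalty term outweighs any gain from adding an edge to an independent variable. A fuller search would also consider edge removal and reversal, and on small instances an exact solver could be used to check how close the heuristic gets to the optimum.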

23 Other Ideas
Games with a purpose (GWAP)
Clustering personality characteristics

24 Packing everything together: Video Based APC

25 Conclusions
Classification of the existing solutions (Survey paper)
Filling the gaps inside the classification tree
Introducing Bayesian Structure Learning for the APC
Utilizing metaheuristics in dealing with high dimensionality
APC potential: social networks, recommender, and expert systems

26 THANK YOU!
Aleksandar Kartelj kartelj@matf.bg.ac.rs
Vladimir Filipovic
Veljko Milutinovic

