Political Party, Gender, and Age Classification Based on Political Blogs Michelle Hewlett and Elizabeth Lingg.


1 Political Party, Gender, and Age Classification Based on Political Blogs Michelle Hewlett and Elizabeth Lingg

2 Introduction Can individuals be classified by their writing style? Do people under 25 use different punctuation than those over 25? Do they use different words and phrases? Can you figure out someone’s political ideologies by analyzing their writing using probabilistic methods?

3 Classifier Hold-out cross-validation: 80% of the data in the training set, 20% in the test set. Bloggers are classified using a feature vector. Features are generated from the training data.
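The 80/20 hold-out split described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `hold_out_split`, the fixed `seed`, and the placeholder blogger IDs are all assumptions made for the example.

```python
import random

def hold_out_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and split it into (train, test).

    `train_fraction` and `seed` are illustrative parameters; the slides
    only state that 80% of the data was used for training.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical blogger identifiers standing in for the real data set.
bloggers = [f"blogger_{i}" for i in range(10)]
train, test = hold_out_split(bloggers)
print(len(train), len(test))
```

Features would then be derived from `train` only, so that the test set stays unseen until evaluation.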

4 Features Most frequent unigrams, bigrams, and trigrams, e.g. “Bush”, “troops in Iraq”, “McCain”. Sentence length and word length. Punctuation. Pronoun usage.
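One way to collect the most frequent n-grams is sketched below. The tokenizer (lowercasing and a simple letters-and-apostrophes regex) and the helper names `ngrams` and `top_ngrams` are assumptions for illustration; the slides do not say how the text was tokenized.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token sequences as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(posts, n, k):
    """Count n-grams over all posts and return the k most frequent."""
    counts = Counter()
    for post in posts:
        # Assumed tokenization: lowercase words, apostrophes kept.
        tokens = re.findall(r"[a-z']+", post.lower())
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)

# Made-up example posts, not taken from the authors' corpus.
posts = ["Bush sent troops in Iraq", "Troops in Iraq"]
print(top_ngrams(posts, 3, 1))
```

Surface features such as sentence length, word length, and punctuation counts could be computed per blogger in the same pass and appended to the feature vector.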

5 Features Compute feature probabilities based on frequency in the training data. For example, if women use the word “myself” three times as often as men do, then P(female|myself) = 3/(3+1) = 75%, treating the two groups as equally likely beforehand. Keep only features that are not close to 50/50 male/female or 50/50 Republican/Democrat.
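The “myself” example works out as below. This sketch assumes the two classes are equally likely a priori, so the probability is just the ratio of usage rates; the `margin` used to discard near-50/50 features is a hypothetical cutoff, since the slides give no threshold.

```python
def class_probability(rate_a, rate_b):
    """P(class A | feature), assuming equal priors for A and B.

    `rate_a` and `rate_b` are how often each class uses the feature
    in the training data (any consistent unit).
    """
    return rate_a / (rate_a + rate_b)

def is_informative(p, margin=0.1):
    """Keep a feature only if it is not close to 50/50.

    `margin` is an assumed value chosen for illustration.
    """
    return abs(p - 0.5) >= margin

# Women use "myself" three times as often as men.
p_female_given_myself = class_probability(3, 1)
print(p_female_given_myself, is_informative(p_female_given_myself))
```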

6 Classification Bloggers were classified using the feature vector. Those with a low probability of being Republican were classified as Democrat, those with a high probability were classified as Republican, and those with a moderate probability were left unclassified (“Unknown”).
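The three-way decision rule can be sketched like this. The cutoffs `low` and `high` are assumptions, as is the premise that a single Republican probability per blogger has already been computed from the feature vector; the slides state neither the thresholds nor how the features were combined.

```python
def classify(p_republican, low=0.4, high=0.6):
    """Map a Republican probability to a label.

    `low` and `high` are hypothetical thresholds for illustration.
    """
    if p_republican <= low:
        return "Democrat"
    if p_republican >= high:
        return "Republican"
    return "Unknown"

for p in (0.2, 0.5, 0.9):
    print(p, classify(p))
```

Widening the gap between `low` and `high` would trade coverage for accuracy: more bloggers fall into “Unknown”, but the remaining labels rest on stronger evidence.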

7 Classifier Results


10 Clustering K-means clustering algorithm used with the entire data set. Used the sum of absolute differences instead of Euclidean distance because our differences were so small. Initialized centroids to a reasonable guess.
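A minimal sketch of K-means with the sum of absolute differences (L1 distance) in the assignment step follows. It is an illustration under assumptions, not the authors' implementation: the centroid update uses the per-coordinate mean as in standard K-means (a median update would be the usual pairing with L1 distance), and the sample points and initial centroids are invented.

```python
def l1(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def kmeans_l1(points, centroids, iterations=10):
    """Assign points by L1 distance, then move centroids to cluster means."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: nearest centroid under L1 distance.
        assignment = [
            min(range(len(centroids)), key=lambda j: l1(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of each non-empty cluster.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Invented 2-D feature vectors and a "reasonable guess" for the centroids.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
assignment, centroids = kmeans_l1(points, [(0.0, 0.0), (10.0, 10.0)])
print(assignment, centroids)
```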

11 Clustering Results [Plot legend: Democrat, Republican, and Unknown bloggers, each marked “o” for cluster 1 and “*” for cluster 2]

12 Clustering Results [Plot legend: Male, Female, and Unknown bloggers, each marked “o” for cluster 1 and “*” for cluster 2]

13 Conclusion It is possible to identify the characteristics of a writer based on writing style, words, and phrases! Political party gave the best results, followed by gender, then age.

14 Future Work Generalize results with a larger data set and a greater number of features. Generalize results to a different domain. Possibly implement linear regression, logistic regression, or SVMs.

