Extracting Interest Tags from Twitter User Biographies
Ying Ding, Jing Jiang
School of Information Systems, Singapore Management University
AIRS 2014, Kuching, Malaysia

Social Media and Personal Data
Dec 5, 2014, AIRS 2014
Much personal information revealed in social media
–Content, links, ratings → personal preferences
All this information is useful to
–Researchers: social science
–Businesses: targeted advertising

User Biographies in Twitter
Self-introductions written in free form
Reflect users’ background and interests

User Biographies in Twitter
[Figure labels: profession, interests, age]
Around 28% of Singapore Twitter users and 50% of US Twitter users revealed their personal interests in their biographies.
Dong Wei et al. Who am I on Twitter?: A cross-country comparison. WWW’2014

Outline
Background
Our task
Syntactic patterns of interest tags
Build training data + gold standard
Method
Experiments
Summary

Our task
Automatically extract phrases that describe a user’s personal interests.
–We call them “interest tags”
–A typical information extraction problem.
–Automatically build training data based on common syntactic patterns.

Method
Linear-chain CRF
BIO labels
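
To make the BIO scheme concrete, here is a minimal sketch that turns token spans of interest tags into per-token labels. The function name `bio_encode`, the bare label names `B`, `I`, `O`, and the example biography are assumptions for illustration; the slides do not show the exact label set or any sample data.

```python
def bio_encode(tokens, spans):
    """Label each token B (begins a tag), I (inside a tag), or O (outside).

    spans: list of (start, end) token indices, end exclusive (hypothetical format).
    """
    labels = ["O"] * len(tokens)
    for start, end in spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


# Made-up biography, already tokenized.
tokens = ["Student", ",", "football", "fan", "and", "video", "games", "lover"]
spans = [(2, 3), (5, 7)]  # "football" and "video games"
labels = bio_encode(tokens, spans)
print(list(zip(tokens, labels)))
```

A linear-chain CRF then predicts one such label per token, so a multi-word tag like "video games" is recovered as a `B` followed by an `I`.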

Syntactic Patterns of Interest Tags
Based on manual annotation of 500 user biographies.
28.8% of user biographies contain meaningful interest tags.

Building Training Data
Seed patterns:
–Play + [NP]
–[NP] + fan
–Interested in + [NP]
Steps:
–Use seed patterns to extract noun phrases and rank them according to their frequency
–Pick the top-100 ranked noun phrases and use them as positive instances to train the CRF
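
The two steps above can be sketched roughly as follows. This is only an illustration under strong assumptions: the slides do not say how noun phrases were detected, so a noun phrase is approximated here as a run of at most two non-stopword tokens next to the trigger word. The stopword list, the helper names, and the sample biographies are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list used to cut off the approximate noun phrase.
STOP = {"a", "an", "the", "and", "or", "of", "in", "i", "my", "to", "is", "are", "big", "huge"}


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def run_after(tokens, i, max_len=2):
    """Up to max_len non-stopword tokens starting at position i."""
    out = []
    for tok in tokens[i:i + max_len]:
        if tok in STOP:
            break
        out.append(tok)
    return out


def run_before(tokens, i, max_len=2):
    """Up to max_len non-stopword tokens ending just before position i."""
    out = []
    for tok in reversed(tokens[max(0, i - max_len):i]):
        if tok in STOP:
            break
        out.append(tok)
    return out[::-1]


def seed_candidates(bio):
    """Apply the three seed patterns to one biography."""
    toks = tokenize(bio)
    found = []
    for i, tok in enumerate(toks):
        if tok == "play":                                         # Play + [NP]
            phrase = run_after(toks, i + 1)
        elif tok == "fan":                                        # [NP] + fan
            phrase = run_before(toks, i)
        elif tok == "interested" and toks[i + 1:i + 2] == ["in"]:  # Interested in + [NP]
            phrase = run_after(toks, i + 2)
        else:
            continue
        if phrase:
            found.append(" ".join(phrase))
    return found


def top_phrases(bios, k=100):
    """Rank extracted phrases by frequency and keep the k most frequent."""
    counts = Counter(p for bio in bios for p in seed_candidates(bio))
    return [p for p, _ in counts.most_common(k)]


bios = [
    "Football fan. I play guitar.",
    "Interested in photography and travel",
    "Huge football fan, coffee addict",
    "I play guitar and piano",
]
print(top_phrases(bios, k=2))
```

With a real POS tagger or chunker the `run_after` and `run_before` helpers would be replaced by proper noun-phrase detection; the frequency ranking step stays the same.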

Features
Syntactic or dependency features are not used because Twitter text is too noisy to parse reliably
Both lexical and POS tag features are used
To avoid over-fitting, only features extracted from the tokens surrounding each position are used
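
A minimal sketch of such context-only features is given below. It reads the slide as excluding the current token's own word and tag, which is an interpretation; the window size of 2, the feature names, and the hand-written POS tags in the example are assumptions, not details from the slides.

```python
def context_features(tokens, pos_tags, i, window=2):
    """Lexical and POS features from the neighbors of position i, not from i itself."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the current token; only its context is used
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset:+d}]"] = tokens[j].lower()
            feats[f"pos[{offset:+d}]"] = pos_tags[j]
        else:
            feats[f"pad[{offset:+d}]"] = True  # position falls outside the sentence
    return feats


# Made-up example with hand-assigned POS tags.
tokens = ["Huge", "football", "fan"]
pos_tags = ["JJ", "NN", "NN"]
features = context_features(tokens, pos_tags, 1, window=1)
print(features)
```

Leaving out the word at position `i` means the model cannot simply memorize the seed phrases used to build the training data and has to rely on the surrounding pattern, such as a following "fan".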

Gold Standard
Two annotators: graduate students
500 randomly sampled user biographies
1190 sentences
–Two annotators disagree on 10 sentences
–High agreement

Experiment
BL-700: baseline using the top 700 most frequent phrases; 700 was chosen because it gave the highest F-score among the cutoffs tried.
Seed: use the seed patterns to recognize interest tags
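
As an illustration of how a cutoff such as 700 could be selected, the sketch below computes precision, recall, and F-score over sets of (biography index, tag) pairs and picks the cutoff with the highest F-score. Exact-match scoring, substring matching of phrases against biographies, and all names and sample data are assumptions; the slides do not describe the evaluation procedure in this detail.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 over two sets of items."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def pick_cutoff(ranked_phrases, bios, gold, cutoffs):
    """Return the cutoff k whose top-k phrase list gives the highest F1."""
    def f1_at(k):
        vocab = set(ranked_phrases[:k])
        # Crude baseline: tag a phrase whenever it occurs in the biography text.
        predicted = {(i, p) for i, bio in enumerate(bios) for p in vocab if p in bio.lower()}
        return precision_recall_f1(predicted, gold)[2]
    return max(cutoffs, key=f1_at)


# Made-up data: a frequency-ranked phrase list, two biographies, and gold tags.
ranked = ["football", "guitar", "love", "life"]
bios = ["Football fan, I love my guitar", "Coffee is life"]
gold = {(0, "football"), (0, "guitar")}
best_k = pick_cutoff(ranked, bios, gold, cutoffs=[1, 2, 3, 4])
print(best_k)
```

In this toy data a short list misses tags and a long list adds false positives, which is the trade-off behind choosing one best cutoff.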

Extracted Patterns
Some popular patterns are:
–[Interest tag] + fan/lover/enthusiast
–I love + [interest tag]
–[Interest tag] is/are my life

Is it difficult to predict interest tags from users’ tweets?
We also applied TF-IDF ranking, which has been used to extract personalized user tags, to extract user interest tags.
Interest tags extracted from a user’s biography are not necessarily reflected in that user’s tweets.
They can serve as supplementary information when profiling a user.
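
For reference, TF-IDF ranking over tweets can be sketched as below, treating all tweets of one user as a single document. The weighting variant (relative term frequency times the log of inverse document frequency), the function name, and the toy data are assumptions; the slides do not specify which formulation was used.

```python
import math
from collections import Counter


def tfidf_top_terms(user_docs, user, k=5):
    """Rank one user's terms by tf * idf, with each user's tweets as one document."""
    n_docs = len(user_docs)
    df = Counter()
    for tokens in user_docs.values():
        df.update(set(tokens))  # document frequency: number of users using the term
    tf = Counter(user_docs[user])
    length = len(user_docs[user])
    scores = {t: (c / length) * math.log(n_docs / df[t]) for t, c in tf.items()}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Made-up tweets, already tokenized and concatenated per user.
user_docs = {
    "alice": "great football match today love football".split(),
    "bob": "love cooking pasta today".split(),
    "carol": "great coffee today".split(),
}
top = tfidf_top_terms(user_docs, "alice", k=2)
print(top)
```

Terms ranked this way reflect what a user tweets about, which need not coincide with the interests stated in the biography.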

Summary
We studied the problem of extracting interest tags from Twitter user biographies
We automatically built noisy training data based on syntactic patterns
We trained a CRF classifier on the noisy training data and achieved decent performance
Interest tags extracted from Twitter user biographies may not be reflected in the user’s tweets

Thank you! Questions?
