Word Embeddings to Enhance Twitter Gang Member Profile Identification

Presentation transcript:

Word Embeddings to Enhance Twitter Gang Member Profile Identification
Sanjaya Wijeratne, Lakshika Balasuriya, Derek Doran, Amit Sheth
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, Dayton, OH, USA
Presented at IJCAI Workshop on Semantic Machine Learning (SML 2016), New York City, NY, USA, July 10, 2016

Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification. IJCAI 2016
Tweets Source – Image Source – /
*Example tweets shown above are public data reported in the cited newspaper article. They are not part of our research dataset.

What does gang-related research tell us?
“Gangs use social media mainly to post videos depicting their illegal behaviors, watch videos, threaten rival gangs and their members, display firearms and money from drug sales” [Patton 2015, Morselli 2013]
Studies have shown that 82% of gang members had used the Internet and 71% of them had used social media [Decker 2011]
45% of gang members have participated in online offensive activities and 8% of them have recruited new members online [Pyrooz 2013]
Image Source –

There are other spectators too…
Source – Image Source –

But there’s a problem…
Image Source – Wijeratne, Sanjaya et al. "Analyzing the Social Media Footprint of Street Gangs" in IEEE ISI 2015, doi: /ISI
Source –

Enhance Twitter Gang Member Profile Identification with Word Embeddings
Image Source –
Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016

Gang Member Dataset

Number of Words from each Feature | Gang Member Class | Non-gang Member Class | Total
Tweets                            | 3,825,092         | 45,213,027            | 49,038,119
Profile Descriptions              | 3,348             | 21,182                | 24,530
Emoji                             | 732,712           | 3,685,669             | 4,418,381
YouTube Videos                    | 554,857           | 10,459,235            | 11,041,092
Profile Images                    | 10,162            | 73,252                | 83,414
Total                             | 5,126,176         | 59,452,365            | 64,578,536

Description        | Gang Member Class | Non-gang Member Class | Total
Number of Profiles | 400               | 2,865                 | 3,265
Number of Tweets   | 821,412           | 7,238,758             | 8,060,170

Feature Type – Tweets
[Figure: word clouds of words from gang members’ tweets and words from non-gang members’ tweets]

Feature Type – Profile Descriptions
*Example Twitter profiles shown above are public data reported in news articles. They are not part of our research dataset.
[Figure: Word usage in profile descriptions, gang vs. non-gang members. Axis: percent of profiles that contain the word in profile description]

Feature Types – Emoji
[Figure: Emoji usage distribution, gang vs. non-gang members. Axis: percent of profiles that contain the emoji in tweets]

Feature Types – YouTube Links
% of gang members post at least one YouTube video
76.58% of them are related to gangster hip-hop
A gang member shares 8 YouTube links on average
Top 5 words from gang members – shit, like, nigga, fuck, lil
Top 5 words from non-gang members – like, love, peopl, song, get

Feature Types – Profile Images
[Figure: Image tags distribution, gang vs. non-gang members. Axis: percent of profiles that contain the image tag]

Classification Approach
Convert all non-textual features to text
  Emoji for Python API for emoji
  Clarifai API for profile and cover images
Remove stop words and stem the remaining words
Train a Skip-gram model using textual features
  Negative sample rate = 10
  Context window size = 5
  Ignore words that occur fewer than 5 times (min count = 5)
  Number of dimensions =
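The preprocessing and Skip-gram steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list and the emoji-to-text map are hypothetical placeholders, stemming is left out, and the code only produces the (target, context) pairs that a Skip-gram model would be trained on rather than training the model itself.

```python
import re

# Hypothetical stop-word list and emoji-to-text map, for illustration only.
STOP_WORDS = {"the", "a", "is", "to", "and", "of", "in"}
EMOJI_TO_TEXT = {"\U0001F525": "fire", "\U0001F4AF": "hundred_points"}


def to_tokens(text):
    """Replace emoji with text labels, lowercase, tokenize, drop stop words."""
    for symbol, label in EMOJI_TO_TEXT.items():
        text = text.replace(symbol, f" {label} ")
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def skipgram_pairs(tokens, window=5):
    """Yield (target, context) pairs for tokens within `window` positions."""
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield target, tokens[j]


# Toy input; a window of 1 keeps the output small (the slide uses 5).
tokens = to_tokens("The block is on \U0001F525 today")
pairs = list(skipgram_pairs(tokens, window=1))
```

With the slide's settings, `window` would be 5, and words seen fewer than 5 times would be dropped before the pairs are generated.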

Classification Approach Cont.
[Figure: Classifier training with word embeddings]

Representing Training Examples
For each Twitter profile p with n unique words, where the vector of the i-th word in p is denoted by w_ip, we compute the feature vector V_p in the following ways.
Sum of word embeddings – Sum of word vectors for all words in Twitter profile p

Representing Training Examples Cont.
Mean of word embeddings – Mean of word vectors for all words in Twitter profile p
Sum of word embeddings weighted by term frequency, where the term frequency of the i-th word in p is denoted by c_ip
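The slide's equations did not survive in this transcript. Read literally, the verbal definitions so far (sum, mean, and term-frequency-weighted sum) would correspond to something like the following; the superscript labels are introduced here only to tell the variants apart and are not the authors' notation.

```latex
\[
V_p^{\mathrm{sum}} = \sum_{i=1}^{n} w_{ip}, \qquad
V_p^{\mathrm{mean}} = \frac{1}{n} \sum_{i=1}^{n} w_{ip}, \qquad
V_p^{\mathrm{tf}} = \sum_{i=1}^{n} c_{ip}\, w_{ip}
\]
```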

Representing Training Examples Cont.
Sum of word embeddings weighted by tf-idf, where the tf-idf of the i-th word in p is denoted by t_ip
Sum of word embeddings weighted by term frequency, where the term frequency of the i-th word in p is denoted by c_ip.
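As a rough sketch of how these profile representations could be computed, the function below builds the sum, mean, term-frequency-weighted, and tf-idf-weighted vectors for one profile. The slides do not state which tf-idf variant was used, so the formula here (count times log(N / document frequency)) is an assumption, and the embeddings and document frequencies are made-up toy values.

```python
import math
from collections import Counter


def profile_vectors(profile_tokens, embeddings, doc_freq, num_profiles):
    """Return sum, mean, tf-weighted and tf-idf-weighted vectors for a profile.

    Assumption: tf-idf of a word is count * log(num_profiles / doc_freq[word]).
    """
    # Term frequency c_ip of each unique word that has an embedding.
    counts = Counter(t for t in profile_tokens if t in embeddings)
    dim = len(next(iter(embeddings.values())))
    v_sum = [0.0] * dim
    v_tf = [0.0] * dim
    v_tfidf = [0.0] * dim
    for word, c in counts.items():
        idf = math.log(num_profiles / doc_freq[word])
        for k, x in enumerate(embeddings[word]):
            v_sum[k] += x              # each unique word counted once
            v_tf[k] += c * x           # weighted by term frequency
            v_tfidf[k] += c * idf * x  # weighted by tf-idf
    n = len(counts)  # number of unique words in the profile
    v_mean = [x / n for x in v_sum] if n else v_sum
    return {"sum": v_sum, "mean": v_mean, "tf": v_tf, "tfidf": v_tfidf}


# Toy 2-dimensional embeddings and document frequencies (made-up values).
embeddings = {"street": [1.0, 0.0], "music": [0.0, 2.0]}
doc_freq = {"street": 1, "music": 2}
vectors = profile_vectors(
    ["street", "music", "music"], embeddings, doc_freq, num_profiles=2
)
```

Each resulting vector has the same dimensionality as the word embeddings, so a profile of any length maps to a fixed-size input for a classifier.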

Classification Results
[Table: Classification results based on 10-fold cross validation]
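For readers unfamiliar with the evaluation protocol, 10-fold cross validation can be sketched as below. This is a generic illustration: the slides do not say whether the folds were stratified by class or how they were shuffled, so the plain random split and the fixed seed are assumptions. The example uses the 3,265 profiles reported in the dataset table.

```python
import random


def k_fold_indices(num_examples, k=10, seed=0):
    """Shuffle example indices and split them into k nearly equal folds."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(num_examples, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = k_fold_indices(num_examples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 3,265 profiles, as in the dataset table.
splits = list(train_test_splits(3265, k=10))
```

A classifier would be trained on each `train` list and scored on the matching `test` list, with the ten scores averaged.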

Results Interpretation
Averaging the vector sum weighted by term frequency performs the best
Out of the five vector-based operations on word embeddings:
  Four of them gave classifiers that beat baseline model (1)
  Two of them performed nearly equal to or beat baseline model (2)

Results Interpretation Cont.
Word vector (top 10) for BDK showed
  Five were rival gangs of Black Disciples
  Two were syntactic variations of BDK
  Three were syntactic variations of GDK
Word vector (top 10) for GDK showed
  Six were Gangster Disciple gangs where others have expressed their hatred towards them
Word vector (top 10) for CPDK showed
  Other keywords showing hatred towards cops such as fuck12, fuckthelaw, fuk12 and gang names
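The "top 10" lists above presumably come from looking up the words whose embeddings lie closest to a query word. A minimal sketch of such a lookup, assuming cosine similarity as the closeness measure and using made-up toy vectors with placeholder word names, could look like this:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, embeddings, top_n=10):
    """Return the top_n (word, similarity) pairs closest to `word`."""
    query = embeddings[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy 2-dimensional embeddings with placeholder names (made-up values).
toy = {
    "w1": [1.0, 0.0],
    "w2": [0.9, 0.1],
    "w3": [0.0, 1.0],
    "w4": [-1.0, 0.0],
}
neighbors = most_similar("w1", toy, top_n=2)
```

With trained embeddings, the query would be a token such as BDK and `top_n` would be 10.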

Challenges and Future Work
• Build image recognition systems that can accurately identify images commonly posted by gang members
  • Guns, Drugs, Money, Gang Signs
• Integrate slang dictionaries and use them to pre-train word embeddings
  • Hipwiki.com
• Utilize social networks of gang members to identify new gang members
Image Source –

Connect with
Image Source –


References
[1] D. U. Patton, “Gang violence, crime, and substance use on Twitter: A snapshot of gang communications in Detroit,” Society for Social Work and Research 19th Annual Conference: The Social and Behavioral Importance of Increased Longevity, Jan 2015
[2] C. Morselli and D. Decary-Hetu, “Crime facilitation purposes of social networking sites: A review and analysis of the cyber banging phenomenon,” Small Wars & Insurgencies, vol. 24, no. 1, pp. 152–170, 2013
[3] S. Decker and D. Pyrooz, “Leaving the gang: Logging off and moving on,” Council on Foreign Relations, 2011

References Cont.
[4] D. C. Pyrooz, S. H. Decker, and R. K. Moule Jr, “Criminal and routine activities in online settings: Gangs, offenders, and the internet,” Justice Quarterly, no. ahead-of-print, pp. 1–29, 2013
[5] S. Wijeratne, D. Doran, A. Sheth, and J. L. Dustin, “Analyzing the social media footprint of street gangs,” in Proc. of IEEE ISI, pp. 91–96, May 2015
[6] L. Balasuriya, S. Wijeratne, D. Doran, and A. Sheth, “Finding street gang members on Twitter,” Technical Report, Wright State University, 2016