Attacking Strategies Analysis on Social Media
Chun-Ming Lai
Computer Science, University of California, Davis
SFW -- Remember to include a list of your pubs at the end
Social Media: Exerting significant impact on mass communication
               Traditional Media    Social Media
Data size      Less                 More
User type      Reader               Editor/Reporter
Time-based     Delayed              Real time
Top-down and authoritative vs. distributed and skimmed
SFW – “Editor/Reporter” and “reader” 11/28/2018
Traditional communication: Authoritative
Social Media: Distributed; nodes can be controlled
Facebook.com/63811549237/posts/10153038271604238
2014-12-19, 03:06 am (GMT+0)
Total: 609 comments
Routine Activity Theory (RAT) (L. Cohen, 1979)
Major dimensions:
Likely offender (attacker behavior)
Suitable targets (target posts, pages)
The absence of capable guardians (potential audience)
Malicious URLs in the Facebook social media dataset: targets / environments / impact of campaigns; attackers' digital footprints
SFW – I think we need a slide, like this, earlier to spell out your target topic.
SFW – you have been talking for 20 minutes and then you described your problem.
Security Threat
Severe threat: phishing; malware, drive-by download
Medium to light threat: advertisement; spamming (fund-raising, porn, canned messages, etc.)
New type of threat: rumors, media manipulation, sign up, vote stuffing, etc.; fake news; crowdturfing = crowdsourcing + astroturfing
Sometimes it’s hard to evaluate “spamming”
SFW – Likefarm? Is that ContentFarm?
Impact
Personal: privacy, personal info leakage, malware, phishing
OSN site: user interface, Quality of Experience (user retention, etc.)
Society: attempts to sway (or influence) public opinion and elections; destroying one’s social credit
SFW – what do you mean “QoE”?
Difficulty & Challenge Heterogeneous and huge data Text, media, transaction, etc. Labeled Data is precious Different Criteria Data size and type New Patterns of Online Service Application Bursts, Facebook Live, Game, etc. SFW – how about different user or group behaviors? Do we want to cover Apps? 11/28/2018
Hoped-for Contributions (3W1H)
Who? Likely offender (attacker behavior): fake, net army, compromised, etc. Who are those attackers? What other activities do they have on Facebook? Are they compromised accounts or fake accounts?
Where? Suitable targets (target posts, pages): US, Middle East, Asia, etc.; politics, sports, entertainment, etc. Where are the pages that are more likely to spread malicious URLs? What is the relationship among these target pages?
What? Search engine spam, phishing, social media manipulation, sign up, etc. What are these malicious URLs for?
How efficient? The absence of capable guardians (potential audience): audience, user experience, etc. How effective is each malicious URL? How many users have seen and been affected by the URL?
SFW – not only what you like to find answers, but also what are the new innovations necessary to obtain those answers!!
Distributed, trustworthy 11/28/2018
Outline Introduction Related Work & Evaluation Tools Suitable Targets Potential Audience Attackers Behavior Future Work In this section, we will introduce several 11/28/2018
Related Work
Context filter (V. Balakrishnan 2016, C. Grier 2010, G. Stringhini 2010): blacklists; text structure and pattern
User profile (K. Lee, 2010): geography, personal info, created (updated) time, profile pictures
Briefly introduce context, user profile, network-based, blacklists
Related Work (cont’d)
Behavior-driven signal (C. Cao 2015, G. Wang 2013): clicks, likes, shares
Network-based (B. Viswanath 2010): edge (friend, like similarity, etc.); static or dynamic; margin groups; find one, then cluster
Combine the 4 categories
Blackmarkets
SFW – you should provide some sample references for these related work
Evaluation Tools
VirusTotal: API, 60+ security engines supported (Avira, Kaspersky, Google Safe Browsing, etc.)
URLBlacklists: file based, 100+ categories, 10,000,000+ domains (ads, porn, drug, weapon, etc.)
Labeled Data
Sorted blacklists: Black.com, Black1.net, Phish.com, …
Sorted url_parsed with prefix: d.com, c.d.com, b.c.d.com, a.b.c.d …
Labeled data: O(log nm)
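The matching of parsed URLs against a sorted blacklist can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function names (`host_suffixes`, `in_sorted`, `is_blacklisted`) and the sample domains are hypothetical, and it assumes each URL is expanded into its domain suffixes (d.com, c.d.com, ...) and each suffix is looked up by binary search.

```python
from bisect import bisect_left
from urllib.parse import urlparse


def host_suffixes(url):
    """Return domain suffixes of the URL's host, shortest first.

    "http://a.b.c.d.com/x" -> ["d.com", "c.d.com", "b.c.d.com", "a.b.c.d.com"]
    """
    if "://" not in url:
        url = "http://" + url  # assumption: bare hosts are treated as HTTP URLs
    host = (urlparse(url).hostname or "").lower()
    labels = [label for label in host.split(".") if label]
    if len(labels) < 2:
        return []
    return [".".join(labels[i:]) for i in range(len(labels) - 2, -1, -1)]


def in_sorted(sorted_list, key):
    """Binary search for key in an already sorted list."""
    i = bisect_left(sorted_list, key)
    return i < len(sorted_list) and sorted_list[i] == key


def is_blacklisted(url, sorted_blacklist):
    """True if any domain suffix of the URL appears in the sorted blacklist."""
    return any(in_sorted(sorted_blacklist, s) for s in host_suffixes(url))


# Hypothetical blacklist entries, sorted once up front.
blacklist = sorted(["black.com", "black1.net", "phish.com", "d.com"])
print(is_blacklisted("http://a.b.c.d.com/login", blacklist))
print(is_blacklisted("https://example.org/", blacklist))
```

Each lookup costs a logarithmic number of comparisons in the blacklist size, so the per-URL cost grows with the number of suffixes rather than with the size of the list.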
Outline Introduction Related Work & Evaluation Tools Suitable Targets Potential Audience Attackers Behavior Future Work 11/28/2018
Suitable Targets
Problem: for any post thread p on a social media platform, predict whether p contains at least one malicious comment, using a classifier c: p → {target, nontarget}
SFW – we need to have a better organized presentation for problems.
SFW – the defenders concern might be different – we need to consider the risk factor
Key idea: Life Cycle of Posts 10 hrs Shelf Life, skim messages, can “catch” ones eyes only , enlarge the influence https://www.facebook.com/barackobama/posts/10151673679836749 https://www.facebook.com/cnn/posts/313652498762911 SFW – ask the audience “which post has higher prob to be attacked”? 11/28/2018
Popularity
Attention is everything!
Avg. time: FB 50 mins, sports 17 mins [FB / NYT]
Liking, commenting, sharing, reading, etc.
Interdisciplinary work: economy, advertisement, communication
Output: tweet counts, FB shares / comments, total clicks, etc.
Input: content, topic, number of comments after a short time, etc.
Theory: information cascade, bandwagon effect, attention economy, etc.
Reference: (A. Tatar, 2011), (C. Castillo, 2010), (K. Wang, 2015)
SFW – How does FB push/deliver the information to your users?
SFW – Interdisciplinary (should these be related work?) And, give references?
Definitions
Time Series (TS)
TScreated(post): the time an original article is posted
TSj: a time period j following the time of the original
TSfinal: the end of our observation
Accumulated number of participants (AccNcomment): the number of post comments between TS(i-1) and TSi
Discussion Atmosphere Vector (DAV)
SFW – watch out for the transition into this slide.
SFW – do you want to provide one example for all or most of the slides?
SFW – I feel that you should give an example to explain.
SFW – Definition**s**
Example
TScreated(Climate) = 2014-12-19 03:06:42
Suppose j = 5 and final = 120 (minutes)
DAV(Climate) = [
  # of comments 03:06:42 ~ 03:11:42 (1st),
  # of comments 03:11:42 ~ 03:16:42 (2nd),
  …,
  # of comments 05:01:42 ~ 05:06:42 (24th)
]
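The DAV construction in this example can be sketched in a few lines. This is an illustrative sketch only: the function name, the parameter names, and the sample comment timestamps are hypothetical, and it assumes each bucket includes its start time and excludes its end time.

```python
from datetime import datetime, timedelta


def discussion_atmosphere_vector(created, comment_times, interval_min=5, final_min=120):
    """Count comments per fixed-width interval after the post creation time.

    Bucket k covers [created + k*interval, created + (k+1)*interval).
    Comments before creation or after the observation window are ignored.
    """
    n_buckets = final_min // interval_min
    step = timedelta(minutes=interval_min)
    dav = [0] * n_buckets
    for t in comment_times:
        k = (t - created) // step  # timedelta // timedelta gives an int
        if 0 <= k < n_buckets:
            dav[k] += 1
    return dav


created = datetime(2014, 12, 19, 3, 6, 42)
# Hypothetical comment timestamps, for illustration only.
comments = [
    datetime(2014, 12, 19, 3, 7, 0),
    datetime(2014, 12, 19, 3, 10, 0),
    datetime(2014, 12, 19, 3, 11, 42),
    datetime(2014, 12, 19, 5, 6, 41),
    datetime(2014, 12, 19, 5, 6, 42),  # falls just outside the 120-minute window
]
dav = discussion_atmosphere_vector(created, comments)
print(len(dav), dav[0], dav[1], dav[23])
```

With j = 5 and final = 120 the vector has 24 entries, matching the 1st through 24th intervals listed in the example.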
Dataset Totally 42,703,463 2011~2014 Ten Main Media pages on Facebook 11/28/2018
Dynamic time evolving Features 11/28/2018
Several static features
Spanning time (shelf life): Time(last comment) – Time(post time)
# of comments: total # of comments regarding posts
Users, likes, etc.
SFW – write down definition side by side.
Near Real Time SFW – how to interpret 10 minutes? (what is the total time and attack time)? Results 11/28/2018
Next question: which stage do attackers prefer?
Early: lead the discussion in the beginning; user interface
Late: notification function; newly arriving audience
Middle or random
The advantage of the two increases slightly, peaks, and experiences a long-tail decay
Panic
SFW – this one is important. Need to say it better.
SFW – Also, FB changed the way of their organization and notification (in 2015 or 2016)
Discussion (1/2) 9420 comments have been detected, provided by 5026 accounts SFW – CDF of WHAT? (we probably need more definitions, and need to get more examples) 11/28/2018
Discussion (2/2)
Time duration between two consecutive malicious comments in the same page
SFW – we need better and slower explanation with examples and key points regarding your result.
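The time duration between consecutive malicious comments on the same page could be computed along these lines. This is a sketch under assumptions: the input format (pairs of page id and timestamp), the function name, and the sample values are hypothetical, and timestamps are plain numbers (for example, minutes) for brevity.

```python
from collections import defaultdict


def gaps_between_malicious_comments(comments):
    """Map each page to the gaps between its consecutive malicious comments.

    comments: iterable of (page_id, timestamp) pairs in any order.
    """
    by_page = defaultdict(list)
    for page, t in comments:
        by_page[page].append(t)
    gaps = {}
    for page, times in by_page.items():
        times.sort()
        gaps[page] = [later - earlier for earlier, later in zip(times, times[1:])]
    return gaps


# Hypothetical malicious comments: (page, minutes since some reference time).
sample = [("cnn", 10), ("cnn", 3), ("cnn", 40), ("bbc", 7)]
gaps = gaps_between_malicious_comments(sample)
print(gaps)
```

A page with a single malicious comment yields an empty list, since no consecutive pair exists.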
Remarks Predict Suitable Targets successfully with temporal features Attackers: Follow or not? Defenders: Deploy resource Temporal Analysis with different variables Stage Exact time after post created Time duration between two consecutive malicious comments in the same page SFW – explain “Exact time after last attack” 11/28/2018
Outline Introduction Related Work & Evaluation Tools Suitable Targets Potential Audience Attackers Behavior Future Work SFW – should have a better structure./// 11/28/2018
Why study Effectiveness Communication is trying to influence others. Qualitative and quantitative analysis for each mURL. Risk Assessment and control Suitable Targets are the objects. 11/28/2018
Intuitive thinking
How many people have seen/clicked the message? (Directly)
Hard to get the entire data because of the recommendation system
Communication
User intention to rejoin
Shelf-life period
Feedback
SFW – How do you know “been notified”?
SFW – BTW, what is Shelf life?
Estimate Audience
Actions within δt in page G
Action: comment, like, angry, reaction, etc.
Timeline: T0 − δt | T0 (attack) | T0 + δt
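Counting audience actions in the windows before and after an attack might look like the sketch below. The function name and sample data are hypothetical, and the boundary handling is an assumption: the window before the attack is [T0 − δt, T0), the window after it is (T0, T0 + δt], and an action at exactly T0 is treated as the attack itself.

```python
def actions_around_attack(action_times, t0, delta):
    """Count actions in the windows before and after the attack time t0.

    Works with any comparable time type that supports + and - with delta
    (numbers, or datetime with timedelta).
    """
    before = sum(1 for t in action_times if t0 - delta <= t < t0)
    after = sum(1 for t in action_times if t0 < t <= t0 + delta)
    return before, after


# Hypothetical action times on a page, in minutes; the attack is at minute 10.
times = [1, 5, 9, 10, 11, 14, 15, 21]
before, after = actions_around_attack(times, t0=10, delta=5)
print(before, after)
```

Repeating the call with δt set to 5, 10, 15, and 20 minutes gives one pair of counts per window size.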
Basic Result – 5,10,15,20 minutes 11/28/2018
Indirect influence – final comments
Predicting final comments/visits using a post’s early-stage reactions
Distribution matrix Dij (j participants within i minutes)
Prediction matrix Mij
SFW – practice more on this slide and maybe you can use an example.
SFW – why is final comment important? (What do you mean by several work have been done?)
Example
4 posts with final comments: A (100), B (101), C (102), D (2)
D56 = {A, B, C}
Input: a post E that got 6 comments within the first 5 minutes
Prediction: probably > 100 (lower bound)
~90% accuracy
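The lookup in this example can be sketched as a table from (minutes, early comment count) to the final comment counts of matching posts. This is only an illustration: the function names and data layout are hypothetical, the early count for post D is made up, and taking the minimum observed final count as the lower bound is an assumption about how the bound is derived.

```python
from collections import defaultdict


def build_distribution(posts):
    """Group final comment counts by (minutes elapsed, comments so far)."""
    dist = defaultdict(list)
    for post in posts:
        for minutes, count in post["early"].items():
            dist[(minutes, count)].append(post["final"])
    return dist


def predict_lower_bound(dist, minutes, count):
    """Smallest final count among past posts with the same early reaction."""
    finals = dist.get((minutes, count))
    return min(finals) if finals else None


# Posts A, B, C had 6 comments within 5 minutes; D's early count is hypothetical.
posts = [
    {"id": "A", "early": {5: 6}, "final": 100},
    {"id": "B", "early": {5: 6}, "final": 101},
    {"id": "C", "early": {5: 6}, "final": 102},
    {"id": "D", "early": {5: 1}, "final": 2},
]
dist = build_distribution(posts)
bound = predict_lower_bound(dist, minutes=5, count=6)
print(bound)
```

A new post E with 6 comments in its first 5 minutes falls into the same cell as A, B, and C, so the sketch returns their smallest final count.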
Result SFW – what does this mean? SFW – can you choose “Popular” non-target? SFW – and, the meaning about this comparison SFW – should mention some future work 11/28/2018
Some future work More accurate prediction > 100 v.s. 100~200 Pick “popular ” from Non-Target Some pages have lots of low popularity posts Target posts Non-Target posts 11/28/2018
Remarks
Direct estimation: within Twindow, hundreds of audience members will be influenced
Indirect estimation: impact on the life cycle (even for popular posts)
Outline Introduction Related Work & Evaluation Tools Suitable Targets Potential Audience Attackers Behavior Future Work 11/28/2018
Work Review
Social media manipulation, sign up, search engine spamming, vote stuffing
Network-based: static (margin); dynamic (deviation)
Behavior- and profile-based: no profile image or Google images
Anomaly detection: not just a classification problem between fake and compromised
SFW – need to work on this – Why ad hoc?
SFW – model for different accounts
Accounts’ other activities
From the previous experiment, 5026 malicious accounts were identified (9420 comments were detected, posted by these 5026 accounts)
40,000+ pages on Facebook (2011-2016)
>70% of the accounts don’t have a “like”
Like is easier
SFW – accounts (compromised or fake accounts)
SFW – why no/less likes?
SFW – only for the 5000 attackers 11/28/2018
Accounts’ footprints
Response time to post thread
Vote stuffing: ten comments to ten different articles
Remain online to “lead” discussion
Commenting time
Vector =
SFW – response time
SFW – what do we mean “Lead”
SFW – mentioned “privacy”
SFW – the content is the same
SFW – advertisement – vote-stuffing
SFW – compromised or fake or ???
SFW – mention “future work” – activist – his activity is inconsistent with the content of the post (self-serving).
Normal vs. Malicious accounts
Malicious accounts tend to comment late
Legitimate accounts comment after a fixed time from the original article
Same content, multiple accounts One message, multiple accounts (red) One account, same but different post threads (green) SFW – a network of user accounts? SFW – what is the innovation from these examples? SFW – your talk will be like a lot of case study but how to converge? 11/28/2018
Outline Introduction Related Work & Evaluation Tools Suitable Targets Potential Audience Attackers Behavior Future Work 11/28/2018
Concluding Remarks
Likely offender (attacker behavior): Who are the attackers? For what?
Suitable targets (target posts, pages): Where are the targets?
The absence of capable guardians (potential audience): How efficient?
Text and Sentiment Analysis Different categories of posts thread Politics, commercial, entertainment, etc. Topic and sentiment around campaigns Challenges multiple languages Fuzzy, subculture word choice 11/28/2018
Characterize mURLs Lots of mURLs ads, porn, malware, phishing Detail: Fund raiser, case reporter, drive-by download, etc Will hurt users or not, using VM Challenge Binary Characterization is hard Required manually checking 11/28/2018
Get close to the real world
Users’ daily activity
Time zone, geography
Different stages of post threads
Event-based: rally in Virginia, shooting in San Bernardino, fake news
Timeline
Topic                                    Progress       Time
Impact and effectiveness of mURLs        70%            2-3 months
Text and sentiment analysis              40%            5-6 months
Characterize mURLs                       Just started   1-1.5 years
Accounts’ activities                     20%            1 year
Event-based, timezone; innovations       Just started   1-2 years
SFW – look good, there should be a period of time to develop innovations
Related Publications
Wang, Keith C., Chun-Ming Lai, Teng Wang, and S. Felix Wu. "Bandwagon Effect in Facebook Discussion Groups." In Proceedings of the ASE BigData & SocialInformatics 2015, p. 17. ACM, 2015.
Wang, Teng, Chunsheng Victor Fang, Chun-Ming Lai, and S. Felix Wu. "Triaging Anomalies in Dynamic Graphs: Towards Reducing False Positives." In Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on, pp. 354-359. IEEE, 2015.
Yunfeng Hong, Yongjian Hu, Chun-Ming Lai, S. Felix Wu, Iulian Neamtiu, Yu Paul, Hasan Cam, and Gail-Joon Ahn. "Defining and Detecting Environment Discrimination in Android Apps." Accepted by SECURECOMM, 2017.
Chun-Ming Lai, Xiaoyun Wang, Yunfeng Hong, Yu-Cheng Lin, S. Felix Wu, Patrick McDaniel, and Hasan Cam. "Attacking Strategies and Temporal Analysis Involving Facebook Discussion Groups." Submitted to International Conference on Network and Service Management (CNSM), 2017.
Yunfeng Hong, Yu-Cheng Lin, Chun-Ming Lai, S. Felix Wu, and George Barnett. "Profiling Facebook Public Page Graph." Submitted to Social Computing and Semantic Data Mining (ICNC), 2017.
Thank you! Q & A SFW – what have been done? Whether you can justify some of your work is fundamental and not just incremental and applied? SFW – balance between contributions to CS versus Social Science 11/28/2018
Reference (1/5)
11 election stories that went viral on Facebook. http://www.businessinsider.com/fake-presidential-election-news-viral-facebook-trump-clinton-2016-11/#4-ireland-is-officially-accepting-trump-refugees-from-america-8
Global social media research summary 2017. http://www.smartinsights.com/social-media-marketing/social-media-strategy/new-global-social-media-research/
Cohen, Lawrence, and Marcus Felson. "Social Change and Crime Rate Trends: A Routine Activity Approach." American Sociological Review 44.4 (1979): 588–608.
Reference (2/5)
https://www.nytimes.com/2016/05/06/business/facebook-bends-the-rules-of-audience-engagement-to-its-advantage.html
Tatar, Alexandru, et al. "Predicting the popularity of online articles based on user comments." Proceedings of the International Conference on Web Intelligence, Mining and Semantics. ACM, 2011.
Kim, Su-Do, Sung-Hwan Kim, and Hwan-Gue Cho. "Predicting the virtual temperature of web-blog articles as a measurement tool for online popularity." Computer and Information Technology (CIT), 2011 IEEE 11th International Conference on. IEEE, 2011.
Reference (3/5)
Yu, Bei, Miao Chen, and Linchi Kwok. "Toward predicting popularity of social marketing messages." International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction. Springer, Berlin, Heidelberg, 2011.
Lakkaraju, Himabindu, and Jitendra Ajmera. "Attention prediction on social media brand pages." Proceedings of the 20th ACM international conference on Information and knowledge management. ACM, 2011.
Bandari, Roja, Sitaram Asur, and Bernardo A. Huberman. "The pulse of news in social media: Forecasting popularity." ICWSM 12 (2012): 26-33.
Reference (4/5)
Pinto, Henrique, Jussara M. Almeida, and Marcos A. Gonçalves. "Using early view patterns to predict the popularity of youtube videos." Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 2013.
Wang, Keith C., et al. "Bandwagon Effect in Facebook Discussion Groups." Proceedings of the ASE BigData & SocialInformatics 2015. ACM, 2015.
Harsule, Sneha R., and Mininath K. Nighot. "N-Gram Classifier System to Filter Spam Messages from OSN User Wall." Innovations in Computer Science and Engineering. Springer Singapore, 2016. 21-28.
Reference (5/5)
Lee, Kyumin, James Caverlee, and Steve Webb. "Uncovering social spammers: social honeypots + machine learning." Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.
Ma, Jialin, et al. "A message topic model for multi-grain SMS spam filtering." International Journal of Technology and Human Interaction (IJTHI) 12.2 (2016): 83-95.
Back up slides Comments in target posts : comments in non-target posts: 6:4 11/28/2018
Backup slides
Experiment environment: MariaDB and Social Crawler at UC Davis (cyrus.cs.ucdavis.edu); Scikit-learn (classifier)
Setup:
Parse URLs from the database
HTTP request to recover (time consuming)
Send to VirusTotal
Blacklist matching
Depends on the page; usually takes several days per page
Parameter Naïve Bayes: The likelihood of the features is assumed to be Gaussian Adaboost: # of estimators = 50, learning rate = 1, algorithm: ‘SAMME.R’ Decision Tree: min_samples_split = 2 and min_samples_leaf = 1, as depth, nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. 11/28/2018
How to pick those pages? Most representative social media around the world, selected by numbers of comments Crawled by Social Interactive Networking and Conversation Entropy Ranking (SINCERE) http:sincere.se Are able to include more pages 11/28/2018
Compared with other works
Platform: personal networks vs. social media; Twitter vs. Facebook pages
Data source: crowdsourcing vs. unknown
Evaluation: honeypot vs. real data
Backup Slides
Why RAT?
Crime is always there; we need to know the intention and prevent it
RAT states that crime is inevitable, which best describes the scenario of cybercrime
Causality vs. correlation
From a data perspective, or even an AI one, identifying causality is hard
Data-driven work
Imbalanced dataset: normal vs. target posts, 1:100
Two ways: oversampling (e.g., SMOTE) or downsampling
Different metrics to measure
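Random downsampling of the majority class, one of the two options above, can be sketched as follows. This is a minimal illustration with hypothetical names and synthetic labels; it keeps as many majority samples as there are minority samples, and it does not implement SMOTE, which instead synthesizes new minority samples.

```python
import random


def downsample_majority(samples, labels, majority_label, seed=0):
    """Randomly drop majority-class samples until both classes are equal in size."""
    rng = random.Random(seed)
    pairs = list(zip(samples, labels))
    minority = [p for p in pairs if p[1] != majority_label]
    majority = [p for p in pairs if p[1] == majority_label]
    kept = rng.sample(majority, min(len(majority), len(minority)))
    balanced = minority + kept
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [l for _, l in balanced]


# Synthetic 1:100 dataset: 10 target posts among 1000 non-target posts.
samples = list(range(1010))
labels = ["nontarget"] * 1000 + ["target"] * 10
X, y = downsample_majority(samples, labels, majority_label="nontarget")
print(len(X), y.count("target"), y.count("nontarget"))
```

On data this skewed, plain accuracy can look high even for a classifier that never predicts a target, which is why per-class measures such as precision and recall are commonly reported alongside it.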
Introduction
Definition of Online Groups (S. Johnson, 2010)
Participants share common interests.
Group membership is voluntary and unrestricted.
Participation is clearly visible, allowing individuals to accurately identify participation status.
The collective is recognized as a group by outside observers.
What is social media or an online group? Closure (closedness)
SFW – What is data-driven? And, why do we limit ourselves to that?
Indirect est. – In a post thread Notification Mechanism Reacted T0(first comment) Tend (last comment) n: total number of comments Li: Number of Likes 11/28/2018
Effect function for malicious account SFW – what is effect function? SFW – what is the connection between this result with your first result? 11/28/2018
Social Media Cont’ 11/28/2018
Examples
Motivation: lots of security challenges occur because of the distributed mode
SFW – have you read those four Fake News papers?