Attacking Strategies Analysis on Social Media


Presentation on theme: "Attacking Strategies Analysis on Social Media"— Presentation transcript:

1 Attacking Strategies Analysis on Social Media
Chun-Ming Lai, Computer Science, University of California, Davis
SFW – Remember to include a list of your pubs at the end

2 Social Media Exerting significant impact on mass communication
              Traditional Media   Social Media
Data size     Less                More
User type     Reader              Editor/Reporter
Time-based    Delayed             Real time
Top-down, authoritative vs. distributed, skim
SFW – “Editor/Reporter” and “reader”
11/28/2018

3 Traditional communication
Authoritative

4 Social Media
Distributed, node will be controlled

5 Facebook.com/63811549237/posts/10153038271604238 2014, 12-19, 03:06 am

6 GMT+0

7 Total: 609 comments

8 Major Dimensions
Routine Activity Theory (RAT) (L. Cohen, 1979):
Likely offender (attacker behavior)
Suitable targets (target posts, pages)
The absence of capable guardians (potential audience)
Facebook social media dataset: malicious URLs; targets / environments / impact of campaigns; attackers' digital footprints
SFW – I think we need a slide, like this, earlier to spell out your target topic.
SFW – you have been talking for 20 minutes and then you described your problem.

9 Security Threat
Severe threat: phishing; malware, drive-by download
Medium to light threat: advertisement; spamming (fund-raising, porn, canned messages, etc.)
New type of threat: rumors, media manipulation, sign-up, vote stuffing, fake news, etc.
Crowdturfing = crowdsourcing + astroturfing
Sometimes it’s hard to evaluate “spamming”
SFW – Likefarm? Is that ContentFarm?

10 Impact
Personal: privacy, personal info leakage, malware, phishing
OSN site: user interface, Quality of Experience (user retention, etc.)
Society: attempts to sway (or influence) public opinion and elections; destroying one’s social credit
SFW – what do you mean “QoE”?

11 Difficulty & Challenge
Heterogeneous and huge data: text, media, transactions, etc.
Labeled data is precious: different criteria; data size and type
New patterns of online service application: bursts, Facebook Live, games, etc.
SFW – how about different user or group behaviors? Do we want to cover Apps?

12 Hoped-for Contributions (3W1H)
Likely offender (attacker behavior)
  Who? Fake, net army, compromised, etc. Who are those attackers? What other activities do they have on Facebook? Are they compromised accounts or fake accounts?
  What are these malicious URLs for? Search engine spam, phishing, social media manipulation, sign-up, etc.
Suitable targets (target posts, pages)
  Where? US, Middle East, Asia, etc.; politics, sports, entertainment, etc. Where are the pages on which malicious URLs are more likely to be spread? What is the relationship among these target pages?
The absence of capable guardians (potential audience)
  How efficient? Audience, user experience, etc. How efficient is each malicious URL? How many users have seen and been affected by the URL?
SFW – not only what you like to find answers, but also what are the new innovations necessary to obtain those answers!!

13 Distributed, trustworthy

14 Outline Introduction Related Work & Evaluation Tools Suitable Targets
Potential Audience Attackers Behavior Future Work In this section, we will introduce several 11/28/2018

15 Related work
Context filter (V. Balakrishnan 2016, C. Grier 2010, G. Stringhini 2010): blacklists; text structure & pattern
User profile (K. Lee, 2010): geography, personal info, created (updated) time, profile pictures
Briefly introduce context, user profile, network-based, blacklists

16 Related Work (cont’d)
Behavior-driven signals (C. Cao 2015, G. Wang 2013): clicks, likes, shares
Network-based (B. Viswanath 2010): edges (friend, like similarity, etc.); static or dynamic; margin groups; find one, then cluster
Combine the 4 categories
Black markets
SFW – you should provide some sample references for these related work

17 Evaluation Tools
VirusTotal: API, 60+ security engines supported (Avira, Kaspersky, Google Safe Browsing, etc.)
URLBlacklists: file based, 100+ categories, 10,000,000+ domains (ads, porn, drugs, weapons, etc.)

18 Labeled Data
Sorted blacklists: Black.com, Black1.net, Phish.com, ….
Sorted url_parsed with prefix: d.com, c.d.com, b.c.d.com, a.b.c.d
O(log nm)
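One plausible reading of this slide is that each URL's host is expanded into its domain suffixes (d.com, c.d.com, ...) and each suffix is looked up in the sorted blacklist by binary search. The sketch below illustrates that reading only. The function names are hypothetical, it assumes URLs carry a scheme such as http://, and it is not taken from the original pipeline.

```python
from bisect import bisect_left
from urllib.parse import urlparse


def host_suffixes(url):
    """Return the host's domain suffixes, shortest first (d.com, c.d.com, ...)."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Start at two labels so a bare top-level domain such as "com" is never tested.
    return [".".join(labels[-k:]) for k in range(2, len(labels) + 1)]


def is_blacklisted(url, sorted_blacklist):
    """Binary-search each suffix of the URL's host in a sorted blacklist."""
    for suffix in host_suffixes(url):
        i = bisect_left(sorted_blacklist, suffix)
        if i < len(sorted_blacklist) and sorted_blacklist[i] == suffix:
            return True
    return False


# Illustrative blacklist entries (lower-cased and sorted once, up front).
blacklist = sorted(d.lower() for d in ["Black.com", "Black1.net", "Phish.com", "d.com"])
```

With n blacklist entries and m suffixes per host, each URL costs about m binary searches of log n steps in this sketch.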

19 Outline Introduction Related Work & Evaluation Tools Suitable Targets
Potential Audience Attackers Behavior Future Work 11/28/2018

20 Suitable Targets: Problem
For any post thread p on a social media platform, predict whether p contains at least one malicious comment, using a classifier whose output c is in {target, nontarget}
SFW – we need to have a better organized presentation for problems.
SFW – the defenders concern might be different – we need to consider the risk factor

21 Key idea: Life Cycle of Posts
10 hrs Shelf Life, skim messages, can “catch” ones eyes only , enlarge the influence SFW – ask the audience “which post has higher prob to be attacked”? 11/28/2018

22 Popularity
Attention is everything!
Avg. time: FB / 50 mins, sports / 17 mins [FB / NYT]
Liking, commenting, sharing, reading, etc.
Interdisciplinary work: economy, advertisement, communication
Output: tweet counts, FB shares / comments, total clicks, etc.
Input: content, topic, number of comments after a short time, etc.
Theory: information cascade, bandwagon effect, attention economy, etc.
Reference: (A. Tatar, 2011), (C. Castillo, 2010), (K. Wang, 2015)
SFW – How does FB push/deliver the information to your users?
SFW – Interdisciplinary (should these be related work?) And, give references?

23 Definitions
Time Series (TS)
  TScreated(post): the time an original article is posted
  TSj: a time period j following the time of the original
  TSfinal: the end of our observation
Accumulated number of participants (AccNcomment)
The number of post comments between TSi and TS(i-1)
Discussion Atmosphere Vector (DAV)
SFW – watch out for the transition into this slide.
SFW – do you want to provide one example for all or most of the slides?
SFW – I feel that you should give an example to explain.
SFW – Definition**s**

24 Example
TScreated(Climate) = 2014-12-19 03:06:42
Suppose j = 5, final = 120
DAV(Climate) = [# of comments 03:06:42 ~ 03:11:42 (1st), # of comments 03:11:42 ~ 03:16:42 (2nd), ..., # of comments 05:01:42 ~ 05:06:42 (24th)]
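The DAV example can be sketched in a few lines, assuming j and final are measured in minutes (which matches the 24 windows shown) and that each window is half-open, so a comment exactly on a boundary falls into the later window. The function name and the boundary rule are assumptions for illustration, not the original implementation.

```python
from datetime import datetime, timedelta


def discussion_atmosphere_vector(created, comment_times, j_minutes=5, final_minutes=120):
    """Count comments in consecutive j-minute windows after the post was created."""
    n_bins = final_minutes // j_minutes
    dav = [0] * n_bins
    width = timedelta(minutes=j_minutes)
    for t in comment_times:
        offset = t - created
        if offset < timedelta(0):
            continue  # ignore timestamps earlier than the post itself
        idx = offset // width  # timedelta // timedelta yields an int
        if idx < n_bins:
            dav[idx] += 1
    return dav


# Hypothetical comment timestamps for the post created at 03:06:42.
created = datetime(2014, 12, 19, 3, 6, 42)
comments = [
    datetime(2014, 12, 19, 3, 7, 0),
    datetime(2014, 12, 19, 3, 10, 0),
    datetime(2014, 12, 19, 3, 12, 0),
    datetime(2014, 12, 19, 5, 6, 0),
]
dav = discussion_atmosphere_vector(created, comments)
```

Here `dav` has 24 entries. The first two comments land in the 1st window, the third in the 2nd, and the last in the 24th.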

25 Dataset
Total: 42,703,463; 2011~2014; ten main media pages on Facebook

26 Dynamic time evolving Features

27 Several static features
Spanning time (shelf life): Time(last comment) – Time(post time)
# of comments: total # of comments regarding posts
Users, likes, etc.
SFW – write down definition side by side.

28 Near Real Time SFW – how to interpret 10 minutes? (what is the total time and attack time)? Results 11/28/2018

29 Next question: which stage is preferred?
Early: lead the discussion in the beginning; user interface
Late: notification function; newly arriving audience
Middle or random
The advantage of two increases slightly, peaks, and experiences a long-tail decay
Panic
SFW – this one is important. Need to say it better.
SFW – Also, FB changed the way of their organization and notification (in 2015 or 2016)

30 Discussion (1/2) 9420 comments have been detected, provided by 5026 accounts SFW – CDF of WHAT? (we probably need more definitions, and need to get more examples) 11/28/2018

31 Discussion (2/2)
Time duration between two consecutive malicious comments in the same page
SFW – we need better and slower explanation with examples and key points regarding your result.
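The quantity on this slide, the time between two consecutive malicious comments on the same page, could be computed roughly as follows. This is a sketch under assumed inputs: malicious comments arrive as (page_id, timestamp) pairs, and the names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def consecutive_gaps(malicious_comments):
    """Map each page to the gaps (in seconds) between consecutive malicious comments."""
    by_page = defaultdict(list)
    for page_id, ts in malicious_comments:
        by_page[page_id].append(ts)
    gaps = {}
    for page_id, times in by_page.items():
        times.sort()  # order each page's comments chronologically
        gaps[page_id] = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return gaps
```

A page with a single malicious comment gets an empty list, since there is no consecutive pair to measure.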

32 Remarks
Suitable targets are predicted successfully with temporal features
  Attackers: follow or not?
  Defenders: deploy resources
Temporal analysis with different variables
  Stage
  Exact time after post created
  Time duration between two consecutive malicious comments in the same page
SFW – explain “Exact time after last attack”

33 Outline Introduction Related Work & Evaluation Tools Suitable Targets
Potential Audience Attackers Behavior Future Work SFW – should have a better structure./// 11/28/2018

34 Why study Effectiveness
Communication is trying to influence others. Qualitative and quantitative analysis for each mURL. Risk Assessment and control Suitable Targets are the objects. 11/28/2018

35 Intuitive thinking
How many people have seen/clicked the message? (Directly) Hard to get the entire data because of the recommender system
Communication: user intention to rejoin; shelf-life period; feedback
SFW – How do you know “been notified”?
SFW – BTW, what is Shelf life?

36 Estimate Audience
Actions within 𝛿𝑡 in page G
Action: comment, like, angry, reaction, etc.
Timeline: T0 − 𝛿𝑡 | T0 (attack) | T0 + 𝛿𝑡
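A direct estimate along these lines could count the page's actions in the window before and the window after the attack time T0. The sketch below assumes action timestamps for page G are already available as datetime objects, and it treats both windows as half-open. Both choices are assumptions, since the slide does not spell them out.

```python
from datetime import datetime, timedelta


def actions_around_attack(action_times, t0, delta):
    """Count page actions in [t0 - delta, t0) and in [t0, t0 + delta)."""
    before = sum(1 for t in action_times if t0 - delta <= t < t0)
    after = sum(1 for t in action_times if t0 <= t < t0 + delta)
    return before, after
```

Calling it with delta set to 5, 10, 15, and 20 minutes would produce the kind of per-window counts that the next slide reports.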

37 Basic Result – 5,10,15,20 minutes 11/28/2018

38 Indirect influence – final comments
Predicting final comments/visits using a post’s early-stage reaction
Distribution matrix Dij (j participants within i minutes)
Prediction matrix Mij
SFW – practice more on this slide and maybe you can use an example.
SFW – why is final comment important? (What do you by several work have been done?)

39 Example
4 posts with final comments: A (100), B (101), C (102), D (2)
D56 = {A, B, C}
An input post E got 6 comments within the first 5 minutes
Probably > 100 (lower bound), ~90% accuracy
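The example can be read as a lookup: group past posts by how many comments they had after i minutes, then use the smallest final count in the matching cell as a lower bound for a new post. The sketch below follows that reading. The early count for post D is not given on the slide, so the value 1 is a placeholder, and the function names are hypothetical.

```python
from collections import defaultdict


def build_distribution(history, minutes):
    """Group final comment counts by the number of comments seen within `minutes`."""
    cells = defaultdict(list)
    for early_counts, final_count in history:
        cells[early_counts[minutes]].append(final_count)
    return cells


def lower_bound(cells, early_count):
    """Smallest final count among past posts in the same cell, or None if unseen."""
    finals = cells.get(early_count)
    return min(finals) if finals else None


# Each entry: ({minutes: comments so far}, final comment count).
history = [
    ({5: 6}, 100),  # A
    ({5: 6}, 101),  # B
    ({5: 6}, 102),  # C
    ({5: 1}, 2),    # D (early count is a placeholder)
]
cells_5min = build_distribution(history, minutes=5)
estimate_for_e = lower_bound(cells_5min, 6)  # post E: 6 comments in 5 minutes
```

In this sketch the bound for E is the minimum over A, B, and C. The roughly 90% accuracy quoted on the slide comes from the author's data, not from this toy example.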

40 Result SFW – what does this mean?
SFW – can you choose “Popular” non-target? SFW – and, the meaning about this comparison SFW – should mention some future work 11/28/2018

41 Some future work More accurate prediction
> 100 v.s. 100~200 Pick “popular ” from Non-Target Some pages have lots of low popularity posts Target posts Non-Target posts 11/28/2018

42 Remarks Direct Estimation Indirect Estimation
Twindow, , hundreds of audiences will be influenced Indirect Estimation Impact to life cycle (even popular) 11/28/2018

43 Outline Introduction Related Work & Evaluation Tools Suitable Targets
Potential Audience Attackers Behavior Future Work 11/28/2018

44 Work Review
Social media manipulation, sign-up, search engine spamming, vote stuffing
Network-based: static (margin); dynamic (deviation)
Behavior, profile based: no or Google images
Anomaly detection: not just classification; fake, compromised
Not just a classification problem between
SFW – need to work on this – Why ad hoc?
SFW – model for different accounts

45 Accounts’ other activities
From the previous experiment, 5026 malicious accounts were identified (9420 comments have been detected, provided by 5026 accounts)
40,000+ pages on Facebook ( )
>70% of accounts don’t have a “like”
Like is easier
SFW – accounts (compromised or fake accounts)
SFW – why no/less likes?

46 SFW – only for the 5000 attackers

47 Accounts’ footprints
Response time to post thread
Vote stuffing: ten comments to ten different articles
Remain online to “lead” discussion
Commenting time vector =
SFW – response time
SFW – what do we mean “Lead”
SFW – mentioned “privacy”
SFW – the content is the same
SFW – advertisement – vote-stuffing
SFW – compromised or fake or ???
SFW – mention “future work” – activitist – his active inconsistent with the content of the post (self-serving).

48 Normal vs. malicious accounts
Malicious accounts tend to comment late
Legitimate accounts comment after a fixed time from the original article

49 Same content, multiple accounts
One message, multiple accounts (red) One account, same but different post threads (green) SFW – a network of user accounts? SFW – what is the innovation from these examples? SFW – your talk will be like a lot of case study but how to converge? 11/28/2018

50 Outline Introduction Related Work & Evaluation Tools Suitable Targets
Potential Audience Attackers Behavior Future Work 11/28/2018

51 Concluding Remarks
Likely offender (attacker behavior): Who are the attackers? For what?
Suitable targets (target posts, pages): Where are the targets?
The absence of capable guardians (potential audience): How efficient?

52 Text and Sentiment Analysis
Different categories of posts thread Politics, commercial, entertainment, etc. Topic and sentiment around campaigns Challenges multiple languages Fuzzy, subculture word choice 11/28/2018

53 Characterize mURLs
Lots of mURLs: ads, porn, malware, phishing
Detail: fund raiser, case reporter, drive-by download, etc.
Will hurt users or not, using VM
Challenge: going from binary to characterization is hard; requires manual checking

54 Get close to real world
Users’ daily activity: time zone, geography; different stages of post threads
Event-based: rally in Virginia; shooting in San Bernardino; fake news

55 Timeline Topic Process Time Impact and Effectiveness of mURLs 70%
2-3 months Text and sentiment analysis 40% 5-6 months Characterize mURLs Just started 1-1.5 year Accounts activities 20% 1 year Event-based, timezone Innovations Just Started 1-2 years SFW – look good, there should be a period of time to develop innovations 11/28/2018

56 Related Publications
Wang, Keith C., Chun-Ming Lai, Teng Wang, and S. Felix Wu. "Bandwagon Effect in Facebook Discussion Groups." In Proceedings of the ASE BigData & SocialInformatics 2015, p. 17. ACM,
Wang, Teng, Chunsheng Victor Fang, Chun-Ming Lai, and S. Felix Wu. "Triaging Anomalies in Dynamic Graphs: Towards Reducing False Positives." In Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on, pp IEEE, 2015.
Yunfeng Hong, Yongjian Hu, Chun-Ming Lai, S. Felix Wu, Iulian Neamtiu, Yu Paul, Hasan Cam and Gail-Joon Ahn, "Defining and Detecting Environment Discrimination in Android Apps." Accepted by SECURECOMM, 2017
Chun-Ming Lai, Xiaoyun Wang, Yunfeng Hong, Yu-Cheng Lin, S. Felix Wu, Patrick McDaniel, Hasan Cam, “Attacking Strategies and Temporal Analysis Involving Facebook Discussion Groups.” Submitted to International Conference on Network and Service Management (CNSM) 2017
Yunfeng Hong, Yu-Cheng Lin, Chun-Ming Lai, S. Felix Wu, George Barnett, “Profiling Facebook Public Page Graph.” Submitted to Social Computing and Semantic Data Mining (ICNC), 2017

57 Thank you! Q & A SFW – what have been done? Whether you can justify some of your work is fundamental and not just incremental and applied? SFW – balance between contributions to CS versus Social Science 11/28/2018

58 Reference (1/5)
11 election stories that went viral on Facebook viral-facebook-trump-clinton /#4-ireland-is-officially- accepting-trump-refugees-from-america-8
Global social media research summary strategy/new-global-social-media-research/
Lawrence Cohen and Marcus Felson, "Social Change and Crime Rate Trends: A Routine Activity Approach", American Sociological Review, 44 (4), 1979, pp. 588–608

59 Reference (2/5)
the-rules-of-audience-engagement-to-its-advantage.html
Tatar, Alexandru, et al. "Predicting the popularity of online articles based on user comments." Proceedings of the International Conference on Web Intelligence, Mining and Semantics. ACM, 2011.
Kim, Su-Do, Sung-Hwan Kim, and Hwan-Gue Cho. "Predicting the virtual temperature of web-blog articles as a measurement tool for online popularity." Computer and Information Technology (CIT), IEEE 11th International Conference on. IEEE, 2011.

60 Reference (3/5)
Yu, Bei, Miao Chen, and Linchi Kwok. "Toward predicting popularity of social marketing messages." International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction. Springer, Berlin, Heidelberg, 2011.
Lakkaraju, Himabindu, and Jitendra Ajmera. "Attention prediction on social media brand pages." Proceedings of the 20th ACM international conference on Information and knowledge management. ACM, 2011.
Bandari, Roja, Sitaram Asur, and Bernardo A. Huberman. "The pulse of news in social media: Forecasting popularity." ICWSM 12 (2012):

61 Reference (4/5)
Pinto, Henrique, Jussara M. Almeida, and Marcos A. Gonçalves. "Using early view patterns to predict the popularity of youtube videos." Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 2013.
Wang, Keith C., et al. "Bandwagon Effect in Facebook Discussion Groups." Proceedings of the ASE BigData & SocialInformatics 2015. ACM, 2015.
Harsule, Sneha R., and Mininath K. Nighot. "N-Gram Classifier System to Filter Spam Messages from OSN User Wall." Innovations in Computer Science and Engineering. Springer Singapore,

62 Reference (5/5)
Lee, Kyumin, James Caverlee, and Steve Webb. "Uncovering social spammers: social honeypots + machine learning." Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.
Ma, Jialin, et al. "A message topic model for multi-grain SMS spam filtering." International Journal of Technology and Human Interaction (IJTHI) 12.2 (2016):

63 Back up slides Comments in target posts : comments in non-target posts: 6:4 11/28/2018

64 Backup slides
Experiment environment: MariaDB and Social Crawler at UC Davis (cyrus.cs.ucdavis.edu); scikit-learn (classifier)
Setup: parse URLs from database; HTTP request to recover (time consuming); send to VirusTotal; blacklist matching
Depends on pages, usually takes several days / page

65 Parameters
Naïve Bayes: the likelihood of the features is assumed to be Gaussian
AdaBoost: # of estimators = 50, learning rate = 1, algorithm = ‘SAMME.R’
Decision Tree: min_samples_split = 2 and min_samples_leaf = 1; as for depth, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples

66 How to pick those pages? Most representative social media around the world, selected by numbers of comments Crawled by Social Interactive Networking and Conversation Entropy Ranking (SINCERE) Are able to include more pages 11/28/2018

67 Compared with other works
Platform: personal networks vs. social media; Twitter vs. Facebook pages
Data source: crowdsourcing vs. unknown
Evaluation: honeypot vs. real data

68 Backup Slides Why RAT ? Crime is always there, and need to know the intention and prevent them RAT states that crime is inevitable, which is best to describe the scenario of cybercrime. causality vs correlation From data perspective, or even AI, identifying causality is hard 11/28/2018

69 Data-driven work
Imbalanced dataset: normal vs. target posts, 1:100
Two ways:
Oversampling (e.g., SMOTE) or downsampling
Different metrics to measure
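As a small illustration of the downsampling option, the sketch below randomly discards samples from whichever class is the majority until the classes are balanced at a chosen ratio. It uses only the standard library. The function name and the `ratio` and `seed` parameters are assumptions for illustration, and this is plain random undersampling, not SMOTE (which synthesizes new minority samples instead).

```python
import random


def downsample_majority(samples, labels, majority_label, ratio=1.0, seed=0):
    """Randomly keep at most ratio * (minority size) samples of the majority class."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    keep_n = min(len(majority), int(ratio * len(minority)))
    kept = sorted(minority + rng.sample(majority, keep_n))
    return [samples[i] for i in kept], [labels[i] for i in kept]
```

Resampling would normally be applied to the training split only, so that evaluation still reflects the real class ratio.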

70 Introduction
Definition of Online Groups (S. Johnson, 2010):
Participants share common interests.
Group membership is voluntary and unrestricted.
Participation is clearly visible, allowing individuals to accurately identify participation status.
The collective is recognized as a group by outside observers.
What is social media or online groups; closure (closedness)
SFW – What is data-driven? And, why do we limit ourselves to that?

71 Indirect est. – In a post thread
Notification Mechanism Reacted T0(first comment) Tend (last comment) n: total number of comments Li: Number of Likes 11/28/2018

72 Effect function for malicious account
SFW – what is effect function? SFW – what is the connection between this result with your first result? 11/28/2018

73 Social Media Cont’ 11/28/2018

74 Examples
Motivation: lots of security challenges occur because of the distributed mode
SFW – have you read those four Fake News papers?

