Attacking Strategies Analysis on Social Media

Presentation transcript:

Attacking Strategies Analysis on Social Media
Chun-Ming Lai
Computer Science, University of California, Davis
SFW – Remember to include a list of your pubs at the end.

Social Media
Exerting significant impact on mass communication.
Traditional media vs. social media:
Data size: less vs. more
User type: reader vs. editor/reporter
Timing: delayed vs. real time
Top-down and authoritative vs. distributed and skimmed.
SFW – “Editor/Reporter” and “reader”
11/28/2018

Traditional communication: authoritative

Social media: distributed; nodes can be controlled

Facebook.com/63811549237/posts/10153038271604238, posted 2014-12-19, 03:06 am

GMT+0

Total: 609 comments

Major Dimensions
Routine Activity Theory (RAT) (L. Cohen, 1979):
Likely offender (attacker behavior)
Suitable targets (target posts, pages)
The absence of capable guardians (potential audience)
Data: malicious URLs in the Facebook social media dataset
Targets / environments / impact of campaigns
Attackers' digital footprints
SFW – I think we need a slide like this earlier to spell out your target topic.
SFW – You have been talking for 20 minutes and only then described your problem.

Security Threat
Severe threat: phishing; malware, drive-by download
Medium to light threat: advertisement; spamming (fund-raising, porn, canned messages, etc.)
New type of threat: rumors, media manipulation, sign up, vote stuffing, etc.; fake news; crowdturfing = crowdsourcing + astroturfing
Sometimes it’s hard to evaluate “spamming”.
SFW – Likefarm? Is that ContentFarm?

Impact
Personal: privacy, personal info leakage, malware, phishing
OSN site: user interface, Quality of Experience (user retention, etc.)
Society: attempts to sway (or influence) public opinion and elections; destroying one’s social credit
SFW – What do you mean by “QoE”?

Difficulties & Challenges
Heterogeneous and huge data: text, media, transactions, etc.
Labeled data is precious: different criteria; data size and type
New patterns of online service: application bursts, Facebook Live, games, etc.
SFW – How about different user or group behaviors? Do we want to cover apps?

Hoped-for Contributions (3W1H)
Who? (likely offender, attacker behavior) Fake accounts, net army, compromised accounts, etc. Who are those attackers? What other activities do they have on Facebook? Are they compromised accounts or fake accounts?
Where? (suitable targets: target posts, pages) US, Middle East, Asia, etc.; politics, sports, entertainment, etc. Which pages are more likely to be used to spread malicious URLs? What is the relationship among these target pages?
What? What are these malicious URLs for? Search engine spam, phishing, social media manipulation, sign up, etc.
How effective? (the absence of capable guardians: potential audience) Audience, user experience, etc. How effective is each malicious URL? How many users have seen and been affected by the URL?
SFW – Not only what you would like to find answers to, but also what new innovations are necessary to obtain those answers!

Distributed, trustworthy

Outline: Introduction; Related Work & Evaluation Tools; Suitable Targets; Potential Audience; Attacker Behavior; Future Work
In this section, we will introduce several

Related Work
Context filter (V. Balakrishnan 2016, C. Grier 2010, G. Stringhini 2010): blacklists; text structure and pattern
User profile (K. Lee, 2010): geography, personal info, created (updated) time, profile pictures
Briefly introduce the context, user-profile, network-based, and blacklist approaches.

Related Work (cont'd)
Behavior-driven signals (C. Cao 2015, G. Wang 2013): clicks, likes, shares
Network-based (B. Viswanath 2010): edges (friend, like similarity, etc.); static or dynamic; margin groups; find one, then cluster
Combine the four categories
Black markets
SFW – You should provide some sample references for this related work.

Evaluation Tools
VirusTotal: API, support for 60+ security engines (Avira, Kaspersky, Google Safe Browsing, etc.)
URLBlacklists: file based, 100+ categories, 10,000,000+ domains (ads, porn, drugs, weapons, etc.)

Labeled Data
Sorted blacklists: Black.com, Black1.net, Phish.com, …
Sorted url_parsed with prefix: d.com, c.d.com, b.c.d.com, a.b.c.d, …
Lookup cost: O(log nm)
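The slide above describes matching parsed URLs against sorted blacklists. The following is a minimal sketch of one way such a lookup could work, assuming each host is expanded into its domain suffixes and each suffix is binary-searched in the sorted list. The function names `domain_suffixes` and `is_blacklisted` are hypothetical, and the slides do not confirm that this is the exact procedure used.

```python
from bisect import bisect_left


def domain_suffixes(host):
    """Expand a host into its domain suffixes, shortest first.

    Example: 'a.b.c.d.com' gives 'd.com', 'c.d.com', 'b.c.d.com', 'a.b.c.d.com'.
    (Hypothetical helper; treating the last two labels as the base domain
    is an assumption of this sketch.)
    """
    labels = host.lower().strip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 2, -1, -1)]


def is_blacklisted(host, sorted_blacklist):
    """Binary-search every suffix of `host` in a sorted, lowercased blacklist."""
    for candidate in domain_suffixes(host):
        i = bisect_left(sorted_blacklist, candidate)
        if i < len(sorted_blacklist) and sorted_blacklist[i] == candidate:
            return True
    return False


# Placeholder entries taken from the slide; the list must be sorted once up front.
blacklist = sorted(d.lower() for d in ["Black.com", "Black1.net", "Phish.com"])
```

With a blacklist of n entries and a host with k labels, this sketch performs at most k - 1 binary searches (one per suffix), each costing O(log n) comparisons.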

Outline: Introduction; Related Work & Evaluation Tools; Suitable Targets; Potential Audience; Attacker Behavior; Future Work

Suitable Targets
Problem: for any post thread p on a social media platform, predict with a classifier c whether p contains at least one malicious comment, i.e., c(p) ∈ {target, non-target}.
SFW – We need a better organized presentation of the problems.
SFW – The defender's concern might be different; we need to consider the risk factor.

Key idea: Life Cycle of Posts
10-hour shelf life; readers skim messages, so a comment has to “catch” the eye to enlarge its influence.
https://www.facebook.com/barackobama/posts/10151673679836749
https://www.facebook.com/cnn/posts/313652498762911
SFW – Ask the audience “which post has a higher probability of being attacked?”

Popularity
Attention is everything!
Average time spent: Facebook 50 minutes, sports 17 minutes [FB / NYT]
Activities: liking, commenting, sharing, reading, etc.
Interdisciplinary work: economy, advertisement, communication
Output: tweet counts, FB shares/comments, total clicks, etc.
Input: content, topic, number of comments after a short time, etc.
Theory: information cascade, bandwagon effect, attention economy, etc.
References: (A. Tatar, 2011), (C. Castillo, 2010), (K. Wang, 2015)
SFW – How does FB push/deliver the information to your users?
SFW – Interdisciplinary (should these be related work?) And give references?

Definitions
Time Series (TS):
TScreated(post): the time an original article is posted
TSj: a time period j following the time of the original
TSfinal: the end of our observation
Accumulated number of participants (AccNcomment): the number of post comments between TS(i-1) and TSi
Discussion Atmosphere Vector (DAV)
SFW – Watch out for the transition into this slide.
SFW – Do you want to provide one example for all or most of the slides?
SFW – I feel that you should give an example to explain.
SFW – Definition**s**

Example
TScreated(Climate) = 2014-12-19 03:06:42
Suppose j = 5 and final = 120 (minutes), giving 24 windows.
DAV(Climate) = [
# of comments 03:06:42 ~ 03:11:42 (1st),
# of comments 03:11:42 ~ 03:16:42 (2nd),
…,
# of comments 05:01:42 ~ 05:06:42 (24th)
]
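The example above can be sketched in code. This is a minimal illustration, assuming the DAV simply counts comments in consecutive fixed-width windows after the post's creation time. The function name `discussion_atmosphere_vector`, the half-open window boundaries, and the comment timestamps are assumptions made for illustration, not details from the slides.

```python
from datetime import datetime, timedelta


def discussion_atmosphere_vector(created, comment_times, window_minutes=5, final_minutes=120):
    """Count comments per window of `window_minutes` until `final_minutes` after `created`.

    Each window is treated as half-open, [start, end), which is an assumption.
    """
    n_bins = final_minutes // window_minutes
    window = timedelta(minutes=window_minutes)
    dav = [0] * n_bins
    for t in comment_times:
        offset = t - created
        if timedelta(0) <= offset < window * n_bins:
            dav[offset // window] += 1  # timedelta // timedelta yields an int index
    return dav


created = datetime(2014, 12, 19, 3, 6, 42)  # TScreated(Climate) from the slide
# Hypothetical comment timestamps, for illustration only.
comments = [
    datetime(2014, 12, 19, 3, 7, 0),
    datetime(2014, 12, 19, 3, 10, 0),
    datetime(2014, 12, 19, 3, 12, 0),
    datetime(2014, 12, 19, 5, 6, 0),
    datetime(2014, 12, 19, 5, 6, 42),  # exactly at TSfinal, so outside the last window
]
dav = discussion_atmosphere_vector(created, comments)
```

With j = 5 and final = 120, the vector has 24 entries, matching the 1st through 24th windows listed on the slide.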

Dataset
Total: 42,703,463 (2011~2014)
Ten main media pages on Facebook

Dynamic, time-evolving features

Several static features
Spanning time (shelf life): Time(last comment) – Time(post time)
# of comments: total # of comments on the post; users, likes, etc.
SFW – Write down the definitions side by side.
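As a small illustration of the static features above, the sketch below computes the shelf life as the time of the last comment minus the post time, together with the total comment count. The function name `static_features`, the dictionary keys, and the timestamps are hypothetical.

```python
from datetime import datetime


def static_features(post_time, comment_times):
    """Return the shelf life (in seconds) and the total number of comments."""
    if not comment_times:
        # A post without comments has no spanning time in this sketch.
        return {"shelf_life_seconds": 0.0, "n_comments": 0}
    last_comment = max(comment_times)
    return {
        "shelf_life_seconds": (last_comment - post_time).total_seconds(),
        "n_comments": len(comment_times),
    }


post_time = datetime(2014, 12, 19, 3, 6, 42)
# Hypothetical comment timestamps, for illustration only.
features = static_features(
    post_time,
    [datetime(2014, 12, 19, 3, 7, 0), datetime(2014, 12, 19, 5, 6, 0)],
)
```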

Results: Near Real Time
SFW – How should the 10 minutes be interpreted? (What are the total time and the attack time?)

Next question: which stage do attackers prefer?
Early: lead the discussion from the beginning; user interface
Late: notification function; newly arriving audience
Middle or random
The advantage of the two increases slightly, peaks, and then experiences a long-tail decay.
Panic
SFW – This one is important. Need to say it better.
SFW – Also, FB changed its organization and notification (in 2015 or 2016).

Discussion (1/2)
9,420 comments were detected, posted by 5,026 accounts.
SFW – CDF of WHAT? (We probably need more definitions and more examples.)

Discussion (2/2)
Time duration between two consecutive malicious comments on the same page.
SFW – We need a better and slower explanation, with examples and key points regarding your result.

Remarks
Suitable targets are predicted successfully with temporal features.
Attackers: follow or not?
Defenders: deploy resources.
Temporal analysis with different variables: stage; exact time after the post was created; time duration between two consecutive malicious comments on the same page.
SFW – Explain “Exact time after last attack”.

Outline: Introduction; Related Work & Evaluation Tools; Suitable Targets; Potential Audience; Attacker Behavior; Future Work
SFW – Should have a better structure.

Why Study Effectiveness
Communication is trying to influence others.
Qualitative and quantitative analysis for each mURL.
Risk assessment and control.
Suitable targets are the objects.

Intuitive Thinking
How many people have seen/clicked the message? (Direct)
It is hard to get the entire data because of the recommendation system.
Communication: user intention to rejoin; shelf-life period; feedback
SFW – How do you know “been notified”?
SFW – BTW, what is shelf life?

Estimate Audience
Actions within δt on page G
Action: comment, like, angry, reaction, etc.
Timeline: T0 − δt, T0 (attack), T0 + δt
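One way to read the estimate above is as a count of page actions in a window of width δt on either side of the attack time T0. The sketch below assumes exactly that. The function name `actions_around_attack`, the choice of which endpoints are included, and the timestamps are illustrative assumptions.

```python
from datetime import datetime, timedelta


def actions_around_attack(action_times, t0, delta):
    """Count actions in [t0 - delta, t0) and in [t0, t0 + delta]."""
    before = sum(1 for t in action_times if t0 - delta <= t < t0)
    after = sum(1 for t in action_times if t0 <= t <= t0 + delta)
    return before, after


t0 = datetime(2014, 12, 19, 4, 0, 0)  # hypothetical attack time
# Hypothetical action timestamps (comments, likes, reactions) on one page.
actions = [
    datetime(2014, 12, 19, 3, 52),
    datetime(2014, 12, 19, 3, 57),
    datetime(2014, 12, 19, 4, 2),
    datetime(2014, 12, 19, 4, 4),
    datetime(2014, 12, 19, 4, 12),
]
counts = {m: actions_around_attack(actions, t0, timedelta(minutes=m)) for m in (5, 10, 15, 20)}
```

The window widths of 5, 10, 15, and 20 minutes in the usage line are chosen as plausible values for δt; the count after T0 is a rough proxy for how many active users could have seen the malicious comment.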

Basic Result – 5, 10, 15, 20 minutes

Indirect Influence – Final Comments
Predicting final comments/visits using a post’s early-stage reaction.
Distribution matrix Dij (j participants within i minutes)
Prediction matrix Mij
SFW – Practice more on this slide; maybe you can use an example.
SFW – Why is the final comment count important? (What do you mean by “several works have been done”?)

Example
Four posts with final comment counts: A (100), B (101), C (102), D (2)
D56 = {A, B, C}
Input: a post E that got 6 comments within the first 5 minutes
Prediction: probably > 100 final comments (lower bound), ~90% accuracy
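The example above can be sketched as a lookup table keyed by (minutes, early comment count). This is a minimal sketch under stated assumptions: the slides do not say how the lower bound is derived from a cell, so taking the minimum final count in the cell is an assumption, as are the function names and the early count given to post D.

```python
from collections import defaultdict


def build_distribution(posts, minutes):
    """Group final comment counts by (minutes, early comment count).

    `posts` maps a post name to (comments within `minutes`, final comments).
    """
    cells = defaultdict(list)
    for _name, (early, final) in posts.items():
        cells[(minutes, early)].append(final)
    return cells


def lower_bound(cells, minutes, early):
    """Return the smallest final count seen in the matching cell, or None."""
    finals = cells.get((minutes, early))
    return min(finals) if finals else None


# A, B, C fall into D56 as on the slide; D's early count of 1 is hypothetical.
posts = {"A": (6, 100), "B": (6, 101), "C": (6, 102), "D": (1, 2)}
cells = build_distribution(posts, minutes=5)
bound_for_e = lower_bound(cells, minutes=5, early=6)  # post E: 6 comments in 5 minutes
```

A cell with no historical posts returns None, which signals that no prediction can be made for that early count.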

Result
SFW – What does this mean?
SFW – Can you choose “popular” non-targets?
SFW – And what is the meaning of this comparison?
SFW – Should mention some future work.

Some Future Work
More accurate prediction: > 100 vs. 100~200
Pick “popular” posts from the non-targets; some pages have lots of low-popularity posts.
Target posts vs. non-target posts

Remarks
Direct estimation: within Twindow, hundreds of audience members will be influenced.
Indirect estimation: impact on the life cycle (even for popular posts).

Outline: Introduction; Related Work & Evaluation Tools; Suitable Targets; Potential Audience; Attacker Behavior; Future Work

Work Review
Social media manipulation: sign up, search engine spamming, vote stuffing
Network-based: static (margin); dynamic (deviation)
Behavior and profile based: no image or Google images
Anomaly detection, not just classification
Fake, compromised: not just a classification problem between
SFW – Need to work on this. Why ad hoc?
SFW – Model for different accounts.

Accounts' Other Activities
From the previous experiment, 5,026 malicious accounts were identified (9,420 detected comments).
40,000+ pages on Facebook (2011-2016)
More than 70% of the accounts don’t have any “like”, even though liking is easier.
SFW – Accounts (compromised or fake accounts)?
SFW – Why no/fewer likes?

SFW – Only for the 5,000 attackers.

Accounts' Footprints
Response time to the post thread
Vote stuffing: ten comments to ten different articles
Remain online to “lead” the discussion
Commenting time vector =
SFW – Response time.
SFW – What do we mean by “lead”?
SFW – Mention “privacy”.
SFW – The content is the same.
SFW – Advertisement, vote stuffing.
SFW – Compromised or fake or ???
SFW – Mention “future work”: an activist whose activity is inconsistent with the content of the post (self-serving).

Normal vs. Malicious Accounts
Malicious accounts tend to comment late.
Legitimate accounts comment after a fixed time from the original article.

Same Content, Multiple Accounts
One message, multiple accounts (red)
One account, same message but different post threads (green)
SFW – A network of user accounts?
SFW – What is the innovation from these examples?
SFW – Your talk will be like a lot of case studies, but how does it converge?
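The two patterns above (one message posted by multiple accounts, and one account repeating a message across threads) can be found by grouping comments on their text. The sketch below is only an illustration: the function name `repeated_content`, the whitespace and case normalization, and the sample comments are assumptions, and the slides do not describe how the matching was actually done.

```python
from collections import defaultdict


def repeated_content(comments):
    """Find repeated messages in an iterable of (account, thread_id, text).

    Returns two dicts: normalized text -> accounts (when more than one account
    posted it), and (account, normalized text) -> threads (when one account
    posted it in more than one thread).
    """
    accounts_by_text = defaultdict(set)
    threads_by_account_text = defaultdict(set)
    for account, thread, text in comments:
        key = " ".join(text.lower().split())  # normalize case and whitespace
        accounts_by_text[key].add(account)
        threads_by_account_text[(account, key)].add(thread)
    multi_account = {k: v for k, v in accounts_by_text.items() if len(v) > 1}
    multi_thread = {k: v for k, v in threads_by_account_text.items() if len(v) > 1}
    return multi_account, multi_thread


# Hypothetical comments, for illustration only.
sample = [
    ("u1", "t1", "Win a FREE phone"),
    ("u2", "t2", "win a free  phone"),
    ("u1", "t3", "Win a free phone"),
    ("u3", "t1", "Nice article"),
]
multi_account, multi_thread = repeated_content(sample)
```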

Outline: Introduction; Related Work & Evaluation Tools; Suitable Targets; Potential Audience; Attacker Behavior; Future Work

Concluding Remarks
Likely offender (attacker behavior): who are the attackers? For what?
Suitable targets (target posts, pages): where are the targets?
The absence of capable guardians (potential audience): how effective?

Text and Sentiment Analysis
Different categories of post threads: politics, commercial, entertainment, etc.
Topic and sentiment around campaigns
Challenges: multiple languages; fuzzy, subculture word choice

Characterize mURLs
Lots of mURLs: ads, porn, malware, phishing
Detail: fund raiser, case reporter, drive-by download, etc.
Will they hurt users or not? Test using a VM.
Challenge: moving from binary labels to characterization is hard and requires manual checking.

Get Close to the Real World
Users' daily activity
Time zone, geography
Different stages of post threads
Event-based: rally in Virginia; shooting in San Bernardino; fake news

Timeline
Topic: progress, time
Impact and effectiveness of mURLs: 70%, 2-3 months
Text and sentiment analysis: 40%, 5-6 months
Characterize mURLs: just started, 1-1.5 years
Accounts' activities: 20%, 1 year
Event-based, time zone, innovations: just started, 1-2 years
SFW – Looks good; there should be a period of time to develop innovations.

Related Publications
Wang, Keith C., Chun-Ming Lai, Teng Wang, and S. Felix Wu. "Bandwagon Effect in Facebook Discussion Groups." In Proceedings of the ASE BigData & SocialInformatics 2015, p. 17. ACM, 2015.
Wang, Teng, Chunsheng Victor Fang, Chun-Ming Lai, and S. Felix Wu. "Triaging Anomalies in Dynamic Graphs: Towards Reducing False Positives." In Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on, pp. 354-359. IEEE, 2015.
Yunfeng Hong, Yongjian Hu, Chun-Ming Lai, S. Felix Wu, Iulian Neamtiu, Yu Paul, Hasan Cam, and Gail-Joon Ahn. "Defining and Detecting Environment Discrimination in Android Apps." Accepted by SECURECOMM, 2017.
Chun-Ming Lai, Xiaoyun Wang, Yunfeng Hong, Yu-Cheng Lin, S. Felix Wu, Patrick McDaniel, and Hasan Cam. "Attacking Strategies and Temporal Analysis Involving Facebook Discussion Groups." Submitted to International Conference on Network and Service Management (CNSM), 2017.
Yunfeng Hong, Yu-Cheng Lin, Chun-Ming Lai, S. Felix Wu, and George Barnett. "Profiling Facebook Public Page Graph." Submitted to Social Computing and Semantic Data Mining (ICNC), 2017.

Thank you! Q & A
SFW – What has been done? Can you justify that some of your work is fundamental and not just incremental and applied?
SFW – Balance between contributions to CS versus social science.

Reference (1/5)
11 election stories that went viral on Facebook. http://www.businessinsider.com/fake-presidential-election-news-viral-facebook-trump-clinton-2016-11/#4-ireland-is-officially-accepting-trump-refugees-from-america-8
Global social media research summary 2017. http://www.smartinsights.com/social-media-marketing/social-media-strategy/new-global-social-media-research/
Cohen, Lawrence, and Marcus Felson. "Social Change and Crime Rate Trends: A Routine Activity Approach." American Sociological Review 44.4 (1979): 588-608.

Reference (2/5)
https://www.nytimes.com/2016/05/06/business/facebook-bends-the-rules-of-audience-engagement-to-its-advantage.html
Tatar, Alexandru, et al. "Predicting the popularity of online articles based on user comments." Proceedings of the International Conference on Web Intelligence, Mining and Semantics. ACM, 2011.
Kim, Su-Do, Sung-Hwan Kim, and Hwan-Gue Cho. "Predicting the virtual temperature of web-blog articles as a measurement tool for online popularity." Computer and Information Technology (CIT), 2011 IEEE 11th International Conference on. IEEE, 2011.

Reference (3/5)
Yu, Bei, Miao Chen, and Linchi Kwok. "Toward predicting popularity of social marketing messages." International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction. Springer, Berlin, Heidelberg, 2011.
Lakkaraju, Himabindu, and Jitendra Ajmera. "Attention prediction on social media brand pages." Proceedings of the 20th ACM international conference on Information and knowledge management. ACM, 2011.
Bandari, Roja, Sitaram Asur, and Bernardo A. Huberman. "The pulse of news in social media: Forecasting popularity." ICWSM 12 (2012): 26-33.

Reference (4/5)
Pinto, Henrique, Jussara M. Almeida, and Marcos A. Gonçalves. "Using early view patterns to predict the popularity of youtube videos." Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 2013.
Wang, Keith C., et al. "Bandwagon Effect in Facebook Discussion Groups." Proceedings of the ASE BigData & SocialInformatics 2015. ACM, 2015.
Harsule, Sneha R., and Mininath K. Nighot. "N-Gram Classifier System to Filter Spam Messages from OSN User Wall." Innovations in Computer Science and Engineering. Springer Singapore, 2016. 21-28.

Reference (5/5)
Lee, Kyumin, James Caverlee, and Steve Webb. "Uncovering social spammers: social honeypots + machine learning." Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.
Ma, Jialin, et al. "A message topic model for multi-grain SMS spam filtering." International Journal of Technology and Human Interaction (IJTHI) 12.2 (2016): 83-95.

Backup Slides
Comments in target posts : comments in non-target posts = 6:4

Backup Slides
Experiment environment: MariaDB and Social Crawler at UC Davis (cyrus.cs.ucdavis.edu); scikit-learn (classifier)
Setup: parse URLs from the database; HTTP request to recover (time consuming); send to VirusTotal; blacklist matching
Runtime depends on the page; it usually takes several days per page.

Parameters
Naïve Bayes: the likelihood of the features is assumed to be Gaussian.
AdaBoost: # of estimators = 50, learning rate = 1, algorithm = ‘SAMME.R’.
Decision Tree: min_samples_split = 2 and min_samples_leaf = 1; no depth limit, so nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

How to pick those pages?
The most representative social media pages around the world, selected by number of comments.
Crawled by Social Interactive Networking and Conversation Entropy Ranking (SINCERE), http:sincere.se
More pages can be included.

Compared with Other Works
Platform: personal networks vs. social media; Twitter vs. Facebook pages
Data source: crowdsourcing vs. unknown
Evaluation: honeypot vs. real data

Backup Slides
Why RAT? Crime is always there, and we need to know the intention in order to prevent it. RAT states that crime is inevitable, which best describes the scenario of cybercrime.
Causality vs. correlation: from a data perspective, or even with AI, identifying causality is hard.

Data-driven Work
Imbalanced dataset: normal vs. target posts, 1:100
Two ways to rebalance: oversampling (e.g., SMOTE) or downsampling
Different metrics to measure
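Of the two rebalancing options above, downsampling is the simpler to illustrate. The sketch below randomly reduces every class to the size of the smallest one. It is a plain random undersampler written for illustration (the function name `downsample_majority` and the sample data are hypothetical), not SMOTE and not necessarily the procedure used in the experiments.

```python
import random


def downsample_majority(samples, labels, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    n_min = min(len(group) for group in by_label.values())
    balanced = []
    for label, group in by_label.items():
        for sample in rng.sample(group, n_min):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical data with a 1:100 class ratio.
samples = ["target_0"] + ["normal_%d" % i for i in range(100)]
labels = ["target"] + ["non-target"] * 100
balanced = downsample_majority(samples, labels)
```

Downsampling discards most of the majority class, which is why metrics such as precision and recall on the original distribution are usually reported alongside it.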

Introduction
Definition of online groups (S. Johnson, 2010):
Participants share common interests.
Group membership is voluntary and unrestricted.
Participation is clearly visible, allowing individuals to accurately identify participation status.
The collective is recognized as a group by outside observers.
What is social media, or an online group? Closure.
SFW – What is data-driven? And why do we limit ourselves to that?

Indirect Estimation – In a Post Thread
Notification mechanism; reacted
T0 (first comment), Tend (last comment)
n: total number of comments
Li: number of likes

Effect function for malicious accounts
SFW – What is the effect function?
SFW – What is the connection between this result and your first result?

Social Media (cont'd)

Motivation: Examples
Many security challenges occur because of the distributed mode.
SFW – Have you read those four fake news papers?