BotGraph: Large Scale Spamming Botnet Detection Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum Speaker: 林佳宜.

Slides:



Advertisements
Similar presentations
Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks.
Advertisements

Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks.
Google News Personalization: Scalable Online Collaborative Filtering
Detecting Spam Zombies by Monitoring Outgoing Messages Zhenhai Duan Department of Computer Science Florida State University.
LIBRA: Lightweight Data Skew Mitigation in MapReduce
A Model of Computation for MapReduce
Analysis and Modeling of Social Networks Foudalis Ilias.
SOCELLBOT: A New Botnet Design to Infect Smartphones via Online Social Networking th IEEE Canadian Conference on Electrical and Computer Engineering(CCECE)
Optimus: A Dynamic Rewriting Framework for Data-Parallel Execution Plans Qifa Ke, Michael Isard, Yuan Yu Microsoft Research Silicon Valley EuroSys 2013.
1 A Spam Mail-based Solution for Botnet Detection and Network Bandwidth Protection 許富皓 資訊工程學系 中央大學 1.
Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol
Wide-scale Botnet Detection and Characterization Anestis Karasaridis, Brian Rexroad, David Hoeflin.
Predicting the Semantic Orientation of Adjective Vasileios Hatzivassiloglou and Kathleen R. McKeown Presented By Yash Satsangi.
Sampling from Large Graphs. Motivation Our purpose is to analyze and model social networks –An online social network graph is composed of millions of.
1 BotGraph: Large Scale Spamming Botnet Detection Yao Zhao EECS Department Northwestern University.
BotGraph: Large Scale Spamming Botnet Detection Yao Zhao Yinglian Xie *, Fang Yu *, Qifa Ke *, Yuan Yu *, Yan Chen and Eliot Gillum ‡ EECS Department,
Distributed Iterative Training Kevin Gimpel Shay Cohen Severin Hacker Noah A. Smith.
CBLOCK: An Automatic Blocking Mechanism for Large-Scale Deduplication Tasks Ashwin Machanavajjhala Duke University with Anish Das Sarma, Ankur Jain, Philip.
SocialFilter: Introducing Social Trust to Collaborative Spam Mitigation Michael Sirivianos Telefonica Research Telefonica Research Joint work with Kyungbaek.
Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine Shuang Hao, Nadeem Ahmed Syed, Nick Feamster, Alexander G. Gray,
S PAMMING B OTNETS : S IGNATURES AND C HARACTERISTICS Introduction of AutoRE Framework.
Pregel: A System for Large-Scale Graph Processing
Patterns And A Generative Model Jan 24, 2014 Authors: Jianwei Niu, Wanjiun Liao, Jing Peng, Chao Tong Presenter: Guoming Wang Published: Performance Computing.
Faculty: Dr. Chengcui Zhang Students: Wei-Bang Chen Song Gao Richa Tiwari.
John P., Fang Yu, Yinglian Xie, Martin Abadi, Arvind Krishnamurthy University of California, Santa Cruz USENIX SECURITY SYMPOSIUM, August, 2010 John P.,
Tennessee Technological University1 The Scientific Importance of Big Data Xia Li Tennessee Technological University.
Speaker:Chiang Hong-Ren Botnet Detection by Monitoring Group Activities in DNS Traffic.
Location-aware MapReduce in Virtual Cloud 2011 IEEE computer society International Conference on Parallel Processing Yifeng Geng1,2, Shimin Chen3, YongWei.
1 Fast Failure Recovery in Distributed Graph Processing Systems Yanyan Shen, Gang Chen, H.V. Jagadish, Wei Lu, Beng Chin Ooi, Bogdan Marius Tudor.
FluXOR: Detecting and Monitoring Fast-Flux Service Networks Emanuele Passerini, Roberto Paleari, Lorenzo Martignoni, and Danilo Bruschi 5th international.
2012 4th International Conference on Cyber Conflict C. Czosseck, R. Ottis, K. Ziolkowski (Eds.) 2012 © NATO CCD COE Publications, Tallinn 朱祐呈.
Scalable and Efficient Data Streaming Algorithms for Detecting Common Content in Internet Traffic Minho Sung Networking & Telecommunications Group College.
Using Identity Credential Usage Logs to Detect Anomalous Service Accesses Daisuke Mashima Dr. Mustaque Ahamad College of Computing Georgia Institute of.
Automated Social Hierarchy Detection through Network Analysis (SNAKDD07) Ryan Rowe, Germ´an Creamer, Shlomo Hershkop, Salvatore J Stolfo 1 Advisor:
1 Characterizing Botnet from Spam Records Presenter: Yi-Ren Yeh ( 葉倚任 ) Authors: L. Zhuang, J. Dunagan, D. R. Simon, H. J. Wang, I. Osipkov, G. Hulten,
ACT: Attachment Chain Tracing Scheme for Virus Detection and Control Jintao Xiong Proceedings of the 2004 ACM workshop on Rapid malcode Presented.
Jhih-sin Jheng 2009/09/01 Machine Learning and Bioinformatics Laboratory.
Automatically Generating Models for Botnet Detection Presenter: 葉倚任 Authors: Peter Wurzinger, Leyla Bilge, Thorsten Holz, Jan Goebel, Christopher Kruegel,
A VIRTUAL HONEYPOT FRAMEWORK Author : Niels Provos Publication: Usenix Security Symposium Presenter: Hiral Chhaya for CAP6103.
Wide-scale Botnet Detection and Characterization Anestis Karasaridis, Brian Rexroad, David Hoeflin In First Workshop on Hot Topics in Understanding Botnets,
Studying Spamming Botnets Using Botlab 台灣科技大學資工所 楊馨豪 2009/10/201 Machine Learning And Bioinformatics Laboratory.
Speaker: Hom-Jay Hom Date:2009/11/17 Botnet, and the CyberCriminal Underground IEEE 2008 Hsin chun Chen Clinton J. Mielke II.
Spamming Botnets: Signatures and Characteristics Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov. SIGCOMM, Presented.
Spamming Botnets: Signatures and Characteristics Authors:Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten+, Ivan Osipkov+ Presenter: Chia-Li.
Automating Analysis of Large-Scale Botnet Probing Events Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson* Lab for Internet and Security Technology (LIST)
Speaker : Yu-Hui Chen Authors : Dinuka A. Soysa, Denis Guangyin Chen, Oscar C. Au, and Amine Bermak From : 2013 IEEE Symposium on Computational Intelligence.
Lecture 10: Network models CS 765: Complex Networks Slides are modified from Networks: Theory and Application by Lada Adamic.
Search Worms, ACM Workshop on Recurring Malcode (WORM) 2006 N Provos, J McClain, K Wang Dhruv Sharma
1 Friends and Neighbors on the Web Presentation for Web Information Retrieval Bruno Lepri.
KAIST TS & IS Lab. CS710 Know your Neighbors: Web Spam Detection using the Web Topology SIGIR 2007, Carlos Castillo et al., Yahoo! 이 승 민.
LexPageRank: Prestige in Multi-Document Text Summarization Gunes Erkan, Dragomir R. Radev (EMNLP 2004)
Speaker : Yu-Hui Chen Authors : Dinuka A. Soysa, Denis Guangyin Chen, Oscar C. Au, and Amine Bermak From : 2013 IEEE Symposium on Computational Intelligence.
Refined Online Citation Matching and Adaptive Canonical Metadata Construction CSE 598B Course Project Report Huajing Li.
The Structure of Scientific Collaboration Networks by M. E. J. Newman CMSC 601 Paper Summary Marie desJardins January 27, 2009.
Spamming Botnets: Signatures and Characteristics Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Microsoft Research, Silicon Valley Geoff Hulten,
2009/6/221 BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure- Independent Botnet Detection Reporter : Fong-Ruei, Li Machine.
Outline  Introduction  Subgraph Pattern Matching  Types of Subgraph Pattern Matching  Models of Computation  Distributed Algorithms  Performance.
1 Link Privacy in Social Networks Aleksandra Korolova, Rajeev Motwani, Shubha U. Nabar CIKM’08 Advisor: Dr. Koh, JiaLing Speaker: Li, HueiJyun Date: 2009/3/30.
A Place-based Model for the Internet Topology Xiaotao Cai Victor T.-S. Shi William Perrizo NDSU {Xiaotao.cai, Victor.shi,
How dynamic are IP addresses? Yinglian Xie, Fang Yu, Kannan Achan, Eliot Gillum, Moises Goldszmidt, Ted Wobber SIGCOMM ‘07 Chulhyun Park
Dec 14, 2014, Harvard University
Written by Qiang Cao, Xiaowei Yang, Jieqi Yu and Christopher Palow
Yu Wang1, Gao Cong2, Guojie Song1, Kunqing Xie1
Cohesive Subgraph Computation over Large Graphs
Written by Qiang Cao, Xiaowei Yang, Jieqi Yu and Christopher Palow
by Hyunwoo Park and Kichun Lee Knowledge-Based Systems 60 (2014) 58–72
De-anonymizing the Internet Using Unreliable IDs
De-anonymizing the Internet Using Unreliable IDs By Yinglian Xie, Fang Yu, and Martín Abadi Presented by Peng Cheng 03/22/2017.
湖南大学-信息科学与工程学院-计算机与科学系
Department of Computer Science University of York
Presentation transcript:

BotGraph: Large Scale Spamming Botnet Detection Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum Speaker: 林佳宜

References Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum, BotGraph: Large Scale Spamming Botnet Detection, in The 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI '09), USENIX, April 2009

3 Outline Introduction BotGraph Architecture Random Graph Theory Hierarchical algorithm False Positive Analysis Conclusion

4 Introduction Design and implement a novel system called BotGraph to detect a new type of botnet spamming attacks targeting major Web providers. Two months of Hotmail log containing over 500 million users. Identified over 26 million botnetcreated user accounts with a low false positive rate.

5 Date and Environment Each record in the input log data contains three fields: UserID, IPAddress, and Login Timestamp. The implementation is based on the existing distributed computing models such as MapReduce and DryadLINQ Using the same 240-machine cluster in the experiments.

6 BotGraph Architecture BotGraph has two components:  aggressive sign-up detection  stealthy botuser detection based on their login activities

7 Detection of Aggressive Signups A sudden increase of signup activities is suspicious. EWMA algorithm to detect sudden changes in signup activities.

8 Detection of Stealthy Bot accounts The sharing of one IP address Multiple bot-users must log in from a common bot The sharing of multiple IP addresses Each account needs to be assigned to different bots Multiple shared IP addresses in the same Autonomous System (AS) are only counted as one shared IP address.

9 Graph-Based Bot-User Detection Use random graph models to analyze the user-user graph , and design a hierarchical algorithm to extract such components formed by bot-users.

10 Random Graph Theory G(n, p) as the random graph model n-vertex graph by simply assigning an edge to each pair of vertices with probability p ∈ [0, 1] G(n, p) has average degree d = n · p If d < 1, high probability the largest component in the graph has size less than O(log n). If d > 1, high probability the largest , component in the graph has size O(n).

11 spammers for assigning bot- accounts to bots Consider the following three typical strategies 1. Bot-user accounts are randomly assigned to bots 2. The spammer assigns k available bot-users for bot request. a bot makes only one request for k bot-users each day 3. no limit on the number of bot-users a bot can request for one day and that k = 1

12 Simulate assigning strategies Simulate the above typical spamming strategies and construct the corresponding user-user graph model1:10000 acount 500 bot model2:pick k = 20 model3:assume the bots go online with a Poisson arrival distribution and the length of bot online time fits a exponential distribution

13 Result 1.T is a transition point. 2.Model 2 has a transition value of T = 2. 3.Model 1 and 3 have the same transition value of T = 3. 4.Normal users usually cannot form large components with more than 100 nodes.

14 Extracting the graph components From the user-user graph generated with some predefined threshold T Need to handle the following issues Hard to choose a single fixed threshold of T Bot-users from different bot-user groups may be in thesame connected component Exist connected components of normal users

15 Partitioned data by IP addresses

16 Partitioned data by user IDs

17 Hierarchical algorithm

18 Bot-user groups

19 Confidence measures BotGraph computes two histograms from a 30-day log: h1: the numbers of s sent per day per user. h2: the sizes of s. Computes two statistics, s1 and s2, from the normalized histograms to quantify their differences: s1: the percentage of users who sent more than 3 s per day; s2: the areas of peaks in the normalized -size histogram, or the percentage of users who sent out s with a similar size. Both s1 and s2 are in the range of [0, 1] and can be used as confidence measures

20 Bot-User Pruning

21 Performance Evaluation[1/2]

22 Performance Evaluation[2/2]

23 False Positive Analysis Naming Patterns User-name template. such names are ‘w9168d4dc8c5c25f9” and ‘x9550a21da4e456a2”. Only 0.44% of the identified bot-users do not strictly follow the naming templates. Signup Dates Only 0.08% bot-users were signed up before year 2007 About 59.1% of bot-account of all the input dataset signed up before 2007 Adjust the false positive rate to be 0.08%/59.1% = 0.13%

24 Naming pattem score

25 Conclution BotGraph is implemented as a parallel Dryad/DryadLINQ application running on a large-scale computer cluster Using two-month Hotmail logs, BotGraph successfully detected more than 26 million botnet accounts The experience will be useful to a wide category of applications for constructing and analyzing large graphs

26 Thank you