Published by Jemima Jordan. Modified over 9 years ago.
Measurement and Classification of Humans and Bots in Internet Chat
By Steven Gianvecchio, Mengjun Xie, Zhenyu Wu, and Haining Wang
College of William and Mary

USENIX Security 2008

Outline
- Background
- Measurement
- Classification System
- Experimental Evaluation
- Conclusion

Bots
- Bots: programs that automate human tasks
  - web bots automate browsing the web
  - chat bots automate online chat
- can be harmful and/or helpful

Chat Bots vs. BotNets
- BotNets: networks of compromised machines
  - some use chat systems (IRC) for C&C; others use P2P, HTTP, etc.
  - abuse various systems
- Chat Bots: automated chat programs
  - some are helpful, e.g., chat loggers
  - can abuse chat systems and their users

The Chat Bot Problem
- The problem: chat bots abuse chat services (e.g., AOL, Yahoo!, MSN)
  - send spam
  - spread malicious software
  - mount phishing attacks
- Our focus is on the Yahoo! chat system

A Typical Chat
  Alice12 entered the room.
  Alice12: Hi room.
  Bob34: hi alice
  Susie88: any guys want to let a cute girl move in with them! hehe
  Alice12: What’s up?
  Bob34: not much
  Susie88: can you guys see me on my web-cam?? (its in my profile)

Yahoo! Chat
- Yahoo! chat is a large commercial chat service
- over 3,000 chat rooms
- AUTH, CHAT, IM, …

Yahoo! Chat
- Yahoo! chat system
  - client connects to a server
  - servers relay messages to/from clients

Measurement
- August-November 2007: we collect data
- August 2007: Yahoo! adds a CAPTCHA
  - users must pass it to join a chat room
  - a protocol update prevents some third-party clients from accessing chat
- October 2007: bots are back
  - some bots return before third-party clients do

Measurement
- September and October 2007: very few chat bots
- August and November 2007: many chat bots
- 1,440 hours of chat logs
- 147 chat logs
- 21 chat rooms

Measurement
- To create our dataset, we read the chat logs and label each chat user as human, bot, or ambiguous
- In total, we recognized 14 different types of chat bots
  - different triggering mechanisms
  - different text generation techniques

Triggering Mechanisms
- Timer-Based
  - periodic timers, e.g., 40 seconds
  - random timers, e.g., 45-125 seconds
- Response-Based
  - responds to other users
    Sam77: Bob12, you’re just full of questions, aren’t you?
    Sam77: Bob12, lots of evidence for evolution can be found here http://

Text Generation
- Character Padding
    Fiona88: anyone boredjn wanna chat?uklcss
- Synonym Phrases
    Marjorie99: Hi Babes! Marjorie Here! Inspect My Site
    Marjorie99: Mmmm Folks! Im Marjorie! View My Webpage
- Odd Line or Word Spacing
- Message Replay

Types of Chat Bots
- Periodic Bots: send messages based on periodic timers
- Random Bots: send messages based on random timers
- Responder Bots: respond to messages of other users
- Replay Bots: replay messages of other users

Humans
- inter-message delay: evidence of a heavy tail
- message size: well fit by an Exponential distribution (λ=0.034)
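
As a rough illustration of what "well fit by Exponential (λ=0.034)" means, the sketch below estimates an exponential rate from message sizes by maximum likelihood (rate = 1 / sample mean) and computes the mean implied by the slide's λ. The function names are hypothetical, the fitting method is an assumption (the slide does not say how the fit was obtained), and the size unit is whatever the original measurements used.

```python
import math


def fit_exponential_rate(sizes):
    """Maximum-likelihood rate of an exponential distribution: 1 / sample mean."""
    if not sizes:
        raise ValueError("need at least one message size")
    return len(sizes) / sum(sizes)


def exponential_cdf(x, rate):
    """P(X <= x) for an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


# Rate quoted on the slide for human message sizes.
SLIDE_RATE = 0.034

# An exponential distribution has mean 1 / rate, so this is the implied
# average human message size (in the slide's unspecified size unit).
implied_mean = 1.0 / SLIDE_RATE
```

Under this reading, a rate of 0.034 corresponds to an average human message size of roughly 29 units.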

Periodic Bots
- inter-message delay: several clusters with high probabilities
- message size: messages are built from templates, and their sizes approximate a normal distribution

Random Bots
- inter-message delay: Equilikely distribution at 40, 64, and 88 seconds; Uniform distribution over 45-125 seconds
- message size: messages selected from a small database
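
The timer behaviours described for periodic and random bots can be mimicked with a few lines of Python. This is only a sketch for generating synthetic inter-message delays: the function names are hypothetical, delays are assumed to be in seconds, and real bots would add network and scheduling jitter that is not modelled here.

```python
import random


def periodic_delays(period, n):
    """Periodic timer: every delay equals the same period."""
    return [period] * n


def equilikely_delays(choices, n, rng):
    """Random timer drawing each delay from a small set of equally likely values."""
    return [rng.choice(choices) for _ in range(n)]


def uniform_delays(low, high, n, rng):
    """Random timer drawing each delay uniformly from [low, high]."""
    return [rng.uniform(low, high) for _ in range(n)]


rng = random.Random(2008)  # fixed seed so the synthetic traces are repeatable
periodic = periodic_delays(40, 100)
equilikely = equilikely_delays((40, 64, 88), 100, rng)
uniform = uniform_delays(45, 125, 100, rng)
```

Traces like these are useful mainly as sanity inputs for a timing-based detector: the periodic trace has a single delay value, the equilikely trace has three, and the uniform trace spreads over the whole 45-125 range.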

Responder Bots
- inter-message delay: human-like timing
- message size: multiple templates of different lengths

Replay Bots
- inter-message delay: cluster with high probabilities (replay bots are periodic)
- message size: human-like size, well fit by an Exponential distribution (λ=0.028)

Classification System
- Entropy Classifier
  - detects abnormal behavior based on message sizes and inter-message delays
  - accurate but slow
- Machine Learning Classifier
  - detects “learned” patterns based on message content
  - fast but must be trained

Entropy Classifier
- Observation: chat bots are less complex than humans and thus lower in entropy
  - the classifier exploits the low entropy of chat bots
- Corrected Conditional Entropy Test (CCE): estimates higher-order entropy
- Entropy Test (EN): estimates first-order entropy
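
To make the two tests concrete, here is a minimal sketch of first-order entropy and corrected conditional entropy over a binned sequence (for example inter-message delays or message sizes). It follows the commonly used definition of CCE, namely conditional entropy of length-m patterns plus a correction term equal to the fraction of patterns seen only once times the first-order entropy, minimised over m. The equal-width binning, the bin count, the `max_m` limit, and all function names are assumptions for illustration and are not taken from the slides.

```python
import math
from collections import Counter


def bin_values(values, num_bins):
    """Map each value to an equal-width bin index in [0, num_bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / num_bins
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def _patterns(symbols, m):
    """All consecutive length-m patterns in the sequence."""
    return [tuple(symbols[i:i + m]) for i in range(len(symbols) - m + 1)]


def pattern_entropy(symbols, m):
    """Shannon entropy in bits of the length-m patterns (m=1 is first-order entropy)."""
    patterns = _patterns(symbols, m)
    total = len(patterns)
    return sum(c / total * math.log2(total / c) for c in Counter(patterns).values())


def unique_fraction(symbols, m):
    """Fraction of length-m patterns that occur exactly once."""
    patterns = _patterns(symbols, m)
    singles = sum(1 for c in Counter(patterns).values() if c == 1)
    return singles / len(patterns)


def corrected_conditional_entropy(symbols, max_m=10):
    """Minimum over m of CE(m) + unique_fraction(m) * EN, starting from EN at m=1."""
    first_order = pattern_entropy(symbols, 1)
    best = first_order
    for m in range(2, min(max_m, len(symbols)) + 1):
        conditional = pattern_entropy(symbols, m) - pattern_entropy(symbols, m - 1)
        best = min(best, conditional + unique_fraction(symbols, m) * first_order)
    return best
```

A strictly periodic delay trace collapses into a single bin and scores zero on both measures, while an irregular trace scores well above it; a detector along these lines would flag users whose scores fall below a threshold chosen from human data.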

Machine Learning Classifier
- Observation: chat spam, like email spam, is a text classification problem
  - the classifier exploits the message content of chat bots
- CRM114: a powerful text classification system
  - several built-in classifiers: HMM, KNN/Hyperspace, OSB, SVM, Winnow, etc.
  - we use OSB
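
OSB stands for Orthogonal Sparse Bigrams: within a sliding window, each token is paired with each earlier token in the window, and the gap between them is recorded. The sketch below shows that feature-extraction idea only. It is not CRM114's implementation (which has its own tokenizer, hashing, and weighting); the function name, the whitespace tokenizer, and the window size of 5 are assumptions for illustration.

```python
def osb_features(text, window=5):
    """Return (earlier_token, skipped_count, current_token) triples.

    Each token is paired with up to window - 1 preceding tokens;
    skipped_count is how many tokens lie between the two.
    """
    tokens = text.split()  # assumption: plain whitespace tokenization
    features = []
    for i, current in enumerate(tokens):
        for distance in range(1, window):
            j = i - distance
            if j < 0:
                break
            features.append((tokens[j], distance - 1, current))
    return features


# Example: the features of a short spam-like message.
example = osb_features("view my webpage")
```

Here `example` contains the adjacent pairs ("view", 0, "my") and ("my", 0, "webpage") plus the sparse pair ("view", 1, "webpage"), which lets a classifier match templates even when a word in the middle is swapped for a synonym.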

Hybrid Classification System
- entropy classifier builds and maintains the bot corpus
- machine learning classifier uses the bot and human corpora
[Diagram components: INPUT, ENTROPY CLASSIFIER, MACHINE LEARNING CLASSIFIER, BOT CORPUS, HUMAN CORPUS, CLASSIFY AS CHAT BOT, CLASSIFY AS HUMAN]
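
One way to read the division of labour in the hybrid system is sketched below: a slow, behaviour-based check adds the messages of detected bots to a bot corpus, and a fast, content-based check compares new messages against the bot and human corpora. The class, method names, and wiring are hypothetical, and the two toy functions are simple stand-ins rather than an entropy test or a CRM114 classifier. How the human corpus is populated is not stated on the slide, so it is supplied up front here.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Messages = List[str]


@dataclass
class HybridClassifier:
    """Two-stage sketch; both callables are hypothetical stand-ins."""
    entropy_is_bot: Callable[[Messages], bool]
    ml_is_bot: Callable[[Messages, Messages, Messages], bool]
    human_corpus: Messages = field(default_factory=list)
    bot_corpus: Messages = field(default_factory=list)

    def observe(self, messages: Messages) -> None:
        """Slow path: messages from users flagged by behaviour join the bot corpus."""
        if self.entropy_is_bot(messages):
            self.bot_corpus.extend(messages)

    def classify(self, messages: Messages) -> str:
        """Fast path: the content check consults both corpora."""
        is_bot = self.ml_is_bot(messages, self.bot_corpus, self.human_corpus)
        return "chat bot" if is_bot else "human"


def toy_entropy_is_bot(messages: Messages) -> bool:
    """Toy stand-in: flags heavy self-repetition (not an actual entropy test)."""
    return len(set(messages)) <= len(messages) // 2


def toy_ml_is_bot(messages: Messages, bot_corpus: Messages, human_corpus: Messages) -> bool:
    """Toy stand-in: exact-match voting against the two corpora."""
    bot_hits = sum(m in bot_corpus for m in messages)
    human_hits = sum(m in human_corpus for m in messages)
    return bot_hits > human_hits


hybrid = HybridClassifier(toy_entropy_is_bot, toy_ml_is_bot,
                          human_corpus=["hi alice", "not much"])
hybrid.observe(["View My Webpage"] * 4)  # repetitive user is added to the bot corpus
```

After the `observe` call, `hybrid.classify(["View My Webpage"])` returns "chat bot" and `hybrid.classify(["not much"])` returns "human". The point of the split is that the content check can react to a known bot after only a few messages, while the behaviour check keeps the bot corpus current without manual labelling.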

Experimental Evaluation
- Types of Chat Bots
  - Periodic Bots
  - Random Bots
  - Responder Bots
  - Replay Bots
- Classifiers
  - entropy classifier: 100 messages
  - machine learning classifier: 25 messages

Experimental Evaluation
- Classification Tests
  - Ent: entropy classifier
  - SupML: fully-supervised ML classifier, trained on AUG BOTS
  - SupMLre: fully-supervised ML classifier, retrained on NOV BOTS
  - EntML: entropy-trained ML classifier

Entropy Classifier

                AUG BOTS                        NOV BOTS
test        periodic  random  respond   periodic  random   replay   human
            (TP)      (TP)    (TP)      (TP)      (TP)     (TP)     (FP)
EN(imd)     121/121   68/68   1/30      51/51     109/109  40/40    7/1713
CCE(imd)    121/121   49/68   4/30      51/51     109/109  40/40    11/1713
EN(ms)      92/121    7/68    8/30      46/51     34/109   0/40     7/1713
CCE(ms)     77/121    8/68    30/30     51/51     6/109    0/40     11/1713
OVERALL     121/121   68/68   30/30     51/51     109/109  40/40    17/1713

- EN: entropy
- CCE: corrected conditional entropy
- (imd): inter-message delay
- (ms): message size

EN(imd) and CCE(imd)
- have problems against responder bots (1/30 and 4/30 detected)
- detect most other chat bots

EN(ms) and CCE(ms)
- have problems against random and replay bots (0/40 replay bots detected by either test)
- detect most other chat bots

OVERALL
- detects all chat bots
- false positive rate is ~0.01 (17/1713 humans)
- based on 100 messages

Entropy and Machine Learning Classifiers

                AUG BOTS                        NOV BOTS
test        periodic  random  respond   periodic  random   replay   human
            (TP)      (TP)    (TP)      (TP)      (TP)     (TP)     (FP)
Ent         121/121   68/68   30/30     51/51     109/109  40/40    17/1713
SupML       121/121   68/68   30/30     14/51     104/109  1/40     0/1713
SupMLre     121/121   68/68   30/30     51/51     109/109  40/40    0/1713
EntML       121/121   68/68   30/30     51/51     109/109  40/40    1/1713

- Ent: entropy classifier (from last slide)
- SupML: fully-supervised machine learning
- SupMLre: SupML retrained
- EntML: entropy-trained machine learning

Ent
- the Ent row repeats the OVERALL results of the entropy classifier

SupML and SupMLre
- SupML has problems against November bots (14/51 periodic, 104/109 random, 1/40 replay detected)
  - it needs to be retrained for new bots
- SupMLre detects all bots

EntML
- false positive rate is ~0.0005 (Ent is ~0.01)
- based on 25 messages
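
The quoted false positive rates follow directly from the human column of the results table (false positives out of 1,713 human users). The short check below recomputes them; the variable names are illustrative only.

```python
# Human users in the evaluation and false positives per test, from the results table.
HUMANS = 1713
false_positives = {"Ent": 17, "SupML": 0, "SupMLre": 0, "EntML": 1}

# False positive rate = humans misclassified as bots / all humans.
fp_rates = {name: count / HUMANS for name, count in false_positives.items()}

for name, rate in fp_rates.items():
    print(f"{name}: {rate:.4f}")
```

This gives about 0.0099 for Ent, matching the ~0.01 on the slide, and about 0.0006 for EntML (a single misclassified human), which the slide quotes as ~0.0005.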

Conclusion
- Measurements
  - overall, chat bots are less complex than humans
  - some chat bots are more human-like
- Classification System
  - exploits the benefits of both classifiers
  - quickly classifies known chat bots
  - accurately classifies unknown chat bots

Conclusion (cont.)
- Future Work
  - investigate more advanced chat bots
  - explore applications of entropy to other forms of bots (e.g., web bots)
  - explore other applications of entropy (e.g., detecting covert timing channels)

Questions?
Thank You!