Published by Archibald Kennedy. Modified over 9 years ago.

INHA UNIVERSITY INCHEON, KOREA
http://eslab.inha.ac.kr/
ALPACAS: A Large-scale Privacy-aware Collaborative Anti-spam System
Z. Zhong, L. Ramaswamy and K. Li, IEEE INFOCOM 2008
Intelligent E-Commerce System Lab.
Aettie, Ji

OUTLINE
  INTRODUCTION
  PRIOR WORK
  THE ALPACAS ANTI-SPAM FRAMEWORK
    Feature-Preserving Fingerprint
    Privacy-Preserving Collaboration Protocol
    System Structure
  EXPERIMENTS & RESULTS
  DISCUSSION
  CONCLUSION

INTRODUCTION
  Motivations
    Recent spam attacks pose strong challenges to the statistical filters that have been popular.
    Collaborative spam filtering is a natural defense paradigm, in which information about spam is shared, because spammers send similar emails to many target recipients.
    However, preserving the privacy of the collaborating participants is an important challenge.
    Digest-based approaches have been proposed to protect privacy, but they are not sufficient.

INTRODUCTION
  Contributions
    ALPACAS: A Large-scale Privacy-Aware Collaborative Anti-spam System.
    A resilient fingerprint generation technique, "feature-preserving transformation", is proposed.
    A privacy-preserving protocol is designed to control the amount of information shared.
    The experimental results demonstrate that ALPACAS outperforms traditional stand-alone statistical filters.

PRIOR WORK
  Drawbacks of the existing collaborative anti-spam schemes (using DCC)
  How does it work?
    Servers participating in DCC share digests of emails, computed through hash functions such as MD5.
    The DCC system replies with recent statistics about the digests.
  Drawbacks
    Hashing schemes like MD5 generate a completely different hash value even if a single byte is altered.
    The DCC scheme does not completely address the privacy issue: it remains open to inference-based privacy breaches.
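The first drawback can be seen directly with Python's standard `hashlib` module. This is only an illustrative sketch: the two sample messages are made up, and DCC's actual digest pipeline is not described on the slide.

```python
import hashlib


def md5_digest(text: str) -> str:
    """Return the hexadecimal MD5 digest of a text message."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two hypothetical spam messages that differ in a single character.
original = "Buy cheap watches now"
altered = "Buy cheap watches n0w"

d1 = md5_digest(original)
d2 = md5_digest(altered)

# Count hex positions where the two digests happen to agree.
matching = sum(a == b for a, b in zip(d1, d2))
print(d1)
print(d2)
print(f"{matching} of {len(d1)} hex digits match")
```

A spammer who changes one character per recipient therefore produces digests that an exact-match lookup cannot relate to each other.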

THE ALPACAS ANTI-SPAM FRAMEWORK (1/2)
  Challenges
    To protect email privacy,
      the messages have to be encrypted;
      the encrypted form should retain the important features of the messages.
    To avoid inference-based privacy breaches,
      it is necessary to minimize the information revealed during the collaboration.
  ALPACAS framework components
    Feature-preserving fingerprint
    Privacy-preserving protocol
    DHT-based architecture

THE ALPACAS ANTI-SPAM FRAMEWORK (2/2)
  Fig. 1: ALPACAS System Overview: (a) ALPACAS Network, (b) Internal mechanism of EA4

THE ALPACAS ANTI-SPAM FRAMEWORK: Feature-Preserving Fingerprint (1/4)
  Shingle-based Message Transformation
    Shingle: if two documents vary by a small amount, their shingle sets also differ by a small amount.
  Fig. 2: ALPACAS Feature Sets, DCC and Razor Digests for 2 spam emails (text in bold font indicates differences)

THE ALPACAS ANTI-SPAM FRAMEWORK: Feature-Preserving Fingerprint (2/4)
  Shingle-based Message Transformation
    Generation of the transformed feature set of message M_a, TFSet(M_a)
      Compute the Rabin fingerprint [11] of the consecutive tokens in a sliding window of length W.
      Each fingerprint is in the range (0, 2^K - 1).
      For a message with X tokens, X - W + 1 fingerprints are obtained.
      The smallest Y are retained.
    The similarity between M_a and M_b can be calculated as [formula not preserved in this transcript].
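The generation steps above can be sketched as follows. Several pieces are assumptions, not the authors' implementation: a SHA-256 hash truncated to K bits stands in for the Rabin fingerprint, tokens are split on whitespace, the default values of W, Y and K are arbitrary, and the `similarity` function uses a Jaccard-style overlap because the slide's own formula did not survive.

```python
import hashlib


def fingerprint(tokens, k_bits):
    """Stand-in for a Rabin fingerprint: SHA-256 truncated to k_bits bits."""
    data = " ".join(tokens).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (1 << k_bits)


def tfset(message, w=3, y=8, k_bits=32):
    """Fingerprint every window of w consecutive tokens; keep the y smallest."""
    tokens = message.split()
    # A message with X tokens yields X - w + 1 windows (none if X < w).
    windows = [tokens[i:i + w] for i in range(len(tokens) - w + 1)]
    fingerprints = {fingerprint(win, k_bits) for win in windows}
    return set(sorted(fingerprints)[:y])


def similarity(set_a, set_b):
    """Assumed similarity measure: shared features over all features."""
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


spam_1 = "cheap replica watches for sale visit our online store today friend"
spam_2 = "cheap replica watches for sale visit our online store today buddy"
print(similarity(tfset(spam_1), tfset(spam_2)))
```

Because only the last window differs between the two sample messages, most window fingerprints are identical, which is the property the shingle approach relies on.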

THE ALPACAS ANTI-SPAM FRAMEWORK: Feature-Preserving Fingerprint (3/4)
  Shingle-based Message Transformation
    With respect to privacy preservation, the Rabin fingerprint algorithm is a one-way hash function, so it is infeasible to reverse.
    However, it is still possible to infer a word or a group of words from an individual feature value.

THE ALPACAS ANTI-SPAM FRAMEWORK: Feature-Preserving Fingerprint (4/4)
  Term-level Privacy Preservation
    Controlled shuffling
      The email text is divided into h consecutive chunks of z consecutive tokens each.
      The tokens in each chunk are shuffled in a pre-defined manner, while the ordering of the chunks is retained.
      Each chunk is divided into y sub-chunks (y is a factor of z).
      The tokens in chunk CK_h are shuffled such that the token at the r-th position in the s-th sub-chunk is moved to the (r × y + s)-th position in CK_h.
    Even if two messages contain an identical term, shuffling the term can make the resulting feature sets differ.
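A minimal sketch of the controlled shuffling rule, under two assumptions the slide does not settle: positions r and s are counted from 0, and a trailing chunk with fewer than z tokens is left unshuffled.

```python
def shuffle_chunk(chunk, y):
    """Move the token at position r of sub-chunk s to position r * y + s."""
    z = len(chunk)
    if z % y != 0:
        raise ValueError("y must be a factor of the chunk length z")
    sub_len = z // y
    shuffled = [None] * z
    for s in range(y):            # index of the sub-chunk (0-based, assumed)
        for r in range(sub_len):  # position inside the sub-chunk (0-based)
            shuffled[r * y + s] = chunk[s * sub_len + r]
    return shuffled


def controlled_shuffle(tokens, z, y):
    """Shuffle each full chunk of z tokens; chunk order is preserved."""
    result = []
    for start in range(0, len(tokens), z):
        chunk = tokens[start:start + z]
        # Assumption: an incomplete last chunk is passed through unchanged.
        result.extend(shuffle_chunk(chunk, y) if len(chunk) == z else chunk)
    return result


print(controlled_shuffle(list("abcdefgh"), z=6, y=2))
# ['a', 'd', 'b', 'e', 'c', 'f', 'g', 'h']
```

With z = 6 and y = 2 the two sub-chunks `a b c` and `d e f` are interleaved, so the shingle windows computed afterwards no longer cover the original adjacent tokens.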

THE ALPACAS ANTI-SPAM FRAMEWORK: Privacy-Preserving Collaboration Protocol (1/3)
  Spam/ham dichotomy
    Revealing the contents of a spam email does not affect privacy, whereas revealing information about a ham email constitutes a privacy breach.
  Protocol
    EA_j receives M_a, then computes TFSet(M_a).
    EA_j sends a query containing a subset of TFSet(M_a) to other agents.
    EA_k receives the query, then checks its spam/ham knowledge bases (KBs).
    For each matching entry in the spam KB, EA_k sends back the complete transformed feature set.
    For each matching entry in the ham KB, EA_k sends back a small, randomly selected part of the transformed feature set.
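The responding side of the protocol (EA_k) might look like the sketch below. It assumes that an entry "matches" when it shares at least one feature element with the query, and it uses a hypothetical `ham_reveal` parameter for the size of the "small, randomly selected part"; neither detail is given on the slide.

```python
import random


def respond(query, spam_kb, ham_kb, ham_reveal=2, rng=None):
    """Answer a query (a set of feature values) from the local knowledge bases.

    spam_kb and ham_kb are lists of transformed feature sets (Python sets).
    """
    rng = rng or random.Random()
    # Spam entries carry no privacy risk, so the full feature set is returned.
    spam_replies = [features for features in spam_kb if features & query]
    # Ham entries reveal only a small random part of their feature set.
    ham_replies = []
    for features in ham_kb:
        if features & query:
            count = min(ham_reveal, len(features))
            ham_replies.append(set(rng.sample(sorted(features), count)))
    return spam_replies, ham_replies


spam_kb = [{1, 2, 3}, {7, 8}]
ham_kb = [{3, 4, 5, 6}]
spam_replies, ham_replies = respond({3, 9}, spam_kb, ham_kb)
print(spam_replies, ham_replies)
```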

THE ALPACAS ANTI-SPAM FRAMEWORK: Privacy-Preserving Collaboration Protocol (2/3)
  Fig. 3: ALPACAS Protocol: Query and Response

THE ALPACAS ANTI-SPAM FRAMEWORK: Privacy-Preserving Collaboration Protocol (3/3)
  Protocol (cont'd)
    EA_j now computes the ratio of MaxSpamOvlp(M_a) to MaxHamOvlp(M_a) and decides whether M_a is spam or ham.
    If this score is greater than a threshold λ, M_a is classified as spam; otherwise it is classified as ham.
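The decision step can be sketched as below. The slide does not define MaxSpamOvlp and MaxHamOvlp, so this sketch assumes each is the largest number of feature elements shared between TFSet(M_a) and any single spam (or ham) reply. The handling of a zero denominator and the default threshold `lam=1.0` are also assumptions.

```python
def max_overlap(tfset, replies):
    """Largest number of features shared with any single reply (0 if none)."""
    return max((len(tfset & reply) for reply in replies), default=0)


def classify(tfset, spam_replies, ham_replies, lam=1.0):
    """Classify a message from the ratio MaxSpamOvlp / MaxHamOvlp."""
    max_spam = max_overlap(tfset, spam_replies)
    max_ham = max_overlap(tfset, ham_replies)
    if max_spam == 0:
        return "ham"   # assumption: no spam evidence means ham
    if max_ham == 0:
        return "spam"  # assumption: spam evidence with no ham evidence
    return "spam" if max_spam / max_ham > lam else "ham"


print(classify({1, 2, 3, 4}, [{1, 2, 3}], [{4}]))  # ratio 3 / 1 exceeds 1.0
```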

THE ALPACAS ANTI-SPAM FRAMEWORK: System Structure (1/2)
  Design principle
    A query should be sent to an email agent only if it has a reasonable chance of containing information about the email that is being verified.
    Contacting any other email agent not only introduces inefficiencies but also leads to unnecessary exposure of data.
  DHT-based Architecture
    EA_j is responsible for maintaining information about all the emails whose TFSet has at least one feature element in the range allocated to it.

THE ALPACAS ANTI-SPAM FRAMEWORK: System Structure (2/2)
  DHT-based Architecture (cont'd)
    There are N email agents.
    All feature elements lie within (0, 2^K - 1).
    The range (0, 2^K - 1) is divided into N overlapping regions, {(MinF_0, MaxF_0), (MinF_1, MaxF_1), ..., (MinF_{N-1}, 2^K - 1)}.
    (MinF_j, MaxF_j) denotes the sub-range allocated to EA_j.
    For spam, EA_j stores the entire TFSet.
    For ham, EA_j stores a subset of the TFSet.
    If MinF_j ≤ Ft ≤ MaxF_j, then EA_j is called the rendezvous agent of feature element Ft.
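Looking up rendezvous agents from the sub-range table can be sketched as follows. The lookup accepts any list of (MinF_j, MaxF_j) pairs and returns every agent whose range contains the feature. The `equal_ranges` helper is purely hypothetical: it builds equal-width, consecutive sub-ranges for the demonstration, since the slide does not say how the boundaries are chosen.

```python
def equal_ranges(n_agents, k_bits):
    """Hypothetical helper: split 0 .. 2^K - 1 into equal consecutive ranges."""
    size = 1 << k_bits
    bounds = [j * size // n_agents for j in range(n_agents + 1)]
    return [(bounds[j], bounds[j + 1] - 1) for j in range(n_agents)]


def rendezvous_agents(ft, ranges):
    """Indices j of all agents with MinF_j <= Ft <= MaxF_j."""
    return [j for j, (low, high) in enumerate(ranges) if low <= ft <= high]


def agents_to_query(tfset, ranges):
    """Agents responsible for at least one feature element of the TFSet."""
    return sorted({j for ft in tfset for j in rendezvous_agents(ft, ranges)})


ranges = equal_ranges(n_agents=4, k_bits=4)
print(ranges)                                # [(0, 3), (4, 7), (8, 11), (12, 15)]
print(agents_to_query({1, 2, 13}, ranges))   # [0, 3]
```

Only agents 0 and 3 are contacted for this feature set, which reflects the design principle of querying only agents that could hold relevant information.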

EXPERIMENTS & RESULTS
  Benchmarked algorithms
    Bogofilter, based on Bayesian filtering
      Calculates a spamminess score for the email.
    DCC, based on simple hash-based collaborative filtering
      Counts the number of times the hash value of the email has been reported as spam.

EXPERIMENTS & RESULTS: Experimental Setup
  Dataset
    TREC email corpus & SpamAssassin email corpus
    The TREC corpus is divided into 67 email sets according to target address (67 agents).
    Half of each email set, including both ham and spam, is used for training and the remainder for testing.
    Each individual agent has a pre-classified email corpus (SpamAssassin) as its initial knowledge base.

EXPERIMENTS & RESULTS: Performance Metrics
  Spam filtering accuracy
    A ham email that is classified as spam by the filtering scheme is termed a false positive.
  Privacy of the collaborative anti-spam system
    The message-level privacy breach percentage is defined as the ratio of the number of test ham messages suffering privacy compromises to the total number of test ham messages.
  Communication overhead of the system
    The per-test communication cost metric is defined as the total number of messages circulated in the system during the entire experiment.
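The first two metrics can be computed as in the sketch below. The slide defines what a false positive is but not the denominator of the false positive percentage, so dividing by the number of test ham messages is an assumption. What counts as a "privacy compromise" is also left to the caller, who passes one boolean flag per test ham message.

```python
def false_positive_pct(labels, predictions):
    """Percentage of ham messages classified as spam (assumed denominator)."""
    ham_predictions = [p for l, p in zip(labels, predictions) if l == "ham"]
    if not ham_predictions:
        return 0.0
    wrong = sum(p == "spam" for p in ham_predictions)
    return 100.0 * wrong / len(ham_predictions)


def privacy_breach_pct(breached):
    """Breached test ham messages over all test ham messages, in percent."""
    if not breached:
        return 0.0
    return 100.0 * sum(breached) / len(breached)


labels = ["ham", "ham", "spam", "ham"]
predictions = ["spam", "ham", "spam", "ham"]
print(false_positive_pct(labels, predictions))
print(privacy_breach_pct([True, False, False, False]))
```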

EXPERIMENTS & RESULTS: Spam Filtering Effectiveness
  Fig. 4: False Positive Percentages of ALPACAS, BogoFilter and DCC
  Fig. 5: False Negative Percentages of ALPACAS, BogoFilter and DCC
  Fig. 6: System Overall Accuracy (DCC is not displayed because its FP is 0)

EXPERIMENTS & RESULTS: Robustness Against Attacks
  Fig. 7: System Robustness Against Good-Word Attacks
  Fig. 8: System Robustness Against Character Replacement Attacks

EXPERIMENTS & RESULTS: Privacy Awareness
  Fig. 9: Privacy Breach in ALPACAS (Varying Number of Agents)

EXPERIMENTS & RESULTS: Communication Overheads
  Fig. 10: Communication Overheads of the ALPACAS and the DCC systems

EXPERIMENTS & RESULTS: Message Transformation Algorithm Analysis
  Fig. 11: False Positive of ALPACAS for Various Parameter Setup
  Fig. 12: False Negative of ALPACAS for Various Parameter Setup
  Fig. 13: Effectiveness of Controlled Shuffling Strategy

DISCUSSION
  Combining approaches such as statistical filtering with the feature-preserving transformation scheme.
  Handling the dynamic nature of email agents in the system using replication and finger-table based routing.
  Approaches for preventing malicious email agents.

CONCLUSION
  In this paper, the design and evaluation of ALPACAS are presented.
  The two novel features:
    A feature-preserving transformation technique
    A privacy-preserving protocol
  Our initial experiments show that ALPACAS
    is very effective in filtering spam;
    has high resilience to various attacks;
    provides strong privacy protection to the participating entities.