INHA UNIVERSITY INCHEON, KOREA http://eslab.inha.ac.kr/
ALPACAS: A Large-scale Privacy-aware Collaborative Anti-spam System
Z. Zhong, L. Ramaswamy and K. Li, IEEE INFOCOM 2008
Intelligent E-Commerce System Lab. Aettie, Ji

- 2 - OUTLINE
• INTRODUCTION
• PRIOR WORK
• THE ALPACAS ANTI-SPAM FRAMEWORK
  - Feature-Preserving Fingerprint
  - Privacy-Preserving Collaboration Protocol
  - System Structure
• EXPERIMENTS & RESULTS
• DISCUSSION
• CONCLUSION

- 3 - INTRODUCTION
• Motivations
  - Recent spam attacks pose strong challenges to the statistical filters that have been popular.
  - Collaborative spam filtering, in which information about spam is shared, is a natural defense paradigm, since spammers send similar emails to many target receivers.
  - However, preserving the privacy of the participants in the collaboration is an important challenge.
  - Digest-based approaches have been proposed to protect privacy, but they are not sufficient.

- 4 - INTRODUCTION
• Contributions
  - ALPACAS: a Large-scale Privacy-Aware Collaborative Anti-spam System.
    - A resilient fingerprint generation technique, “feature-preserving transformation”, is proposed.
    - A privacy-preserving protocol is designed to control the amount of information that is shared.
  - The experimental results demonstrate that ALPACAS outperforms traditional stand-alone statistical filters.

- 5 - PRIOR WORK
• Drawbacks of the existing collaborative anti-spam schemes (using DCC).
• How does it work?
  - Participating servers in DCC share digests of emails, computed through hash functions such as MD5.
  - The DCC system replies with recent statistics about the digests.
• Drawbacks
  - Hashing schemes like MD5 generate a completely different hash value even if a single byte is altered.
  - The DCC scheme does not completely address the privacy issue: it is susceptible to inference-based privacy breaches.
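
The first drawback can be seen with a minimal Python sketch. The two message strings are made-up examples, not data from the paper; the only point is that changing one byte yields an unrelated MD5 digest, so an exact-digest lookup misses the near-duplicate.

```python
import hashlib

def md5_hex(message: str) -> str:
    """Return the MD5 digest of a message as a hex string."""
    return hashlib.md5(message.encode("utf-8")).hexdigest()

# Two hypothetical spam messages that differ in a single byte.
original = "Buy cheap watches now"
altered = "Buy cheap watches n0w"

digest_a = md5_hex(original)
digest_b = md5_hex(altered)

# Count hex positions where the two digests agree; for unrelated
# digests this is typically only a handful out of 32.
matching_positions = sum(1 for x, y in zip(digest_a, digest_b) if x == y)

print(digest_a)
print(digest_b)
print("matching hex positions:", matching_positions, "of", len(digest_a))
```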

- 6 - THE ALPACAS ANTI-SPAM FRAMEWORK (1/2)
• Challenges
  - To protect email privacy, the messages have to be encrypted, yet the encrypted form should retain the important features of the messages.
  - To avoid inference-based privacy breaches, it is necessary to minimize the information revealed during the collaboration.
• ALPACAS framework components
  - Feature-preserving fingerprint
  - Privacy-preserving protocol
  - DHT-based architecture

- 7 - THE ALPACAS ANTI-SPAM FRAMEWORK (2/2)
Fig. 1: ALPACAS System Overview. (a) ALPACAS Network; (b) Internal mechanism of EA4

- 8 - THE ALPACAS ANTI-SPAM FRAMEWORK: Feature-Preserving Fingerprint (1/4)
• Shingle-based Message Transformation
  - Shingles: if two documents vary by a small amount, their shingle sets also differ by a small amount.
Fig. 2: ALPACAS Feature Sets, DCC and Razor Digests for 2 spam emails (Texts in bold font indicate differences)

- 9 - THE ALPACAS ANTI-SPAM FRAMEWORK: Feature-Preserving Fingerprint (2/4)
• Shingle-based Message Transformation
  - Generation of the transformed feature set of message M_a, TFSet(M_a):
    - Compute the Rabin fingerprint [11] of the consecutive tokens in a sliding window of length W.
    - Each fingerprint is in the range (0, 2^K − 1).
    - For a message with X tokens, X − W + 1 fingerprints are obtained.
    - The smallest Y fingerprints are retained.
  - The similarity between M_a and M_b is calculated from their transformed feature sets TFSet(M_a) and TFSet(M_b).
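
The generation steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: a truncated SHA-1 hash stands in for the Rabin fingerprint, tokens are split on whitespace, and the similarity is taken to be the Jaccard overlap of the two feature sets, since the slide's own formula did not survive in this transcript. The function names and the default values of W, K and Y are hypothetical.

```python
import hashlib

def fingerprints(text, w=3, k=32):
    """Hash every window of w consecutive tokens into the range [0, 2^k - 1].

    Assumption: truncated SHA-1 is used here in place of a Rabin fingerprint.
    """
    tokens = text.split()
    values = []
    for i in range(len(tokens) - w + 1):
        shingle = " ".join(tokens[i:i + w])
        digest = hashlib.sha1(shingle.encode("utf-8")).digest()
        values.append(int.from_bytes(digest[:8], "big") % (1 << k))
    return values

def tfset(text, w=3, k=32, y=10):
    """Keep the y smallest distinct fingerprints as the transformed feature set."""
    return set(sorted(set(fingerprints(text, w, k)))[:y])

def similarity(set_a, set_b):
    """Jaccard overlap of two feature sets (an assumed similarity measure)."""
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

# Two made-up messages that differ only in their last token.
m_a = "buy cheap pills online now today"
m_b = "buy cheap pills online now tomorrow"
print(similarity(tfset(m_a), tfset(m_b)))
```

Unlike a whole-message digest, the two feature sets still share the windows that do not touch the altered token, so the similarity stays well above zero.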

- 10 - THE ALPACAS ANTI-SPAM FRAMEWORK: Feature-Preserving Fingerprint (3/4)
• Shingle-based Message Transformation
  - With respect to privacy preservation, the Rabin fingerprint algorithm is a one-way hash function, so it is infeasible to reverse.
  - However, it is still possible to infer a word or a group of words from an individual feature value.

- 11 - THE ALPACAS ANTI-SPAM FRAMEWORK: Feature-Preserving Fingerprint (4/4)
• Term-level Privacy Preservation
  - Controlled shuffling
    - The email text is divided into h consecutive chunks of z consecutive tokens each.
    - The tokens in each chunk are shuffled in a pre-defined manner, while the ordering of the chunks is retained.
    - Each chunk is divided into y sub-chunks (y is a factor of z).
    - The tokens in chunk CK_h are shuffled such that the token at the r-th position in the s-th sub-chunk is moved to the (r × y + s)-th position in CK_h.
    - If two messages contain an identical term, shuffling that term can make the resulting feature sets differ.
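
A minimal Python sketch of the shuffling rule follows. It assumes 0-based positions for r and s and leaves a trailing chunk with fewer than z tokens unshuffled; neither detail is stated on the slide, and the function names are hypothetical.

```python
def shuffle_chunk(chunk, y):
    """Move the token at position r of sub-chunk s to position r * y + s.

    Assumes 0-based r and s, and that y divides the chunk length z.
    """
    z = len(chunk)
    if z % y != 0:
        raise ValueError("y must be a factor of the chunk length z")
    sub_len = z // y
    shuffled = [None] * z
    for s in range(y):            # index of the sub-chunk
        for r in range(sub_len):  # position inside the sub-chunk
            shuffled[r * y + s] = chunk[s * sub_len + r]
    return shuffled

def controlled_shuffle(tokens, z, y):
    """Shuffle each full chunk of z tokens while keeping the chunk order."""
    result = []
    for start in range(0, len(tokens), z):
        chunk = tokens[start:start + z]
        if len(chunk) == z:
            result.extend(shuffle_chunk(chunk, y))
        else:
            # Assumption: a short trailing chunk is kept as-is.
            result.extend(chunk)
    return result

print(controlled_shuffle(list("abcdefgh"), z=6, y=2))
```

With z = 6 and y = 2 the two sub-chunks "abc" and "def" are interleaved, so windows taken over the shuffled text no longer consist of adjacent original tokens.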

- 12 - THE ALPACAS ANTI-SPAM FRAMEWORK: Privacy-Preserving Collaboration Protocol (1/3)
• Spam/ham dichotomy
  - Revealing the contents of a spam email does not affect privacy, whereas revealing information about a ham email constitutes a privacy breach.
• Protocol
  - EA_j receives M_a and computes TFSet(M_a).
  - EA_j sends a query containing a subset of TFSet(M_a) to other agents.
  - EA_k receives the query and checks its spam and ham knowledge bases (KBs).
  - For each matching entry in the spam KB, EA_k sends back the complete transformed feature set.
  - For each matching entry in the ham KB, EA_k sends back a small, randomly selected part of the transformed feature set.
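
The responder side of the protocol might look like the following Python sketch. It assumes that a knowledge-base entry "matches" when it shares at least one feature with the query and that a fixed number of ham features (the hypothetical `ham_reveal` parameter) is revealed per match; the slide specifies neither.

```python
import random

def respond(query_features, spam_kb, ham_kb, ham_reveal=2, rng=None):
    """Answer a query as EA_k would: full sets for spam, small samples for ham.

    spam_kb and ham_kb are lists of feature sets (one set per stored email).
    Assumption: an entry matches if it shares at least one queried feature.
    """
    rng = rng or random.Random()
    query = set(query_features)

    # Spam is not private, so the complete feature set is returned.
    spam_replies = [set(entry) for entry in spam_kb if query & set(entry)]

    # Ham is private, so only a small random part of the set is returned.
    ham_replies = []
    for entry in ham_kb:
        if query & set(entry):
            count = min(ham_reveal, len(entry))
            ham_replies.append(set(rng.sample(sorted(entry), count)))
    return spam_replies, ham_replies

# Hypothetical knowledge bases with small integer features.
spam_kb = [{1, 2, 3}, {7, 8}]
ham_kb = [{3, 4, 5, 6}]
spam_replies, ham_replies = respond({3}, spam_kb, ham_kb, rng=random.Random(0))
print(spam_replies, ham_replies)
```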

- 13 - THE ALPACAS ANTI-SPAM FRAMEWORK: Privacy-Preserving Collaboration Protocol (2/3)
Fig. 3: ALPACAS Protocol: Query and Response

- 14 - THE ALPACAS ANTI-SPAM FRAMEWORK: Privacy-Preserving Collaboration Protocol (3/3)
• Protocol (cont'd)
  - EA_j then computes the ratio of MaxSpamOvlp(M_a) to MaxHamOvlp(M_a) as a score and decides whether M_a is spam or ham.
  - If the score is greater than a threshold λ, M_a is classified as spam; otherwise it is classified as ham.
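
The decision step can be sketched in Python as below. The sketch assumes that MaxSpamOvlp and MaxHamOvlp are the largest number of features that TFSet(M_a) shares with any returned spam or ham feature set, and it picks its own convention for a zero ham overlap; the slide defines neither, and the default threshold value is arbitrary.

```python
def max_overlap(tfset, replies):
    """Largest number of features shared with any single reply (0 if none)."""
    return max((len(tfset & reply) for reply in replies), default=0)

def classify(tfset, spam_replies, ham_replies, lam=1.0):
    """Label a message as spam if MaxSpamOvlp / MaxHamOvlp exceeds lam."""
    spam_ovlp = max_overlap(tfset, spam_replies)
    ham_ovlp = max_overlap(tfset, ham_replies)
    if ham_ovlp == 0:
        # Assumed convention: any spam overlap with no ham overlap
        # gives an unbounded score; no overlap at all gives zero.
        score = float("inf") if spam_ovlp > 0 else 0.0
    else:
        score = spam_ovlp / ham_ovlp
    return "spam" if score > lam else "ham"

print(classify({1, 2, 3, 4}, [{1, 2, 3}], [{4}]))
print(classify({1, 2, 3, 4}, [], [{1, 2}]))
```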

- 15 - THE ALPACAS ANTI-SPAM FRAMEWORK: System Structure (1/2)
• Design principle
  - A query should be sent to an email agent only if it has a reasonable chance of containing information about the email that is being verified. Contacting any other email agent not only introduces inefficiencies but also leads to unnecessary exposure of data.
• DHT-based Architecture
  - EA_j is responsible for maintaining information about all the emails whose TFSet has at least one feature element in the sub-range allocated to it.

- 16 - THE ALPACAS ANTI-SPAM FRAMEWORK: System Structure (2/2)
• DHT-based Architecture (cont'd)
  - There are N email agents.
  - All feature elements lie within (0, 2^K − 1).
  - The range (0, 2^K − 1) is divided into N overlapping regions {(MinF_0, MaxF_0), (MinF_1, MaxF_1), ..., (MinF_{N-1}, 2^K − 1)}.
  - (MinF_j, MaxF_j) denotes the sub-range allocated to EA_j.
    - For spam, EA_j stores the entire TFSet.
    - For ham, EA_j stores a subset of the TFSet.
  - If MinF_j ≤ Ft ≤ MaxF_j, then EA_j is called the rendezvous agent of feature element Ft.
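
The rendezvous lookup can be sketched in Python as follows. The lookup itself works for any list of (MinF_j, MaxF_j) pairs; the `equal_ranges` helper is a hypothetical way to build such a list with equal-width sub-ranges for illustration and is not how the paper allocates ranges.

```python
def equal_ranges(n, k):
    """Split [0, 2^k - 1] into n equal-width sub-ranges (illustrative only)."""
    size = 1 << k
    bounds = [j * size // n for j in range(n + 1)]
    return [(bounds[j], bounds[j + 1] - 1) for j in range(n)]

def rendezvous_agents(ft, ranges):
    """Indices j of all agents EA_j with MinF_j <= ft <= MaxF_j."""
    return [j for j, (min_f, max_f) in enumerate(ranges) if min_f <= ft <= max_f]

def agents_to_query(tfset, ranges):
    """Agents responsible for at least one feature element of the TFSet."""
    agents = set()
    for ft in tfset:
        agents.update(rendezvous_agents(ft, ranges))
    return sorted(agents)

ranges = equal_ranges(n=4, k=4)   # feature values lie in 0..15
print(ranges)
print(agents_to_query({0, 14, 15}, ranges))
```

Only the agents returned by `agents_to_query` need to be contacted, which is the design principle of querying an agent only when it may hold information about the email.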

- 17 - EXPERIMENTS & RESULTS
• Benchmarked algorithms
  - Bogofilter, based on Bayesian filtering: calculates a spamminess score for the email.
  - DCC, based on simple hash-based collaborative filtering: counts the number of times the hash value of the email has been reported as spam.

- 18 - EXPERIMENTS & RESULTS: Experimental Setup
• Dataset
  - TREC email corpus and SpamAssassin email corpus
  - The TREC corpus is divided into 67 email sets according to their target addresses (67 agents).
  - Half of each email set, including ham and spam, is used for training and the remainder for testing.
  - Each individual agent has a pre-classified email corpus (SpamAssassin) as the initial knowledge base.

- 19 - EXPERIMENTS & RESULTS: Performance Metrics
• Spam filtering accuracy
  - A ham email that is classified as spam by the filtering scheme is termed a false positive.
• Privacy of the collaborative anti-spam system
  - The message-level privacy breach percentage is defined as the ratio of the number of test ham messages suffering privacy compromises to the total number of test ham messages.
• Communication overhead of the system
  - The per-test communication cost metric is defined as the total number of messages circulated in the system during the entire experiment.
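
Two of these metrics can be computed with the short Python sketch below. The privacy breach percentage follows the definition above; the false positive percentage assumes the number of test ham messages as its denominator, which the slide does not state. The labels and counts are made-up examples.

```python
def false_positive_pct(labels, predictions):
    """Percentage of ham messages that the filter classified as spam.

    Assumption: the denominator is the number of ham messages.
    """
    ham_predictions = [p for l, p in zip(labels, predictions) if l == "ham"]
    if not ham_predictions:
        return 0.0
    false_positives = sum(1 for p in ham_predictions if p == "spam")
    return 100.0 * false_positives / len(ham_predictions)

def privacy_breach_pct(num_breached_ham, num_test_ham):
    """Test ham messages with a privacy compromise, as a share of all test ham."""
    if num_test_ham == 0:
        return 0.0
    return 100.0 * num_breached_ham / num_test_ham

labels = ["ham", "ham", "spam", "ham"]
predictions = ["ham", "spam", "spam", "ham"]
print(false_positive_pct(labels, predictions))
print(privacy_breach_pct(5, 200))
```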

- 20 - EXPERIMENTS & RESULTS: Spam Filtering Effectiveness
Fig. 4: False Positive Percentages of ALPACAS, BogoFilter and DCC
Fig. 5: False Negative Percentages of ALPACAS, BogoFilter and DCC
Fig. 6: System Overall Accuracy (DCC is not displayed because its FP is 0)

- 21 - EXPERIMENTS & RESULTS: Robustness Against Attacks
Fig. 7: System Robustness Against Good-Word Attacks
Fig. 8: System Robustness Against Character Replacement Attacks

- 22 - EXPERIMENTS & RESULTS: Privacy Awareness
Fig. 9: Privacy Breach in ALPACAS (Varying Number of Agents)

- 23 - EXPERIMENTS & RESULTS: Communication Overheads
Fig. 10: Communication Overheads of the ALPACAS and the DCC systems

- 24 - EXPERIMENTS & RESULTS: Message Transformation Algorithm Analysis
Fig. 11: False Positive of ALPACAS for Various Parameter Setup
Fig. 12: False Negative of ALPACAS for Various Parameter Setup
Fig. 13: Effectiveness of Controlled Shuffling Strategy

- 25 - DISCUSSION
• Combining approaches like statistical filtering with the feature-preserving transformation scheme.
• Handling the dynamic nature of email agents in the system using replication and finger-table-based routing.
• Approaches for preventing malicious email agents.

- 26 - CONCLUSION
• In this paper, the design and evaluation of ALPACAS are presented.
• The two novel features:
  - A feature-preserving transformation technique
  - A privacy-preserving protocol
• Our initial experiments show that ALPACAS
  - is very effective in filtering spam.
  - has high resilience towards various attacks.
  - provides strong privacy protection to the participating entities.
