ALPACAS: A Large-scale Privacy-aware Collaborative Anti-spam System

INHA UNIVERSITY, INCHEON, KOREA
ALPACAS: A Large-scale Privacy-aware Collaborative Anti-spam System
Z. Zhong, L. Ramaswamy and K. Li, IEEE INFOCOM 2008
Intelligent E-Commerce System Lab., Aettie Ji

OUTLINE
- INTRODUCTION
- PRIOR WORK
- THE ALPACAS ANTI-SPAM FRAMEWORK
  - Feature-Preserving Fingerprint
  - Privacy-Preserving Collaboration Protocol
  - System Structure
- EXPERIMENTS & RESULTS
- DISCUSSION
- CONCLUSION

INTRODUCTION
- Motivations
  - Recent spam attacks pose strong challenges to the statistical filters that have been popular.
  - Collaborative spam filtering is a natural defense paradigm: because spammers send similar e-mails to several target receivers, information about spam can be shared.
  - However, the privacy of the collaborating participants is an important challenge.
  - Digest approaches have been proposed to protect privacy, but they are not sufficient.

INTRODUCTION
- Contributions
  - ALPACAS: a Large-scale Privacy-Aware Collaborative Anti-spam System.
    - A resilient fingerprint generation technique, the "feature-preserving transformation", is proposed.
    - A privacy-preserving protocol is designed to control the amount of information that is shared.
  - The experimental results demonstrate that ALPACAS outperforms traditional stand-alone statistical filters.

PRIOR WORK
- Drawbacks of the existing collaborative anti-spam schemes (using DCC)
  - How does it work?
    - Participating servers in DCC share e-mail digests computed through hash functions such as MD5.
    - The DCC system replies with recent statistics about the digests.
  - Drawbacks
    - Hashing schemes like MD5 generate a completely different hash value even if a single byte is altered.
    - The DCC scheme does not completely address the privacy issue: it remains open to inference-based privacy breaches.
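
The first drawback can be illustrated with a short sketch. The two message strings below are made up for illustration; only `hashlib.md5` from the Python standard library is used. It shows that a one-character change leaves almost nothing in common between the two digests, so an exact-match digest lookup would miss the altered spam.

```python
import hashlib

# Two hypothetical spam messages that differ by one trailing character.
original = b"Cheap meds, click here now"
altered = b"Cheap meds, click here now!"

digest_original = hashlib.md5(original).hexdigest()
digest_altered = hashlib.md5(altered).hexdigest()

# Count hex positions where the two digests happen to agree.
matching_positions = sum(a == b for a, b in zip(digest_original, digest_altered))

print(digest_original)
print(digest_altered)
print("matching hex positions:", matching_positions, "of", len(digest_original))
```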

THE ALPACAS ANTI-SPAM FRAMEWORK (1/2)
- Challenges
  - To protect privacy,
    - the messages have to be encrypted;
    - the encryption should retain the important features of the messages.
  - To avoid inference-based privacy breaches,
    - the information revealed during the collaboration must be minimized.
- ALPACAS framework components
  - Feature-preserving fingerprint
  - Privacy-preserving protocol
  - DHT-based architecture

THE ALPACAS ANTI-SPAM FRAMEWORK (2/2)
Fig. 1: ALPACAS System Overview. (a) ALPACAS Network; (b) Internal mechanism of EA4

Feature-Preserving Fingerprint (1/4)
- Shingle-based Message Transformation
  - Shingle: if two documents vary by a small amount, their shingle sets also differ by a small amount.
Fig. 2: ALPACAS Feature Sets, DCC and Razor Digests for 2 spam e-mails (text in bold font indicates differences)

Feature-Preserving Fingerprint (2/4)
- Shingle-based Message Transformation
  - Generation of the transformed feature set of message M_a, TFSet(M_a):
    - Compute the Rabin fingerprint [11] of the consecutive tokens in a sliding window of length W.
    - Each fingerprint is in the range (0, 2^K − 1).
    - For a message with X tokens, X − W + 1 fingerprints are obtained.
    - The smallest Y fingerprints are retained.
  - The similarity between M_a and M_b is calculated from the overlap between TFSet(M_a) and TFSet(M_b).
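
The generation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a truncated SHA-256 stands in for the Rabin fingerprint, tokens are obtained by whitespace splitting, and similarity is taken to be the Jaccard overlap of the two feature sets. The function names and default parameter values are hypothetical.

```python
import hashlib

def fingerprint(shingle: str, k: int = 32) -> int:
    """Map a shingle to an integer below 2**k.

    Assumption: SHA-256 reduced modulo 2**k is a stand-in for the
    Rabin fingerprint named on the slide.
    """
    digest = hashlib.sha256(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** k)

def tfset(message: str, w: int = 3, y: int = 4, k: int = 32) -> set:
    """Fingerprint every window of w consecutive tokens, keep the y smallest."""
    tokens = message.split()  # assumption: whitespace tokenisation
    fingerprints = {
        fingerprint(" ".join(tokens[i:i + w]), k)
        for i in range(len(tokens) - w + 1)  # X - W + 1 windows
    }
    return set(sorted(fingerprints)[:y])

def similarity(set_a: set, set_b: set) -> float:
    """Assumed similarity measure: Jaccard overlap of two feature sets."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

spam_a = "buy cheap watches online today with free shipping worldwide"
spam_b = "buy cheap watches online today with free delivery worldwide"
print(similarity(tfset(spam_a), tfset(spam_b)))
```

Keeping only the Y smallest values bounds the size of what is shared per message, while near-duplicate messages still tend to share some of those values.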

Feature-Preserving Fingerprint (3/4)
- Shingle-based Message Transformation
  - With respect to privacy preservation:
    - The Rabin fingerprint algorithm is a one-way hash function, so it is infeasible to reverse.
    - However, it is still possible to infer a word or a group of words from an individual feature value.

Feature-Preserving Fingerprint (4/4)
- Term-level Privacy Preservation
  - Controlled shuffling
    - The text is divided into consecutive chunks CK_h of z consecutive tokens each.
    - The tokens in each chunk are shuffled in a pre-defined manner, while the ordering of the chunks is preserved.
    - Each chunk is divided into y sub-chunks (y is a factor of z).
    - The tokens in chunk CK_h are shuffled such that the token at the r-th position in the s-th sub-chunk is moved to the (r × y + s)-th position in CK_h.
  - Because of the shuffling, two messages that contain an identical term may still produce different feature values for it.
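
The position rule can be sketched in a few lines. The sketch assumes zero-based indices for r and s (the slide does not state the indexing) and leaves a trailing chunk shorter than z untouched, which is also an assumption; `controlled_shuffle` is a hypothetical name.

```python
def controlled_shuffle(tokens, z, y):
    """Shuffle tokens chunk by chunk using the (r * y + s) position rule.

    Assumptions: r and s are zero-based, and a final chunk with fewer
    than z tokens is copied through unchanged.
    """
    if z % y != 0:
        raise ValueError("y must be a factor of z")
    sub_len = z // y  # number of tokens in each sub-chunk
    result = []
    for start in range(0, len(tokens), z):
        chunk = tokens[start:start + z]
        if len(chunk) < z:
            result.extend(chunk)  # incomplete trailing chunk
            continue
        shuffled = [None] * z
        for s in range(y):            # index of the sub-chunk
            for r in range(sub_len):  # position inside the sub-chunk
                shuffled[r * y + s] = chunk[s * sub_len + r]
        result.extend(shuffled)
    return result

print(controlled_shuffle(list("abcdef"), z=6, y=2))
# ['a', 'd', 'b', 'e', 'c', 'f']
```

With z = 6 and y = 2 the two sub-chunks (a, b, c) and (d, e, f) are interleaved, so tokens that were adjacent in the original text no longer fall into the same sliding window.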

Privacy-Preserving Collaboration Protocol (1/3)
- Spam/ham dichotomy
  - Revealing the contents of a spam does not affect privacy, whereas revealing information about a ham constitutes a privacy breach.
- Protocol
  - EA_j receives M_a, then computes TFSet(M_a).
  - EA_j sends a query containing a subset of TFSet(M_a) to another agent.
  - EA_k receives the query, then checks its spam and ham knowledge bases (KBs).
  - For each matching entry in the spam KB, EA_k sends back the complete transformed feature set.
  - For each matching entry in the ham KB, EA_k sends back a small, randomly selected part of the transformed feature set.
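
The responder side of the protocol might look like the sketch below. It assumes that an entry "matches" when it shares at least one feature value with the query, and that the size of the ham sample is a tunable parameter; neither detail is given on the slide. `respond` and `ham_sample_size` are hypothetical names.

```python
import random

def respond(query, spam_kb, ham_kb, ham_sample_size=2, rng=None):
    """Answer a query the way EA_k does in the protocol steps.

    Assumptions: a KB entry is a frozenset of feature values, and it
    matches when it shares at least one value with the query.
    """
    rng = rng or random.Random()
    # Spam entries are returned whole: revealing spam costs no privacy.
    spam_replies = [entry for entry in spam_kb if entry & query]
    # Ham entries are returned only as a small random part.
    ham_replies = [
        frozenset(rng.sample(sorted(entry), min(ham_sample_size, len(entry))))
        for entry in ham_kb
        if entry & query
    ]
    return spam_replies, ham_replies

spam_kb = [frozenset({1, 2, 3})]
ham_kb = [frozenset({3, 4, 5, 6})]
spam_replies, ham_replies = respond({3}, spam_kb, ham_kb, rng=random.Random(0))
print(spam_replies, ham_replies)
```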

Privacy-Preserving Collaboration Protocol (2/3)
Fig. 3: ALPACAS Protocol: Query and Response

Privacy-Preserving Collaboration Protocol (3/3)
- Protocol (cont'd)
  - EA_j now computes the ratio of MaxSpamOvlp(M_a) to MaxHamOvlp(M_a) and uses it as a score to decide whether M_a is spam or ham.
  - If the score is greater than a threshold λ, M_a is classified as spam; otherwise it is classified as ham.
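
A minimal sketch of this decision step follows. It assumes that the overlap with a returned set is the number of shared feature values, and it picks an arbitrary rule for the case where no ham overlap exists (the slide does not say how a zero denominator is handled). `classify` and `lam` are hypothetical names.

```python
def classify(tfset_a, spam_sets, ham_sets, lam):
    """Label M_a from the feature sets returned by other agents.

    Assumptions: overlap is the size of the set intersection; when the
    maximum ham overlap is zero, any spam overlap at all means spam.
    """
    max_spam_ovlp = max((len(tfset_a & s) for s in spam_sets), default=0)
    max_ham_ovlp = max((len(tfset_a & h) for h in ham_sets), default=0)
    if max_ham_ovlp == 0:
        return "spam" if max_spam_ovlp > 0 else "ham"
    score = max_spam_ovlp / max_ham_ovlp
    return "spam" if score > lam else "ham"

message_features = {1, 2, 3, 4}
print(classify(message_features, [{1, 2, 3}], [{4, 9}], lam=2.0))  # spam
```

In the example the best spam overlap is 3 and the best ham overlap is 1, so the score is 3.0, which exceeds the threshold of 2.0.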

System Structure (1/2)
- Design principle
  - A query should be sent to an agent only if that agent has a reasonable chance of holding information about the e-mail that is being verified. Contacting any other agent not only introduces inefficiencies but also leads to unnecessary exposure of data.
- DHT-based Architecture
  - EA_j is responsible for maintaining information about all the e-mails whose TFSet has at least one feature element in the range allocated to it.

System Structure (2/2)
- DHT-based Architecture (cont'd)
  - There are N agents.
  - All feature elements lie within (0, 2^K − 1).
  - The range (0, 2^K − 1) is divided into N overlapping regions {(MinF_0, MaxF_0), (MinF_1, MaxF_1), ..., (MinF_{N−1}, 2^K − 1)}.
  - (MinF_j, MaxF_j) denotes the sub-range allocated to EA_j.
    - For spam, EA_j stores the entire TFSet.
    - For ham, EA_j stores a subset of the TFSet.
  - If MinF_j ≤ Ft ≤ MaxF_j, then EA_j is called a rendezvous agent of feature element Ft.
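
The rendezvous rule translates directly into a lookup over the allocated sub-ranges. In this sketch the sub-ranges are passed in as a list of (MinF_j, MaxF_j) pairs indexed by agent number; the example ranges for K = 4 and N = 4 are made up for illustration, and the function names are hypothetical.

```python
def rendezvous_agents(ft, ranges):
    """Return the indices j of all agents with MinF_j <= ft <= MaxF_j."""
    return [j for j, (min_f, max_f) in enumerate(ranges) if min_f <= ft <= max_f]

def agents_to_query(tfset, ranges):
    """Collect the rendezvous agents of every feature element in a TFSet."""
    agents = set()
    for ft in tfset:
        agents.update(rendezvous_agents(ft, ranges))
    return sorted(agents)

# Hypothetical allocation: K = 4, so features lie in 0..15, split over N = 4 agents.
example_ranges = [(0, 3), (4, 7), (8, 11), (12, 15)]
print(rendezvous_agents(5, example_ranges))         # [1]
print(agents_to_query({1, 13}, example_ranges))     # [0, 3]
```

Only the agents returned by `agents_to_query` need to be contacted, which is the point of the design principle: agents that cannot hold a matching entry never see the query.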

EXPERIMENTS & RESULTS
- Benchmarked algorithms
  - Bogofilter, based on Bayesian filtering
    - Calculates a spamminess score for the e-mail.
  - DCC, based on simple hash-based collaborative filtering
    - Counts the number of times the hash value of the e-mail has been reported as spam.

Experimental Setup
- Dataset
  - TREC corpus & SpamAssassin corpus
  - The TREC corpus is split into 67 sets according to target address (67 agents).
  - Half of each set, including both ham and spam, is used for training and the remainder for testing.
  - Each individual agent has a pre-classified corpus (SpamAssassin) as its initial knowledge base.

Performance Metrics
- Spam filtering accuracy
  - A ham that is classified as spam by the filtering scheme is termed a false positive.
- Privacy of the collaborative anti-spam system
  - The message-level privacy breach percentage is defined as the ratio of the number of test ham messages suffering privacy compromises to the total number of test ham messages.
- Communication overhead of the system
  - The per-test communication cost metric is defined as the total number of messages circulated in the system during the entire experiment.
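
The first two metrics reduce to simple ratios, sketched below. Expressing the false positives as a percentage of all test ham messages is an assumption (the slide only defines what a false positive is), and the function names are hypothetical.

```python
def false_positive_pct(true_labels, predicted_labels):
    """Percentage of ham messages that the filter classified as spam.

    Assumption: the denominator is the number of ham messages tested.
    """
    ham_predictions = [p for t, p in zip(true_labels, predicted_labels) if t == "ham"]
    if not ham_predictions:
        return 0.0
    return 100.0 * sum(p == "spam" for p in ham_predictions) / len(ham_predictions)

def privacy_breach_pct(compromised_ham, total_test_ham):
    """Message-level privacy breach percentage as defined on the slide."""
    return 100.0 * compromised_ham / total_test_ham

truth = ["ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "spam", "ham"]
print(false_positive_pct(truth, predicted))
print(privacy_breach_pct(2, 50))  # 4.0
```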

SPAM Filtering Effectiveness
Fig. 4: False Positive Percentages of ALPACAS, BogoFilter and DCC
Fig. 5: False Negative Percentages of ALPACAS, BogoFilter and DCC
Fig. 6: System Overall Accuracy (DCC is not displayed because its FP is 0)

Robustness Against Attacks
Fig. 7: System Robustness Against Good-Word Attacks
Fig. 8: System Robustness Against Character Replacement Attacks

Privacy Awareness
Fig. 9: Privacy Breach in ALPACAS (Varying Number of Agents)

Communication Overheads
Fig. 10: Communication Overheads of the ALPACAS and the DCC systems

Message Transformation Algorithm Analysis
Fig. 11: False Positive of ALPACAS for Various Parameter Setups
Fig. 12: False Negative of ALPACAS for Various Parameter Setups
Fig. 13: Effectiveness of Controlled Shuffling Strategy

DISCUSSION
- Approaches that combine statistical filtering with the feature-preserving transformation scheme.
- Handling the dynamic nature of agents in the system using replication and finger-table based routing.
- Approaches for preventing malicious agents.

CONCLUSION
- In this paper, the design and evaluation of ALPACAS are presented.
- The two novel features:
  - A feature-preserving transformation technique
  - A privacy-preserving protocol
- Our initial experiments show that ALPACAS
  - is very effective in filtering spam;
  - has high resilience towards various attacks;
  - provides strong privacy protection to the participating entities.