Distributed Quota Enforcement for Spam Control Jee Whan Choi Chaoting Xuan
Contents Introduction Distributed Quota Enforcement (DQE) DQE Architecture Enforcer Design Evaluation Conclusions
Introduction Spam – Unsolicited bulk email – 50-70% of email today is spam Spam filters – Text scanning – Rate of false positives is approximately 1% – Economic damage estimated at hundreds of millions of dollars Distributed Quota Enforcement (DQE) – Quotas on the number of emails a sender can send
Distributed Quota Enforcement Design Objectives – Protocol No False Positives Untrusted Enforcer Privacy – Enforcer Scalability Fault Tolerance High Throughput Attack-Resiliency Mutually Untrusting Nodes
Architecture
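Quota Allocation and Creation Quota Allocation – Quotas are allocated by a select few globally trusted quota allocators (QA) – Cs = { Spub, expiration time, quota }QApriv, a certificate signed with the QA's private key Stamp – Created by the sender – Stamp = { Cs, {i,t}Spriv }, where {i,t} is signed with the sender's private key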
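The certificate and stamp layout can be pictured as two small records. The following is a minimal Python sketch, not the authors' implementation. The class names, field names, and byte encodings are hypothetical. Reading i as a stamp index within the quota and t as an epoch number is an assumption, as is accepting only the current and previous epochs. Signatures are carried as opaque bytes and checked by a caller-supplied `verify` function, because the slides do not name a signature scheme.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QuotaCertificate:
    """Cs = {Spub, expiration time, quota} signed with QApriv."""
    sender_pub: bytes     # Spub
    expiration: int       # expiration time (Unix seconds is an assumption)
    quota: int            # number of stamps the sender may create
    qa_signature: bytes   # signature by the quota allocator (opaque here)


@dataclass(frozen=True)
class Stamp:
    """Stamp = {Cs, {i, t}Spriv}."""
    cert: QuotaCertificate
    index: int                # i (assumed: position within the quota)
    epoch: int                # t (assumed: epoch number)
    sender_signature: bytes   # signature over (i, t) with Spriv (opaque here)


# Hypothetical verifier type: (public_key, message, signature) -> bool
Verifier = Callable[[bytes, bytes, bytes], bool]


def stamp_is_valid(stamp: Stamp, qa_pub: bytes, now: int,
                   current_epoch: int, verify: Verifier) -> bool:
    """Check a stamp's structure before asking the enforcer about reuse."""
    cert = stamp.cert
    cert_msg = (cert.sender_pub
                + cert.expiration.to_bytes(8, "big")
                + cert.quota.to_bytes(8, "big"))
    if not verify(qa_pub, cert_msg, cert.qa_signature):
        return False                      # certificate not issued by the QA
    if now > cert.expiration:
        return False                      # certificate expired
    if not 1 <= stamp.index <= cert.quota:
        return False                      # sender exceeded its quota
    if stamp.epoch not in (current_epoch, current_epoch - 1):
        return False                      # stamp too old or from the future
    stamp_msg = stamp.index.to_bytes(8, "big") + stamp.epoch.to_bytes(8, "big")
    return verify(cert.sender_pub, stamp_msg, stamp.sender_signature)
```

A receiver would run a check of this kind first and contact the enforcer only for stamps that pass it.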
Stamp Cancellation Protocol
Protocol Objectives No false positives – The hash is collision-resistant and one-way Untrusted enforcer – Returns a proof of reuse (fingerprint) Privacy – A hash of the stamp is used instead of the stamp itself An adversary cannot cancel a victim’s stamp before it is created – The stamp contains a signature made with the sender’s private key
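The receiver's side of stamp cancellation can be sketched in a few lines. This is a hedged illustration rather than the protocol as specified: the slides say only that a hash of the stamp is used and that the enforcer returns a proof of reuse. The arrangement below, where the stored value is HASH(stamp) and the lookup key is HASH(HASH(stamp)), is one arrangement consistent with those statements and is assumed here. SHA-256, `ToyEnforcer`, and `cancel_stamp` are likewise hypothetical choices.

```python
import hashlib


def H(data: bytes) -> bytes:
    """One-way hash (SHA-256 is an assumption, not named in the slides)."""
    return hashlib.sha256(data).digest()


class ToyEnforcer:
    """In-memory stand-in for the untrusted enforcer (hypothetical)."""

    def __init__(self):
        self._store = {}

    def test(self, key: bytes):
        return self._store.get(key)       # None means "not found"

    def set(self, key: bytes, value: bytes) -> None:
        self._store[key] = value


def cancel_stamp(stamp_bytes: bytes, enforcer) -> bool:
    """Return True if the stamp was fresh (and is now cancelled),
    False if the enforcer proved that it was already used."""
    value = H(stamp_bytes)    # what gets stored: hash of the stamp
    key = H(value)            # what gets looked up: hash of that hash
    found = enforcer.test(key)
    # The reply counts as proof of reuse only if it hashes to the key.
    # An enforcer that never saw the stamp cannot produce such a value.
    if found is not None and H(found) == key:
        return False
    enforcer.set(key, value)
    return True
```

Because the receiver checks the returned value itself, a lying enforcer cannot make a fresh stamp look reused, and because only hashes leave the receiver, the enforcer never sees the stamp.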
Enforcer Consists of thousands of untrusted storage nodes The enforcer stores the fingerprints of stamps cancelled in the current and previous epochs The list of approved nodes is published by a trusted authority (Bunker) The node receiving a client’s request is called the portal for that request – A client can discover a portal via hard-coding or DNS
Enforcer Design
TEST – Local check – If not found, sequentially send the request to other nodes (the assigned nodes) – The assigned nodes are determined by applying r independent hash functions to the key k, similar to Chord; r is a configurable system parameter – If any node contains k’s value, return it; otherwise return “not found”
SET – Local store – Also store the value in a randomly chosen node from assigned-nodes
TEST and SET Algorithm
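The TEST and SET steps described on the two preceding slides can be sketched as follows. This is a single-process toy under stated assumptions, not the authors' algorithm: nodes are plain dictionaries, the r hash functions are derived by prefixing a counter to the key before SHA-256, and node selection uses a simple modulo rather than a Chord-like ring. The class and method names are hypothetical.

```python
import hashlib
import random


class Enforcer:
    """Toy enforcer: n storage nodes, r assigned nodes per key."""

    def __init__(self, n: int, r: int, seed: int = 0):
        self.n = n
        self.r = r
        self.nodes = [dict() for _ in range(n)]
        self.rng = random.Random(seed)    # seeded for repeatable runs

    def assigned_nodes(self, key: bytes) -> list:
        """Apply r hash functions to the key to pick r node ids."""
        ids = []
        for j in range(self.r):
            digest = hashlib.sha256(j.to_bytes(4, "big") + key).digest()
            ids.append(int.from_bytes(digest[:8], "big") % self.n)
        return ids

    def test(self, portal: int, key: bytes):
        """Local check at the portal, then ask assigned nodes in turn."""
        if key in self.nodes[portal]:
            return self.nodes[portal][key]
        for node_id in self.assigned_nodes(key):
            value = self.nodes[node_id].get(key)
            if value is not None:
                return value
        return None                       # "not found"

    def set(self, portal: int, key: bytes, value: bytes) -> None:
        """Store at the portal and on one randomly chosen assigned node."""
        self.nodes[portal][key] = value
        target = self.rng.choice(self.assigned_nodes(key))
        self.nodes[target][key] = value
```

For example, after `e = Enforcer(n=50, r=5)` and `e.set(3, b"k", b"v")`, a later `e.test(17, b"k")` through a different portal still finds the value, because one of the key's assigned nodes holds a copy.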
Stamp Reuse and Fault Tolerance False negatives are possible. Byzantine faults and crash faults are treated the same – An adversarial node giving a false negative (a “not found” response) has the same outcome as a node not responding (crash fault) Stamp reuse depends on the parameters r and p – p is the fraction of the n total machines that fail during a 2-day cycle – Expected number of times a stamp is used before its fingerprint has been placed on a good node: $\frac{1}{1-2p} + p^{r} n$ – If we assume $r = 1 + \log_{1/p} n$, then $p^{r} n = p$ and the expected use is approximately $1 + 3p = 1.3$ for $p = 0.1$
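The arithmetic on this slide can be checked numerically. The sketch below evaluates the expression 1/(1-2p) + p^r * n with r = 1 + log_{1/p} n. The function names are hypothetical, and n = 10,000 is an illustrative value that does not come from the slides; with this choice of r the second term equals p whatever n is.

```python
import math


def replication_factor(p: float, n: int) -> float:
    """r = 1 + log base (1/p) of n."""
    return 1.0 + math.log(n) / math.log(1.0 / p)


def expected_uses(p: float, r: float, n: int) -> float:
    """Expected uses of a stamp: 1/(1-2p) + p^r * n (requires p < 0.5)."""
    return 1.0 / (1.0 - 2.0 * p) + (p ** r) * n


p = 0.1
n = 10_000                      # ASSUMED node count, for illustration only
r = replication_factor(p, n)    # about 5 for these values
uses = expected_uses(p, r, n)   # about 1.35

# The slide's 1 + 3p comes from the small-p approximation
# 1/(1-2p) ~ 1 + 2p, plus the second term p.
approx = 1.0 + 3.0 * p          # 1.3
```

The exact value for p = 0.1 is about 1.35, slightly above the rounded 1.3 quoted on the slide, because 1/(1-2p) = 1.25 rather than 1.2.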
Improvement of Fault Tolerance (our speculation) Randomly choose two or more nodes from the assigned nodes to store the (key, value) pair in the PUT algorithm. This increases overall storage usage but significantly improves the stamp reuse detection rate.
GET and PUT
GET and PUT (Continued) PUTs are fast Crash recovery of previously cancelled keys Key-value pairs are small in size “Not Found” answers are almost always fast “Found” answers are slow
Avoiding Distributed Livelock Distributed Pipeline: 1. TEST/SET requests from clients. 2. GET/PUT requests from other enforcer nodes. 3. GET/PUT responses. Drop requests at the beginning of the pipeline to maximize throughput.
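One way to picture the drop policy is a per-node scheduler that serves the latest pipeline stage first and sheds load at the first stage. The sketch below is an illustration under assumptions, not the enforcer's actual mechanism: the fixed capacity, the eviction rule, and all names are hypothetical. The slide states only that drops should happen at the beginning of the pipeline.

```python
from collections import deque

# Pipeline stages, in the order listed on the slide.
CLIENT_REQUEST, PEER_REQUEST, PEER_RESPONSE = 0, 1, 2


class NodeScheduler:
    """Bounded message buffer that sacrifices stage-1 work under overload."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queues = {stage: deque() for stage in
                       (CLIENT_REQUEST, PEER_REQUEST, PEER_RESPONSE)}
        self.dropped = 0

    def _size(self) -> int:
        return sum(len(q) for q in self.queues.values())

    def enqueue(self, stage: int, msg) -> bool:
        """Queue a message; return False if it was dropped."""
        if self._size() >= self.capacity:
            clients = self.queues[CLIENT_REQUEST]
            if stage == CLIENT_REQUEST or not clients:
                self.dropped += 1         # nothing cheaper to give up
                return False
            clients.pop()                 # evict the newest client request
            self.dropped += 1
        self.queues[stage].append(msg)
        return True

    def next_message(self):
        """Serve later stages first so invested work is not wasted."""
        for stage in (PEER_RESPONSE, PEER_REQUEST, CLIENT_REQUEST):
            if self.queues[stage]:
                return stage, self.queues[stage].popleft()
        return None
```

The reasoning behind this ordering is that a GET/PUT response completes a request other nodes have already spent work on, so dropping it wastes more than rejecting a fresh client request.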
Resource Exhaustion Attacks Attack: a flood of spurious TEST/SET requests. Assumption: attackers (or the zombies they control) have some bandwidth limit. Solution: max out the attackers’ bandwidth by requiring large or multiple copies of TEST/SET packets.
Performance Evaluation
Performance Evaluation (Continued) Enforcer size – Billions of emails sent daily – 65% spam – Billions of disk seeks per day (pessimistic) – Divided by disk seeks/second/node and seconds/day – 1881 nodes (3 GHz CPU, 1 GB RAM, 3 Mbit/s bandwidth)
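The sizing estimate on this slide is a short division, sketched below. The slide's input figures did not survive, so the values marked ASSUMED are illustrative and are not recovered from the source. They are used here only because they reproduce the 1881-node result. Only the 65% spam share and the final node count come from the slide.

```python
import math

emails_per_day = 100e9             # ASSUMED daily email volume
spam_fraction = 0.65               # from the slide
seeks_per_second_per_node = 400    # ASSUMED per-node disk seek rate
seconds_per_day = 86_400

# Pessimistic case (assumed reading): every spam message costs one disk seek.
seeks_per_day = emails_per_day * spam_fraction

nodes = math.ceil(seeks_per_day / (seeks_per_second_per_node * seconds_per_day))
# nodes is 1881 under these assumed inputs
```

Changing either assumed input scales the node count proportionally, so the estimate is most sensitive to the per-node seek rate.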
Performance Evaluation (Continued)
Questions?