Protecting Cyber-TA Contributors: Risks and Challenges
Vitaly Shmatikov
The University of Texas at Austin

Big Picture
- Shared data: intrusion detection data, security alerts, firewall data
- How to do collaborative analysis if networks don’t trust each other?
- Goal: stop attackers from abusing these data

Sample Intrusion Detection Alert
- May contain victim’s IP address
- Reveals relationships with other networks
- Reveals target’s IP address
- Reveals topology of targeted network and attack propagation
- Leaks information stored on targeted systems
- May reveal organization that owns it

Basic Tradeoffs
- Privacy and anonymity
  - Do not enable attackers to track attack propagation
  - Do not announce site defenses
  - Do not reveal network topology, configuration, enabled services
- Utility
  - Support (at least) coarse-grained analysis: event trends, identification of common attack sources, connection patterns, blacklisting, etc.
- Efficiency
  - Low overhead; no complicated crypto

Fundamental Problem
- Alerts may be used to track progress of attacks and find new vulnerabilities
- Hard to tell the difference between an attacker and a legitimate researcher
- Sometimes, the only difference is intent
  - Hard to tell by looking at data requests
[Figure: alert database]

Example: Probe-Response Attack
1. Attacker attacks a particular IP address
2. Attack is detected and an alert is reported to the repository
3. Attacker looks up the alert and learns the address of the detecting IDS sensor
- IP hashing doesn’t help! Attacker knows the targeted subnet and stages a simple dictionary attack with a small (<256) dictionary

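Why hashing does not help here can be shown with a minimal sketch. It assumes the repository publishes an unkeyed SHA-1 of the dotted-quad string; that encoding, the function names, and the example addresses (from the 192.0.2.0/24 documentation range) are illustrative assumptions, not the actual Cyber-TA format.

```python
import hashlib
import ipaddress
from typing import Optional


def sha1_ip(ip: str) -> str:
    """Unkeyed SHA-1 of the dotted-quad string (assumed encoding)."""
    return hashlib.sha1(ip.encode("ascii")).hexdigest()


def recover_ip(hashed: str, subnet: str) -> Optional[str]:
    """Hash every address in the known subnet until one matches."""
    for addr in ipaddress.ip_network(subnet):
        if sha1_ip(str(addr)) == hashed:
            return str(addr)
    return None


# The attacker probed 192.0.2.0/24 and then sees this hash in the repository.
published = sha1_ip("192.0.2.57")
print(recover_ip(published, "192.0.2.0/24"))
```

The loop tries at most 256 candidates for a /24, so the hash offers no real protection once the subnet is known.
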
“Fingerprinting” Attacks
[E.g., see Bethencourt et al., USENIX Security 2005]
- Attacker wants attack to be detected
- Unique attack signature
  - Port combinations
  - Rare IDS rules
  - Multiple scans (to cross statistical thresholds)
- Attack is detected and alert reported to repository
- Attacker completely maps out network defenses and avoids them in the future

Current IP Address Sanitization
Is this IP address on my network?
- Yes: use HMAC with secret key
  - Can only be compared for equality with IP addresses reported by IDS on the same network
  - Dictionary attack not feasible
- No: use SHA-1
  - A and B can compare their observations of events on C’s network
  - Dictionary attack possible, but address space is large
  - Enables detection of widely observed IP addresses

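The sanitization rule above can be sketched as follows. This is an illustration under stated assumptions: the slide does not say which hash HMAC is built on (SHA-1 is assumed here), and the function name, key values, and network ranges are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def sanitize_ip(ip: str, local_net: str, secret_key: bytes) -> str:
    """Keyed hash for addresses on the contributor's own network,
    plain SHA-1 for everything else."""
    addr = ipaddress.ip_address(ip)
    data = str(addr).encode("ascii")
    if addr in ipaddress.ip_network(local_net):
        # Assumption: HMAC-SHA1; only the key holder can recompute this.
        return hmac.new(secret_key, data, hashlib.sha1).hexdigest()
    # Unkeyed, so different contributors produce the same value.
    return hashlib.sha1(data).hexdigest()


# Two contributors, A and B, each with its own network and key.
a = sanitize_ip("203.0.113.9", "192.0.2.0/24", b"key-of-A")
b = sanitize_ip("203.0.113.9", "198.51.100.0/24", b"key-of-B")
print(a == b)  # both used plain SHA-1 on an external address
```

External addresses stay comparable across contributors, while an internal address hashed under one key cannot be matched or brute-forced by anyone who lacks that key.
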
Current Alert Sanitization
- Content fields scrubbed
  - InfectedFile, CapturedData, etc.
- Timestamps rounded
  - Tradeoff: limit sequence analysis
- High port numbers rounded
  - Tradeoff: limit port analysis possibilities
- Unique contributor IDs (not stored)
  - Rely on source anonymity to hide identity

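A minimal sketch of these scrubbing and rounding steps follows. The slide names only the InfectedFile and CapturedData fields; the other field names, the one-hour time bucket, the 1024 cutoff for "high" ports, and the port bucket size are assumptions chosen for illustration.

```python
CONTENT_FIELDS = {"InfectedFile", "CapturedData"}  # named on the slide
TIME_BUCKET = 3600   # assumed: round timestamps down to the hour
HIGH_PORT = 1024     # assumed cutoff for "high" port numbers
PORT_BUCKET = 1024   # assumed rounding granularity for high ports


def round_down(value: int, bucket: int) -> int:
    """Round value down to the nearest multiple of bucket."""
    return value - value % bucket


def sanitize_alert(alert: dict) -> dict:
    """Return a copy with content fields dropped and timestamp and
    high port numbers coarsened."""
    out = {}
    for field, value in alert.items():
        if field in CONTENT_FIELDS:
            continue  # scrub: here the field is dropped entirely
        if field == "Timestamp":
            value = round_down(value, TIME_BUCKET)
        elif field in ("SrcPort", "DstPort") and value >= HIGH_PORT:
            value = round_down(value, PORT_BUCKET)
        out[field] = value
    return out


alert = {"Timestamp": 1120003725, "SrcPort": 49231, "DstPort": 80,
         "EventID": "scan", "CapturedData": b"payload"}
print(sanitize_alert(alert))
```

Larger buckets hide more but also erase more of the ordering and port detail that sequence and port analysis depend on, which is the tradeoff the slide points out.
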
Data Sanitization Challenges
- Formalization of fingerprinting attacks + secure alert correlation schemes
- IP address virtualization that preserves topological structure of address space without revealing true addresses
  - Reconstruct topology of attack graphs
- Protocols that reveal attack data only if similar attack has been observed by a threshold number of contributors

Protecting Source Identity
- Internet overlay: peer-to-peer randomized routing (robust even if some nodes are compromised)
- Based on Tor (low-latency TCP-level anonymity)

Future Work: Backpropagation
- Propagate analysis results back to contributors (e.g., hashed IP addresses for filtering)
- Internet overlay: peer-to-peer randomized routing

Source Anonymity Issues
- Dataset poisoning and denial of service
  - Deliberate attacks or accidental flooding
- Pre-registration and vetting are needed
- Group membership credentials
  - Issued through “blind” registration; unlinkable to contributor’s true identity
  - Hard to guess, easy to check
  - Linkability of same-source contributions?
- Possible attacks on registration process

Contributor Registration
- Contributor IDs issued by Cyber-TA Coordination Center
  - Random IDs unlinkable to true identity
- Repositories can blacklist certain contributor IDs
- Current research:
  - Prevention of flooding and data poisoning
  - Revocation mechanisms
  - Reputation systems

Timing Attack
- Observe outgoing connection (sniff or attack 1st overlay node)
- De-anonymize alert origin by correlating message timings
- Internet overlay: peer-to-peer randomized routing

Additional Protection
- Re-keying by alert repository
  - Additional keyed hashing of IP addresses
- Randomized hot list thresholds
  - Publish only the hot list of reported alerts that have something in common
  - Need randomness to prevent flushing attacks
- Delayed alert publication
… all of these rely on repository integrity!

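One way to picture a randomized hot list threshold is the sketch below. The slides do not specify how the threshold is randomized; drawing a per-item threshold uniformly from [base_threshold, base_threshold + jitter], along with all names here, is an assumption made for illustration.

```python
import random
from collections import defaultdict


def hot_list(reports, base_threshold, jitter, rng=None):
    """reports: iterable of (contributor_id, item) pairs, where item is
    e.g. a hashed IP address. An item is published only if the number of
    distinct contributors reporting it meets a randomized threshold."""
    rng = rng or random.SystemRandom()  # unpredictable by default
    seen = defaultdict(set)
    for contributor, item in reports:
        seen[item].add(contributor)  # repeats by one contributor count once
    hot = []
    for item, contributors in seen.items():
        # Assumed scheme: a fresh threshold per item, hidden from submitters.
        threshold = base_threshold + rng.randint(0, jitter)
        if len(contributors) >= threshold:
            hot.append(item)
    return sorted(hot)


reports = [("c1", "h_a"), ("c2", "h_a"), ("c3", "h_a"), ("c1", "h_b")]
print(hot_list(reports, base_threshold=2, jitter=0))
```

With a nonzero jitter, a submitter cannot tell exactly how many reports are needed to push an item onto the published list. As the slide notes, this only helps if the repository itself is trustworthy.
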
Sample Intrusion Detection Alert
- Source address: can be used as a marker to learn sensor coverage
- Port number: rare port numbers can be used as markers to link alerts to sensors
- Destination address: reveals sensor coverage, capabilities, network topology
- Port number: reveals network services
- Timestamp: can be used to link an alert to the sensor that produced it
- SensorID: reveals defensive services and capabilities, organization that owns sensor
- EventID: reveals defensive services, capabilities, policies
- Outcome: reveals target site’s vulnerabilities, topologies, policies, etc.
- Captured data, Infected file: reveals private user data, topology and applications, vulnerabilities