Published by Ezra Matthews. Modified over 6 years ago.
1
De-anonymizing the Internet Using Unreliable IDs
By Yinglian Xie, Fang Yu, and Martín Abadi
Presented by Peng Cheng, 03/22/2017
2
Introduction
The Internet is designed to be open and anonymous. This let it expand quickly, but it also raises security concerns.
Security rests on host accountability.
An IP address cannot be used as a host identifier:
  an attacker can change its IP address
  numerous proxies and NAT devices share common IP addresses
3
Goal
Develop an immediate, practical approach to associate traffic with hosts.
  To what extent can we use IP addresses to track hosts?
  Can we use the binding information between hosts and IP addresses to strengthen network security?
4
Related work
Stepping-stone attacks
  detected with packet-timing analysis and content analysis
Source-address spoofing
  countered with ingress and egress filtering
5
Related work
Accountable Internet Protocol
  uses self-certifying addresses to ensure that hosts and domains can prove their identities without relying upon a global trusted authority
Remote device fingerprinting
  leverages packet-level traffic characteristics or clock skews of a host to generate OS or device fingerprints
6
HostTracker
Relies on application-level events to automatically infer host-IP bindings.
Leverages IDs derived from application-layer logs to create unique host identifiers and to track the bindings of hosts to IP addresses over time.
Outline:
  1. problem formulation
  2. tracking host activities
  3. host-tracking results and validations
  4. applications
  5. conclusion
7
1. problem formulation
1.1 Host-Tracking Graph
The host-tracking graph G : H × T → IP maps a host (in H) at a given time (in T) to the IP address it is bound to.
It is used to attribute input events to the responsible hosts.
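One way to picture the mapping G : H × T → IP is as a set of time-windowed host-IP bindings. The sketch below is illustrative only and is not taken from the paper: the names `Binding`, `HostTrackingGraph`, `ip_of`, and `hosts_at` are hypothetical, times are assumed to be integers, and the linear scan is chosen for readability rather than speed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Binding:
    """One host-IP binding window (hypothetical representation)."""
    host: str   # host identifier
    ip: str     # IP address bound to the host
    start: int  # window start, inclusive (e.g. Unix seconds)
    end: int    # window end, inclusive


class HostTrackingGraph:
    """Stores bindings and answers G(host, t) plus the inverse lookup."""

    def __init__(self, bindings: List[Binding]):
        self.bindings = list(bindings)

    def ip_of(self, host: str, t: int) -> Optional[str]:
        """G(host, t): the IP bound to `host` at time `t`, or None."""
        for b in self.bindings:
            if b.host == host and b.start <= t <= b.end:
                return b.ip
        return None

    def hosts_at(self, ip: str, t: int) -> List[str]:
        """Inverse lookup: hosts bound to `ip` at time `t`.

        Attributing an event seen at (ip, t) means finding these hosts.
        """
        return [b.host for b in self.bindings
                if b.ip == ip and b.start <= t <= b.end]
```

An event observed at a given IP address and time is attributed by calling `hosts_at`. More than one result would suggest a shared address such as a proxy or NAT.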
8
1. problem formulation
1.2 Host Representation
In network security applications, it is desirable to represent a host as an entire hardware and software stack and track all its network activity.
Since we lack strong authentication mechanisms, we consider leveraging application-level identifiers such as user IDs, messenger login IDs, social network IDs, or cookies.
9
1. problem formulation
1.3 Goals and Challenges
Goal: using event logs with unreliable IDs, generate two outputs:
  identity-mapping table
  host-tracking graph
Challenges:
  no one-to-one mapping between users and IDs
  dynamic IP addresses
  proxies and NATs
  malicious IDs
10
2. Tracking host activities
2.1 Overview
Three possibilities:
  multiple user IDs share a host
  IP_i is a proxy associated with multiple hosts
  U22 is a guest user (either a real guest account, or introduced by an attacker)
11
2. Tracking host activities
2.1 Overview
12
2. Tracking host activities
2.2 Application-ID Grouping
Compute the probability P(u1, u2) of two independent user IDs u1 and u2 appearing consecutively.
(u1, u2) is a related user-ID pair if P(u1, u2) is smaller than a pre-set threshold: the pair appears together more often than chance would explain.
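The slide does not give the formula for P(u1, u2), so the sketch below substitutes a simple stand-in model and should not be read as the paper's computation. Assumptions: logins on one IP address are treated as a randomly ordered sequence, each adjacent slot is the pair (u1, u2) in either order with probability 2·(n1/N)·(n2/N), and the chance of seeing at least k such adjacencies is a binomial tail. The function names and the default threshold are hypothetical.

```python
from math import comb


def count_adjacent(events, u1, u2):
    """Count back-to-back logins of u1 and u2 (either order); assumes u1 != u2."""
    pair = {u1, u2}
    return sum(1 for a, b in zip(events, events[1:]) if {a, b} == pair)


def chance_probability(n1, n2, total, k):
    """Probability of >= k adjacencies if u1 and u2 were independent.

    Assumed model (not from the source): binomial tail over total - 1 slots.
    """
    if total < 2:
        return 1.0 if k == 0 else 0.0
    p = min(1.0, 2.0 * (n1 / total) * (n2 / total))
    slots = total - 1
    below_k = sum(comb(slots, j) * p**j * (1 - p)**(slots - j)
                  for j in range(k))
    return max(0.0, 1.0 - below_k)


def is_related(events, u1, u2, threshold=1e-3):
    """Mark the pair as related when chance alone is an unlikely explanation."""
    k = count_adjacent(events, u1, u2)
    prob = chance_probability(events.count(u1), events.count(u2),
                              len(events), k)
    return k > 0 and prob < threshold
```

Here `events` is the time-ordered list of user IDs seen on one IP address. A pair that alternates several times inside a long, otherwise unrelated sequence falls below the threshold, while a single accidental adjacency does not.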
13
2. Tracking host activities
2.3 Host-Tracking Graph Construction
Mark all inconsistent bindings in the graph. There are two types of inconsistent bindings:
  conflict bindings
  concurrent bindings
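The slide names the two types without defining them. The sketch below assumes the reading that matches the resolution steps on the next slide: a conflict binding is two different ID groups using the same IP address in overlapping time windows, and a concurrent binding is one ID group using two different IP addresses in overlapping windows. The `Binding` record and function names are hypothetical, and the pairwise scan is quadratic, so it is for illustration only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Binding:
    """A group-IP binding window (hypothetical representation)."""
    group: str  # ID group standing in for a host
    ip: str
    start: int  # inclusive
    end: int    # inclusive


def overlaps(a, b):
    """True if the two binding windows share at least one instant."""
    return a.start <= b.end and b.start <= a.end


def find_inconsistent(bindings):
    """Return (conflicts, concurrent) as lists of binding pairs."""
    conflicts, concurrent = [], []
    for a, b in combinations(bindings, 2):
        if not overlaps(a, b):
            continue
        if a.ip == b.ip and a.group != b.group:
            conflicts.append((a, b))   # two groups, one IP, same time
        elif a.group == b.group and a.ip != b.ip:
            concurrent.append((a, b))  # one group, two IPs, same time
    return conflicts, concurrent
```

Under this reading, conflicts point toward proxies or NATs, and concurrent bindings point toward a group that mixes IDs from more than one host.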
14
2. Tracking host activities
2.4 Resolving Inconsistency
Proxy identification
  resolves conflict bindings
  finds both types of proxies/NATs
Guest removal
  treats events from the untracked group as guest events
Splitting groups
  resolves concurrent bindings
  adjusts the grouping by splitting the subset of IDs
15
2. Tracking host activities
2.5 Closing the Loop
Remove proxy-only users and guest-only groups.
Iteratively refine the remaining groups.
Expand the binding window size to increase the coverage.
16
3. host-tracking results and validations
3.1 Input Dataset
A month-long user-login trace collected at a large Web service (330 GB).
Each entry has 3 fields:
  an anonymized user ID (550 million user IDs)
  the IP address used to perform the login (220 million IP addresses)
  the timestamp of the login event
17
3. host-tracking results and validations
3.2 Tracked Coverage
18
3. host-tracking results and validations
3.2 Tracked Coverage
19
3. host-tracking results and validations
3.3 Validation results
Although HostTracker and the naive method achieved comparable accuracies in associating user IDs to hardware IDs, HostTracker significantly outperforms the naive method in terms of the accuracy of associating hardware IDs with host IDs.
20
3. host-tracking results and validations
3.4 Mobility Analysis for Users
About 94% of the untracked IDs were signed up very recently (since July).
For the set of tracked user IDs, the account signup dates were evenly distributed over time.
21
3. host-tracking results and validations
3.4 Mobility Analysis for Users
Figure: histogram of the number of emails sent by tracked and untracked IDs that logged in from at least 10 IP ranges.
Our study suggests that whether a user is associated with a tracked host can be a discriminating feature for identifying botnet activities.
22
4. applications
4.1 Estimation of Malicious-Host Population
Counting by host rather than by IP address overcomes dynamic IP-address assignment.
HostTracker can compute the number of malicious hosts more accurately.
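A toy illustration of why counting by host differs from counting by IP address under dynamic assignment. This is a sketch under stated assumptions, not the paper's procedure: bindings are hypothetical `(host, ip, start, end)` tuples with inclusive integer times, malicious events are `(ip, time)` tuples, and `count_malicious` is an invented name.

```python
def count_malicious(events, bindings):
    """Return (distinct IPs, distinct hosts) behind the malicious events.

    events:   iterable of (ip, t) for known-malicious activity
    bindings: iterable of (host, ip, start, end), windows inclusive
    Events with no matching binding contribute to the IP count only.
    """
    events = list(events)
    ips = {ip for ip, _ in events}
    hosts = set()
    for ip, t in events:
        for host, bound_ip, start, end in bindings:
            if bound_ip == ip and start <= t <= end:
                hosts.add(host)
    return len(ips), len(hosts)
```

A single host that moves across three addresses during the trace is counted three times by IP address but once by host. In the opposite direction, an event at a proxy address may match several hosts, so such events need separate handling before the count is trusted.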
23
4. applications
4.2 Building Profiles for Normal Users
Generate user statistics to help understand normal user behavior and distinguish abnormal activities.
Intersecting the 220 million tracked users with the 5.6 million known malicious IDs: only 50.2K (0.02% of the 220 million) are in this intersection.
24
4. applications
4.3 Postmortem Forensic Analysis
Using known malicious activities as a seed, we can conduct postmortem forensic analysis to identify more malicious activities.
25
4. applications
4.4 Real-time Tracklists
By following the activity of tracked hosts, a tracklist can block malicious activities despite changes of IP address.
26
Conclusions
This paper demonstrates that when IP addresses are augmented with unreliable application IDs, many activities can be attributed to the responsible hosts despite the existence of dynamic IP addresses, proxies, and NATs.
The host-IP binding information can be used to effectively identify and block malicious activities by host rather than by IP address.
27
Conclusions
Binding malicious accounts to fixed hosts not only increases their risk of being detected, but also limits the attack traffic.
Malicious accounts appearing at proxies may have a higher chance of evading detection by mimicking legitimate account profiles. But binding many malicious accounts to proxies also increases the chance of them all being detected and blocked, and proxy activities may be subject to stricter security tests.
28
Thank you! Questions?