Measuring and Mitigating AS-level Adversaries Against Tor
Phillipa Gill, Stony Brook University
UMass Amherst, October 16, 2015
Work done in collaboration with: Rishab Nithyanand (SBU), Oleksii Starov (SBU), Adva Zair (HUJI), Michael Schapira (HUJI)
Anonymity on the Internet
Challenge: by observing Internet traffic one can infer who is talking to whom
- Metadata is the message!
- Track communications over time…
- …behaviors, interests, activities
Tor aims to solve this:
- The entry relay does not know the destination
- The exit relay does not know the source

One of the challenges of the Internet is that it was not designed with anonymity in mind. Somebody observing network traffic can see the source and destination of each connection and learn which sites people are visiting, even if the connection is encrypted. As we’ve seen with the NSA revelations in recent years, even this sort of metadata about who talks to whom can be incredibly valuable for tracking people’s behaviors, interests and activities. Tor is a system that tries to resolve this issue by providing users an anonymous way to access content online. It does this using encryption and by bouncing the user’s traffic off three relays, or Tor routers, referred to as the entry, middle and exit relay. The basic idea is that someone observing at the entry relay learns only the identity of the source but not the destination, and the exit knows the destination but not the source.
Tor 101
A Tor circuit is constructed out of three Tor routers/relays: entry, middle and exit
The client iteratively tunnels and exchanges keys with the relays

- How are relays chosen? Based on capacity/load, with a limited set of entry relays to mitigate relay-based attacks.
- The client iteratively exchanges keys with the relays on the path and tunnels to the next relay.
- When a relay decrypts the message it learns the identity of the next hop on the path. In this way each relay learns only the hop before it and the hop after it in the circuit.
- The exit sees the actual traffic, so if the data contains information about the client the exit can learn who it is. This is why the Tor Browser is important.
Threat model
Which user is visiting the site?
Internet routing dynamics make timing attacks easier than you’d think!
Network-based attacks (what we focus on)
- Timing attacks can deanonymize users
- Actually being tried by gov’t agencies!
Relay-based attacks (lots of work on this)
- Fingerprint Web sites based on packet timing
- Exit relay can observe users’ traffic
Timing attacks & routing
Internet routing is based on business relationships!
[Figure: AS-level topology showing the source AS, entry relay, exit relay and destination AS, connected through AS1 to AS4 over customer, provider and peer links.]

This slide shows the location of the source, entry, exit and destination on the Internet. When I talk about routing here I’m talking about routing at the level of autonomous systems (ASes). You can think of an autonomous system as a network under the control of a single organization, for example AT&T or Stony Brook University. The client’s traffic will traverse an Internet path consisting of multiple autonomous systems on its way into and out of Tor.
Timing attacks & routing
Asymmetric routing makes things worse!
- ASes prefer cheaper paths!
- ACK #s leak information!
[Figure: forward and reverse paths between the source AS and entry relay, and between the exit relay and destination AS, through AS1 to AS4.]

However, properties of the Internet’s routing system actually make this worse. Specifically, traffic from the entry back to the client and from the destination back to the exit may take different paths than the forward traffic. Using ACK numbers, an AS that lies on these reverse paths is also in a position to carry out correlation attacks. For example, if we take reverse paths into consideration, AS3 can now also perform the attack.
Attack criteria
Any AS that lies on the forward OR reverse path between the source and entry …
… AND between the exit and destination can execute the attack
Challenge: how to measure the potential for these attacks in practice?
- We can’t actually measure reverse paths!
Our approach: use simulations on empirical AS graphs
- Consider all paths compliant with a model of routing policies
- Gives an upper bound on potential attacks

To make this more formal, we say an attacker can perform a timing attack if they lie on the forward or reverse path between the client and entry, and on the forward or reverse path between the exit and the destination. We wanted to understand how prevalent this sort of attack would be in practice, but we faced a nasty network measurement challenge: we can’t actually measure the reverse paths. We get around this by simulating routing policies on empirical AS graphs to obtain the set of potential paths that may be chosen for these different routes. This gives us an upper bound on the number of adversaries there may be.
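The attack criterion on this slide amounts to a set intersection. A minimal sketch, assuming each path is given as a list of the ASes it crosses; the function name `attacker_ases` and the example AS labels are illustrative and are not taken from the Astoria code:

```python
def attacker_ases(entry_fwd, entry_rev, exit_fwd, exit_rev):
    """Return the ASes positioned to run the timing attack.

    An AS qualifies if it is on the forward OR reverse path between
    source and entry, AND on the forward OR reverse path between
    exit and destination.
    """
    entry_side = set(entry_fwd) | set(entry_rev)
    exit_side = set(exit_fwd) | set(exit_rev)
    return entry_side & exit_side


# Hypothetical paths: with forward paths only, no AS sees both ends.
forward_only = attacker_ases(["AS1", "AS2"], [], ["AS4"], [])

# Adding the asymmetric reverse paths puts AS3 on both sides.
with_reverse = attacker_ases(["AS1", "AS2"], ["AS3", "AS1"], ["AS4"], ["AS3"])
```

Because the real reverse paths cannot be measured, the path lists would in practice come from the routing simulation, which is why the result is an upper bound.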
Modeling Internet Routing
Standard model (Gao-Rexford):
- Based on practices employed by a large ISP
- Provides an intuitive model of path selection and export policy
Path Selection:
1. LocalPref: prefer customer paths over peer paths over provider paths
2. Prefer shorter paths
3. Arbitrary tiebreak
Export Policy:
1. Export customer path to all neighbors.
2. Export peer/provider path to all customers.
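The two rule lists above can be sketched in a few lines. This is an illustration under assumptions: the `Rel` enum, the function names and the route tuples are hypothetical, and a lexicographic comparison of AS paths stands in for the "arbitrary tiebreak":

```python
from enum import IntEnum


class Rel(IntEnum):
    """Relationship of the neighbor a route was learned from (lower = preferred)."""
    CUSTOMER = 0
    PEER = 1
    PROVIDER = 2


def best_route(routes):
    """Pick one route from a list of (Rel, as_path) tuples.

    Ranking follows the slide: LocalPref by relationship, then shorter
    AS path, then a tiebreak (here: lexicographic order of the path,
    chosen only to make the example deterministic).
    """
    return min(routes, key=lambda r: (r[0], len(r[1]), r[1]))


def should_export(learned_from, to_neighbor):
    """Export policy: customer routes go to everyone,
    peer/provider routes go only to customers."""
    if learned_from == Rel.CUSTOMER:
        return True
    return to_neighbor == Rel.CUSTOMER


# Hypothetical candidate routes to one destination (AS 99).
candidates = [
    (Rel.PROVIDER, (10, 99)),
    (Rel.PEER, (20, 30, 99)),
    (Rel.CUSTOMER, (40, 50, 60, 99)),
]
chosen = best_route(candidates)
```

Note that the customer route wins even though it is the longest: relationship is compared before path length.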
From Model to Paths
Compute paths satisfying Path Selection and Export Policy
- Do this on an empirically derived AS topology
- State-of-the-art topology derived by CAIDA [IMC 2014]
- Real topology has ~36K ASes and ~126K edges
Requires efficient algorithms for computing paths!
- We use a breadth-first search algorithm to make this feasible
- Described in: Gill et al. [ACM CCR 2012]
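A simplified sketch of how a breadth-first search can compute, for one destination, the class and length of the route each AS selects under the routing model. It is an illustration in the spirit of the approach, not the algorithm from the cited CCR paper; the function name, the dictionary-based topology format and the tiny example graph are assumptions:

```python
import heapq
from collections import deque


def route_classes(dest, providers, peers, customers):
    """Return {asn: (route_class, hops)} for routes toward dest.

    providers[a], peers[a], customers[a] are sets of a's neighbors
    of that type. Ties between equal-length paths are ignored.
    """
    best = {dest: ("origin", 0)}

    # Stage 1: customer routes climb customer->provider links (plain BFS).
    queue = deque([dest])
    while queue:
        asn = queue.popleft()
        for prov in providers.get(asn, ()):
            if prov not in best:
                best[prov] = ("customer", best[asn][1] + 1)
                queue.append(prov)

    # Stage 2: peer routes are one peer hop away from an AS that holds
    # an origin/customer route; shortest sources are tried first.
    for asn, (_, hops) in sorted(best.items(), key=lambda kv: kv[1][1]):
        for peer in peers.get(asn, ()):
            if peer not in best:
                best[peer] = ("peer", hops + 1)

    # Stage 3: provider routes descend provider->customer links from
    # every AS that already has a route, in order of path length.
    heap = [(hops, asn) for asn, (_, hops) in best.items()]
    heapq.heapify(heap)
    while heap:
        hops, asn = heapq.heappop(heap)
        for cust in customers.get(asn, ()):
            if cust not in best:
                best[cust] = ("provider", hops + 1)
                heapq.heappush(heap, (hops + 1, cust))
    return best


# Hypothetical topology: 1 -> 2 -> 3 is a customer-to-provider chain,
# 3 and 4 are peers, 5 is a customer of 4, 6 is a customer of 2.
providers = {1: {2}, 2: {3}, 5: {4}, 6: {2}}
customers = {2: {1, 6}, 3: {2}, 4: {5}}
peers = {3: {4}, 4: {3}}
routes = route_classes(1, providers, peers, customers)
```

Processing the three stages in this order encodes the LocalPref rule, since an AS that obtains a customer route never replaces it with a peer or provider route.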
Sanity check: How often do modeled paths match real paths?
- Basic model: 65% of the time measured paths match the model
- Augmenting model with known complexities: 85% of measured paths match
More details in our upcoming IMC paper: http://www3.cs.stonybrook.edu/~phillipa/papers/AnwarIMC15.pdf
Understanding the threat to Tor
Method: use VPNs to connect to 200 sites (100 popular, 100 likely censored) through Tor
- Examine AS-level paths between the source and destination and the chosen entry/exit relays
53% of sites have at least some content delivered over a vulnerable Tor circuit
- Variation between countries (e.g., the US, Russia and China have most popular content hosted locally, so more of their content is vulnerable)
- Things are even worse when we consider state-level adversaries

We use this idea to see how often Tor selects relays that are potentially subject to this timing attack. We used VPN endpoints in 10 countries to access a set of 200 sites through Tor. We then looked at the paths between the client and entry, and between the exit and destinations, to infer the potential for attack. This graph shows the fraction of web sites that had the circuit for their main page content vulnerable to this attack, and the fraction that had any of their circuits vulnerable. More than half of the sites have at least some of their content delivered over a vulnerable circuit. These numbers vary pretty widely between countries: for example, Russia, the US and China have more locally hosted content, so it is more likely that a path to the content crosses a national ISP.
Solution: Astoria
Choose an entry/exit relay to avoid attackers
- Usually there is such an option
- Otherwise, use a linear program to minimize damage: choose probabilistically to minimize the amount of data observed by an adversary over time
[Figure: source, two candidate entries (Entry 1, Entry 2), Tor, two candidate exits (Exit 1, Exit 2), destination.]

To mitigate these vulnerable circuits we propose Astoria. The basic idea is that if there is an entry/exit relay selection that avoids a potential attacker, you should choose it. Otherwise we use a linear program to choose relays probabilistically so as to minimize the number of circuits an adversary could observe over time. Additional considerations in designing this system were that the path computations have to be done on the client, so we can’t send the destination out to some lookup service. We also account for ASes owned by the same organization or under the control of the same government. And we work to be a good network citizen: when there are multiple safe relay selection options, we choose amongst them probabilistically using the same load-balancing function as vanilla Tor.
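The "pick a safe selection if one exists, and load balance among the safe ones" step might look like the sketch below. Everything here is an assumption made for illustration: the `Candidate` type, the relay names, and the use of a single numeric weight as a stand-in for Tor's load-balancing function. Returning `None` marks the case that would be handed to the linear program:

```python
import random
from typing import NamedTuple


class Candidate(NamedTuple):
    entry: str
    exit_relay: str
    weight: float           # stand-in for Tor's load-balancing weight
    entry_side: frozenset   # ASes on fwd/rev paths source <-> entry
    exit_side: frozenset    # ASes on fwd/rev paths exit <-> destination


def pick_safe_pair(candidates, rng=random):
    """Return (entry, exit) chosen among safe candidates, or None.

    A candidate is safe when no AS appears on both its entry side
    and its exit side. Safe candidates are sampled in proportion to
    their weight so that popular relays are not overloaded.
    """
    safe = [c for c in candidates if not (c.entry_side & c.exit_side)]
    if not safe:
        return None  # no safe option: fall back to the LP-based choice
    chosen = rng.choices(safe, weights=[c.weight for c in safe], k=1)[0]
    return chosen.entry, chosen.exit_relay


# Hypothetical candidates: only the second avoids a common AS.
candidates = [
    Candidate("entry1", "exit1", 5.0, frozenset({"AS1", "AS3"}), frozenset({"AS3"})),
    Candidate("entry2", "exit2", 1.0, frozenset({"AS1"}), frozenset({"AS4"})),
]
selection = pick_safe_pair(candidates, rng=random.Random(0))
no_safe = pick_safe_pair(candidates[:1], rng=random.Random(0))
```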
Additional considerations
Path computations need to be done on the client
- Client cannot ask about paths to destination!
- What are the paths? Which relay selection is safe?
[Figure: client choosing among entry relays and exit relays toward the destination.]
Additional considerations
Path computations need to be done on the client
- Client cannot ask about paths to destination!
Minimize performance impact
- Cannot pre-construct circuits as in vanilla Tor
ASes may collude (e.g., sibling ASes, state-level actors)
- We resolve sibling ASes (e.g., 701, 702, 703 = Verizon)
- …and evaluate country-level adversaries
Being a good network citizen: don’t overload popular relays
- If there are multiple safe options, load balance across them
Relay-based attackers
- Guard relays are meant to limit the risk of entry-relay attackers
- Astoria evaluated with and without guards
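Sibling resolution changes the safety check from "same AS number on both sides" to "same organization on both sides". A minimal sketch, assuming a hand-written AS-to-organization map seeded with the Verizon example from the slide; the function names are hypothetical and the other AS numbers are private-use placeholders:

```python
# AS-to-organization map; only the Verizon siblings come from the slide.
ORG_OF = {701: "Verizon", 702: "Verizon", 703: "Verizon"}


def to_orgs(as_path, org_of=ORG_OF):
    """Map each ASN to its organization; unknown ASNs stand for themselves."""
    return {org_of.get(asn, asn) for asn in as_path}


def colluding_observers(entry_side, exit_side, org_of=ORG_OF):
    """Organizations that see both the entry side and the exit side."""
    return to_orgs(entry_side, org_of) & to_orgs(exit_side, org_of)


# Hypothetical paths: no single ASN appears on both sides...
entry_side = [701, 64512]
exit_side = [702, 64513]
same_asn = set(entry_side) & set(exit_side)

# ...but AS 701 and AS 702 belong to the same organization.
same_org = colluding_observers(entry_side, exit_side)
```

A country-level adversary could be modeled the same way by mapping ASNs to countries instead of organizations.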
What if there is no safe option?
What if all relay selections contain at least one AS that can perform the timing attack?
- Astoria minimizes the amount any given attacker can learn
- Linear program
[Figure: source AS reaching Entry AS 1, Entry AS 2 and Entry AS 3 through ISP 1 and ISP 2. With uniform selection (1/3, 1/3, 1/3), ISP 1 can snoop with prob. 2/3. With the LP selection (1/4, 1/4, 1/2), ISP 1 can snoop with prob. 1/2.]
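The numbers on this slide can be checked with a few lines of arithmetic. The sketch below assumes, as the figure's probabilities suggest, that ISP 1 sits on the paths to Entry AS 1 and Entry AS 2 and that ISP 2 sits on the path to Entry AS 3. It only evaluates the two selection distributions and does not solve the linear program itself; the names are illustrative:

```python
from fractions import Fraction as F

# Assumed from the figure: which ISP observes traffic to which entry AS.
OBSERVERS = {
    "Entry AS 1": ["ISP 1"],
    "Entry AS 2": ["ISP 1"],
    "Entry AS 3": ["ISP 2"],
}


def exposure(selection, observers=OBSERVERS):
    """Probability that each ISP observes a circuit, given the
    probability with which each entry AS is selected."""
    totals = {}
    for entry, prob in selection.items():
        for isp in observers[entry]:
            totals[isp] = totals.get(isp, F(0)) + prob
    return totals


uniform = {"Entry AS 1": F(1, 3), "Entry AS 2": F(1, 3), "Entry AS 3": F(1, 3)}
optimal = {"Entry AS 1": F(1, 4), "Entry AS 2": F(1, 4), "Entry AS 3": F(1, 2)}

uniform_exposure = exposure(uniform)   # ISP 1: 2/3, ISP 2: 1/3
optimal_exposure = exposure(optimal)   # ISP 1: 1/2, ISP 2: 1/2

# The quantity being minimized: what the best-placed attacker learns.
worst_uniform = max(uniform_exposure.values())
worst_optimal = max(optimal_exposure.values())
```

Shifting probability toward Entry AS 3 lowers the worst-case exposure from 2/3 to 1/2 by equalizing what the two ISPs can observe.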
Astoria mitigates network-level attackers
Fraction of sites with content delivered over vulnerable circuits decreases from 53% to 13% with Astoria
[Figure panels: AS-level adversary; nation-state adversary]
Astoria load balancing
Astoria matches vanilla Tor load balancing properties

The reason we match Tor is that most of the time Astoria is load balancing across safe circuits, rather than using the LP, which doesn’t load balance.
Performance of Astoria Most overhead comes from on-demand circuit set up
Challenges of AS-aware Tor Clients
Resilience to network- and relay-based attackers
- Guard nodes: fewer = better defense against relay attacker…
- … but also reduces the number of safe path options
Relay selection can leak information to the middle relay
- Entry and exit relay are now a function of the source and destination!
AS-aware Tor clients need solid network data
- Nodes need high-fidelity data about network paths!
- Real-time alerts of routing anomalies to combat BGP hijacks
- Subject of my ongoing work with CAIDA
Conclusion
We quantify the potential for AS-level adversaries to perform timing attacks in light of asymmetric network paths
Astoria is able to exploit safe relay selections to reduce the likelihood of timing attacks by >4X
- … while maintaining load balancing similar to vanilla Tor
- … and minimizing risk when there is no safe path
Lots of future work incorporating network measurements with AS-aware Tor clients
- Novel path measurement systems
- Incorporating real-time feeds of BGP anomalies (e.g., hijacks)
Thank you! Questions? phillipa@cs.stonybrook.edu @phillipa_gill http://www.nrg.cs.stonybrook.edu Work presented is funded by: NSF Grants: CNS 1350720, CNS 1422566, CNS 1518845 and a Google Faculty Research Award
Backup slides
Solution: Astoria
Goals:
Deal with attackers on asymmetric paths
- Select entry/exit relays to avoid attackers when possible
Deal with colluding attackers
- E.g., sibling ASes owned by common organizations
Consider the worst case scenario
- When there is no safe relay selection, minimize damage
- We use an LP for this (can talk more offline)
Minimize performance impact
- Cannot pre-construct circuits as in vanilla Tor
Be a good network citizen
- Avoid overloading popular relays
- If there are multiple safe relay options, apply Tor’s load balancing
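Backup slide: Intuition LP
[Figure: Src AS reaching Entry AS1, Entry AS2 and Entry AS3 through AS1 and AS2, comparing uniform and optimal entry selection. Recoverable labels: Entry AS1 with uniform 1/3 and optimal 1/4; Entry AS3 with 1/2.]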