Security in P2P Networks
A study of the Gnutella protocol and its weaknesses
By: Imran Qureshi
Date: December 9, 2004
Gnutella Security - Overview
- What is Gnutella? The history
- The topology of Gnutella
  - no central server (decentralized, second generation)
  - direct peer connection
- Gnutella Protocol
  - Gnutella descriptors
    - 5 descriptors: ping, pong, query, queryhit, push
    - byte structure of the descriptors
  - descriptor header
    - byte structure
- Communication in Gnutella
  - Finding and connecting to other servents
  - Downloading resources (offline)
  - Firewalled servents
Overview
- Security Risks
  - Spamming
  - Denial of service attacks
  - Pong attack
  - IP harvesting
  - Spreading viruses through the push descriptor
  - Man in the Middle attacks
- Solutions
  - Validation
  - Gnutella Proxy Server
Gnutella History
History of Gnutella
Gnutella was developed at Nullsoft, a subsidiary of AOL, by Justin Frankel and Tom Pepper.
Justin Frankel, whom some call “the world’s most dangerous geek”, created Winamp at the age of 18 and, a few years later, Gnutella while working for AOL.
Gnutella was released on 14 March 2000.
In those days, Napster was facing lawsuits over illegally shared copyrighted material.
When people came to know about Gnutella, a large number of them downloaded it.
AOL forced Nullsoft to take down all links to Gnutella from its website, on the grounds that it promoted piracy.
Gnutella was available for only about one day, but by then a large group of people already had it.
The source code was never officially released, so people started reverse engineering the protocol, and now we have different programs using the Gnutella protocol:
Gnutella Clients Source: Peer-to-Peer Networks, by Prof. N.Vlajic
Gnutella Topology
Gnutella - Topology
Gnutella’s topology is known as a “decentralized topology”, meaning that communication between two peers (users, or nodes) on the network takes place directly.
Each node acts as both a client and a server: it lets other nodes download its resources and asks other nodes for access to their resources.
Famous P2P clients: Napster, Kazaa, Gnutella
The total number of peers found on the Gnutella network during a weekday is around 43,546, sharing approximately 1,843,549 files.
The communication does not go through a central server, unlike Napster.
Each node or peer on the network is called a “servent”. The word servent comes from: SERVer + cliENT = “SERVENT”
Gnutella – Topology (contd…)
Figure: Napster (central server) compared with Gnutella (no central server)
Gnutella Protocol
The Gnutella protocol is a set of rules by which users communicate over the network.
All communication is done through “descriptors”.
There are 5 basic descriptors: Ping, Pong, Query, QueryHit and Push.
Each descriptor is preceded by a “descriptor header”.
In the following slides, we describe the purpose of each descriptor and its byte structure.
Gnutella Protocol – Byte Structures
The Descriptors:
When a peer talks to another peer, the communication is done via descriptors. The byte structure of a typical message is as follows:
Descriptor Header | Descriptor Payload (variable, 0…max)
Note:
- All the following structures are in little-endian byte order (the least significant byte is stored first) unless stated otherwise.
- All IP addresses are in IPv4 format. For example, the following four bytes represent 208.17.50.4:
0xD0 | 0x11 | 0x32 | 0x04
byte1 | byte2 | byte3 | byte4
Gnutella Protocol – Byte Structures
Descriptor Header: Byte Structure
Descriptor ID | Payload Descriptor | TTL | Hops | Payload Length
- Descriptor ID – Unique identifier for the descriptor on the network (16-byte string)
- Payload Descriptor – This value depends on the descriptor being sent: ping - 0x00, pong - 0x01, query - 0x80, queryhit - 0x81, push - 0x40
- TTL (Time to Live, or Horizon) – The number of times that the descriptor will be forwarded. Each servent that receives a descriptor decrements the TTL and forwards the descriptor to the next peer. When the TTL reaches 0, the descriptor is no longer forwarded. TTL is the best way available to reduce the amount of network traffic and prevent poor performance.
Gnutella Protocol – Byte Structures
- Hops – Total number of times the descriptor has already been forwarded. The hop value is incremented by each peer that receives it. TTL(initial) = TTL(current) + Hops(current)
- Payload Length – The length of the descriptor payload that immediately follows this header. It is used to find the beginning of the next descriptor.
Right after the descriptor header comes the descriptor payload. This payload can be one of the following:
Ping
A ping descriptor is used by a servent to find other servents on the network. A servent that receives a ping descriptor responds with a pong.
Pings have a length of 0 and carry no payload, hence they have no byte structure. The descriptor header identifies a ping by a value of 0x00 in the payload descriptor field and a value of 0x00000000 in the payload length field.
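The descriptor header and the payload-free ping can be sketched in a few lines of Python. This is only an illustration: the field widths (16, 1, 1, 1 and 4 bytes, 23 in total) are taken from the Gnutella 0.4 specification rather than from these slides, and the helper names and the default TTL of 7 are assumptions made for the example.

```python
import os
import struct

# Assumed layout (Gnutella 0.4 specification, not stated in the slides):
# 16-byte descriptor ID, 1-byte payload descriptor, 1-byte TTL, 1-byte hops,
# 4-byte little-endian payload length. "<" disables padding, so 23 bytes.
HEADER = struct.Struct("<16sBBBI")

# Payload descriptor values listed on the header slide.
PING, PONG, PUSH, QUERY, QUERYHIT = 0x00, 0x01, 0x40, 0x80, 0x81


def pack_header(descriptor_id, payload_descriptor, ttl, hops, payload_length):
    """Serialize a descriptor header into its byte form."""
    return HEADER.pack(descriptor_id, payload_descriptor, ttl, hops, payload_length)


def unpack_header(data):
    """Parse the header found at the start of `data`."""
    descriptor_id, payload_descriptor, ttl, hops, payload_length = HEADER.unpack(
        data[:HEADER.size]
    )
    return {
        "descriptor_id": descriptor_id,
        "payload_descriptor": payload_descriptor,
        "ttl": ttl,
        "hops": hops,
        "payload_length": payload_length,
    }


def make_ping(ttl=7):
    """A ping is a bare header: payload descriptor 0x00, payload length 0."""
    return pack_header(os.urandom(16), PING, ttl, 0, 0)
```

Because the payload length is part of every header, a reader of the byte stream can always skip `HEADER.size + payload_length` bytes to reach the next descriptor.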
Gnutella Protocol – Byte Structures
Pong
Sent as a response to a ping.
Port | IP Address | Number of files shared | Number of KB shared
Defining values:
- Port: the port on which the responding host can accept incoming connections
- IP Address: IP address of the responding host (big-endian format)
- Number of files shared: total number of files the responding host is sharing on the network (usually found in the “shared folder”)
- Number of KB shared: total number of kilobytes the responding host (with the given IP and port) is sharing
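A pong payload could be packed and unpacked as below. The field widths (2-byte port, 4-byte IP address, two 4-byte counters) come from the Gnutella 0.4 specification, not from the slide, and the function names are hypothetical. Note how the port is little-endian while the IP address keeps its big-endian byte order.

```python
import socket
import struct


def pack_pong(port, ip, files_shared, kb_shared):
    """Build a 14-byte pong payload (widths assumed from the 0.4 spec)."""
    return (
        struct.pack("<H", port)           # port, little-endian
        + socket.inet_aton(ip)            # IPv4 address, big-endian
        + struct.pack("<II", files_shared, kb_shared)
    )


def unpack_pong(payload):
    """Return (port, ip, files_shared, kb_shared) from a pong payload."""
    (port,) = struct.unpack_from("<H", payload, 0)
    ip = socket.inet_ntoa(payload[2:6])
    files_shared, kb_shared = struct.unpack_from("<II", payload, 6)
    return port, ip, files_shared, kb_shared
```

With the example address from the byte-order note, `pack_pong(6346, "208.17.50.4", 12, 3400)` stores the address as the bytes 0xD0 0x11 0x32 0x04; the port number 6346 is only an illustrative value.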
Gnutella Protocol – Byte Structures
Query
After a servent has the IP address and port of other servents, it may search for particular files using the query descriptor.
Minimum Speed | Search Criteria
Defining values:
- Minimum Speed: the minimum speed (in kb/s) of the servents that should respond to this query. A query with a minimum speed requirement of m kb/s should only be answered with a queryhit by a servent whose speed is at least m kb/s.
- Search Criteria: a search string terminated by a null (0x00). Its maximum length is bounded by the payload length field of the descriptor header. E.g.: “nameofthesong.mp3”
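The query payload is small enough to sketch directly. The 2-byte width of the minimum speed field and the "at least" comparison follow the Gnutella 0.4 specification; the function names and the ASCII encoding of the search string are assumptions for this example.

```python
import struct


def pack_query(min_speed_kbps, search_criteria):
    """Minimum speed (2 bytes, little-endian) + null-terminated search string."""
    return (
        struct.pack("<H", min_speed_kbps)
        + search_criteria.encode("ascii")
        + b"\x00"
    )


def unpack_query(payload):
    """Return (min_speed_kbps, search_criteria) from a query payload."""
    (min_speed,) = struct.unpack_from("<H", payload, 0)
    criteria = payload[2:].split(b"\x00", 1)[0].decode("ascii")
    return min_speed, criteria


def should_respond(own_speed_kbps, min_speed_kbps):
    """A servent answers only if it meets the requested minimum speed."""
    return own_speed_kbps >= min_speed_kbps
```

The length returned by `len(pack_query(...))` is the value that would go into the payload length field of the descriptor header.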
Gnutella Protocol – Byte Structures
QueryHit
No. of Hits | Port | IP Address | Speed | Result Set | Servent Identifier
- No. of Hits: total number of hits (matches) for the query in the result set
- Port: the port on which the responding host can accept incoming connections
- IP Address: IP address of the responding host (big-endian format)
- Speed: speed of the responding host
- Result Set: a set of No. of Hits responses to the corresponding query, in other words one entry for each file in the shared folder of the responding host that met the search criteria. Each of the No. of Hits elements has the following structure:
File Index | File Size | File Name
  - File Index: location and ID of the file matching the query (assigned by the responding host)
  - File Size: size of the file in bytes
  - File Name: name of the file (double-null terminated, 0x0000)
- Servent Identifier: unique 16-byte string identifying the responding servent on the network
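Because the result set is variable in length, a queryhit has to be walked entry by entry. The sketch below assumes the field widths of the Gnutella 0.4 specification (1-byte hit count, 2-byte port, 4-byte IP address, 4-byte speed, 4-byte file index and file size); the function names and the ASCII file names are illustrative only.

```python
import socket
import struct


def pack_queryhit(port, ip, speed, results, servent_id):
    """results is a list of (file_index, file_size, file_name) tuples."""
    out = struct.pack("<BH", len(results), port)
    out += socket.inet_aton(ip)                      # big-endian IPv4
    out += struct.pack("<I", speed)
    for file_index, file_size, file_name in results:
        out += struct.pack("<II", file_index, file_size)
        out += file_name.encode("ascii") + b"\x00\x00"   # double-null terminator
    return out + servent_id                          # 16 bytes


def unpack_queryhit(payload):
    """Parse a queryhit payload into a dictionary."""
    hits, port = struct.unpack_from("<BH", payload, 0)
    ip = socket.inet_ntoa(payload[3:7])
    (speed,) = struct.unpack_from("<I", payload, 7)
    offset = 11
    results = []
    for _ in range(hits):
        file_index, file_size = struct.unpack_from("<II", payload, offset)
        offset += 8
        end = payload.index(b"\x00\x00", offset)     # end of the file name
        results.append((file_index, file_size, payload[offset:end].decode("ascii")))
        offset = end + 2
    servent_id = payload[offset:offset + 16]
    return {"port": port, "ip": ip, "speed": speed,
            "results": results, "servent_id": servent_id}
```

The file index returned here is the number a downloader later places in its HTTP request.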
Gnutella Protocol – Byte Structures
Push
The basic purpose of a push descriptor is to reach a servent that is behind a firewall. This topic is discussed in detail later on.
Servent Identifier | File Index | IP Address | Port
Defining values:
- Servent Identifier: unique 16-byte string identifying the targeted (firewalled) servent on the network, which is being requested to push the file with the index File Index
- File Index: index of the file to be pushed from the targeted servent’s shared folder
- IP Address: IP address of the host to which the file will be pushed (big-endian format)
- Port: the port on that host to which the file should be pushed
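The push payload has a fixed size, so it maps onto a simple pack and unpack pair. As before, the widths (16-byte servent identifier, 4-byte file index, 4-byte IP address, 2-byte port) are assumed from the Gnutella 0.4 specification and the function names are made up for this sketch.

```python
import socket
import struct


def pack_push(servent_id, file_index, ip, port):
    """Build a 26-byte push payload (widths assumed from the 0.4 spec)."""
    return (
        servent_id                          # 16-byte ID of the firewalled servent
        + struct.pack("<I", file_index)     # file to push, little-endian
        + socket.inet_aton(ip)              # where to push it, big-endian IPv4
        + struct.pack("<H", port)           # port on that host, little-endian
    )


def unpack_push(payload):
    """Return (servent_id, file_index, ip, port) from a push payload."""
    servent_id = payload[:16]
    (file_index,) = struct.unpack_from("<I", payload, 16)
    ip = socket.inet_ntoa(payload[20:24])
    (port,) = struct.unpack_from("<H", payload, 24)
    return servent_id, file_index, ip, port
```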
Communication in Gnutella
Finding servents
In order to connect to a Gnutella network and share files, a servent needs to run one of the many Gnutella clients (e.g. BearShare, Morpheus).
After the client is launched, this peer or node (let’s say A) lets its neighboring node (let’s say B) know of its existence. (A needs to know the DNS name or IP address of some neighbor at the start.)
A lets its neighbors know of its existence by sending out a ping descriptor.
B in turn forwards the ping to its neighbors, and the descriptor keeps travelling through the network, letting the nodes know of A’s existence.
In this way the information is broadcast, and it keeps going to different nodes on the network until the descriptor’s time-to-live (TTL) reaches 0.
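The way a ping spreads and then dies out can be illustrated with a small simulation. This is a sketch under assumptions: the network is a plain dictionary of neighbor lists, every servent drops a ping it has already seen, and the function name is hypothetical.

```python
from collections import deque


def flood_ping(neighbors, origin, ttl):
    """Return {servent: (ttl, hops)} as seen by each servent the ping reaches."""
    seen = {origin: (ttl, 0)}
    # The origin sends the ping to each direct neighbor with hops = 0.
    queue = deque((peer, ttl, 0) for peer in neighbors[origin])
    while queue:
        node, ttl_in, hops_in = queue.popleft()
        if node in seen:
            continue                      # duplicate ping, dropped
        ttl_now, hops_now = ttl_in - 1, hops_in + 1
        seen[node] = (ttl_now, hops_now)
        if ttl_now > 0:                   # forward only while TTL remains
            for peer in neighbors[node]:
                queue.append((peer, ttl_now, hops_now))
    del seen[origin]
    return seen
```

On a chain A, B, C, D, E with an initial TTL of 2, the ping reaches B and C only, and at every servent TTL plus hops still equals the initial TTL.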
Communication in Gnutella
Now A has become part of the network and everyone knows of its existence.
If a servent wants to acknowledge A, it sends a pong descriptor to A, telling A its IP address and which of its ports is accepting traffic.
In this way, A builds up a file of the IP addresses and ports of all the servents that responded with a pong descriptor.
Communication in Gnutella Servent A announcing existence to peers Source: Prof. Igor Ivkovic, Dept. of Compt. Science at Univ. of Waterloo “Improving Gnutella Protocol”
Communication in Gnutella
Connecting to Servents
Now that A has a file containing other servents’ addresses and ports, it will try to connect to one of those servents (let’s say B).
After a TCP session is established with B, A sends the following command in ASCII:
GNUTELLA CONNECT/<protocol version>\n\n
where <protocol version> is the current version of Gnutella (e.g. “0.4”).
If B wants to connect, it responds to the command by sending:
GNUTELLA OK\n\n
Now there is a valid direct connection between A and B.
If B responds with anything else, A knows that B is not willing to create a connection.
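The two handshake strings can be produced and checked without any networking. The function names below are hypothetical, and the sketch leaves out the TCP connection itself.

```python
OK_REPLY = b"GNUTELLA OK\n\n"


def connect_request(version="0.4"):
    """The ASCII line A sends right after the TCP session is established."""
    return "GNUTELLA CONNECT/{}\n\n".format(version).encode("ascii")


def is_accepted(reply):
    """B accepts only by answering with exactly GNUTELLA OK."""
    return reply == OK_REPLY
```

Any reply other than the exact OK string is treated as a refusal, matching the rule that A gives up on B in that case.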
Communication in Gnutella
Now that this connection has been established, communication between A and B carries on using descriptors and descriptor headers, as described before:
Ping, Pong, Query, QueryHit and Push
Communication in Gnutella
Downloading resources or files from other servents
Before a download can take place, we need to search for the file.
Searching for files
Let’s again take our two servents A and B.
Suppose that A wants to search for a file called “ushersong.mp3”, and let’s suppose that the minimum speed requirement is x. A will send out a query descriptor as follows:
Minimum Speed: x | Search Criteria: “ushersong.mp3”
If a servent has one or more files matching “ushersong.mp3” and has a speed >= x (kb/s), it may choose to send a queryhit descriptor as follows:
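Communication in Gnutella
QueryHit descriptor: No. of Hits | Port | IP Address | Speed: > x | Result Set | Servent Identifier
Result Set: File Index | File Size (bytes) | File Name: “ushersong.mp3”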
Communication in Gnutella
A receives the queryhit descriptor and asks to download the file.
Downloading
All searches are carried out on the Gnutella network, while downloads are done “offline”, that is, outside the network over a direct connection between the two servents.
Hence, two servents that wish to transfer a file communicate using HTTP commands.
So, in our example, A creates a TCP connection with B and sends the following command to download the file:
GET /get/<File Index>/<File Name>/ HTTP/1.0\r\n
Connection: Keep-Alive\r\n
Range: bytes=0-\r\n
User-Agent: Gnutella\r\n
\r\n
Source: Mattias Jansson, “Gnutella”, Feb 1, 2004
Communication in Gnutella
For our example, the HTTP command will read:
GET /get/2/ushersong.mp3/ HTTP/1.0\r\n
Connection: Keep-Alive\r\n
Range: bytes=0-\r\n
User-Agent: Gnutella\r\n
\r\n
A response to this could be:
HTTP 200 OK\r\n
Server: Gnutella\r\n
Content-type: application/binary\r\n
Content-length: <file size>\r\n
\r\n
… data …
Source: Mattias Jansson, “Gnutella”, Feb 1, 2004
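The download request is plain text, so it can be assembled and taken apart with string operations. The function names and the `start` parameter (the first byte of the requested range) are assumptions for this sketch; file names containing a slash are not handled.

```python
def download_request(file_index, file_name, start=0):
    """Build the HTTP GET that A sends to B for one shared file."""
    return (
        "GET /get/{}/{}/ HTTP/1.0\r\n"
        "Connection: Keep-Alive\r\n"
        "Range: bytes={}-\r\n"
        "User-Agent: Gnutella\r\n"
        "\r\n"
    ).format(file_index, file_name, start).encode("ascii")


def parse_request_line(request):
    """Return (file_index, file_name) from the first line of the request."""
    line = request.split(b"\r\n", 1)[0].decode("ascii")
    method, rest = line.split(" ", 1)
    path, version = rest.rsplit(" ", 1)
    if method != "GET" or not path.startswith("/get/") or not path.endswith("/"):
        raise ValueError("not a Gnutella download request")
    index, name = path[len("/get/"):-1].split("/", 1)
    return int(index), name
```

Calling `download_request(2, "ushersong.mp3")` reproduces the example request, and B can recover the file index 2 from it with `parse_request_line`.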
Communication in Gnutella
Firewalled servents:
If the targeted servent, from which a file needs to be downloaded, is behind a firewall, it is not possible to create a direct connection to it in order to download the file, because the firewall will not allow incoming connections to its Gnutella port.
Hence, the requesting servent sends a push descriptor.
Upon receiving the push request, the targeted servent tries to create a TCP/IP connection to the requesting host.
If this connection cannot be established, it probably means that both servents are behind a firewall, and the file cannot be transferred.
Once the connection is established, the targeted servent sends the following command:
GIV <File Index>:<Servent Identifier>/<File Name>\n\n
After receiving this command, the requesting servent sends the following HTTP GET request:
Communication in Gnutella
GET /get/<File Index>/<File Name>/ HTTP/1.0\r\n
Connection: Keep-Alive\r\n
Range: bytes=0-\r\n
User-Agent: Gnutella\r\n
\r\n
The rest of the download process is the same as described before.
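The GIV line that opens a pushed transfer can be sketched as follows. It assumes the form GIV <File Index>:<Servent Identifier>/<File Name> from the Gnutella 0.4 specification and, as a further assumption, writes the 16-byte servent identifier as hexadecimal text; the function names are hypothetical.

```python
def giv_line(file_index, servent_id, file_name):
    """Line sent by the firewalled servent once it has connected outwards."""
    return "GIV {}:{}/{}\n\n".format(
        file_index, servent_id.hex(), file_name
    ).encode("ascii")


def parse_giv(line):
    """Return (file_index, servent_id, file_name) from a GIV line."""
    text = line.decode("ascii")
    if not text.startswith("GIV ") or not text.endswith("\n\n"):
        raise ValueError("not a GIV line")
    index, rest = text[4:-2].split(":", 1)
    servent_hex, file_name = rest.split("/", 1)
    return int(index), bytes.fromhex(servent_hex), file_name
```

After parsing the line, the requesting servent knows which file the connection is for and can answer with the HTTP GET request over the same connection.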
Security Risks of Gnutella
Spamming and Denial of Service Attacks
In e-mail, spammed messages can easily be deleted and there is no further harm.
But if you accept a spammed query, the consequences can be very harsh, and you could actively play a part in a Denial of Service (DoS) attack.
DoS attacks in Gnutella are achieved very easily.
If a user (A) wants to download a file, it sends a query that reaches another peer (B).
Let’s say that B in our case is a malicious peer and is misbehaving on the network.
B receives the query from A, responds positively, and urges A to download the file from C (the host under attack).
Hence A starts downloading the file from C, without knowing that it is actually contacting C rather than B.
Security Risks of Gnutella
This way, the malicious B directs many peers to download files from C and hence creates a denial of service attack.
The important thing to understand is that anybody could be playing a role in a DoS attack without knowing it.
At some point, the load on C could become so high that it is unable to accept connections from more peers, and it may even crash.
It will also be very hard for anyone to identify who originated the attack, since the requests to C could be coming from many different IP addresses and many different domains.
Security Risks of Gnutella
Pong Attack
The concept behind a pong attack is the same as that of the DoS attack.
When the malicious B receives a ping from A, it may reply with a pong containing the IP address and port of C (the host under attack):
Port | IP Address | Number of files shared | Number of KB shared
A believes that it has established a connection with B and starts forwarding queries, even though they are going to C.
Security Risks of Gnutella
IP Harvesting
Hackers are always in search of people’s IP addresses. They continuously scan the Internet in order to find them.
Since most web servers have highly protective firewalls, it is hard for hackers to break through.
But in Gnutella, IP addresses are easily obtained: P2P networks work in a way that requires you to advertise your IP address.
A hacker could easily gather, or harvest, IP addresses and attack vulnerable users on the network.
This is less of a problem for people with dial-up connections, since their IP address keeps changing.
But people with static IP addresses (such as Montclair State University or other “.edu” domains) are in trouble.
Security Risks of Gnutella
Transferring viruses through the push descriptor
A typical push descriptor contains the IP address and port on which the requesting host accepts the pushed file.
When a user sends out a query, a peer might lie and say that it has the file even though it doesn’t.
The user then sends a push request to the responding peer, and the responding peer creates a TCP/IP connection with the user.
Now the responding host can easily transfer any file to the user, since it has already gained trust by lying.
These files could be “.exe” files that infect the user’s computer with a virus.
Security Risks of Gnutella
Man in the Middle Attacks:
I will describe this with an example. We have three people:
A – searching for a file
B – has the file
C – malicious user
A sends a query to the network, searching for a file.
B has the file and responds with a queryhit.
Suppose C receives this queryhit on its way back, changes the IP address and port in it to its own, and forwards it to A.
A, which gets the reply from C, creates a connection with C and not with B.
C, on the other hand, downloads the original file from B, infects it with malicious content, and then transfers it to A.
Solutions
1) Validation
2) Unique Network Identifier
3) Reduce network traffic
Thank You For Your Attention Questions or Suggestions about any concepts discussed ?