Design and Implementation of HTTP-Gnutella Gateway Baoning Wu (baw4) Wei Zhang (wez5) CSE Department Lehigh University.


Motivation
- Peer-to-peer (P2P) networking is a hot topic.
- Can P2P nodes search for and retrieve files from Web sites?
- Can one P2P network search for and retrieve files from other P2P networks?
- In our project, we built a special gateway between Gnutella and Web sites.

Related Work
- David McNab has launched a Freenet search engine.
- Asiayeah is a Gnutella search engine.
- Filedonkey.com is an eDonkey search engine.
- Kalepa Networks, Inc. is working on connecting different P2P systems.
- Our work is roughly the reverse of the work above: it lets Gnutella nodes search and retrieve content from the Web.

Mechanism of Gnutella Searching
- Node A sends a query to its neighbor B.
- Node B broadcasts the query to its neighbors C and D.
- Node C has the objects node A needs and returns a query hit message to node B.
- Node B forwards the query hit message back toward node A by consulting its local state.
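
The flooding and reverse-path routing described on this slide can be sketched as a small simulation. This is a minimal illustration, not the gateway's code: the topology, the function name `flood_query`, and the default TTL are assumptions, and real servents track message IDs per connection rather than a single global table.

```python
from collections import deque


def flood_query(graph, origin, holders, ttl=7):
    """Flood a query from `origin` and return the route of each query hit.

    graph:   dict mapping node -> list of neighbors (hypothetical topology)
    holders: set of nodes that own a matching object
    """
    # came_from records which neighbor first delivered the query to a node.
    # It plays the role of the "local state" used to route query hits back.
    came_from = {origin: None}
    queue = deque([(origin, ttl)])
    hit_paths = {}
    while queue:
        node, remaining = queue.popleft()
        if node in holders and node != origin:
            # Walk the recorded state backwards, hop by hop, to the origin.
            path, step = [], node
            while step is not None:
                path.append(step)
                step = came_from[step]
            hit_paths[node] = path
        if remaining == 0:
            continue  # TTL exhausted: do not relay any further
        for neighbor in graph[node]:
            if neighbor not in came_from:  # a query seen before is dropped
                came_from[neighbor] = node
                queue.append((neighbor, remaining - 1))
    return hit_paths


# The four-node example from the slide: A - B, with B connected to C and D.
graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
print(flood_query(graph, "A", {"C"}))
```

For the slide's topology the hit from C travels C, B, A, which is the path node B reconstructs from its local state.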

Architecture of HTTP-Gnutella Gateway

Mechanism of the gateway
1. Node A broadcasts a query message, directly or indirectly, to the HTTP-Gnutella gateway.
2. The HTTP-Gnutella gateway forwards the translated query message to the search engine.
3. The search engine returns a set of query results to the gateway.
4. The gateway translates the results into Gnutella format and forwards them to node A.
5. If node A initiates a download request to the gateway, the gateway translates the Gnutella request into a well-formatted HTTP request to the Web server.
6. The gateway fetches the data from the Web server.
7. The gateway forwards the data from the Web server to node A.

Handle Query Messages
- We still use the original Gnutella mechanism to decide whether to forward a message.
- The gateway captures all queries with a hop count below 5 and sends them to the search engine.
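
The two decisions on this slide are independent and can be sketched as one small function. The hop threshold of 5 comes from the slide; the names `QueryHeader` and `classify_query` are hypothetical, and the forwarding test is a simplified stand-in for the original Gnutella rule (duplicate detection is omitted).

```python
from dataclasses import dataclass

CAPTURE_HOP_LIMIT = 5  # from the slide: queries with hops < 5 are captured


@dataclass
class QueryHeader:
    ttl: int   # remaining time-to-live of the descriptor
    hops: int  # number of servents the descriptor has already passed


def classify_query(header):
    """Return (capture, forward) for an incoming query descriptor."""
    # Capture: hand the query to the search engine if it is still "near".
    capture = header.hops < CAPTURE_HOP_LIMIT
    # Forward: assumed simplification of the usual rule, relay only while
    # the TTL would still be positive after this hop.
    forward = header.ttl > 1
    return capture, forward


print(classify_query(QueryHeader(ttl=5, hops=2)))
print(classify_query(QueryHeader(ttl=1, hops=6)))
```

Keeping the capture test separate from the forwarding test means the gateway still behaves as an ordinary servent toward its neighbors.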

Search Engine API
- The Google search engine API has a limit of 1,000 requests per day.
- The search engine API consists of three main functions:
  - Query conversion
  - Extraction of URLs
  - Measurement of content size

Generate Query Hit Messages
- Two considerations:
  - Let Gnutella nodes contact Web servers directly
  - Let the gateway work as a proxy
- The gateway fills in its own IP address and a specific port number (currently 9999) in the query hit messages.
- File names are the URLs of Web objects.
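
A query hit built this way can be sketched as follows. This assumes the Gnutella 0.4 payload layout (hit count, port, IP address, speed, result set, 16-byte servent identifier, with little-endian integers and the IP address in network byte order); the function name, the example address, and the example URL are hypothetical, and the descriptor header that precedes the payload is omitted.

```python
import socket
import struct

GATEWAY_PORT = 9999  # port number named on the slide


def build_query_hit(gateway_ip, results, servent_id, speed_kbps=0):
    """Pack a query hit payload that advertises the gateway as the source.

    results: list of (url, size_in_bytes) pairs; the URL is the file name.
    """
    if len(results) > 255:
        raise ValueError("the hit count is a single byte")
    if len(servent_id) != 16:
        raise ValueError("the servent identifier must be 16 bytes")
    # Fixed part: hit count (1 byte), port (2 bytes), IPv4 address (4 bytes),
    # speed (4 bytes). The address is the gateway's own, so that download
    # requests come back to the gateway acting as a proxy.
    payload = struct.pack("<BH", len(results), GATEWAY_PORT)
    payload += socket.inet_aton(gateway_ip)
    payload += struct.pack("<I", speed_kbps)
    for index, (url, size) in enumerate(results):
        # Each result: file index, file size, then the double-NUL-terminated
        # file name, which here is the URL of the Web object.
        payload += struct.pack("<II", index, size)
        payload += url.encode("ascii") + b"\x00\x00"
    return payload + servent_id


url = "http://example.com/a.txt"  # hypothetical Web object
hit = build_query_hit("192.0.2.1", [(url, 1234)], bytes(16))
print(len(hit), hit[:7].hex())
```

Because the file size field must be filled in before any download happens, the gateway needs to learn each object's size in advance.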

Downloading Service
- Translate the Gnutella download request into a well-formatted HTTP request, e.g.

    GET /get/1234/ HTTP/1.1
    User-Agent: Gnutella
    Host: :6346
  =>
    GET HTTP/1.1
    User-Agent: Gnutella
    Host:

- It should handle Gnutella handshakes properly.
- It also records the bytes transferred.
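
The translation step can be sketched as below. Several details are assumptions, since the slide's example lost its URLs: the request target is taken to be `/get/<index>/<URL>/`, the outgoing request uses a path plus a `Host` header, and the function name and example addresses are hypothetical.

```python
from urllib.parse import urlsplit


def translate_request(raw):
    """Turn a Gnutella download request into an HTTP request for the Web server."""
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, rest = request_line.split(" ", 1)
    target, version = rest.rsplit(" ", 1)
    # Assumed target layout: /get/<file index>/<file name>/, where the file
    # name is the URL the gateway advertised in its query hit.
    prefix, _index, url = target.lstrip("/").split("/", 2)
    if method != "GET" or prefix != "get":
        raise ValueError("not a Gnutella download request")
    if url.endswith("/"):
        url = url[:-1]  # drop the single trailing slash of the Gnutella form
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.netloc:
        raise ValueError("the file name is not an http URL")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [f"GET {path} {version}"]
    # Keep the client's headers, but point Host at the Web server instead
    # of the gateway.
    lines += [h for h in header_lines if not h.lower().startswith("host:")]
    lines.append(f"Host: {parts.netloc}")
    return "\r\n".join(lines) + "\r\n\r\n"


request = (
    "GET /get/1234/http://example.com/dir/file.txt/ HTTP/1.1\r\n"
    "User-Agent: Gnutella\r\n"
    "Host: 192.0.2.1:9999\r\n\r\n"
)
print(translate_request(request))
```

Stripping one trailing slash is ambiguous for URLs that themselves end in a slash; a real gateway could avoid this by looking the URL up by file index instead.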

Problems & Solutions
- Irregular handshakes
  - We handle all possibilities.
- File size
  - We use an HTTP HEAD request to get the file size.
- Broken pipe signal
  - We use a forked process.
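
The file-size lookup can be sketched with Python's standard `http.client` module. This is only an illustration of the HEAD technique, not the gateway's implementation; the function name `content_length` and the choice to return `None` when the size is unknown are assumptions.

```python
import http.client
from urllib.parse import urlsplit


def content_length(url, timeout=10.0):
    """Return the size a Web server reports for `url`, or None if unknown."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("only plain http URLs are handled in this sketch")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(
        parts.hostname, parts.port or 80, timeout=timeout
    )
    try:
        # HEAD asks for the headers of the object without transferring it.
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()  # a HEAD response carries no body
        value = response.getheader("Content-Length")
        if response.status != 200 or value is None:
            return None  # e.g. a redirect, an error, or dynamic content
        return int(value)
    finally:
        conn.close()
```

Calling `content_length("http://example.com/")` would return whatever size that server reports. Servers that omit `Content-Length` leave the size unknown, so a caller needs a fallback for that case.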

Experiment Results
- Outline
  - Basic verification and validation
  - Log file format
  - Results #1 to #4

Basic Verification & Validation
- Run our special gateway on machine 1 and a normal gtk-gnutella client on machine 2. After machine 2 connects to machine 1, we use machine 2 to send query messages and download requests to machine 1.
- For each file downloaded through machine 1, we use wget to fetch the same file directly from the Web server and use diff to check that the two copies are identical.

Log File Format
- Log 1
  - Time stamp, MUID, IP address, Type, Query
- Log 2
  - Time stamp, IP address, URL, Size, Code, Success

Result #1
- No. of Query messages: 319,245
- No. of Query Hit messages: 930,860
- No. of served requests: 113,391
- Average Response Time: seconds

Result #2

Result #3
- No. of download requests: 952
- No. of different IP addresses: 67
- No. of served requests: 945
- No. of successfully served requests: 740
- Total size transferred: 244,227,881 bytes
- Average response time: 3.15 seconds
- Average total download time: seconds
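
A few ratios follow directly from these counts. The short calculation below is only a reading aid; the variable names are hypothetical, and the mean size assumes that all transferred bytes belong to successfully served requests, which the slide does not state.

```python
# Counts copied from the slide.
download_requests = 952
served_requests = 945
successful_requests = 740
total_bytes = 244_227_881

# Share of download requests that the gateway attempted to serve.
served_ratio = served_requests / download_requests
# Share of served requests that completed successfully.
success_ratio = successful_requests / served_requests
# Assumption: every transferred byte belongs to a successful request.
mean_bytes = total_bytes / successful_requests

print(f"served:   {served_ratio:.1%}")
print(f"success:  {success_ratio:.1%}")
print(f"mean size per successful request: {mean_bytes:,.0f} bytes")
```

Under that assumption, roughly 78% of the served requests succeeded and a successful download averaged about 330 KB.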

Result #4

Future Work
- Support a variety of file types and measure their popularity
- Build a gateway to connect different P2P systems
- Deploy such gateways

Conclusion
- An HTTP-Gnutella gateway was built and served Gnutella users.
- In only 5 days, the gateway transferred about 244 MB of data from Web sites to Gnutella nodes.
- The system achieved all of our design goals.

Questions?