Algorithms for Selecting Mirror Sites for Parallel Download


Algorithms for Selecting Mirror Sites for Parallel Download Sonali Patankar CS522 Semester Project Dec 05, 2001

Download Mechanism

- What happens when we download?
- Popular software frequently needs to be downloaded
- The mirror site concept

With the advent of the internet, information sharing has become easy. Whenever we need a document that exists on one of the computers on the internet, we download it for our own use. What really happens when we download? In the simplest terms, we set up a TCP connection with the computer that holds the required file. Depending on the characteristics of the path selected for this download, the transfer may take more or less time, for a variety of reasons such as packet loss, bottleneck bandwidth, and so on.

For freeware and shareware, the cheapest way to distribute software is to offer it online for download. If the software is popular, there will be many download requests. No matter how capable the server hosting the software is, the actual download performance depends on many factors outside our network, and probably outside the network of the user who is downloading. More people accessing the same site makes this scenario even worse. How do we solve this problem?

Mirror sites contain exact replicas of the file we want to offer for download. With mirrors in place, users can select the site geographically closest to them, distributing traffic across the network rather than loading a single site. This approach improves overall download performance, but it is still limited by the available bandwidth on the chosen path. How can we improve on this?

Parallel Access of Mirror Sites

- Access mirror sites in parallel
- Is it that simple?
- What techniques can we use? History-based TCP parallel access; dynamic TCP parallel access

Now that we have multiple mirror sites, we can divide the work among them and get better results. For example, if we have a 10 MB file to download and 10 mirror sites, we can fetch a 1 MB piece from each mirror and combine the pieces once we have received all of them. By dividing the work, we get a better download time than downloading from a single site.

Though this approach looks very simple, there are many considerations to account for. If most of the mirror sites we use are geographically distant, it will take considerable time for those sites to send their pieces. Meanwhile, a mirror site near us may have already sent its piece and be sitting idle from our point of view. To benefit from parallel access, we need to balance the parameters that determine how much time the procedure takes. Research in this area has produced two techniques that make parallel access a genuine performance improvement: history-based TCP parallel access and dynamic TCP parallel access.
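The basic split described above (a 10 MB file across 10 mirrors) can be sketched as follows. This is a minimal illustration, not code from the talk; the function name and signature are my own.

```python
def split_ranges(file_size: int, n_mirrors: int) -> list:
    """Divide the byte range [0, file_size) into n_mirrors contiguous
    pieces, one per mirror, as evenly as possible."""
    base, extra = divmod(file_size, n_mirrors)
    ranges = []
    start = 0
    for i in range(n_mirrors):
        # The first `extra` mirrors absorb the remainder bytes.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

Each `(start, end)` pair would become one request (for example, an HTTP byte-range request) to one mirror, and the client concatenates the pieces in order once all have arrived.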

History-Based TCP Parallel Access

- History data for the servers is used
- The client decides how much data each server should send
- Limitations of this approach

In this technique, we use historical data for each server, mainly its bandwidth and response time. This data is typically obtained by querying a database that maintains such statistics for the servers. With this knowledge, the client has a somewhat better understanding of each server's capabilities, so it will usually not request equal-sized pieces from every server. Paths and mirror sites with lower bandwidth are asked to carry less data than those capable of delivering more data in less time. Based on these decisions, the client requests separate parts from the different servers, collects the data, requests more from a server if required, and completes the download.

For this technique to work at its best, every server should deliver useful data to the client until the download is complete. In practice, one server may finish sending its piece while the download is still waiting on a piece from another server. As with any technique, there are pros and cons. The pro is improved download performance. As for the cons, the weakest link on the path to a server largely determines the outcome, so dial-up modem users may not benefit: their own access link becomes the bottleneck (weakest link). Another important factor is the history data itself. Network and server conditions are often different on different occasions: at one time a server or network may be lightly loaded, at another heavily loaded. If we happen to use this algorithm at the busiest time of the server or network, we may not get the best possible results. Now we look at the next technique.
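The bandwidth-proportional allocation described above can be sketched like this. It is a simplified model under my own assumptions (bandwidth alone as the weight, ignoring response time); the talk does not give a formula.

```python
def allocate_by_history(file_size: int, bandwidths: list) -> list:
    """Split file_size bytes across servers in proportion to each
    server's historical bandwidth estimate (e.g. bytes/second)."""
    total_bw = sum(bandwidths)
    shares = [file_size * bw // total_bw for bw in bandwidths]
    # Integer division leaves a few remainder bytes; hand them to
    # the fastest server so the shares sum to exactly file_size.
    fastest = max(range(len(bandwidths)), key=lambda i: bandwidths[i])
    shares[fastest] += file_size - sum(shares)
    return shares
```

A server with three times the historical bandwidth of another is simply asked for three times as many bytes, which is the core idea of the history-based scheme.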

Dynamic TCP Parallel Access

- The client partitions the document into small blocks and makes the first requests
- On receipt of a block, it negotiates with the server for the next one

This technique does not base its decisions on any history data. The client first divides the document into small chunks. Since it knows nothing about the characteristics of the different servers, it initially requests one block from each server. When the client receives a block from a server, it requests the next block that has not yet been requested. This process of deciding which block to request and actually requesting it from the server is called negotiation. During each negotiation, the client spends some time in which no useful data is arriving from the server being negotiated with, so the block size the client chooses determines how many negotiations are necessary.

The immediate benefit of this technique is that, under the current network and server conditions, it keeps all the servers busy working on the download, providing a performance improvement. However, a slow server/network combination may still take a long time to deliver even the small amount of data it was asked for, which can ultimately leave the client waiting for that one missing block.
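The negotiation loop above can be sketched with a shared work queue: each server's worker claims the next unrequested block, so faster servers naturally end up fetching more blocks. This is a simplified single-process model with a caller-supplied `fetch_block` function; names and structure are my own, not from the talk.

```python
import queue
import threading

def dynamic_parallel_download(num_blocks, fetch_block, servers):
    """Dynamic parallel access: each server repeatedly 'negotiates'
    for the next block that no one has requested yet."""
    todo = queue.Queue()
    for b in range(num_blocks):
        todo.put(b)
    results = {}
    lock = threading.Lock()

    def worker(server):
        while True:
            try:
                block = todo.get_nowait()  # negotiation: claim next block
            except queue.Empty:
                return                     # no unrequested blocks remain
            data = fetch_block(server, block)
            with lock:
                results[block] = data

    threads = [threading.Thread(target=worker, args=(s,)) for s in servers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Reassemble the document in block order.
    return [results[b] for b in range(num_blocks)]
```

Note that the queue itself serializes the "which block next" decision, which is exactly the role negotiation plays in the scheme; the cost of each negotiation round trip is abstracted away here.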

Selecting a Subset of Mirror Sites

- Both techniques are good, but they have limitations
- How can we select only the best mirror sites and ignore the rest?

As we saw before, both techniques work well but have limitations, and both use every mirror site in the provided set. What if we do not want to use all the mirror sites? How do we then choose which ones to use? Combining the two techniques above, we can create a hybrid technique.

Hybrid Algorithm (Best 5 Mirror Sites)

- Choose the 5 paths with the highest bottleneck bandwidths and the lowest round-trip times (request a sample piece of the file)
- From the chosen paths, request a relatively small piece of data on each
- On each response, use the actual retrieval time to judge how efficient that path really is, and decide how much data to request next from that server

Given a set of 10 mirror sites, we want to find the best five to use for the parallel download. As we learned earlier, the weakest link on a path is its most important part. Using a network diagnostic utility such as pathchar, we can estimate the bottleneck bandwidth and round-trip time (RTT) of each path. We request a small piece of the file from every site and measure the time the client takes to receive it. Based on the bandwidth (if available from an earlier pathchar run) and the measured retrieval time, we choose the 5 paths with the highest bottleneck bandwidths and lowest round-trip times.

We then request a relatively small piece of the file from each of the chosen 5 servers. As the servers respond, we can measure how long each one really takes to deliver its data. For a server that responds early, we double the size of the next request. If a server responded worse than on its last request, we do not increase the request size. For servers that respond late, we keep requesting the same amount of data and watch for an improvement in response time; if there is one, we increase the requested size. Increasing the request size compensates for the time spent in negotiations. In this way, the paths that have given better results carry more of the download, making the best use of the current condition of the network.
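The two decisions in the hybrid scheme, picking the best 5 paths and adapting the request size, can be sketched as below. The bandwidth/RTT score and the doubling rule are my own reading of the slide, not a formula the talk specifies.

```python
def pick_best_mirrors(probes, k=5):
    """Rank mirrors by a simple score, bottleneck bandwidth divided by
    RTT (higher is better), and keep the best k. `probes` maps mirror
    name -> (bottleneck_bw_bytes_per_s, rtt_seconds)."""
    ranked = sorted(probes, key=lambda m: probes[m][0] / probes[m][1],
                    reverse=True)
    return ranked[:k]

def next_request_size(prev_size, prev_time, best_time):
    """Adaptive sizing: double the request for a server that responded
    as fast as the best one this round, otherwise keep it flat and
    keep watching for improvement."""
    return prev_size * 2 if prev_time <= best_time else prev_size
```

Doubling rewards the paths that are currently performing well, which amortizes the per-negotiation overhead across larger and larger transfers on those paths.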

Reference: Pablo Rodriguez, Andreas Kirpal, and Ernst W. Biersack, "Parallel-Access for Mirror Sites in the Internet."