High Performance Cooperative Data Distribution
J. Rick Ramstetter, Stephen Jenks
A scalable, parallel file distribution model conceptually based on the BitTorrent protocol


jramstet@uci.edu · www.research.calit2.net/students/surf-it2006 · www.calit2.net
Summer Undergraduate Research Fellowship in Information Technology (SURF-IT) 2006

Abstract: Current large-file transfer approaches used in high-performance cluster computing are point-to-point, normally based on GridFTP or Secure Copy (scp). If the file is to be disseminated to local disks on cluster nodes, either each transfer happens separately or the connection to the front-end becomes a bottleneck. In either case, the overall distribution time is proportional to the number of destinations times a single transfer time. With files in the tens of gigabytes becoming common, this time becomes quite long. Peer-to-peer file sharing offers a solution in the form of BitTorrent, which allows receivers to send data as well, multiplying the available bandwidth and decreasing download time for all participants. However, BitTorrent was designed for unreliable, shared, slow home Internet connections and does not perform as well in high-performance environments with gigabit or faster networks. Using concepts from the BitTorrent protocol, a high-performance cooperative file distribution method has been developed in C++, allowing data transfer times to multiple destinations to come significantly closer to the data transfer time to a single destination. With continued work on the project, the transfer time to multiple destinations will continue to drop.

BitTorrent protocol overview: BitTorrent is a file-sharing protocol commonly used to transfer large files (such as images of CD-ROMs or DVDs) across the Internet, a relatively slow, unreliable link. BitTorrent is unique in that it takes advantage of the upload speed of all end users downloading a file (called "peers").
This is accomplished by breaking a file into "chunks" and distributing those chunks to peers. Once a peer receives a full chunk of the file, it can begin sharing that chunk with other peers. To initiate a BitTorrent transfer, a (soon-to-be) peer contacts a tracker, a computer responsible for managing the peers. The tracker responds with a list of other peers interested in the same file. The contacting peer then asks its newfound neighboring peers which of the chunks it needs they hold. If a neighboring peer has such a chunk, our peer is said to be "interested" in that peer, and a file transfer begins between the two.

Normal file transfer: The file is distributed to nodes one at a time.

Results: The following table shows transfer-speed results obtained with the cooperative file transfer method when syncing a 575 MB file between 4 peers, only one of which holds the file at the start. These results are shown alongside a normal transfer from one node to the other three, and the theoretical maximum speed obtainable by this project. All are bounded by factors including switch throughput and disk I/O.

Method used — Time for 4-node synchronization to complete
Cooperative distribution model (actual results): 38 seconds
Cooperative distribution model, theoretical maximum (same as a 2-node sync): 22 seconds
Normal transfer (copy to one node at a time): 65 seconds

Conclusion: Speeds faster than a normal transfer from the starting node to the other nodes are achievable with the cooperative data distribution method. As of now, the speeds achieved do not meet the project's original goals, most likely due to high overhead and an unoptimized implementation, both of which should improve with continued work on the project.
Parallel filesystem: Data is striped across multiple hosts. This can result in network bottlenecks, as no one host holds the entire file.

Cooperative distribution: All hosts use their upload bandwidth to send file "chunks" to other hosts.

