T3 data access via BitTorrent
Charles G Waldman, USATLAS/University of Chicago
USATLAS T2/T3 Workshop, Aug 19 2009

Introduction
- BitTorrent (BT) is an interesting protocol used to move large amounts of data across the Internet.
- Can we use the p2p features of BT to:
  – Improve performance of downloads to T3?
  – Improve reliability by accessing multiple T2s for a single file?
  – Reduce effects of transient SRM and network loads at the T2s?
  – Easily share data among T3s?

BitTorrent protocol
- Metadata file ('.torrent') contains list of files, sizes and block checksums, and a "tracker" address.
- Client contacts tracker and is connected with one or more peers. Tracker does not move data.
- As soon as any blocks are downloaded and checked, client shares them with other peers. Distinction between client and server is blurred.
- A client which has all blocks is a 'seed'.
- Works well for large files (ISOs, movies, etc).
- Failed downloads can be resumed.
- Spec:
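The block checksums in the metadata file can be illustrated with a minimal sketch. It assumes a single in-memory file and a 256 KiB piece size; both are illustrative choices, not values from the talk. The dictionary keys follow the standard metainfo layout, but a real .torrent file is bencoded, which this sketch does not do. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def piece_hashes(data: bytes, piece_length: int = 256 * 1024) -> List[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, hashes: List[bytes]) -> bool:
    """Check a downloaded piece against the checksum listed in the metadata."""
    return hashlib.sha1(piece).digest() == hashes[index]


def build_metainfo(name: str, data: bytes, tracker_url: str,
                   piece_length: int = 256 * 1024) -> Dict:
    """Build the (un-encoded) metadata dictionary for a single file."""
    return {
        "announce": tracker_url,  # tracker address
        "info": {
            "name": name,
            "length": len(data),
            "piece length": piece_length,
            # All 20-byte SHA-1 digests concatenated into one byte string.
            "pieces": b"".join(piece_hashes(data, piece_length)),
        },
    }
```

A client only shares a piece after `verify_piece` succeeds, which is why every transferred block is integrity-checked.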

Client software
- Very little to install/configure on client side
- .torrent file fetched using Web browser, curl, etc
- Many clients available: GUI (nice monitoring) and command-line (scriptable), with interfaces for Python, Java, C++
- ctorrent: CLI client with nice features: select individual files for download, remote control, daemon mode

Possible advantages
- Less to install/configure; site does not need to be a DQ2 endpoint
- Reduced load at T2s, since load is spread across multiple sites and T3s can share data
- Resilient: servers or sites can go down and transfers will continue
- All data integrity-checked (SHA-1)
- Possibly faster, depending on number of seeds, CPU, network and disk speed

Disadvantages
- Uses more CPU
- Does not take advantage of multi-core hosts
- Ports may be blocked

Differences from common BitTorrent scenario
- Fewer peers / more files: cannot assume that data will be seeded
- Faster networking
- Multi-core/clustered hosts

Implementation
- Modified tracker: instead of returning an empty list of peers, it requests seeds.
- Daemon process (bt-server) runs at T2 sites, listens for messages and starts seeds.
- Dynamic creation of metadata (takes about 4 minutes for a 25GB dataset, so it is good to create these in advance).
- Use LD_PRELOAD to read directly from storage (/pnfs).
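The modified-tracker behaviour can be sketched roughly as follows. This is not the actual tracker code: the class name, the `request_seeds` callback (standing in for whatever message is sent to bt-server) and the once-per-torrent policy are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Peer = Tuple[str, int]  # (host, port)


class SeedRequestingTracker:
    """In-memory sketch of a tracker that asks for seeds when it knows no peers."""

    def __init__(self, request_seeds: Callable[[str], None]) -> None:
        self._peers: Dict[str, Set[Peer]] = {}
        self._requested: Set[str] = set()
        # Hypothetical hook: in the talk, bt-server daemons at the T2 sites
        # listen for messages and start seeds.
        self._request_seeds = request_seeds

    def announce(self, info_hash: str, peer: Peer) -> List[Peer]:
        """Register a peer and return the other peers known for this torrent."""
        known = self._peers.setdefault(info_hash, set())
        others = sorted(known - {peer})
        if not others and info_hash not in self._requested:
            # A stock tracker would just return the empty list here.
            self._requested.add(info_hash)
            self._request_seeds(info_hash)
        known.add(peer)
        return others
```

Once a seed has started it announces itself like any other peer, so the client's next announce returns a non-empty list.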

Implementation, cont'd
- Server creates directory populated with symlinks, so that:
  – Dataset looks like single directory
  – Filename = LFN (no suffixes)
  – Optimization: symlink to file on local disk pool when possible
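A minimal sketch of the symlink directory, assuming the server already has a mapping from LFN to /pnfs path and, optionally, from LFN to a copy on a local disk pool. The function name and both mappings are hypothetical; how bt-server actually locates pool copies is not described here.

```python
import os
from typing import Dict, Optional


def build_dataset_dir(dataset_dir: str,
                      lfn_to_pnfs: Dict[str, str],
                      lfn_to_pool: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Create one symlink per file, named by LFN, and return {LFN: target}."""
    lfn_to_pool = lfn_to_pool or {}
    os.makedirs(dataset_dir, exist_ok=True)
    links = {}
    for lfn, pnfs_path in lfn_to_pnfs.items():
        pool_path = lfn_to_pool.get(lfn)
        # Prefer the local disk pool copy when it exists; otherwise use /pnfs.
        if pool_path and os.path.exists(pool_path):
            target = pool_path
        else:
            target = pnfs_path
        link = os.path.join(dataset_dir, lfn)
        if os.path.lexists(link):
            os.remove(link)  # replace a stale link from an earlier run
        os.symlink(target, link)
        links[lfn] = target
    return links
```

The seed can then be pointed at `dataset_dir`, where the dataset appears as a single flat directory.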

Example
DPD_IDCOMM _00001.pool.root.1 -> /pnfs/uchicago.edu/atlasdatadisk/data08_cos/DPD_IDCOMM/data08_cos physics_IDCosmic.merge.DPD_IDCOMM.o4_r653_p26_tid051203/DPD_IDCOMM _00001.pool.root.1
DPD_IDCOMM _00002.pool.root.1 -> /pnfs/uchicago.edu/atlasdatadisk/data08_cos/DPD_IDCOMM/data08_cos physics_IDCosmic.merge.DPD_IDCOMM.o4_r653_p26_tid051203/DPD_IDCOMM _00002.pool.root.1
DPD_IDCOMM _00003.pool.root.1 -> /pnfs/uchicago.edu/atlasdatadisk/data08_cos/DPD_IDCOMM/data08_cos physics_IDCosmic.merge.DPD_IDCOMM.o4_r653_p26_tid051203/DPD_IDCOMM _00003.pool.root.1
DPD_IDCOMM _00004.pool.root.1 -> /pnfs/uchicago.edu/atlasdatadisk/data08_cos/DPD_IDCOMM/data08_cos physics_IDCosmic.merge.DPD_IDCOMM.o4_r653_p26_tid051203/DPD_IDCOMM _00004.pool.root.1
DPD_IDCOMM _00005.pool.root.1 -> /dcache/pool5/data/ F53C0
DPD_IDCOMM _00006.pool.root.1 -> /dcache/pool3/data/ D5970
DPD_IDCOMM _00007.pool.root.1 -> /dcache/pool6/data/ D5A18
DPD_IDCOMM _00008.pool.root.1 -> /pnfs/uchicago.edu/atlasdatadisk/data08_cos/DPD_IDCOMM/data08_cos physics_IDCosmic.merge.DPD_IDCOMM.o4_r653_p26_tid051203/DPD_IDCOMM _00008.pool.root.1
DPD_IDCOMM _00009.pool.root.1 -> /dcache/pool1/data/ F5308
DPD_IDCOMM _00010.pool.root.1 -> /dcache/pool5/data/ F5680
DPD_IDCOMM _00011.pool.root.1 -> /pnfs/uchicago.edu/atlasdatadisk/data08_cos/DPD_IDCOMM/data08_cos physics_IDCosmic.merge.DPD_IDCOMM.o4_r653_p26_tid051203/DPD_IDCOMM _00011.pool.root.1

Tests
- Test dataset: 25 GB, 29 files
- 3 seeds running on uct2-s[1-3] (Dell 2950 storage servers, 8 cores, 32GB, 10Gb NIC)
- Fetch to uct3:
  – dq2 takes 22 minutes (19 MB/sec)
  – BitTorrent takes 29 minutes (15 MB/sec)
- dq2 is faster, but:
  – It's using more cores (2 lcg-cp processes active)
  – It's not validating data (other than file size)

More tests at uct3
- How to get multiple transfers running on the same host?
- Fetch to 4 WNs, each receiving ¼ of the files
  – Finished in 3, 5, 5 and 6 minutes
  – Overall rate: 70 MB/sec (using longest time)
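The slides do not say how the files were divided among the four worker nodes. One possible approach, sketched here as an assumption, is a greedy split that hands each file (largest first) to the host with the least data assigned so far. The function name and the size mapping are hypothetical.

```python
import heapq
from typing import Dict, List


def split_files(sizes: Dict[str, int], n_hosts: int) -> List[List[str]]:
    """Assign files to n_hosts groups so that total bytes per group are roughly equal."""
    heap = [(0, i) for i in range(n_hosts)]  # (bytes assigned so far, host index)
    groups: List[List[str]] = [[] for _ in range(n_hosts)]
    # Largest files first; ties broken by name so the result is deterministic.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)  # least-loaded host
        groups[i].append(name)
        heapq.heappush(heap, (load + size, i))
    return groups
```

Each group could then be passed to a client that supports selecting individual files for download, such as ctorrent.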

Tests at IllinoisHEP (UIUC)
- Fetch to single host:
  – dq2-get: 22 minutes (19 MB/sec)
  – BitTorrent: 26 minutes (16 MB/sec)
  – Disk write speed: ~33 MB/sec
- 4-host test
  – Finished in 7, 9, 15 and 18 minutes (24 MB/sec)
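The quoted rates can be cross-checked from the dataset size and elapsed times. The sketch below assumes the dataset is exactly 25 GB and that 1 GB = 1024 MB. Under those assumptions the single-host and UIUC figures match after rounding, and the uct3 4-host rate comes out near 71 MB/sec, close to the quoted 70.

```python
def rate_mb_per_s(dataset_gb: float, minutes: float) -> float:
    """Average transfer rate in MB/sec, taking 1 GB = 1024 MB (an assumption)."""
    return dataset_gb * 1024 / (minutes * 60)


# Elapsed minutes taken from the test slides; multi-host runs use the longest time.
runs = {
    "uct3 dq2": 22,
    "uct3 BitTorrent": 29,
    "uct3 4 WNs": 6,
    "UIUC dq2-get": 22,
    "UIUC BitTorrent": 26,
    "UIUC 4 hosts": 18,
}

for label, minutes in runs.items():
    print(f"{label}: {rate_mb_per_s(25, minutes):.1f} MB/sec")
```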

Next steps
- Deployment and more testing of bt-server
- Repository for .torrent files, browse/search for datasets
- Investigate slowdowns, bottlenecks, tuning
- Multi-threaded transfers on single host
- Splitting of transfers across cluster
- Single command 'bt-get' that gets .torrent file and starts client(s)