WP2/WP7 Demonstration: WP7 High Throughput Data Transfers, WP2/WP7 Replica Selection Based on Network Cost Functions, WP2 Replica Location Service


1 WP2/WP7 Demonstration: WP7 High Throughput Data Transfers, WP2/WP7 Replica Selection Based on Network Cost Functions, WP2 Replica Location Service

2 High Throughput Data Transfers
Richard Hughes-Jones, Jules Wolfrat

3 Demo Setup
We will show data transfers from the Mass Storage system at CERN to the Mass Storage system at NIKHEF/SARA:
2 systems at CERN, Geneva, holding datasets from the LHCb experiment
4 Linux systems at NIKHEF/SARA, Amsterdam, to which the data is transferred, each with a disk sub-system I/O bandwidth of ~70 MB/s
All systems have Gigabit Ethernet connectivity
We use GridFTP and measure disk-to-disk performance
[Network path diagram: CERN – GÉANT – SURFnet – NIKHEF/SARA]
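As a rough idea of how such a disk-to-disk measurement can be driven, here is a minimal sketch, not the demo's actual harness: it times one GridFTP transfer with globus-url-copy and reports the throughput. The host names, paths and the choice of four parallel streams are assumptions for illustration.

```python
# Minimal sketch (not the demo's actual harness): time one GridFTP transfer and
# report disk-to-disk throughput. Host names and paths are hypothetical placeholders.
import subprocess
import time

SRC = "gsiftp://mss.cern.ch/lhcb/dataset-001.dat"      # hypothetical source URL
DST = "gsiftp://node1.nikhef.nl/data/dataset-001.dat"  # hypothetical destination URL
FILE_SIZE_BYTES = 1 * 1024**3                          # 1 GByte file, as in the measurements

start = time.time()
# "-p 4" asks globus-url-copy for four parallel TCP streams; the value is an assumption.
subprocess.run(["globus-url-copy", "-p", "4", SRC, DST], check=True)
elapsed = time.time() - start

print(f"{FILE_SIZE_BYTES * 8 / elapsed / 1e6:.0f} Mbit/s disk to disk in {elapsed:.1f} s")
```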

4 Demo Consists of:
[Diagram: RAID0 disk – GridFTP – data over TCP streams – GridFTP – RAID0 disk, with node monitoring, site monitoring and Dante monitoring]

5 European Topology: NRNs, GÉANT, Sites
[Topology map showing SARA & NIKHEF (SURFnet), SuperJANET4 and CERN]

6 Some Measurements of Throughput CERN – SARA
Using the GÉANT backup link, 1 GByte file transfers:
Standard TCP: average throughput 167 Mbit/s (users see … Mbit/s!)
High-Speed TCP: average throughput 345 Mbit/s
Scalable TCP: average throughput 340 Mbit/s
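For a sense of scale, the measured rates can be converted back into per-file transfer times; the short sketch below does only that arithmetic for the 1 GByte test files.

```python
# Back-of-the-envelope conversion of the measured rates into transfer times for one 1 GByte file.
FILE_BITS = 1 * 1024**3 * 8  # 1 GByte expressed in bits

for name, mbit_per_s in [("Standard TCP", 167), ("High-Speed TCP", 345), ("Scalable TCP", 340)]:
    seconds = FILE_BITS / (mbit_per_s * 1e6)
    print(f"{name}: {mbit_per_s} Mbit/s -> ~{seconds:.0f} s per file")
# Standard TCP ~51 s, High-Speed TCP ~25 s, Scalable TCP ~25 s
```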

7 WP7 High Throughput Achievements
Close collaboration with Dante
"Low"-layer QoS testing over GÉANT: LBE, IP Premium
iGrid 2002 and ER 2002: UDP with LBE, network performance evaluation
EU Review 2003: application-level transfers with real data between EDG sites, a proof of concept

8 Conclusions
More research on TCP stacks and their implementations is needed
Continued collaboration with Dante to:
understand the behaviour of the GÉANT backbone
learn the benefits of QoS deployment
WP7 is taking the "Computer Science" research and knowledge of the TCP protocol and its implementation and applying it to the network for real Grid users
Enabling knowledge transfer to sysadmins and end users:
EDG release 1.4.x has configuration scripts for TCP parameters for SE and CE (a buffer-sizing sketch follows after this list)
firewall rules recommendations
network tutorials for end users
Work with users – focus on 1 or 2 sites to try to get improvements
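The EDG configuration scripts themselves are not reproduced here; the sketch below only shows the standard bandwidth-delay-product calculation that this kind of TCP buffer tuning rests on, with assumed bandwidth and RTT figures.

```python
# Illustrative only (not the EDG 1.4.x scripts themselves): the usual bandwidth-delay-product
# calculation that TCP socket buffer tuning for long fat networks is based on.
def tcp_buffer_bytes(bandwidth_mbit_s: float, rtt_ms: float) -> int:
    """TCP window / socket buffer needed to keep the path full."""
    return int(bandwidth_mbit_s * 1e6 / 8 * rtt_ms / 1e3)

# Assumed figures: ~1 Gbit/s path and ~20 ms round-trip time between Geneva and Amsterdam.
buf = tcp_buffer_bytes(1000, 20)
print(f"Suggested socket buffer: {buf / 2**20:.1f} MiB")
# On Linux this maps onto sysctls such as net.core.rmem_max / net.core.wmem_max and
# net.ipv4.tcp_rmem / net.ipv4.tcp_wmem, which need to allow at least this value.
```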

9 WP2/WP7 Replica Selection based on Network Cost Functions
Franck Bonnassieux (WP7), Kurt Stockinger (WP2)

10 NetworkCost functionality
getNetworkCost: FileSize = 10 MB, results = time to transfer (sec.)
Cost matrix between the sites CERN, RAL, NIKHEF, IN2P3 and CNAF:
CNAF: 13.08, 4.04, 6.53, 4.5
IN2P3: 7.08, 6.24, 10.38, 5.03
NIKHEF: 2.66, 11.86, 3.25, 11.13
RAL: 4.35, 7.12, 2.44, 7.46
CERN: 35.44, 44.87, 77.78, 46.75
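To show how such a cost matrix is meant to be used, here is a minimal sketch of cost-based replica selection in the spirit of getNetworkCost. The stub function and its numbers are made up for the example; they are not the WP2/WP7 API.

```python
# Illustrative sketch of cost-based replica selection; not the actual WP2/WP7 interface.
def get_network_cost(source: str, destination: str, file_size_mb: float) -> float:
    """Stub standing in for the network cost service: estimated transfer time in seconds."""
    illustrative_costs = {("NIKHEF", "CERN"): 2.7, ("RAL", "CERN"): 4.4, ("IN2P3", "CERN"): 7.1}
    return illustrative_costs.get((source, destination), float("inf"))

def best_replica(replica_sites, destination, file_size_mb):
    """Pick the replica site whose estimated transfer time to the destination is lowest."""
    return min(replica_sites, key=lambda site: get_network_cost(site, destination, file_size_mb))

print(best_replica(["NIKHEF", "RAL", "IN2P3"], "CERN", 10.0))  # -> NIKHEF
```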

11 NetworkCost Architecture
[Architecture diagram: Measure (PCP, PingER, IPerf, UDPmon, GridFTP) → distributed data collector / raw archive → collect and storage (R-GMA, Globus MDS) → processing (NetworkCost)]

12 NetworkCost model
The current cost model is designed for data-intensive computing, and especially for large file transfers. The most relevant metric for this cost model is available throughput (a minimal sketch of such a cost function follows below).
Implementation:
Iperf measurements (current)
GridFTP logs (future)
Other metrics (future): UDP, RTT, jitter, ...
Synchronisation (PCP)
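A minimal sketch of the cost function described above, under the assumption that the estimate is simply file size divided by the available throughput measured by Iperf; the real WP7 model may be more elaborate.

```python
# Minimal sketch of the throughput-based cost model; an assumption, not the WP7 implementation.
def network_cost_seconds(file_size_mb: float, available_throughput_mbit_s: float) -> float:
    """Estimated transfer time in seconds for a file of file_size_mb megabytes."""
    if available_throughput_mbit_s <= 0:
        return float("inf")  # no measured connectivity -> treat the path as unusable
    return file_size_mb * 8 / available_throughput_mbit_s

# e.g. a 10 MB file over a path with 30 Mbit/s of available throughput:
print(f"{network_cost_seconds(10, 30):.2f} s")  # ~2.67 s
```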

13 Replica Management Services
[Diagram: the Replica Manager Client interacting with the Replica Management Services (Optimization, Replica Metadata), the Information Service, the VO Membership Service, File Transfer (GridFTP) and the Replica Location Service (RLS)]
The Replica Manager Client makes use of the existing Replica Management Services, the Information Service, the File Transfer Service and the Replica Location Service to achieve its high-level data management functionality. It uses the Virtual Organization Membership Service (VOMS) for authentication and to acquire a Grid proxy certificate. The relevant Replica Management Services (RMS) for this demo are the Replica Optimization Service and the Replica Metadata Catalog. There are other sub-services in the RMS that we do not mention at this point.
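As a hedged sketch of how a Replica Manager client might compose the services named above: the class, attribute and method names below are illustrative, not the actual EDG WP2 interfaces.

```python
# Illustrative composition of the services on this slide; names are assumptions.
class ReplicaManagerClient:
    def __init__(self, voms, rls, optimizer, metadata, gridftp):
        self.voms = voms            # VO Membership Service: authentication, proxy certificate
        self.rls = rls              # Replica Location Service: logical name -> replica sites
        self.optimizer = optimizer  # Replica Optimization Service: network cost estimates
        self.metadata = metadata    # Replica Metadata Catalog
        self.gridftp = gridftp      # File Transfer Service (GridFTP)

    def get_best_file(self, lfn: str, destination_site: str):
        """Copy the cheapest replica of `lfn` to `destination_site` and return the new location."""
        proxy = self.voms.get_proxy()
        replicas = self.rls.list_replicas(lfn)
        source = self.optimizer.best_replica(replicas, destination_site)
        return self.gridftp.copy(source, destination_site, credentials=proxy)
```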

14 Testbed Sites & Replica Manager Commands
edg-rm copyAndRegisterFile -l lfn:higgs (CERN → LYON)
edg-rm listReplicas -l lfn:higgs
edg-rm replicateFile -l lfn:higgs (→ NIKHEF)
edg-rm listBestFile -l lfn:higgs (→ CERN)
edg-rm getAccessCost -l lfn:higgs CERN NIKHEF LYON
edg-rm getBestFile -l lfn:higgs (→ CERN)
edg-rm deleteFile -l lfn:higgs (→ LYON)
edg-rm listBestFile -l lfn:higgs (→ CERN)
Replica Management Demo Scenario: A user produced a file at CERN that for some reason cannot be put into CERN Grid storage (say the site is down for maintenance), so the user registers the file at LYON instead. The first Grid instance of the file will therefore be at LYON. The user uploads the file into the Grid at LYON using the copyAndRegisterFile command. The listReplicas command produces exactly one line of output: it lists the one file at LYON. The source instance at CERN is at a location that is not known to the Grid. Say the user wants to make this data also accessible at NIKHEF (because his collaborators work at NIKHEF); to achieve this, he issues the replicateFile command. Since the user is probably still at CERN, he is interested in which of the two replicas is the best one for him to access from CERN. The listBestFile command gives him the answer; it takes the requestor's reference location as its argument, in this case CERN, but it could be any other site. The network costs can also be viewed simultaneously for many sites. The getAccessCost command is designed to be used by the job scheduling service: it takes a list of destination sites (in our case three: CERN, NIKHEF and LYON) and computes the access cost for the best replica with respect to each destination. It does the same as listBestFile for each site, but the output is more verbose: it also gives the (remote) access cost in seconds in addition to the name of the best file. If the file is local, the access cost will be zero. So for NIKHEF and LYON this cost will be zero, and for CERN the best file will be either at LYON or at NIKHEF, depending on the current network metrics. Now the user has realized that the CERN site is available again for storage (the maintenance work was completed), so he wants to have a local copy of the file as well. By issuing getBestFile he copies the best replica to the given destination (CERN in this case). Now there are three replicas, one each at CERN, NIKHEF and LYON. Because the file at LYON is never actually accessed, the user deletes it again using deleteFile. Now listBestFile will of course return the name of the local CERN copy.
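The same scenario could be scripted as a thin wrapper around the edg-rm commands shown above; the sketch below assumes edg-rm is on the PATH with a valid proxy in place, and omits any arguments not shown on this slide (for example explicit source and destination Storage Element names).

```python
# Sketch of the demo scenario driven via subprocess; argument details are simplified.
import subprocess

def edg_rm(*args: str) -> str:
    result = subprocess.run(["edg-rm", *args], check=True, capture_output=True, text=True)
    return result.stdout.strip()

edg_rm("copyAndRegisterFile", "-l", "lfn:higgs")      # first Grid replica, registered at LYON
print(edg_rm("listReplicas", "-l", "lfn:higgs"))      # lists the single LYON replica
edg_rm("replicateFile", "-l", "lfn:higgs")            # second replica, at NIKHEF
print(edg_rm("listBestFile", "-l", "lfn:higgs"))      # best replica as seen from CERN
print(edg_rm("getAccessCost", "-l", "lfn:higgs", "CERN", "NIKHEF", "LYON"))
edg_rm("getBestFile", "-l", "lfn:higgs")              # copy the best replica to CERN
edg_rm("deleteFile", "-l", "lfn:higgs")               # remove the unused LYON replica
print(edg_rm("listBestFile", "-l", "lfn:higgs"))      # now the local CERN copy
```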

15 WP2 Replica Location Service
Peter Kunszt, WP2 – Data Management
Peter Kunszt (from CERN) is the WP2 Manager. Kurt Stockinger (from CERN, EU-funded) is responsible for optimization aspects of data access within WP2.

16 Replica Location Service RLS
Local Catalogs hold the actual name mappings
Remote Indices redirect inquiries to the LRCs actually holding the file
LRCs are configured to send index updates to any number of RLIs
Indexes are Bloom filters
The Replica Location Service has two sub-service components. Local Replica Catalogs (LRCs) reside at each site close to the actual data store, cataloguing only the data that is at the site, so usually there will be one LRC per Storage Element. Replica Location Indices (RLIs) are light-weight services that hold Bloom filter index bitmaps of LRCs. The LRCs can be configured dynamically to send indices to any number of RLIs: each LRC computes a Bloom filter bitmap that is compressed and sent to each subscribed RLI. Bloom filters are compact data structures for the probabilistic representation of a set, supporting membership queries (i.e. queries that ask: "Is element X in set Y?"). This compact representation is the payoff for allowing a small rate of false positives in membership queries; that is, a query might incorrectly report an element as a member of the set. The false positive rate can be optimized by tuning various parameters of the filter; smaller false positive rates require more computation time and memory, and acceptable false positive rates are around …. Hence RLIs answer lookups by responding with the names of the LRCs that might have a copy of the given file; the LRCs then need to be contacted to get a definitive answer. This method is used successfully in peer-to-peer networks, up to a certain size of the network. In the EU DataGrid we do not foresee more than a few dozen sites, at most O(100), for which this algorithm still scales very well.
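To make the index idea concrete, here is a minimal Bloom filter sketch; the parameter choices and hashing scheme are illustrative, not the actual RLS implementation.

```python
# Minimal Bloom filter sketch illustrating the RLI index idea; not the RLS code.
import hashlib

class BloomFilter:
    def __init__(self, num_bits: int = 8192, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means present, up to the false-positive rate."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

# An LRC would add each catalogued file name and ship the (compressed) bitmap to its RLIs:
lrc_index = BloomFilter()
lrc_index.add("lfn:higgs")
print(lrc_index.might_contain("lfn:higgs"))   # True
print(lrc_index.might_contain("lfn:top"))     # almost certainly False
```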

17 RLS Demo at SC2002
The original demo setup at SuperComputing 2002 in Baltimore, USA, in November 2002. The demo ran for 4 days with individual presentations to conference attendees.

18 RLS Demo Topology Today
Replica Location Index (RLI) hosts: CERN lxshare0344.cern.ch, Glasgow grid03.ph.gla.ac.uk, California dc-n4.isi.edu, Melbourne wombat.unimelb.edu.au
Local Replica Catalog (LRC) hosts: CERN lxshare0342.cern.ch, Glasgow grid01.ph.gla.ac.uk, California dc-n2.isi.edu, Melbourne koala.unimelb.edu.au
The EU Review demo setup: one LRC and one RLI at each demo site. Each site has an RLI that receives Bloom filter indices from all LRCs, so a local service can answer questions about the location of a given file with a high probability.
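A sketch of the two-step lookup implied by this topology: the local RLI says which LRCs might hold the file (a Bloom filter test, so false positives are possible), and those LRCs are then queried for a definitive answer. The data structures are illustrative and reuse the BloomFilter sketch from the previous slide; plain dicts stand in for remote service calls.

```python
# Illustrative two-step RLS lookup: RLI Bloom filters first, candidate LRCs second.
def locate_replicas(lfn: str, rli_bitmaps: dict, lrc_catalogs: dict) -> list:
    """rli_bitmaps: site -> BloomFilter received from that site's LRC.
    lrc_catalogs: site -> set of LFNs actually catalogued at that site."""
    candidates = [site for site, bf in rli_bitmaps.items() if bf.might_contain(lfn)]
    return [site for site in candidates if lfn in lrc_catalogs[site]]  # drop false positives
```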

19 SUMMARY
Replica Optimization
WP7 network cost functions are integrated into the Replica Management functionality, providing an essential capability that was missing up to now. This gives us the necessary framework to start work on high-level optimization algorithms.
Replica Location Service
Scalable distributed catalog as a much-needed replacement for the current Replica Catalog. Addresses all issues brought up by the experiments.
Tests have been conducted with very large catalogs:
The lookup time for an entry is independent of the number of catalog entries; tested for up to 10^8 entries.
The catalog withstands over 1000 simultaneous queries or inserts per second.

