Outline Network related issues and thinking for FAX Cost among sites, who has problems Analytics of FAX meta data, what are the problems  The main object.

Slides:

Advertisements

Similar presentations

Cross-site data transfer on TeraGrid using GridFTP TeraGrid06 Institute User Introduction to TeraGrid June 12 th by Krishna Muriki

Advertisements

Current Testbed : 100 GE 2 sites (NERSC, ANL) with 3 nodes each. Each node with 4 x 10 GE NICs Measure various overheads from protocols and file sizes.

Copy on Demand with Internal Xrootd Federation Wei Yang SLAC National Accelerator Laboratory Create Federated Data Stores for the LHC IN2P3-CC,

ATLAS computing in Geneva Szymon Gadomski, NDGF meeting, September 2009 S. Gadomski, ”ATLAS computing in Geneva", NDGF, Sept 091 the Geneva ATLAS Tier-3.

1 TCP-LP: A Distributed Algorithm for Low Priority Data Transfer Aleksandar Kuzmanovic, Edward W. Knightly Department of Electrical and Computer Engineering.

Efi.uchicago.edu ci.uchicago.edu FAX status report Ilija Vukotic Computation and Enrico Fermi Institutes University of Chicago US ATLAS Computing Integration.

ALICE DATA ACCESS MODEL Outline ALICE data access model - PtP Network Workshop 2  ALICE data model  Some figures.

Site report: Tokyo Tomoaki Nakamura ICEPP, The University of Tokyo 2014/12/10Tomoaki Nakamura1.

Experiences in Design and Implementation of a High Performance Transport Protocol Yunhong Gu, Xinwei Hong, and Robert L. Grossman National Center for Data.

ALICE data access WLCG data WG revival 4 October 2013.

US ATLAS Western Tier 2 Status and Plan Wei Yang ATLAS Physics Analysis Retreat SLAC March 5, 2007.

Profiling Grid Data Transfer Protocols and Servers George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA.

Overcast: Reliable Multicasting with an Overlay Network CS294 Paul Burstein 9/15/2003.

Tier 3 Data Management, Tier 3 Rucio Caches Doug Benjamin Duke University.

Multi-Tiered Storage with Xrootd at ATLAS Western Tier 2 Andrew Hanushevsky Wei Yang SLAC National Accelerator Laboratory 1CHEP2012, New York

Large Scale Test of a storage solution based on an Industry Standard Michael Ernst Brookhaven National Laboratory ADC Retreat Naples, Italy February 2,

BNL Facility Status and Service Challenge 3 Zhenping Liu, Razvan Popescu, Xin Zhao and Dantong Yu USATLAS Computing Facility Brookhaven National Lab.

USATLAS Network/Storage and Load Testing Jay Packard Dantong Yu Brookhaven National Lab.

GStore: GSI Mass Storage ITEE-Palaver GSI Horst Göringer, Matthias Feyerabend, Sergei Sedykh

Data transfer over the wide area network with a large round trip time H. Matsunaga, T. Isobe, T. Mashimo, H. Sakamoto, I. Ueda International Center for.

SLAC Experience on Bestman and Xrootd Storage Wei Yang Alex Sim US ATLAS Tier2/Tier3 meeting at Univ. of Chicago Aug 19-20,

11-July-2008Fabrizio Furano - Data access and Storage: new directions1.

Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.

Grid Lab About the need of 3 Tier storage 5/22/121CHEP 2012, The need of 3 Tier storage Dmitri Ozerov Patrick Fuhrmann CHEP 2012, NYC, May 22, 2012 Grid.

BNL Service Challenge 3 Site Report Xin Zhao, Zhenping Liu, Wensheng Deng, Razvan Popescu, Dantong Yu and Bruce Gibbard USATLAS Computing Facility Brookhaven.

LCG Phase 2 Planning Meeting - Friday July 30th, 2004 Jean-Yves Nief CC-IN2P3, Lyon An example of a data access model in a Tier 1.

DYNES Storage Infrastructure Artur Barczyk California Institute of Technology LHCOPN Meeting Geneva, October 07, 2010.

Status & Plan of the Xrootd Federation Wei Yang 13/19/12 US ATLAS Computing Facility Meeting at 2012 OSG AHM, University of Nebraska, Lincoln.

Testing the UK Tier 2 Data Storage and Transfer Infrastructure C. Brew (RAL) Y. Coppens (Birmingham), G. Cowen (Edinburgh) & J. Ferguson (Glasgow) 9-13.

Optimisation of Grid Enabled Storage at Small Sites Jamie K. Ferguson University of Glasgow – Jamie K. Ferguson – University.

ROOT and Federated Data Stores What Features We Would Like Fons Rademakers CERN CC-IN2P3, Nov, 2011, Lyon, France.

Efi.uchicago.edu ci.uchicago.edu Using FAX to test intra-US links Ilija Vukotic on behalf of the atlas-adc-federated-xrootd working group Computing Integration.

BNL Facility Status and Service Challenge 3 HEPiX Karlsruhe, Germany May 9~13, 2005 Zhenping Liu, Razvan Popescu, and Dantong Yu USATLAS/RHIC Computing.

Efi.uchicago.edu ci.uchicago.edu FAX status developments performance future Rob Gardner Yang Wei Andrew Hanushevsky Ilija Vukotic.

Factors affecting ANALY_MWT2 performance MWT2 team August 28, 2012.

USATLAS dCache System and Service Challenge at BNL Zhenping (Jane) Liu RHIC/ATLAS Computing Facility, Physics Department Brookhaven National Lab 10/13/2005.

CMS Computing Model Simulation Stephen Gowdy/FNAL 30th April 2015CMS Computing Model Simulation1.

SLACFederated Storage Workshop Summary For pre-GDB (Data Access) Meeting 5/13/14 Andrew Hanushevsky SLAC National Accelerator Laboratory.

Xrootd Monitoring and Control Harsh Arora CERN. Setting Up Service  Monalisa Service  Monalisa Repository  Test Xrootd Server  ApMon Module.

BNL Service Challenge 3 Status Report Xin Zhao, Zhenping Liu, Wensheng Deng, Razvan Popescu, Dantong Yu and Bruce Gibbard USATLAS Computing Facility Brookhaven.

PanDA Status Report Kaushik De Univ. of Texas at Arlington ANSE Meeting, Nashville May 13, 2014.

US ATLAS Western Tier 2 Status Report Wei Yang Nov. 30, 2007 US ATLAS Tier 2 and Tier 3 workshop at SLAC.

FAX PERFORMANCE TIM, Tokyo May PERFORMANCE TIM, TOKYO, MAY 2013ILIJA VUKOTIC 2  Metrics  Data Coverage  Number of users.

David Adams ATLAS ATLAS distributed data management David Adams BNL February 22, 2005 Database working group ATLAS software workshop.

ALICE DATA ACCESS MODEL Outline 05/13/2014 ALICE Data Access Model 2  ALICE data access model  Infrastructure and SE monitoring.

Andrea Manzi CERN On behalf of the DPM team HEPiX Fall 2014 Workshop DPM performance tuning hints for HTTP/WebDAV and Xrootd 1 16/10/2014.

PROOF tests at BNL Sergey Panitkin, Robert Petkus, Ofer Rind BNL May 28, 2008 Ann Arbor, MI.

Efi.uchicago.edu ci.uchicago.edu Ramping up FAX and WAN direct access Rob Gardner on behalf of the atlas-adc-federated-xrootd working group Computation.

Stephen Gowdy FNAL 9th Feb 2015CMS Computing Model Simulation 1.

Efi.uchicago.edu ci.uchicago.edu Storage federations, caches & WMS Rob Gardner Computation and Enrico Fermi Institutes University of Chicago BigPanDA Workshop.

Data transfers and storage Kilian Schwarz GSI. GSI – current storage capacities vobox LCG RB/CE GSI batchfarm: ALICE cluster (67 nodes/480 cores for batch.

BNL dCache Status and Plan CHEP07: September 2-7, 2007 Zhenping (Jane) Liu for the BNL RACF Storage Group.

Introduce Caching Technologies using Xrootd Wei Yang 10/28/14ATLAS TIM 2014 Univ. Chicago1.

Data Analysis w ith PROOF, PQ2, Condor Data Analysis w ith PROOF, PQ2, Condor Neng Xu, Wen Guan, Sau Lan Wu University of Wisconsin-Madison 30-October-09.

An Analysis of AIMD Algorithm with Decreasing Increases Yunhong Gu, Xinwei Hong, and Robert L. Grossman National Center for Data Mining.

Run-time Adaptation of Grid Data Placement Jobs George Kola, Tevfik Kosar and Miron Livny Condor Project, University of Wisconsin.

An Analysis of Data Access Methods within WLCG Shaun de Witt, Andrew Lahiff (STFC)

VO Box discussion ATLAS NIKHEF January, 2006 Miguel Branco -

New Features of Xrootd SE Wei Yang US ATLAS Tier 2/Tier 3 meeting, University of Texas, Arlington,

SLACFederated Storage Workshop Summary Andrew Hanushevsky SLAC National Accelerator Laboratory April 10-11, 2014 SLAC.

Data Distribution Performance Hironori Ito Brookhaven National Laboratory.

G. Russo, D. Del Prete, S. Pardi Frascati, 2011 april 4th-7th The Naples' testbed for the SuperB computing model: first tests G. Russo, D. Del Prete, S.

Efi.uchicago.edu ci.uchicago.edu Sharing Network Resources Ilija Vukotic Computation and Enrico Fermi Institutes University of Chicago Federated Storage.

a brief summary for users

ALICE internal and external network

ALICE Monitoring

Data Federation with Xrootd Wei Yang US ATLAS Computing Facility meeting Southern Methodist University, Oct 11-12, 2011.

Storage elements discovery

Brookhaven National Laboratory Storage service Group Hironori Ito

Ákos Frohner EGEE'08 September 2008

Presentation transcript:

Outline Network related issues and thinking for FAX Cost among sites, who has problems Analytics of FAX meta data, what are the problems  The main object is to bring up discussion on WAN related issue when using FAX infrastructure Also see URL for much broader SW options when running over WAN =slides&confId= ATLAS TIM meeting, Tokyo1

FAX and WAN Requirement and Configuration Wei Yang ATLAS TIM meeting, Tokyo 2

Diff: FAX and GridFTP WAN Usage Difference: Job driven: no 0-access files On demand, un-managed, no flow control – No help from FTS, demand go up and down quickly  Q: will file level caching at server site and client site help? – Working dataset is very large. Need large cache, or very smart caching algorithm NOT to replace data pre-placement  Q: can we stop GridFTP data distribution and use FAX to pull – haven’t given it a thought ATLAS TIM meeting, Tokyo3

SSD Box net IO (1-day) Dell R610 2x 10Gbps, 12TB 1032 xrootd client Data 413 files / 2.7TB 19.3K IOPS / 12 SSDs No. of analysis jobs ATLAS TIM meeting, Tokyo4 GridFTP net IO (1-day): Sum of In and Out SLAC SLAC SE net IO (1-day) (including IO with SSD box)

Tuning Network Parameters for FAX GridFTPs are dedicated for WAN transfer – Buffer size, etc. well tuned for high latency WAN – Large IO and transfer block size – Multiple TCP streams With FAX, batch nodes won’t be dedicated for WAN transfer – Auto tuning on batch nodes are required  Q: Any LAN related tuning? – Small IO block size, single TCP stream for remote direct IO – Good read ahead algorithm is important, otherwise: Assuming average 8KB read size: 0.2ms (LAN) : < 8 * 1/0.2ms = 40MB/s 20ms (WAN) : < 8 * 1/20ms = 0.4MB/s (extreme case) About RTT In general, RTT is proportional to geographic distance, but not always: – SLAC -> AGLT2, BU, MWT2 ~ 80ms – SLAC -> BNL ~70ms ATLAS TIM meeting, Tokyo5

Site Models Expose SE to remote client Data fly between client and storage data server, better performance requires N2N and GSI handled by storage element – GSI needs to be implemented to prevent circumvention Example: DPM xrootd door; dCache xrootd door at AGLT2 and MWT2 Likely need helper (xrootd/cmsd pairs) to join FAX Proxy between SE and remote client Proxy is easy to setup – N2N and GSI handled by Proxy (more N2N optimization work to do) – It may limit the performance, but also be a natural throttling mechanism – Expandable with Proxy cluster (BNL, SLAC). Proxy is useful if data servers are behind firewall Reverse proxy is useful if batch nodes have no outbound TCP connection – What is a reverse proxy – Large RTT between reverse proxy and FAX sites, So readv() passage is important (Xrootd release 4.x) ATLAS TIM meeting, Tokyo6

All FAX sites need to prepare SE for some direct (remote) IO Small IO block, random/sparse Large # of open files hanging around, etc. Adequate WAN bandwidth Operational experience ATLAS TIM meeting, Tokyo7 Example FAX proxy traffic at SLAC By test jobs.

Above the Site Level Improve Redirection Algorithm Cost awareness redirection? There are Pros and Cons – Cost matrix measure the past, average but not current cost Cost may includes RTT, SE/net performance, bandwidth/congestions, etc. – LAMBDA project propose research on real time identification of most capable sites, avoid bad choice Efficiency of the Redirection Network: – Topology, Parameters, Algorithm, real-time info Optimal data pre-placement Need to consider site storage and network capacity/performance Nearby sties’ CPU and network capacity Cost to nearby sites Spread files in a (hot) dataset to multiple sites? DDM nightmare ATLAS TIM meeting, Tokyo8

WAN Load by FAX Rough estimation on how much load FAX will put on WAN Start with SLAC’s internal network load: RTT ~ 0.2ms Think of this as the demand from jobs Q: what if RTT = 2ms or 20ms? -- for all are remote jobs, 50% remote jobs, etc. – E.g. 40% 0.2ms + 30% 10ms + 20% 20ms + 5% 40ms + 5% 80ms ATLAS TIM meeting, Tokyo9 Each bin is a day (daily avg.) On average days, 1-1.5GB/s Much large than GridFTP IOs But there are many >2GB/s days SLAC SE net IO 3-month

What can FAX offer to compensate the WAN cost? More slots to reduce waiting/pending time In theory FAX reduce job failure rate caused by missing files Caching mechanism (TTree, etc) may help What about those >2GB/s bins (days) Are sites ready for this? Is data locality aware scheduling still necessary? and those 6GB/s bins? ATLAS TIM meeting, Tokyo10