Florida Tier2 Site Report
USCMS Tier2 Workshop, Fermilab, Batavia, IL, March 8, 2010
Presented by Yu Fu for the Florida CMS Tier2 team: Paul Avery, Dimitri Bourilkov, Yu Fu, Bockjoo Kim, Yujun Wu

Hardware Status
Computational Hardware:
– UFlorida-PG (dedicated)
  126 worker nodes, 504 cores (slots)
  84 × dual dual-core Opteron nodes + 42 × dual dual-core Opteron nodes; 6 GB RAM and 2 × 250 (500) GB HDD per node
  1 GbE, all public IP, direct outbound traffic
  3760 HS06, 1.5 GB RAM/slot

Hardware Status
– UFlorida-HPC (shared)
  642 worker nodes, 3184 cores (slots)
  A mix of dual quad-core Xeon nodes (112 + 16), dual hex-core Opteron nodes (32), dual dual-core Opteron nodes, and single-core Opteron nodes (76)
  InfiniBand or 1 GbE, private IP, outbound traffic via 2 Gbps NAT
  4 GB, 2 GB or 1 GB RAM per slot
  Managed by the UF HPC Center; the Tier2 invested partially in three phases.
  Tier2's official quota is 845 slots; actual usage is typically ~1000 slots (~30% of all slots).
  Tier2's official guaranteed HS06: 6895 (normalized to the 845 slots).
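As a quick sanity check of the shared-cluster figures above, the quota and usage fractions follow directly from the quoted slot counts; a minimal sketch using only the numbers on this slide:

```python
# Quick consistency check of the Tier2 share of the shared UFlorida-HPC cluster,
# using only the figures quoted on this slide.
total_slots = 3184          # all UFlorida-HPC batch slots
quota_slots = 845           # Tier2's official quota
typical_usage = 1000        # slots the Tier2 typically occupies

print(f"quota fraction:   {quota_slots / total_slots:.1%}")    # ~26.5%
print(f"typical fraction: {typical_usage / total_slots:.1%}")  # ~31%, i.e. the "~30%" above

guaranteed_hs06 = 6895      # Tier2's guaranteed share
print(f"HS06 per quota slot: {guaranteed_hs06 / quota_slots:.2f}")  # ~8.2 HS06/slot
```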

Typical UFlorida-HPC cluster usage (30-day average)

Hardware Status
Interactive analysis cluster for CMS:
– 5 nodes + 1 NIS server + 1 NFS server
– 1 × dual quad-core Xeon node plus dual dual-core Opteron nodes; 2 GB RAM/core, 18 TB total disk.
Roadmap and plans for CEs:
– Total Florida CMS Tier2 official computing power (Grid only, not counting the interactive analysis cluster): 10,655 HEP-SPEC06 (3760 + 6895), 1349 batch slots (cross-checked below).
– Have exceeded the 2010 milestone of 7760 HS06.
– Considering new compute nodes in FY10 to enhance the CE, as the worker nodes are aging and dying (many are already 5 years old).
– AMD 12-core processors are in testing at UF HPC and may be a candidate for the new purchase.
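The CE total quoted above is consistent with the per-cluster figures on the previous slides; a minimal check, assuming the Tier2 total is simply the dedicated UFlorida-PG capacity plus the guaranteed UFlorida-HPC share:

```python
# Tier2 CE totals as the sum of the dedicated cluster and the guaranteed HPC share.
pg_slots, pg_hs06 = 504, 3760      # UFlorida-PG (dedicated)
hpc_slots, hpc_hs06 = 845, 6895    # UFlorida-HPC (Tier2 guaranteed share)

total_slots = pg_slots + hpc_slots
total_hs06 = pg_hs06 + hpc_hs06
milestone_2010 = 7760              # 2010 CE milestone in HS06

print(total_slots)                  # 1349 batch slots, as quoted above
print(total_hs06)                   # 10655 HEP-SPEC06
print(total_hs06 >= milestone_2010) # True: milestone exceeded
```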

Hardware Status
Storage Hardware:
– User-space data RAIDs: gatoraid1, gatoraid2, storing CMS software, $DATA, $APP, etc.; 3ware-controller-based RAID5, mounted via NFS.
– Resilient dCache: 2 × 250 (500) GB SATA drives on each worker node.
– Non-resilient RAID dCache: FibreChannel RAID5 (pool03, pool04, pool05, pool06) + 3ware-based SATA RAID5 (pool01, pool02), with 10 GbE or bonded multiple 1 GbE networking.
– Lustre storage, now our main storage system: Areca-controller-based RAID5 (dedicated) + RAID Inc. Falcon III FibreChannel RAID5 (shared), on InfiniBand.
– 2 dedicated production GridFTP doors: 2 × 10 Gbps, dual quad-core processors, 32 GB memory.
– 20 test GridFTP doors on worker nodes: 20 × 1 Gbps.
– Dedicated servers: 2 × PhEDEx, 2 × SRM, 2 × PNFS, plus dCacheAdmin, dCap and dCacheAdminDoor servers.

Hardware Status
SE structure:
– dCache and Lustre + BestMan + GridFTP currently co-exist; we are migrating to Lustre.
Motivation for choosing dedicated RAIDs:
– Past experience shows that resilient dCache pools on worker nodes went down or lost data relatively often, due to CPU load and memory pressure as well as hard-drive and other hardware issues.
– We therefore want to separate the CE and SE in hardware, so that job load on CE worker nodes does not affect storage.
– High-performance-oriented RAID hardware is expected to perform better.
– Dedicated, purpose-designed RAID hardware is expected to be more robust and reliable.
– Hardware RAID does not need fully replicated copies of the data, as resilient dCache and Hadoop do, so it is more efficient in disk usage.
– Fewer pieces of equipment are involved in a few large dedicated RAIDs, making them easier to manage and maintain.
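A rough sketch of the disk-usage argument: full replication (as in resilient dCache or Hadoop with two copies, an illustrative factor assumed here) keeps only half of the raw space, while the 4+1 RAID5 layout described on the next slide keeps 80%:

```python
# Usable fraction of raw disk under two redundancy schemes (illustrative numbers).
def replication_efficiency(copies):
    """Full file replication: usable = raw / copies."""
    return 1.0 / copies

def raid5_efficiency(data_disks, parity_disks=1):
    """RAID5 with N data disks + 1 parity disk: usable = N / (N + 1)."""
    return data_disks / float(data_disks + parity_disks)

print(replication_efficiency(2))  # 0.50 -> two full copies (resilient dCache / Hadoop style)
print(raid5_efficiency(4))        # 0.80 -> 4+1 RAID5 arrays as used for the dedicated storage
```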

Hardware Status
Motivation for choosing Lustre:
– Best among the various available parallel filesystems according to UF HPC's tests.
– Widely used, with proven performance, reliability and scalability.
– Relatively long history; mature and stable.
– Existing local experience and expertise at the UF Tier2 and UF HPC.
– Excellent support from the Lustre community, Sun/Oracle and UF HPC experts; frequent updates and prompt patches for new kernels.
Hardware selection:
– With carefully chosen hardware, RAIDs can be inexpensive yet deliver excellent performance and reasonably high reliability.
– Areca ARC-1680ix-16 PCIe SAS RAID controller with 4 GB memory; each controller supports 16 drives.
– 2 TB enterprise-class SATA drives, the most cost-effective at the time.
– iStarUSA drive chassis connected to the I/O server through an extension PCIe cable: one I/O server can drive multiple chassis, which is more cost effective.
– Cost: <$300/TB raw space, ~$400/TB net usable space with a configuration of 4+1 RAID5s and a global hot-spare drive for every 3 RAID5s in a chassis (worked out below).
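The cost figures follow from that layout; a small worked check, assuming a 16-drive chassis of 2 TB drives arranged as three 4+1 RAID5 sets plus one global hot spare, at the quoted ~$300/TB raw price:

```python
# Cost per usable TB for one 16-drive chassis laid out as 3 x (4+1) RAID5 + 1 hot spare.
drives_per_chassis = 16
drive_tb = 2.0
raid_sets = 3            # each set: 4 data + 1 parity drives
data_drives_per_set = 4
cost_per_raw_tb = 300    # upper bound quoted above, in USD

raw_tb = drives_per_chassis * drive_tb                   # 32 TB raw
usable_tb = raid_sets * data_drives_per_set * drive_tb   # 24 TB usable
chassis_cost = cost_per_raw_tb * raw_tb

print(raw_tb, usable_tb)         # 32.0 24.0
print(chassis_cost / usable_tb)  # 400.0 USD per usable TB, matching "~$400/TB net"
```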

Hardware Status
Lustre + BestMan + GridFTP
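In this layout the GridFTP doors simply read and write files on the POSIX-mounted Lustre filesystem, with BestMan providing the SRM interface on top. A minimal sketch of what a transfer into the SE looks like from a client holding a grid proxy; the hostname and paths are hypothetical placeholders, not the actual Florida endpoints:

```python
# Illustrative only: copy a local file through a GridFTP door whose backend is the
# Lustre mount. Hostname and paths are hypothetical; a valid grid proxy is assumed.
import subprocess

src = "file:///tmp/testfile.root"
dst = "gsiftp://gridftp-door.example.edu:2811/lustre/cms/store/user/test/testfile.root"

# globus-url-copy with a few parallel streams; the door writes straight into Lustre.
subprocess.check_call(["globus-url-copy", "-vb", "-p", "4", src, dst])
```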

Hardware Status
UF Tier2 Lustre Structure

Hardware Status
UF Tier2 Lustre OSS/OST
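Each OSS exports its Areca RAID5 volumes to Lustre as OSTs. A minimal sketch of how one OST would typically be formatted and mounted on an OSS, and how clients mount the filesystem over InfiniBand (Lustre 1.8-era commands; the filesystem name, MGS node and device paths are hypothetical, not the site's actual configuration):

```python
# Illustrative only: bring one RAID5 volume into a Lustre filesystem as an OST.
# Filesystem name, MGS NID and device paths are hypothetical; run as root on the OSS.
import subprocess

fsname = "cmst2"          # hypothetical Lustre filesystem name
mgs_nid = "mgs01@o2ib"    # hypothetical MGS node, reached over InfiniBand (o2ib)
ost_device = "/dev/sdb"   # one 4+1 RAID5 volume exported by the Areca controller

# Format the block device as an OST belonging to this filesystem...
subprocess.check_call(["mkfs.lustre", "--fsname=" + fsname,
                       "--mgsnode=" + mgs_nid, "--ost", ost_device])
# ...and mount it on the OSS to bring it online.
subprocess.check_call(["mount", "-t", "lustre", ost_device, "/mnt/ost00"])

# On a client (worker node or GridFTP door), the whole filesystem is a POSIX mount:
#   mount -t lustre mgs01@o2ib:/cmst2 /lustre/cmst2
```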

Hardware Status
Total SE capacity (disks on the UFlorida-HPC worker nodes are not counted, since they are deployed in neither the dCache nor the Lustre system):

  Resource                                Raw        Usable (after redundancy/resilience overhead)
  dCache non-resilient pools (RAIDs)      181.2 TB   144 TB
  dCache resilient pools (worker nodes)   93.0 TB    35.6 TB
  Dedicated Tier2 production Lustre       320 TB     215 TB
  Shared HPC Lustre (Tier2's share)       N/A        30 TB
  Total                                   594.2 TB   424.6 TB
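The "Total" row is just the column sums; a quick check of the table above:

```python
# Column sums for the SE capacity table above (TB). None marks the N/A raw entry.
rows = {
    "dCache non-resilient pools (RAIDs)":    (181.2, 144.0),
    "dCache resilient pools (worker nodes)": (93.0,  35.6),
    "Dedicated Tier2 production Lustre":     (320.0, 215.0),
    "Shared HPC Lustre (Tier2's share)":     (None,  30.0),
}
raw_total = sum(raw for raw, usable in rows.values() if raw is not None)
usable_total = sum(usable for raw, usable in rows.values())
print(round(raw_total, 1), round(usable_total, 1))  # 594.2 424.6, matching the Total row
```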

Hardware Status
Roadmap and plans for SEs:
– Total: ~600 TB raw, 425 TB net usable.
– There is still a small gap to the 570 TB 2010 milestone.
– Planning to deploy more new RAIDs (similar to the current OSS/OST configuration) in the production Lustre system in FY10 to meet the milestone.
– Satisfied with Lustre + BestMan + GridFTP and will stay with it.
– Will put future SEs into Lustre.
– The present non-resilient dCache pools (RAIDs) will gradually fade out and migrate to Lustre; the resilient dCache pools on worker nodes will be kept.

Software Status
– Most systems run 64-bit SLC5/CentOS 5.
– OSG on the UFlorida-PG and interactive analysis clusters; OSG on the UFlorida-HPC cluster.
– Batch systems: Condor on UFlorida-PG and Torque/PBS on UFlorida-HPC.
– dCache
– PhEDEx
– BestMan SRM 2.2
– Squid 4.0
– GUMS
– …
– All Tier2 resources are managed with a 64-bit customized ROCKS 5; all RPMs and kernels are upgraded to current SLC5 versions.
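Local users therefore see two different submission interfaces depending on the cluster. A minimal illustration of the difference; the job names, resource requests and scripts are hypothetical examples, not the site's actual configuration:

```python
# Illustrative only: the same trivial job expressed for the two batch systems in use.
# File names, resource requests and paths are hypothetical.

condor_submit_file = """\
# Condor (UFlorida-PG): save as job.sub and run `condor_submit job.sub`
universe   = vanilla
executable = run_analysis.sh
output     = job.out
error      = job.err
log        = job.log
queue
"""

pbs_script = """\
#!/bin/sh
# Torque/PBS (UFlorida-HPC): save as job.pbs and run `qsub job.pbs`
#PBS -l nodes=1:ppn=1,walltime=04:00:00
cd $PBS_O_WORKDIR
./run_analysis.sh
"""

print(condor_submit_file)
print(pbs_script)
```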

Network Status
– Tier2 Cisco 6509 switch:
  All 9 slots populated
  2 blades with 4 × 10 GigE ports each
  6 blades with 48 × 1 GigE ports each
– 20 Gbps uplink to the campus research network.
– 20 Gbps to UF HPC.
– 10 Gbps to UltraLight via FLR and NLR.
– Running perfSONAR.
– GbE for worker nodes and most light servers; optical/copper 10 GbE for servers with heavy traffic.
– Florida Tier2 has its own domain and DNS.
– All Tier2 nodes, including worker nodes, are on public IPs and directly connected to the outside world without NAT.
– UFlorida-HPC worker nodes and Lustre are on InfiniBand; UFlorida-HPC worker nodes are on private IPs with a 2 Gbps NAT for communication with the outside world.
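For the private-IP UFlorida-HPC nodes, the NAT gateway is essentially standard Linux masquerading; a minimal sketch of the idea, where the interface name and private subnet are hypothetical placeholders rather than the actual gateway configuration:

```python
# Illustrative only: Linux source-NAT (masquerading) of a private worker-node subnet
# out of a public-facing interface, as a NAT gateway would do. Run as root.
# Interface name and subnet are hypothetical placeholders.
import subprocess

private_subnet = "10.10.0.0/16"  # hypothetical worker-node subnet
public_iface = "eth1"            # hypothetical public-facing interface

# Enable IP forwarding on the gateway...
subprocess.check_call(["sysctl", "-w", "net.ipv4.ip_forward=1"])
# ...and masquerade traffic from the private subnet leaving via the public interface.
subprocess.check_call(["iptables", "-t", "nat", "-A", "POSTROUTING",
                       "-s", private_subnet, "-o", public_iface, "-j", "MASQUERADE"])
```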

Analysis Operations
– Provide local CMS analysis users with an interactive analysis cluster and direct access to the batch systems of both the UFlorida-HPC (Torque/PBS) and UFlorida-PG (Condor flocking) clusters.
– Associated with the Muon, SUSY, Particle Flow, and JetMET analysis groups.
– Now able to accommodate the Higgs group, one of whose conveners had asked us to host Higgs datasets, since we have obtained new storage space.

Space used by group:
  Group         Used space (TB)
  AnalysisOps   21.11
  DataOps        1.44
  Higgs          2.70
  Local          0.72
  Muon          11.91
  SUSY          42.99
  Tau-pflow      5.11
  Undefined     32.76
  Total        118.76
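The "Condor flocking" access mentioned above relies on a pair of standard HTCondor settings; a minimal, hypothetical sketch (hostnames are placeholders, and a real setup also needs matching ALLOW/security configuration on both pools):

```python
# Illustrative only: the pair of HTCondor config knobs behind "Condor flocking".
# Hostnames are hypothetical placeholders, not the site's actual machines.

# On the interactive cluster's submit host (condor_config.local):
submit_side = "FLOCK_TO = cm.pg-cluster.example.edu"

# On the UFlorida-PG pool's central manager (condor_config.local):
pool_side = "FLOCK_FROM = interactive.example.edu"

print(submit_side)
print(pool_side)
```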

Analysis Operations
Experience with the associated analysis groups:
– Muon: Mainly interacted with Alessandra Fanfani to arrange write access to dCache and the fair-share configuration during the OctX dataset request. Local users also requested a large portion of the Muon group datasets for the charge-ratio analysis.
– SUSY: In 2009 we provided the space for private SUSY production. A large number of datasets produced by the local SUSY group are still in the analysis 2 DBS. Recently Jim Smith from Colorado has been managing dataset requests for the SUSY group, and we approved one request.
– Particle Flow: We were not able to help this group much because we did not have enough space at the time; most space was occupied by the Muon group.
– JetMET: Not much interaction; only validation datasets were hosted for this group. Recently one of our postdocs (Dayong Wang) has been working with us to coordinate the necessary dataset hosting.
– Higgs: Andrey Korytov asked whether we could accommodate Higgs datasets before our recent Lustre storage was added. We could not host the Higgs group datasets at the time; now we can do so easily.

Summary
– Already exceeded the 7760-HS06 CE milestone; considering enhancing the CE and replacing old, dead nodes.
– Near the 570 TB 2010 SE milestone; planning more SE purchases to fulfill this year's requirement.
– Established a production Lustre + BestMan + GridFTP storage system.
– Satisfied with Lustre; will migrate the non-resilient dCache pools to Lustre and put future storage there as well.
– Associated with 4 analysis groups and can take on more groups as storage space increases.