Tier2 Site Report - HEPiX Fall 2016
High Energy Physics, Caltech - hep.caltech.edu/cms
Wayne Hendricks - twh@caltech.edu
CMS Computing Team
- Harvey Newman - Principal Investigator
- Maria Spiropulu - Principal Investigator
- Jean-Roch Vlimant - Postdoc, Development
- Justas Balcas - Computing, Development
- Dorian Kcira - Computing, Tier2
- Azher Mughal - Networking, Tier2
- Wayne Hendricks - Computing, Tier2
- Josh Bendavid - Postdoc, Development
Caltech CMS Group's Hybrid Physics/Engineering Team
A team with expertise in multiple areas, focused on R&D leading to production improvements in data and network operations for the LHC experiments now and in future LHC Runs, and feeding into advanced planning for the next Runs and the HL-LHC era. New concepts include intelligent network-integrated systems and new modes of operation for DOE's Leadership HPC facilities with petabyte datasets.
Areas of expertise:
- State-of-the-art long-distance high-throughput data transfers
- Pervasive real-time monitoring of networks and end-systems, including real-time monitoring of hundreds of XrootD data servers used by the LHC experiments, supported by the MonALISA system
- Autonomous steering and control of large distributed systems using robust agent-based architectures
- Software-defined networks with end-to-end flow control
- Integration of network awareness and control into the experiments' data and workflow management, as in Caltech's OliMPS, ANSE and SDN NGenIA projects
- Exploration of Named Data Networking as a possible architecture replacing the current Internet, with leading NDN groups in climate science including Colorado State, which has deployed an NDN testbed
- Development (as in US LHCNet) of software-driven multilayer dynamic circuits and programmable optical patch panels: a virtualized connection service applicable to mega data centers and cloud computing
The Caltech Tier2+3 at Powell Booth
Computing and Networking at Caltech: LHC Data Production and Analysis + Next-Gen Developments
The Caltech CMS Group, which designed and developed the first (MONARC) LHC Computing Model and the Tier2 Center in 1998-2001, operates, maintains and develops several production- and R&D-oriented computing, storage and network facilities on campus:
- The Caltech Tier2 facility at Powell-Booth and Lauritsen Lab (the first of 160+ today) provides substantial and reliable computing resources to US CMS and CMS (67 kHS06 CPU, 4.8 petabytes of storage) in the worldwide LHC Grid
- The group operates and develops the campus' 100G connections to national and international networks through the NSF "CHOPIN" project
- Leading-edge software-defined network (SDN) and Data Transfer Node testbeds linked to CENIC, ESnet, Internet2, Starlight and key LHC sites
- The Caltech Tier2 was the first to deploy a 100G uplink and meet the 20G+ throughput milestone, and helped other US CMS groups achieve the milestone
SDN State of the Art Development Testbed
- Partners: Caltech, Fermilab, StarLight, Michigan, UNESP; plus CERN, Amsterdam, Korea
- 13+ OpenFlow switches: Dell, Pica8, Inventec, Brocade, Arista; Huawei
- Many 40G, N x 40G and 100G servers: Dell, Supermicro, 2CRSI, Echostreams; 40G and 100G network interfaces: Mellanox, QLogic
- Caltech equipment funded through the NSF DYNES, ANSE and CHOPIN projects, and vendor donations
- https://sdnlab.hep.caltech.edu
[Figure: real-time auto-discovered SDN testbed topology, including UMich and Starlight]
Tier2 Testbed: Targeting Improved Support for Data Federations and Data Ops
- A focus is to develop and test the use of HTTP data federations analogous to today's AAA production infrastructure. As a first step, we have built server systems that produce and ingest network data transfers at up to 80 Gbps
- In collaboration with the AAA project, HTTP support has been enabled at the US redirector and the Caltech testbed. As part of the project, the Caltech group has developed a plugin for CMSSW based on the Davix library for HTTP(S) access, and performance measurements have compared the different protocols (a minimal access sketch follows below). More details: https://cms-mgt-conferences.web.cern.ch/cms-mgt-conferences/conferences/pres_display.aspx?cid=1764&pid=13512
- The group is also targeting improved file system performance, using well-engineered instances of the Hadoop Distributed File System (HDFS), to clarify and improve HDFS throughput
- CephFS: known for high performance; engineered with SSDs + SAS3 storage arrays + 40G and 100G network interfaces; partnering with Samsung, HGST, 2CRSI
- We are setting up multi-GPU systems for machine learning with support from Orange Labs Silicon Valley and Echostreams
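As an illustration of HTTP(S) as a data access protocol, the sketch below does a byte-range read from an XRootD HTTP(S) door using the Python requests library. It is a minimal sketch, not the Davix/CMSSW plugin itself; the endpoint URL, file name and grid-proxy path are hypothetical placeholders rather than the production AAA configuration.

```python
# Minimal sketch: byte-range read over HTTPS from an XRootD HTTP(S) endpoint.
# The endpoint URL, logical file name and proxy path are illustrative placeholders.
import requests

ENDPOINT = "https://xrootd-http.example.edu:1094"       # hypothetical HTTP(S) door
LFN = "/store/user/example/dataset/file.root"           # hypothetical logical file name
PROXY = "/tmp/x509up_u1000"                             # assumed grid proxy location

def read_byte_range(endpoint, lfn, first, last, proxy):
    """Fetch bytes [first, last] of a file via an HTTP Range request."""
    headers = {"Range": "bytes=%d-%d" % (first, last)}
    # The proxy is presented as both client certificate and key; in practice
    # a CA bundle may also be needed for server verification.
    resp = requests.get(endpoint + lfn, headers=headers, cert=(proxy, proxy),
                        allow_redirects=True, timeout=30)
    resp.raise_for_status()
    return resp.content

if __name__ == "__main__":
    data = read_byte_range(ENDPOINT, LFN, 0, 1024 * 1024 - 1, PROXY)
    print("read %d bytes" % len(data))
```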
CMS Tier2 Stats
- Around 7000 cores/threads contributing to CMS
- Around 300 compute nodes and 20 storage datanodes in 18 racks
- Compute and storage are mostly combined
- Most network connections are 1Gb, with a portion of the large datanodes on 10Gb
- Disk sizes range from 1TB to 8TB, with a mix of enterprise and commodity drives
- Use of virtual machines is limited to testing and experimental hosts
- 100Gb link between the Powell-Booth Lab and Lauritsen Lab
Current Hardware Deployed
Most of our current hardware is in these two configurations. We also have some older 1U machines due to be retired in the next few months.
[Photos: Supermicro TwinPro combined compute and storage; Supermicro 4U storage chassis]
Tier2 Improvements
- The move to CentOS7 is in progress
- A Ganeti cluster will be implemented to virtualize most core services
- BeStMan will be retired and replaced by LVS to load-balance the CentOS7 GridFTP servers
- Replacing Ganglia with Telegraf/InfluxDB to make metrics exportable and compatible with other tools (a query sketch follows below)
- Working with UCSD to implement an XrootD cache system for hot datasets between Caltech and UCSD
- HTTP as a data access protocol for CMS data over XrootD
- CephFS build and testing is underway
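One benefit of the Telegraf/InfluxDB stack is that metrics can be pulled back out over InfluxDB's HTTP query API and fed into other tools. The sketch below assumes an InfluxDB 1.x endpoint with Telegraf's default database name and cpu measurement; the host name is a placeholder, not the site's actual monitoring server.

```python
# Minimal sketch: pull node metrics from InfluxDB's HTTP query API (InfluxDB 1.x).
# Host, database and measurement names are assumptions about a default Telegraf setup.
import requests

INFLUX_URL = "http://influxdb.example.edu:8086/query"   # hypothetical InfluxDB host
DATABASE = "telegraf"                                    # Telegraf's default database

def mean_cpu_idle(host, minutes=10):
    """Return the mean CPU idle percentage for one host over the last N minutes."""
    query = ("SELECT mean(\"usage_idle\") FROM \"cpu\" "
             "WHERE \"host\" = '%s' AND time > now() - %dm" % (host, minutes))
    resp = requests.get(INFLUX_URL, params={"db": DATABASE, "q": query}, timeout=10)
    resp.raise_for_status()
    series = resp.json()["results"][0].get("series")
    return series[0]["values"][0][1] if series else None

if __name__ == "__main__":
    print(mean_cpu_idle("compute-001.example.edu"))
```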
Current Tier2 Challenges
- There are drawbacks to combining compute and storage at scale
- Distributed filesystems such as HDFS work best when most nodes in the cluster have similar transfer capabilities
- Upgrading network or storage capabilities for the entire cluster can be expensive or impossible with combined hardware
- A split into two classes of nodes makes sense for us, each focused on a specific operation: fast data transfer or compute
Benefits of Host Classes
- Two classes of hosts will allow us to allocate funds directly for our needs
- No risk of compute jobs and data transfer processes interfering with each other
- We can tune each class of host specifically for the operation it performs
- Newer blade systems and dedicated storage systems require less power and rackspace than our current hardware
- Network cabling can be reduced, with only high-speed links between chassis
Storage
- Commodity disks may be cheaper initially, but the various quality issues we have experienced negate this in the long run
- Cheaper backplanes suffer from the same quality and reliability issues
- HGST has an attractively priced 4U60 enclosure with 60 fast, reliable disks and a 5-year warranty on disks and backplane
- Two hosts can be connected to the two onboard controllers, with each host serving half of the disks
- Once a critical mass of high-performance datanodes is reached, we will be able to retire all of the remaining commodity disks
- We will retire the oldest enterprise disks gradually and move completely to the dedicated storage chassis
Prototype Storage Nodes
[Diagram: a Supermicro Twin 2U (Node 1 and Node 2, each with a 40Gb link; 2x LSI 9380-8e HBAs, 4x 12Gb SAS ports per node) attached to an HGST 4U60 enclosure (SAS Controller 1 and Controller 2); 60 x 6TB SAS disks, 30 served by each node]
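A quick back-of-the-envelope on what one such building block provides; the HDFS replication factor of 2 used here is an assumption for illustration, not a stated site setting.

```python
# Rough capacity arithmetic for one prototype storage building block.
DISKS_PER_ENCLOSURE = 60
DISK_TB = 6
NODES_PER_ENCLOSURE = 2          # Supermicro Twin 2U, each node serving half the disks
ASSUMED_HDFS_REPLICATION = 2     # assumption, not a stated site parameter

raw_tb = DISKS_PER_ENCLOSURE * DISK_TB
raw_per_node_tb = raw_tb // NODES_PER_ENCLOSURE
usable_tb = raw_tb // ASSUMED_HDFS_REPLICATION

print("raw capacity per enclosure: %d TB" % raw_tb)           # 360 TB
print("raw capacity per node:      %d TB" % raw_per_node_tb)  # 180 TB
print("usable at replication %d:   %d TB" % (ASSUMED_HDFS_REPLICATION, usable_tb))  # 180 TB
```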
CephFS Test Configuration
- Performance evaluation of CephFS with a cluster of three storage host configurations
- Will investigate the potential for use in the CMS Tier2
- Deployment should be done in the next two weeks
- Live file transfers at Supercomputing 2016
- Hope to use in production for VM block storage
[Diagram: clients and metadata hosts (SSD RAID10, HA) connected via a 40Gb switch to three storage hosts: SSD journals + SATA; NVMe journals + SAS; SSD journals + SATA with bcache]
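A minimal sketch of the kind of sequential-write probe that could be run against each storage-host flavour to compare them; the mount point paths are hypothetical, and a real evaluation would use a proper benchmark tool rather than this simple loop.

```python
# Minimal sketch: sequential-write throughput probe for comparing mount points.
# Mount point paths are hypothetical; this is not the site's actual benchmark.
import os
import time

MOUNTS = ["/mnt/cephfs-ssd-sata", "/mnt/cephfs-nvme-sas", "/mnt/cephfs-bcache"]
BLOCK = b"\0" * (4 * 1024 * 1024)   # 4 MiB writes
TOTAL_BYTES = 2 * 1024**3           # 2 GiB per test file

def write_throughput(directory):
    """Write a test file with fsync and return throughput in MB/s."""
    path = os.path.join(directory, "throughput-probe.tmp")
    start = time.time()
    with open(path, "wb") as f:
        written = 0
        while written < TOTAL_BYTES:
            f.write(BLOCK)
            written += len(BLOCK)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.time() - start
    os.remove(path)
    return TOTAL_BYTES / elapsed / 1e6

if __name__ == "__main__":
    for mount in MOUNTS:
        if os.path.isdir(mount):
            print("%-28s %.1f MB/s" % (mount, write_throughput(mount)))
```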
Compute
- With the next round of purchasing, all 4- and 8-core nodes will be retired
- By deploying storage separately, we can focus on density and efficiency
- Supermicro has a 6U MicroBlade chassis with 28 nodes (1300 E5-2640v4 threads)
- Redundant 10Gb network links in the backplane for all nodes
- The Intel switch module has 4x 40Gb uplinks and also has SDN capability
- Each node can support two 2.5" 6Gb SATA SSDs, so local job IO is fast
Compute Chassis
[Photos: Supermicro X11 MicroBlade chassis; Intel switch module]
Results
- Rackspace usage and power consumption are markedly reduced: equivalent TwinPro PSU wattage is 21000W, vs 16000W for the 6U MicroBlade
- Data transfer speeds between the storage nodes are greatly increased
- Limits our exposure during a hardware failure by ensuring data is replicated very quickly between high-speed storage nodes
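For reference, the quoted PSU wattages correspond to roughly a 24% reduction; a trivial check using only the numbers above:

```python
# Power comparison using the PSU wattages quoted above.
twinpro_w = 21000
microblade_w = 16000
saving_w = twinpro_w - microblade_w
print("saving: %d W (%.0f%%)" % (saving_w, 100.0 * saving_w / twinpro_w))  # 5000 W (~24%)
```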
Thank You
Stop by our booth at Supercomputing 2016 next month to see live high-speed data transfer demonstrations