
1 CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it. HEPiX Report. Helge Meinhard, Edoardo Martelli, Giuseppe Lo Presti / CERN-IT. Technical Forum / Computing Seminar, 11 November 2011

2 Outline
– Meeting organisation; site reports (Helge Meinhard)
– Networking and security; computing; cloud, grid, virtualisation (Edoardo Martelli)
– Storage; IT infrastructure (Giuseppe Lo Presti)
– 20 years of HEPiX (Helge Meinhard)

3 HEPiX
– Global organisation of service managers and support staff providing computing facilities for HEP
– Covers all platforms of interest (Unix/Linux, Windows, Grid, …)
– Aim: present recent work and future plans, share experience, advise managers
– Meetings ~2 per year (spring in Europe, autumn typically in North America)

4 HEPiX Autumn 2011 (1)
– Held 24–28 October at Simon Fraser University, Vancouver, BC, Canada; hosted jointly by TRIUMF, SFU and the University of Victoria
– Excellent local organisation: Steven McDonald and his team proved up to expectations for the 20th-anniversary meeting
– Nice auditorium
– Vancouver: very lively city, restaurants of all kinds and classes, nice parks, mountains within easy reach… Banquet at 1,100 m altitude in the snow, grizzly bears not far
– Special session on the occasion of HEPiX's 20th anniversary
– Sponsored by a number of companies

5 HEPiX Autumn 2011 (2)
– Format: pre-defined tracks, with conveners and invited speakers per track
– Extremely rich, interesting and packed agenda
– Judging by the number of submitted abstracts, no single hot topic: 8 infrastructure, 8 grid/clouds/virtualisation, 7 network and security, 6 storage, 4 computing… plus 17 site reports
– Special track on the 20th anniversary, with 5 contributions
– Some abstracts submitted late (Thu/Fri before the meeting!), which made planning difficult
– Full details and slides: http://indico.cern.ch/conferenceDisplay.py?confId=138424
– Trip report by Alan Silverman available, too: http://cdsweb.cern.ch/record/1397885

6 HEPiX Autumn 2011 (3)
– 98 registered participants, of which 10 from CERN (11 counting Alan Silverman): Cass, Lefebure, Lo Presti, Martelli, Meinhard, Rodrigues Moreira, Salter, Schröder, (Silverman), Toebbicke, Wartel
– Many sites represented for the first time: Canadian T2s, Melbourne, Ghent, Trieste, Wisconsin, Frascati, …
– Vendor representation: AMD, Dell, Red Hat
– Compare with GSI (spring 2011): 84 participants, of which 14 from CERN; Cornell U (autumn 2010): 47 participants, of which 11 from CERN
– Record attendance for a North American meeting!

7 HEPiX Autumn 2011 (4)
– 55 talks, of which 15 from CERN
– Compare with GSI: 54 talks, of which 13 from CERN; Cornell U: 62 talks, of which 19 from CERN
– Next meetings:
  – Spring 2012: Prague (April 23 to 27)
  – Autumn 2012: Beijing (hosted by IHEP; date to be decided, probably 2nd half of October)

8 Site reports (1): Hardware
– CPU servers: same trends. 12–48-core boxes, AMD and Intel mentioned equally often, 2–4 GB/core; some nodes with 128 GB, even 512 GB. Quite a number of problems reported with A-brand suppliers and their products
– Disk servers: still a number of problems in the interplay of RAID controllers with disk drives (controllers ejecting perfectly healthy drives). Severity of the disk drive supply shortage not yet known at the time of HEPiX
– Tapes: a number of sites mentioned T10kC in production (preferred over LTO at major sites such as FNAL); LTO very popular, many sites investigating (or moving to) LTO5

9 Site reports (2): Software
– OS: quite a few sites mentioned migration to RHEL 6 / SL 6
  – FNAL hired a replacement for Troy Dawson
  – Triggers a bug in Nehalem sleep states
– Windows 7 is in production at many sites
– Exotics: Tru64, Solaris; CentOS
– Storage: Lustre used at at least 7 sites; CVMFS mentioned in at least 6 site reports (of 17); EOS at the CMS T1 at FNAL, where they are quite happy; NFS: GSI getting out, BNL reported bad results with NFS 4.1 tests using NetApp and BlueArc

10 Site reports (3): Software (cont'd)
– Batch schedulers: Grid Engine rather popular, all but IN2P3 going for the Univa version; in fact, not much mention of Oracle this time at all… Some (scalability?) problems with PBS Pro / Torque-MAUI, and negative comments about PBS Pro support. Condor and SLURM mentioned, mostly positively
– Virtualisation: many sites experimenting with KVM, Xen on its way out (often linked with the SL5-to-SL6 migration). Some very aggressive use of virtualisation (gatekeepers, AFS servers, Condor and ROCKS masters, Lustre MGS, …)
– Service management: FNAL and PDSF migrating from Remedy to Service-now

11 Site reports (4): Infrastructure
– Infrastructure: Cube prototype for FAIR (2 storeys, 96 racks, PUE 1.07); LBNL data centre construction hindered by lawsuits
– Configuration management: Puppet mentioned a number of times; Chef and cfengine2/3 used as well

12 Site reports (5): Miscellaneous
– Tendency towards multidisciplinary labs, with more focus on HPC and GPUs than in HEP
– IP telephony / VoIP mentioned at least twice
– Business continuity is a hot topic for major sites; dedicated track at the next meeting

13 Report from HEPiX 2011: Computing, Networking, Security, Clouds, Virtualization. Geneva, 11th November 2011. edoardo.martelli@cern.ch

14 Computing

15 AMD Interlagos
– New AMD 16-core processor: Interlagos
– Interlagos uses the Bulldozer design: two parallel threads per module, extended instruction set, power efficiency (unused cores are switched off), best value per unit
– Better to add cores than clock speed: 50% more performance requires 3 times the power (a rough illustration below)
– Evolution:
  2005: 2 cores, 1.8–3.2 GHz, 7–13 Gflops, 95 W
  2007: 3 cores, 1.9–2.5 GHz, 20–30 Gflops, 95 W
  2008: 4 cores, 2.5–2.9 GHz, 40–46 Gflops, 95 W
  2009: 6 cores, 1.8–2.8 GHz, 43–67 Gflops, 95 W
  2010: 8–12 cores, 1.8–2.6 GHz, 58–120 Gflops, 105 W
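The cores-versus-clock-speed trade-off follows from rough frequency-voltage scaling; the back-of-the-envelope numbers below are my illustration, not from the talk. Dynamic power scales roughly as P ~ f·V², and since voltage scales roughly with frequency, P ~ f³, which reproduces the slide's "50% more performance for 3x the power".

```python
# Back-of-the-envelope illustration (mine, not from the talk): dynamic
# power scales roughly as P ~ f * V^2, and V scales roughly with f,
# so P ~ f^3 for a single core.
speedup = 1.5                  # target: 50% more single-core performance
power_clock = speedup ** 3     # raise the clock: ~3.4x the power
power_cores = speedup          # add cores instead: ~1.5x the power
print(f"clock scaling: ~{power_clock:.1f}x power; "
      f"core scaling: ~{power_cores:.1f}x power")
```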

16 Intel Sandy Bridge / Dell Stampede
– Dell is building Stampede, expected to be among the top ten supercomputers. Commissioned by TACC (Texas Advanced Computing Center); 27.5 M USD from NSF
– 10 petaflops peak; 12,800 Intel Sandy Bridge processors; 272 TB of memory; 14 PB of storage with a 150 GB/s Lustre file system
– Intel Sandy Bridge can execute one floating-point instruction per clock cycle; available in 2012 Q1
– Intel MIC architecture: many cores, with many threads per core
– HEP-SPEC: AMD Interlagos has slower single-core speed, but higher total processor throughput (16 cores vs 8 for Sandy Bridge)

17 CPU benchmarking at GridKa
– Presented the new generation of chips: AMD Interlagos (16 cores) and Intel Sandy Bridge (8 cores)
– Benchmarking for tenders is difficult because performance varies with the version of the software used and with the OS type (32- or 64-bit)

18 Observations
– While the aggregate computing capacity of processors is increasing, single cores are getting slower; single-threaded applications will therefore run slower than before
– To take advantage of new processors, applications have to be rewritten to use multiple threads (a minimal sketch below)
– CPU power is more abundant than disk space and network bandwidth
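To make the rewrite point concrete, here is a minimal sketch (mine, not from the talk) of farming a CPU-bound loop over all cores; the workload and names are illustrative. In Python, CPU-bound threads are serialised by the GIL, so processes are used instead.

```python
# Minimal sketch (not from the talk): converting a serial, CPU-bound
# loop to use all cores via multiprocessing. The workload is illustrative.
import multiprocessing as mp

def simulate_event(seed: int) -> float:
    # Stand-in for a CPU-bound per-event computation
    acc = 0.0
    for i in range(1, 100_000):
        acc += (seed % i + 1) ** 0.5
    return acc

if __name__ == "__main__":
    events = range(1_000)
    # Serial version, limited by single-core speed:
    # results = [simulate_event(e) for e in events]
    # Parallel version, scales with core count instead:
    with mp.Pool(processes=mp.cpu_count()) as pool:
        results = pool.map(simulate_event, events)
    print(f"processed {len(results)} events")
```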

19 Networking and Security

20 LHCONE
– The WLCG computing model is changing, moving towards a full mesh between the sites
– LHCONE is the dedicated network that will interconnect the Tier1s and the major Tier2s and Tier3s
– LHCONE is built on top of Open Exchange Points interconnected by long-distance links provided by R&E network operators
– A work in progress

21 IPv6 at CERN and FZU
– IPv6 deployment has started at CERN and FZU
– IPv6 still lacks some functionality, but it will be necessary
– Changes to management tools will require time and money
– It is not only a matter for the network department: developers, sysadmins and operations will have to act (a sketch of the typical developer-side change below)
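As an illustration of the developer-side work (my example, not from the talk): protocol-agnostic code resolves names with getaddrinfo instead of hard-coding IPv4 addresses or AF_INET, so the same code runs unchanged on dual-stack hosts.

```python
# Minimal sketch (not from the talk): a protocol-agnostic TCP connect.
# getaddrinfo returns both A and AAAA results on a dual-stack host,
# so the same code works over IPv4 and IPv6.
import socket

def connect(host: str, port: int) -> socket.socket:
    last_err = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
            sock.connect(addr)
            return sock  # first address that answers wins
        except OSError as err:
            last_err = err
    raise OSError(f"could not connect to {host}:{port}") from last_err

if __name__ == "__main__":
    s = connect("www.cern.ch", 80)  # works with A, AAAA, or both
    print("connected via", "IPv6" if s.family == socket.AF_INET6 else "IPv4")
    s.close()
```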

22 HEPiX IPv6 WG
– 16 groups from Europe and the US, and one experiment (CMS), have joined the WG
– Testbed activity: an IPv6 VO hosted by INFN has been created, with five connected sites. Tests of grid data transfers will start next month; if OK, CMS will do data transfer tests from December
– Gap analysis activity: the WG will perform a gap analysis of the IPv6 readiness of grid applications. A survey is being prepared. Collaboration with EGI on a source code checker (a toy version of the idea below)
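The source code checker is EGI's project; purely to illustrate the idea, here is a toy scanner (my sketch, not the actual tool) that flags common IPv4-only constructs in C/C++ sources.

```python
# Toy illustration of an IPv4-readiness source checker (not the EGI
# tool): flag calls and constants in C/C++ sources that only work
# with IPv4 and are typically replaced by getaddrinfo / AF_UNSPEC.
import re
import sys
from pathlib import Path

IPV4_ONLY = re.compile(
    r"\b(gethostbyname|gethostbyaddr|inet_addr|inet_aton|inet_ntoa)\b"  # IPv4-only APIs
    r"|\bAF_INET\b"                       # \b keeps AF_INET6 from matching
    r"|\b(?:\d{1,3}\.){3}\d{1,3}\b"       # hard-coded dotted-quad literal
)

def scan(path: Path) -> None:
    for lineno, line in enumerate(path.read_text(errors="replace").splitlines(), 1):
        hit = IPV4_ONLY.search(line)
        if hit:
            print(f"{path}:{lineno}: IPv4-only construct: {hit.group(0)}")

if __name__ == "__main__":
    for arg in sys.argv[1:]:
        for src in Path(arg).rglob("*.[ch]*"):  # .c, .h, .cc, .cpp, .hpp, ...
            scan(src)
```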

23 Computer Security
– Attackers are becoming professionals, motivated by profit
– Trust is being compromised: certification authorities compromised; social networks used to drive users to malicious sites; popular web sites used to spread infections; governments using spying software
– Smartphones are easier to compromise than personal computers
– HEP is also a target: CPU power needed to mine bitcoins
– Primary infection vector: stolen accounts

25 IPv6 Security
– IPv6 has many security weaknesses:
  – by design: it was specified when many IPv4 weaknesses had not yet been exploited
  – by implementation: many stacks are still only partially implemented; specs and RFCs are often inconsistent
  – by configuration: with dual stack, running two protocols at the same time can help attackers evade packet inspection
– The huge address space is more difficult to control or block
– Everything will have to be verified

26 Observations
– Jefferson Lab was hacked: undetected for 6 weeks, offline for 2 weeks, and a long time to get back to full speed
– Lots of interest in LHCONE
– Most (all?) new servers come with 10G NICs, so many sites are buying "cheap", high-density 10G switches. No mention of 40G or 100G
– Not many sites are planning for IPv6, although there is a lot of interest

27 Grids, Clouds and Virtualization

28 Clouds and Virtualization
– Several tools for cloud management presented: CloudMan, OpenNebula, Eucalyptus, OpenStack
– lxcloud: several tools and hypervisors evaluated (OpenNebula, OpenStack, LSF, Amazon EC2)
– Clouds and virtualization at RAL: Hyper-V was chosen; now evaluating OpenStack and StratusLab
– Virtualization WG: working on policy and tools for image distribution

29 Observations
– No clear best/preferred tool
– Many activities ongoing

30 Thank you

31 HEPiX Fall 2011 Highlights: IT infrastructure, Storage. Giuseppe Lo Presti / IT-DSS. CERN, November 11th, 2011

32 IT Infrastructure: 8 talks
– CERN Computing Facilities
– Deska, a fabric management tool
– Scientific Linux
– SINDES: secure file storage and transfer
– Use of OCS for hw/sw inventory
– Configuration management at GSI
– Hardware failures at CERN
– TSM monitoring at CERN

33 CERN CC
– An overview of the current status and plans of the CC
– Cooling issues (as at most sites): addressed for now by raising the room temperature and using outside fresh air. Estimated gain: ~GWh per year!
– Civil engineering works well advanced, to finish by December 2011
– Some water leaks… luckily without any serious consequence for the equipment
– Large-scale off-site hosting: call for tender is out
– Also an overview of the most common failures in the CC. Largely dominant: hard drive failures; MTTF measured at 320 khours, while the specs say 1.2 Mhours (a worked example of what that rate means below)
– A rather long debate after the talk…
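To put the measured MTTF in perspective (my arithmetic; the talk did not give a drive count, so the population size here is a hypothetical assumption): with N drives of MTTF T, the expected failure rate is N/T.

```python
# Worked example (mine, not from the talk) of what an MTTF of 320 khours
# means at data-centre scale. The drive count is a hypothetical assumption.
N_DRIVES = 60_000            # assumed population, for illustration only
MTTF_MEASURED = 320_000      # hours, as measured at CERN
MTTF_SPEC = 1_200_000        # hours, per manufacturer specs
HOURS_PER_WEEK = 24 * 7

for label, mttf in (("measured", MTTF_MEASURED), ("spec", MTTF_SPEC)):
    per_week = N_DRIVES / mttf * HOURS_PER_WEEK
    print(f"{label}: ~{per_week:.0f} drive failures per week")
# measured: ~32 drive failures per week
# spec:     ~8 drive failures per week
```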

34 Fabric Management
– Different solutions in different centres: Deska at FZU, Prague; Chef at GSI, Darmstadt; OCS for hardware inventory at CERN (admittedly not exactly the same scope)
– Same issue everywhere: no one has a clean solution they are happy with. Complaints span from missing features to scalability issues
– What follows is an overview of some of the software used at different centres

35 Fabric Management
– Deska: a language to describe hardware configuration. Based on PostgreSQL + PgPython + git for version control; CLI and Python bindings. Not yet deployed; concerns about it being "too" flexible: you can describe pretty much anything, so what is the real effort in describing a CC?
– Chef at GSI: a "buzzword bingo". Based on the Ruby language (sysadmins were trained); tried in real life on a brand-new batch cluster
– OCS: an external tool being adopted at CERN for inventorying computing resources

36 Scientific Linux
– A "standard" update on SL releases and usage
– Starting with a quote from Linux Format: "if it's good enough for CERN, it's good enough for us". Well, kind of…
– People: Troy Dawson left Fermilab to join Red Hat; two new members have joined the team
– SL 6.1 released in July
– Overall worldwide usage greatly increasing: mostly SL5, SL6 ramping up, SL3(!) still in use

37 Secure file storage and transfer
– With SINDES, the Secure INformation DElivery System
– New version 2, to overcome shortcomings of the current version (e.g. lack of flexibility in authorizations)
– A number of new features, e.g. plug-ins for authentication and authorization, versioning
– To be deployed at CERN during 2012

38 Storage and File Systems: 6 talks
– Storage at TRIUMF
– EMI, the 2nd year
– Storage WG update
– Migrating from dCache to Hadoop
– CASTOR and EOS at CERN
– CVMFS update

39 Storage at TRIUMF
– Disk: 2.1 PB usable (ATLAS)
– Tape: 5.5 PB on LTO4 and LTO5 cartridges, using an IBM high-density library. Quite a painful experience during 2010: issues with the tape inventory only fixed after IBM released a firmware update in October 2010
– Optimizing tape read performance with Tapeguy, an in-house development that reorders staging requests to minimize mounts, provided the requests are large enough. Not always the case…

40 EMI Status
– A (partially political) update on EMI by P. Fuhrmann
– Goal: bring together the different existing grid/DM middlewares and ensure long-term operations. However, long-term planning is still not clear
– First release just out
– Highlights on data management: pNFS is a "done deal"; WebDAV frontend for LFC and SEs with HTTP redirects (completely ignoring the SRM semantics); dCache labs (preliminary): a data access abstraction layer to plug in any storage, working on a proof of concept with Hadoop

41 Storage WG Update
– Goal: compare the storage solutions adopted by HEP
– Report on recent (October 2011) tests at FZK: AFS, NFS, xroot, Lustre, GPFS; use cases taken from ATLAS and CMS. Disclaimer: a moving target!
– Andrei provided details on the setup for each file system
– Results: quite a number of plots. Xroot closing the (previous) gap; the CMS use case is CPU-bound on the client side
– Next candidates to test: Swift (OpenStack), probably HDFS, …

42 Migrating from dCache to HDFS
– Report on the experience at a Tier2: UW Madison, part of US CMS; 1.1 PB usable storage
– Very happy with dCache, yet still willing to migrate to Hadoop; a technical opportunity came in spring 2011: could they migrate to Hadoop in less time than converting dCache to Chimera?
– Many constraints: the migration had to be rollback-capable, idempotent, and online to the maximum extent (a sketch of the idempotency idea below)
– Exploiting the Hadoop FUSE plugin
– Took 2 months, with one day of downtime
– Now "happy", and able to leverage experience in cloud computing when hiring
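As an illustration of the "idempotent" constraint (my sketch, not the site's actual tooling; the paths are hypothetical): a copy pass that skips files already present with a matching checksum can be interrupted and re-run safely, which is exactly what an online, rollback-capable migration needs.

```python
# Illustrative sketch of an idempotent copy pass (not UW Madison's actual
# tool; paths are hypothetical). Re-running it never duplicates work or
# corrupts data: files already present with a matching checksum are skipped.
import hashlib
import shutil
from pathlib import Path

SRC = Path("/dcache/pool")      # hypothetical source namespace
DST = Path("/mnt/hdfs/pool")    # HDFS mounted via the FUSE plugin

def checksum(path: Path) -> str:
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def migrate() -> None:
    for src in SRC.rglob("*"):
        if not src.is_file():
            continue
        dst = DST / src.relative_to(SRC)
        if dst.exists() and checksum(dst) == checksum(src):
            continue  # already migrated: safe to re-run after an interruption
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)

if __name__ == "__main__":
    migrate()  # run as often as needed; each pass only does the missing work
```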

43 CASTOR and EOS at CERN
– Recap of the strategy: CASTOR for the Tier0, EOS for end-user analysis
– Recent improvements in CASTOR: the Transfer Manager for disk scheduling; buffered tape marks to improve tape migration
– EOS is being moved into a production service: a review of the basic design principles; ramping up the installed capacity, migrating CASTOR pools
– A few comments on EOS: J. Gordon: "It seems you like doing many things from scratch"… Support for SRM/BeStMan; …

44 To conclude…
(Photo: Vancouver downtown seen from Grouse Mountain. HEPiX Banquet, October 27th, 2011)

45 20th anniversary (1)
– Banquet on Thursday night: warm thanks to Alan for 20 years in a pivotal role for HEPiX ("HEPiX elder statesman")
– 5 talks on Friday morning, with quite a few early HEPiX attendees present:
  – Alan Silverman: HEPiX from the beginning
  – Les Cottrell: Networking
  – Thomas Finnern: HEPi-X-perience
  – Rainer Toebbicke: 20 years of AFS at CERN
  – Corrie Kost: A personal overview of computing

46 20th anniversary (2)
– HEPiX from the beginning (Alan Silverman):
  – Learning from previous experience of HEP-wide collaboration on VM and VMS
  – Parallel meetings in Europe and North America until 1995
  – Windows (HEPNT) joined in 1997
  – HEPiX working groups: HEPiX scripts; AFS; large cluster SIG; mail; X11 scripts; security; benchmarking; storage; virtualisation; IPv6
  – Another success story: HEP-wide adoption of Scientific Linux
  – Alan's personal rankings:
    Most western meeting(s): Vancouver (not much more so than SLAC)
    Most eastern meeting: Taipei
    Most northern meeting: Umeå
    Most southern meeting: Rio (most dangerous as well…)
    Most secure meeting: BNL
    Most exotic meeting: Taipei
    …

47 20th anniversary (3)
– Alan's conclusion: HEPiX gives the labs value for the money they spend
– Michel Jouvin, current European co-chair: "HEPiX is healthy after 20 years, with plenty of topics to discuss for the next 20!"

