1
CSC Site Update
HP Nordic TIG, April 2008
Janne Ignatius, Marko Myllynen, Dan Still
2
CSC at a Glance
Founded in 1970 as a technical support unit for Univac 1108
Reorganized as a company, CSC - Scientific Computing Ltd., in 1993
All shares to the Ministry of Education of Finland in 1997
Operates on a non-profit principle
Facilities in Keilaniemi, Espoo, since March 2005
MISSION: CSC is the national IT center for science, developing and providing services for universities, research institutes, and industry.
VISION: CSC is well known and appreciated in Finland as well as abroad as a pioneer, collaboration partner, and center of competence in the field of IT for science.
3
CSC’s Services
Funet services
Computing services
Application services
Data services for science and culture
Information management services
4
Louhi - Cray XT4 Supercomputer
1st phase installed 04/2007
1012 computing nodes, each with a 2.6 GHz dual-core AMD Opteron processor
High-bandwidth, low-latency interconnect (SeaStar2)
1-2 GB memory per core
Peak performance 10.6 teraflops
Final configuration (to be installed Q3/2008): core count open, 1-2 GB memory per core
Peak performance of final configuration 70+ teraflops
5
Murska - HP CP4000 BL ProLiant Supercluster
Installed 04/2007, expanded 11/2007
544 compute nodes, each with two 2.6 GHz dual-core AMD Opteron processors
2176 compute cores
4x DDR InfiniBand interconnect
5 TB total memory: 256 nodes * 4 GB, 128 * 8 GB, 128 * 16 GB, 32 * 32 GB
100 TB SFS/Lustre file system
Peak performance 11.3 teraflops
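The Murska totals can be cross-checked from the per-node figures on the slide. The sketch below is not from the presentation. It assumes 2 double-precision floating-point operations per core per clock cycle, which the slide does not state, and the names `NODE_GROUPS` and `murska_totals` are made up for this illustration.

```python
# Node groups as listed on the slide: (number of nodes, memory per node in GB)
NODE_GROUPS = [(256, 4), (128, 8), (128, 16), (32, 32)]

SOCKETS_PER_NODE = 2   # from the slide: two processors per node
CORES_PER_SOCKET = 2   # from the slide: dual-core processors
CLOCK_GHZ = 2.6        # from the slide
FLOPS_PER_CYCLE = 2    # ASSUMPTION: not stated on the slide


def murska_totals():
    """Return (nodes, cores, memory in TB, peak teraflops) for the figures above."""
    nodes = sum(count for count, _ in NODE_GROUPS)
    cores = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET
    memory_tb = sum(count * gb for count, gb in NODE_GROUPS) / 1024
    # GHz * flops/cycle gives gigaflops per core; divide by 1000 for teraflops
    peak_tflops = cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000
    return nodes, cores, memory_tb, peak_tflops


if __name__ == "__main__":
    nodes, cores, memory_tb, peak_tflops = murska_totals()
    print(f"{nodes} nodes, {cores} cores, {memory_tb:.1f} TB, {peak_tflops:.1f} teraflops")
```

Under that assumption the node count, core count, total memory, and peak performance all agree with the figures quoted on the slide.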
6
Murska - HP CP4000 BL ProLiant, cont.
RHEL 4 based HP XC 3.1 cluster operating system
SLURM/LSF
HP-MPI
PGI, PathScale, GNU, TotalView, ACML, …
HP Xtools, collectl, mpe2, …
Blade hardware working surprisingly well
Interconnect working nicely
Disk system also working OK after initial issues
MSA20 disk array failure recovery suboptimal
SFS quota still limited to 4 TB
System constantly in heavy use
7
Murska - HP CP4000 BL Availability
Three unexpected breaks after Nov 2007 upgrades:
29.1.2008: SFS hang, fixed with disk array reset
30.1.2008: Ethernet switch died (in the cabinet where several power supplies had died a few days earlier)
12.3.2008: SFS hang, fixed with disk array reset
System availability since Nov 2007: 95%-100%
System usage since Nov 2007: 30%-100%
8
Sepeli - HP ProLiant DL145 Cluster
Installed 2005
128 (earlier 256) compute nodes
512 cores and 2 TB memory
4x DDR InfiniBand / GigE interconnect
4 TB PVFS2 / NFS disk system
Peak performance 3.1 teraflops
Earlier part of national M-grid, now being dedicated to LHC use (particle collision data analysis)
9
Sepeli - HP ProLiant DL145 Cluster, cont.
RHEL 4 based Rocks 3.1 cluster operating system
SGE
Price/performance over the system lifespan quite satisfactory
InfiniBand hardware very stable
Tight Grid Engine integration with multiple MPI flavors is labor-intensive
DL145 iLO initially unreliable, improved over time
10
Material Sciences National Grid Infrastructure (M-grid)
A joint project of CSC, 7 Finnish universities and Helsinki Institute of Physics, funded by the Finnish Academy for the National Research Infrastructure Program in the Grid area
Aims to build a homogeneous PC-cluster environment with a theoretical peak of approx. 3 teraflops from 350 nodes
Environment
Hardware: Provided by HP. Dual AMD Opteron 1.8-2.2 GHz nodes with 2-8 GB memory, 1-2 TB shared storage, separate 2xGE (communications and NFS), remote administration
OS: NPACI Rocks Cluster Distribution / 64 bit, based on Red Hat Enterprise Linux 3, 4
Grid middleware: NorduGrid ARC Grid MW compiled with Globus 3.2.1 libraries, Sun Grid Engine as LRMS
Centrally managed configuration with Cfengine
CSC administration tasks
Maintains operating system, LRMS, grid middleware, certain libraries
Separate small test cluster for testing new software releases
Tools for system monitoring, integrity checking, etc.
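As a rough plausibility check of the "approx. 3 teraflops" figure, the peak can be bracketed from the clock range on the slide. This is only a sketch under two assumptions that the slide does not state: "Dual AMD Opteron" is read as two single-core processors per node, and each core is taken to perform 2 floating-point operations per clock cycle. The function name `mgrid_peak_tflops` is made up for this illustration.

```python
NODES = 350             # from the slide
SOCKETS_PER_NODE = 2    # from the slide: "Dual AMD Opteron"
CORES_PER_SOCKET = 1    # ASSUMPTION: single-core processors
FLOPS_PER_CYCLE = 2     # ASSUMPTION: not stated on the slide


def mgrid_peak_tflops(clock_ghz):
    """Theoretical peak in teraflops if every node ran at clock_ghz."""
    cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
    return cores * clock_ghz * FLOPS_PER_CYCLE / 1000


if __name__ == "__main__":
    low = mgrid_peak_tflops(1.8)    # slowest clock listed on the slide
    high = mgrid_peak_tflops(2.2)   # fastest clock listed on the slide
    print(f"peak between {low:.2f} and {high:.2f} teraflops")
```

With these assumptions the peak lies between roughly 2.5 and 3.1 teraflops, which is consistent with the approximate figure on the slide. The actual mix of clock speeds across the sites is not given.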
11
Some international activities
PRACE
DEISA
EGEE, EGI, NDGF
HPC-EUROPA, …
12
Thank You! Questions?