QMUL e-Science Research Cluster: Introduction, (New) Hardware, Performance, Software Infrastructure, What still needs to be done.

Slide 2 Alex Martin QMUL e-Science Research Cluster
Background
Formed an e-Science consortium within QMUL to bid for SRIF money etc. (there was no existing central resource); received money in all 3 SRIF rounds so far. Led by EPP + Astro + Materials + Engineering.
Started from scratch in 2002: new machine room, Gb networking. Now have 230 kW of A/C.
Differing needs: other fields tend to need parallel processing support (MPI etc.).
Support effort is a bit of a problem.

Slide 3 Alex Martin QMUL e-Science Research Cluster
History of the High Throughput Cluster
Already in its 4th year (3 installation phases).
In addition, an Astro cluster of ~70 machines.

Slide 4 Alex Martin QMUL e-Science Research Cluster

Slide 5 Alex Martin QMUL e-Science Research Cluster

Slide 6 Alex Martin QMUL e-Science Research Cluster
Dual dual-core 2 GHz Opteron nodes with 8 GByte of memory; the remainder with 4 GByte. Each with 2 x 250 GByte HD.
3Com SuperStack network stack; dedicated second network for MPI traffic.
APC 7953 vertical PDUs.
Total measured power usage seems to be ~1 A/machine, ~ kW total (a back-of-envelope sketch follows below).
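A quick back-of-envelope check of the ~1 A/machine figure, as a hedged Python sketch. The slide does not give the node count, so the counts looped over below are placeholders, and 230 V single-phase mains is an assumption.

# Back-of-envelope power crosscheck (hedged sketch).
# The slide does not state the node count, so the counts below are placeholders.
MAINS_VOLTAGE_V = 230        # assumed UK single-phase mains
CURRENT_PER_NODE_A = 1.0     # measured ~1 A per machine (from the slide)

def total_power_kw(node_count: int) -> float:
    """Approximate total draw in kW for node_count machines at ~1 A each."""
    return node_count * MAINS_VOLTAGE_V * CURRENT_PER_NODE_A / 1000.0

if __name__ == "__main__":
    for n in (100, 200, 280):                      # illustrative cluster sizes only
        print(f"{n} nodes -> ~{total_power_kw(n):.0f} kW")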

Slide 7 Alex Martin QMUL e-Science Research Cluster Crosscheck:

Slide 8 Alex Martin QMUL e-Science Research Cluster
Ordered in the last week of March; 1st batch of machines delivered in 2 weeks, 5 further batches 1 week apart; 3-week delay for proper PDUs.
Cluster cabled up and powered 2 weeks ago.
Currently all production boxes are running legacy SL3/x86.
Issues with scalability of services (Torque/Ganglia); also the shared experimental area is an I/O bottleneck.

Slide 9 Alex Martin QMUL e-Science Research Cluster

Slide 10 Alex Martin QMUL e-Science Research Cluster The cluster has been fairly heavily used: ~40-45% on average.

Slide 11 Alex Martin QMUL e-Science Research Cluster Tier-2 Allocations

Slide 12 Alex Martin QMUL e-Science Research Cluster
S/W Infrastructure
MySQL database containing all static info about machines and other hardware, plus network and power configuration.
S/W configuration info (OS version and release tag) is kept in a Subversion repository.
Automatic (re)installation and upgrades use a combination of both: TFTP/Kickstart pulls dynamically generated pages from the web server (Mason). A minimal sketch of this scheme follows below.
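To make the scheme above concrete, here is a minimal sketch of the idea: a per-host Kickstart file assembled from static host data (standing in for a MySQL row) and a release tag (which would come from the Subversion repository). The real setup served Mason-generated pages over HTTP; plain Python string formatting stands in for that here, and the field names, example record, release tag, and template content are all assumptions.

# Hedged sketch of the slide-12 scheme: per-host install configuration is
# assembled from (a) static host info kept in MySQL and (b) an OS release tag
# kept under version control, and served to Kickstart at install time.
# All names and values below are illustrative assumptions.

HOST_RECORD = {            # stands in for a row from the MySQL inventory DB
    "hostname": "node042",
    "mac": "00:16:3e:aa:bb:cc",
    "ip": "10.0.1.42",
    "rack": "A3",
}
RELEASE_TAG = "sl3-x86-prod"   # would come from the Subversion repository

KICKSTART_TEMPLATE = """\
install
url --url http://installserver/{release}/
network --device eth0 --bootproto static --ip {ip} --hostname {hostname}
%post
echo "built from release {release}" > /etc/cluster-release
"""

def render_kickstart(host: dict, release: str) -> str:
    """Render an (abbreviated) per-host Kickstart file from inventory data and a release tag."""
    return KICKSTART_TEMPLATE.format(release=release, **host)

if __name__ == "__main__":
    print(render_kickstart(HOST_RECORD, RELEASE_TAG))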

Slide 13 Alex Martin QMUL e-Science Research Cluster

Slide 14 Alex Martin QMUL e-Science Research Cluster
Ongoing work
Commission an SL4/x86_64 service (~30% speed improvement); assume non-HEP usage initially. Able to migrate boxes on demand.
Tune MPI performance for jobs up to ~160 CPUs (non-IP protocol?).
Better integrated monitoring (Ganglia + PBS + OpenSMART? + existing DB); dump Nagios?
Add 1-wire temperature + power sensors (see the sketch below).
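For the 1-wire sensors item, a minimal sketch of how temperature readings could be collected on Linux, assuming the kernel w1_therm driver exposes DS18B20-style sensors under /sys/bus/w1/devices; in practice the readings would be inserted into the existing cluster database rather than printed.

# Hedged sketch: read 1-wire temperature sensors via the Linux w1 sysfs interface.
# Assumes the w1_therm driver is loaded; sensor IDs and the print-instead-of-DB
# output are illustrative only.
import glob

def read_w1_temperatures() -> dict:
    """Return {sensor_id: temperature_C} for all attached 1-wire thermal sensors."""
    readings = {}
    for path in glob.glob("/sys/bus/w1/devices/28-*/w1_slave"):
        sensor_id = path.split("/")[-2]
        with open(path) as f:
            text = f.read()
        if "YES" in text and "t=" in text:          # CRC check passed, reading present
            millideg = int(text.rsplit("t=", 1)[1])
            readings[sensor_id] = millideg / 1000.0
    return readings

if __name__ == "__main__":
    for sensor, temp in read_w1_temperatures().items():
        print(f"{sensor}: {temp:.1f} C")   # in production this would go into the cluster DB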

Slide 15 Alex Martin QMUL e-Science Research Cluster
Ongoing work (continued)
Learn how to use a large amount of distributed storage in an efficient and robust way.
Need to provide a POSIX filesystem (probably by extending poolfs, or something like Lustre).