1
HEPiX Fall 2001 Report (2)
NERSC, Berkeley
Wolfgang Friebel, 16.11.2001, C5 report
2
Further topics covered
- Batch (Sun Grid Engine Enterprise Edition)
- Distributed Filesystems (Benchmarks)
- Security (again) (the concept at NERSC)
3
Batch systems
- Two talks on SGEEE (formerly known as Global Resource Director (GRD) or Codine), see below
- FNAL presented a new version of their batch system
  - Main scope is resource management, not load balancing
  - FBSNG, written primarily in Python; a Python API exists
  - Comes with Kerberos 5 support
- NERSC reported experiences with LSF
  - Not very pleased with LSF, will also evaluate alternatives
4
SGEEE Batch
- Ease of installation from source
- Access to source code
- Chance of integration into a monitoring system
- API for C and Perl
- Excellent load balancing mechanisms (4 scheduler policies)
- Managing the requests of concurrent groups
- Mechanisms for recovery from machine crashes
- Fallback solutions for dying daemons
- Weakest point is AFS integration and the token prolongation mechanism (basically the same code as for LoadLeveler and for older LSF versions)
5
SGEEE Batch
- SGEEE has all ingredients to build a company-wide batch infrastructure
  - Allocation of resources according to policies ranging from departmental policies to individual user policies
  - Dynamic adjustment of priorities for running jobs to meet policies
  - Supports interactive jobs, array jobs, parallel jobs
  - Can be used with Kerberos (4 and 5) and AFS; Globus integration is underway
- SGEEE is open source, maintained by Sun
  - Deeper knowledge can be gained by studying the code
  - The code can be enhanced (examples: more schedulers, tighter AFS integration, monitoring-only daemons)
  - Code is centrally maintained by a core developer team
- Could play a more important role in HEP (component of a grid environment, open industry-grade batch system as recommended solution within HEPiX?)
6
Scheduling policies
- Within SGEEE, tickets are used to distribute the workload
- User-based functional policy
  - Tickets are assigned to projects, users and jobs. More tickets mean higher priority and faster execution (if concurrent jobs are running on a CPU)
- Share-based policy
  - Certain fractions of the system resources (shares) can be assigned to projects and users. Projects and users receive those shares during a configurable moving time window (e.g. CPU usage for a month based on usage during the past month)
- Deadline policy
  - By redistributing tickets, the system can assign jobs an increasing weight to meet a certain deadline. Can be used by authorized users only
- Override policy
  - Sysadmins can give additional tickets to jobs, users or projects to temporarily adjust their relative importance
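The ticket idea can be illustrated with a small sketch. This is not SGEEE code or its API: the `Job` class, its fields and the `cpu_shares` function are hypothetical names, and the assumption is simply that concurrent jobs on one CPU receive time in proportion to their tickets, with override tickets added on top.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    tickets: int               # tickets from the functional policy (hypothetical field)
    override_tickets: int = 0  # extra tickets granted by a sysadmin (override policy)


def cpu_shares(jobs):
    """Return each job's fraction of a contended CPU, proportional to its tickets."""
    totals = {job.name: job.tickets + job.override_tickets for job in jobs}
    all_tickets = sum(totals.values())
    if all_tickets == 0:
        # No tickets anywhere: fall back to an equal split.
        return {name: 1.0 / len(totals) for name in totals} if totals else {}
    return {name: count / all_tickets for name, count in totals.items()}


if __name__ == "__main__":
    jobs = [Job("analysis", 300), Job("simulation", 100)]
    print(cpu_shares(jobs))
    # A sysadmin temporarily boosts the simulation job.
    jobs[1].override_tickets = 200
    print(cpu_shares(jobs))
```

In this toy model the override raises the simulation job from a quarter to half of the CPU without touching the tickets assigned by the functional policy, which is the kind of temporary adjustment the override policy describes.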
7
Distributed Filesystems
- Candidates for benchmarking
  - NFS versions 2 and 3
  - GFS (University of Minnesota/Sistina Software)
  - AFS
  - GPFS (IBM cluster file system, being ported to Linux)
  - PVFS (Parallel Virtual Filesystem)
- Not taken
  - GPFS: IBM could not get it working at NERSC under Linux (not ready?)
  - PVFS: unstable in tests, single point of failure (metadata server)
  - AFS: slower than NFS, tests done elsewhere, successfully running
  - GFS: designed for SAN, runs over TCP with significant performance penalties, lock management not mature, stability for a high number of clients not expected to be good. Good candidate for SANs
8
Distributed Filesystems
- Conclusion for NERSC: only NFS remains, AFS too heavy for them
- The talk discussed various combinations of Linux kernel versions (2.2.x and 2.4.x), NFS clients (v2 and v3) and servers (v2 and v3)
- Benchmarking tools used
  - Bonnie
  - IOzone
  - Postmark
- Benchmarked equipment
  - Dual 866 MHz PIII with 512 MB RAM
  - Escalade 6200 series 4-channel IDE RAID, with 3 striped 72 GB drives
- Results
  - By carefully choosing kernel and NFS versions, throughput can be increased
  - For many more details, consult the talk
- Other sites reported very bad NFS performance (confirms the NERSC finding that tuning for NFS is a must)
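To get a first impression of how a mounted filesystem behaves before running the full benchmark tools, a sequential write and read test can be sketched in a few lines. This is not Bonnie, IOzone or Postmark and does not reproduce the measurements from the talk; the function name, the default sizes and the directory argument are assumptions made for illustration.

```python
import os
import tempfile
import time


def sequential_throughput(directory, size_mb=8, block_kb=64):
    """Write then read one file in `directory`; return (write_MBps, read_MBps)."""
    block = b"\0" * (block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push data out of local buffers before stopping the clock
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(block)):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)
    return size_mb / write_s, size_mb / read_s


if __name__ == "__main__":
    # Point this at an NFS mount to test it; the temp directory is only a stand-in.
    write_mbps, read_mbps = sequential_throughput(tempfile.gettempdir())
    print(f"write {write_mbps:.1f} MB/s, read {read_mbps:.1f} MB/s")
```

The read figure is likely to be served from the client cache unless the file is much larger than RAM, which is one reason dedicated tools are preferable for real comparisons between kernel and NFS versions.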
9
Distributed Filesystems: GFS
- Caspur is looking for a filesystem attached to a multinode Linux farm
- Looked for SAN-based solutions
  - NFS and GPFS discarded (NFS: performance, GPFS: extra HW & SW)
- Have chosen GFS, but are trying to use GFS over IP (see next slide)
  - By using a SCSI-to-IP converter (Axis from Dothill) they would be able to set up a serverless GFS
  - Currently conflicting kernel requirements for GFS and Axis
  - Issues probably solved (11/2001) with equipment from Cisco
- Looks promising to them, more investigations to come
10
[Figure-only slide: GFS over IP; no text recovered]
11
Computer Security at NERSC
- Very open community, need a balance between security and availability
- Main concepts used
  - Intrusion detection using BRO (in-house development, open source)
  - Immediate actions against attackers (“shunning”)
  - Scanning systems for vulnerabilities
  - Keeping systems/software up to date
  - Firewall for critical assets only (operation consoles, development systems)
  - Virus wall for incoming emails
  - Top-level staff in computer security and networking
- Observed ever-increasing scans (30-40 a day!!), threats
- Were able to track down hackers and reconstruct the attacks
12
Computer Security: BRO
- Passively monitors the network
- Carefully designed to avoid packet drops at high speeds
  - 622 Mbps (OC-12)
- Two main components
  - Event engine, converts network traffic into events (compression)
  - Policy script interpreter (interprets output of event handlers)
- BRO interacts with the border router to drop hosts immediately (using ACLs) on attacks
- BRO records all input in interactive sessions
  - Allows the data to be reconstructed even if type-ahead or completion mechanisms are used
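The two-stage design (an event engine that condenses traffic, followed by a policy layer that decides what to do) can be sketched as follows. This is not BRO's code or its policy language: the packet tuples, the `event_engine` and `policy` functions and the port threshold are all hypothetical, and the "shun" action is only recorded in a list rather than sent to a router.

```python
from collections import defaultdict


def event_engine(packets):
    """Reduce raw packets to one 'connection_attempt' event per (src, dst_port)."""
    seen = set()
    for src, dst_port in packets:
        if (src, dst_port) not in seen:
            seen.add((src, dst_port))
            yield ("connection_attempt", src, dst_port)


def policy(events, max_ports=3):
    """Return hosts to shun: sources that probed more than max_ports distinct ports."""
    ports = defaultdict(set)
    shunned = []
    for kind, src, dst_port in events:
        if kind != "connection_attempt":
            continue
        ports[src].add(dst_port)
        if len(ports[src]) > max_ports and src not in shunned:
            shunned.append(src)  # a real system would install a router ACL here
    return shunned


if __name__ == "__main__":
    scanner = [("10.0.0.5", port) for port in (21, 22, 23, 25, 80)]
    normal = [("10.0.0.9", 80)] * 3  # repeated packets collapse into one event
    print(policy(event_engine(scanner + normal)))
```

Separating the stages means the high-volume packet handling stays simple, while the site-specific decision (here, a port-count threshold) lives in a small piece of policy code that can be changed quickly.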
13
Computer Security: BRO
- Some of the analysis is done in real time, deeper analysis is done offline once a day
- NERSC relies heavily on intrusion detection by BRO
- NERSC was able to react quickly to the “Code Red” worm (changes to BRO)
  - Subsequently “Nimda” did very little damage
- Many more useful tips on practical security (have a look at the talk)