The Terabyte Analysis Machine Project: Data-Intensive Computing in Astronomy
James Annis, Gabriele Garzoglio, Peretz Partensky, Chris Stoughton
The Experimental Astrophysics Group, Fermilab
TAM Design
The TAM is a compact analysis cluster designed to:
- Bring compute power to bear on large datasets
- Make high I/O rate scans through large datasets (see the sketch below)

Compute power brought to large datasets (hardware):
- 10-processor Linux cluster
- High memory (1 GB RAM/node) and local disk (140 GB/node)
- SAN to a terabyte of global disk (Fibre Channel network and hardware RAID)
- Global File System (9x faster reads than NFS; hardware limited)

High I/O rate scans (software):
- The SDSS science database SX
- The Distance Machine framework
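A minimal sketch of the scan pattern the TAM is built for: each worker reads a chunk of a catalog held on the shared disk and applies a simple cut. This is an illustration only, not TAM or SX code; the path, file format, column name, and cut below are hypothetical.

import csv
import glob
from multiprocessing import Pool

CATALOG_GLOB = "/GFS/sdss/catalog/*.csv"   # hypothetical catalog chunks on the global disk
R_MAG_COLUMN = "r_mag"                     # hypothetical column name
R_MAG_LIMIT = 21.0                         # hypothetical selection cut

def scan_chunk(path):
    """Read one catalog chunk and count the objects passing the magnitude cut."""
    n_pass = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if float(row[R_MAG_COLUMN]) < R_MAG_LIMIT:
                n_pass += 1
    return n_pass

if __name__ == "__main__":
    chunks = sorted(glob.glob(CATALOG_GLOB))
    # One worker per CPU here; on the TAM the same idea is spread across
    # nodes (e.g. as Condor jobs), all reading from the shared /GFS area.
    with Pool() as pool:
        counts = pool.map(scan_chunk, chunks)
    print(f"{sum(counts)} objects pass the cut in {len(chunks)} chunks")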
TAM Hardware (April 2001)
[Diagram: 5 compute nodes with 0.5 terabyte of local disk, linked by a Fast Ethernet switch (Gigabit uplink to slow-access data: IDE disk farms and Enstore) and by a Fibre Channel switch to 1 terabyte of global disk.]
The Terabyte Analysis Machine
- System integrator: Linux NetworX; ACE cluster control box
- Compute nodes: Linux NetworX; dual 600 MHz Pentium III; ASUS motherboard; 1 GB RAM; 2x36 GB EIDE disks; Qlogic 2100 HBA
- Ethernet: Cisco Catalyst 2948G
- Fibre Channel: Gadzoox Capellix 3000
- Global disk: DotHill SanNet 4200; dual Fibre Channel controllers; 10x73 GB Seagate Cheetah SCSI disks
- Software: Linux 2.2.19; Qlogic drivers; GFS V4.0; Condor
GFS: The Global File System
- From Sistina Software (ex-University of Minnesota); open source (GPL, now the Sistina Public License); runs on Linux and FreeBSD
- 64-bit files and file system
- Distributed, serverless metadata
- Data synchronization via global, disk-based locks (pictured by analogy below)
- Journaling and node cast-out
- Three major pieces: the network storage pool driver (on the nodes), the file system (on disk), and the locking modules (on disk or an IP server)
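The disk-based locking idea can be pictured by analogy. The sketch below is not GFS code (GFS takes fine-grained locks through the storage hardware or an IP lock server, as described on the later slides); it only shows the general pattern of serialising updates through a lock that lives on shared storage, here with a POSIX advisory lock. The lock-file path is hypothetical.

import fcntl

LOCKFILE = "/GFS/locks/shared-area.lock"   # hypothetical lock file on the shared disk

def with_global_lock(do_update):
    """Run do_update() only while holding an exclusive lock on LOCKFILE."""
    with open(LOCKFILE, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)   # blocks until no other holder remains
        try:
            do_update()
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)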
GFS Performance
- Test setup: 5 nodes, one 5-disk RAID
- Results:
  - RAID limited at 95 MB/s; above 15 threads, limited by disk head movement
  - Linear rate increase before the hardware limit (modelled in the sketch below)
  - Circa 9x faster than NFS
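A rough model of the numbers above (an assumption drawn from the two quoted figures, not a fit to the measurements): the aggregate read rate grows about linearly with thread count until it hits the ~95 MB/s RAID ceiling, which was reached at roughly 15 threads.

RAID_CEILING_MB_S = 95.0        # measured hardware ceiling
SATURATION_THREADS = 15         # thread count at which the ceiling was reached
PER_THREAD_MB_S = RAID_CEILING_MB_S / SATURATION_THREADS  # ~6.3 MB/s per thread (implied)

def aggregate_rate(threads):
    """Estimated aggregate GFS read rate in MB/s for a given thread count."""
    return min(threads * PER_THREAD_MB_S, RAID_CEILING_MB_S)

for t in (1, 5, 10, 15, 20, 30):
    print(f"{t:2d} threads -> ~{aggregate_rate(t):5.1f} MB/s")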
Fibre Channel
- The Fibre Channel hardware has performed flawlessly, with no maintenance required
- Qlogic host bus adapters (single channel)
- Gadzoox Capellix Fibre Channel switch: one port per node, two ports per RAID box
- Dot Hill hardware RAID system with dual FC controllers
- Host bus adapters: ~$800/node
- Switches: ~$1000/port
- The HBA shows up as a SCSI device (/dev/sda) on Linux
- The HBA has driver code and firmware code; the driver code must be downloaded and compiled into the kernel
- We haven't explored Fibre Channel's ability to connect machines over distances of kilometers
Global File System I
- We have never lost data to a GFS problem
- Untuned GFS clearly outperformed untuned NFS
- Linux kernel buffering is an issue ("but I edited that file...")
- The Sistina mailing list is very responsive
- The Linux kernel must be patched (2.2.19 or 2.4.6)
- Objectivity doesn't behave on GFS: the federation files must be duplicated for each machine that wants access
- GFS itself is on the complicated side and is unfamiliar to sysadmins
- We haven't explored using GFS machines as file servers
Global File System II
- The biggest issues are node death, power-on, and IP locking
- GFS is a journaling file system: when a node dies, who replays its journal?
- STOMITH ("shoot the other machine in the head"): the cluster must be able to power cycle a dead node so that the other nodes can replay its journal
- Power-on is scriptable via /etc/rc.d/init.d, but we have never survived a power loss without human intervention at power-up
- IP locking:
  - DMEP, the lock protocol held in the storage devices, is very new, with support from only two vendors
  - memexpd is an IP lock server: it allows any hardware for the disk, but it is a single point of failure and a potential throughput bottleneck
Disk Area Arrangement
- Data area on /GFS
- Home area on /GFS
- Standard products and UPS available from NFS
- Normal user areas (desktops) available from NFS
- The /GFS area is not NFS-shared to the desktops
- Large amounts of local (IDE) disk are available on each node, but are not accessible from the other nodes
The Data Volume
- 15 terabytes of corrected frames (the 2-d map)
- 4 terabytes of atlas images, binned sky, and masks
- 1 terabyte of complex object catalogs: 120+ attributes; radial profiles; links to atlas images and spectra
- 0.1 terabyte of spectra catalogs (the 3-d map)
Summary
- Astronomy faces a data avalanche
- We are exploring "datawolves", clusters aimed at large datasets. TAM is a university-class* analysis datawolf.
- Hardware: serverless data sharing amongst the cluster nodes
- Software:
  - The Distance Machine framework
  - ANN algorithm library linked to SX (illustrated below)
  - SX re-indexing and re-clustering optimally for ANN
  - CORBA data serving

* In Fermilab terms, "trailer class"
[Image: a z=0.2 cluster of galaxies]
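The ANN item refers to approximate nearest-neighbour searching over catalog attributes. The sketch below shows the kind of neighbour query involved, using a k-d tree over made-up data; it is not the Distance Machine, SX, or the ANN library itself, and the choice and scaling of attributes are hypothetical.

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
# Hypothetical catalog: (ra, dec, g-r colour) for 100,000 objects.
# In practice the attributes would be rescaled to comparable units first.
catalog = np.column_stack([
    rng.uniform(0.0, 360.0, 100_000),   # ra  [deg]
    rng.uniform(-10.0, 10.0, 100_000),  # dec [deg]
    rng.normal(0.8, 0.3, 100_000),      # g-r colour
])

tree = cKDTree(catalog)                  # build the spatial index once
target = np.array([180.0, 0.0, 0.9])     # hypothetical query object
dist, idx = tree.query(target, k=10)     # its 10 nearest neighbours
print("neighbour indices:", idx)
print("distances:", dist)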
Links
http://www-tam.fnal.gov
http://projects.fnal.gov/act/kdi.html
http://sdss.fnal.gov:8000/~annis/astroCompute.html
The Sloan Digital Sky Survey
- Dedicated telescope: extremely wide-field optics; open-air, no-dome design
- The imaging camera: 138 megapixels; 5 band passes; 2.5-degree field of view
- The twin spectrographs: 660 fibers; 3-degree field of view; high resolution (0.3 nm); 400-900 nm coverage
SDSS Science Aims
[Images: SDSS data, Spring 2000 (left); Virgo Consortium Hubble Volume simulation (right)]
We will map the local universe (left) better than the numerical simulations (right) can predict.