“UCF” Computing Capabilities at UNM HPC
Timothy L. Thomas, UNM Dept of Physics and Astronomy
Santa Fe, 6/18/03
I have a 200K SU (150K LL CPU hour) grant from the NRAC of the NSF/NCSA, with which UNM HPC (“AHPCC”) is affiliated.
Peripheral Data vs. Simulation
Simulation: muons from central HIJING (QM02 Project07)
Data: centrality by Perp > 60
(Stolen from Andrew…)
Simulated Decay Muons
QM’02 Project07 PISA files (central HIJING).
Closest cuts possible from the PISA file to match data: parent pT > 1 GeV/c, original parent theta 155-161 degrees.
Investigating the possibility of keeping only muon and parent hits for reconstruction.
17,100 total events distributed over Z = ±10, ±20, ±38; more events are available, but they only matter for the smallest error bar.
Zeff ~ 75 cm (not in fit).
Selection: "(IDPART==5 || IDPART==6) && IDPARENT >6 &&IDPARENT 155 && PTHE_PRI 2002 && PTOT_PRI*sin(PTHE_PRI*acos(0)/90.) > 1."
(Stolen from Andrew…)
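A hedged PyROOT sketch of how parent-level cuts like these (muon GEANT codes 5/6, parent pT > 1 GeV/c, parent theta between 155 and 161 degrees) might be applied to a PISA ntuple. The file name, tree name, and the exact comparison operators in the ID and theta cuts are assumptions, not taken from the slide; the pT term uses acos(0)/90 = pi/180 to convert degrees to radians, as in the quoted selection.

```python
# Hedged sketch only: applying decay-muon parent cuts to a PISA ntuple with PyROOT.
# File and tree names are hypothetical; operators in the ID/theta cuts are assumed.
import ROOT

f = ROOT.TFile.Open("pisa_central_hijing_project07.root")  # hypothetical file name
tree = f.Get("pisa_tree")                                   # hypothetical tree name

cut = ("(IDPART==5 || IDPART==6)"                    # GEANT codes 5/6: mu+ / mu-
       " && IDPARENT > 6"                            # exclude photon/lepton parents (GEANT codes 1-6)
       " && PTHE_PRI > 155. && PTHE_PRI < 161."      # parent theta 155-161 degrees (assumed operators)
       " && PTOT_PRI*sin(PTHE_PRI*acos(0)/90.) > 1.")  # parent pT > 1 GeV/c

# "goff" suppresses graphics; Draw returns the number of entries passing the cut.
n_pass = tree.Draw("PTOT_PRI", cut, "goff")
print(f"{n_pass} simulated muons pass the parent cuts")
```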
Now at UNM HPC:
PBS
Globus 2.2.x
Condor-G / Condor
(GDMP)
…all supported by HPC staff.
In progress: a new 1.2 TB RAID 5 disk server, to host:
AFS cache
PHENIX software
ARGO file catalog (PostgreSQL)
Local Objectivity mirror
Globus 2.2.x (GridFTP and more…)
Pre-QM2002 experience with globus-url-copy…
Easily saturated UNM bandwidth limitations (as they were at that time).
PKI infrastructure and sophisticated error handling are a real bonus over bbftp. (One bug, known at the time, is being / has been addressed.)
(At left: 10 streams used; transfer rates in KB/sec.)
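For reference, a minimal sketch of driving a multi-stream transfer like this from Python; the endpoints are placeholders, and the exact globus-url-copy flags can vary between Globus Toolkit versions.

```python
# Hedged sketch: wrapping a parallel-stream GridFTP transfer.
# "-p 10" requests 10 parallel TCP streams, matching the 10 streams noted above.
# Source and destination URLs are placeholders, not real endpoints.
import subprocess

SRC = "gsiftp://some.remote.host/path/to/datafile"   # placeholder GridFTP source
DST = "file:///scratch/incoming/datafile"            # placeholder local destination

result = subprocess.run(
    ["globus-url-copy", "-p", "10", SRC, DST],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    # globus-url-copy reports problems on stderr; surface them for bookkeeping.
    raise RuntimeError(f"transfer failed: {result.stderr.strip()}")
```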
Multi-Jet cross section (theory) calculations, run using Condor(/PVM)… Three years of accumulated CPU time on desktop (MOU) machines at HPCERC and at the University of Wisconsin. Very CPU-intensive calculations… 6- and 9-dimensional Monte Carlo integrations: A typical job runs for a week and produces only about 100 KB of output histograms, such as those displayed here.
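As a rough illustration of why such jobs are CPU-bound yet produce only ~100 KB of output, here is a minimal plain Monte Carlo integration sketch in six dimensions; the integrand is a stand-in, not the actual multi-jet matrix element, and none of the Condor/PVM machinery is shown.

```python
# Minimal sketch of a plain (unweighted) Monte Carlo integration in 6 dimensions.
# The integrand below is a smooth stand-in on the unit hypercube, chosen only to
# show the structure of such a job: huge numbers of samples, tiny output.
import math
import random

def integrand(x):
    # Hypothetical test function; the real jobs integrate multi-jet cross sections.
    return math.exp(-sum(xi * xi for xi in x))

def mc_integrate(f, dim, n_samples, seed=12345):
    rng = random.Random(seed)        # fixed seed for reproducibility
    total = total_sq = 0.0
    for _ in range(n_samples):
        fx = f([rng.random() for _ in range(dim)])
        total += fx
        total_sq += fx * fx
    mean = total / n_samples
    var = total_sq / n_samples - mean * mean
    return mean, math.sqrt(max(var, 0.0) / n_samples)   # estimate and statistical error

if __name__ == "__main__":
    estimate, error = mc_integrate(integrand, dim=6, n_samples=1_000_000)
    print(f"integral over [0,1]^6 ~ {estimate:.6f} +/- {error:.6f}")
```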
LLDIMU.HPC.UNM.EDU
RAID op system issues
Easy re-installation / update of the op sys
  Grub or Lilo? (MBR or /boot?)
  Machine has an IDE CDROM (but not a burner)!!!
  Rescue CDs and/or floppies…
Independence of RAID array
  o (1.5 hours for RAID 5 verification step)
  o Should install ext3 on the RAID.
Partitioning of the system disk:
  o Independence of /home area
  o Independence of /usr/local area?
  o Jonathan says: Linux can’t do more than a 2 GB swap partition
  o Jonathan says: / /usr/local/ /home/ (me: /home1/ /home2/ …?)
NFS issues…
  o Synchronize UID/GIDs between RAID server and LL.
RAID op system issues: Compilers and glibc…
RAID op system issues: File systems…
What quotas?
Ext3? (Quotas working OK?)
ReiserFS? (Need special kernel modules for this?)
RAID op system issues: Support for the following apps:
RAID software
Globus…
PHENIX application software
Objectivity
  o gcc 2.95.3
PostgreSQL
OpenAFS
Kerberos 4
RAID op system issues: Security issues…
IP#: fixed or DHCP?
What services to run or avoid?
  o NFS…
Tripwire or equiv…
Kerberos (for OpenAFS)…
Globus…
ipchains firewall rules; /etc/services; /etc/xinetd config; etc…
RAID op system issues: Application-level issues…
Which framework? Both?
Who maintains the framework, and how can this job be properly divided up among locals?
SHOULD THE RAID ARRAY BE PARTITIONED, a la the PHENIX counting house buffer boxes’ /a and /b file systems?
Resources
Filtered events can be analyzed, but not ALL PRDF events; many triggers overlap.
Assume 90 KByte/event and 0.1 GByte/hour/CPU.

Trigger                     Lumi[nb^-1]  #Event[M]  Size[GByte]  CPU[hour]  100CPU[day]  Signal (mu-mu / mu-e / e-mu)
ERT_electron                193          13.0       1170         11700      4.9          1
MUIDN_1D&BBCLL1             238          34.0       3060         30600      12.8         1 1 1
MUIDN_1D&MUIDS_1D&BBCLL1    59           0.2        18           180        0.1          1
MUIDN_1D1S&BBCLL1           254          4.8        432          4320       1.8          1 1 1
MUIDN_1D1S&NTCN             230          18.0       1620         16200      6.8          1
MUIDS_1D&BBCLL1             274          10.7       963          9630       4.0          1 1 1
MUIDS_1D1S&BBCLL1           293          1.3        117          1170       0.5          1
MUIDS_1D1S&NTCS             278          5.0        450          4500       1.9          1
ALL PRDF                    350          6600.0     33,000       330,000    137.5
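The derived columns follow from the two stated assumptions (90 KB/event, 0.1 GB/hour/CPU); a small sketch that reproduces a few of the rows above:

```python
# Back-of-the-envelope check of the table above, using the slide's assumptions:
# 90 KB per event and 0.1 GB processed per hour per CPU.  Event counts (in
# millions) are taken from the table; size, CPU hours, and 100-CPU days are derived.

KB_PER_EVENT = 90.0
GB_PER_HOUR_PER_CPU = 0.1

def trigger_cost(events_millions):
    """Return (size in GB, CPU hours, days on 100 CPUs) for one trigger sample."""
    size_gb = events_millions * 1e6 * KB_PER_EVENT * 1e3 / 1e9   # KB -> bytes -> GB
    cpu_hours = size_gb / GB_PER_HOUR_PER_CPU
    days_100cpu = cpu_hours / 100.0 / 24.0
    return size_gb, cpu_hours, days_100cpu

for name, n_mevents in [("ERT_electron", 13.0),
                        ("MUIDN_1D&BBCLL1", 34.0),
                        ("MUIDN_1D1S&NTCN", 18.0)]:
    size, hours, days = trigger_cost(n_mevents)
    print(f"{name:18s} {size:7.0f} GB {hours:8.0f} CPU-h {days:5.1f} days on 100 CPUs")
```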
Rough calculation of real-data processing (I/O-intensive) capabilities:
10 M events, PRDF-to-{DST+x}, both mut & mutoo; assume 3 sec/event (*1.3 for LL), 200 KB/event.
One pass: 7 days on 50 CPUs (25 boxes), using 56% of LL local network capacity.
My 200K “SU” (~150K LL CPU hours) allocation allows for 18 of these passes (4.2 months).
3 MB/sec Internet2 connection = 1.6 TB / 12 nights (MUIDN_1D1S&NTCN).
(Presently) LL is most effective for CPU-intensive tasks: simulations can easily fill the 512 CPUs; e.g., QM02 Project 07.
Caveats: “LLDIMU” is a front-end machine; the LL worker node environment is different from a CAS/RCS node (P. Power…).
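A quick check of these round numbers from the slide's own inputs; the *1.3 LosLobos factor is left out here (the 7-day and 18-pass figures come out without it), and a "night" of Internet2 transfers is assumed to be about 12 hours.

```python
# Reproducing the rough capacity estimate above from its stated inputs.
N_EVENTS = 10e6            # events per reconstruction pass
SEC_PER_EVENT = 3.0        # processing time per event (LL *1.3 factor omitted; assumption)
N_CPUS = 50                # 25 dual-CPU boxes
ALLOCATION_HOURS = 150e3   # ~150K LL CPU hours
I2_MB_PER_SEC = 3.0        # Internet2 throughput
HOURS_PER_NIGHT = 12.0     # assumed length of a transfer "night"

cpu_hours_per_pass = N_EVENTS * SEC_PER_EVENT / 3600.0
wall_days_per_pass = cpu_hours_per_pass / N_CPUS / 24.0
n_passes = ALLOCATION_HOURS / cpu_hours_per_pass

nightly_tb = I2_MB_PER_SEC * 1e6 * HOURS_PER_NIGHT * 3600.0 / 1e12
print(f"one pass: {cpu_hours_per_pass:.0f} CPU-hours, ~{wall_days_per_pass:.1f} days on {N_CPUS} CPUs")
print(f"allocation covers ~{n_passes:.0f} passes (~{n_passes * wall_days_per_pass / 30:.1f} months)")
print(f"Internet2 at 3 MB/s: ~{nightly_tb:.2f} TB per night, ~{12 * nightly_tb:.1f} TB in 12 nights")
```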
On UNM Grid activities
T.L. Thomas
I have a 200K SU (150K LL CPU hour) grant from the NRAC of the NSF/NCSA, with which UNM HPC (“AHPCC”) is affiliated.
CPU time used: ~33,000 LosLobos hours.
Number of files handled: > 2200 files.
Data moved to BNL: > 0.5 TB (globus-url-copy).
(NOTE: In 2001, did even more (~110,000 hours) as an exercise… see http://thomas.phys.unm.edu/tlt/phenix_simulations/)
Comments [from late summer... but still relevant]:
Global storage and I/O (disk, network) management is a headache; too human-intensive.
--> Throwing more people at the problem (i.e., giving people accounts at more remote sites) is not a particularly efficient way to solve this problem.
A file naming standard is essential (esp. for database issues).
I have assembled a (still rough; not included here) standard request form for DETAILED information...
--> This could be turned into an automatic interface... a PORTAL (to buzz).
PWG contacts need to assemble as detailed a plan as they can, but without the kinds of system details that are probably going to be changed anyway (e.g., "chunk" size hints welcome but may be ignored).
Use of varied facilities requires flexibility, including an "ATM" approach
--> The simulation database needs to reflect this complexity.
Generator config / management needs to be somewhat more sophisticated.
--> E.g., random seeds, "back-end" generation.
A big issue (that others may understand better): the relationship and interface between the simulation database and the other PHENIX databases...
Multiple levels of logs actually helped bookkeeping!
--> Perhaps 'pseudo-parallelism' is the way to go.
Emerging reality (one of the main motivations for "Grid" technology): no one has enough computing when it's needed, but everyone has too much when they don't need it, which is much of the time. More than enough computing to get the work done is out there; you don't need your own! BUT: those resources are "out there", and this must be dealt with.
==> PHENIX can and should form its own IntraGrid.
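On the random-seed point above, a generic illustration (not PHENIX code) of one way to make generator jobs reproducible: derive each job's seed deterministically from a campaign name and job index, so a re-submitted job regenerates identical events.

```python
# Illustrative sketch only: reproducible per-job generator seeds.
import hashlib

def job_seed(campaign: str, job_index: int) -> int:
    """Map (campaign, job index) to a stable 31-bit seed for an event generator."""
    digest = hashlib.sha256(f"{campaign}:{job_index}".encode()).digest()
    return int.from_bytes(digest[:4], "big") & 0x7FFFFFFF   # keep it positive

if __name__ == "__main__":
    # Hypothetical campaign name; the same (name, index) always gives the same seed.
    for i in range(3):
        print(i, job_seed("qm02_project07_hijing", i))
```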
Reality Check #1: Perpetual computing person-power shortage; this pertains to both software production and data production, both real and M.C. Given that, M.C. is presently way too much work.
Simple Vision: Transparently distributed processing should allow us to optimize our use of production computing person-power. Observed and projected massive increases in network bandwidth make this a not-so-crazy idea.
Reality Check #2: What? Distributed real-data reco? Get Real! (...?)
Fairly Simple Vision: OK, OK: Implement Simple Vision for M.C. first, see how that goes. If one can process M.C., then one is perhaps 75% of the way to processing real data. (Objy write-back problem is one serious catch.)
(The following slides are from a presentation that I was invited to give to the UNM-led, multi-institutional “Internet 2 Day” this past March…)
Internet 2 and the Grid: The Future of Computing for Big Science at UNM
Timothy L. Thomas, UNM Dept of Physics and Astronomy
Grokking The Grid
Grok, v.: To perceive a subject so deeply that one no longer knows it, but rather understands it on a fundamental level. Coined by Robert Heinlein in his 1961 novel, Stranger in a Strange Land.
(Quotes from a colleague of mine…)
Feb 2002: “This grid stuff is garbage.”
Dec 2002: “Hey, these grid visionaries are serious!”
So what is a “Grid”?
Ensemble of distributed resources acting together to solve a problem: “The Grid is about collaboration, about people working together.”
Linking people, computing resources, and sensors / instruments.
The idea is decades old, but the enabling technologies are recent.
Capacity distributed throughout an infrastructure.
Aspects of Grid computing: pervasive, consistent, dependable, inexpensive.
Virtual Organizations (VOs)
Security implications
Ian Foster’s three requirements:
VOs that span multiple administrative domains
Participant services based on open standards
Delivery of serious Quality of Service
High Energy Physics Grids
GriPhyN (NSF): CS research focusing on virtual data, request planning
  Virtual Data Toolkit: delivery vehicle for GriPhyN products
iVDGL: International Virtual Data Grid Laboratory (NSF): a testbed for large-scale deployment and validation
Particle Physics Data Grid (DOE): Grid-enabling six High-Energy/Nuclear Physics experiments
EU Data Grid (EDG) application areas:
  Particle physics
  Earth and planetary sciences: “Earth Observation”
  “Biology”
GLUE: Grid Laboratory Uniform Environment: link from US grids to EDG grids
(“Grids: Grease or Glue?”)
Natural Grid Applications
High-energy elementary particle and Nuclear Physics (HENP)
Distributed image processing:
  Astronomy…
  Biological/biomedical research; e.g., pathology…
  Earth and Planetary Sciences
  Military applications; e.g., space surveillance
Engineering simulations:
  NEES Grid
Distributed event simulations:
  Military applications; e.g., SF Express
  Medicine: distributed, immersive patient simulations (Project Touch)
  Biology: complete cell simulations…
Processing requirements: two examples.
Example 1: High-energy Nuclear Physics
  10’s of petabytes of data per year
  10’s of teraflops of distributed CPU power
    o Comparable to today’s largest supercomputers…
Biological Databases: Complex interdependencies
(Diagram: flow of data among GenBank, Swissprot, TRRD, GERD, Transfac, EpoDB, EMBL, DDBJ, BEAD, and GAIA.)
Domino effect in data publishing; efficiently keep many versions (Swissprot, EpoDB, GAIA, BEAD, Transfac).
(Yong Zhao, University of Chicago)
Data Mining Example
…and the role of Internet 2
It is clear that advanced networking will play a critical role in the development of an intergrid and its eventual evolution into The Grid…
Broadband capacity
Advanced networking protocols
Well-defined, finely graded, clearly-costed high Qualities of Service
Connectivity of the web: one can pass from any node of IN through SCC to any node of OUT. Hanging off IN and OUT are TENDRILS containing nodes that are reachable from portions of IN, or that can reach portions of OUT, without passage through SCC. It is possible for a TENDRIL hanging off from IN to be hooked into a TENDRIL leading into OUT, forming a TUBE -- a passage from a portion of IN to a portion of OUT without touching SCC.
…In other words: barely predictable, but no doubt inevitable, disruptive, transformative, …and very exciting!