Santa Fe 6/18/03 Timothy L. Thomas 2 “UCF” Computing Capabilities at UNM HPC Timothy L. Thomas UNM Dept of Physics and Astronomy

Santa Fe 6/18/03 Timothy L. Thomas 3

Santa Fe 6/18/03 Timothy L. Thomas 6

Santa Fe 6/18/03 Timothy L. Thomas 7

Santa Fe 6/18/03 Timothy L. Thomas 8

Santa Fe 6/18/03 Timothy L. Thomas 9

Santa Fe 6/18/03 Timothy L. Thomas 11

I have a 200K SU (150K LL CPU hour) grant from the NRAC of the NSF/NCSA, with which UNM HPC (“AHPCC”) is affiliated.

Peripheral Data vs. Simulation
Simulation: muons from central HIJING (QM02 Project07)
Data: centrality by Perp > 60
(Stolen from Andrew…)

Simulated Decay Muons
QM'02 Project07 PISA files (central HIJING)
Closest cuts possible from the PISA file to match data (PT of parent > 1 GeV/c, theta of parent at origin)
Investigating the possibility of keeping only muon and parent hits for reconstruction
Total events distributed over Z = ±10, ±20, ±38; more events available, but only a factor for the smallest error bar; Zeff ~ 75 cm
Cut: "(IDPART==5 || IDPART==6) && IDPARENT >6 &&IDPARENT 155 && PTHE_PRI 2002 && PTOT_PRI*sin(PTHE_PRI*acos(0)/90.) > 1."
Not in fit
(Stolen from Andrew…)

Now at UNM HPC:
- PBS
- Globus 2.2.x
- Condor-G / Condor
- (GDMP)
…all supported by HPC staff.
In progress: a new 1.2 TB RAID 5 disk server, to host:
- AFS cache → PHENIX software
- ARGO file catalog (PostgreSQL)
- Local Objectivity mirror
- Globus 2.2.x (GridFTP and more…)

Pre-QM2002 experience with globus-url-copy…
- Easily saturated UNM bandwidth limitations (as they were at that time).
- PKI infrastructure and sophisticated error handling are a real bonus over bbftp. (One bug, known at the time, is being / has been addressed.)
[Plot: transfer rate in KB/sec; 10 parallel streams used.]
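For reference, a multi-stream transfer of the kind behind that plot looks roughly like the sketch below. The host name is the front-end machine mentioned later in these slides; the file paths are placeholders, and the exact flags should be checked against the locally installed Globus 2.x client tools. A valid grid proxy (grid-proxy-init) is assumed.

```python
import subprocess

# Minimal sketch of a multi-stream GridFTP transfer (hypothetical paths).
src = "gsiftp://lldimu.hpc.unm.edu/scratch/phenix/sim_run_0001.root"
dst = "file:///data/raid/phenix/sim_run_0001.root"

cmd = [
    "globus-url-copy",
    "-p", "10",           # 10 parallel TCP streams, as in the plot above
    "-tcp-bs", "1048576",  # 1 MB TCP buffer; tuning value, site dependent
    src, dst,
]
subprocess.run(cmd, check=True)
```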

Multi-jet cross section (theory) calculations, run using Condor(/PVM)…
Three years of accumulated CPU time on desktop (MOU) machines at HPCERC and at the University of Wisconsin.
Very CPU-intensive calculations: 6- and 9-dimensional Monte Carlo integrations. A typical job runs for a week and produces only about 100 KB of output histograms, such as those displayed here.
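To illustrate why such jobs are CPU-bound yet produce tiny output, here is a minimal sketch of a plain Monte Carlo integration over a 6-dimensional hypercube. The integrand is an arbitrary stand-in, not the actual multi-jet matrix element, and the sample size is illustrative only.

```python
import numpy as np

def integrand(x):
    """Stand-in for a 6-dimensional integrand; cheap here, but in the real
    calculation each evaluation of the matrix element is expensive."""
    return np.exp(-np.sum(x**2, axis=1))

rng = np.random.default_rng(12345)   # per-job random seed (bookkeeping matters)
n_samples = 10_000_000
dim = 6

x = rng.random((n_samples, dim))     # uniform points in the unit hypercube
f = integrand(x)

estimate = f.mean()                  # hypercube volume is 1, so mean = integral
error = f.std(ddof=1) / np.sqrt(n_samples)
print(f"integral ~ {estimate:.6f} +/- {error:.6f}")

# A week of CPU goes into sampling; the only persistent output is a small
# set of histograms/numbers, hence ~100 KB per job.
np.savetxt("histogram_stub.txt", np.histogram(f, bins=50)[0])
```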

Santa Fe 6/18/03 Timothy L. Thomas 18 LLDIMU.HPC.UNM.EDU

Santa Fe 6/18/03 Timothy L. Thomas 19

Santa Fe 6/18/03 Timothy L. Thomas 20

Santa Fe 6/18/03 Timothy L. Thomas 21

Santa Fe 6/18/03 Timothy L. Thomas 22

Santa Fe 6/18/03 Timothy L. Thomas 23
RAID op system issues: easy re-installation / update of the op sys
- Grub or LILO? (MBR or /boot?)
- Machine has an IDE CDROM (but not a burner)!!!
- Rescue CDs and/or floppies…
- Independence of RAID array
  - (1.5 hours for RAID 5 verification step)
  - Should install ext3 on the RAID.
- Partitioning of the system disk:
  - Independence of /home area
  - Independence of /usr/local area?
  - Jonathan says: Linux can't do more than a 2 GB swap partition
  - Jonathan says: / /usr/local/ /home/ (me: /home1/ /home2/ …?)
- NFS issues…
  - Synchronize UID/GIDs between RAID server and LL (a quick consistency check is sketched below).
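A minimal sketch of that UID/GID consistency check, assuming copies of the two password files have been pulled to the local directory (the file names are placeholders):

```python
# Compare UIDs/GIDs for the same user names in two passwd-format files,
# e.g. copies taken from the RAID server and from a LosLobos node.
def load_passwd(path):
    users = {}
    with open(path) as f:
        for line in f:
            if line.strip() and not line.startswith("#"):
                name, _, uid, gid, *_ = line.split(":")
                users[name] = (int(uid), int(gid))
    return users

raid = load_passwd("passwd.raid")      # copy of /etc/passwd from the RAID server
ll = load_passwd("passwd.loslobos")    # copy from a LosLobos node

for name in sorted(set(raid) & set(ll)):
    if raid[name] != ll[name]:
        print(f"{name}: RAID uid/gid {raid[name]} != LL uid/gid {ll[name]}")
```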

Santa Fe 6/18/03 Timothy L. Thomas 24 RAID op system issues Compilers and glibc…

Santa Fe 6/18/03 Timothy L. Thomas 25
RAID op system issues: file systems…
- What quotas?
- Ext3? (Quotas working OK?)
- ReiserFS? (Need special kernel modules for this?)

Santa Fe 6/18/03 Timothy L. Thomas 26
RAID op system issues: support for the following apps:
- RAID software
- Globus…
- PHENIX application software
- Objectivity
  - gcc
- PostgreSQL
- OpenAFS
- Kerberos 4

Santa Fe 6/18/03 Timothy L. Thomas 27
RAID op system issues: security issues…
- IP#: fixed or DHCP?
- What services to run or avoid?
  - NFS…
- Tripwire or equiv…
- Kerberos (for OpenAFS)…
- Globus…
- ipchains firewall rules; /etc/services; /etc/xinetd config; etc…

Santa Fe 6/18/03 Timothy L. Thomas 28
RAID op system issues: application-level issues…
- Which framework? Both?
- Who maintains the framework, and how can this job be properly divided up among locals?
- SHOULD THE RAID ARRAY BE PARTITIONED, a la the PHENIX counting house buffer boxes' /a and /b file systems?

Resources
Filtered events can be analyzed, but not ALL PRDF events. Many triggers overlap. Assume 90 KByte/event and 0.1 GByte/hour/CPU. (The sketch below shows how these assumptions convert event counts into disk and CPU needs.)
Table columns: Signal | Trigger | Lumi [nb^-1] | #Events [M] | Size [GByte] | CPU [hour] | 100-CPU [day]
Signals: mu-mu, mu, e-mu
Triggers: ERT_electron, MUIDN_1D&BBCLL, MUIDN_1D&MUIDS_1D&BBCLL, MUIDN_1D1S&BBCLL, MUIDN_1D1S&NTCN, MUIDS_1D&BBCLL, MUIDS_1D1S&BBCLL, MUIDS_1D1S&NTCS, ALL PRDF
(Per-trigger numerical entries lost in transcription.)
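The per-trigger numbers follow mechanically from the two stated assumptions. A minimal sketch, using a hypothetical event count since the table's values are not preserved here:

```python
# Back-of-envelope resource estimate using the slide's assumptions:
#   90 KByte per event on disk, 0.1 GByte processed per hour per CPU.
KB_PER_EVENT = 90
GB_PER_HOUR_PER_CPU = 0.1

def resources(n_events_millions, n_cpus=100):
    """Return (size in GByte, CPU-hours, days on n_cpus) for a trigger sample."""
    n_events = n_events_millions * 1e6
    size_gb = n_events * KB_PER_EVENT / 1e6           # KB -> GB
    cpu_hours = size_gb / GB_PER_HOUR_PER_CPU
    days_on_farm = cpu_hours / n_cpus / 24.0
    return size_gb, cpu_hours, days_on_farm

# Hypothetical example: a 10 M event trigger sample
size_gb, cpu_h, days = resources(10)
print(f"{size_gb:.0f} GB, {cpu_h:.0f} CPU-hours, {days:.1f} days on 100 CPUs")
# -> 900 GB, 9000 CPU-hours, 3.8 days on 100 CPUs
```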

Rough calculation of real-data processing (I/O-intensive) capabilities: 10 M events, PRDF-to-{DST+x}, both mut & mutoo; assume 3 sec/event (×1.3 for LL), 200 → 200 KB/event.
- One pass: 7 days on 50 CPUs (25 boxes), using 56% of LL local network capacity.
- My 200K "SU" (~150K LL CPU hours) allocation allows for 18 of these passes (4.2 months).
- 3 MB/sec Internet2 connection = 1.6 TB / 12 nights (MUIDN_1D1S&NTCN).
(Presently) LL is most effective for CPU-intensive tasks: simulations can easily fill the 512 CPUs; e.g., QM02 Project 07.
Caveats: "LLDIMU" is a front-end machine; the LL worker node environment is different from a CAS/RCS node (→ P. Power…).
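These pass-time and allocation figures can be reproduced with a few lines of arithmetic; the sketch below is only a re-derivation of the slide's own estimates, not new input.

```python
# Re-derive the back-of-envelope numbers quoted above.
n_events = 10e6        # PRDF events per pass
sec_per_ev = 3.0       # reconstruction time per event
n_cpus = 50            # 25 dual-CPU boxes

pass_days = n_events * sec_per_ev / n_cpus / 86400
print(f"one pass: {pass_days:.1f} days on {n_cpus} CPUs")
# ~6.9 days (the x1.3 LL factor above would stretch this toward ~9 days)

alloc_cpu_hours = 150_000                       # ~200K "SU"
hours_per_pass = n_events * sec_per_ev / 3600
n_passes = alloc_cpu_hours / hours_per_pass
print(f"allocation covers ~{n_passes:.0f} passes "
      f"(~{n_passes * pass_days / 30:.1f} months)")   # ~18 passes, ~4.2 months

rate_mb_s = 3.0                                 # Internet2 connection
tb_moved = 1.6
nights = 12
hours_per_night = tb_moved * 1e6 / rate_mb_s / 3600 / nights
print(f"1.6 TB in {nights} nights needs ~{hours_per_night:.0f} h/night at 3 MB/s")
```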

Santa Fe 6/18/03 Timothy L. Thomas 31

Santa Fe 6/18/03 Timothy L. Thomas 32

Santa Fe 6/18/03 Timothy L. Thomas 33 On UNM Grid activities T.L.Thomas

I have a 200K SU (150K LL CPU hour) grant from the NRAC of the NSF/NCSA, with which UNM HPC (“AHPCC”) is affiliated.

- CPU time used: ~33,000 LosLobos hours.
- Number of files handled: > 2200 files.
- Data moved to BNL: > 0.5 TB (globus-url-copy).
  (NOTE: In 2001, did even more (~110,000 hours), as an exercise… see ).
Comments [from late summer… but still relevant]:
- Global storage and I/O (disk, network) management is a headache; too human-intensive.
  --> Throwing more people at the problem (i.e., giving people accounts at more remote sites) is not a particularly efficient way to solve it.
- A file naming standard is essential (esp. for database issues); one possible scheme is sketched below.
- I have assembled a (still rough; not included here) standard request form for DETAILED information…
  --> This could be turned into an automatic interface… a PORTAL (to buzz).
- PWG contacts need to assemble as detailed a plan as they can, but without the kinds of system details that are probably going to be changed anyway (e.g., "chunk" size hints welcome but may be ignored).
- Use of varied facilities requires flexibility, including an "ATM" approach.
  --> The simulation database needs to reflect this complexity.
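As an illustration only (the actual PHENIX naming standard is not reproduced here), a structured name might encode the production campaign, generator, species, run, and segment so that catalog ingestion can parse it deterministically. Every field and pattern below is hypothetical.

```python
import re

# Hypothetical scheme: <campaign>_<generator>_<species>_run<run>_seg<segment>.<type>.root
NAME_RE = re.compile(
    r"(?P<campaign>[a-z0-9]+)_(?P<generator>[a-z]+)_(?P<species>[a-z]+)"
    r"_run(?P<run>\d{6})_seg(?P<segment>\d{4})\.(?P<dtype>prdf|dst)\.root$"
)

def parse_name(filename: str) -> dict:
    """Parse a (hypothetical) standardized simulation file name into catalog fields."""
    m = NAME_RE.match(filename)
    if not m:
        raise ValueError(f"non-conforming file name: {filename}")
    return m.groupdict()

print(parse_name("qm02p07_hijing_auau_run000123_seg0007.dst.root"))
# {'campaign': 'qm02p07', 'generator': 'hijing', 'species': 'auau',
#  'run': '000123', 'segment': '0007', 'dtype': 'dst'}
```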

- Generator config / management needs to be somewhat more sophisticated.
  --> E.g., random seeds, "back-end" generation.
- A big issue (that others may understand better): the relationship and interface between the simulation database and the other PHENIX databases…
- Multiple levels of logs actually helped bookkeeping!
  --> Perhaps 'pseudo-parallelism' is the way to go.
- Emerging reality (one of the main motivations for "Grid" technology): no one has enough computing when it's needed, but everyone has too much when they don't need it, which is much of the time. More than enough computing to get the work done is out there; you don't need your own! BUT: those resources are "out there", and this must be dealt with.
  ==> PHENIX can and should form its own IntraGrid.

Reality Check #1: Perpetual computing person-power shortage; this pertains to both software production and data production, both real and M.C. Given that, M.C. is presently way too much work.
Simple Vision: Transparently distributed processing should allow us to optimize our use of production computing person-power. Observed and projected massive increases in network bandwidth make this a not-so-crazy idea.
Reality Check #2: What? Distributed real-data reco? Get real! (…?)
Fairly Simple Vision: OK, OK: implement the Simple Vision for M.C. first, and see how that goes. If one can process M.C., then one is perhaps 75% of the way to processing real data. (The Objectivity write-back problem is one serious catch.)

Santa Fe 6/18/03 Timothy L. Thomas 38 (The following slides are from a presentation that I was invited to give to the UNM-led multi-institutional “Internet 2 Day” this past March…)

Santa Fe 6/18/03 Timothy L. Thomas 39 Internet 2 and the Grid The Future of Computing for Big Science at UNM Timothy L. Thomas UNM Dept of Physics and Astronomy

Santa Fe 6/18/03 Timothy L. Thomas 40
Grokking The Grid
Grok, v.: To perceive a subject so deeply that one no longer knows it, but rather understands it on a fundamental level. Coined by Robert Heinlein in his 1961 novel Stranger in a Strange Land.
(Quotes from a colleague of mine…)
- Feb 2002: "This grid stuff is garbage."
- Dec 2002: "Hey, these grid visionaries are serious!"

Santa Fe 6/18/03 Timothy L. Thomas 41 So what is a “Grid”?

Santa Fe 6/18/03 Timothy L. Thomas 44
Ensemble of distributed resources acting together to solve a problem: "The Grid is about collaboration, about people working together."
- Linking people, computing resources, and sensors / instruments
- Idea is decades old, but enabling technologies are recent.
- Capacity distributed throughout an infrastructure
Aspects of Grid computing:
- Pervasive
- Consistent
- Dependable
- Inexpensive

Santa Fe 6/18/03 Timothy L. Thomas 45
Virtual Organizations (VOs)
- Security implications
Ian Foster's Three Requirements:
- VOs that span multiple administrative domains
- Participant services based on open standards
- Delivery of serious Quality of Service

Santa Fe 6/18/03 Timothy L. Thomas 46
High Energy Physics Grids
- GriPhyN (NSF)
  - CS research focusing on virtual data, request planning
  - Virtual Data Toolkit: delivery vehicle for GriPhyN products
- iVDGL: International Virtual Data Grid Laboratory (NSF)
  - A testbed for large-scale deployment and validation
- Particle Physics Data Grid (DOE)
  - Grid-enabling six High-Energy/Nuclear Physics experiments
- EU Data Grid (EDG): application areas…
  - Particle physics
  - Earth and planetary sciences: "Earth Observation"
  - "Biology"
- GLUE: Grid Laboratory Uniform Environment
  - Link from US grids to EDG grids

Santa Fe 6/18/03 Timothy L. Thomas 47 >> (“Grids: Grease or Glue?”)

Santa Fe 6/18/03 Timothy L. Thomas 48
Natural Grid Applications
- High-energy elementary particle and Nuclear Physics (HENP)
- Distributed image processing
  - Astronomy…
  - Biological/biomedical research; e.g., pathology…
  - Earth and Planetary Sciences
  - Military applications; e.g., space surveillance
- Engineering simulations
  - NEES Grid
- Distributed event simulations
  - Military applications; e.g., SF Express
  - Medicine: distributed, immersive patient simulations
    - Project Touch
  - Biology: complete cell simulations…

Santa Fe 6/18/03 Timothy L. Thomas 49
Processing requirements: two examples
Example 1: High-energy Nuclear Physics
- 10's of petabytes of data per year
- 10's of teraflops of distributed CPU power
  - Comparable to today's largest supercomputers…
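For a sense of scale (a rough illustration, not a figure from the talk): tens of petabytes per year corresponds to a sustained data rate approaching a gigabyte per second.

```python
# Rough scale of "10's of petabytes per year" as a sustained data rate.
PB_PER_YEAR = 10
SECONDS_PER_YEAR = 365 * 24 * 3600

rate_gb_s = PB_PER_YEAR * 1e6 / SECONDS_PER_YEAR   # PB -> GB, decimal units
print(f"{PB_PER_YEAR} PB/year ~ {rate_gb_s:.2f} GB/s sustained")
# -> ~0.32 GB/s per 10 PB/year; "10's of PB" pushes this toward GB/s territory.
```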

Biological Databases: Complex interdependencies
[Figure: flow of data among GenBank, Swissprot, TRRD, GERD, Transfac, EpoDB, EMBL, DDBJ, BEAD, GAIA. Domino effect in data publishing; efficiently keep many versions (Swissprot, EpoDB, GAIA, BEAD, Transfac).]
(Yong Zhao, University of Chicago)

Data Mining Example

Santa Fe 6/18/03 Timothy L. Thomas 52
…and the role of Internet 2
It is clear that advanced networking will play a critical role in the development of an intergrid and its eventual evolution into The Grid…
- Broadband capacity
- Advanced networking protocols
- Well-defined, finely graded, clearly costed high Qualities of Service

Connectivity of the web: one can pass from any node of IN through SCC to any node of OUT. Hanging off IN and OUT are TENDRILS containing nodes that are reachable from portions of IN, or that can reach portions of OUT, without passage through SCC. It is possible for a TENDRIL hanging off from IN to be hooked into a TENDRIL leading into OUT, forming a TUBE -- a passage from a portion of IN to a portion of OUT without touching SCC.

Santa Fe 6/18/03 Timothy L. Thomas 56
…In other words: barely predictable, but no doubt inevitable, disruptive, transformative, …and very exciting!