EXPERIENCE WITH GLOBUS ONLINE AT FERMILAB
Gabriele Garzoglio, Computing Sector, Fermi National Accelerator Laboratory
GlobusWorld 2012, Thu Apr 12, 2012

Why am I here?
- The Fermi National Accelerator Laboratory serves a large number of communities in High Energy Physics.
- Through its partnership with the Open Science Grid Consortium, Fermilab collaborates with communities from a variety of scientific fields.
- It is part of the mission of Scientific Computing at Fermilab (and of my department, Grid and Cloud Computing) to support their computations on highly distributed resources.
- We cannot (and are not interested in) reinventing all the services necessary to do this.
- To achieve our mission, we build a network of partnerships with international groups of computer scientists. These partnerships make us and our partners stronger because of our users.
- Disclaimer: this talk covers only our work on GO.

Fermi National Accelerator Laboratory

Fermilab is evolving to match the needs of future science
MINOS, MINERvA, DES, MicroBooNE, NOvA, Illinois Accelerator Research Center

It's about the science! The D0 and CDF examples:
- CDF & D0: top quark asymmetry results.
- CDF: discovery of the Ξ_b^0.
- CDF: the Λ_c(2595) baryon.
- Combined CDF & D0 limits on the Standard Model Higgs mass.
Hundreds of publications per year!
Data recorded over O(10) years: CDF = 8.63 PB; D0 = 7.54 PB. Now we expect 5+ PB every year for the LHC.
Fermilab will support analysis for 5+ years and data access for 10+ years.
Fermilab collaborates with CERN on the LHC (CMS).

Roles: energy frontier
Timeline: Tevatron (now); LHC and LHC upgrades (through ~2016); then ILC, CLIC, or a Muon Collider (ILC??).
From P. Oddone, Secretary of Energy's Visit, June 2, 2011

Energy frontier: the legendary Tevatron
- Accelerator innovations: first major SC synchrotron; industrial production of SC cable (MRI); electron cooling; new RF manipulation techniques.
- Detector innovations: silicon vertex detectors in a hadron environment; LAr-U238 hadron calorimetry; advanced triggering.
- Analysis innovations: data mining from petabytes of data; use of neural networks and boosted decision trees; major impact on LHC planning and development; GRID pioneers.
- Major discoveries: top quark; B_s mixing; precision W and top mass (Higgs mass prediction); direct Higgs searches; ruled out many exotica.
- The next generation: fantastic training ground for the next generation; more than 500 Ph.D.s; produced critical personnel for the next steps, especially the LHC.
From P. Oddone, Secretary of Energy's Visit, June 2, 2011

Roles: cosmic frontier
Timeline: now, dark matter ~10 kg, dark energy: SDSS, Pierre Auger; next, dark matter ~100 kg, dark energy: DES, Pierre Auger, Holometer?; by ~2022, dark matter ~1 ton, dark energy: LSST, WFIRST??, BigBOSS??.
From P. Oddone, Secretary of Energy's Visit, June 2, 2011

Roles: intensity frontier
Timeline: now, MINOS, MiniBooNE, MINERvA, SeaQuest; by 2016, NOvA, MicroBooNE, g-2; later, LBNE, Mu2e, Project X + LBNE (μ, K, nuclear, …), Factory??
From P. Oddone, Secretary of Energy's Visit, June 2, 2011

Work done to date on GO
Integration of workload management and data movement systems with GO:
1. Center for Enabling Distributed Petascale Science (CEDPS): GO integration with glideinWMS
2. Data handling prototype for the Dark Energy Survey (DES)
Performance tests of GO over 100 Gbps networks:
3. GO on the Advanced Network Initiative (ANI) testbed
Data movement on OSG for end users:
4. Network for Earthquake Engineering Simulation (NEES)
Communities potentially interested in follow-ups with GO: Intensity Frontier, QCD, accelerator modeling, computational cosmology, et al.

1. CEDPS: Integrating glideinWMS with GO
Parag Mhashilkar, Fermilab; Condor Team, U. Wisconsin Madison; GO team, ANL
Goals:
- The middleware handles data movement, rather than the application.
- The middleware optimizes the use of computing resources (the CPU does not block on data movement).
How it works (an illustrative sketch follows below):
- Users provide data movement directives in the Job Description File (e.g., the storage services to use for I/O).
- glideinWMS procures resources on the Grid and runs jobs using Condor.
- Data movement is delegated to the underlying Condor system: globusconnect is instantiated and the GO plug-in is invoked using the directives in the Job Description File; Condor optimizes resource usage.
Required scale: CMS pool of ~50K jobs; CMS ~400K jobs.
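
The sketch below illustrates the kind of Job Description File directive involved: the job's output sandbox is handed to a URL-based Condor file-transfer plugin instead of being fetched by the application. The globusonline:// URL scheme, the endpoint name, and the file names are hypothetical; the slides do not show the exact directives the CEDPS prototype used.

```python
# Minimal sketch: submit a job whose output sandbox is shipped to a
# Globus Online endpoint via a Condor file-transfer plugin.
# The "globusonline://" URL scheme and the endpoint name are illustrative
# assumptions; the real directives in the CEDPS prototype may differ.
import subprocess
import textwrap

submit_description = textwrap.dedent("""\
    universe                = vanilla
    executable              = analyze.sh
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = input.dat
    # Hand the output sandbox to the GO file-transfer plugin (hypothetical URL).
    output_destination      = globusonline://user#fnal_endpoint/minerva/output/
    queue
    """)

with open("job.submit", "w") as f:
    f.write(submit_description)

# glideinWMS provisions the Grid resources; here we only submit to the pool.
subprocess.run(["condor_submit", "job.submit"], check=True)
```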

Validation test results
- Intensity Frontier (MINERvA) jobs transfer their output sandbox to GO endpoints.
- Jobs: 2,636 (500 running at a time). Total files transferred: …
- Up to 500 dynamically created GO endpoints at a given time.
- 95% transfer success rate.
- This stresses the scalability of GO in a new way. Main scalability issue identified: endpoint management is not sufficiently responsive at this scale.
- The GO team is working to increase scalability by reusing GO endpoints and by transferring multiple files at once (an illustrative sketch follows below).
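
As an illustration of the "multiple files per transfer" mitigation, the sketch below batches a whole output sandbox into a single transfer task. It uses the present-day Globus Python SDK (globus-sdk) purely for illustration, since the 2012 GO interface was a gsissh-based CLI; the client ID, endpoint UUIDs, paths, and file names are placeholders.

```python
# Illustrative sketch: one transfer task carrying many files, instead of
# one task (and one endpoint activation) per file.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"  # placeholder
auth = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth.oauth2_start_flow()
print("Login URL:", auth.oauth2_get_authorize_url())
tokens = auth.oauth2_exchange_code_for_tokens(input("Auth code: ").strip())
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(
        tokens.by_resource_server["transfer.api.globus.org"]["access_token"]
    )
)

SRC, DST = "SRC-ENDPOINT-UUID", "DST-ENDPOINT-UUID"  # placeholders
tdata = globus_sdk.TransferData(tc, SRC, DST, label="output sandbox batch")
# All sandbox files go into the same task.
for name in ["run1.log", "run1.root", "run1.json"]:
    tdata.add_item(f"/sandbox/{name}", f"/archive/minerva/{name}")
task = tc.submit_transfer(tdata)
print("Submitted task:", task["task_id"])
```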

2. Prototype integration of GO with the DES Data Access Framework
Motivation:
- The Dark Energy Survey (DES) is an experiment on the cosmic frontier at Fermilab, getting ready for data taking.
- The DES Data Access Framework (DAF) uses a network of GridFTP servers to reliably move data across sites.
- DAF data transfer parameters were not optimal for both small and large files.
- Reliability was implemented inefficiently, by sequentially verifying the real file size against the DB catalogue.
Test: DAF moving 31,000 files (184 GB) with GO vs. UberFTP:
- Time for transfer + verification is the same (~100 min).
- Transfer time is 27% faster with GO than with UberFTP.
- Verification time is 50% slower with GO than sequentially with UberFTP.
Proposed improvements (a verification sketch follows below):
- Allow specification of source/destination transfer reliability semantics (e.g., same size, same CRC); implemented for size.
- Allow a finer-grained failure model (e.g., specify the number of transfer retries).
- Provide an interface for efficient (pipelined) ls of source/destination files.
See Don Petravick's talk on Wednesday.
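
A minimal sketch of what pipelined verification could look like: compare one batched directory listing against the catalogue in memory, rather than checking one file per round trip. The listing and catalogue below are hard-coded stand-ins; with the Globus SDK the listing could come from a single operation_ls call per destination directory.

```python
# Sketch of the proposed "pipelined ls" reliability check: fetch one listing
# per destination directory and compare file sizes against the DB catalogue
# in memory, instead of verifying files sequentially, one round trip each.

def files_needing_retransfer(listing, expected_sizes):
    """Return names whose remote size is missing or differs from the catalogue."""
    remote = {e["name"]: e["size"] for e in listing if e["type"] == "file"}
    return sorted(name for name, size in expected_sizes.items()
                  if remote.get(name) != size)

# Stand-ins: one batched listing of the destination directory, and the catalogue.
listing = [
    {"name": "image_0001.fits", "type": "file", "size": 16777216},
    {"name": "image_0002.fits", "type": "file", "size": 8388608},   # short file
]
catalogue = {"image_0001.fits": 16777216, "image_0002.fits": 16777216}

print("re-transfer:", files_needing_retransfer(listing, catalogue))
# -> re-transfer: ['image_0002.fits']
```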

3. GO on the ANI testbed
Work by Dave Dykstra, with contributions by Raman Verma and Gabriele Garzoglio.
Motivation: testing Grid middleware readiness to interface with 100 Gbit/s links on the Advanced Network Initiative (ANI) testbed.
Characteristics:
- 300 GB of data split into files from 8 KB to 8 GB (small, medium, large, and all sizes).
- Network: aggregate 3 x 10 Gbit/s to the bnl-1 test machine.
Results (a tuning sketch follows below):
- GO does almost as well as the practical maximum of direct GridFTP for medium-size files.
- Increasing concurrency and pipelining on small files improves throughput by 30%.
- GO auto-tuning works better for medium files than for large files.
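
For reference, the sketch below drives globus-url-copy with the parallelism, concurrency, and pipelining options that the manual tuning refers to. Host names, paths, and the specific parameter values are illustrative assumptions, not the settings used on the ANI testbed.

```python
# Sketch of the GridFTP tuning knobs behind the GO comparison, driven through
# globus-url-copy. Endpoints and values below are placeholders.
import subprocess

cmd = [
    "globus-url-copy",
    "-fast",        # reuse data channels
    "-p", "4",      # parallel TCP streams per transfer (helps large files)
    "-cc", "8",     # concurrent file transfers (helps many small files)
    "-pp",          # command pipelining (cuts per-file latency for small files)
    "-r",           # recurse the source directory
    "gsiftp://source.example.org//data/testset/",
    "gsiftp://bnl-1.example.org//data/incoming/",
]
subprocess.run(cmd, check=True)
```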

4. Data movement on OSG for NEES
A. R. Barbosa, J. P. Conte, J. I. Restrepo, UCSD
Motivation: supporting the NEES group at UCSD in running parametric studies of nonlinear models of structural systems on the Open Science Grid: 30 days on OSG vs. 12 years on a desktop.
The NEES scientist ran at 20 OSG sites, then moved 12 TB from the RENCI submission server to the user's desktop at UCSD using GO.
Note: there is still no substitute for a good network administrator.
- Initially we had 5 Mbps; eventually 200 Mbps (over a 600 Mbps link).
- Improvements: upgrade the Ethernet card on the user desktop; migrate from Windows to Linux; work with the user to adopt GO; find a good network admin to locate and fix a broken fiber at RENCI, when nothing else worked.
Better use of GO on OSG: integrate GO with the Storage Resource Manager (SRM).

Sample future work and ideas (1)
CortexNet: integration of network intelligence with data management middleware.
- How do I select the highest-throughput source of a data replica, given current network conditions? (A toy selection sketch follows below.)
- How do I integrate GO with CortexNet?
There are interesting algorithmic challenges: if you are a network expert with a background in AI or statistics, let's talk!
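
As a toy illustration of the replica-selection question, the sketch below scores each replica source by its recent transfer rate discounted by current link utilization and picks the best. The scoring model and the monitoring inputs are assumptions made for illustration, not the CortexNet design.

```python
# Toy sketch of replica-source selection by predicted throughput.
from dataclasses import dataclass

@dataclass
class Replica:
    site: str
    recent_mbps: float       # observed rate from recent transfers to us
    link_utilization: float  # 0.0 (idle) .. 1.0 (saturated), from monitoring

def predicted_mbps(r: Replica) -> float:
    """Discount the historical rate by how busy the path currently is."""
    return r.recent_mbps * (1.0 - r.link_utilization)

def pick_source(replicas: list[Replica]) -> Replica:
    return max(replicas, key=predicted_mbps)

replicas = [
    Replica("FNAL",  recent_mbps=900.0, link_utilization=0.7),
    Replica("NERSC", recent_mbps=600.0, link_utilization=0.1),
    Replica("BNL",   recent_mbps=750.0, link_utilization=0.5),
]
best = pick_source(replicas)
print(f"fetch from {best.site} (~{predicted_mbps(best):.0f} Mbps predicted)")
```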

Sample future work and ideas (2)
Work by Seo-Young Noh and the FermiCloud team.
VCluster: transparent access to dynamically instantiated resources through standard Grid interfaces.
- Collaborating with KISTI, South Korea, to dynamically instantiate OSG-compatible Grid clusters from Cloud resources.
- Focusing now on idle-VM detection and monitoring to automate Grid cluster expansion.
- Can we use GO to distribute the VM images to the collaborating Cloud resources?

Summary
- Fermilab developers and computer scientists support distributed, large-data-intensive computing for HEP and other scientific communities via OSG.
- We collaborate closely with a wide variety of external research and development groups.
- We have worked with the GO team and improved the system: stressed the "many-globusconnect" dimension; raised new requirements on reliability semantics; exercised auto-tuning at extreme scale; improved usability.
- Fermilab science has many needs and ideas that are future opportunities for collaboration.

References
1. CEDPS report: GO stress test analysis
   bin/RetrieveFile?docid=4474;filename=GlobusOnline%20PluginAnalysisReport.pdf;version=1
2. DES DAF integration with GO
   /DESIntegrationWithGlobusonline
3. GridFTP & GO on the ANI testbed
   kUt5ico01vXcFsgyIGZH5pqbbGeI7t8/edit?hl=en_US&pli=1
4. OSG user support of NEES
   /EngageOpenSeesProductionDemo