Slide 1: Computing Facilities & Capabilities
US Planck Data Analysis Review, 9–10 May 2006
Julian Borrill, Computational Research Division, Berkeley Lab & Space Sciences Laboratory, UC Berkeley

Slide 2: Computing Issues
- Data Volume
- Data Processing
- Data Storage
- Data Security
- Data Transfer
- Data Format/Layout
It's all about the data.

Slide 3: Data Volume
Planck data volume drives (almost) everything:
- LFI: 22 detectors with 32.5, 45 & 76.8 Hz sampling
  - 4 x 10^10 samples per year
  - 0.2 TB time-ordered data + … TB full detector pointing data
- HFI: 52 detectors with 200 Hz sampling
  - 3 x 10^11 samples per year
  - 1.3 TB time-ordered data + … TB full boresight pointing data
- LevelS (e.g. CTP “Trieste” simulations): 4 LFI detectors with 32.5 Hz sampling
  - 4 x 10^9 samples per year
  - 2 scans x 2 beams x 2 samplings x 7 components + 2 noises
  - 1.0 TB time-ordered data + … TB full detector pointing data
(A back-of-envelope check of these figures follows below.)
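The sample counts and time-ordered data (TOD) volumes above follow from simple arithmetic. The sketch below is a hedged back-of-envelope check only; it assumes one year of scanning, 4 bytes per sample, and a rough ~60 Hz weighted-average LFI sampling rate, none of which are stated explicitly on the slide.

```python
# Back-of-envelope check of the sample counts and TOD volumes quoted above.
# Assumptions (not stated on the slide): one year of scanning, 4 bytes per
# sample, and a ~60 Hz weighted-average LFI sampling rate.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
BYTES_PER_SAMPLE = 4  # assumed single-precision samples

def tod_volume(n_detectors, sample_hz, years=1.0):
    """Return (samples, terabytes) of time-ordered data."""
    samples = n_detectors * sample_hz * SECONDS_PER_YEAR * years
    return samples, samples * BYTES_PER_SAMPLE / 1e12

lfi_n, lfi_tb = tod_volume(22, 60.0)    # slide: ~4 x 10^10 samples, 0.2 TB
hfi_n, hfi_tb = tod_volume(52, 200.0)   # slide: ~3 x 10^11 samples, 1.3 TB

print(f"LFI: {lfi_n:.1e} samples/yr, {lfi_tb:.2f} TB")
print(f"HFI: {hfi_n:.1e} samples/yr, {hfi_tb:.2f} TB")
```

The full pointing-data volumes, whose numbers are lost in this transcript, scale the same way but carry several values per sample, so they are correspondingly larger.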

Slide 4: Data Processing
Operation count scales linearly (& inefficiently) with:
- # analyses, # realizations, # iterations, # samples
- 100 x 100 x 100 x 100 x 10^11 ~ O(10) Eflop (cf. '05 Day in the Life)
NERSC:
- Seaborg: 6080 CPUs, 9 Tf/s
- Jacquard: 712 CPUs, 3 Tf/s (cf. Magique-II)
- Bassi: 888 CPUs, 7 Tf/s
- NERSC-5: O(100) Tf/s, first-byte in 2007
- NERSC-6: O(500) Tf/s, first-byte in 2010
- Expect allocation of O(2 x 10^6) CPU-hours/year => O(4) Eflop/yr (10 GHz, 5% efficiency)
USPDC cluster:
- Specification & location TBD, first-byte in 2007/8
- O(100) CPUs x 80% x 9000 hours/year => O(0.4) Eflop/yr (5 GHz, 3% efficiency)
IPAC:
- small cluster dedicated to ERCSC
(A rough arithmetic check of these throughput figures follows below.)
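The throughput figures on this slide can be sanity-checked with the same kind of arithmetic. The sketch below treats "10 GHz" and "5 GHz" as rough peak rates of 10 and 5 Gflop/s per CPU (an assumption; the slide does not spell this out) and converts the quoted CPU-hours into sustained Eflop per year.

```python
# Rough check of the sustained-throughput estimates on this slide.
# Assumption: "10 GHz, 5% efficiency" is read as ~10 Gflop/s peak per CPU
# sustained at 5% (and likewise "5 GHz, 3%" for the USPDC cluster).

HOUR = 3600.0

def eflop_per_year(cpu_hours, peak_gflops, efficiency):
    """Sustained work per year in Eflop (1 Eflop = 1e18 flop)."""
    sustained = peak_gflops * 1e9 * efficiency          # flop/s per CPU
    return cpu_hours * HOUR * sustained / 1e18

# NERSC allocation: O(2e6) CPU-hours/yr at "10 GHz", 5% efficiency
nersc = eflop_per_year(2e6, 10.0, 0.05)
# USPDC cluster: 100 CPUs x 80% utilisation x 9000 hours/yr at "5 GHz", 3%
uspdc = eflop_per_year(100 * 0.80 * 9000, 5.0, 0.03)

print(f"NERSC ~ {nersc:.1f} Eflop/yr")    # ~3.6, i.e. O(4) as on the slide
print(f"USPDC ~ {uspdc:.2f} Eflop/yr")    # ~0.39, i.e. O(0.4) as on the slide

# Slide's operation count: 100 x 100 x 100 x 100 x ~1e11 ~ 1e19 flop
print(f"Required ~ {100**4 * 1e11 / 1e18:.0f} Eflop")
```

On these assumptions the NERSC allocation plus the USPDC cluster deliver roughly 4 Eflop/yr against a requirement of O(10) Eflop, which is why the remaining slides lay out the full set of facilities.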

Slide 5: Processing
[Diagram] Processing resources: ERCSC Cluster 0.1 Tf/s; USPDC Cluster 0.5 Tf/s; NERSC Jacquard 3 Tf/s; NERSC Bassi 7 Tf/s; NERSC Seaborg 9 Tf/s; NERSC-5 (2007) 100 Tf/s; NERSC-6 (2010) 500 Tf/s.

Slide 6: Data Storage
- Archive at IPAC: mission data, O(10) TB
- Long-term at NERSC using HPSS: mission + simulation data & derivatives, O(2) PB
- Spinning disk at USPDC cluster & at NERSC using NGF: current active data subset, O(2-20) TB
- Processor memory at USPDC cluster & at NERSC: running job(s), O(…) GB/CPU & O(…) TB total

Slide 7: Processing + Storage
[Diagram] Compute: ERCSC Cluster 0.1 Tf/s, 50 GB; USPDC Cluster 0.5 Tf/s, 200 GB; NERSC Jacquard 3 Tf/s, 2 TB; NERSC Bassi 7 Tf/s, 4 TB; NERSC Seaborg 9 Tf/s, 6 TB; NERSC-5 (2007) 100 Tf/s, 50 TB; NERSC-6 (2010) 500 Tf/s, 250 TB. Storage: USPDC Cluster disk 2 TB; NERSC NGF 20/200 TB; IPAC Archive 10 TB; NERSC HPSS 2/20 PB.

Slide 8: Data Security
- UNIX filegroups: special account (user planck); permissions _r__/___/___
- Personal keyfob to access the planck account: real-time grid-certification of individuals; keyfobs issued & managed by IPAC; single system for IPAC, NERSC & USPDC cluster
- Allows securing of selected data, e.g. mission vs simulation
- Differentiates access to facilities and to data: standard personal account & special planck account
(An illustrative permissions sketch follows below.)
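As a purely illustrative sketch (the permission string on the slide is garbled in this transcript, and this is not IPAC's or NERSC's actual tooling), a group-read-only scheme of the kind described above might look like the following, with data files owned by the planck account and readable only by members of a planck UNIX filegroup.

```python
# Minimal sketch of a filegroup-based scheme, assuming files owned by the
# special "planck" account are group-readable for members of a "planck"
# UNIX group and inaccessible to others. The mode is an assumption, since
# the exact permission string on the slide did not survive transcription.
import os
import stat

def secure_planck_file(path, group_readable=True):
    """Restrict a data file to owner read/write plus optional group read."""
    mode = stat.S_IRUSR | stat.S_IWUSR          # owner: rw-
    if group_readable:
        mode |= stat.S_IRGRP                    # group: r--
    os.chmod(path, mode)                        # others: ---

# Usage (hypothetical path):
# secure_planck_file("/planck/mission/lfi_tod.dat")
```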

Slide 9: Processing + Storage + Security
[Diagram] As slide 7, with a "PLANCK KEYFOB REQUIRED" boundary drawn around the Planck data systems. Compute: ERCSC Cluster 0.1 Tf/s, 50 GB; USPDC Cluster 0.5 Tf/s, 200 GB; NERSC Jacquard 3 Tf/s, 2 TB; NERSC Bassi 7 Tf/s, 4 TB; NERSC Seaborg 9 Tf/s, 7 TB; NERSC-5 (2007) 100 Tf/s, 50 TB; NERSC-6 (2010) 500 Tf/s, 250 TB. Storage: USPDC Cluster disk 2 TB; NERSC NGF 20/200 TB; IPAC Archive 10 TB; NERSC HPSS 2/20 PB.

Slide 10: Data Transfer
- From DPCs to IPAC: transatlantic tests being planned
- From IPAC to NERSC: 10 Gb/s over Pacific Wave, CENIC + ESnet; tests planned this summer
- From NGF to/from HPSS: 1 Gb/s, being upgraded to 10+ Gb/s
- From NGF to memory (most real-time critical):
  - within NERSC: 8-64 Gb/s depending on system (& support for this)
  - offsite depends on location: 10 Gb/s to LBL over a dedicated data link on the Bay Area MAN
  - fallback exists: stage data on local scratch space
(A rough transfer-time estimate follows below.)
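For scale, here is a rough transfer-time estimate for the quoted link speeds, assuming (optimistically) full and exclusive use of each link; real throughput would be lower.

```python
# Rough transfer-time estimates for the link speeds quoted on this slide,
# assuming each link is fully and exclusively utilised.

def transfer_hours(data_tb, link_gbps):
    """Hours to move data_tb terabytes over a link_gbps gigabit/s link."""
    bits = data_tb * 1e12 * 8                    # TB -> bits
    return bits / (link_gbps * 1e9) / 3600.0

# Full O(10) TB mission archive from IPAC to NERSC over the 10 Gb/s path
print(f"10 TB over 10 Gb/s : {transfer_hours(10, 10):.1f} h")       # ~2.2 h
# A 2 TB active data subset from NGF into memory at 8 Gb/s vs 64 Gb/s
print(f" 2 TB over  8 Gb/s : {transfer_hours(2, 8):.1f} h")         # ~0.6 h
print(f" 2 TB over 64 Gb/s : {transfer_hours(2, 64)*60:.0f} min")   # ~4 min
```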

Slide 11: Processing + Storage + Security + Networks
[Diagram] As slide 9, with network links annotated between the DPCs, the IPAC Archive, NERSC NGF/HPSS and the compute systems: labelled link speeds of 10 Gb/s, 10 Gb/s, 8 Gb/s and 64 Gb/s, with several links still marked "?"; the "PLANCK KEYFOB REQUIRED" boundary as on slide 9. Systems and storage capacities as on slide 9.

Slide 12: Project Columbia Update
- Last year we advertised our proposed use of NASA's new Project Columbia (5 x 2048 CPUs, 5 x 12 Tf/s), potentially including a WAN-NGF.
- We were successful in pushing for Ames' connection to the Bay Area MAN, providing a 10 Gb/s dedicated data connection.
- We were unsuccessful in making much use of Columbia:
  - disk read performance varies from poor to atrocious, effectively disabling data analysis (although simulation is possible)
  - foreign nationals are not welcome, even if they have passed JPL security screening!
- We have provided feedback to Ames and HQ, but for now we are not pursuing this resource.

Slide 13: Data Formats
Once data are on disk they must be read by codes that do not know (or want to know) their format/layout:
- to analyze LFI, HFI, LevelS, WMAP, etc. data sets, both individually and collectively
- to be able to operate on data while they are being read, e.g. weighted co-addition of simulation components
M3 provides a data abstraction layer to make this possible (a schematic sketch of such a layer follows below).
Investment in M3 has paid huge dividends this year:
- rapid (10 minute) ingestion of new data formats, such as the PIOLIB evolution and WMAP
- rapid (1 month) development of an interface to any compressed pointing, allowing on-the-fly interpolation & translation
- immediate inheritance of improvements (new capabilities & optimization/tuning) by the growing number of M3-based codes
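To make the idea concrete, here is a generic sketch of the data-abstraction pattern the slide describes: analysis codes read time-ordered samples through an abstract interface, while per-format back-ends and on-the-fly operations such as weighted co-addition hide the layout. This is an illustration only; the class and method names are hypothetical and this is not the actual M3 API.

```python
# Generic illustration (not the real M3 interface) of a data-abstraction
# layer: analysis codes ask an abstract reader for time-ordered samples,
# and per-format back-ends plus on-the-fly operations hide the layout.
from abc import ABC, abstractmethod
from typing import Sequence
import numpy as np

class TODSource(ABC):
    """Abstract source of time-ordered data for one detector."""
    @abstractmethod
    def read(self, first: int, count: int) -> np.ndarray:
        """Return `count` samples starting at sample index `first`."""

class BinarySource(TODSource):
    """Example back-end: flat binary file of float32 samples (hypothetical format)."""
    def __init__(self, path: str):
        self.path = path
    def read(self, first: int, count: int) -> np.ndarray:
        return np.fromfile(self.path, dtype=np.float32, count=count, offset=4 * first)

class CoAddedSource(TODSource):
    """Weighted co-addition of several components, applied as the data are read."""
    def __init__(self, sources: Sequence[TODSource], weights: Sequence[float]):
        self.sources, self.weights = sources, weights
    def read(self, first: int, count: int) -> np.ndarray:
        out = np.zeros(count, dtype=np.float64)
        for src, w in zip(self.sources, self.weights):
            out += w * src.read(first, count)
        return out

# An analysis code sees only TODSource.read(), regardless of the underlying
# format or of how many simulation components are combined on the fly.
```

The benefits the slide cites, such as ingesting a new format quickly and having every code inherit new capabilities, are what a narrow interface of this kind buys: each back-end stays small, and improvements land in one place.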