Nuclear Physics Greenbook Presentation (Astro, Theory, Expt). Doug Olson, LBNL. NUG Business Meeting, 25 June 2004, Berkeley.

Reminding you what Nuclear Physics is
The mission of the Nuclear Physics (NP) program is to advance our knowledge of the properties and interactions of atomic nuclei and nuclear matter and the fundamental forces and particles of nature. The program seeks to understand how quarks bind together to form nucleons and nuclei, to create and study the quark-gluon plasma that is thought to have been the primordial state of the early universe, and to understand energy production and element synthesis in stars and stellar explosions.

Contents: Questions, Answers, Some Observations

Questions about current needs and 3-4 years out
1. What are your most important processing needs? (e.g. single CPU speed, number of parallel CPUs, memory, ...)
2. What are your most important storage needs? (e.g. I/O bandwidth to disk, disk space, HPSS bandwidth and space, ...)
3. What are your most important network needs? (e.g. wide-area bandwidth, bandwidth between NERSC resources, ...)
4. What are your most important remote access needs? (e.g. remote login, remote visualization, data transfer, single sign-on to automate work across multiple sites, ...)
5. What are your most important user services needs? (e.g. general helpdesk questions, tutorials, debugging help, ...)
6. Do you have special software requirements? If so, what?
7. Do you have special visualization requirements? If so, what?
8. Is automating your work across multiple sites important? (Called distributed workflow. Sites could be other large centers, a cluster, your desktop, etc.)
9. Anything else important to your project?
Asked of all PIs with NP awards; 9 responded, a good cross-section.

Responses
Astronomy
– Swesty, SUNY SB, TSI Collaboration
– Nugent, LBNL
Nuclear Theory
– Ji, NCSU, QCD
– Pieper, ANL, Light nuclei
– Dean, ORNL, Nuclear many-body problem
– Vary, Iowa State, Nuclear reactions
– Lee, Kentucky, Lattice QCD
Experiment
– Klein, LBNL, IceCube
– Olson, LBNL, STAR

1. What are your most important processing needs? (e.g. single CPU speed, number of parallel CPUs, memory, ...)
Theory
– Small clusters of nodes (1-16 nodes) and long run durations (12-24 h or more).
– Support for executing single-processor tasks would be welcome.
– Faster processors are always a good thing!
– Up to 2048 parallel CPUs (for SMMC).
– Need faster interprocessor bandwidth (for two other codes); e.g., a 256-CPU Altix is 7x faster than Seaborg.
– Total number of cycles obtained, over a range of processor counts (50 to 500); generally Gbytes per processor are needed, on processors with 1-4x Seaborg speed.
– Memory per CPU is by far the most important to our project (it saves I/O to disk and/or cuts down on inter-node communication).

1. What are your most important processing needs? (e.g. single CPU speed, number of parallel CPUs, memory, ...)
Astro
– Getting a little more CPU speed is OK, but BY FAR we need faster bandwidth between processors and between nodes.
– processors now
– Lower-latency communications.
Expt
– Not parallel algorithms; compute at PDSF and other Linux clusters.
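To make concrete why inter-processor bandwidth and latency dominate for these codes, here is a minimal sketch (not taken from any responder's code) of the nearest-neighbour ghost-cell exchange typical of such applications; the 1-D decomposition and array sizes are illustrative assumptions.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # 1-D domain decomposition: each rank owns n_local cells plus one
    # ghost cell at each end (sizes are illustrative)
    n_local = 1_000_000
    u = np.zeros(n_local + 2)

    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    # post non-blocking sends/receives of the boundary cells; the cost of
    # this step is latency-bound for small messages and bandwidth-bound
    # for large ones, which is why the responses above ask for a faster
    # interconnect rather than faster CPUs
    reqs = [
        comm.Isend(u[1:2], dest=left, tag=0),
        comm.Isend(u[-2:-1], dest=right, tag=1),
        comm.Irecv(u[0:1], source=left, tag=1),
        comm.Irecv(u[-1:], source=right, tag=0),
    ]
    MPI.Request.Waitall(reqs)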

2. What are your most important storage needs? (e.g. I/O bandwidth to disk, disk space, HPSS bandwidth and space, ...)
Theory
– Bandwidth to disk and GPFS disk space (increase disk space by 100x).
– The inode limit is a persistent pain, but can be lived with.
– HPSS bandwidth and space.
– A single file system across machines (Seaborg, Newton).
Astro
– Fine now; may change as we move more to 3-D.
– Improved parallel I/O throughput to disk for >1024-processor jobs.
– Increased scratch disk capacity.
Expt
– Database; MySQL in use now.
– Disk: scalable size and I/O performance, >100 TB, >1 GB/sec.
– HPSS: size and I/O.
– Automated caching, replication, and I/O load balancing.
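The parallel I/O throughput requested above is the kind of pattern served by collective writes to a single shared file through MPI-IO or parallel HDF5. The following is a hedged sketch, assuming an h5py built against a parallel (MPI) HDF5; the file name and per-rank sizes are made up.

    from mpi4py import MPI
    import h5py
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n_local = 1024                        # elements per rank (illustrative)
    data = np.full(n_local, rank, dtype='f8')

    # open one shared file through the MPI-IO driver; every rank writes its
    # own contiguous slice of a single dataset instead of one file per
    # process, letting HDF5/MPI-IO aggregate the traffic to the file system
    with h5py.File('checkpoint.h5', 'w', driver='mpio', comm=comm) as f:
        dset = f.create_dataset('u', (size * n_local,), dtype='f8')
        dset[rank * n_local:(rank + 1) * n_local] = data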

3. What are your most important network needs? (e.g. wide-area bandwidth, bandwidth between NERSC resources, ...)
Theory
– Moving 20 GB datasets today: ORNL-NERSC-MSU.
– Moving 0.5 TB datasets in 2 years: ORNL-NERSC-MSU-LLNL-PNNL.
– Bandwidth between NERSC resources.
Astro
– Improved throughput between NERSC, ORNL, and Stony Brook.
Expt
– WAN bandwidth end-to-end (meaning endpoints or other LAN effects are often the problem), at labs and universities.

4. What are your most important remote access needs? (e.g. remote login, remote visualization, data transfer, single sign-on to automate work across multiple sites, ...)
Theory
– An X-windowed system.
– Data transfer is becoming an increasingly important need.
– ssh/scp with authorized keys is fine; one-time passwords would severely handicap my use of a local emacs and tramp to edit, view, and transfer files.
Astro
– Single sign-on to allow process automation is very important right now. Of CRITICAL importance is avoiding one-time authentication methods, which would kill any hope of scientific workflow automation.
– Some remote visualization.
Expt
– Data transfer.
– Single sign-on across sites for automated workflow.

5. What are your most important user services needs? (e.g. general helpdesk questions, tutorials, debugging help, ...)
Theory
– Support/online help for Windows-based X servers.
– Online references for programming languages, or links to them, would be great.
– General helpdesk and sometimes tutorials.
– Online tutorials (stored and indexed).
Astro
– Biggest problems are dealing with new compiler bugs.
– Performance optimization; this requires help from people who have access to the IBM compiler group to get code kernels tuned.
Expt
– General user support and collaboration software installation are very good.
– Need troubleshooting across sites and the WAN.

6. Do you have special software requirements? If so, what?
Theory
– Part of my plans involves solving a large sparse eigenvalue problem. Software like Aztec is going to be useful for this.
Astro
– We continue to rely on the availability of HDF5 v1.4.5 for our I/O needs on Seaborg. HDF5 1.6.x will not suffice, as we have uncovered show-stopping bugs in that release.
Expt
– Community and collaboration software (CERN, ROOT, ...).
– Current install/maintenance procedures work well.
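For the large sparse eigenvalue problem mentioned by the theory respondent, a library eigensolver is the natural fit. The sketch below uses SciPy's Lanczos-type solver purely as an illustration (Aztec itself is an iterative linear-solver package, not shown); the tridiagonal test matrix is a stand-in, not a physics Hamiltonian.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh

    n = 100_000                                # dimension (illustrative)
    diag = np.arange(1.0, n + 1.0)
    off = -np.ones(n - 1)

    # sparse symmetric tridiagonal stand-in for a many-body Hamiltonian
    H = sp.diags([off, diag, off], offsets=[-1, 0, 1], format='csc')

    # a handful of the lowest eigenpairs, as needed for the ground and
    # low-lying states; shift-invert about 0 targets the smallest eigenvalues
    vals, vecs = eigsh(H, k=5, sigma=0.0)
    print(vals)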

7. Do you have special visualization requirements? If so, what?
Theory
– We would welcome the introduction of visual debugging tools for Fortran/C++, especially for MPI or HPF programs, if possible of course.
Astro
– Some, but most have been covered by the viz group.
– We continue to rely heavily on the NERSC viz group to help us address our viz needs.

8. Is automating your work across multiple sites important? (Called distributed workflow. Sites could be other large centers, a cluster, your desktop, etc.)
Theory
– Yes. We are considering how to develop common component software for nuclear physics problems. The low-energy nuclear theory community will increasingly move towards integrated code environments. This includes data movement and workflow across several sites. (We do this now with NERSC/ORNL/MSU.)
– I do a fair amount of post-processing with Speakeasy on my workstation. This involves mixing results from NERSC, Argonne's parallel machines, and Los Alamos' qmc at present.

8. Is automating your work across multiple sites important? (Called distributed workflow. Sites could be other large centers, a cluster, your desktop, etc.)
Astro
– Yes! We are currently working with the SPA (Scientific Process Automation) team from the SciDAC Scientific Data Management ISIC on automating our workflow between NERSC and our home computing site at Stony Brook.
Expt
– Yes. Experiment collaboration computing is spread across large and small sites and desktops. Need more integration, with security and tools, for a more seamless environment.
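A minimal sketch of the kind of multi-site automation being asked for here, assuming password-less (key-based) ssh/scp of the sort the theory respondent already uses; the host names, paths, and post-processing script are hypothetical, and a real workflow system would add retries, logging, and provenance tracking.

    import subprocess

    # each step is an ordinary command; key-based authentication (or a
    # future single sign-on) is what makes the chain runnable unattended
    STEPS = [
        # stage simulation output to a (hypothetical) data-transfer host
        ["scp", "results.tar", "user@dtn.example.gov:/scratch/run42/"],
        # unpack it remotely
        ["ssh", "user@dtn.example.gov",
         "tar -xf /scratch/run42/results.tar -C /scratch/run42"],
        # kick off post-processing at the home site (script name is made up)
        ["ssh", "user@cluster.example.edu", "./submit_postprocess.sh run42"],
    ]

    for cmd in STEPS:
        # check=True stops the chain at the first failure
        subprocess.run(cmd, check=True)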

9. Anything else important to your project?
Theory
– NERSC is a great help; keep up the great work!
– My biggest concern with NERSC at the present time is that it has fallen behind the curve on the machine front. While I still consider NERSC a valuable resource for my research, I have diversified significantly during this FY.
– Any performance tools, such as POE, that help diagnose the bottlenecks in a code and help suggest routes to improvement.
Astro
– Memory bandwidth and latency.
Expt
– User management across site and national boundaries. Separate user registration and accounts across many sites will become too burdensome. Think single sign-on and seamless!
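Since memory bandwidth tops the astro wish list, a rough way to see the per-node limit is a STREAM-style triad. This numpy version is only a coarse probe (the temporary array adds extra traffic, so the printed figure is a lower bound) and the array size is an assumption.

    import time
    import numpy as np

    n = 20_000_000                   # ~160 MB per array, enough to defeat cache
    a = np.zeros(n)
    b = np.random.rand(n)
    c = np.random.rand(n)

    t0 = time.time()
    a[:] = b + 2.5 * c               # triad: two loads and one store per element
    dt = time.time() - t0

    # counts only the three named arrays; numpy's temporary adds more traffic
    bytes_moved = 3 * n * 8
    print(f"approx. sustained bandwidth: {bytes_moved / dt / 1e9:.1f} GB/s")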

Observations
– A strong need for greater inter-processor bandwidth
– Faster processors, more memory
– A single file system view across NERSC
– Greater parallel file system performance (>1024 processors)
– More space
– More/better data management tools
– Single sign-on across sites
– Help with inter-site (WAN) issues
– Much scientific computing now has workflow across several sites