NASA High Performance Computing (HPC) Directions, Issues, and Concerns: A User's Perspective
Dr. Robert C. Singleterry Jr., NASA Langley Research Center
HPC China, October 29th, 2010

Overview
- Current Computational Resources
- Directions from a User's Perspective
- Issues and Concerns
- Conclusion?
- Case Study – Space Radiation
- Summary

Current Computational Resources
- Ames: 115,000+ cores (Pleiades), 1-2 GB/core, LUSTRE
- Langley: thousands of cores, 1 GB/core, LUSTRE
- Goddard: 10,000+ Nehalem cores (1 year ago), 3 GB/core, GPFS
- Others at other centers

Current Computational Resources
- Science applications
  - Star and galaxy formation
  - Weather and climate modeling
- Engineering applications
  - CFD: Ares-I and Ares-V, aircraft, Orion reentry
  - Space radiation
  - Structures
  - Materials
- Satellite operations, data analysis & storage

Directions from a User's Perspective
- 2004: Columbia, 10,240 cores
- 2008: Pleiades, 51,200 cores
- 2012 system: 256,000 cores (extrapolated)
- 2016 system: 1,280,000 cores (extrapolated)
- Extrapolation!!! Use at your own risk
- Trend: 5 times more cores every 4 years
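The trend line behind these numbers is simple arithmetic; a minimal sketch of the extrapolation (my restatement of the slide's 5x-per-4-years rule, anchored at Pleiades, not an official NASA projection):

```python
# Extrapolate core counts assuming 5x more cores every 4 years,
# anchored at Pleiades (51,200 cores in 2008).
def cores(year, base_year=2008, base_cores=51_200, factor=5, period=4):
    return base_cores * factor ** ((year - base_year) / period)

for year in (2004, 2008, 2012, 2016):
    print(year, round(cores(year)))
# -> 10,240 / 51,200 / 256,000 / 1,280,000
```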

Issues and Concerns
- Assume power and cooling are not issues (is this a valid assumption?)
- What will a "core" be in the next 6 years?
  - "Nehalem"-like: powerful, fast, and "few"
  - "BlueGene"-like: minimal, slow, and "many"
  - "Cell"-like: not like a CPU at all, fast, and many
  - "Unknown"-like: a combination, hybrid, new, ...
- In 2016, NASA should have a 1.28-million-core machine, tightly coupled together
- Everything seems to be fine... maybe???

Issues and Concerns?
- A few details about our systems:
  - Each of the 4 NASA Mission Directorates "owns" part of Pleiades
  - Each Center and Branch controls its own machines in the manner it sees fit
  - Queues limit the number of cores used per job, per Directorate, Center, or Branch
  - Queues limit the time per job without special permission from the Directorate, Center, or Branch
- This harks back to the time-share machines of old

Issues and Concerns?
- As machines get bigger (1.28 million cores in 2016), do the queues get bigger?
- Can NASA's research, engineering, and operations users utilize the bigger queues?
- Will NASA algorithms keep up with the 5-times scaling every 4 years?
  - 2008: 2,000-core algorithms
  - 2016: 50,000-core algorithms
- Is NASA spending money on the right issue?
  - Newer, bigger, better hardware
  - Newer, better, scalable algorithms

Conclusions?
- Is there a conclusion? There are issues and concerns!
- Spend money on bigger and better hardware?
- Spend money on more scalable algorithms?
- Do the NASA funders understand these issues from the researcher, engineer, and operations point of view?
- Do researchers and engineers understand the NASA funders' point of view?
- At this point, there is no conclusion!

Case Study – Space Radiation
- Cosmic rays and solar particle events
- Nuclear interactions
- Human and electronic damage
- Dose equivalent: damage caused by energy deposited along the particle's track
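For readers outside the field, a standard formulation of dose equivalent from radiation-protection practice (my addition, not from the slides) weights the absorbed dose by a quality factor Q that depends on the linear energy transfer (LET) L along the particle's track:

```latex
H = \int Q(L)\, \frac{dD}{dL}\, dL
```

Here D is the absorbed dose; in the simplest constant-Q approximation this reduces to H = Q D.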

Previous Space Radiation Algorithm
- Design and start to build the spacecraft
- Mass limits and objectives have been reached
- Bring in the radiation experts
- Analyze the spacecraft by hand (not parallel)
- Extra shielding is needed for certain areas of the spacecraft, or extra component capacity
- Reduce the new mass back to the mass limits by lowering the objectives of the mission
  - Throwing off science experiments
  - Reducing mission capability

Previous Space Radiation Algorithm
- Major missions impacted in this manner:
  - Viking
  - Gemini
  - Apollo
  - Mariner
  - Voyager

Previous Space Radiation Algorithm
[Figure: SAGE III]

Primary Space Radiation Algorithm
- Ray trace of the spacecraft/human geometry
- Reduction of the ray-trace materials to three ordered materials:
  - Aluminum
  - Polyethylene
  - Tissue
- Transport database
- Interpolate each ray
- Integrate each point
- Do for all points in the body: weighted sum
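A toy sketch of the interpolate/integrate/weighted-sum structure described above (the one-parameter "database," the exponential physics, and the random geometry are all placeholders invented for illustration; the real transport database is far richer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy transport "database": dose vs. depth on a 1-D grid (placeholder physics).
depths = np.linspace(0.0, 100.0, 1_001)      # areal density grid (g/cm^2)
database = np.exp(-depths / 30.0)            # invented dose-vs-depth curve

n_points, n_rays = 1_000, 1_000
# Material thickness seen along each ray from each body point (invented).
ray_depths = rng.uniform(0.0, 100.0, size=(n_points, n_rays))

# Interpolate each ray against the database, then integrate
# (here: average) over the rays at each point.
point_dose = np.interp(ray_depths, depths, database).mean(axis=1)

# Weighted sum over all points in the body.
weights = np.full(n_points, 1.0 / n_points)
print(f"toy body dose: {weights @ point_dose:.4f}")
```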

Primary Space Radiation Algorithm
- Transport database creation is mostly serial and not parallelizable at coarse grain
- The 1,000-point interpolation over the database is parallel at coarse grain
- Integration of the data at the points is parallel if the right library routines are bought
- At most, a hundreds-of-cores process over hours of computer time
- Not a good fit for the design cycle
- Not a good fit for the HPC of 2012 and 2016

Imminent Space Radiation Algorithm
- Ray trace of the spacecraft/human geometry
- Run the transport algorithm along each ray
- No approximation on materials
- Integrate all rays
- Do for all points
- Weighted sum

Imminent Space Radiation Algorithm
- 1,000 rays per point, 1,000 points per body: 1,000,000 transport runs
- 1 minute to 10 hours per point (depends on the rays)
- Integration of the data at the points is the bottleneck
  - Data movement speed is key
  - Data size is small
- This process is inherently parallel if the communication bottleneck is reasonable
- A better fit for the HPC of 2012 and 2016
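Because each of the ~1,000,000 ray transports is independent, the workload maps naturally onto a process (or MPI-rank) pool; a minimal sketch using Python's standard library (the transport kernel and geometry below are invented placeholders, not the real code, and the problem is shrunk to 100 x 100):

```python
from concurrent.futures import ProcessPoolExecutor
import math

def transport_along_ray(args):
    """Placeholder kernel: the real code solves radiation transport
    along one ray's material stack, taking minutes to hours."""
    point_id, ray_id = args
    thickness = 1.0 + (point_id * 1_000 + ray_id) % 97   # fake geometry
    return point_id, math.exp(-thickness / 30.0)          # fake dose

def main():
    n_points, n_rays = 100, 100            # 1,000 x 1,000 in the real problem
    work = [(p, r) for p in range(n_points) for r in range(n_rays)]
    dose = [0.0] * n_points
    with ProcessPoolExecutor() as pool:
        for point_id, d in pool.map(transport_along_ray, work, chunksize=256):
            dose[point_id] += d / n_rays   # integrate the rays at each point
    print("body dose (uniform weights):", sum(dose) / n_points)

if __name__ == "__main__":
    main()
```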

Future Space Radiation Algorithms
- Monte Carlo methods
  - Data communication is the bottleneck
  - Each history is independent of the other histories
- Forward/adjoint finite element methods
  - Same problems as other finite element codes
  - Phase-space decomposition is key
- Hybrid methods
  - Finite element and Monte Carlo together
  - Best of both worlds (on paper, anyway)
- Variational methods
  - Unknown at this time
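The Monte Carlo point above, that histories are independent and only the final tally must be communicated, can be made concrete with a toy sketch (the 1-D slab-attenuation problem and all numbers here are invented for illustration):

```python
import numpy as np

def run_histories(seed_seq, n_histories, mfp=30.0, slab=10.0):
    """One worker's share: independent histories with its own RNG stream."""
    rng = np.random.default_rng(seed_seq)
    free_paths = rng.exponential(mfp, n_histories)   # sampled collision depths
    return np.count_nonzero(free_paths > slab)       # histories crossing the slab

# Independent RNG streams per worker; the only communication is the
# final sum, which is why data movement, not physics, limits scaling.
streams = np.random.SeedSequence(2010).spawn(8)
transmitted = sum(run_histories(s, 100_000) for s in streams)
print("transmission probability ~", transmitted / 800_000)
# analytic answer for comparison: exp(-10/30) ~ 0.7165
```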

Summary
- Present space radiation methods are not HPC friendly or scalable
- Why care? Are the algorithms good enough?
- Scalability is needed to:
  - Keep up with the design cycle wanted by users
  - Offset the slower speeds of the many-core chips
  - Deliver the new bells & whistles wanted by funders
- The imminent method is better but has problems
- Future methods show HPC scalability promise on paper but need resources for investigation and implementation

Summary
- NASA is committed to HPC for science, engineering, and operations
- Issues & concerns remain about where resources are spent and how they impact NASA's work:
  - Will machines be bought that can benefit science, engineering, and operations?
  - Will resources be spent on algorithms that can utilize the machines bought?
- Create an HPC help desk to inform and work with users to achieve better results for NASA work: the HECToR model