Parallel Visualization At TACC Greg Abram

Visualization Problems*
– Small problems: data are small and easily moved; office machines and laptops are adequate for visualization
– Medium problems: data are costly to move over the WAN; visualization requires lots of memory and a fast CPU and GPU
– Large problems: data are impractical to move over the WAN and costly to move over the LAN; visualization requires parallel systems for enough memory, CPU and GPU
– Huge problems: data cannot be moved off the system where they are computed; visualization requires resources equivalent to the source HPC system
* With thanks to Sean Ahern for the metaphor

Visualization Problems*
– Huge problems: don't move the data; in-situ and co-processing visualization minimizes or eliminates data I/O
– Medium and small problems: move your data to a visualization server and visualize using high-performance systems at TACC
– Large problems: move your data to a visualization server and visualize using parallel high-performance systems and software at TACC
* With thanks to Sean Ahern for the metaphor

Visualization Servers: Longhorn
256 compute nodes:
– 16 "fat" nodes: 8 Intel Nehalem cores, 144 GB RAM, 2 NVIDIA Quadro FX5800 GPUs
– 240 standard nodes: 8 Intel Nehalem cores, 48 GB RAM, 2 NVIDIA Quadro FX5800 GPUs
Access (see the Longhorn User Guide):
– Longhorn Portal, or
– Run qsub job.vnc on Longhorn using the normal, development, long, largemem or request queues
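
A typical command-line session might look like the following sketch; the job script location, queue and run time are illustrative assumptions, so check the Longhorn User Guide for the actual script and options.

    # On the Longhorn login node: submit the VNC job script (path is a hypothetical example)
    qsub -q normal -l h_rt=02:00:00 ~/job.vnc

    # Watch the queue until the job starts; the job output then reports the VNC display to use
    qstat -u $USER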

Longhorn Architecture
[Architecture diagram] Jobs are submitted from the login node (login1.longhorn) or through the Longhorn Visualization Portal into the normal, development, long, largemem and request queues. The compute nodes are "fat" nodes (144 GB RAM, 2 GPUs) and vis nodes (48 GB RAM, 2 GPUs); they have read/write access to the Longhorn $HOME and $SCRATCH file systems (Lustre parallel and NFS) and read-only access to the Ranger file systems (/ranger/share, /ranger/work, /ranger/scratch), now retired.

Visualization Servers: Stampede
Access to Stampede file systems
128 vis nodes:
– 16 Intel Sandy Bridge cores
– 32 GB RAM
– 1 NVIDIA K20 GPU
16 large-memory nodes:
– 32 Intel Sandy Bridge cores
– 1 TB RAM
– 2 NVIDIA K20 GPUs
Access (see the Stampede User Guide):
– Run sbatch job.vnc on Stampede using the vis or largemem queues
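
A comparable Stampede session might look like this sketch; the script path, node count and time limit are assumptions, so consult the Stampede User Guide for the exact script and options.

    # On a Stampede login node: submit the VNC job script to the vis partition
    # (script path and resource limits are hypothetical examples)
    sbatch -p vis -N 1 -n 16 -t 02:00:00 ~/job.vnc

    # Check job status; once the job runs, its output reports the VNC display to connect to
    squeue -u $USER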

Stampede Architecture
[Architecture diagram] Jobs are submitted from the four login nodes (stampede.tacc.utexas.edu). The ~6,300 compute nodes (16 cores, 32 GB RAM, Xeon Phi) serve the normal, serial, development and request queues; the 128 vis nodes (16 cores, 32 GB RAM, NVIDIA K20 GPU) serve the vis and gpu queues; and the 16 large-memory nodes (32 cores, 1 TB RAM, 2 NVIDIA K20 GPUs) serve the largemem queue. All nodes have read/write access to the SHARE, WORK and SCRATCH Stampede Lustre file systems.

Parallel Visualization Software
Good news! ParaView and VisIt both run in parallel and look just the same!
Client/server architecture:
– Allocate multiple nodes for the vncserver job
– Run the client process serially on the root node
– Run server processes on all nodes under MPI

Interactive Remote Desktop
[Diagram] A VNC server runs on one of the system's vis nodes; the user's VNC viewer on a remote system connects to it over the Internet through a port-forwarding tunnel via the system login node.
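
The tunnel in this picture is ordinarily set up with SSH port forwarding from the user's machine, roughly as in the sketch below; the host name, node name and port numbers are placeholders, so substitute whatever your VNC job actually reports.

    # On your local machine: forward a local port to the VNC display reported by your job
    ssh -L 5902:vis-node-hostname:5902 username@stampede.tacc.utexas.edu

    # In another local terminal, point a VNC viewer at the local end of the tunnel
    vncviewer localhost:5902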

Remote Serial Visualization
[Diagram] As above, but a single visualization process with its GUI runs inside the VNC desktop on one vis node; its display reaches the user's VNC viewer through the same port-forwarded connection.

Remote Parallel Visualization
[Diagram] The visualization client process and GUI run on the root node of the allocated node set, while the parallel visualization server processes run under MPI across all of the allocated vis nodes; the desktop is still delivered through the same port-forwarding path.

Parallel Session Settings
Number of nodes M – more nodes gives you:
– More total memory
– More I/O bandwidth (up to a limit determined by the file system and other system load)
– More CPU cores and GPUs (though also affected by wayness)
Wayness N – processes to run on each node:
– ParaView and VisIt are not multi-threaded
– N < k gives each process more memory but uses fewer CPU cores, for k = number of cores per node
– Need N > 1 on Longhorn and on the Stampede large-memory nodes to utilize both GPUs
Longhorn portal:
– Number of Nodes and Wayness pulldowns
qsub -pe Nway M argument:
– M specifies the number of nodes (actually, specify k*M)
– N specifies the wayness
sbatch -N [#nodes] -n [#tasks], where -n is the total task count (nodes × processes per node); see the sketch below
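
As a concrete illustration (node counts, wayness and time limits are arbitrary examples): on Longhorn, with k = 8 cores per node, a 4-node, 4-way session is requested with the 4way parallel environment and k*M = 8*4 = 32 slots, while on Stampede the Slurm flags give the node count and the total task count directly.

    # Longhorn (SGE): 4 nodes at 4 processes per node -> -pe 4way, 8*4 = 32 slots
    qsub -q normal -pe 4way 32 -l h_rt=02:00:00 ~/job.vnc

    # Stampede (Slurm): 4 nodes, 32 total MPI tasks = 8 processes per node
    sbatch -p vis -N 4 -n 32 -t 02:00:00 ~/job.vnc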

Running ParaView In Parallel
Run ParaView as before
In a separate text window (see the sketch below):
– module load python paraview
– NO_HOSTSORT=1 ibrun tacc_xrun pvserver
In the ParaView GUI:
– File->Connect to bring up the Choose Server dialog
– Set the server configuration name to manual
– Click Configure and, from Startup Type, select Manual and Save
– In the Choose Server dialog, select manual and click Connect
In the client xterm you should see "Waiting for server…", and in the server xterm you should see "Client connected."
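
Put together, the server-side terminal session is just the two commands above; the sketch below restates them with comments and is an approximation rather than a verified recipe.

    # In an xterm inside the VNC desktop on the allocated nodes:
    module load python paraview              # load ParaView and its Python support
    NO_HOSTSORT=1 ibrun tacc_xrun pvserver   # start pvserver on every allocated node under MPI
    # Leave this running and connect to it from the ParaView GUI via File->Connect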

Running VisIt In Parallel
Run VisIt as before; it will do the right thing

Data-Parallel Visualization Algorithms Spatially partitioned data are distributed to participating processes…

Data-Parallel Algorithms
Sometimes work well… Iso-contours:
– Surfaces can be computed in each partition concurrently and (with ghost zones) independently
– However, since surfaces may not be evenly distributed across partitions, this may lead to poor load balancing

Data-Parallel Algorithms
Sometimes not so much… Streamlines are computed incrementally, so a streamline must be handed from partition to partition (P0, P1, P2, P3 in the figure) as it advances, which limits parallelism.

Parallel Rendering
Collect-and-render:
– Gather all geometry on one node and render it there
Render-and-composite:
– Render locally, then do a depth-aware composite
Both ParaView and VisIt offer both approaches and give the user control of which to use

Parallel Data Formats
To run in parallel, data must be distributed among the parallel subprocesses' memory spaces
Serial formats are "data soup":
– Data must be read, partitioned and distributed
Parallel formats contain information enabling each subprocess to import its own subset of the data simultaneously:
– Maximizes bandwidth into the parallel visualization process
– Minimizes reshuffling for ghost zones
The network file system enables any node to access any file

ParaView XML Parallel Formats
Partition data reside in separate files:
– .vti for regular grids, .vts for structured grids, …
– Example: one of 256 partitions of a volume, c-2_5_5.vti (the encoded data payload looks like ASCII gibberish)
A global file associates the partitions into the overall grid:
– .pvti for regular grids, .pvts for structured grids, …
– Example: the global file for the volume, c.pvti (see the sketch below)
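
For illustration, a minimal .pvti index file might look roughly like the sketch below; the array name, extents, ghost level and spacing are made-up examples, and only one of the 256 Piece entries is shown.

    <?xml version="1.0"?>
    <VTKFile type="PImageData" version="0.1">
      <!-- Whole-volume extent; each Piece maps a partition file to its sub-extent -->
      <PImageData WholeExtent="0 255 0 255 0 255" GhostLevel="1"
                  Origin="0 0 0" Spacing="1 1 1">
        <PPointData Scalars="density">
          <PDataArray type="Float32" Name="density"/>
        </PPointData>
        <Piece Extent="64 95 160 191 160 191" Source="c-2_5_5.vti"/>
        <!-- ...255 more Piece entries, one per .vti partition file... -->
      </PImageData>
    </VTKFile>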

Silo Parallel Format
"Native" VisIt format:
– Not currently supported by ParaView
Built on top of lower-level storage libraries:
– NetCDF, HDF5, PDB
Multiple partitions in a single file simplify data management:
– Directory-like structure
– The parallel file system enables simultaneous read access to the file by multiple nodes
– Optimal performance may require a mix of files and partitions per file; note that write access to Silo files is serial

Xdmf Parallel Format
Common parallel format:
– We have seen problems with it in VisIt
Also built on top of lower-level storage libraries:
– NetCDF, HDF5
Multiple partitions in a single file simplify data management:
– Also a directory-like structure
– Also leverages the parallel file system
– Optimal performance may again be a mix of files and partitions per file

Data Location
Data must reside on an accessible file system
Movement within TACC is faster than across the Internet, but it can still take a long time to transfer data between systems
[Diagram: the Lonestar, Stampede and Longhorn Lustre parallel file systems and the Ranch archive sit on the TACC LAN, with remote systems reached over the Internet; the Ranger/Spur systems and the Ranger Lustre file system are retired]

Post-Processing
1. Simulation writes periodic timesteps to storage
2. Visualization loads timestep data from storage, runs visualization algorithms and interacts with the user
[Diagram: sim host writes to storage; viz host reads from storage]

Postprocessing On HPC Systems and Longhorn
[Diagram: the simulation runs on Lonestar or Stampede and writes to that system's Lustre file system; the data are then moved to the Longhorn Lustre file system and visualized on Longhorn]

Postprocessing On Stampede
[Diagram: Stampede compute nodes, vis nodes and the Stampede Lustre file system]
1. Simulation runs on compute nodes and writes data to the Lustre file system
2. Data are read back onto the vis nodes for visualization

Huge Data: Co- and In-Situ Processing
Visualization requires equivalent horsepower:
– Not all visualization computation is accelerated
– Increasingly, HPC platforms include acceleration
I/O is expensive: simulation to disk, disk to disk, disk to visualization:
– I/O is not scaling with compute
– Data are not always written at full spatial and temporal resolution

Co-Processing
Perform simulation and visualization on the same host:
– Concurrently
– Communication: many-to-many, using the high-performance interconnect to communicate

In-Situ Processing
– Incorporate visualization directly into the simulation
– Run visualization algorithms on the simulation's data
– Output only the visualization results

Co- and In-Situ Processing
Not a panacea:
– Limits the scientist's exploration of the data: you can't go back in time, so it may pay off to re-run the simulation
– Impacts the simulation: may require source-code modification of the simulation, may increase the simulation's footprint on each node, and may affect the simulation's stability
– The simulation host may not have graphics accelerators… but visualizations are often not rendering-limited, and more and more HPC hosts are including accelerators

Co- and In-Situ Status
Bleeding edge:
– Co-processing capabilities in ParaView and VisIt
– Did I say bleeding edge?
In-situ visualization is not simple: we can help

Summary
Parallel visualization is only partly about upping the available compute power; it's also about getting sufficient memory and I/O bandwidth.
I/O is a really big issue. Planning how to write your data for parallel access, and placing it where it can be accessed quickly, is critical.
The TACC visualization groups are here to help you!