BigBen @ PSC
BigBen Features
Compute Nodes
–2068 nodes running the Catamount (QK) microkernel
–Seastar interconnect in a 3-D torus configuration
–No external connectivity (no TCP)
–All inter-node communication is over Portals
–Applications use MPI, which is built on Portals
Service & I/O (SIO) Nodes
–22 nodes running SuSE Linux
–Also on the Seastar interconnect
–SIO nodes can have PCI-X hardware installed, defining a unique role for each
–2 SIO nodes are currently connected externally to ETF with 10GigE cards
Portals Direct I/O (PDIO) Details
Portals-to-TCP routing
–PDIO daemons aggregate hundreds of Portals data streams into a configurable number of outgoing TCP streams
–Heterogeneous Portals (both QK and Linux nodes)
Explicit parallelism
–Configurable # of Portals receivers (on SIO nodes), distributed across multiple 10GigE-connected Service & I/O (SIO) nodes
–Corresponding # of TCP streams (to the WAN), one per PDIO daemon
–A parallel TCP receiver in the Goodhue booth supports a variable/dynamic number of connections
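To make the routing above concrete, here is a minimal C sketch of the sender side of one PDIO daemon, assuming a simple length-prefixed framing: one outgoing TCP stream per daemon, with each message that arrived over Portals forwarded behind a small demultiplexing header. The Portals receive path is stubbed out, and the function names, header layout, and framing are illustrative assumptions, not the actual PDIO wire protocol.

/* pdio_sender_sketch.c -- illustrative only; not the real PDIO daemon.
 * One daemon = one outgoing TCP stream.  Messages arriving from many
 * compute-node Portals writers are forwarded with a small header so the
 * remote receiver can demultiplex them.  Error handling is minimal. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

struct pdio_hdr {              /* assumed wire format, not PDIO's actual one */
    uint32_t writer_rank;      /* which application process produced the data */
    uint32_t payload_len;      /* bytes that follow the header */
};

/* Open the daemon's single outgoing TCP stream to a remote receiver. */
static int pdio_connect(const char *ip, uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(port) };
    inet_pton(AF_INET, ip, &addr.sin_addr);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Forward one message that arrived over Portals (receive path elided). */
static int pdio_forward(int fd, uint32_t rank, const void *buf, uint32_t len)
{
    struct pdio_hdr h = { htonl(rank), htonl(len) };
    if (write(fd, &h, sizeof h) != sizeof h)
        return -1;
    for (uint32_t off = 0; off < len; ) {      /* handle short writes */
        ssize_t n = write(fd, (const char *)buf + off, len - off);
        if (n <= 0)
            return -1;
        off += (uint32_t)n;
    }
    return 0;
}

A length-prefixed header is just one way to let a single TCP stream carry data from many writers; the real aggregation format is not described on these slides.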
Portals Direct I/O (PDIO) Details
Utilizing the ETF network
–10GigE end-to-end
–Benchmarked >1Gbps in testing
Inherent flow-control feedback to the application
–Aggregation protocol allows TCP transmission or even remote file system performance to throttle the data streams coming out of the application (!)
Variable message sizes and file metadata supported
Multi-threaded ring buffer in the PDIO daemon
–Allows the Portals receiver, TCP sender, and computation to proceed asynchronously
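The slides do not show the ring buffer implementation, but a classic pthreads bounded buffer captures the behavior described above: the Portals-receiver (producer) thread blocks when the ring is full, which is exactly the back-pressure that lets WAN throughput or remote file system speed throttle the application's output. All names and sizes below are assumptions.

/* ring_sketch.c -- illustrative bounded ring buffer, not the PDIO source.
 * Producer thread: Portals receiver.  Consumer thread: TCP sender.
 * A full ring blocks the producer, propagating back-pressure toward
 * the application; an empty ring blocks the consumer. */
#include <pthread.h>
#include <stdlib.h>

#define RING_SLOTS 64          /* assumed; configurable in the real daemon */

struct slot { void *data; size_t len; };

struct ring {
    struct slot slots[RING_SLOTS];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
};

void ring_init(struct ring *r)
{
    r->head = r->tail = r->count = 0;
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->not_full, NULL);
    pthread_cond_init(&r->not_empty, NULL);
}

/* Called by the Portals receiver: blocks when the TCP sender falls behind. */
void ring_put(struct ring *r, void *data, size_t len)
{
    pthread_mutex_lock(&r->lock);
    while (r->count == RING_SLOTS)            /* flow-control feedback */
        pthread_cond_wait(&r->not_full, &r->lock);
    r->slots[r->head] = (struct slot){ data, len };
    r->head = (r->head + 1) % RING_SLOTS;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->lock);
}

/* Called by the TCP sender: blocks when no data has arrived yet. */
struct slot ring_get(struct ring *r)
{
    pthread_mutex_lock(&r->lock);
    while (r->count == 0)
        pthread_cond_wait(&r->not_empty, &r->lock);
    struct slot s = r->slots[r->tail];
    r->tail = (r->tail + 1) % RING_SLOTS;
    r->count--;
    pthread_cond_signal(&r->not_full);
    pthread_mutex_unlock(&r->lock);
    return s;
}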
Portals Direct I/O (PDIO) Config
User-configurable/tunable parameters:
–Network targets: can be different for each job
–Number of streams: can be tuned for optimal host/network utilization
–TCP network buffer size: can be tuned for maximum throughput over the WAN
–Ring buffer size/length: controls total memory utilization of the PDIO daemons
–Number of Portals writers: can be any subset of the running application's processes
–Remote filename(s): file metadata are propagated through the full chain, per write
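How these knobs are actually expressed is not shown on the slide, so the structure below is only a guess at what a per-job PDIO configuration might hold; the one standard piece is that a per-stream TCP buffer size would normally be applied with setsockopt(SO_SNDBUF).

/* pdio_config_sketch.c -- hypothetical representation of the tunables
 * listed on this slide; all field names are assumptions, not PDIO's. */
#include <stddef.h>
#include <sys/socket.h>

struct pdio_config {
    const char **targets;       /* remote host:port network targets, per job */
    size_t       num_targets;
    int          num_streams;   /* # of PDIO daemons / outgoing TCP streams  */
    int          tcp_buf_bytes; /* TCP socket buffer size for WAN throughput */
    size_t       ring_slots;    /* ring buffer length -> daemon memory use   */
    size_t       ring_slot_bytes;
    int          num_writers;   /* subset of application ranks doing I/O     */
    const char  *remote_file;   /* filename metadata carried with each write */
};

/* Apply the TCP buffer tunable to an already-created socket. */
static int apply_tcp_bufsize(int fd, int bytes)
{
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes);
}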
[Diagram: PSC Compute Nodes and I/O Nodes, ETF network, steering path, iGRID] HPC resource and renderer waiting…
[Diagram: pdiod on the I/O Nodes, recv at iGRID] Launch the PPM job, the PDIO daemons, and the iGRID recv'ers
[Diagram] Aggregate data via Portals
[Diagram] Route traffic onto the ETF network
[Diagram] Receive data at iGRID
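As an illustration of this receiving step, the following is a toy version of a parallel TCP receiver that, like the one described for the Goodhue booth, accepts a variable/dynamic number of incoming streams; it uses poll() and discards the data where the real receiver would reassemble it, write files, and feed the renderer. The port number and limits are assumptions.

/* pdio_recv_sketch.c -- toy parallel receiver, not the actual iGRID code.
 * Accepts a variable number of incoming PDIO TCP streams and drains them
 * with poll(); real handling (reassembly, file writing, feeding the
 * renderer) is reduced to a read-and-discard loop. */
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_STREAMS 128

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a = { .sin_family = AF_INET,
                             .sin_port = htons(9000),      /* assumed port */
                             .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(lfd, (struct sockaddr *)&a, sizeof a);
    listen(lfd, 16);

    struct pollfd pfds[MAX_STREAMS + 1] = { { .fd = lfd, .events = POLLIN } };
    int nfds = 1;
    char buf[1 << 16];

    for (;;) {
        poll(pfds, nfds, -1);
        if ((pfds[0].revents & POLLIN) && nfds < MAX_STREAMS + 1) {
            int cfd = accept(lfd, NULL, NULL);   /* a new PDIO stream arrives */
            if (cfd >= 0)
                pfds[nfds++] = (struct pollfd){ .fd = cfd, .events = POLLIN };
        }
        for (int i = 1; i < nfds; i++) {
            if (!(pfds[i].revents & POLLIN))
                continue;
            ssize_t n = read(pfds[i].fd, buf, sizeof buf);
            if (n <= 0) {                        /* stream closed: drop it */
                close(pfds[i].fd);
                pfds[i--] = pfds[--nfds];
            }
            /* else: hand 'buf' to the file writer / renderer (omitted) */
        }
    }
}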
[Diagram: render stage added at iGRID] Render real-time data
[Diagram] Send steering data back to the active job's input
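The slides do not say how the steering data was encoded or delivered back into the job, so the sketch below only illustrates the direction of this feedback path: a small, hypothetical steering packet sent over TCP from the iGRID side back toward PSC, with forwarding into the running job's input left out.

/* steer_send_sketch.c -- illustrative only; the packet layout and delivery
 * path are assumptions, not the mechanism used in the demo. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

struct steer_packet {          /* hypothetical steering parameters */
    uint32_t frame;            /* frame/timestep the update applies to  */
    float    value;            /* e.g. a rendering or physics parameter */
};

/* Send one steering update from the iGRID side back toward PSC. */
static int send_steering(const char *psc_ip, uint16_t port,
                         const struct steer_packet *p)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(port) };
    inet_pton(AF_INET, psc_ip, &addr.sin_addr);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    ssize_t n = write(fd, p, sizeof *p);   /* raw struct send: endianness and
                                              padding ignored in this sketch */
    close(fd);
    return n == sizeof *p ? 0 : -1;
}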
[Diagram] Dynamically update the rendering input