National Energy Research Scientific Computing Center (NERSC) HPC In a Production Environment Nicholas P. Cardo NERSC Center Division, LBNL November 19,


1 National Energy Research Scientific Computing Center (NERSC) HPC In a Production Environment Nicholas P. Cardo NERSC Center Division, LBNL November 19, 2003

2 Scientific Computing Climate Chemistry Physics Nano-Science Genomics Molecular Modeling Materials Simulation of Large Systems Algorithms Development

3 System Configuration
184 Compute Nodes
16 GPFS Nodes
4 Service Nodes
3 Login Nodes
1 Network/Admin Node
24.7 TB Formatted SSA
13 Homes @ ~500 GB
Scratch @ ~13 TB
4 Nodes @ 64 GB
64 Nodes @ 32 GB
140 Nodes @ 16 GB
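The node counts on this slide can be cross-checked with a little arithmetic. The sketch below assumes that "N Nodes @ M GB" means N nodes with M GB of memory each (the slide does not say so explicitly); the variable names are illustrative only.

```python
# Figures copied from the System Configuration slide.
node_counts = {
    "compute": 184,
    "gpfs": 16,
    "service": 4,
    "login": 3,
    "network_admin": 1,
}

# Assumption: each pair is (number of nodes, GB of memory per node).
memory_groups = [(4, 64), (64, 32), (140, 16)]

total_nodes = sum(node_counts.values())
nodes_in_memory_groups = sum(count for count, _ in memory_groups)
total_memory_gb = sum(count * gb for count, gb in memory_groups)

print(f"nodes by role:         {total_nodes}")
print(f"nodes by memory size:  {nodes_in_memory_groups}")
print(f"aggregate memory (GB): {total_memory_gb}")
```

Under that assumption, both breakdowns account for the same number of nodes, which suggests the memory figures cover every node in the system and not only the compute nodes.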

4 System Utilization [chart; axis label: Hours]

5 Job Size Breakdown [chart; labels: Hours, Scaling Efforts]

6 Large Jobs [chart; labels: Percent, 50%, Scaling Efforts]

7 System Expanded March 2003
The System Doubled
Difficult Decision:
– Change in operating model: a single large-scale production system
– Cable length limitations required existing hardware to be relocated
– Integration with minimal disruption of service

8 System Configuration
380 Compute Nodes (+106%)
20 GPFS Nodes (+25%)
8 Service Nodes (+100%)
6 Login Nodes (+100%)
2 Network/Admin Nodes (+100%)
44.7 TB SSA Disk (+80%)
~33 TB Scratch (+153%)
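The growth percentages on this slide can be reproduced from the two System Configuration slides. This is a minimal sketch: the dictionary keys are illustrative names, and the scratch sizes are the approximate "~13 TB" and "~33 TB" figures from the slides.

```python
# (before, after) pairs taken from the two System Configuration slides.
configuration = {
    "compute_nodes": (184, 380),
    "gpfs_nodes": (16, 20),
    "service_nodes": (4, 8),
    "login_nodes": (3, 6),
    "network_admin_nodes": (1, 2),
    "ssa_disk_tb": (24.7, 44.7),
    "scratch_tb": (13, 33),  # both values are approximate on the slides
}


def percent_increase(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


growth = {name: percent_increase(old, new)
          for name, (old, new) in configuration.items()}

for name, pct in growth.items():
    # int() truncates toward zero, which is how the slide's figures appear
    # to have been derived (for example 106.5 is shown as +106%).
    print(f"{name:22s} +{int(pct)}%")
```

The three node types that exactly doubled (service, login, network/admin) share the single +100% figure listed on the slide.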

9 SCSI Disks
2 x 36.4 GB SCSI drives
Mirrored for availability
36.4 GB available space
[diagram labels: rootvg (36.4 GB), 36.4 GB]

10 SSA Disks
16 drives per drawer
3 Groups per drawer
Hot Spare
RAID 5 for RAS
Each node twintailed to five other nodes in the same frame
[diagram labels: hdisk x, hdisk y, hdisk z]
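The drawer figures on this slide are consistent with one simple layout, sketched below. The layout is an assumption, not something the slide states: one hot spare per drawer, the remaining drives split evenly across the three RAID 5 groups, and one drive's worth of parity per group (standard for RAID 5).

```python
DRIVES_PER_DRAWER = 16   # from the slide
GROUPS_PER_DRAWER = 3    # from the slide
HOT_SPARES_PER_DRAWER = 1  # assumption: the slide only says "Hot Spare"

# Assumption: drives that are not spares are split evenly across the groups.
drives_per_group = (DRIVES_PER_DRAWER - HOT_SPARES_PER_DRAWER) // GROUPS_PER_DRAWER

# RAID 5 spends one drive's worth of capacity per group on parity.
data_drives_per_group = drives_per_group - 1

usable_fraction = GROUPS_PER_DRAWER * data_drives_per_group / DRIVES_PER_DRAWER

print(f"drives per RAID 5 group: {drives_per_group}")
print(f"data drives per group:   {data_drives_per_group}")
print(f"usable fraction of raw:  {usable_fraction:.0%}")
```

Under these assumptions, each drawer holds three groups of five drives plus the spare. The usable fraction is an upper bound, before any formatting or file system overhead.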

11 Networking [diagram labels: Login Node, Network Node, Jumbo Frame, Production]

12 Fun Facts
39,936 DIMMs
7.7 TB Memory
832 SCSI Disks
29.6 TB SCSI Disks
6,656 Processors
35 Miles of Cable
30 Gigabit Adapters
210 SSA Adapters
3,440 SSA Disks
65.4 TB raw SSA
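Several of these figures follow from the expanded System Configuration slide and the SCSI Disks slide. The sketch below derives the per-node values. They are inferred by division, not stated in the slides, and the terabyte conversion assumes 1 TB = 1024 GB, which is the convention that reproduces the slide's SCSI total.

```python
# Node counts from the expanded System Configuration slide.
nodes = 380 + 20 + 8 + 6 + 2

# Totals from the Fun Facts slide.
processors = 6656
dimms = 39936
scsi_disks = 832

# Drive size from the SCSI Disks slide.
scsi_drive_gb = 36.4

processors_per_node = processors / nodes
dimms_per_node = dimms / nodes
scsi_disks_per_node = scsi_disks / nodes

# Assumption: binary terabytes (1 TB = 1024 GB).
scsi_total_tb = scsi_disks * scsi_drive_gb / 1024

print(f"nodes:               {nodes}")
print(f"processors per node: {processors_per_node:g}")
print(f"DIMMs per node:      {dimms_per_node:g}")
print(f"SCSI disks per node: {scsi_disks_per_node:g}")
print(f"SCSI capacity (TB):  {scsi_total_tb:.1f}")
```

The two SCSI disks per node agree with the mirrored pair described on the SCSI Disks slide. With decimal terabytes (1 TB = 1000 GB) the same disks would come to about 30.3 TB.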

13 System Utilization [chart; axis label: Hours]

14 Job Size Breakdown [chart; axis label: Hours]

15 New Batch Configuration
Class Of Service: premium, regular, low, interactive, debug
Job Class: pre_128, pre_32, pre_1, reg_128, reg_32, reg_1, reg_1l, interactive, debug, low
Priority: high to low
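One way to read this slide is as a mapping from each class of service to its job classes. The sketch below pairs them by name prefix ("pre_" with premium, "reg_" with regular). That pairing is an assumption based on the names alone, and `class_of_service` is a hypothetical helper, not part of any batch system's interface.

```python
# Assumed pairing, inferred from the class names on the slide.
JOB_CLASSES_BY_SERVICE = {
    "premium": ["pre_128", "pre_32", "pre_1"],
    "regular": ["reg_128", "reg_32", "reg_1", "reg_1l"],
    "low": ["low"],
    "interactive": ["interactive"],
    "debug": ["debug"],
}


def class_of_service(job_class):
    """Return the class of service that a job class belongs to."""
    for service, job_classes in JOB_CLASSES_BY_SERVICE.items():
        if job_class in job_classes:
            return service
    raise KeyError(f"unknown job class: {job_class}")


all_job_classes = [jc for jcs in JOB_CLASSES_BY_SERVICE.values() for jc in jcs]
print(len(all_job_classes), "job classes")
print(class_of_service("reg_1l"))
```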

16 System Utilization [chart; axis label: Hours]

17 Job Size Breakdown [chart; axis label: Hours]

18 Large Jobs [chart; labels: Percent, 50%, allocation depletion]

19 Job Efficiency [chart; axis label: Hours]

20 Performance Variation
Performance variation problem detected.
– Original nodes appeared to perform slower than the nodes added to the system.
– Hardware was swapped between original nodes and new nodes, with no improvement.
– Accounting showed that specific commands occurred significantly more often on the original nodes.
– Four problem management definitions were found to be deactivated but still executing constantly on the original nodes.
Analysis performed by NERSC’s David Skinner
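The accounting comparison described on this slide can be sketched as a count of commands per node group. This is a generic illustration, not the analysis that was actually run. The record format (node name, command name), the function names, the threshold, and the sample records are all hypothetical.

```python
from collections import Counter


def count_commands_by_group(records, original_nodes):
    """Count commands separately for original and newly added nodes.

    records: iterable of (node_name, command_name) pairs.
    original_nodes: set of node names that predate the expansion.
    """
    counts = {"original": Counter(), "new": Counter()}
    for node, command in records:
        group = "original" if node in original_nodes else "new"
        counts[group][command] += 1
    return counts


def overrepresented(counts, factor=10):
    """Commands seen at least `factor` times more often on original nodes."""
    flagged = {}
    for command, n_original in counts["original"].items():
        n_new = counts["new"].get(command, 0)
        if n_original >= factor * max(n_new, 1):
            flagged[command] = (n_original, n_new)
    return flagged


# Synthetic records for illustration only; not real accounting data.
sample = ([("old01", "probe")] * 50 + [("new01", "probe")] * 2
          + [("old01", "ls")] * 3 + [("new01", "ls")] * 3)
flagged = overrepresented(count_commands_by_group(sample, {"old01"}))
print(flagged)
```

Raw counts are compared directly here, which is reasonable only when both groups contain a similar number of nodes, as they would after a system doubles in size. Otherwise the counts should be normalized per node.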

21 FY04 System Utilization [chart; axis label: Hours]

22 FY04 Job Size Breakdown [chart; axis label: Hours]

23 FY04 Large Jobs [chart; labels: Percent, 50%]

24 Job Efficiency

25

