Looking Ahead: A New PSU Research Cloud Architecture
Chuck Gilbert - Systems Architect and Systems Team Lead
Research CI Coordinating Committee Meeting
July 31, 2014
● ITS implemented a traditional HPC infrastructure based upon:
  ○ Fairshare model
    ■ Priority bump in the queues (see the sketch below)
  ○ No guaranteed runtimes
    ■ Wasted research time waiting for a "turn" in the system
  ○ Segregated clusters
    ■ Limited re-configuration options
    ■ No sharing of high-speed interconnects
One model for all computing needs!
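The fairshare model gives a priority bump rather than a guaranteed start time: groups that have used less than their allocated share move up in the queue, but nothing bounds how long a job waits. A minimal Python sketch of that idea follows; the weighting formula and figures are illustrative assumptions, not the actual ITS scheduler configuration.

```python
def fairshare_priority(base_priority, allocated_share, recent_usage, fairshare_weight=1000):
    """Illustrative fairshare priority: groups below their allocated share
    get a priority bump; heavy recent users sink in the queue.
    A sketch of the general technique, not ITS's actual scheduler formula."""
    # usage_ratio > 1 means the group has consumed more than its share
    usage_ratio = recent_usage / allocated_share
    # Classic decay curve: 1.0 with no recent usage, approaching 0 as usage grows
    fairshare_factor = 2 ** (-usage_ratio)
    return base_priority + fairshare_weight * fairshare_factor

# Two groups with equal base priority but different recent usage:
light = fairshare_priority(base_priority=100, allocated_share=0.25, recent_usage=0.05)
heavy = fairshare_priority(base_priority=100, allocated_share=0.25, recent_usage=0.75)
print(light, heavy)  # the lighter user jumps ahead, but neither gets a guaranteed runtime
```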
● Research community needs are bigger!
  ■ Guaranteed response times
  ■ Self-service access – needed to empower the research community to consume a model that works for their needs
  ■ Root-level access – needed to enable customized environments
  ■ Virtualization in addition to HPC resources – flexible configurations and online maintenance of hardware
  ■ Accelerator cards (GPU and Phi)
  ■ Big Data platforms
  ■ Fast data transfer rates
● The old ITS approach revolved around segmented, fractured "clusters" without flexibility or expandability
The old model cannot keep pace with current and future computing needs!
● ICS-CI2 (High-Performance Research Cloud)
  ○ Fundamentally new approach to engineering, deploying, and managing research computing resources
  ○ On-premises high-performance cloud allows for full customization and control of the software and hardware stacks
    ■ Flexible configurations
    ■ Guaranteed run times
    ■ Secure data storage
    ■ High-speed network bandwidth
    ■ Multiple possible Service Level Agreement (SLA) models
    ■ On-demand storage purchasing capacity
  ○ Bursting to public clouds and national labs (hybrid cloud model)
    ■ Compute bursting for large-scale (10k+ core) jobs
    ■ Participation in XSEDE
  ○ Model used at CERN and other research computing centers
● Stable computing platform
  ○ Tested and verified software catalogs, including operating systems
    ■ Linux, Windows
    ■ C, C++, Java, .NET, scripting languages, etc.
  ○ Self-service portals (see the sketch below)
  ○ Science gateways
  ○ Seamless maintenance
  ○ Enable choice in how resources are consumed
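The slides do not name the virtualization platform behind the self-service portals. As one illustration, the CERN research cloud cited above is built on OpenStack, where provisioning a customized environment is a short script rather than a ticket. The sketch below uses the openstacksdk Python library; the cloud name, image, flavor, and network names are hypothetical placeholders, not actual ICS-CI2 resources.

```python
# Sketch of self-service VM provisioning against an OpenStack-style cloud.
# All resource names below are hypothetical placeholders.
import openstack

conn = openstack.connect(cloud="research-cloud")  # credentials read from clouds.yaml

image = conn.compute.find_image("centos-7-research")
flavor = conn.compute.find_flavor("m1.xlarge")
network = conn.network.find_network("research-net")

server = conn.compute.create_server(
    name="my-analysis-node",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)  # block until the instance is ACTIVE
print(server.name, server.addresses)
```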
Customized Environments
To be aligned at a later date to conform to governance structures recommended by the Research CI Governance Taskforce
Penn State Research Cloud
[Diagram: a resource request is fulfilled by ICS-CI2 (On-Premises Research Cloud), Regional and National Labs, or Public Cloud Resources]
● Compute
  ○ ICS-CI2 compute can be "re-provisioned" as needed to accommodate multiple models
  ○ Utilizes GPU-enabled, large-memory, and blade servers deployed through each proposed phase
  ○ N CI-Cores are built on top of converged compute, segmented by security boundaries, networks, and firewalls
● Storage
  ○ ICS-CI2 cloud storage offers a choice of provisioning, backup, and retention models
  ○ ICS-CI2 storage automation, metering, and metrics allow for methodical expansion based on usage and trends (see the sketch below)
  ○ ICS-CI2 storage scales to multiple petabytes
● Network
  ○ Direct integration into the PSU Research Network for fast access and data transfers
  ○ Limited single points of failure to minimize downtime/maintenance windows
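Metering and trend data are what make "methodical expansion" of storage possible: fitting the growth of consumed capacity over time gives an estimate of when the next purchase is needed. A minimal sketch of that calculation follows; the sample figures and the 400 TB threshold are made up for illustration, not actual ICS-CI2 metering data.

```python
# Sketch of trend-based capacity planning: fit a linear growth rate to
# metered usage samples and project when storage crosses a purchase threshold.

def days_until_threshold(samples, threshold_tb):
    """samples: list of (day_number, used_tb) metering points, oldest first."""
    n = len(samples)
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(u for _, u in samples) / n
    # Least-squares slope: terabytes of growth per day
    slope = sum((d - mean_x) * (u - mean_y) for d, u in samples) / \
            sum((d - mean_x) ** 2 for d, _ in samples)
    if slope <= 0:
        return None  # usage is flat or shrinking; no expansion needed yet
    latest_day, latest_used = samples[-1]
    return (threshold_tb - latest_used) / slope

# Weekly metering points over five weeks: (day, terabytes used)
usage = [(0, 120.0), (7, 131.5), (14, 140.2), (21, 152.8), (28, 161.0)]
eta = days_until_threshold(usage, threshold_tb=400.0)
print(f"Projected days until the 400 TB expansion point: {eta:.0f}")
```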
Direct integration into research network core
The ITS interactive cluster Hammer was at a breaking point!
Hardware Issues
● 24 compute nodes
  ○ Slow network
  ○ Slow I/O
  ○ Inadequate memory
  ○ Old, outdated operating system
Operational Issues
● Software stack not unified
● Memory cannot support the number of users requesting resources
  ○ Processes denied from running
● Hardware near end-of-life
ICS is installing a new interactive cluster with the following enhancements!
Hardware Specifications
● 24 compute nodes
  ○ Dual 10-core processors
  ○ 256 GB of RAM
  ○ NVIDIA K4000 graphics card
  ○ 10G Ethernet
● Public 10G Ethernet access (10X increase)
● Research Network 10-Gigabit Ethernet access
  ○ Available late fall
● Unified software stack with batch clusters
● Re-usable hardware platform
● Interactive processing
● 5X improvement in processing power
● 5X improvement in memory
* Hardware will be available for the 2014 fall semester