Published by Ευφήμιος Ταρσούλη. Modified over 6 years ago.
1
Building Efficient Clouds for HPC, Big Data, and Deep Learning Middleware and Applications
Tor Skeie, Feroz Zahid
Simula Research Laboratory
27th June 2018
ISC 2018, Frankfurt, Germany
2
Emerging applications such as big data analytics and machine learning at scale are redefining HPC workloads
Social innovation in the modern era increasingly relies on our capacity to efficiently process large data sets
  Digital universe is expected to grow to 163 ZB in 2025 (IDC)
  30.7 billion IoT devices by 2020 (IDC)
  Social networks and multimedia are producing huge amounts of data
    500 million tweets per day
    510,000 comments and 136,000 photos per minute on Facebook
  Vast amount of biological data available from genome projects
Deep learning is a new killer application in the market
  Artificial intelligence
  Requires high-performance infrastructure for reducing training time
Building Efficient Clouds for HPC, Big Data, and Deep Learning Applications
3
The need for high-performance computing is no longer restricted to the scientific community
Big Data: Volume, Velocity, Veracity, Variety, Value
Extending HPC power through cloud computing can be key to reach a broader audience!
4
The cloud computing paradigm can extend HPC power to a broader audience
Traditional HPC infrastructures: high-performance interconnects, specialized storage, fat nodes
HPC in the Cloud: HPC applications in the cloud, both private and public clouds
HPC Clouds: clouds on top of HPC infrastructure, HPC-as-a-Service
Hybrid Architectures: cloud bursting, cloud federation, cross-cloud deployments
5
Traditional Cloud platforms are by and large not suitable for high-performance computing applications, mainly due to their network performance

Study / Summary of Results

Can cloud computing reach the Top500? Jeffrey Napper et al., 2009
  ..first step would be to offer better interconnects or nodes provisioned with more physical memory to overcome the slower network

Case study for running HPC applications in public clouds. Qiming He et al., 2010
  Most current public clouds are not designed for running scientific applications primarily due to their poor networking capabilities

Performance analysis of high performance computing applications on the amazon web services cloud. Keith R. Jackson et al., 2010
  The interconnect on the EC2 cloud platform severely limits performance and causes significant variability

Evaluation of HPC applications on clouds. Abhishek Gupta et al., 2011
  Cloud is viable platform for only low communication intensive applications

Understanding the Performance and Potential of Cloud Computing for Scientific Applications. Iman Sadooghi et al., 2015
  We can conclude that there is need for cloud infrastructures with more powerful network capacity…

Performance evaluation of Amazon Elastic Compute Cloud for NASA high-performance computing applications. Piyush Mehrotra et al., 2016
  …cannot currently compete with HPC systems… particularly for tightly coupled applications where communication performance is important

Evaluating and improving the performance and scheduling of HPC applications in cloud. Abhishek Gupta et al., 2016
  HPC applications and runtimes must adapt to minimize the impact of slow network, heterogeneity, and multi-tenancy in clouds

Cloud versus in-house cluster: Evaluating Amazon cluster compute instances for running MPI applications. Yan Zhai et al., 2011
  ...communication infrastructure remains the chief problem in scaling MPI programs

The Magellan Report on Cloud Computing for Science. Magellan Leads et al., 2011
  Scientific applications with minimal communication and I/O are best suited for cloud

High performance computing in the cloud: Deployment, performance and cost efficiency. Eduardo Roloff et al., 2012
  Benchmarks with a higher amount of communication, present worse results
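The common thread in these studies, that communication-heavy applications suffer most on slow networks, can be illustrated with a simple latency-plus-bandwidth cost model. The sketch below is not taken from the talk or from any of the cited studies. The function names, the no-overlap assumption, and the link parameters are all hypothetical and chosen only for illustration.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to send one message: fixed latency plus size divided by bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def parallel_efficiency(compute_s, messages, message_bytes,
                        latency_s, bandwidth_bytes_per_s):
    """Fraction of one iteration spent computing rather than communicating.

    Assumes communication is not overlapped with computation.
    """
    comm_s = messages * transfer_time(message_bytes, latency_s,
                                      bandwidth_bytes_per_s)
    return compute_s / (compute_s + comm_s)


# Hypothetical link parameters, for illustration only.
SLOW_LINK = {"latency_s": 100e-6, "bandwidth_bytes_per_s": 1e9 / 8}
FAST_LINK = {"latency_s": 2e-6, "bandwidth_bytes_per_s": 100e9 / 8}

if __name__ == "__main__":
    # One iteration: 10 ms of compute and 1000 messages of 64 KiB each.
    for name, link in (("slow", SLOW_LINK), ("fast", FAST_LINK)):
        eff = parallel_efficiency(0.010, 1000, 64 * 1024, **link)
        print(f"{name} link: efficiency {eff:.2f}")
```

Under this model an application that sends few messages keeps its efficiency close to 1 on either link, while a tightly coupled application loses most of its time to the slower network.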
6
Many of the key challenges in HPC stem from the flexibility of the interconnect solutions
HPC interconnects are not flexible enough for the Clouds
  Designed for non-dynamic HPC environments
  Static configurations, costly network reconfiguration mechanisms
Full potential utilization of the network interconnect is not achieved
  No harmony between upper cloud layers and interconnect
  In-datacenter communication vs services over the Internet
Runtimes, frameworks, and middleware are not adapted to HPC networks, for example by using RDMA communication
  Emerging software but still limited
  For example, RDMA-based Apache Hadoop / HBase from Ohio State University
7
Bridging the gap between existing cloud computing technologies and the requirements of HPC/AI/DL applications is necessary
[Figure: Cloud Computing (multi-tenancy, dynamic environments, virtualization, SLAs / SLOs) set against HPC/AI/DL Applications (high-performance interconnect, performance isolation, predictability)]
8
Challenges exist at different layers in the Cloud Stack
Usability: Cloud Federation, Automatic Deployments, Web GUIs, Monitoring and Auditing, Cost-Effectiveness
Applications: Modelling, Design-time Annotations, Data Awareness, Scalability, Portability
Runtime: Workload-aware Middleware, HPC-aware Runtime Libraries, Automation, Runtime Adaptation, Security
Infrastructure: Performance Isolation, Network Optimization, Application-aware Scheduling, Elasticity, Heterogeneity
Source: Marco A. S. Netto et al., HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
9
Efficiency of HPC / Big Data / AI applications in clouds can be improved at different levels and a co-design approach is needed
A. Improve Parallelism
  Over multiple GPUs on a single node
  Over multiple nodes
B. Design Better Applications
  Computationally light apps
  Models that scale better
  Apps that adapt
C. Improve Hardware
  Hardware with more processing capabilities
  Interconnects with more bandwidth
Cloud Technologies: Public Clouds, Private Clouds, Delivery Models, Costs
Frameworks: Acceleration, Single Node Communication, Multi Node Communication (NVIDIA cuDNN, MKL-DNN, NCCL, Gloo)
Heterogeneous Hardware: CPUs, GPUs, FPGAs
Interconnect Technologies
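Communication libraries such as NCCL and Gloo, named on this slide, provide collective operations like all-reduce, which data-parallel training uses to sum gradients across GPUs or nodes. One widely used algorithm for this is ring all-reduce. The pure-Python simulation below is only a sketch of that algorithm: it does not call NCCL or Gloo, and the function name `ring_allreduce` and the list-based "workers" are assumptions made for illustration.

```python
def ring_allreduce(vectors):
    """Simulate a ring all-reduce (sum) over equally sized per-worker vectors.

    Returns one list per worker; every list holds the element-wise sum.
    """
    n = len(vectors)
    length = len(vectors[0])
    # Split each vector into n contiguous chunks (some may be empty).
    bounds = [(i * length) // n for i in range(n + 1)]
    data = [list(v) for v in vectors]

    # Phase 1, reduce-scatter: after n - 1 steps, worker r holds the fully
    # summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all outgoing chunks first so the sends happen "at once".
        sends = []
        for rank in range(n):
            chunk = (rank - step) % n
            lo, hi = bounds[chunk], bounds[chunk + 1]
            sends.append((lo, data[rank][lo:hi]))
        for rank in range(n):
            dest = (rank + 1) % n
            lo, payload = sends[rank]
            for offset, value in enumerate(payload):
                data[dest][lo + offset] += value

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = []
        for rank in range(n):
            chunk = (rank + 1 - step) % n
            lo, hi = bounds[chunk], bounds[chunk + 1]
            sends.append((lo, data[rank][lo:hi]))
        for rank in range(n):
            dest = (rank + 1) % n
            lo, payload = sends[rank]
            data[dest][lo:lo + len(payload)] = payload

    return data


if __name__ == "__main__":
    gradients = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, result in enumerate(ring_allreduce(gradients)):
        print(rank, result)
```

Each worker sends and receives only one chunk per step, so the per-worker traffic stays roughly constant as workers are added. That is one reason the interconnect bandwidth listed under "Improve Hardware" matters for multi-node scaling.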
10
The users can also use a feedback-driven approach to optimize resource allocations and efficiency of their applications in the cloud
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No
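The slide does not say how the feedback-driven approach works. One way to picture it is a loop that measures an application at increasing allocation sizes and stops growing once efficiency drops. The sketch below is purely hypothetical: `tune_node_count`, the `measure_runtime` callback, the doubling strategy, and the 0.7 efficiency threshold are all assumptions and are not taken from the talk.

```python
def tune_node_count(measure_runtime, max_nodes, min_efficiency=0.7):
    """Double the node count while measured parallel efficiency stays high.

    measure_runtime(nodes) is a user-supplied callback that runs (or models)
    the application on the given number of nodes and returns its runtime.
    """
    baseline = measure_runtime(1)
    best = 1
    nodes = 2
    while nodes <= max_nodes:
        runtime = measure_runtime(nodes)
        # Efficiency: achieved speedup divided by the number of nodes used.
        efficiency = baseline / (nodes * runtime)
        if efficiency < min_efficiency:
            break
        best = nodes
        nodes *= 2
    return best


if __name__ == "__main__":
    # Synthetic runtime model, for illustration only: the compute part
    # shrinks with more nodes while the communication cost grows.
    def synthetic_runtime(nodes):
        return 100 / nodes + 2 * nodes

    print("suggested node count:", tune_node_count(synthetic_runtime, 64))
```

In practice the callback would wrap real measurements from the cloud platform, and the stopping rule could weigh cost as well as efficiency.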
11
In summary, a road map is necessary to address the challenges that emerging workloads face in cloud computing environments
Thank you for your attention!