Tor Skeie Feroz Zahid Simula Research Laboratory 27th June 2018

Presentation transcript:

Building Efficient Clouds for HPC, Big Data, and Deep Learning Middleware and Applications
Tor Skeie, Feroz Zahid
Simula Research Laboratory
27th June 2018
ISC 2018, Frankfurt, Germany

Emerging applications such as big data analytics and machine learning at scale are redefining HPC workloads

Social innovation in the modern era increasingly relies on our capacity to efficiently process large data sets
  The digital universe is expected to grow to 163 ZB by 2025 (IDC)
  30.7 billion IoT devices by 2020 (IDC)
  Social networks and multimedia are producing huge amounts of data
    500 million tweets per day
    510,000 comments and 136,000 photos per minute on Facebook
  Vast amounts of biological data are available from genome projects
Deep learning is a new killer application in the market
  Artificial intelligence
  Requires high-performance infrastructure to reduce training time

Building Efficient Clouds for HPC, Big Data, and Deep Learning Applications . 2

The need for high-performance computing is no longer restricted to the scientific community

Big Data: Volume, Velocity, Veracity, Variety, Value
Extending HPC power through cloud computing can be key to reaching a broader audience!

The cloud computing paradigm can extend HPC power to a broader audience

Traditional HPC infrastructures
  High-performance interconnects, specialized storage, fat nodes
HPC in the Cloud
  HPC applications in the Cloud, both private and public Clouds
HPC Clouds
  Clouds on top of HPC infrastructure, HPC-as-a-Service
Hybrid Architectures
  Cloud bursting, Cloud federation, cross-Cloud deployments

Traditional Cloud platforms are by and large not suitable for high-performance computing applications, mainly due to their network performance

Study (authors, year): summary of results

Can cloud computing reach the Top500? (Jeffrey Napper et al., 2009): ..first step would be to offer better interconnects or nodes provisioned with more physical memory to overcome the slower network
Case study for running HPC applications in public clouds (Qiming He et al., 2010): Most current public clouds are not designed for running scientific applications primarily due to their poor networking capabilities
Performance analysis of high performance computing applications on the Amazon Web Services cloud (Keith R. Jackson et al., 2010): The interconnect on the EC2 cloud platform severely limits performance and causes significant variability
Evaluation of HPC applications on clouds (Abhishek Gupta et al., 2011): Cloud is viable platform for only low communication intensive applications
Understanding the Performance and Potential of Cloud Computing for Scientific Applications (Iman Sadooghi et al., 2015): We can conclude that there is need for cloud infrastructures with more powerful network capacity…
Performance evaluation of Amazon Elastic Compute Cloud for NASA high-performance computing applications (Piyush Mehrotra et al., 2016): …cannot currently compete with HPC systems… particularly for tightly coupled applications where communication performance is important
Evaluating and improving the performance and scheduling of HPC applications in cloud (Abhishek Gupta et al., 2016): HPC applications and runtimes must adapt to minimize the impact of slow network, heterogeneity, and multi-tenancy in clouds
Cloud versus in-house cluster: Evaluating Amazon cluster compute instances for running MPI applications (Yan Zhai et al., 2011): ...communication infrastructure remains the chief problem in scaling MPI programs
The Magellan Report on Cloud Computing for Science (Magellan Leads et al., 2011): Scientific applications with minimal communication and I/O are best suited for cloud
High performance computing in the cloud: Deployment, performance and cost efficiency (Eduardo Roloff et al., 2012): Benchmarks with a higher amount of communication present worse results
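The common finding in the studies above, that communication-heavy applications suffer most on slower networks, can be illustrated with the standard latency-bandwidth ("alpha-beta") cost model. The following is a minimal sketch only: the function names and the network parameters are assumptions chosen for illustration, not measurements taken from any of the cited studies.

```python
def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Alpha-beta model: fixed per-message latency plus size divided by bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def parallel_efficiency(compute_s, messages, size_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of one step spent computing, assuming no compute/communication overlap."""
    comm_s = messages * message_time(size_bytes, latency_s, bandwidth_bytes_per_s)
    return compute_s / (compute_s + comm_s)


# Assumed, purely illustrative network parameters (not measured values).
HPC_FABRIC = {"latency_s": 2e-6, "bandwidth_bytes_per_s": 10e9}
CLOUD_ETHERNET = {"latency_s": 100e-6, "bandwidth_bytes_per_s": 1e9}

if __name__ == "__main__":
    # One step: 1 ms of computation followed by 100 messages of 8 KiB each.
    for name, net in (("HPC fabric", HPC_FABRIC), ("cloud Ethernet", CLOUD_ETHERNET)):
        eff = parallel_efficiency(compute_s=1e-3, messages=100, size_bytes=8192, **net)
        print(f"{name}: efficiency {eff:.2f}")
```

With many small messages per step the latency term dominates, which is consistent with the observation that tightly coupled applications are hit hardest while applications with minimal communication remain viable.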

HPC interconnects are not flexible enough for the Clouds

Many of the key challenges in HPC stem from the limited flexibility of the interconnect solutions
  Designed for non-dynamic HPC environments
    Static configurations, costly network reconfiguration mechanisms
  Full potential utilization of the network interconnect is not achieved
    No harmony between upper cloud layers and interconnect
    In-datacenter communication vs. services over the Internet
  Runtimes, frameworks, and middleware are not adapted to HPC networks, for example to use RDMA communication
    Emerging software exists but is still limited
    For example, RDMA-based Apache Hadoop / HBase from Ohio State University

Bridging the gap between existing cloud computing technologies and the requirements of HPC/AI/DL applications is necessary

Cloud Computing: Multi-Tenancy, Dynamic Environments, Virtualization, SLAs / SLOs
HPC/AI/DL Applications: High-Performance Interconnect, Performance Isolation, Predictability

Challenges exist at different layers in the Cloud Stack

Usability: Cloud Federation, Automatic Deployments, Web GUIs, Monitoring and Auditing, Cost-Effectiveness
Applications: Modelling, Design-time Annotations, Data Awareness, Scalability, Portability
Runtime: Workload-aware Middleware, HPC-aware Runtime Libraries, Automation, Runtime Adaptation, Security
Infrastructure: Performance Isolation, Network Optimization, Application-aware Scheduling, Elasticity, Heterogeneity

Marco A. S. Netto et al., HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges - https://arxiv.org/pdf/1710.08731.pdf

Efficiency of HPC / Big Data / AI applications in clouds can be improved at different levels and a co-design approach is needed

A. Improve Parallelism
  Over multiple GPUs on a single node
  Over multiple nodes
B. Design Better Applications
  Computationally light apps
  Models that scale better
  Apps that adapt
C. Improve Hardware
  Hardware with more processing capabilities
  Interconnects with more bandwidth

Cloud Technologies: Public Clouds, Private Clouds, Delivery Models, Costs
Frameworks and Acceleration
  Acceleration libraries: NVIDIA cuDNN, MKL-DNN
  Communication libraries (single-node and multi-node): NCCL, Gloo
Heterogeneous Hardware: CPUs, GPUs, FPGAs
Interconnect Technologies
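Parallelism over multiple GPUs or nodes typically relies on a collective sum (allreduce) of gradients, which is the kind of operation that communication libraries such as NCCL and Gloo provide. The following is a textbook ring allreduce simulated in plain Python, offered as a sketch of the idea only; it is not the implementation used by either library, and the function name `ring_allreduce` is an assumption made for this example.

```python
def ring_allreduce(worker_chunks):
    """Sum-allreduce over a simulated ring of workers.

    worker_chunks[i][c] is chunk c (a list of numbers) held by worker i.
    Each worker must hold exactly as many chunks as there are workers.
    Returns a new nested list in which every worker holds the element-wise sum.
    """
    n = len(worker_chunks)
    data = [[list(chunk) for chunk in worker] for worker in worker_chunks]

    # Phase 1 (reduce-scatter): after n - 1 steps, worker i holds the
    # fully reduced chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot all payloads first so that sends within a step are simultaneous.
        sends = [(i, (i - step) % n, list(data[i][(i - step) % n])) for i in range(n)]
        for src, c, payload in sends:
            dst = (src + 1) % n
            data[dst][c] = [a + b for a, b in zip(data[dst][c], payload)]

    # Phase 2 (allgather): circulate the reduced chunks around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, list(data[i][(i + 1 - step) % n])) for i in range(n)]
        for src, c, payload in sends:
            data[(src + 1) % n][c] = payload

    return data


if __name__ == "__main__":
    gradients = [
        [[1], [2], [3]],        # worker 0
        [[10], [20], [30]],     # worker 1
        [[100], [200], [300]],  # worker 2
    ]
    print(ring_allreduce(gradients)[0])
```

In this scheme each worker sends 2(n - 1) chunks in total, so the data volume per worker stays close to twice the buffer size regardless of the number of workers. That makes the interconnect's bandwidth and latency, rather than the worker count, the limiting factor.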

Users can also use a feedback-driven approach to optimize the resource allocation and efficiency of their applications in the cloud

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 731664
http://melodic.cloud/
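A feedback-driven approach of this kind can be sketched as a loop that measures a run, compares it with a target, and adjusts the allocation. The sketch below is a hypothetical illustration and not the method used by the project referenced above: the proportional rule, the toy runtime model, and all names and default values are assumptions.

```python
import math


def next_allocation(current_nodes, measured_runtime_s, target_runtime_s,
                    min_nodes=1, max_nodes=64):
    """Proportional rule: scale the node count by measured/target runtime.

    Assumes runtime is roughly inversely proportional to the node count,
    which real applications only approximate.
    """
    if current_nodes < 1 or measured_runtime_s <= 0 or target_runtime_s <= 0:
        raise ValueError("node count and runtimes must be positive")
    desired = math.ceil(current_nodes * measured_runtime_s / target_runtime_s)
    return max(min_nodes, min(max_nodes, desired))


def toy_runtime(nodes, serial_s=20.0, parallel_s=960.0):
    """Hypothetical application model: a serial part plus a perfectly parallel part."""
    return serial_s + parallel_s / nodes


def tune(target_runtime_s, start_nodes=1, max_rounds=20):
    """Repeat measure-and-adjust until the allocation stops changing."""
    nodes = start_nodes
    for _ in range(max_rounds):
        proposal = next_allocation(nodes, toy_runtime(nodes), target_runtime_s)
        if proposal == nodes:
            break
        nodes = proposal
    return nodes


if __name__ == "__main__":
    nodes = tune(target_runtime_s=100.0)
    print(nodes, toy_runtime(nodes))
```

In practice the measured signal could equally be cost, utilization, or an SLO violation count; the cap on rounds guards against the rule oscillating when the runtime model is far from ideal scaling.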

In summary, a road map is necessary to address the challenges that emerging workloads face in cloud computing environments

Thank you for your attention!