Cloud HPC Overview
Boyd Wilson (Boydw at Omnibond dot com)
March 2016
Outline
Overview: Public Cloud History -> Present
Problem: Historical Issues (Security / Compliance, Cost)
Problem: Building Blocks of HPC & Big Data Clusters in AWS (Security, Networking, Compute, Instance Storage)
How It's Built: Compute Groups & Schedulers, High Performance Working or Scratch Storage, Storage Access, Software, Collaboration
Demo
Public Cloud History -> Present
Initially, the discussion around the public cloud compared it to on-premises virtualization. What people are learning now is that the public cloud is really a dynamic, API-driven infrastructure. Forward-looking companies have used this API-driven infrastructure to leap ahead: Netflix, Airbnb, Yelp, Expedia, Adobe, Pinterest, Zynga, Gilt, MLBAM, Slack, Foursquare, Lyft, Dow Jones, Bristol-Myers Squibb, etc.
Security & Compliance of the Public Cloud
CSA, ISO 9001, ISO 27001, ISO 27018, MPAA, CJIS, DIACAP, DoD, FDA, FedRAMP, FERPA, FIPS, HIPAA, GxP, ITAR, NIST, EU Data Protection, IT-Grundschutz, G-Cloud, Malaysian Privacy Consideration, MLPS, MTCS, Singapore Privacy Consideration, IRAP, New Zealand Privacy Consideration
Costs of Public Clouds (2013-2014)
Source: RBC Capital Markets, Company Reports
Economies of Scale? Computers = public facing
Price War II Is Coming
AWS: the gorilla in the space. Gartner (May 2015): “10x bigger than its next 14 competitors combined” and “5x the cloud capacity in use than the aggregate total of the other 14 providers”.
Azure: investing heavily; just released ARM (Azure Resource Manager, not the processor); supports InfiniBand (IB).
Google: quietly releasing more and more AWS-like services.
Confessions of a Former Data Center Director
Power and cooling (kWh)
Compute capacity (cost per GB RAM)
Storage (cost per TB)
Network (cost per port)
Costs are always calculated at maximum utilization.
Let's not discuss labor; it's a sunk cost…
Time to use (depreciation): per-unit costs go down as use of a resource increases, so there is a cost associated with delayed adoption.
Headroom (tape library example)
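Because costs are quoted at maximum utilization, the effective cost of a resource rises as its real utilization falls, and capacity bought early but used late (headroom) carries that higher cost the whole time. A minimal sketch of the arithmetic in Python; the dollar figure, capacity, and utilization values are hypothetical and only illustrate the calculation.

```python
def effective_cost_per_unit(annual_cost: float, capacity_units: float,
                            utilization: float) -> float:
    """Yearly cost per unit actually used (e.g. per used TB or per used GB of RAM).

    annual_cost: total yearly cost (power, cooling, depreciation, ...)
    capacity_units: installed capacity
    utilization: fraction of capacity actually in use, in (0, 1]
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return annual_cost / (capacity_units * utilization)


# Hypothetical storage system: $50,000/year for 500 TB of capacity.
nominal = effective_cost_per_unit(50_000, 500, 1.0)  # as quoted, at max utilization
actual = effective_cost_per_unit(50_000, 500, 0.4)   # only 40% filled so far
print(f"nominal ${nominal:.2f}/TB-year, actual ${actual:.2f}/TB-year")
```

At 40% utilization every used terabyte costs 2.5 times the quoted rate, which is the cost of headroom that the slide alludes to.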
Social Side of Funding: Open Questions (Conversation with Rick)
What are the break-even costs vs. local resources?
How do we compare at scale?
How does a site ramp up to the cloud (training)?
How would funding look if certain places went public-cloud only?
How would funding agencies recognize the public cloud with respect to funding?
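The break-even question can be framed as simple arithmetic: a local node costs roughly the same per year whether it is busy or idle, while a cloud instance is paid for only while it runs. A rough Python sketch; all prices are hypothetical placeholders, and it ignores labor, data transfer, and storage.

```python
def break_even_hours(local_annual_cost: float, cloud_hourly_rate: float) -> float:
    """Hours of use per year at which a local node and a cloud instance cost the same."""
    return local_annual_cost / cloud_hourly_rate


def cheaper_option(hours_used: float, local_annual_cost: float,
                   cloud_hourly_rate: float) -> str:
    """Which option is cheaper for a given number of hours used per year."""
    cloud_cost = hours_used * cloud_hourly_rate
    return "cloud" if cloud_cost < local_annual_cost else "local"


# Hypothetical: a local node at $4,000/year (amortized hardware, power, cooling)
# versus a comparable cloud instance at $0.80/hour.
hours = break_even_hours(4_000, 0.80)
print(f"break-even at {hours:.0f} hours/year ({hours / 8760:.0%} utilization)")
print(cheaper_option(2_000, 4_000, 0.80))  # lightly used: cloud
print(cheaper_option(7_000, 4_000, 0.80))  # heavily used: local
```

Below the break-even utilization the cloud is cheaper; above it, the local resource wins, which is why comparing at scale depends heavily on expected utilization.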
Example AWS EC2 Instance Pricing
Building Blocks
How to pull it all together
CloudyCluster Goals: On-Demand HPC and Big Data Resources in AWS
Compute instances
High-performance storage
Choice of schedulers (initially Torque)
Simple deployment, pausing, and deletion from a phone, tablet, or desktop
Elastic HPC based on jobs submitted with CCQ
Available in the AWS Marketplace with a pay-as-you-go model (payment goes through Amazon)
Security: Virtual Private Cloud, IAM Roles and Permissions
Amazon VPC
Provides a network security layer
Public and private subnet options
Requires a bastion host for SSH access
Security groups provide restrictions on network interfaces
IAM Roles and Permissions
Grant roles permissions to create and interact with AWS constructs
Option to assign roles to instances, enabling them to perform actions via APIs
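As a sketch of what assigning a role to an instance involves: an IAM role that EC2 instances can assume needs a trust policy naming the EC2 service as principal, plus a permissions policy listing the API actions software on the instance may call. The Python below only builds the two JSON documents; the DynamoDB table ARN, account ID, and chosen actions are hypothetical examples, not CloudyCluster's actual policy.

```python
import json


def ec2_trust_policy() -> dict:
    """Trust policy that lets EC2 instances assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def permissions_policy(actions: list[str], resource_arn: str) -> dict:
    """Least-privilege policy: only the listed actions, on a single resource."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(actions), "Resource": resource_arn}
        ],
    }


# Hypothetical: let software on compute instances read and write job metadata
# in one DynamoDB table (account ID and table name are placeholders).
table_arn = "arn:aws:dynamodb:us-east-1:123456789012:table/example-cluster-jobs"
policy = permissions_policy(["dynamodb:PutItem", "dynamodb:GetItem"], table_arn)
print(json.dumps(ec2_trust_policy(), indent=2))
print(json.dumps(policy, indent=2))
```

The documents could then be passed to the IAM API (for example `create_role` and `put_role_policy` in boto3) and the role attached to instances through an instance profile.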
Networking: Subnets, NAT Instances
Subnets
Public and private options within a VPC
Dynamically calculated subnet mask based on the number of instances requested
NAT Instances
Provide external (Internet) access for instances in the VPC
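One way to calculate a subnet mask from the requested instance count is to pick the smallest block that covers the instances plus the five addresses AWS reserves in every subnet (the first four and the last). The Python sketch below uses the standard `ipaddress` module; the `headroom` parameter and the example VPC CIDR are assumptions, and this is not necessarily how CloudyCluster computes it.

```python
import ipaddress
import math

AWS_RESERVED_PER_SUBNET = 5      # first four addresses and the last one
MIN_PREFIX, MAX_PREFIX = 16, 28  # VPC subnets range from /16 (largest) to /28 (smallest)


def prefix_for_instances(instances: int, headroom: int = 0) -> int:
    """Longest prefix (smallest subnet) that fits the requested instances."""
    needed = instances + headroom + AWS_RESERVED_PER_SUBNET
    host_bits = max(math.ceil(math.log2(needed)), 32 - MAX_PREFIX)
    prefix = 32 - host_bits
    if prefix < MIN_PREFIX:
        raise ValueError(f"{instances} instances do not fit in a single /16 subnet")
    return prefix


def first_subnet(vpc_cidr: str, instances: int, headroom: int = 0) -> ipaddress.IPv4Network:
    """Carve the first subnet of the computed size out of the VPC CIDR block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    prefix = prefix_for_instances(instances, headroom)
    return next(vpc.subnets(new_prefix=prefix))


print(prefix_for_instances(10))   # 15 addresses needed -> /28 (16 addresses)
print(prefix_for_instances(100))  # 105 addresses needed -> /25 (128 addresses)
print(first_subnet("10.0.0.0/16", 100))
```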
Compute: Amazon Machine Image, Compute Instances
AMI
Can be of many OS types and flavors
Can have the needed software preinstalled
Customers can add their own software and save a new AMI
Compute Instances
The unit of compute
The running instantiation of an AMI
Can assign roles to instances, enabling software on an instance to perform AWS actions via APIs
Compute Instance Auto Scaling
Option to create an Auto Scaling group with policies for increasing/decreasing the number of instances based on workload through CCQ
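An Auto Scaling group keeps its instance count between a minimum and a maximum size, and a workload-driven policy sets the desired capacity inside that range. A small Python sketch of one such policy, sizing a group from the cores its running and queued jobs request; the policy itself, the core counts, and the bounds are assumptions for illustration, not CCQ's actual rules.

```python
import math


def desired_capacity(queued_cores: int, running_cores: int, cores_per_instance: int,
                     min_size: int, max_size: int) -> int:
    """Instances needed for running plus queued work, clamped to the group's bounds."""
    if cores_per_instance <= 0:
        raise ValueError("cores_per_instance must be positive")
    needed = math.ceil((queued_cores + running_cores) / cores_per_instance)
    return max(min_size, min(max_size, needed))


# Hypothetical group of 36-core instances allowed to scale between 0 and 10.
print(desired_capacity(queued_cores=100, running_cores=0, cores_per_instance=36,
                       min_size=0, max_size=10))   # ceil(100 / 36) = 3
print(desired_capacity(queued_cores=0, running_cores=0, cores_per_instance=36,
                       min_size=0, max_size=10))   # no work -> 0
print(desired_capacity(queued_cores=1000, running_cores=0, cores_per_instance=36,
                       min_size=0, max_size=10))   # capped at max_size = 10
```

The result could then be applied through the Auto Scaling API (for example `set_desired_capacity` on boto3's `autoscaling` client).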
Instance Storage & Metadata
EBS
Elastic Block Store (EBS) volumes are attached to an instance for block-level storage
IOPS can be configured at creation time
Local Instance Storage
Storage volumes available on the local instance
SSD and rotational types of varying sizes, depending on instance type
EFS
DynamoDB
NoSQL data service provided by AWS
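Configuring IOPS at creation time means choosing a provisioned-IOPS volume type and passing the IOPS value with the create request. The Python sketch below only builds the keyword arguments for an EC2 `CreateVolume` call and sanity-checks them; the maximum IOPS-per-GiB ratio is an assumed parameter (AWS has changed it over time, so check the current EBS documentation), and the size, IOPS, and availability zone are placeholders.

```python
def provisioned_iops_volume_request(size_gib: int, iops: int, availability_zone: str,
                                    max_iops_per_gib: int = 50) -> dict:
    """Build keyword arguments for an EC2 CreateVolume call with provisioned IOPS.

    max_iops_per_gib is an ASSUMED io1 ratio limit; verify it against current docs.
    """
    if size_gib <= 0 or iops <= 0:
        raise ValueError("size and IOPS must be positive")
    if iops > size_gib * max_iops_per_gib:
        raise ValueError(f"{iops} IOPS exceeds {max_iops_per_gib} IOPS/GiB for {size_gib} GiB")
    return {
        "AvailabilityZone": availability_zone,
        "Size": size_gib,
        "VolumeType": "io1",
        "Iops": iops,
    }


request = provisioned_iops_volume_request(200, 6_000, "us-east-1a")
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("ec2").create_volume(**request)
print(request)
```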
How CloudyCluster is Built
Compute Groups & Schedulers
Compute Groups
A CloudyCluster compute group is an AWS Auto Scaling group of a given instance type, with all instances configured to work with the same scheduler.
Compute groups are automatically registered with the corresponding scheduler as they are added.
Schedulers
CloudyCluster provides a choice of scheduler for one or more compute groups. Torque/Maui, Slurm, and SGE are planned scheduler options; the initial release supports Torque/Maui.
Scheduler Compute Scaling Option for Compute Groups
If a scheduler is configured for elastic scaling through the CCQ dispatcher, jobs will drive instance launching and post-job termination automatically.
[Diagram: Compute Group 1 (C-1 to C-4), instances C-5 to C-8, Utility, schedulers Torque1, SGE1, Condor1]
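Registering a compute group with Torque amounts, at minimum, to the server learning about each node and its processor count. A hedged Python sketch that renders entries in the Torque `server_priv/nodes` file format (`hostname np=N property`), tagging each node with its compute-group name as a node property so jobs can target a group; the `ComputeGroup` model, hostnames, and group name are hypothetical, and CloudyCluster may register nodes differently (for example through `qmgr`).

```python
from dataclasses import dataclass


@dataclass
class ComputeGroup:
    """Hypothetical model: same-type instances served by one scheduler."""
    name: str
    instance_type: str
    cores_per_instance: int
    hostnames: list[str]


def torque_nodes_entries(group: ComputeGroup) -> list[str]:
    """One Torque nodes-file line per instance: '<host> np=<cores> <property>'."""
    return [f"{host} np={group.cores_per_instance} {group.name}"
            for host in group.hostnames]


def add_instances(group: ComputeGroup, new_hosts: list[str]) -> list[str]:
    """Record instances launched by auto scaling; return only their new entries."""
    group.hostnames.extend(new_hosts)
    added = ComputeGroup(group.name, group.instance_type,
                         group.cores_per_instance, list(new_hosts))
    return torque_nodes_entries(added)


group = ComputeGroup("cg1", "c4.8xlarge", 36, ["ip-10-0-1-11", "ip-10-0-1-12"])
print("\n".join(torque_nodes_entries(group)))
print(add_instances(group, ["ip-10-0-1-13"]))
```

With the group name as a node property, a job could request that group explicitly, e.g. `qsub -l nodes=2:ppn=36:cg1 job.sh`.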
Working / Scratch and Home Storage
Working / Scratch Storage
CloudyCluster working or scratch storage automatically combines multiple instances and EBS storage into a unified, high-performance parallel file system.
Future versions will allow storage across the compute instances' local storage.
Option to configure automatic failover instances in case an instance dies.
The working / scratch file system is automatically mounted on every node in the cluster.
Option to configure WebDAV availability (via a Login Instance) for the working / scratch file system and/or EFS.
Other Storage
CloudyCluster offers EFS support (the NFS service provided by AWS).
If selected, the EFS file system is automatically mounted on every compute node.
[Diagram: OrangeFS1 with working / scratch instances WS-1 to WS-8, Utility, EFS]
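To show how a parallel file system spreads one file over several storage instances, here is a simplified round-robin striping model in Python: fixed-size strips of the file are assigned to servers in turn, so a large read or write touches all servers in parallel. The 64 KiB strip size and the eight-server count (matching WS-1 to WS-8) are assumptions; OrangeFS's actual distribution is configurable and handles metadata and failover separately.

```python
STRIP_SIZE = 64 * 1024  # bytes per strip (assumed)


def locate(offset: int, num_servers: int, strip_size: int = STRIP_SIZE) -> tuple[int, int]:
    """Map a file byte offset to (server index, offset within that server's data)."""
    strip = offset // strip_size          # global strip number
    server = strip % num_servers          # round-robin over servers
    local_strip = strip // num_servers    # strips of this file already on that server
    return server, local_strip * strip_size + offset % strip_size


def servers_touched(offset: int, length: int, num_servers: int,
                    strip_size: int = STRIP_SIZE) -> set[int]:
    """Servers involved in an I/O request of `length` bytes starting at `offset`."""
    if length <= 0:
        return set()
    first = offset // strip_size
    last = (offset + length - 1) // strip_size
    return {s % num_servers for s in range(first, last + 1)}


servers = 8  # e.g. WS-1 .. WS-8
print(locate(0, servers))                        # (0, 0)
print(locate(9 * STRIP_SIZE + 10, servers))      # strip 9 -> server 1, its second strip
print(sorted(servers_touched(0, 1024 * 1024, servers)))  # a 1 MiB write touches all 8
```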
Storage Access
Option to make working / scratch and EFS storage automatically accessible via WebDAV.
Data integration can also be accomplished with Globus as a supported storage access method.
iRODS will be integrated in the future.
Future development targets simplified CloudyCluster data loading and results retrieval.
DynamoDB stores metadata for CloudyCluster user and collaboration data, and for CCQ jobs and data.
[Diagram: OrangeFS1 (WS-1 to WS-8), Utility, Access Instance, EFS, DynamoDB]
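Because WebDAV is built on HTTP, loading a file into the working / scratch file system through the Login Instance can be as simple as an HTTP PUT. A minimal Python sketch using only the standard library; the host name, path, and credentials are placeholders, and the endpoint layout on a real CloudyCluster Login Instance may differ. Basic authentication should only be sent over HTTPS.

```python
import base64
import urllib.request


def webdav_put_request(base_url: str, remote_path: str, data: bytes,
                       user: str, password: str) -> urllib.request.Request:
    """Build a WebDAV upload (HTTP PUT) with HTTP Basic authentication."""
    url = base_url.rstrip("/") + "/" + remote_path.lstrip("/")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/octet-stream",
        },
    )


# Placeholder endpoint: replace with your Login Instance's WebDAV URL.
req = webdav_put_request("https://login.example.org/webdav", "scratch/input.dat",
                         b"example data", "alice", "secret")
print(req.get_method(), req.full_url)
# To actually upload: urllib.request.urlopen(req)
```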
All Together
[Diagram components: Public Subnet, Login Instance (WebDAV, Globus), NAT, Scheduler, Compute Groups, Highly Available Working / Scratch OrangeFS Storage, Management Instance, DynamoDB]
CCQ: Elastic HPC Dispatching
Jobs are submitted through CCQ.
CCQ holds the job, determines the instances needed, and launches them.
CCQ sends the job to the scheduler when the instances are ready; the scheduler then launches it normally.
If there are no jobs in the queue for an instance type near the hour boundary, those instances are terminated.
[Diagram: Public Subnet, Login Instance (WebDAV), DynamoDB, Scheduler]
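The dispatch cycle can be modeled in a few lines: a held job is released to the scheduler only once enough instances of its type are running, and an idle instance is terminated only when no jobs are queued for its type and it is close to the end of its current instance-hour (at the time of this talk, EC2 billed each started instance-hour in full). The Python below is a hypothetical simplification, not CCQ's code; the five-minute window and the data shapes are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Job:
    instance_type: str
    nodes: int
    state: str = "held"  # held -> released (handed to the scheduler)


def instances_to_launch(job: Job, running: dict[str, int]) -> int:
    """How many more instances of the job's type must start before release."""
    return max(0, job.nodes - running.get(job.instance_type, 0))


def try_release(job: Job, running: dict[str, int]) -> bool:
    """Hand the job to the scheduler once its instances are up."""
    if job.state == "held" and instances_to_launch(job, running) == 0:
        job.state = "released"
    return job.state == "released"


def should_terminate(uptime_minutes: float, queued_jobs_for_type: int,
                     window_minutes: float = 5.0) -> bool:
    """Terminate only idle capacity that is about to start another billed hour."""
    minutes_into_hour = uptime_minutes % 60
    return queued_jobs_for_type == 0 and minutes_into_hour >= 60 - window_minutes


job = Job("c4.8xlarge", nodes=4)
running = {"c4.8xlarge": 1}
print(instances_to_launch(job, running))  # 3 more instances needed
print(try_release(job, running))          # False: job stays held
running["c4.8xlarge"] = 4                 # the instances came up
print(try_release(job, running))          # True: sent to the scheduler
print(should_terminate(117, queued_jobs_for_type=0))  # 57 min into hour 2 -> True
print(should_terminate(70, queued_jobs_for_type=0))   # 10 min into hour 2 -> False
```

Waiting until near the hour boundary lets a newly submitted job reuse an already-paid-for instance instead of triggering a fresh launch.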
HPC Software Included
AmberTools, ANN, ATLAS, BLAS, BLAST, Blender, Burrows-Wheeler Aligner, CESM, GROMACS, LAMM, NCAR NCL, NCO, NWChem, OpenFOAM, PAPI, ParaView, Quantum ESPRESSO, SAMtools, WRF
You can also add your own (e.g., EMC 2-Tier).
HPC Infrastructure and Libs Included
Boost, CUDA Toolkit, Docker, FFTW, FLTK, GCC, Gengetopt, GRIB2, GSL, Hadoop, HDF5, ImageMagick, JasPer, NetCDF, NumPy, Octave, OpenCV, OpenMPI, PROJ, R, Rmpi, SciPy, SWIG, WGRIB, UDUNITS
Collaboration
Ability to create collaborations
Invite other collaborators to CloudyCluster
Initially, Google Drive folders can be shared
OAuth and Shibboleth (InCommon) support
Other Items
To run CloudyCluster you may have to ask for some of the initial AWS limits to be raised:
The number of instances is initially limited to 20, and some instance types are limited further. Read up on limits before you attempt to spin up a larger cluster.
The number of EIPs (Elastic IP addresses) is limited to 5.
5 VPCs per region (each cluster requires a VPC).
Billing alerts are good for general cloud usage.
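Since each cluster needs its own VPC and all instances count against the account's limits, a quick preflight check before launching can prevent a failed deployment. A Python sketch using the default limits from this slide (20 instances, 5 EIPs, 5 VPCs per region, as of 2016); the per-cluster EIP count and current-usage figures are hypothetical inputs, and real limits should be read from your own account since AWS changes them and raises them on request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountLimits:
    # Defaults quoted on the slide (2016); check your account's current values.
    instances: int = 20
    elastic_ips: int = 5
    vpcs_per_region: int = 5


@dataclass
class Usage:
    instances: int = 0
    elastic_ips: int = 0
    vpcs: int = 0


def preflight(new_instances: int, new_eips: int, usage: Usage,
              limits: Optional[AccountLimits] = None) -> list[str]:
    """Return the limits a new cluster would exceed (an empty list means it fits)."""
    if limits is None:
        limits = AccountLimits()
    problems = []
    if usage.instances + new_instances > limits.instances:
        problems.append(f"instances: {usage.instances + new_instances} > {limits.instances}")
    if usage.elastic_ips + new_eips > limits.elastic_ips:
        problems.append(f"Elastic IPs: {usage.elastic_ips + new_eips} > {limits.elastic_ips}")
    if usage.vpcs + 1 > limits.vpcs_per_region:  # each cluster requires its own VPC
        problems.append(f"VPCs: {usage.vpcs + 1} > {limits.vpcs_per_region}")
    return problems


print(preflight(16, 2, Usage(instances=2, elastic_ips=1, vpcs=1)))  # [] -> fits
print(preflight(30, 2, Usage(vpcs=5)))  # instance and VPC limits exceeded
```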
Demo
See CloudyCluster.com for videos, docs, and the Quickstart Guide.
Conclusion
Imagine a researcher who needs to solve the next big problem in their area of expertise being able to go to a public cloud marketplace and launch, manage, and easily maintain a high performance computing infrastructure, without needing to go through long procurement processes, and start getting the results they need. Now multiply this by the number of people staring at their monitors wondering how they could possibly compute what is needed… and the advancements and innovation in the world just increased exponentially.
Omnibond.com
More info at: CloudyCluster.com
Solution Areas
Intelligent Transportation Solutions
Identity Manager Drivers & Sentinel Connectors
Parallel Scale-Out Storage Software
Social Media Interaction System