1
NoHype: Virtualized Cloud Infrastructure without the Virtualization
Eric Keller, Jakub Szefer, Jennifer Rexford, and Ruby Lee. Princeton University. ISCA 2010.
2
Virtualized Cloud Infrastructure
Run virtual machines on a hosted infrastructure. Benefits: economies of scale; dynamic scaling (pay for what you use). The first part of the title is "Virtualized Cloud Infrastructure", implying that this is a good thing, and it is. The cloud computing we're talking about essentially allows you to run virtual machines on a hosted infrastructure, and there are many providers to choose from, with Amazon EC2 being the most well known. Among the benefits are economies of scale: you are sharing the infrastructure, so costs like cooling and power are cheaper than if everyone had to pay for them individually. But the main benefit is that it allows you to dynamically scale your application; you pay for what you use rather than having to provision for the worst case.
3
Without the Virtualization
Virtualization is used to share servers: a software layer runs under each virtual machine. [Figure: Guest VM1 and Guest VM2, each running apps on an OS, sit on top of a hypervisor on the physical hardware.] The second part of the title is "without the virtualization", implying that this is a bad thing. Virtualization is the technology used to share servers. To do this, a software layer known as the hypervisor, or virtual machine monitor, runs underneath each of the virtual machines.
4
Without the Virtualization
Virtualization is used to share servers: a software layer runs under each virtual machine, and malicious software can run on the same server, attack the hypervisor, and access or obstruct other VMs. [Figure: the same stack, with one guest VM attacking the hypervisor beneath it.] Why this is bad is that you have no control over what runs in the other virtual machines on the same server. One of those virtual machines can be running malicious software. Just as applications can exploit holes in the operating system, a guest VM can exploit holes in the hypervisor, giving it the ability to access or obstruct other VMs. This can include access to confidential data: the data is stored encrypted, but to use it you have to decrypt it, so it sits in plain text in memory. For cloud computing to gain general acceptance, it is necessary to overcome this threat and make computing in the cloud as secure as, if not more secure than, computing in private facilities.
5
Are these vulnerabilities imagined?
No headlines… which doesn't mean it's not real. Not enticing enough to hackers yet? (small market size, lack of confidential data). The virtualization layer is huge and growing: on the order of 100 thousand lines of code in the hypervisor and 1 million lines in the privileged virtual machine, derived from existing operating systems, which have security holes. The obvious question you might be having is whether this is really a problem. There haven't been any headlines like "Amazon EC2 brought down by hackers". But just because there haven't been headlines doesn't mean it's not a real problem; perhaps the market isn't large enough, or valuable confidential data hasn't made its way onto these infrastructures, to entice the hackers yet. I won't bore you reading from a bugzilla database. Instead, I'll just say that these virtualization layers are huge, and they're growing as more and more functionality gets packed in: on the order of 100K lines of code in the hypervisor, and over 1M in the privileged virtual machine. Further, they are typically derived from, and share code with, existing operating systems. So if you believe operating systems can have holes, then hypervisors and their privileged virtual machine can as well.
6
NoHype: No Hypervisor
NoHype removes the hypervisor: there's nothing to attack. It is a complete systems solution that still retains the needs of a virtualized cloud infrastructure. [Figure: Guest VM1 and Guest VM2, apps over OS, run directly on the physical hardware with no hypervisor underneath.] Since this hypervisor is the source of vulnerabilities, with NoHype we propose that the hypervisor be removed altogether. If there is no software running under the guest VM, there is nothing for the guest VM to attack. We achieve this through a complete systems solution: the processor, the I/O, and the software. Of course, there is a reason virtualization is used in the cloud today, so we need to retain the key features which led to its use.
7
Virtualization in the Cloud
Why does a cloud infrastructure use virtualization? To support dynamically starting/stopping VMs, and to allow servers to be shared (multi-tenancy). It does not need the full power of modern hypervisors: emulating diverse (potentially older) hardware, or maximizing server consolidation. The reason virtualization technology is used is to support the ability to dynamically start and stop virtual machines, and to allow servers to be shared. However, the cloud infrastructure does not need the full power of modern hypervisors. In the cloud, there is no need to emulate a large and diverse, potentially old, collection of hardware: there are no mice, no keyboards, no monitors. And the goal is not maximizing server consolidation, one of the major uses of virtualization elsewhere. In the cloud the customer leases resources, and that is what it is paying for: for example, a VM with 1 GB of memory.
8
Roles of the Hypervisor
Isolating/emulating resources: CPU (scheduling virtual machines), memory (managing memory), I/O (emulating I/O devices). Networking. Managing virtual machines. [Diagram annotations: push to hardware / pre-allocation; remove; push to the side.] To better understand how we can remove the hypervisor, it is useful to understand its responsibilities. These include isolating and emulating the three main resources in computers (CPU, memory, and I/O), performing networking, and managing virtual machines. To remove the need for a hypervisor that is active in the management of a VM, we essentially replace it with a sort of boot loader which pre-allocates resources, configures the virtualized hardware, and then gets out of the way. So the roles of the hypervisor are either pushed into hardware, removed entirely, or pushed to the side. While this is a forward-looking talk about removing the hypervisor, it is not all that far off: NoHype has an intended double meaning, that it may indeed be "no hype", that we may be able to realize this now. We'll first present what we need to remove the hypervisor, then talk about what's available today. It's not just about which architectural features are available; it's about how you use them as well.
9
Scheduling Virtual Machines
Today: the scheduler is called each time the hypervisor runs (periodically, on I/O events, etc.); it chooses what to run next on a given core and balances load across cores. [Timeline figure: VM execution interleaved with hypervisor invocations on timer ticks, I/O events, and VM switches.] The first responsibility of the hypervisor is to schedule the virtual machines, arbitrating access to CPU cycles. The scheduler is called each time the hypervisor runs: periodically, based on a timer, to give each VM a turn; on I/O events; and so forth. It chooses what to run next on the given core and balances the load across cores. A minimal sketch of such a scheduling loop appears below.
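The following is a hedged, minimal sketch of the kind of per-core round-robin scheduling loop described above. The vm structure, the circular run queue, and the entry point are assumptions for illustration, not code from any real hypervisor.

```c
/* Minimal sketch of a hypervisor's per-core VM scheduler (illustrative
 * only; real schedulers such as Xen's credit scheduler are far more
 * elaborate). The circular run queue and `runnable` flag are assumed. */
struct vm {
    int id;
    int runnable;        /* 1 if this VM currently has work to do */
    struct vm *next;     /* circular per-core run queue */
};

/* Invoked on every hypervisor entry: timer tick, I/O event, VM exit.
 * Round-robin: pick the next runnable VM after the current one. */
struct vm *schedule_next(struct vm *current) {
    struct vm *candidate = current->next;
    while (candidate != current && !candidate->runnable)
        candidate = candidate->next;
    return candidate->runnable ? candidate : current;
}
```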
10
Dedicate a core to a single VM
NoHype: ride the multi-core trend; 1 core on a 128-core device is ~0.8% of the processor. Cloud computing is pay-per-use: during high demand, spawn more VMs; during low demand, kill some. Customers maximize each VM's work, which minimizes the opportunity for over-subscription. To remove scheduling, in NoHype we dedicate a core to a single virtual machine. At first glance this may seem wasteful, but consider how you use cloud computing: you dynamically launch and kill virtual machines to scale. Riding the multi-core trend, where more and more cores are added per processor, one core is a small fraction of the processing power, so applications are likely to grow and shrink at this coarser granularity. And because cloud computing is pay-per-use, the customer has an incentive to maximize utilization of its VMs; in other words, you kill a VM rather than let it sit idle. This reduces the opportunity for over-subscription. A host-side pinning sketch follows.
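As a concrete illustration of core dedication, the hedged sketch below pins a thread (standing in for a VM's single vCPU) to one physical core using Linux's real sched_setaffinity(2) call. Treating this as equivalent to NoHype's hardware-enforced dedication is an assumption for illustration.

```c
/* Hedged sketch: pin a vCPU thread to a dedicated physical core, as a
 * NoHype-style host might do once at VM launch. Uses the real Linux
 * sched_setaffinity(2) API; this alone does not remove a hypervisor. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int pin_to_core(pid_t vcpu_tid, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);          /* thread may run only on `core` */
    if (sched_setaffinity(vcpu_tid, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}

int main(void) {
    /* 0 = the calling thread, standing in for the VM's vCPU thread. */
    if (pin_to_core(0, 3) == 0)
        printf("pinned to core 3\n");
    return 0;
}
```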
11
Managing Memory
Today: the goal is system-wide optimal memory usage, i.e., maximizing server consolidation; the hypervisor controls the allocation of physical memory. The next role of the hypervisor is managing memory. The goal of virtualization technology today is system-wide optimal usage, for the purpose of maximizing server consolidation: you run VMs which, you hope, require different amounts of memory at different times, so you can give each its needed memory with less total memory. In a private infrastructure using virtualization for server consolidation, these are all your own applications, so if memory is tight you simply hinder one of your own applications, and you can even prioritize. The hypervisor controls this allocation.
12
Pre-allocate Memory
NoHype: in cloud computing you are charged per unit, e.g., a VM with 2 GB of memory. Pre-allocate a fixed amount of memory; the memory is fixed and guaranteed. The guest VM manages its own physical memory (deciding what pages to swap to disk), with processor support for enforcing allocation and bus utilization. In NoHype, to remove the hypervisor's memory management, we pre-allocate memory. In cloud computing you are not optimizing your own server; you are leasing resources, for example a VM with 2 GB of memory. That is what you expect to receive, so if the cloud provider over-subscribes and cannot give you the memory you pay for, you will see a performance hit. In NoHype we pre-allocate a fixed, guaranteed amount of memory. The guest VM then manages its own physical memory, and the processor enforces the allocation and arbitrates bus utilization. A host-side sketch of such pre-allocation appears below.
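Here is a hedged sketch of the pre-allocation step, using Linux's real mmap(2) and mlock(2) calls as a software stand-in. In NoHype proper the allocation is enforced by the processor, not by host software, so this is illustrative only.

```c
/* Hedged sketch: reserve a fixed, guaranteed region of RAM for a guest
 * before it boots. mmap(2) and mlock(2) are real Linux calls; the
 * hardware enforcement NoHype relies on is not modeled here. */
#include <string.h>
#include <sys/mman.h>

void *preallocate_guest_memory(size_t bytes) {
    void *region = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return NULL;
    /* Pin the pages so the host can never swap them out: the guest
     * gets exactly the memory it pays for, with no over-subscription. */
    if (mlock(region, bytes) != 0) {
        munmap(region, bytes);
        return NULL;
    }
    memset(region, 0, bytes);     /* fault everything in up front */
    return region;
}
```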
13
Emulate I/O Devices
Today: the guest sees virtual devices; access to a device's memory range traps to the hypervisor; the hypervisor handles interrupts; a privileged VM emulates devices and performs the I/O. [Figure: guest VM device accesses trap to the hypervisor, which forwards them via hypercalls to the device emulation and real drivers in the privileged VM.] The next role of the hypervisor is to emulate the vast array of devices possible on computers. The guest VMs see virtual devices, as described to them by virtual firmware. Any access to one of these devices' memory ranges traps to the hypervisor; from the other direction, interrupts from the real devices are handled by the hypervisor. Software then emulates the device the VM sees and interacts with the physical device. It is common to run this emulation in a privileged virtual machine, but we consider it part of the virtualization layer, as the requests go through it.
14
Emulate I/O Devices
Today: the guest sees virtual devices; access to a device's memory range traps to the hypervisor; the hypervisor handles interrupts; a privileged VM emulates devices and performs the I/O. [Figure: the same path, highlighted end to end.] So a request from a guest VM to interact with its virtual device flows as shown here, going through the hypervisor and then the device emulation and drivers in the privileged VM, any of which can have security holes. A sketch of this trap-and-emulate dispatch appears below.
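Below is a hedged sketch of the trap-and-emulate dispatch just described. The exit structure, address map, and device model are invented for illustration and are far simpler than in any real hypervisor such as KVM or Xen.

```c
/* Hedged sketch of trap-and-emulate MMIO dispatch, the "today" path
 * this slide describes. All names and layouts here are assumptions. */
#include <stdint.h>
#include <stdio.h>

#define NIC_MMIO_BASE 0xC0000000u
#define NIC_MMIO_SIZE 0x1000u

struct vm_exit {                  /* filled in by hardware on a trap */
    uint64_t guest_phys_addr;
    uint32_t value;
    int      is_write;
};

/* Called whenever a guest access to device memory traps out of the VM. */
void handle_mmio_exit(struct vm_exit *exit) {
    if (exit->guest_phys_addr >= NIC_MMIO_BASE &&
        exit->guest_phys_addr <  NIC_MMIO_BASE + NIC_MMIO_SIZE) {
        uint32_t reg = (uint32_t)(exit->guest_phys_addr - NIC_MMIO_BASE);
        if (exit->is_write)
            printf("emulate NIC: write 0x%x to reg 0x%x\n",
                   exit->value, reg);  /* device model reacts here */
        else
            exit->value = 0;           /* device model supplies a value */
    }
    /* Every line of this path is attack surface for a malicious guest. */
}
```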
15
Dedicate Devices to a VM
NoHype: in cloud computing there are only networking and storage devices. Static memory partitioning enforces access: the processor (for requests to the device) and the IOMMU (for transfers from the device). [Figure: each guest VM has direct access to its own dedicated device, with no hypervisor in between.] In NoHype we dedicate a device to a VM so that we can give the VM direct access to it. In cloud computing there are only networking and storage devices, so this becomes a reasonable task. We then use the memory partitioning to enforce access, which requires support in the processor for requests to the device, and in the chipset (the IOMMU) for transfers from the device.
16
Virtualize the Devices
NoHype: a per-VM physical device doesn't scale, so the devices themselves are virtualized. There are multiple queues on the device, and multiple memory ranges map to different queues. [Figure: a network card with a classifier behind the MAC/PHY steering packets into per-VM queues, multiplexed over the peripheral bus to the processor, chipset, and memory.] A sketch of such per-VM queue classification appears below.
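The hedged sketch below models in software the classification step such a virtualized NIC performs in hardware (as SR-IOV devices do): steering each incoming frame into the queue of the VM that owns the destination MAC. All structures here are invented for illustration.

```c
/* Hedged software model of a virtualized NIC's hardware classifier:
 * one receive queue per VM, selected by destination MAC address. */
#include <stdint.h>
#include <string.h>

#define NUM_VMS      4
#define QUEUE_DEPTH 64

struct frame { uint8_t dst_mac[6]; uint16_t len; uint8_t data[1500]; };

struct vm_queue {
    uint8_t      mac[6];            /* MAC assigned to this VM */
    struct frame ring[QUEUE_DEPTH]; /* per-VM receive ring */
    int          head, tail;
};

static struct vm_queue queues[NUM_VMS];

/* Classify: deliver the frame to the owning VM's queue, or drop it. */
int classify_and_enqueue(const struct frame *f) {
    for (int i = 0; i < NUM_VMS; i++) {
        if (memcmp(f->dst_mac, queues[i].mac, 6) == 0) {
            queues[i].ring[queues[i].tail] = *f;
            queues[i].tail = (queues[i].tail + 1) % QUEUE_DEPTH;
            return i;               /* lands in VM i's memory range */
        }
    }
    return -1;                      /* no match: drop the frame */
}
```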
17
Networking (today): Ethernet switches connect servers. [Figure: servers connected by a topology of Ethernet switches.]
The fourth role of today’s hypervisor is networking. In data centers, some topology of Ethernet switches is used to allow the servers in the data center to communicate with one another.
18
Networking (in virtualized server)
Today: software Ethernet switches connect VMs. [Figure: multiple virtual servers on one physical server, connected by a software virtual switch.] Applying this to virtualized servers, where multiple virtual servers run on a single physical server, a software Ethernet switch performs the same function as the first-hop Ethernet switch on the previous slide.
19
Networking (in virtualized server)
Today: software Ethernet switches connect VMs. [Figure: the software switch running inside the hypervisor, beneath the guest VMs.] Re-drawing this picture a bit, each of these virtual servers is a guest virtual machine, and the software switch runs inside the virtualization layer, which can be in the hypervisor, as shown here.
20
Networking (in virtualized server)
Today: software Ethernet switches connect VMs. [Figure: the software switch running in the privileged VM.] Or the switch can run in the privileged virtual machine, as shown here. In either case, it is a complex piece of software and part of the generic virtualization layer that NoHype removes.
21
Do Networking in the Network
NoHype: co-located VMs communicate through software today; VMs that are not co-located pay a performance penalty, and co-location is a special case in cloud computing anyway; the software switch is an artifact of going through the hypervisor in the first place. Instead, utilize the hardware switches in the network, with a modification to support hairpin turnaround. In NoHype, we believe you should do networking in the network, not on the servers. These software switches really exist only to support communication between co-located virtual machines; otherwise traffic pays the penalty of an extra software switch hop. Co-location will be a special case in the cloud, since a customer's virtual machines will likely be located on different servers, and the software switch exists largely because traffic goes through the hypervisor and device emulation anyway, so one might as well perform the Ethernet switching there. In NoHype we remove this functionality and utilize the hardware switches already in the network. This requires a slight modification to the switches to enable so-called hairpin turnarounds, where packets can be forwarded out the port they were received on.
22
Managing Virtual Machines
Today: allowing a customer to start and stop VMs. [Figure: a cloud customer sends a "start VM" request across the wide area network to the cloud provider.] The final responsibility of the hypervisor is to enable management of virtual machines, allowing a customer to start and stop them. The customer sends a request over the wide area network to the cloud provider. This is perhaps a simplification of management, but NoHype's static nature simplifies it to this.
23
Managing Virtual Machines
Today: allowing a customer to start and stop VMs. [Figure: the request travels over the wide area network to a cloud manager, which fetches the VM image from storage and forwards the start request to a chosen server.] This is handled by a dedicated server acting as the cloud manager. The cloud manager fetches the VM image from storage, chooses a server to host the VM, and sends the request to that server.
24
Hypervisor’s Role in Management
Today: run as an application in the privileged VM. [Figure: VM management software running in the privileged VM, above the hypervisor and physical hardware.] The hypervisor's role in this request starts with the VM management software, which runs as an application in the privileged VM.
25
Hypervisor’s Role in Management
Today: receive the request from the cloud manager. [Figure: the request arriving at the VM management software in the privileged VM.] This management software receives the request from the cloud manager.
26
Hypervisor’s Role in Management
Today: form a request to the hypervisor. [Figure: the management software issuing a hypercall to the hypervisor.] It then forms a request to the hypervisor.
27
Hypervisor’s Role in Management
Today: launch the VM. [Figure: the hypervisor launching Guest VM1 alongside the privileged VM.] And the hypervisor then launches the VM. So the hypervisor, which is involved in all aspects of a VM's operation, is intertwined with the management software as well.
28
Decouple Management And Operation
NoHype: the system manager runs on its own core. [Figure: the system manager occupying core 0, with core 1 free.] In NoHype, we decouple the management of virtual machines from their operation. The system manager runs on a dedicated core, which will be only a small fraction of the chip as more and more cores are put on a device.
29
Decouple Management And Operation
NoHype: the system manager runs on its own core and sends an IPI to start/stop a VM. [Figure: the system manager on core 0 sending an inter-processor interrupt to core 1.] When it receives a request from the cloud manager, it sends an inter-processor interrupt (IPI) to another core.
30
Decouple Management And Operation
NoHype: the core manager sets up the core and launches the VM, and is not run again until the VM is killed. [Figure: the core manager on core 1 receiving the IPI, then Guest VM2, apps over OS, running on that core.] The handler for the interrupt, called the core manager, sets up the core, launches the VM, and is not run again until the VM is killed. Note that only the system manager can send this IPI: the guest VM has no way to invoke the core manager and no interaction with the system manager. A sketch of the IPI send appears below.
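For illustration, here is a hedged sketch of how a system manager on x86 could send such an IPI by writing the local APIC's Interrupt Command Register. The register offsets follow the Intel SDM's xAPIC layout, but the chosen vector, the core-to-APIC-ID mapping, and the kernel-mode context are assumptions.

```c
/* Hedged sketch: the system manager signals a core manager with an IPI
 * via the x86 local APIC's Interrupt Command Register (ICR). Offsets
 * follow the xAPIC layout; vector 0xF0 is an arbitrary illustrative
 * choice. Kernel-mode, identity-mapped MMIO assumed. */
#include <stdint.h>

#define APIC_BASE 0xFEE00000UL    /* default xAPIC MMIO base */
#define ICR_LOW   (APIC_BASE + 0x300)
#define ICR_HIGH  (APIC_BASE + 0x310)

static inline void mmio_write32(uintptr_t addr, uint32_t val) {
    *(volatile uint32_t *)addr = val;
}

/* Ask the core with local APIC ID `apic_id` to run its core-manager
 * handler. Writing ICR_LOW triggers the send. */
void send_start_vm_ipi(uint8_t apic_id) {
    mmio_write32(ICR_HIGH, (uint32_t)apic_id << 24);  /* destination */
    mmio_write32(ICR_LOW, 0x000000F0u);  /* fixed delivery, vector 0xF0 */
}
```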
31
Removing the Hypervisor Summary
Scheduling virtual machines: one VM per core.
Managing memory: pre-allocate memory with processor support.
Emulating I/O devices: direct access to virtualized devices.
Networking: utilize hardware Ethernet switches.
Managing virtual machines: decouple management from operation.
Bringing it all together, NoHype removes the need for the hypervisor. We do so by capitalizing on the unique use case of the cloud, allowing us to pre-allocate resources and either push functionality into hardware, remove it altogether, or push it to the side. To remove the need for scheduling virtual machines, we run only one VM on a given core. To remove the need for managing memory, we pre-allocate memory with support from the processor. To remove the need for emulating I/O devices, guest VMs get direct access to virtualized devices. To remove software networking on the server, we utilize the hardware Ethernet switches. And finally, we still need management software, but it is decoupled from the operation of the VMs.
32
Security Benefits
Confidentiality/integrity of data; availability; side channels. The reason we are removing the hypervisor is security: namely confidentiality and integrity of data, availability, and side channels.
33
Security Benefits
Confidentiality/integrity of data; availability; side channels. In the interest of time, I'll only discuss the first; see the paper for the others.
34
Confidentiality/Integrity of Data
Affecting data requires access to the data. The system manager can alter memory access rules, but guest VMs do not interact with the system manager.

With hypervisor vs. NoHype:
- Registers exposed upon VM exit vs. no scheduling (no VM exits)
- Packets sent through the software switch vs. no software switch
- Memory accessible by the hypervisor vs. no hypervisor

To affect the confidentiality or integrity of data requires access to the data. With a hypervisor, data can be exposed in the registers whenever the VM exits and the scheduler runs, packets are exposed to the software Ethernet switch, and all of memory is generally accessible to the hypervisor; a hole in any of these functions can be exploited to reach any data. With NoHype, there is no scheduling and no VM exits to expose the registers, no software switch to exploit to leak packets, and no hypervisor to compromise for complete access to memory, since the hardware enforces access. The only way to get at the data is to compromise the system manager, which can alter the memory access rules in the hardware; but guest VMs do not interact with the system manager, so attacking it is difficult and must come from outside.
35
NoHype Double Meaning: means no hypervisor, and also means "no hype"
Multi-core processors: available now. Extended (nested) page tables. SR-IOV and directed I/O (VT-d): network cards now, storage devices in the near future. Virtual Ethernet Port Aggregator (VEPA): next-generation switches. NoHype means no hypervisor, but it has a double meaning: it is "no hype", in that this is possible today. Technologies have come out over the years in support of virtualization, and we believe we can use them in specific ways to realize much of NoHype. Multicore processors are available now. Extended page tables allow VMs to manage their own page tables, SR-IOV enables a device to announce itself as multiple devices, and directed I/O allows direct access to devices. Virtual Ethernet Port Aggregator, which is going through IEEE standardization and coming out in switches, enables the hairpin turnarounds. See the paper for more details on these technologies and how we intend to use them.
36
Conclusions and Future Work
There is a trend towards hosted and shared infrastructures, and a significant security issue threatens adoption. NoHype solves this by removing the hypervisor, with a performance improvement as a side benefit. Future work: implement on current hardware, and assess the needs of future processors. In conclusion, there is a trend towards hosted and shared infrastructures, but a significant security issue threatens adoption. NoHype solves this by removing the hypervisor. While security was the goal, a side benefit is that with no extra software layer there should be a performance improvement as well. As future work, we are implementing NoHype on current hardware, verifying the NoHype double meaning, and in doing so identifying the limitations of current hardware and assessing the needs for future processors.
37
Questions? Contact info:
38
Backup
39
Cloud vs. Enterprise
Desktop/Enterprise: one organization. Cloud: shared by many organizations.
Enterprise: maximize server consolidation. Cloud: sell discrete units (give guarantees).
Desktop: many devices. Cloud: only networking and disk.
40
Illustrating the threat
Initial state: confidential data is stored encrypted in network storage. 1) VM1 fetches the data into memory. 2) VM1 decrypts the data. 3) VM2 attacks the hypervisor (or has previously attacked it). 4) The hypervisor reads the plain text from memory and copies it into VM2's address space (or sends it out over the network).
41
NoHype: a system architecture to remove the hypervisor
No hypervisor means nothing to attack, while retaining the key needs of a virtualized cloud infrastructure: dynamically starting/stopping VMs, and multi-tenancy. [Figure: Guest VM1 and Guest VM2, apps over OS, run directly on physical hardware with no hypervisor.] Since the hypervisor is the source of vulnerabilities, we propose that it be removed altogether; you could instead perform some sort of integrity checking or attempt to minimize the hypervisor, but those approaches have limitations. If there is no software running under the guest VM, there is nothing to attack. Of course, there is a reason virtualization is used in the cloud today, so we still want a virtualized infrastructure: we need to retain the key features which led to the use of virtualization, namely the ability to dynamically start and stop VMs, and the ability to support multi-tenancy.
42
NoHype Switch Modifications
Need to support "hairpin turnaround". [Figure: two guest VMs on one physical server, with traffic between them leaving the server and returning through the first-hop switch.] One modification needed to support this is that the first-hop Ethernet switches must be aware of virtualization and support hairpin turnarounds. Because Ethernet was historically a shared medium, switches do not forward packets out of the port they were received on, since the destination would presumably have already seen the packet.
43
NoHype Switch Modifications
Need to support "hairpin turnaround". [Figure: a packet from one guest VM hairpinning back through the same switch port to a co-located guest VM.] For communication between co-located VMs, though, that is exactly the behavior we want.
44
Processor Support for Memory
NoHype: a per-core MMU covers that core's subset of memory, and the guest handles its own page faults. A global multi-core memory controller enforces access and fairness, killing a VM on any access outside its allowed range. [Figure: each core's MMU translates guest-physical to machine-physical addresses in front of a shared multi-core memory controller.] To do this we need some support in the processor. As each VM manages its own subset of memory, we need an MMU per core which allows the guest to fully manage its own memory. We also need a global multi-core memory controller which enforces access to memory and arbitrates memory bus usage fairly. If a VM accesses memory it is not allowed to access, it is simply killed. A sketch of the enforcement check appears below.
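As a hedged illustration, the sketch below expresses in C the range check the multi-core memory controller would perform in hardware. The per-core range table and the kill_vm() fault path are assumptions for illustration.

```c
/* Illustrative software model of the multi-core memory controller's
 * enforcement: each core (one VM per core) may touch only its
 * pre-allocated range. In NoHype this check is done in hardware. */
#include <stdint.h>
#include <stdbool.h>

#define NUM_CORES 128

struct mem_range { uint64_t base, size; };   /* set up at VM launch */
static struct mem_range allowed[NUM_CORES];

static void kill_vm(int core) {
    /* In hardware: halt the core and signal the system manager. */
    (void)core;
}

/* Checked on every memory access issued by `core`. */
bool check_access(int core, uint64_t machine_addr) {
    const struct mem_range *r = &allowed[core];
    if (machine_addr >= r->base && machine_addr - r->base < r->size)
        return true;                         /* access proceeds */
    kill_vm(core);                           /* out of range: fatal */
    return false;
}
```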
45
Availability
Altering VM scheduling: one VM per core, so there is no scheduling. Interrupting a core: the local interrupt controller supports masking. Performing large amounts of memory or I/O access: accesses are fair and rate-limited. Availability can be attacked in one of three ways. First, by altering the VM scheduling; in NoHype there is no scheduling, so that cannot happen. Second, one core can blast another with interrupts; in NoHype each core has a local APIC with masking. Third, one VM can perform large amounts of memory or I/O accesses; in NoHype these are fair and rate-limited.
46
Side Channels
Any side channels based on physical analysis are outside our scope. The shared resources in NoHype are the system buses (fair and rate-limited) and the L2 or L3 cache; the most damaging side channels to date are based on the L1 cache, which is not shared. Finally, side channels: while they are very difficult to eliminate completely, we can render them very low bandwidth. Side channels exist because of shared resources, which in NoHype are the system buses, which are fair and rate-limited, and the L2/L3 caches, so the channels are very limited. The most damaging side channels to date are based on the L1 cache, which here is not shared.
47
Confidentiality/Integrity of Data
Affecting data requires access to registers or memory. Cores are not shared, so another VM cannot access the registers. Hardware enforces memory access; one would need to compromise the system manager to alter the tables, but guest VMs do not interact with the system manager. To affect the confidentiality or integrity of data requires access to registers or memory. In NoHype, cores are not shared, so another VM cannot access the registers. And hardware enforces memory accesses, so you would need to compromise the system manager to alter these access tables; but there is no interaction with the system manager, so there is limited opportunity to compromise it.