Dynamic Infrastructure for Dependable Cloud Services
Eric Keller, Princeton University

Cloud Computing
- Services accessible across a network
- Available on any device from anywhere
- No installation or upgrade
[Figure: documents, videos, and photos stored in the cloud]

What makes it cloud computing?
- Dynamic infrastructure with the illusion of infinite scale: elastic and scalable
- Hosted infrastructure (public cloud)
Benefits:
- Economies of scale
- Pay for what you use
- Available on demand (handles spikes)

Cloud Services
- Increasingly demanding: social media, streaming (live) video
- Increasingly critical: business software, smart power grid, healthcare
Dependable: available, secure, high performance

“In the Cloud”
But it is a real infrastructure with real problems:
- Not controlled by the user
- Not even controlled by the service provider

Today’s (Brittle) Network Infrastructure
Network operators need to deal with change:
- Install, maintain, and upgrade equipment
- Manage resources (e.g., bandwidth)

Today’s (Buggy) Network Infrastructure
A single update partially brought down the Internet [Renesys]:
- 8/27/10: House of Cards
- 5/3/09: AfNOG Takes Byte Out of Internet
- 2/16/09: Reckless Driving on the Internet
How to build a Cybernuke

Today’s (Vulnerable) Computing Infrastructure
- Virtualization is used to share servers: a software layer (the hypervisor) runs under each virtual machine
- Malicious software can run on the same server and:
  - Attack the hypervisor
  - Access or obstruct other VMs
[Figure: physical hardware, hypervisor, and guest VM1/VM2, each running an OS and apps]

Dependable Cloud Services?
- Brittle/buggy network infrastructure
- Vulnerable computing infrastructure

Interdisciplinary Systems Research
- Across computing and networking
- Across layers within a computing/network node: rethink the layers
[Figure: node layers (physical hardware, virtualization, OS, apps) alongside the areas involved: computer architecture; operating system / network stack; distributed systems / routing software]

Dynamic Infrastructure for Dependable Cloud Services
Part I: Make the network infrastructure dynamic
- Rethink the monolithic view of a router
- Enable network operators to accommodate change
Part II: Address the security threat in shared computing
- Rethink the virtualization layer in the computing infrastructure
- Eliminate a security threat unique to cloud computing

Part I: Migrating and Grafting Routers to Accommodate Change [SIGCOMM 2008] [NSDI 2010]

The Two Notions of “Router”
The IP-layer logical functionality, and the physical equipment
[Figure: logical (IP layer) vs. physical]

The Tight Coupling of Physical & Logical
The root cause of disruption is the monolithic view of a router (hardware, software, and links as one entity).
[Figure: logical (IP layer) vs. physical]

Breaking the Tight Couplings
The root cause of disruption is the monolithic view of a router (hardware, software, and links as one entity).
- Decouple logical from physical: allows nodes to move around
- Decouple links from nodes: allows links to move around

Planned Maintenance
Shut down a router to:
- Replace a power supply
- Upgrade to a new model
- Contract the network
Add a router to:
- Expand the network

Planned Maintenance
1. Migrate the logical router (VR-1) to another physical router (between physical routers A and B)
2. Perform maintenance
3. Migrate the logical router back
NO reconfiguration, NO reconvergence
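The migrate, maintain, migrate-back sequence can be sketched in a toy model where a logical router (its configuration and routes) is tracked separately from the physical router hosting it. The names `LogicalRouter` and `Network.migrate` are hypothetical and not part of VROOM; the sketch only illustrates that maintenance changes the placement, never the logical router's configuration.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalRouter:
    """IP-layer router: configuration and routing state, independent of any hardware."""
    name: str
    config: dict
    routes: dict = field(default_factory=dict)


class Network:
    """Hypothetical model mapping each logical router to the physical router hosting it."""

    def __init__(self) -> None:
        self.placement: dict = {}  # logical router name -> physical router name

    def migrate(self, vr: LogicalRouter, physical: str) -> None:
        # Only the placement changes; vr.config and vr.routes travel unchanged,
        # so neighbors need no reconfiguration and routing does not reconverge.
        self.placement[vr.name] = physical

    def hosted_on(self, physical: str) -> list:
        return [name for name, host in self.placement.items() if host == physical]


vr1 = LogicalRouter("VR-1", config={"asn": 100}, routes={"203.0.113.0/24": "192.0.2.1"})
config_before = dict(vr1.config)
net = Network()
net.migrate(vr1, "A")            # VR-1 initially runs on physical router A
net.migrate(vr1, "B")            # 1. migrate VR-1 to B
assert net.hosted_on("A") == []  # 2. A now hosts nothing and can be serviced
net.migrate(vr1, "A")            # 3. migrate VR-1 back
assert vr1.config == config_before
```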

Planned Maintenance
Could migrate external links to other routers:
- Away from a router being shut down, or
- To a router being added (or brought back up)
Internal links: OSPF or fast re-route

Customer Requests a Feature
The network has a mixture of routers from different vendors.
- Rehome the customer to a router with the needed feature

Traffic Management
Typical traffic engineering:
- Adjust routing protocol parameters based on traffic
[Figure: congested link]

Traffic Management
Instead:
- Rehome customers to change the traffic matrix
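A small worked example of why rehoming helps, assuming a deliberately simplified model (not from the talk) in which each customer's traffic leaves over the single uplink of the access router it is homed on. The customers, routers, link names, and capacities below are hypothetical.

```python
def link_loads(demands, home, uplink):
    """Sum each customer's traffic onto the uplink of the router it is homed on.

    demands: customer -> Gb/s; home: customer -> access router; uplink: router -> link.
    """
    loads = {}
    for customer, gbps in demands.items():
        link = uplink[home[customer]]
        loads[link] = loads.get(link, 0.0) + gbps
    return loads


def max_utilization(loads, capacity):
    """Highest load/capacity ratio over all links (1.0 means fully used)."""
    return max(loads.get(link, 0.0) / cap for link, cap in capacity.items())


capacity = {"A-core": 10.0, "B-core": 10.0}
demands = {"c1": 6.0, "c2": 5.0, "c3": 2.0}
uplink = {"A": "A-core", "B": "B-core"}

home = {"c1": "A", "c2": "A", "c3": "B"}
before = max_utilization(link_loads(demands, home, uplink), capacity)  # A-core carries 11 Gb/s

home_after = dict(home, c2="B")  # rehome customer c2 from router A to router B
after = max_utilization(link_loads(demands, home_after, uplink), capacity)
print(f"max utilization before: {before:.2f}, after: {after:.2f}")
```

Moving one customer relieves the overloaded link (1.10 to 0.70 here) without touching any routing protocol parameters.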

Migrating and Grafting
- Virtual Router Migration (VROOM) [SIGCOMM 2008]
  - Allow (virtual) routers to move around
  - Break the routing software free from the physical device it is running on
  - Built a prototype with OpenVZ, Quagga, and NetFPGA or Linux
- Router Grafting [NSDI 2010]
  - Break the links/sessions free from the routing software instance currently handling them

Router Grafting: Breaking up the router
[Figure: send state, move link]
Router Grafting enables breaking a router apart (splitting/merging).

Not Just State Transfer
[Figure: migrating a session among AS100, AS200, AS300, and AS400]
The topology changes, so the decision processes need to be re-run.
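To see why grafting a session forces a new decision, here is a minimal sketch of best-route selection. It is a simplification: the real BGP decision process compares several more attributes, while this sketch keeps only AS-path length with a name-based tie-break. The prefixes and AS paths are hypothetical.

```python
def best_routes(rib_in):
    """Select one best route per prefix from all neighbors' announcements.

    rib_in: neighbor -> {prefix: AS path as a tuple}. Simplified rule: shortest
    AS path wins; ties go to the neighbor name that sorts first.
    """
    best = {}
    for neighbor in sorted(rib_in):
        for prefix, as_path in rib_in[neighbor].items():
            if prefix not in best or len(as_path) < len(best[prefix][1]):
                best[prefix] = (neighbor, as_path)
    return best


# Before grafting, this router learns 198.51.100.0/24 only via AS300.
rib_in = {"AS300": {"198.51.100.0/24": (300, 400)}}
print(best_routes(rib_in))

# After a session with AS200 is grafted onto this router, its view of the topology changes.
rib_in["AS200"] = {"198.51.100.0/24": (200,)}
print(best_routes(rib_in))
```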

Goals
Routing and forwarding should not be disrupted:
- Data packets are not dropped
- Routing protocol adjacencies do not go down
- All route announcements are received
Change should be transparent:
- Neighboring routers/operators should not be involved
- Redesign the routers, not the protocols

Challenge: Protocol Layers
[Figure: routers A, B, and C, each with a protocol stack of BGP (exchange routes), TCP (deliver reliable stream), and IP (send packets) over a physical link; grafting must migrate the link and migrate the state]

Physical Link
[Protocol-layer figure repeated, with the physical link highlighted]

Unplugging the cable would be disruptive
Links are not physical wires:
- Switchover in nanoseconds
[Figure: optical switches move the link between the network making the change and the neighboring network]

IP
[Protocol-layer figure repeated, with the IP layer highlighted]

Changing IP Address
- The IP address is an identifier in BGP
- Changing it would require the neighbor to reconfigure
  - Not transparent
  - Also has an impact on TCP (later)
[Figure: moving the link between the network making the change and the neighboring network]

Re-assign IP Address
- The IP address is not used for global reachability
  - It can move with the BGP session
  - The neighbor does not have to reconfigure

TCP
[Protocol-layer figure repeated, with the TCP layer highlighted]

Dealing with TCP
- TCP sessions are long-running in BGP
  - Killing one implicitly signals that the router is down
- BGP and TCP extensions exist as a workaround (not supported on all routers)

Migrating TCP Transparently
- Capitalize on the IP address not changing, to keep migration completely transparent
- Transfer the TCP session state:
  - Sequence numbers
  - Packet input/output queues (packets not yet read/ACKed)
[Figure: the app calls send() and recv() through the OS, which exchanges TCP(data, seq, …) segments and ACKs with the peer]
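The state listed above can be sketched as a record that is exported on the migrate-from router and rebuilt on the migrate-to router. This is a hypothetical layout for illustration only; it is not SockMi's format, and a real implementation works on kernel socket structures rather than JSON. Field names borrow the standard TCP variables (SND.UNA, SND.NXT, RCV.NXT); addresses are documentation examples.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TcpSessionState:
    """Per-connection state to move between routers (hypothetical layout, not SockMi's)."""
    local: tuple   # (IP, port); the IP address does not change, so the peer sees no difference
    remote: tuple
    snd_una: int   # oldest sequence number sent but not yet acknowledged
    snd_nxt: int   # next sequence number to send
    rcv_nxt: int   # next sequence number expected from the peer
    unacked: list = field(default_factory=list)  # output queue: sent, not yet ACKed
    unread: list = field(default_factory=list)   # input queue: received, not yet read by the app


def export_state(state: TcpSessionState) -> str:
    """Serialize on the migrate-from router."""
    return json.dumps(asdict(state))


def import_state(blob: str) -> TcpSessionState:
    """Rebuild on the migrate-to router; JSON turns tuples into lists, so restore them."""
    fields = json.loads(blob)
    fields["local"] = tuple(fields["local"])
    fields["remote"] = tuple(fields["remote"])
    return TcpSessionState(**fields)


state = TcpSessionState(
    local=("192.0.2.1", 179), remote=("192.0.2.2", 40001),
    snd_una=1000, snd_nxt=1042, rcv_nxt=5000,
    unacked=["UPDATE (42 bytes)"], unread=["KEEPALIVE"],
)
moved = import_state(export_state(state))
assert moved == state
```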

BGP
[Protocol-layer figure repeated, with the BGP layer highlighted]

BGP: What (not) to Migrate
Requirements:
- Want data packets to be delivered
- Want routing adjacencies to remain up
Need:
- Configuration
- Routing information
Do not need (but can have):
- State machine
- Statistics
- Timers
This keeps code modifications to a minimum.
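A minimal sketch of this split, assuming a BGP session is represented as a plain dictionary. The keys are hypothetical: `adj_rib_in` and `adj_rib_out` follow the BGP terms for routes learned from and advertised to a neighbor, and nothing here reflects Quagga's actual data structures. The timer values are examples only.

```python
MUST_MIGRATE = {"config", "adj_rib_in", "adj_rib_out"}  # configuration + routing information
OPTIONAL = {"fsm_state", "stats", "timers"}              # can be re-created on the new router


def migration_payload(session: dict, include_optional: bool = False) -> dict:
    """Select the parts of a BGP session to send to the migrate-to router."""
    keys = MUST_MIGRATE | (OPTIONAL if include_optional else set())
    return {k: v for k, v in session.items() if k in keys}


def install_session(payload: dict) -> dict:
    """Rebuild the session on the migrate-to router, filling non-migrated parts with fresh values."""
    session = {
        "fsm_state": "Established",  # the TCP connection moved intact, so the session stays up
        "stats": {"updates_received": 0},
        "timers": {"hold": 180, "keepalive": 60},  # example values
    }
    session.update(payload)
    return session


old = {
    "config": {"neighbor": "192.0.2.2", "remote_as": 200},
    "adj_rib_in": {"198.51.100.0/24": (200,)},
    "adj_rib_out": {"203.0.113.0/24": (100,)},
    "fsm_state": "Established",
    "stats": {"updates_received": 1234},
    "timers": {"hold": 180, "keepalive": 60},
}
new = install_session(migration_payload(old))
```

Statistics restart from zero on the new router, which costs nothing for correctness; that is why they can be left behind.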

Routing Information
Could involve the remote end-point:
- Similar exchange as with a new BGP session
- The migrate-to router sends its entire state to the remote end-point
- Ask the remote end-point to re-send all routes it advertised
Disruptive:
- Makes the remote end-point do significant work
[Figure: move link, exchange routes]

Routing Information (optimization)
The migrate-from router sends the migrate-to router:
- The routes it learned, instead of making the remote end-point re-announce them
- The routes it advertised, so the migrate-to router can send just an incremental update
[Figure: send routes advertised/learned, move link, incremental update]
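Because the migrate-to router knows what the remote end-point has already been told, it only needs to send the difference. The following is a minimal sketch of that diff, assuming routes are represented as prefix to AS path (a hypothetical simplification; real UPDATE messages carry more attributes).

```python
def incremental_update(advertised_before: dict, to_advertise: dict):
    """Compute the announcements and withdrawals needed to go from the old to the new advertisements.

    advertised_before: routes the migrate-from router advertised (prefix -> AS path).
    to_advertise: routes the migrate-to router now wants to advertise.
    Returns (announcements, withdrawals).
    """
    announce = {p: path for p, path in to_advertise.items() if advertised_before.get(p) != path}
    withdraw = sorted(p for p in advertised_before if p not in to_advertise)
    return announce, withdraw


before = {"203.0.113.0/24": (100,), "198.51.100.0/24": (100, 300), "10.0.0.0/8": (100, 200)}
after = {"203.0.113.0/24": (100,), "198.51.100.0/24": (100, 400), "192.0.2.0/24": (100,)}
announce, withdraw = incremental_update(before, after)
# Only the changed route, the new route, and one withdrawal are sent;
# the unchanged 203.0.113.0/24 is not re-announced.
```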

Migration in the Background
Migration takes a while:
- A lot of routing state to transfer
- A lot of processing is needed
Routing changes can happen at any time, so migrate in the background.
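One way to picture migrating in the background: transfer a snapshot of the routing state, queue any updates that arrive meanwhile, and replay them in order once the snapshot is imported. This is a hypothetical sketch of that idea, not the prototype's mechanism; the class name and route format are assumptions.

```python
class BackgroundMigration:
    """Hypothetical helper: transfer a routing snapshot while live updates keep arriving."""

    def __init__(self, snapshot: dict):
        self.snapshot = dict(snapshot)  # routes exported when migration starts
        self.pending = []               # (prefix, route) updates received during the transfer

    def on_update(self, prefix: str, route):
        """Record an update that arrives mid-migration; route=None means a withdrawal."""
        self.pending.append((prefix, route))

    def finish(self) -> dict:
        """Import the snapshot, then replay queued updates in arrival order."""
        rib = dict(self.snapshot)
        for prefix, route in self.pending:
            if route is None:
                rib.pop(prefix, None)
            else:
                rib[prefix] = route
        self.pending.clear()
        return rib


m = BackgroundMigration({"198.51.100.0/24": (300,), "203.0.113.0/24": (200,)})
m.on_update("203.0.113.0/24", None)      # withdrawn while the snapshot was in flight
m.on_update("192.0.2.0/24", (400, 500))  # newly learned
m.on_update("198.51.100.0/24", (200,))   # changed
rib = m.finish()
```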

Prototype
- Added grafting into Quagga
  - Import/export routes, new ‘inactive’ state
  - Routing data and decision process are well separated
- Graft daemon to control the process
- SockMi for TCP migration
[Figure: a graftable router (modified Quagga, graft daemon, SockMi.ko in the Linux kernel), a handler/comm component, emulated link migration using Click (click.ko), and an unmodified router running Quagga]

Evaluation
- Mechanism:
  - Impact on migrating routers
  - Disruption to network operation
- Application:
  - Traffic engineering

Impact on Migrating Routers
How long migration takes:
- Includes export, transmit, import, lookup, and decision
- CPU utilization roughly 25%
Between routers: 0.9 s (20k), 6.9 s (200k)
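A rough reading of the two measurements, under two assumptions not stated on the slide: that "20k" and "200k" are route counts, and that migration time grows linearly with the number of routes. Fitting a line through the two points separates a per-route cost from a fixed overhead.

```python
def linear_fit(n1, t1, n2, t2):
    """Line through two (route count, seconds) points: returns (seconds per route, fixed seconds)."""
    per_route = (t2 - t1) / (n2 - n1)
    fixed = t1 - per_route * n1
    return per_route, fixed


per_route, fixed = linear_fit(20_000, 0.9, 200_000, 6.9)
print(f"~{per_route * 1e6:.0f} us per route, ~{fixed:.2f} s fixed")
```

With these numbers the fit is roughly 33 microseconds per route plus about 0.23 s of fixed cost; two points cannot confirm that the relationship is actually linear.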

Disruption to Network Operation
- Data traffic is affected by not having a link: nanoseconds
- Routing protocols are affected by unresponsiveness: milliseconds
  - Set the old router to “inactive”, migrate the link, migrate TCP, set the new router to “active”
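The ordering of those steps matters: the old router stops acting on the session before anything moves, and the new router only becomes active once both the link and the TCP state have arrived. A minimal sketch in a hypothetical model where routers are plain dictionaries (not the prototype's graft daemon):

```python
def graft_session(old: dict, new: dict) -> list:
    """Move one BGP session from `old` to `new` router in the order listed on the slide.

    Routers are plain dicts (hypothetical model). Returns the step log.
    """
    log = []
    old["active"] = False            # old router stops acting on the session
    log.append("old router inactive")
    new["link"] = old.pop("link")    # optical switch moves the link (nanoseconds)
    log.append("link migrated")
    new["tcp"] = old.pop("tcp")      # TCP state moves; the peer sees no reset
    log.append("TCP migrated")
    new["active"] = True             # new router resumes the session
    log.append("new router active")
    return log


old = {"active": True, "link": "optical-port-3", "tcp": {"snd_nxt": 1042}}
new = {"active": False}
steps = graft_session(old, new)
```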

Traffic Engineering Evaluation
- Internet2 topology and traffic data
- Developed algorithms to determine which links to graft
Result: the network can handle more traffic (at the same level of congestion)

Router Grafting Conclusions
Enables moving a single link with:
- Minimal code change
- No impact on data traffic
- No visible impact on routing protocol adjacencies
- Minimal overhead on the rest of the network
Applied to traffic engineering:
- Enables changing ingress/egress points
- Networks can handle more traffic

Part II: Virtualized Cloud Infrastructure without the Virtualization [ISCA 2010]

Is this Problem Real?
- No headlines… doesn’t mean it’s not real
  - Not enticing enough to hackers yet? (small market size, lack of confidential data)
- The virtualization layer is huge and growing
- It is derived from existing operating systems, which have security holes

NoHype
NoHype removes the hypervisor:
- There’s nothing to attack
- Complete systems solution
- Still meets the needs of a virtualized cloud infrastructure
[Figure: guest VM1 and VM2 (OS and apps) running directly on physical hardware, with no hypervisor]

Virtualization in the Cloud
Why does a cloud infrastructure use virtualization?
- To support dynamically starting/stopping VMs
- To allow servers to be shared (multi-tenancy)
It does not need the full power of modern hypervisors:
- Emulating diverse (potentially older) hardware
- Maximizing server consolidation

Roles of the Hypervisor
Isolating/emulating resources:
- CPU: scheduling virtual machines
- Memory: managing memory
- I/O: emulating I/O devices
Networking
Managing virtual machines
[Figure annotations: CPU, memory, and I/O: push to hardware / pre-allocation; networking: remove; managing VMs: push to the side]

Scheduling Virtual Machines (today)
The scheduler is called each time the hypervisor runs (periodically, on I/O events, etc.):
- Chooses what to run next on a given core
- Balances load across cores
[Figure: timeline of VMs, with the hypervisor switching between them on timer and I/O events]

Dedicate a core to a single VM (NoHype)
- Ride the multi-core trend: 1 core on a 128-core device is ~0.8% of the processor
- Cloud computing is pay-per-use
  - During high demand, spawn more VMs
  - During low demand, kill some VMs
  - Customers maximize each VM’s work, which minimizes the opportunity for over-subscription
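A minimal sketch of one-VM-per-core allocation, including the 1/128 arithmetic from the slide. The allocator is hypothetical: reserving core 0 for system management is an assumption made for illustration, and the key property is that a core is never shared, so there is no scheduler to run.

```python
class CoreAllocator:
    """Hypothetical NoHype-style allocator: each VM gets whole cores, never shared."""

    def __init__(self, n_cores: int, reserved=(0,)):
        # Assumption: core 0 is kept for system management, not for guests.
        self.free = [c for c in range(n_cores) if c not in reserved]
        self.owner = {}  # core -> VM name

    def start_vm(self, vm: str) -> int:
        if not self.free:
            # No over-subscription: the cloud must place this VM on another server.
            raise RuntimeError(f"no free core for {vm}")
        core = self.free.pop(0)
        self.owner[core] = vm
        return core

    def stop_vm(self, vm: str) -> None:
        for core in [c for c, v in self.owner.items() if v == vm]:
            del self.owner[core]
            self.free.append(core)


print(f"1 core of 128 = {100 / 128:.2f}% of the processor")
alloc = CoreAllocator(4)
cores = [alloc.start_vm(name) for name in ("vm1", "vm2", "vm3")]
```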

Managing Memory (today)
Goal: system-wide optimal usage, i.e., maximize server consolidation
The hypervisor controls allocation of physical memory.

Pre-allocate Memory (NoHype)
In cloud computing, customers are charged per unit (e.g., a VM with 2GB of memory).
Pre-allocate a fixed amount of memory:
- Memory is fixed and guaranteed
- The guest VM manages its own physical memory (deciding which pages to swap to disk)
Processor support for enforcing:
- Allocation and bus utilization
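Static pre-allocation can be sketched as carving physical memory into fixed, contiguous per-VM ranges at start-up, with a bounds check standing in for the processor-enforced isolation. This is a hypothetical model: real hardware enforces access through mechanisms such as extended page tables, not a Python method.

```python
GiB = 1 << 30


class MemoryPartition:
    """Hypothetical static partitioning: each VM gets one fixed, contiguous physical range at start."""

    def __init__(self, total_bytes: int):
        self.limit = total_bytes
        self.next_free = 0
        self.ranges = {}  # VM name -> (start, end), end exclusive

    def allocate(self, vm: str, size: int) -> tuple:
        if self.next_free + size > self.limit:
            raise MemoryError(f"cannot guarantee {size} bytes for {vm}")
        self.ranges[vm] = (self.next_free, self.next_free + size)
        self.next_free += size
        return self.ranges[vm]

    def may_access(self, vm: str, addr: int) -> bool:
        """The check the processor would enforce on every access in this model."""
        start, end = self.ranges[vm]
        return start <= addr < end


mem = MemoryPartition(8 * GiB)
mem.allocate("vm1", 2 * GiB)  # e.g., a VM sold with 2GB of memory
mem.allocate("vm2", 2 * GiB)
```

Because ranges never change after allocation, there is nothing left for a hypervisor to manage at run time.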

Emulate I/O Devices (today)
The guest sees virtual devices:
- Access to a device’s memory range traps to the hypervisor
- The hypervisor handles interrupts
- A privileged VM emulates devices and performs I/O
[Figure: guest VM accesses trap to the hypervisor; the privileged VM, which runs the real drivers and device emulation, interacts with the hypervisor via hypercalls]

Dedicate Devices to a VM (NoHype)
- In cloud computing, only networking and storage are needed
- Static memory partitioning for enforcing access
  - The processor for accesses to the device, the IOMMU for accesses from the device
[Figure: guest VM1 and VM2 on physical hardware, with no hypervisor]

Virtualize the Devices (NoHype)
- A per-VM physical device doesn’t scale
- Multiple queues on the device
  - Multiple memory ranges mapping to different queues
[Figure: processor, chipset, and memory connected over the peripheral bus to a network card containing classify/MUX logic and the MAC/PHY]
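A toy model of a multi-queue network card: incoming frames are classified to a per-VM receive queue by destination MAC address, and per-VM transmit queues are multiplexed onto the wire. This is a sketch in the spirit of SR-IOV, not any card's interface; Python deques stand in for the per-queue memory ranges, and the MAC addresses are made up.

```python
from collections import deque


class MultiQueueNic:
    """Toy model of a NIC with one receive/transmit queue pair per VM."""

    def __init__(self):
        self.rx = {}  # VM MAC -> frames waiting for that VM
        self.tx = {}  # VM MAC -> frames that VM wants to send

    def add_vm(self, mac: str) -> None:
        self.rx[mac] = deque()
        self.tx[mac] = deque()

    def classify(self, dst_mac: str, frame: bytes) -> bool:
        """Steer an incoming frame to its VM's queue by destination MAC; drop if unknown."""
        queue = self.rx.get(dst_mac)
        if queue is None:
            return False
        queue.append(frame)
        return True

    def mux(self) -> list:
        """Take at most one frame from each VM's transmit queue (one round-robin pass)."""
        return [(mac, q.popleft()) for mac, q in self.tx.items() if q]


nic = MultiQueueNic()
nic.add_vm("02:00:00:00:00:01")
nic.add_vm("02:00:00:00:00:02")
nic.classify("02:00:00:00:00:02", b"hello vm2")
nic.tx["02:00:00:00:00:01"].extend([b"a", b"b"])
nic.tx["02:00:00:00:00:02"].append(b"c")
first_pass = nic.mux()
```

Each VM touches only its own queues, so no hypervisor needs to sit in the data path.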

Networking (today)
Ethernet switches connect servers.

Networking (in a virtualized server, today)
Software Ethernet switches connect VMs.
[Figure: guest VM1 and VM2 connected by a software switch, running either in the hypervisor or in a privileged VM]

Do Networking in the Network (NoHype)
- Co-located VMs communicate through software
  - Performance penalty for VMs that are not co-located
  - Special case in cloud computing
  - Artifact of going through the hypervisor anyway
- Instead: utilize hardware switches in the network
  - Modification to support hairpin turnaround
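Why the switch needs a modification: an ordinary learning bridge never sends a frame back out of the port it arrived on, so two VMs behind the same server port could not reach each other through the external switch. Hairpin turnaround (called reflective relay in the IEEE 802.1Qbg VEPA work) relaxes that rule on selected ports. The class below is a minimal hypothetical sketch of that behavior, not a real switch's configuration.

```python
class LearningSwitch:
    """Minimal MAC-learning Ethernet switch with optional hairpin (reflective relay) ports."""

    def __init__(self, ports, hairpin_ports=()):
        self.ports = list(ports)
        self.hairpin = set(hairpin_ports)
        self.mac_table = {}  # MAC -> port it was last seen on

    def forward(self, in_port, src_mac, dst_mac):
        """Return the list of ports a frame is sent out of."""
        self.mac_table[src_mac] = in_port
        out = self.mac_table.get(dst_mac)
        if out is None:
            return [p for p in self.ports if p != in_port]  # unknown destination: flood
        if out == in_port and in_port not in self.hairpin:
            return []  # ordinary bridge: never send a frame back out its ingress port
        return [out]   # with hairpin, the frame may turn around on the same port


# VM1 (MAC "aa") and VM2 (MAC "bb") share one server, attached to switch port 1.
plain = LearningSwitch([1, 2, 3])
plain.forward(1, "bb", "aa")          # VM2 speaks first, so the switch learns "bb" is on port 1
blocked = plain.forward(1, "aa", "bb")

hairpin = LearningSwitch([1, 2, 3], hairpin_ports=[1])
hairpin.forward(1, "bb", "aa")
turned = hairpin.forward(1, "aa", "bb")
```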

Removing the Hypervisor Summary
- Scheduling virtual machines: one VM per core
- Managing memory: pre-allocate memory with processor support
- Emulating I/O devices: direct access to virtualized devices
- Networking: utilize hardware Ethernet switches
- Managing virtual machines: decouple the management from operation

NoHype Double Meaning
Means no hypervisor; also means “no hype”:
- Multi-core processors
- Extended Page Tables
- SR-IOV and Directed I/O (VT-d)
- Virtual Ethernet Port Aggregator (VEPA)

Prototype
- Xen as the starting point
- Pre-configure all resources
- Support for legacy boot
  - Use a known-good kernel (i.e., non-malicious)
  - Temporary hypervisor
  - Before switching to user code, switch off the hypervisor
[Figure: Xen with guest VM1 and a privileged VM running xm and the core kernel; kill VM]

Improvements for Future Technology
Main limitations, and where they can be addressed:
- Inter-processor interrupts: processor architecture (minor change)
- Side channels: processor architecture
- Legacy boot: operating systems in virtualized environments

NoHype Conclusions
- A significant security issue threatens cloud adoption
- NoHype solves this by removing the hypervisor
- Performance improvement is a side benefit

Brief Overview of My Other Work
- Software reliability in routers
- Reconfigurable computing

Software Reliability in Routers
[Figure: routing software on a single CPU/OS faces router bugs and a performance wall; multiple routing software instances run over a CPU/OS “hypervisor” with an FPGA]

Reconfigurable Computing
- FPGAs in networking
  - Click + G: a domain-specific design environment
- Taking advantage of reconfigurability
  - JBits, self-reconfiguration
  - Demonstration applications (e.g., bio-informatics, DSP)
[Figure: FPGAs alongside CPUs; FPGAs in network components]

Future Work
- Computing
  - Securing the cloud
  - Rethink server architecture in large data centers
- Networking
  - Hosted and shared network infrastructure
  - Refactoring routers to ease management

“The Network is the Computer” [John Gage ’84]
An exciting time, now that this is becoming a reality

Questions?
Contact info: