Neutron at Scale
Justin Hammond - Developer
Andy Hill - Systems Engineer
Chad Norgan - Systems Engineer
Scope of the Talk
Rackspace is early in its Neutron implementation
We are migrating from the older versions of Quantum/Melange that we have used since the launch of our public cloud
This talk focuses primarily on Nova ⬄ Neutron interaction and the challenges we faced deploying Neutron at scale
What we mean when we say “at scale”
Tens of thousands of compute nodes
Hundreds of thousands of instances
Most instances have two or more ports
RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Implementation Requirements
Maintain backwards compatibility with existing products
Neutron will be the authoritative source for network state
IP Address Management (IPAM)
Modular network drivers so Neutron can service heterogeneous port types
Enable new products to integrate easily into our public cloud offering
Implementation Details
Quark plugin: open source plugin for the Neutron v2 API, with IPAM
Custom database migration from Melange/Quantum to Neutron/Quark
Wafflehaus middleware collection
Rackspace’s Neutron Implementation
Diagram: Active/Passive load balancers → Neutron-api nodes running the Quark plugin with Wafflehaus → Active/Passive database with a slave
Our Neutron service consists of:
A pair of load balancers, which route incoming requests to available nodes and run health checks against the pool members
Anywhere from 2 to 8 Neutron-api nodes, each running worker/core processes that service Neutron requests, with Wafflehaus on the WSGI layer (DNS, Auth, Context, ddi)
An Active/Passive pair of Neutron databases for the Quark plugin, with STONITH through Pacemaker and Corosync, plus a read-only slave for backups
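To make the load balancer's role concrete, here is a minimal sketch of routing requests only to pool members that pass a health check. The round-robin policy, the `HealthCheckedPool` class, and the `is_healthy` callback are hypothetical; the slide does not say which balancing algorithm or health check the load balancers actually use.

```python
from itertools import cycle
from typing import Callable, Iterator, List


class HealthCheckedPool:
    """Round-robin over Neutron-api pool members, skipping unhealthy ones.

    Assumption: round-robin is a hypothetical choice for illustration; the
    real load balancers' algorithm is not described in the talk.
    """

    def __init__(self, members: List[str], is_healthy: Callable[[str], bool]):
        self.members = list(members)
        self.is_healthy = is_healthy  # stands in for the load balancer's health check
        self._ring: Iterator[str] = cycle(self.members)

    def pick(self) -> str:
        # Visit each member at most once per request, so a pool with no
        # healthy nodes raises instead of spinning forever.
        for _ in range(len(self.members)):
            node = next(self._ring)
            if self.is_healthy(node):
                return node
        raise RuntimeError("no healthy Neutron-api nodes in the pool")
```

For example, with three nodes where `api2` fails its health check, four consecutive picks return `api1`, `api3`, `api1`, `api3`: the failed node is simply skipped until it passes again.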
Wafflehaus Overview
Wafflehaus is a collection of middleware for specific Rackspace requirements
A very simple way to minimize our diffs against upstream
Upstream effort is better spent on work that benefits the broader community
Wafflehaus - “The API Mullet”
Business logic in the front, party in the back
Wafflehaus Explained
Diagram: API Request → Wafflehaus middlewares → Neutron-api → Quark plugin
Each middleware in front of Neutron-api can inspect or act on the request, for example:
Does the request body contain particular UUIDs?
Would this request violate policy?
Add this tag to the request header
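The middleware checks listed above fit the standard WSGI pattern: a callable wraps the next application, looks at the request, and either answers it or passes it on. A minimal sketch of two such filters, one that tags a request header and one that rejects bodies mentioning a blocked UUID. The class names, the `X-Example-Tag` header, and the block list are hypothetical placeholders, not Wafflehaus's actual middlewares.

```python
import io
from typing import Callable, Iterable


class TagHeaderMiddleware:
    """Adds a request header before handing off to the wrapped app.

    The header name and value are hypothetical placeholders.
    """

    def __init__(self, app: Callable, header: str = "X-Example-Tag", value: str = "waffle"):
        self.app = app
        # WSGI exposes request headers as HTTP_<NAME> keys in the environ.
        self.environ_key = "HTTP_" + header.upper().replace("-", "_")
        self.value = value

    def __call__(self, environ, start_response):
        environ[self.environ_key] = self.value
        return self.app(environ, start_response)


class BlockedUUIDMiddleware:
    """Rejects requests whose body mentions any UUID on a (hypothetical) block list."""

    def __init__(self, app: Callable, blocked_uuids: Iterable[str]):
        self.app = app
        self.blocked = {u.lower() for u in blocked_uuids}

    def __call__(self, environ, start_response):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length) if length else b""
        # Replace the consumed stream so Neutron-api can still read the body.
        environ["wsgi.input"] = io.BytesIO(body)
        text = body.decode("utf-8", errors="replace").lower()
        if any(u in text for u in self.blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"request references a blocked UUID\n"]
        return self.app(environ, start_response)
```

Because each filter only needs the next app, they stack in any order, for example `TagHeaderMiddleware(BlockedUUIDMiddleware(neutron_api, blocked))`, which keeps this business logic out of the Neutron code itself.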
Calls to Keystone
Table comparing calls to Keystone for Melange/Quantum, Neutron (trunk), and Wafflehaus + no-auth, by operation: Build (5 per port), Delete, Info Cache Update (LOTS), TOTAL (TOO MANY)
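Wafflehaus and No-Auth Middleware
1. An API request reaches Neutron-api (with Wafflehaus); its x-forwarded-for header gives the client address, 10.1.2.3
2. Wafflehaus asks the DNS server: PTR for 10.1.2.3? The answer: PTR at compute.trusted.domain
3. Wafflehaus asks: A for compute.trusted.domain? The answer: A at 10.1.2.3
4. The reverse and forward lookups agree and the name is in the trusted domain, so the request is accepted without a call to Keystone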
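The lookup sequence above is a forward-confirmed reverse DNS check. A minimal sketch of that check, assuming the system resolver via Python's `socket` module; the function names are hypothetical, and the resolvers are injectable so the logic can be exercised without real DNS. This is not Wafflehaus's own code.

```python
import socket
from typing import Callable, List


def _ptr(ip: str) -> str:
    # Reverse (PTR) lookup via the system resolver.
    return socket.gethostbyaddr(ip)[0]


def _a_records(name: str) -> List[str]:
    # Forward (A) lookup via the system resolver; returns IPv4 addresses.
    return socket.gethostbyname_ex(name)[2]


def is_trusted_client(ip: str, trusted_domain: str,
                      ptr: Callable[[str], str] = _ptr,
                      a_records: Callable[[str], List[str]] = _a_records) -> bool:
    """Forward-confirmed reverse DNS check against a whitelisted domain."""
    try:
        hostname = ptr(ip).rstrip(".").lower()
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    domain = trusted_domain.rstrip(".").lower()
    # Step 1: the PTR name must sit inside the trusted domain.
    if hostname != domain and not hostname.endswith("." + domain):
        return False
    # Step 2: the name's A records must point back at the same address,
    # so a forged PTR record alone is not enough.
    try:
        return ip in a_records(hostname)
    except OSError:
        return False
```

The forward lookup in step 2 is what makes this safe: whoever controls the reverse zone for an address can claim any PTR name, but cannot make `compute.trusted.domain` resolve back to their address without controlling the trusted zone too.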
Wafflehaus Explained
[composite:neutronapi_v2_0]
use = call:neutron.auth:pipeline_factory
noauth = dns_filter request_id catch_errors extensions neutronapiapp_v2_0
keystone = request_id catch_errors authtoken keystonecontext extensions neutronapiapp_v2_0

[filter:dns_filter]
paste.filter_factory = wafflehaus.dns_filter.whitelist:filter_factory
whitelist = trusted.domain
enabled = true
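To our understanding, `neutron.auth:pipeline_factory` chooses one of the named pipelines (`noauth` or `keystone`) according to Neutron's `auth_strategy` setting, then wraps the final app in the listed filters. The standalone sketch below mimics that selection with plain dictionaries instead of a PasteDeploy loader; `build_pipeline`, `tracing_filter`, and `COMPOSITE` are hypothetical names. It shows that in the `noauth` pipeline `dns_filter` is the outermost middleware, so it sees each request first, and that `authtoken` (the Keystone check) is not in that pipeline at all.

```python
from typing import Callable, Dict

WSGIApp = Callable  # simplified alias: (environ, start_response) -> iterable of bytes


def build_pipeline(local_conf: Dict[str, str], auth_strategy: str,
                   filters: Dict[str, Callable[[WSGIApp], WSGIApp]],
                   apps: Dict[str, WSGIApp]) -> WSGIApp:
    """Select the pipeline named by auth_strategy and wrap its app in its filters."""
    names = local_conf[auth_strategy].split()
    app = apps[names[-1]]  # the last name is the application itself
    # Wrap innermost-first so the first filter listed sees the request first.
    for name in reversed(names[:-1]):
        app = filters[name](app)
    return app


def tracing_filter(name: str) -> Callable[[WSGIApp], WSGIApp]:
    """Stand-in filter that records its name in the environ, then delegates."""
    def factory(app: WSGIApp) -> WSGIApp:
        def middleware(environ, start_response):
            environ.setdefault("trace", []).append(name)
            return app(environ, start_response)
        return middleware
    return factory


# The two pipelines from the [composite:neutronapi_v2_0] section.
COMPOSITE = {
    "noauth": "dns_filter request_id catch_errors extensions neutronapiapp_v2_0",
    "keystone": ("request_id catch_errors authtoken keystonecontext "
                 "extensions neutronapiapp_v2_0"),
}
```

Running a request through the `noauth` pipeline built from tracing filters records `dns_filter, request_id, catch_errors, extensions, neutronapiapp_v2_0`, in that order.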
Call Volume Before & After
Graphs: rax.io/neutron_lon_combined and rax.io/neutron_lon_launch
On Info Cache Updates
Nova caches a copy of each instance’s network information (the info cache)
The cache is refreshed by instance operations that call out to Neutron
A callback system is needed so Neutron can tell Nova when network state changes, instead of Nova refreshing on its own schedule
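One such callback path is Nova's os-server-external-events API, which the nova-event-callback blueprint in the links relates to: Neutron POSTs a list of events and Nova refreshes only the affected instances. A minimal sketch of building that request body, assuming the event fields `name`, `server_uuid`, `tag`, and `status`; the helper and its input format are hypothetical, and authentication and the HTTP call itself are left out.

```python
import json
from typing import Dict, List


def network_changed_body(changes: List[Dict[str, str]]) -> str:
    """Build one batched body for Nova's os-server-external-events API.

    `changes` is a hypothetical input format: one dict per changed port,
    with the owning instance's UUID and the port ID.
    """
    events = [
        {
            "name": "network-changed",        # asks Nova to refresh this instance's info cache
            "server_uuid": c["server_uuid"],
            "tag": c["port_id"],              # identifies which port changed
            "status": "completed",
        }
        for c in changes
    ]
    return json.dumps({"events": events})
```

Batching several changes into one body keeps the callback itself from becoming another source of per-port call volume.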
On Info Cache Updates (continued)
A refresh happens whenever nova-compute restarts
It also happens every heal_instance_info_cache_interval (default: 1 minute)
Each refresh currently makes 6 calls to Neutron per port
Set heal_instance_info_cache_interval=0 to disable the periodic refresh
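A rough worked estimate shows why these refreshes matter at our scale. The sketch below takes lower-bound figures from the scale slide (10,000 compute nodes, 100,000 instances, 2 ports per instance) and the 6 calls per port above, and assumes each compute node heals one instance per interval; that model and both function names are assumptions for illustration, not measurements.

```python
def heal_calls_per_second(compute_nodes: int, ports_per_instance: int,
                          calls_per_port: int = 6, interval_s: float = 60) -> float:
    """Steady-state Neutron calls per second from the periodic info cache heal.

    Assumed model: each compute node refreshes one instance per interval.
    """
    if interval_s <= 0:
        return 0.0  # heal_instance_info_cache_interval=0 disables the periodic heal
    return compute_nodes * ports_per_instance * calls_per_port / interval_s


def restart_burst_calls(instances: int, ports_per_instance: int,
                        calls_per_port: int = 6) -> int:
    """Neutron calls needed to refresh every instance once, e.g. after nova-compute restarts."""
    return instances * ports_per_instance * calls_per_port
```

Under these assumptions, `heal_calls_per_second(10_000, 2)` gives 2000.0 calls per second of background load, and refreshing every instance once, as a fleet-wide nova-compute restart would, costs `restart_burst_calls(100_000, 2)`, or 1,200,000 calls.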
nova-cells and Info Cache Updates
Child cells periodically sync with parent cells
The migration to Neutron exposed an upstream bug that had been corrected in the RPC network API, but not in the Neutron one
Cache updates were sent from child cells to the global cells faster than the global cells could process them
This delays other messages from being processed
rax.io/cellsgrowth
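The cells problem is a classic queue imbalance: whenever messages arrive faster than they are processed, the backlog grows every interval, and anything queued behind it waits longer and longer. A minimal sketch assuming a single FIFO queue at the global cell; the function names and the rates in the example are hypothetical, not measured values.

```python
def backlog_after(ticks: int, arrivals_per_tick: int, served_per_tick: int,
                  start: int = 0) -> int:
    """Messages still queued at the global cell after `ticks` steps."""
    backlog = start
    for _ in range(ticks):
        # The queue can drain to zero but never goes negative.
        backlog = max(0, backlog + arrivals_per_tick - served_per_tick)
    return backlog


def wait_ticks(backlog: int, served_per_tick: int) -> float:
    """Approximate wait for a new message queued behind `backlog` (FIFO assumed)."""
    return backlog / served_per_tick
```

With 120 cache updates arriving and 100 processed per tick, `backlog_after(10, 120, 100)` is 200, so an unrelated message enqueued then waits `wait_ticks(200, 100)`, or 2.0 ticks, and that delay keeps growing for as long as the imbalance lasts. At 80 arrivals per tick the backlog stays at 0.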
What’s needed
A callback system between Nova and Neutron
Read-only database slave usage
Cells support
Nova and Neutron: fewer calls that do more (e.g., 1 API call, many ports)
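As an example of "fewer calls that do more", the Neutron v2.0 list API accepts a repeated query parameter, so a single GET can fetch the ports of many instances at once instead of one GET per instance. A minimal sketch of building such a request URL; the endpoint and the helper name are hypothetical.

```python
from typing import Iterable
from urllib.parse import urlencode


def ports_query_url(endpoint: str, device_ids: Iterable[str]) -> str:
    """One GET for the ports of many instances instead of one GET per instance.

    Neutron list filters accept a repeated query parameter
    (device_id=a&device_id=b) and match any of the values.
    """
    query = urlencode({"device_id": list(device_ids)}, doseq=True)
    return f"{endpoint.rstrip('/')}/v2.0/ports?{query}"
```

`ports_query_url("http://neutron.example:9696/", ["a", "b"])` yields `http://neutron.example:9696/v2.0/ports?device_id=a&device_id=b`. With very large batches the URL can exceed server or proxy length limits, so a client would still split the IDs into chunks.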
What’s next
Publicly expose Neutron
Security Groups extension support through OVS flows
Possibly: using Wafflehaus to build an RPC adaptor for the Neutron API
Links
Patches and Blueprints
https://review.openstack.org/#/c/88484/ (Neutron, Nova and Cells)
https://blueprints.launchpad.net/neutron/+spec/nova-event-callback
https://review.openstack.org/#/c/57517/ (noauth python-neutronclient)
https://blueprints.launchpad.net/neutron/+spec/ovs-firewall-driver (OVS Firewall Driver)
Projects
https://github.com/rackerlabs/quark
https://github.com/roaet/wafflehaus