NANOG -1- Orbit1000 Technology Discussion
Opnix Smart Routing Technology Overview
"There is more than one way to skin a cat…"
Aaron D. Britt
Opnix, Inc.
Overview
- Orbit1000 CPE overview
- Probing method in more detail
- Orbit1000 CORE overview
- Things to come…
- Let's review: Q & A
Orbit1000 CPE High-Level Architecture
[Diagram: Orbit1000 CPE high-level architecture; surviving label: "ENCRYPTED"]
Functions of the Orbit1000 CPE
- Send probes (discovery and QA)
- Receive the BGP feed and set routes
- Communicate with the CORE:
  - Send raw probe data
  - Receive optimized routes
[Diagram labels: Orbit1000 CPE, Discovery Probes, QA Probes, Set BGP Routes, Internet, Customer Router(s), CORE, ENCRYPTED]
How We Become One with the Packet
UDP probes: a proactive philosophy using the patented ActiveScan
- We tried ICMP: routers drop ICMP, despite what the RFCs say.
- We tried TCP: it set off intrusion detection systems (IDS) all over the place.
- We tried the Force, but none of us had enough midi-chlorians.
- We now use a UDP probe that, though proprietary, is very similar to the probe of a typical traceroute.
- In testing, routing policy set from UDP probe data was within 2% of the routing policy set from TCP probe data, and UDP probes don't set off IDS!
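The ActiveScan probe format is proprietary, so here is a minimal sketch of the classic traceroute mechanics the slide compares it to, not the Opnix probe itself. It builds the probe schedule (TTL 1, 2, 3 and so on, with a fresh destination port per probe starting at the conventional traceroute port 33434) and interprets the ICMP reply each probe draws. No packets are sent; raw-socket handling is left out, and the constants and function names are assumptions for illustration.

```python
from dataclasses import dataclass

# Classic Unix traceroute conventions, used here as assumptions (not Opnix's proprietary values).
BASE_PORT = 33434  # first destination port; each probe uses the next port up
MAX_TTL = 30       # stop after this many hops

# ICMP (type, code) pairs that a traceroute-style UDP probe can trigger.
TIME_EXCEEDED = (11, 0)     # TTL hit zero in transit: an intermediate hop replied
PORT_UNREACHABLE = (3, 3)   # the destination host replied: nothing listens on that port


@dataclass(frozen=True)
class Probe:
    ttl: int
    dst_port: int


def probe_schedule(max_ttl=MAX_TTL, probes_per_hop=3, base_port=BASE_PORT):
    """Yield probes with increasing TTL, each on its own destination port.

    An ICMP error quotes the offending IP header plus the first 8 bytes of its
    payload (the UDP header), so the quoted destination port identifies the probe.
    """
    seq = 0
    for ttl in range(1, max_ttl + 1):
        for _ in range(probes_per_hop):
            yield Probe(ttl=ttl, dst_port=base_port + seq)
            seq += 1


def classify_reply(icmp_type, icmp_code):
    """Map the ICMP reply for one probe to what it tells us about the path."""
    if (icmp_type, icmp_code) == TIME_EXCEEDED:
        return "hop"
    if (icmp_type, icmp_code) == PORT_UNREACHABLE:
        return "destination"
    if icmp_type == 3:
        return "unreachable"  # other destination-unreachable codes: filtered, no route, etc.
    return "other"
```

A "hop" reply gives the latency to an intermediate router; the first "destination" reply ends the walk.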
Probing Mechanism
Where do we probe?
- A prefix list built from the prefixes important to each customer:
  - Top 500 trafficked sites, news groups, etc.
  - Route feed from the customer routers
  - Traffic flow data (NetFlow, SPAN port)
  - Logs (web, DNS, etc.)
- Capable of probing 110,000+ routes, but most of the time it doesn't make sense to.
- discovery.ignore and discovery.include lists
- 'Prefix + 1' methodology, unless a more specific IP address is specified in the configuration.
We probe multiple prefixes over multiple upstreams in parallel, and the amount is configurable: how much bandwidth do you want to spend on probes?
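A minimal sketch of target selection, assuming that 'Prefix + 1' means the first address after the prefix's network address, that discovery.include adds prefixes, and that discovery.ignore removes a prefix and anything inside it (winning over include). None of these semantics are spelled out on the slide, and `probe_targets`, `prefix_plus_one` and `probes_per_second` are hypothetical names. The last helper turns a bandwidth budget into a probe rate.

```python
import ipaddress


def prefix_plus_one(net):
    """Hypothetical reading of 'Prefix + 1': the first address after the network address."""
    if net.num_addresses == 1:  # a /32 or /128 has only its own address
        return net.network_address
    return net.network_address + 1


def probe_targets(prefixes, include=(), ignore=(), overrides=None):
    """Return {prefix: target address} for every prefix worth probing.

    prefixes:  candidates (top sites, customer route feed, flow data, logs, ...)
    include:   always probed; stands in for discovery.include (assumed semantics)
    ignore:    never probed, nor anything inside them; stands in for
               discovery.ignore (assumed to win over include)
    overrides: {prefix: address} when the config names a more specific address
    """
    ignored = [ipaddress.ip_network(p) for p in ignore]
    pinned = {ipaddress.ip_network(p): ipaddress.ip_address(a)
              for p, a in (overrides or {}).items()}
    targets = {}
    for p in [*prefixes, *include]:
        net = ipaddress.ip_network(p)
        # subnet_of() refuses mixed IPv4/IPv6 arguments, so compare versions first.
        if any(net.version == ig.version and net.subnet_of(ig) for ig in ignored):
            continue
        addr = pinned.get(net, prefix_plus_one(net))
        if addr not in net:
            raise ValueError(f"{addr} is not inside {net}")
        targets[net] = addr
    return targets


def probes_per_second(budget_kbps, probe_bytes):
    """How many probes per second fit in the bandwidth you are willing to spend."""
    return int(budget_kbps * 1000 // (probe_bytes * 8))
```

For example, `probe_targets(["192.0.2.0/24"])` maps that prefix to 192.0.2.1, while an override pins it to whatever host the configuration names.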
Metrics Gathered
OpScore: an algorithm over the probe data, weighted and calculated according to customer-defined settings
- Latency
- Unreliability
  - Link unreliability
  - Probe closure
  - Packet loss
  - Routing loops
- Bad hops
- Layer 3 hops
- Carrier preference
Lowest score wins.
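The real OpScore formula isn't on the slide. Here is a minimal sketch of the general idea, assuming a plain weighted sum of the metrics (with the four unreliability sub-metrics rolled into one 0-to-1 number) and carrier preference modeled as an additive penalty. The weights, field names and functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathMetrics:
    latency_ms: float
    unreliability: float  # hypothetical 0..1 roll-up of link unreliability, probe closure, loss, loops
    bad_hops: int
    l3_hops: int


# Hypothetical customer-defined weights: one point per ms, 500 points per unit of
# unreliability, and so on. The slide does not publish the real formula.
DEFAULT_WEIGHTS = {"latency_ms": 1.0, "unreliability": 500.0, "bad_hops": 20.0, "l3_hops": 2.0}


def op_score(metrics, weights=DEFAULT_WEIGHTS, carrier_penalty=0.0):
    """Weighted sum of the probe metrics plus a penalty for less-preferred carriers."""
    score = sum(w * getattr(metrics, name) for name, w in weights.items())
    return score + carrier_penalty


def best_carrier(by_carrier, carrier_penalties=None, weights=DEFAULT_WEIGHTS):
    """by_carrier: {carrier: PathMetrics} for one prefix. Lowest score wins."""
    penalties = carrier_penalties or {}
    scores = {c: op_score(m, weights, penalties.get(c, 0.0)) for c, m in by_carrier.items()}
    return min(scores, key=scores.get), scores
```

Treating carrier preference as a penalty keeps "lowest score wins" intact: a preferred carrier simply starts with fewer points.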
QA Process (Testing the Active Link)
- UDP-based (just like our discovery probes)
- We QA everything!
- We send the QA probe with a TTL chosen from where our discovery data places the endpoint.
- We check latency and unreliability against the probe data we used to set the route.
- How many QA routes do we send, and how fast?
  - The QA Limit is configurable, like the Carrier Limit in the client config, so you control how many routes we QA in parallel.
  - QA happens much faster than discovery.
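The slide says QA compares the live link against the probe data that set the route, with a configurable QA Limit on parallelism. Here is a minimal sketch of that check, assuming simple slack thresholds (25% extra latency and 2 points of extra loss, both made up for illustration) and loss standing in for unreliability. `qa_passes` and `qa_batches` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import islice


@dataclass
class Measurement:
    latency_ms: float
    loss: float  # fraction of probes lost, 0..1


def qa_passes(baseline, qa, latency_slack=0.25, loss_slack=0.02):
    """True if the active link still matches the probe data used to set the route.

    latency_slack: hypothetical; QA latency may run up to 25% above the baseline
    loss_slack:    hypothetical; QA loss may run up to 2 percentage points higher
    """
    return (qa.latency_ms <= baseline.latency_ms * (1 + latency_slack)
            and qa.loss <= baseline.loss + loss_slack)


def qa_batches(active_prefixes, qa_limit):
    """Yield groups of at most qa_limit routes; each group is QA'd in parallel."""
    if qa_limit < 1:
        raise ValueError("qa_limit must be at least 1")
    it = iter(active_prefixes)
    while batch := list(islice(it, qa_limit)):
        yield batch
```

A route that fails QA would go back to discovery; how Orbit1000 actually reacts to a failed QA isn't stated on the slide.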
Orbit1000 CORE
Five pieces:
- Balancer (communicates with the CPE)
- Optimizer (crunches numbers)
- View (keeps the latest and greatest views per CPE)
- SQL dB (stores stuff)
- Customer Portal (looks stuff up)
[Diagram labels: CPE, Balancer, Optimizer, View, SQL dB, Portal, Customer Portal, CORE]
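The slide names the CORE pieces but not their interfaces. Here is a minimal sketch of one plausible flow, under stated assumptions: the Balancer logs each CPE report to a store (a plain list standing in for the SQL dB), the Optimizer picks the lowest-scoring carrier per prefix, and the View remembers the latest routes per CPE so the Balancer only sends back what changed. All class and method names are hypothetical.

```python
from collections import defaultdict


class Optimizer:
    """Crunches numbers: picks the lowest-scoring carrier for each prefix."""

    def best_routes(self, scores):
        # scores: {prefix: {carrier: score}}
        return {prefix: min(by_carrier, key=by_carrier.get)
                for prefix, by_carrier in scores.items()}


class View:
    """Keeps the latest and greatest routing view per CPE."""

    def __init__(self):
        self._latest = defaultdict(dict)

    def update(self, cpe_id, routes):
        """Merge new routes and return only the ones that changed."""
        current = self._latest[cpe_id]
        changed = {p: c for p, c in routes.items() if current.get(p) != c}
        current.update(routes)
        return changed


class Balancer:
    """Talks to the CPEs: raw probe results come in, route changes go back out."""

    def __init__(self, optimizer, view, store):
        self.optimizer = optimizer
        self.view = view
        self.store = store  # stand-in for the SQL dB: anything with .append()

    def handle_report(self, cpe_id, scores):
        self.store.append((cpe_id, scores))
        best = self.optimizer.best_routes(scores)
        return self.view.update(cpe_id, best)
```

Sending only deltas keeps route churn on the customer router down when repeated reports agree.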
Data Access
Portal
- Access to data, raw and graphical (current and historical)
- All metrics and weights represented
- Access to each CPE's client config
- RouteVision (visualize over multiple paths)
- Aggregate summarizations
SQL dB
- Raw data
  - Transactional data (real time)
  - Warehoused data (portal)
  - Archival data
Fault Tolerance Stuff…
- If it goes up in smoke, the customer router reverts to standard BGP.
- Discovery probes halt if the CPE loses its connection to the CORE. If keep-alives fail for a set period of time, the product removes its routes and "sleeps" until communication with the CORE is re-established.
- The CPE config is stored in the central dB for fault-tolerance reasons.
- Heartbeat / failover process between CPEs
- SNMP traps as an early-warning system (RAM, hard disk, CPU, etc.)
- Always working on additional MIB support
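The halt, remove-routes, sleep sequence above is a small state machine. Here is a minimal sketch, assuming the CPE is told explicitly when the connection drops, polls a `tick()` with the current time, and waits for the CORE to push routes again after reconnecting. `CoreLink` and its methods are hypothetical names, and the timeout is whatever "a period of time" is configured to be.

```python
class CoreLink:
    """Hypothetical CPE-side watchdog for the connection to the CORE.

    States: "active":   connected, discovery probes running, routes installed
            "halted":   connection lost, discovery probes stopped, routes kept
            "sleeping": keep-alives missed for too long, routes withdrawn
    """

    def __init__(self, keepalive_timeout_s, now):
        self.timeout = keepalive_timeout_s
        self.last_keepalive = now
        self.state = "active"
        self.routes_installed = True

    @property
    def probing(self):
        return self.state == "active"

    def connection_lost(self):
        if self.state == "active":
            self.state = "halted"  # stop discovery probes right away

    def tick(self, now):
        """Call periodically; withdraws routes once keep-alives are overdue."""
        if self.state != "sleeping" and now - self.last_keepalive > self.timeout:
            self.state = "sleeping"
            # With the injected routes gone, the customer router falls back to standard BGP.
            self.routes_installed = False
        return self.state

    def keepalive(self, now):
        """A keep-alive from the CORE: communication is re-established."""
        self.last_keepalive = now
        self.state = "active"  # routes stay withdrawn until the CORE pushes them again

    def install_routes(self):
        self.routes_installed = True
```

Keeping routes in place during the "halted" window means a brief CORE hiccup doesn't flap the customer's routing.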
Things to Come…
- Probes that support jumbo frames (adjustable frame size)
- Dedicated jitter metrics
- Black-hole and routing-loop discovery/reports via the website
- TCP slow-start algorithm emulation
- TCP and/or UDP probes (pick your poison)
- TCP sniffing on active links (monitor actual data to replace QA)
- Multicast support
- IPv6 support
- Additional MIB support
- NEBS compliant (just kidding)
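The slide doesn't say how the planned jitter metric will be defined. One well-known definition it could borrow is RTP's interarrival jitter from RFC 3550, shown here applied to probe send/receive timestamps. This is an assumption for illustration, not the planned Orbit1000 metric, and `interarrival_jitter` is a hypothetical name.

```python
def interarrival_jitter(send_times, recv_times):
    """RFC 3550 (section 6.4.1) interarrival jitter over a train of probes.

    Times may be in any unit (the result uses the same unit). Only differences
    between consecutive probes are used, so a fixed clock offset between sender
    and receiver cancels out.
    """
    if len(send_times) != len(recv_times):
        raise ValueError("need one receive time per send time")
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D(i-1, i): change in one-way transit time between consecutive probes
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16  # exponential smoothing with gain 1/16
    return jitter
```

Because the clock offset cancels, this works on one-way probes without synchronized clocks, which suits a CPE probing far-end hosts it doesn't control only if those hosts timestamp replies; with plain traceroute-style replies you'd apply the same formula to round-trip times instead.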
Contact Information
If you have any questions, or would like to comment on and/or critique this method of 'cat skinning' (I would love for some hecklers to drop me a line; without peer review, no progress is possible), here is my contact info…
Case studies available today…
- Tier 1 ISP
- Fortune 5 enterprise
- Fortune 100 financial institution
- Internet2/Abilene deployment
[Figure: Layer 3 hops vs. latency, 30-day summary]
[Figure: How many hops away are prefixes?]
Other Questions to Ask…
- Is there a direct correlation between hops and latency? Hop count seems anecdotal, yet the numbers are quite convincing…
- How accurately do UDP measurements track TCP measurements of latency, packet loss, and throughput?
- How big a part does asymmetric routing play in suboptimal routing?
- NetFlow stats suggest that, on average, routers forward packets to only 10% or so of the global RIB, yet our routing tables are ten or more times that size. It seems we can do something here; I just don't know what, yet…