Some Observations on Network Failures
NANOG 15
Craig Labovitz
Observations
Goal: Model Internet topological changes
Lots of strange BGP routing
Strange BGP routing went away
What causes the remaining BGP topological and policy changes?
  –Not just count flaps, but study how routing tables change over extended periods
  –Not end-to-end
Internet Failures Analysis
Look at default-free BGP announcements from multiple large providers
  –Long-lived (60% of 9 months)
  –Consider stable if covered by less-specific prefixes
  –15-minute filter window
  –Mean time to failure, mean time to repair, and availability
Case study of a regional network
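The failure metrics listed above can be sketched in a few lines. This is a minimal illustration, not the analysis code behind the talk. It assumes each route's history has already been reduced to a list of (start, end) outage intervals, and it assumes the 15-minute filter window means that an outage starting within 15 minutes of the previous outage's end is merged into it (the gap then counts as downtime). The names `merge_failures` and `route_stats` are hypothetical.

```python
from datetime import datetime, timedelta

# 15 minutes comes from the slide; how the window is applied is an assumption.
FILTER_WINDOW = timedelta(minutes=15)


def merge_failures(outages, window=FILTER_WINDOW):
    """Merge outages that start within `window` of the previous outage's end."""
    merged = []
    for start, end in sorted(outages):
        if merged and start - merged[-1][1] <= window:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def route_stats(outages, observed_start, observed_end):
    """Return (mttf, mttr, availability) for one route over the observation period."""
    failures = merge_failures(outages)
    total = observed_end - observed_start
    if not failures:
        return None, None, 1.0
    down = sum((end - start for start, end in failures), timedelta())
    up = total - down
    mttf = up / len(failures)    # mean uptime per failure
    mttr = down / len(failures)  # mean outage duration
    return mttf, mttr, up / total
```

For example, two ten-minute outages ten minutes apart are counted as one thirty-minute failure under this reading, which lowers the failure count and raises the mean repair time compared with counting every transition.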
What We Did
Lots of probe machines
  –MAE-East, MAE-West, PAIX, PacBell, AADS
A default-free collector at UM
  –Routeviews
Multi-hop EBGP
6 providers
US, Canada, Europe and Japan (300,000 routes)
Case study of regional backbone (OSPF, IBGP/BGP)
42 gigabytes and four years of logged routing packets
RouteTracker
Peer with ISP routers
Log all routing packets to disk
Maintain statistics
MTBF
Route Fail-Over
MTTR
Availability
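The slides do not state the formula used for availability. As a reference point, the conventional steady-state relation between the three quantities is shown below, assuming MTTF denotes the mean uptime between failures and MTTR the mean outage duration:

```latex
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
```

Under this definition, a route with a mean uptime of 23 hours and a mean repair time of 1 hour has an availability of 23/24, or about 95.8%.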
Default-Free Route Availability
Backbone MTR
Network Failures
Michnet Backbone Failures 11/ /98
Observations
Internet significantly less available than PSTN (99.99%+)
Low mean time to change
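To put the PSTN figure in perspective, an availability percentage can be converted into allowed downtime per year. The sketch below is plain arithmetic. The function name is hypothetical and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability):
    """Convert an availability fraction (e.g. 0.9999) to minutes of downtime per year."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# 99.99% availability allows roughly 52.6 minutes of downtime per year.
print(round(downtime_minutes_per_year(0.9999), 2))
```

By the same arithmetic, 99% availability corresponds to about 3.65 days of downtime per year.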
Next Steps
Host other routeviews machines?
  –Merit has several FreeBSD desktop boxes
Looking for peers…