
An Annotation Layer for Network Management. George Porter, Arne Baste, David Chu, Dilip Joseph, Randy H. Katz. NetRads Retreat, June 2005.


1 An Annotation Layer for Network Management. George Porter, Arne Baste, David Chu, Dilip Joseph, Randy H. Katz. NetRads Retreat, June 2005

2 Goal of today’s talk: a snapshot of our thinking in this area. We raise several open research problems, such as the appropriateness of piggybacking and the effectiveness of distributed observation. Your feedback is appreciated.

3 Outline: motivating example (discovering and protecting network service performance during stress); PNEs as an A-Layer building block; overview of the annotation layer as a provider of component building blocks for network management; the network service example revisited with the A-Layer; research challenges, open issues, and opportunities.

4 Outline: motivating example (discovering and protecting network service performance); PNEs as an A-Layer building block; overview of the annotation layer as a provider of component building blocks for network management; the network service example revisited with the A-Layer; research challenges, open issues, and opportunities.

5 Motivating example: network service slowdown/failure. Problem: users in the access tier complain of slow web access, failed file mounts, and “DNS operation timed out” messages. The problem started today at 10am. Where to begin? Network connectivity between users and the outside seems OK, but name resolution is intermittent and slow. We need tools to figure out who is affected, who isn’t, the cause, and a solution. [Figure: network diagram with distribution tier, client, routers (R), iBoxes (IC, IS), and server tier with DNS, Web, NFS, and FTP servers]

6 Motivating example: network service slowdown/failure. Network connectivity to DNS? [ping, traceroute] Are DNS requests making it to the server tier? What is happening to the request completion rate (is it lower), versus network path losses (i.e., is it the path or the service)? DNS server CPU load is up. Localize the problem: only this user, or other clients? Just that server? What is happening to the DNS request/reply completion rate of the other servers in that cluster? Correlations? Is this user anomalous? So far: the DNS server is overloaded, leading to timeouts on the client end. [Figure: network diagram as on slide 5]

7 Why is the service overloaded? Is there an unusual number of requests from other sources? [deviation from the mean] What is the status of requests to this service network-wide? How has it changed since before the first reports of the problem? We discover that the number of DNS requests from the access and ISP networks is unchanged, so the extra load must originate in the server tier. Other correlations? Yes: to SMTP traffic at the ISP ingress. We suspect the endpoint of the SMTP traffic, a spam appliance, as the cause of the DNS performance loss: there are no unusual surges of DNS from the access or ISP networks (from outside our enterprise network), so the load originates inside the server tier, and it is correlated to SMTP traffic. [Figure: network diagram as on slide 5, with an additional iBox (I) and SMTP traffic]
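The “[deviation from the mean]” check can be illustrated as a simple z-score test on a per-interval request count. This is a minimal sketch, not the authors' method: the function names, the 3-sigma threshold, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev


def deviation_score(history, current):
    """Number of sample standard deviations `current` lies from the mean of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Flat history: any change at all is infinitely unusual.
        if current == mu:
            return 0.0
        return float("inf") if current > mu else float("-inf")
    return (current - mu) / sigma


def is_unusual(history, current, threshold=3.0):
    """Flag `current` if it deviates from the historical mean by more than `threshold` sigmas."""
    return abs(deviation_score(history, current)) > threshold


# Hypothetical DNS requests/minute from one source network before 10am.
baseline = [100, 104, 98, 102, 96]
print(is_unusual(baseline, 103))  # False
print(is_unusual(baseline, 160))  # True
```

A z-score assumes the history is roughly stationary; traffic with strong daily cycles would need a baseline taken from the same time of day.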

8 Eliminate false positives: test this conjecture via experimental intervention. Temporarily bandwidth-throttle SMTP traffic from the ISP ingress, then test DNS latency from the access network: DNS latency goes down when SMTP volume goes down. We then enact a new (but temporary) policy: redirect requests from the access tier to a secondary or tertiary DNS server (service separation for different users), and bandwidth-regulate SMTP traffic to keep the DNS server's CPU load from peaking. Access users’ service is restored; their traffic is protected. The problem is localized and mitigated. Long-term solution: software upgrade, firmware upgrade, or adding a dedicated DNS cache for the appliance. [Figure: network diagram as on slide 7]

9 Example review. Localizing and identifying the problem required: network-wide visibility despite stressed links/servers; path information (network connectivity, protocol request/reply completion information); finding changes in behavior (average number of requests per unit time, rate of change of traffic); finding correlations between traffic (traffic classes, volume, network-level paths); experimental intervention (from correlation to causation); and enabling new policy (redirecting traffic to a secondary server, bandwidth throttling/fencing of misbehaving flows).

10 Principles for network management. Network-wide visibility despite surges, overload, and high loss rates; low overhead; path statistics gathering; some protocol visibility (TCP, IP, and services like DNS and NFS). Need to discover: changes to request/reply rate, completions, and latency over time; correlations between different flows, protocols, and parts of the network. New policies (actions): for experimental intervention (root-cause discovery) and to protect good traffic (bandwidth shaping, blocking, scheduling, fencing, selective drop). Security: against non-operators using this infrastructure, and against DoS attacks.

11 Outline: motivating example (discovering and protecting network service performance); PNEs as an A-Layer building block; overview of the annotation layer as a provider of component building blocks for network management; the network service example revisited with the A-Layer; research challenges, open issues, and opportunities.

12 PNEs (Programmable Network Elements) and iBoxes. Inspection-and-action points: deep, multiprotocol packet inspection; no routing, just observation and marking. Actions: selective drop, bandwidth fencing and shaping, notification of operators, querying “points of observation”. Some protocol visibility into TCP, UDP, and ‘good’ network service protocols like DNS/NFS. Per-flow session state and reverse-path visibility. Per-flow and per-path simple statistics gathering (latencies, round-trip times, requests/sec, source and destination addresses). [Figure: iBox]

13 Annotation Layer. An explicit layer for iBox-to-iBox communication via packet annotations. Annotations are fixed size; encoded to enable de-annotation of packets; carry multiple payload types based on any layer of the flow; and include a security field for authentication. [Figure: example annotation labeled “iBox url: X”]

14 A-Layer annotation design. Annotations are encoded between the IP and transport headers. Annotations can be stacked (multiple per packet). iBoxes remove annotations before packets reach endhosts. Motivation: start with a large (but versatile) annotation format; once we discover the set of annotations most effective for network management, we can reduce the footprint to support that set.
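Slides 13 and 14 fix the annotation properties (fixed size, removable, typed payload, security field, stackable, between IP and transport) but not a wire format. The sketch below is one hypothetical encoding that satisfies them: each header chains to the next via a next-protocol byte, in the style of IPv6 extension headers. The field sizes, the truncated HMAC-SHA256 tag, and the use of IP protocol 253 (reserved by IANA for experimentation) are all assumptions.

```python
import hashlib
import hmac
import struct

# Assumptions (not from the slides): field sizes, field order, and the use of
# IP protocol number 253 (reserved by IANA for experimentation) to mark an annotation.
ANNOTATION_PROTO = 253
PAYLOAD_LEN = 16   # fixed payload size, in bytes
TAG_LEN = 8        # truncated HMAC-SHA256 serves as the security field
HEADER_FMT = f"!BBB{PAYLOAD_LEN}s{TAG_LEN}s"  # next proto, type, length, payload, tag
HEADER_LEN = struct.calcsize(HEADER_FMT)


def _tag(key, next_proto, ptype, length, payload):
    msg = struct.pack(f"!BBB{PAYLOAD_LEN}s", next_proto, ptype, length, payload)
    return hmac.new(key, msg, hashlib.sha256).digest()[:TAG_LEN]


def annotate(key, ip_proto, transport, annotations):
    """Stack (type, payload) annotations between the IP header and `transport`.

    Returns the new IP protocol value and the new IP payload."""
    body, next_proto = transport, ip_proto
    for ptype, payload in reversed(annotations):  # build the innermost header first
        if len(payload) > PAYLOAD_LEN:
            raise ValueError("payload exceeds the fixed annotation size")
        padded = payload.ljust(PAYLOAD_LEN, b"\x00")
        tag = _tag(key, next_proto, ptype, len(payload), padded)
        body = struct.pack(HEADER_FMT, next_proto, ptype, len(payload), padded, tag) + body
        next_proto = ANNOTATION_PROTO
    return next_proto, body


def deannotate(key, ip_proto, ip_payload):
    """Verify and strip every annotation, as an iBox would before the endhost."""
    found = []
    while ip_proto == ANNOTATION_PROTO:
        next_proto, ptype, length, padded, tag = struct.unpack_from(HEADER_FMT, ip_payload)
        if not hmac.compare_digest(tag, _tag(key, next_proto, ptype, length, padded)):
            raise ValueError("annotation failed authentication")
        found.append((ptype, padded[:length]))
        ip_payload = ip_payload[HEADER_LEN:]
        ip_proto = next_proto
    return ip_proto, ip_payload, found


key = b"shared-ibox-key"  # hypothetical key shared by cooperating iBoxes
proto, wire = annotate(key, 6, b"TCP segment", [(1, b"src=ibox-A"), (2, b"dns-lat=80ms")])
print(proto, len(wire))  # 253 65
```

Because every header is the same size and names what follows it, an iBox can strip the whole stack in one pass and restore the original protocol number, which is what keeps annotations invisible to endhosts.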

15 Categories of annotations [table not recoverable; surviving entries: NetFlow, Alteon, Packeteer, SNMP proxy]

16 iBox placement. In an enterprise network, iBoxes sit at points of hierarchical division. [Figure: enterprise network with a distribution tier; iBoxes at the Internet edge (I), access edge (IA), and server edge (IS); server tier with a spam appliance, primary and secondary DNS servers, and a mail server; hosts 10.0.0.1 to 10.0.0.100 and 10.0.0.101 to 10.0.0.255] These locations give iBoxes the ability to monitor and classify the traffic flowing through them. iBoxes can also slow down, block, fence, and drop traffic to ease surges and protect “good” traffic from bad/ugly traffic.

17 Routing to other iBoxes. Once we know which iBoxes exist, we need to know how to reach them so we can send them annotations. This requires building up a table at each iBox, which is topology dependent. If a packet’s destination address doesn’t match an iBox in this table, we remove all annotations to ensure endhost correctness. [Figure legend: “core” iBoxes, “edge” iBoxes]
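The forwarding rule on slide 17 can be sketched as a table lookup followed by an all-or-nothing strip. The slides say only that the table is topology dependent; the longest-prefix match, the class and function names, and the prefixes below are hypothetical.

```python
import ipaddress


class IBoxTable:
    """Hypothetical per-iBox table mapping destination prefixes to downstream iBoxes."""

    def __init__(self):
        self._routes = []  # list of (ip_network, ibox_name)

    def add(self, prefix, ibox):
        self._routes.append((ipaddress.ip_network(prefix), ibox))

    def lookup(self, dst):
        """Longest-prefix match; returns None when no iBox sits toward `dst`."""
        addr = ipaddress.ip_address(dst)
        matches = [(net, ibox) for net, ibox in self._routes if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]


def forward(table, dst, annotations):
    """Return (next iBox, annotations to keep); strip everything if no iBox is ahead."""
    ibox = table.lookup(dst)
    if ibox is None:
        return None, []  # packet would reach an endhost with annotations: remove them all
    return ibox, annotations


table = IBoxTable()
table.add("10.1.0.0/16", "access-edge")
table.add("10.2.0.0/16", "server-edge")
table.add("10.2.5.0/24", "dns-ibox")
print(forward(table, "10.2.5.9", ["lat"]))   # ('dns-ibox', ['lat'])
print(forward(table, "192.0.2.1", ["lat"]))  # (None, [])
```

Defaulting to "strip" on a miss means a stale or incomplete table costs only lost annotations, never endhost correctness.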

18 Active vs. passive annotations. When should an iBox send “active” annotations (i.e., a separate packet) rather than passively annotate existing packets? Passive annotation is available during high traffic; active annotation is expedient. Associate a timer with each annotation queue. When a packet arrives and an annotation is dequeued, reset the timer. If the timer goes off, generate a new dummy packet, annotate it, send it to the right destination iBox, and reset the timer. [Figure: nodes A, B, C, D, E]
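The timer rule on slide 18 can be simulated with an explicit clock instead of real timers. In this minimal sketch, one queue exists per destination iBox, each data packet carries at most one annotation, and packets are plain dicts. All of these choices, and the class and field names, are hypothetical.

```python
from collections import deque


class AnnotationQueue:
    """Hypothetical queue of annotations waiting to reach one destination iBox."""

    def __init__(self, dest, timeout):
        self.dest = dest
        self.timeout = timeout   # seconds to wait for a packet to piggyback on
        self.pending = deque()
        self.deadline = None     # timer expiry, or None when the queue is empty

    def enqueue(self, annotation, now):
        self.pending.append(annotation)
        if self.deadline is None:
            self.deadline = now + self.timeout

    def on_packet(self, packet, now):
        """Passive path: a data packet toward `dest` carries one pending annotation."""
        if self.pending:
            packet.setdefault("annotations", []).append(self.pending.popleft())
            self._reset(now)  # an annotation was dequeued, so restart the timer
        return packet

    def on_timer(self, now):
        """Active path: on expiry, send a dummy packet carrying one annotation."""
        if self.deadline is None or now < self.deadline:
            return None
        dummy = {"dst": self.dest, "dummy": True,
                 "annotations": [self.pending.popleft()]}
        self._reset(now)
        return dummy

    def _reset(self, now):
        self.deadline = now + self.timeout if self.pending else None


q = AnnotationQueue("iBox-D", timeout=1.0)
q.enqueue("dns-latency=80ms", now=0.0)
q.enqueue("smtp-rate=rising", now=0.2)
pkt = q.on_packet({"dst": "iBox-D"}, now=0.5)  # piggybacks the first annotation
print(pkt["annotations"], q.deadline)           # ['dns-latency=80ms'] 1.5
print(q.on_timer(now=1.5))  # {'dst': 'iBox-D', 'dummy': True, 'annotations': ['smtp-rate=rising']}
```

The timeout sets the trade-off: under heavy traffic the timer rarely fires and annotations ride for free, while on an idle path the delay is bounded by one timeout.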

19 Outline: motivating example (discovering and protecting network service performance); PNEs as an A-Layer building block; overview of the annotation layer as a provider of component building blocks for network management; the network service example revisited with the A-Layer; research challenges, open issues, and opportunities.

20 A-Layer as component building blocks for observe-analyse-act. Observe: path statistics; request/reply completion rate and latency; new connection rate; connection age; protocol types/mixtures; and their change over time. Analyse: correlations; means changing over time (chi-square); PCA; experimental intervention (act, then observe). Act: bandwidth throttling, selective drop, packet scheduling, bandwidth fencing.
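The “chi-sq” entry under Analyse can be illustrated with Pearson's goodness-of-fit statistic, here applied to the protocol mixture an iBox observes (one of the Observe items). This is a sketch under assumptions: the slides do not say which quantities are tested, and the counts and function name are hypothetical. The 5.991 threshold is the standard 5% critical value for 2 degrees of freedom.

```python
def chi_square_shift(baseline_counts, current_counts):
    """Pearson chi-square statistic for "has the traffic mix moved away from baseline?".

    Returns (statistic, degrees_of_freedom). Categories absent from the baseline are
    skipped here; a real detector would report them separately as new traffic."""
    base_total = sum(baseline_counts.values())
    cur_total = sum(current_counts.values())
    if base_total == 0 or cur_total == 0:
        raise ValueError("both intervals need at least one observation")
    stat, categories = 0.0, 0
    for proto, base in baseline_counts.items():
        if base == 0:
            continue
        expected = cur_total * base / base_total  # count expected if the mix were unchanged
        observed = current_counts.get(proto, 0)
        stat += (observed - expected) ** 2 / expected
        categories += 1
    return stat, categories - 1


# Hypothetical protocol counts per interval seen at one iBox.
baseline = {"dns": 500, "smtp": 300, "web": 200}
stat, df = chi_square_shift(baseline, {"dns": 40, "smtp": 45, "web": 15})
print(stat, df)      # 10.75 2
print(stat > 5.991)  # True: the mix has shifted toward SMTP
```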

21 Why distributed observe-analyse-act? Centralized: more control; consistent information (but it could be out of date); centralized policy (no need to cast policy over multiple nodes). Distributed: quick distribution of information; information is needed throughout the network; works during network partitions and provides visibility during surges, when it is hard to get packets through; up-to-date information, but it might be inconsistent. However, consistency is hard, distribution could start bad feedback loops, and a leader must be elected. Distributed routing is preferred over a centralized approach; similar motivation applies to iBoxes/A-Layer.

22 Outline: motivating example (discovering and protecting network service performance); PNEs as an A-Layer building block; overview of the annotation layer as a provider of component building blocks for network management; the network service example revisited with the A-Layer; research challenges, open issues, and opportunities.

23 Path-oriented connectivity and reachability. Network service monitoring: Are requests getting through? What is their rate? What has been happening to DNS latency? Where are the “DNS hotspots”? iBoxes can store characteristics of paths through the network: types of protocols they see, protocol volume, rate of change of traffic, distribution of source/destination addresses seen, network errors, and topology information. NetFlow, by contrast, gathers statistics at a single point. Extract and share reports from this information: annotate packets with an iBox source annotation to expose inside-vs.-outside origin and the paths chosen and taken; annotate packets with service reachability reports, link conditions, traffic rates, and changes in traffic rates; annotate packets with protocol reports that represent the mixture of protocols seen at various points throughout the network. [Figure: network diagram as on slide 8]
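The per-path statistics behind a service reachability report can be sketched as a request/reply matcher. This assumes the iBox can pair DNS replies with requests, here by a hypothetical (client address, query ID) key, and keeps one tracker per (ingress iBox, egress iBox, service). The class, keys, and timings are all illustrative.

```python
from collections import defaultdict


class ServiceStats:
    """Hypothetical request/reply tracker for one service along one iBox-to-iBox path."""

    def __init__(self):
        self.outstanding = {}  # request key -> time the request was seen
        self.requests = 0
        self.completed = 0
        self.total_latency = 0.0

    def on_request(self, key, t):
        self.outstanding[key] = t
        self.requests += 1

    def on_reply(self, key, t):
        sent = self.outstanding.pop(key, None)
        if sent is None:
            return  # reply with no matching request (e.g. a duplicate); ignore
        self.completed += 1
        self.total_latency += t - sent

    def report(self):
        """Summary an iBox might place in a service-reachability annotation."""
        rate = self.completed / self.requests if self.requests else None
        latency = self.total_latency / self.completed if self.completed else None
        return {"requests": self.requests, "completion_rate": rate, "mean_latency": latency}


# Keyed by (ingress iBox, egress iBox, service); DNS requests keyed by (client, query id).
paths = defaultdict(ServiceStats)
dns = paths[("access-edge", "server-edge", "dns")]
for qid, t in [(1, 0), (2, 1), (3, 2), (4, 3)]:  # times in ms
    dns.on_request(("10.1.0.7", qid), t)
for qid, t in [(1, 10), (2, 31), (3, 22)]:       # query 4 is never answered
    dns.on_reply(("10.1.0.7", qid), t)
print(dns.report())  # {'requests': 4, 'completion_rate': 0.75, 'mean_latency': 20.0}
```

A deployed version would also need to expire old entries in `outstanding`, since unanswered requests (exactly the ones that matter during overload) otherwise accumulate without bound.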

24 Relationships between traffic classes, correlations, and anomalies. Discovering anomalies: iBoxes consuming annotations from other parts of the network need to be able to discover when good services lose performance. The SLT problem of anomaly detection is made easier with more information and visibility. Network data is stored in vector form for rate, quantity, and time domain. Discovering correlations: for good services that are degrading, finding correlations to anomalous traffic surges, flash traffic, etc. provides hints to the cause of the problem. Each iBox representing affected traffic needs annotations containing network-wide events that capture changes in traffic patterns. Open question: should the “analysis” components of observe-analyze-act be done from multiple network vantage points, or centrally? [Figure: network diagram as on slide 8]
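Finding correlations between a degrading service and other traffic classes can be sketched with the Pearson coefficient over aligned per-interval time series, such as those carried in annotations. The slides do not name a correlation measure; Pearson, the 0.8 threshold, the function names, and the sample series are hypothetical choices.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length time series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation signal
    return sxy / sqrt(sxx * syy)


def correlated_classes(degrading, others, threshold=0.8):
    """Traffic classes whose series track the degrading service, strongest first."""
    scores = {name: pearson(degrading, series) for name, series in others.items()}
    hits = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(hits, key=lambda name: -abs(scores[name]))


# Hypothetical per-interval samples gathered from annotations around 10am.
dns_latency_ms = [20, 21, 20, 45, 60, 58]
others = {
    "smtp_pkts": [100, 110, 105, 400, 600, 580],
    "web_pkts": [50, 52, 49, 51, 50, 52],
}
print(correlated_classes(dns_latency_ms, others))  # ['smtp_pkts']
```

A high coefficient only nominates a suspect; as in the motivating example, experimental intervention is still needed to move from correlation to causation.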

25 Experimental intervention and protection of good traffic via policy actions. Experimental intervention: control annotations are sent to the iBox near the source of the surge to temporarily throttle it; annotations are routed to the iBox at the ISP ingress to invoke a new policy. The policy in the annotation relies on the iBox actions of bandwidth shaping, fencing, and TCP ACK manipulation to reduce the SMTP flow rate. Protection of good traffic: the policy could include network-level redirection to channel good DNS requests from access networks to a secondary, backup DNS service, and marking of traffic not affiliated with the surge for protection elsewhere in the network, closer to the service location. [Figure: network diagram as on slide 8]
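The slides list bandwidth shaping as an iBox action without naming a mechanism. A token bucket is one common way to cap a traffic class's rate, so the sketch below uses it as an assumed stand-in. The policy dictionary, class names, and rates are hypothetical, and this version polices (drops excess packets) rather than queueing them as a true shaper would.

```python
class TokenBucket:
    """Token bucket: sustains `rate` bytes/s and allows bursts up to `burst` bytes."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = now

    def allow(self, size, now):
        # Refill for the time elapsed since the last decision, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # over the configured rate: this sketch drops rather than delays


def apply_policy(policy, traffic_class, size, now):
    """Hypothetical iBox hook: classes named in the policy are limited, others pass."""
    bucket = policy.get(traffic_class)
    return True if bucket is None else bucket.allow(size, now)


# Temporary policy from a control annotation: hold SMTP to ~1 kB/s at the ISP ingress.
policy = {"smtp": TokenBucket(rate=1000, burst=1500)}
decisions = [apply_policy(policy, "smtp", 1500, t) for t in (0.0, 0.5, 1.5)]
print(decisions)                               # [True, False, True]
print(apply_policy(policy, "dns", 1500, 1.6))  # True: DNS is not limited
```

Removing the `"smtp"` entry from the policy ends the intervention, which matches the temporary, experimental nature of the action on slide 8.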

26 Outline: motivating example (discovering and protecting network service performance); PNEs as an A-Layer building block; overview of the annotation layer as a provider of component building blocks for network management; the network service example revisited with the A-Layer; research challenges, open issues, and opportunities.

27 Policy expression and deployment. When correlations are discovered, what should we do with them? Initial efforts aim to provide an observation platform for visualizing network state, with the A-Layer/iBoxes as building blocks for operator interaction.

28 “Above the network” services. Right now we envision iBoxes understanding well-known network services. It is an open question whether iBoxes can gain visibility into higher-level applications like web services and enterprise-specific apps, which would require new policy complexity, new correlations, and new state management.

29 Statistical visualization for operators. Open problem: aggregating distributed observations into a coherent visualization for operators. Where does the visualization reside? Which metrics, correlations, and deviations from the mean are relevant? How do actions relate to the visualization?

30 SLT analysis. Choice of algorithm; finding “interesting” correlations without being overloaded by too many correlations and events; deviation from the mean; finding patterns; what is normal operation for a protocol?

31 Managing distributed actions. Managing feedback loops; providing coherent actions at the global scale based on iBoxes distributed throughout the network; coordinating actions despite network surges, limited network access, path losses, etc.

32 Q & A

33 Q: What about the end-to-end (e2e) argument? Adding/removing annotations: annotations are easy to remove, and packet paths are not modified. Actions such as throttling, scheduling, and dropping: Con: they affect traffic in ways endhosts can detect. Pro: they provide a “library” of components to enable new network services and management features, which is how we build software. The A-Layer gives enterprise operators control over their networks, as long as their applications are supported and work. Enterprise networks usually have a white list of allowed apps, with all others disallowed. Contrast this with ISPs.

34 Q: What about per-flow state management? Some routers can keep per-flow state (NetFlow). iBoxes can sample traffic. iBoxes are not in the correctness path, so they can act as no-ops (‘nops’). Network traffic is parallelizable; we are targeting 1 GigE. iBoxes can be merged into expandable network devices (see Cisco’s server cards that plug into routers).

35 Q: What about e2e security (IPsec)? End-to-end security obscures the protocol, but not path statistics. It is conceivable to discover request/response phases, infer the completion rate, and keep statistics on the number of connections and flow rates. One can statistically infer when a flow is starved for bandwidth, observe bandwidth over time, and correlate it with the function of the destination/source (web server, mail server, etc.). Correlations still work over encrypted traffic, and we can still perform experiments by affecting flow X and observing flow Y.

36 Q: Why annotate? (Why not send separate packets?) Annotations are about path characteristics: they can bind to the flow they describe, so statistics follow the paths where they are most relevant. This marries per-path context with each packet of a particular flow, giving iBoxes the information they need to throttle, fence, etc. As the packet flow rate increases, there is more opportunity for visibility by piggybacking, with lower overhead during times of stress (fewer large packets may be preferable to more small packets). Explicitly sending separate packets is still OK, especially for discovery, control, and policy distribution.

37 Q: Why distributed? Centralized statistics gathering is easy in enterprise networks, but hard during times of stress, traffic spikes, and flash traffic. Information might be needed in more than one place, and “act” operations that protect good traffic need timely information (contrast this with the 5-minute averages common in SNMP). Distribution raises difficulties, though: election protocols, distributed consensus, negative feedback loops, and management of iBoxes. Let’s experiment and see: the relative benefit of distributed vs. centralized network observation, analysis, and action/actuation is an open research question.

