MOZART: Temporal Coordination of Measurement (SOSR '16)

Presentation transcript:

MOZART: Temporal Coordination of Measurement (SOSR '16). Xuemei Liu, Meral Shirazipour, Minlan Yu, Ying Zhang. Thanks for the introduction. Today I am going to show our work … This is a collaboration among the University of Southern California, Ericsson, and HP Labs.

Measurement in data centers. Motivating examples of measurement: Fault diagnosis: capture root causes of failures. Traffic engineering: capture statistics for big flows. Attack detection: capture signatures of attacks. Essence of measurement: capture data related to events. Measurement is important in data centers; here are some examples. The first purpose is fault diagnosis. If the event of … happens, we need to collect enough logs to diagnose the root cause. The second purpose is traffic engineering. If traffic is unevenly distributed in the data center, we need to collect the flow volume on all paths to analyze which flows are not distributed evenly. The third purpose is attack detection. If an attack happens in the network, we need to find out which servers are compromised and attacking other servers. The essence of measurement is, in fact, capturing data related to events.

Different views/abilities of devices. Hosts: View: per source/destination traffic. Abilities: end-to-end loss, latency, etc. Switches: View: per-link traffic. Abilities: per-link volume, latency, etc. We observe that different …. The hosts … the switches …

No-coordination of measurement. Controller. Too much reporting overhead. Limited resources may be used by flows not related to the event. Due to different view…, the controller needs to collect all flow data from all devices, aggregate the data, analyze which events are happening in the network, and find the root cause of the events. This is the traditional way of doing measurement in data centers. It requires no coordination between devices, but measurement without coordination has two big problems. The first problem is that …, as devices need to … The second problem is that …, as the devices need to report all … In order to solve these two problems, we propose …. I will go through some examples that compare no-coordination measurement with temporal-coordination measurement, and show how temporal coordination solves these two problems … We propose temporal coordination of measurement.

Example1 – loss detection. Packet loss affects performance. Operators want to locate the loss. Measure & report flow volume of all flows. Measure & report loss of all flows. The first example is single-path loss. Packet loss affects … Operators … [only talk about loss in switches] In this example, there is traffic through one path, and suppose high loss is happening along this path. Today, without coordination, the sender needs to … switches … This is really bad, as the memory in switches is small, yet each switch needs to measure the volume of all flows passing through it. S0 S1 S2 Traffic flow No-coordination

Example1 - loss detection Packet loss affects performance. Operators want to locate the loss. Measure & report flow volume of only lossy flows Detect high loss for some flows However, if the sender can detect which flows are suffering high loss, and send … to the switches. The switches …. Temporal coordination happens between the sender and switches, and the sender … with switches. [no much benefits, leave to behind] S0 S1 S2 Traffic flow Selected flows Sender needs to coordinate the lossy flows with switches. Coordination
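The loss-detection coordination described on this slide can be sketched roughly as follows. This is a minimal illustration, not the MOZART implementation: the names `select_lossy_flows` and `SwitchMonitor`, the 5% loss threshold, and the use of sent/acknowledged packet counts to estimate loss are all assumptions made for the example.

```python
def select_lossy_flows(sent, acked, loss_threshold=0.05):
    """Sender-side selector (hypothetical): pick flows whose loss rate is high.

    sent and acked map flow_id -> packet counts observed at the sender.
    """
    selected = set()
    for flow_id, n_sent in sent.items():
        if n_sent == 0:
            continue
        loss_rate = 1 - acked.get(flow_id, 0) / n_sent
        if loss_rate > loss_threshold:
            selected.add(flow_id)
    return selected


class SwitchMonitor:
    """Switch-side monitor (hypothetical): counts volume only for selected flows."""

    def __init__(self):
        self.selected = set()
        self.volume = {}

    def update_selected(self, flow_ids):
        # Called when the sender coordinates its lossy flows with this switch.
        self.selected |= set(flow_ids)

    def on_packet(self, flow_id, size):
        # Flows that were not selected consume no monitor memory.
        if flow_id in self.selected:
            self.volume[flow_id] = self.volume.get(flow_id, 0) + size

    def report(self):
        return dict(self.volume)
```

In this sketch the switch reports only the flows the sender flagged, which is the source of the reduced memory use and reporting overhead mentioned in the talk.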

Example2 - port scan. Compromised servers detect vulnerable servers. Count & report number of destinations for all senders. The second example is … Suppose there is a compromised …, it is trying to attack other servers by accessing random ports. In order to detect the compromised server, for no-coordination …, the ingress switches need to count …, and report it to the controller. The controller can then decide which servers are compromised. Port: 123 Port: 456 Port: 789 S0 S1 Compromised server No-coordination Traffic flow

Example2 - port scan. Compromised servers detect vulnerable servers. Detect senders with unwanted traffic sent to secure ports. Count & report number of destinations for detected sender. However, with coordination, some interesting things will happen. For example, suppose the rightmost server is an HTTP…. The egress switches … then it can tell the ingress …, then the ingress switches just need to … Temporal coordination happens between the egress switch and ingress …, and the egress switch … Egress switch coordinates candidate compromised senders with ingress switch. Port: 123 Port: 456 Port: 789 S0 S1 HTTP server (80) Compromised server Coordination Traffic flow Selected flows
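A rough sketch of the port-scan coordination on this slide: the egress switch acts as the selector and flags senders that hit ports the server does not serve, and the ingress switch then counts distinct destinations only for those senders. The class names, the notion of an "open ports" set, and the method signatures are assumptions for illustration only.

```python
class EgressSelector:
    """Egress-switch selector (hypothetical): flags senders of unwanted traffic."""

    def __init__(self, open_ports):
        self.open_ports = set(open_ports)

    def on_packet(self, src, dst_port):
        # Return the sender as a candidate if it targets a port that is not served.
        return src if dst_port not in self.open_ports else None


class IngressMonitor:
    """Ingress-switch monitor (hypothetical): counts destinations per selected sender."""

    def __init__(self):
        self.destinations = {}

    def select(self, src):
        # Called when the egress switch reports a candidate compromised sender.
        self.destinations.setdefault(src, set())

    def on_packet(self, src, dst):
        if src in self.destinations:
            self.destinations[src].add(dst)

    def report(self):
        # Number of distinct destinations contacted by each selected sender.
        return {src: len(dsts) for src, dsts in self.destinations.items()}
```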

Example3 - ECMP flow. Facebook reported congestion caused by unbalanced ECMP traffic distribution. Measure & report volume of all flows. The third example is ECMP flow. In 2014, Facebook reported congestion… Even re-configuring the hashing algorithm for selecting ECMP paths could not solve the problem, and the problem kept happening for more than two years. Here we discuss a simplified version of the problem, … suppose the traffic of some large flows is not distributed evenly among the 3 paths. In no-coordination measurement, … S0 S1 S2 No-coordination Traffic flow

Example3 - ECMP flow Facebook reported congestion caused by unbalanced ECMP traffic distribution. Measure & report volume of elephant flows Detect elephant flows S0 S1 Switches coordinate elephant flows with each other Coordination S2 Traffic flow Selected flows

MOZART: MOnitor flowZ At the Right Time. In order to support temporal coordination, we propose MOZART, which is MOnitor flowZ At the Right Time.

MOZART framework. [Figure: the MOZART controller configures selectors and monitors; selectors detect events; monitors capture data related to events and report data of selected flows.] [comment] 1. Selector, monitor, their roles; 1.1 controller configures and collects data; 2.1 coordination algorithms; 2.2 coordination between selectors and monitors; 2.3 placement algorithms.

MOZART design challenges: Coordination measurement. Placement of MOZART tasks. Due to limited time, … We have coordination measurement for one task, and a placement algorithm for many tasks.

MOZART design challenges Coordination measurement Placement of tasks

Strawman Coordination. TIME f1 satisfies the event f1 in Selector: … f1 is selected. What to measure? What is the metric? What is missing? How much is missing? More concrete: the monitor counts the number of packets. f1 in Monitor: … Normal packet

Strawman Coordination TIME f1 satisfies the event f1 in Selector: … f1 is selected Also, by the time a flow is detected, the traffic in the monitor may be already gone. f1 in Monitor: … … Traffic before selected is not captured Normal packet Captured packet

Two-mode Coordination Normal Mode Event Mode TIME f1 satisfies the event f1 in Selector: … f1 is selected Sampling in Normal Mode [end] talk about there are several f1 in Monitor: … … Normal packet Traffic before selected has a chance to be captured. Captured packet Sampled packet
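The two-mode idea can be illustrated with a small sketch: in normal mode the monitor samples packets, so traffic that arrives before a flow is selected has a chance of being captured, and in event mode it captures every packet of a selected flow. The class name, the per-packet sampling probability, and the injectable random generator are assumptions; the slides do not specify how normal-mode sampling is parameterized.

```python
import random


class TwoModeMonitor:
    """Hypothetical monitor with a sampled normal mode and a full-capture event mode."""

    def __init__(self, sample_prob, rng=None):
        self.sample_prob = sample_prob
        # An explicit generator keeps runs reproducible (assumption for the sketch).
        self.rng = rng if rng is not None else random.Random(0)
        self.selected = set()
        self.counts = {}

    def select(self, flow_id):
        # The selector notifies the monitor that this flow satisfies the event.
        self.selected.add(flow_id)

    def on_packet(self, flow_id):
        if flow_id in self.selected:
            captured = True  # event mode: capture every packet
        else:
            captured = self.rng.random() < self.sample_prob  # normal mode: sample
        if captured:
            self.counts[flow_id] = self.counts.get(flow_id, 0) + 1
        return captured
```

With `sample_prob=0.0` this degenerates to the strawman, where nothing before selection is captured; a larger probability trades monitor memory for a better chance of covering the early traffic.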

Memory management in monitors. Selected flows and non-selected flows coexist in the hash table. Limited memory in devices. Collisions may happen in the hash table. Incoming selected flow: f7. Hash table (Flow ID / Selected flow? / Flow statistics): f1 / 1 / 10240; f2 / 1 / 2048; f3 / (empty) / 500. [comment] We get a good benefit from the two modes, so how do we support them in monitors? Because devices have limited memory for monitoring, the size of the hash table used to store flow statistics is limited. When two flows try to occupy the same entry, a collision happens. Thus, we design a memory management mechanism to make better use of the limited memory.

Memory management in monitors. Selected flows and non-selected flows coexist in the hash table. Limited memory in devices. Collisions may happen in the hash table. Incoming selected flow: f7. Hash table (Flow ID / Selected flow? / Flow statistics): f1 / 1 / 10240; f2 / 1 / 2048; f7 / 1 / 1024.

Memory management in monitors. Selected flows and non-selected flows coexist in the hash table. Limited memory in devices. Collisions may happen in the hash table. Incoming non-selected flows: f5, f6. Incoming selected flow: f7. More memory is allocated to selected flows. Hash table (Flow ID / Selected flow? / Flow statistics): f1 / 1 / 10240; f2 / 1 / 2048; f7 / 1 / 1024. [comment] walk through the table, different rows, and f7=>f3 first. Then f5=>f1, then f6=>f2.
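One way to read the table walkthrough above is as a fixed-size hash table in which a selected flow may evict a colliding non-selected flow, while a non-selected flow never evicts a selected one. The sketch below encodes that reading. The eviction policy for ties (the incumbent stays), the `MonitorTable` name, and the pluggable `slot_of` function are assumptions; the slides state only that more memory is allocated to selected flows.

```python
import zlib


class MonitorTable:
    """Hypothetical fixed-size flow table that favors selected flows on collision."""

    def __init__(self, size, slot_of=None):
        self.size = size
        self.slots = [None] * size  # each entry: [flow_id, selected, byte_count]
        # Default slot function uses CRC32 so that results are stable across runs.
        self.slot_of = slot_of or (lambda fid: zlib.crc32(fid.encode()) % size)

    def update(self, flow_id, nbytes, selected):
        """Record nbytes for flow_id; return False if the packet could not be stored."""
        i = self.slot_of(flow_id)
        entry = self.slots[i]
        if entry is None:
            self.slots[i] = [flow_id, selected, nbytes]
            return True
        if entry[0] == flow_id:
            entry[1] = entry[1] or selected
            entry[2] += nbytes
            return True
        if selected and not entry[1]:
            # A selected flow evicts a colliding non-selected flow.
            self.slots[i] = [flow_id, True, nbytes]
            return True
        # Otherwise the incumbent keeps the entry (assumed tie-breaking rule).
        return False
```

Replaying the slide's example with f7 colliding with f3, f5 with f1, and f6 with f2 leaves f1, f2, and f7 in the table, matching the final table shown on the slide.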

MOZART design challenges: Coordination measurement. Placement of MOZART tasks. So far we have introduced how to support one measurement task in MOZART. But in reality, there are many tasks to run, and we need to manage all of these tasks in the network.

Placement of MOZART tasks. Many candidate MOZART tasks to run: operators want to detect many events. Device resource constraints: Switches: limited memory; Hosts: limited CPU. Measurement can just use leftover resources. Latency constraint within one MOZART task: timely communication is critical. Latency between selectors/monitors should be small. [this slide] say tens Mbs at most in switches. There are many … However, there are resource constraints in devices … Also, we notice that within one MOZART task, there is a latency constraint between selectors and monitors. [explain] Mea

Placement of MOZART tasks. Strawman algorithm: Maximize Allocated Modules (MAM). Challenges: One task - selectors and monitors should all be placed. Multiple tasks - joint placement to maximize running tasks. MOZART - Binary Integer Linear Programming. Objective - maximize the number of tasks to run, subject to resource and latency constraints. [comment]: A strawman placement algorithm is …. There are two constraints: the first is devices …; the second is that we need timely communication between selectors and monitors, so the latency between them should be small. In MOZART, we designed a binary integer linear programming solution. The objective is …, and we also meet the resource constraints in …, and the time constraints in each task.
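To make the placement objective concrete, here is a toy exhaustive search over binary placement decisions. It is not the binary integer linear program from the talk, whose exact formulation the slides do not give; it only enforces the same stated goals on a tiny instance: a task counts only if all of its modules are placed, device capacity is respected, and selector-to-monitor latency stays under a bound. The data layout (module 0 is the selector, a single resource unit per device) is an assumption.

```python
from itertools import product


def place_tasks(tasks, capacity, latency, max_latency):
    """Brute-force placement sketch (hypothetical helper, exponential time).

    tasks: list of per-task module demands; module 0 is treated as the selector.
    capacity: dict device -> available resource units.
    latency: dict (dev_a, dev_b) -> latency in ms; same-device latency is 0.
    Returns (number of tasks placed, {task index: tuple of devices}).
    """
    devices = list(capacity)

    def lat(a, b):
        if a == b:
            return 0
        return latency.get((a, b), latency.get((b, a), float("inf")))

    best = (0, {})

    def search(i, free, placed):
        nonlocal best
        if len(placed) > best[0]:
            best = (len(placed), dict(placed))
        # Stop when no tasks remain or the remaining tasks cannot beat the best.
        if i == len(tasks) or len(placed) + len(tasks) - i <= best[0]:
            return
        demands = tasks[i]
        for devs in product(devices, repeat=len(demands)):
            # Latency constraint: selector must be close to every monitor.
            if any(lat(devs[0], d) > max_latency for d in devs[1:]):
                continue
            remaining = dict(free)
            for dev, need in zip(devs, demands):
                remaining[dev] -= need
            if min(remaining.values()) < 0:
                continue  # resource constraint violated
            placed[i] = devs
            search(i + 1, remaining, placed)
            del placed[i]
        search(i + 1, free, placed)  # option: do not run task i

    search(0, dict(capacity), {})
    return best
```

A real deployment would hand the same constraints to an integer programming solver; the search above is only practical for a handful of tasks and devices.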

Evaluation Setup. Topology & Traffic: B4 topology (12 switches, 12 hosts). Implemented in Mininet. Switches run Open vSwitch. 2 hours CAIDA trace. Compared algorithms: No-coordination - just Sample and Hold (SH) in monitors. Coordination - selectors send selected flows; SH in monitors. [go to content directly] Now I will talk about our evaluation; first let's discuss the evaluation setup. We set up the B4 topology in Mininet, which contains 12 switches and 12 data centers. The switches run Open vSwitch, and we add our coordination feature and measurement feature in the user mode of Open vSwitch. IGNORE: Shortest path routing algorithm is used to forward packets in switches, and ECMP paths are applied if there are equal cost multiple paths. Another point is that we use 2 hours … . IGNORE: Multiple tasks … For the sampling technique in normal mode, we use SH. SH is an efficient algorithm to capture large flows … We compare with a no-coordination algorithm, which just runs SH in monitors.
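For readers unfamiliar with Sample and Hold, the following is a minimal sketch of the textbook algorithm: each byte is sampled with a small probability, and once a flow has a table entry every later packet of that flow is counted. The class name and parameters are assumptions, and this is not the code added to Open vSwitch in the evaluation.

```python
import random


class SampleAndHold:
    """Textbook Sample and Hold sketch; names and parameters are illustrative."""

    def __init__(self, byte_prob, rng=None):
        self.byte_prob = byte_prob  # probability of sampling any single byte
        self.rng = rng if rng is not None else random.Random(0)
        self.table = {}

    def on_packet(self, flow_id, size):
        if flow_id in self.table:
            # Hold: once a flow has an entry, count all of its later bytes.
            self.table[flow_id] += size
        elif self.rng.random() < 1 - (1 - self.byte_prob) ** size:
            # Sample: a packet of `size` bytes is picked if any of its bytes is.
            self.table[flow_id] = size
```

Because large flows send many bytes, they are very likely to be sampled early, which is why the talk describes SH as efficient at capturing large flows.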

Example – loss detection measure flow volume of lossy flows High loss for some flows monitor monitor monitor S0 S1 S2 selector Traffic flow Selected flows from selector

MOZART achieves high accuracy. [Plot: y-axis: ratio of selected flows not captured; x-axis: memory size in each monitor for measurement; annotated values: 15% and 1.3%.] [comments] x axis, y axis, and show the numbers. Show the example first, and say monitor is in the switches. [add 2M bytes points.] Strengthen reduce from 15% to 1.3% [end] we also run other examples in the testbed, but the achievement of MOZART is similar.

MOZART supports more tasks. Table (Algorithm: tasks assigned (%), avg. latency (ms)): Maximize Allocated Modules: 77%, 94; MOZART (Latency <= infinite): 100%, 110. [comment]: We fix the memory size in devices, and try to allocate ? Tasks in the network. Talk about the setup if time is enough. compare mozart with MAM, put MAM first, saying more tasks, but larger latency. If we add more

MOZART supports more tasks. Table (Algorithm: tasks assigned (%), avg. latency (ms)): Maximize Allocated Modules: 77%, 94; MOZART (Latency <= infinite): 100%, 110; MOZART (Latency <= 250ms): 98%, 64.

Conclusion. Temporal coordination is important: Collect data related to events. Different views/abilities of devices. MOZART design highlights: Coordination algorithms. Placement algorithm for maximizing tasks to run. Benefits: High measurement accuracy. Support more tasks. Meet memory constraints in devices. In order to support temporal coordination, MOZART has three design highlights.

Communication between selectors and monitors. Same path: tag following packets of selected flows. Reverse path: tag reverse packets of selected flows. Different path: send explicit packets. [comment] merge 18&19. Add the tradeoff with previous slide. The second challenge is the communication between selectors and monitors. The first point is that communication is necessary, as selectors and monitors could be located on different devices, and the selectors need to notify monitors which flows are selected. The second point is that timely coordination is necessary. We know from the first challenge that part of the traffic might already have passed by before a flow is selected. Thus, we need to notify monitors about the selected flows as early as possible, so that less traffic goes uncaptured in monitors. The third point is that the coordination overhead should be small as well. One of the benefits of our architecture is that we can reduce the reporting overhead from devices to the controller, as they just need to report statistics of selected flows. In order to reduce the overall overhead, we need to avoid introducing too much temporal coordination overhead into the network.