Connect. Communicate. Collaborate Using Temporal Locality for a Better Design of Flow-oriented Applications Martin Žádník, CESNET TNC 2007, Lyngby.

Presentation transcript:


Motivation
– Optimize the performance of network applications in which a context is retrieved on every packet arrival
– Examples: passive monitoring applications such as NetFlow, IDS, …
– So far, such applications have scaled by sampling

Memory limitation
– Context must be stored in memory, which is either small and fast or large and slow
– What about a memory hierarchy? Use a large memory with a cache, similarly to a PC architecture
– This works only if the locality of the traffic (spatial, temporal) is good

Steps
– Find a network characteristic for locality
– Apply it to real samples
– Analyze the results
– Optimize the architecture
– Optimize the performance
– Focus on flow-oriented applications

Metric
– A time characteristic depends on the speed of the link
– Pseudo-time is counted in number of packets
– The interest is not directly in time but rather in sequence locality (what comes next)

Characteristic
Flow gap = gap (measured in the number of different packets) between two packets of the same flow
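
A minimal sketch of how the flow gap could be computed from a packet sequence. It assumes that a flow is identified by some hashable key (a 5-tuple would be typical) and that the gap counts the packets seen between two consecutive packets of the same flow; both are interpretations of the definition above. The name `flow_gaps` and the toy trace are illustrative only.

```python
from collections import defaultdict

def flow_gaps(flow_keys):
    """Return {flow: [gaps]}, where a gap is the number of packets
    observed between two consecutive packets of the same flow."""
    last_seen = {}                 # flow key -> index of its previous packet
    gaps = defaultdict(list)
    for position, key in enumerate(flow_keys):
        if key in last_seen:
            # Pseudo-time: the distance is counted in packets, not seconds.
            gaps[key].append(position - last_seen[key] - 1)
        last_seen[key] = position
    return dict(gaps)

# Toy trace (assumption): each letter is the flow key of one packet.
trace = ["A", "B", "A", "C", "B", "A"]
gaps = flow_gaps(trace)
```

Flows that appear only once (such as `"C"` here) produce no gap and are absent from the result.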

Measurement
Collecting data
– samples of 8–30 million packets
– tcpdump, headers only
– :64540, :64510
Offline processing
– Perl scripts
– average gaps, maximum gaps
– cumulative histograms
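
The offline statistics listed above (average gap, maximum gap, cumulative histogram) can be sketched as follows. The original processing used Perl scripts that are not shown; this Python version, the name `gap_statistics`, and the sample values are assumptions made for illustration.

```python
from collections import Counter
from itertools import accumulate

def gap_statistics(gaps):
    """Return (average gap, maximum gap, cumulative histogram).
    The histogram maps a gap value g to the fraction of gaps <= g."""
    average = sum(gaps) / len(gaps)
    maximum = max(gaps)
    counts = Counter(gaps)
    values = sorted(counts)
    running_totals = accumulate(counts[value] for value in values)
    cumulative = {value: total / len(gaps)
                  for value, total in zip(values, running_totals)}
    return average, maximum, cumulative

# Hypothetical gap values, not measured data.
sample_gaps = [0, 1, 1, 3]
average, maximum, cumulative = gap_statistics(sample_gaps)
```

The function expects a non-empty list of gaps.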

Results
The distribution of flow gaps is exponential for common traffic
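
If flow gaps are taken to be exponentially distributed with a mean gap `mean_gap` (in packets), the fraction of gaps not exceeding `n` is `1 - exp(-n / mean_gap)`. The sketch below uses this to ask how large a gap must be covered to capture a given fraction of packets. The function names and the mean gap of 100 packets are hypothetical, and the continuous exponential model is only an approximation of integer-valued gaps.

```python
import math

def fraction_within(gap_limit, mean_gap):
    """Fraction of gaps <= gap_limit under an exponential model."""
    return 1.0 - math.exp(-gap_limit / mean_gap)

def gap_limit_for(fraction, mean_gap):
    """Smallest gap limit that covers the requested fraction of gaps."""
    return -mean_gap * math.log(1.0 - fraction)

# Hypothetical mean gap; a real value would come from the measured samples.
mean_gap = 100.0
limit_90 = gap_limit_for(0.9, mean_gap)
```

A gap of `g` packets involves at most `g` other flows, so an LRU cache holding more than `gap_limit` contexts would hit on every packet whose gap is within the limit. That makes the limit a conservative upper estimate of the cache size needed, not an exact one.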

Apply results
– Estimate the size of the cache in a system of cache and slow memory (DRAM)
– Optimize the replacement policy
– Estimate the speed-up
– Case study on the FlowMon probe

Real world
– On-chip cache latency: 1 clock cycle
– External cache: 4 clock cycles
– DRAM average latency: 16 cycles

Amdahl’s law
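
As a reminder, Amdahl's law in its standard form is given below. Here $p$ is the fraction of the original processing time spent in the part being accelerated (in this context, presumably the memory accesses) and $s$ is the speed-up of that part; the symbol names are chosen for this note.

```latex
S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}}
```

When the whole workload is accelerated ($p = 1$) the overall speed-up equals $s$, and as $p$ shrinks the benefit of a faster memory is bounded by $1 / (1 - p)$.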

FlowMon context: speed-up
– 8 x 64-bit words
– Internal cache: 9 cycles
– External cache: 12 cycles
– DRAM: 24 cycles
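
A rough sketch of how these latencies translate into a memory speed-up over a DRAM-only design. It assumes a simple model in which a hit costs the cache latency and a miss costs the DRAM latency with no additional lookup penalty; the hit rates are hypothetical inputs, not measured values, and the function names are illustrative.

```python
def average_latency(hit_rate, cache_cycles, dram_cycles):
    """Mean cycles per context access: hits pay the cache latency,
    misses pay the DRAM latency (no extra penalty is modelled)."""
    return hit_rate * cache_cycles + (1.0 - hit_rate) * dram_cycles

def memory_speedup(hit_rate, cache_cycles, dram_cycles):
    """Speed-up of context access relative to using DRAM alone."""
    return dram_cycles / average_latency(hit_rate, cache_cycles, dram_cycles)

# Latencies for an 8 x 64-bit context, as listed above.
INTERNAL_CACHE, EXTERNAL_CACHE, DRAM = 9, 12, 24

# Hypothetical hit rates, swept to show the trend.
for hit_rate in (0.5, 0.7, 0.9):
    internal = memory_speedup(hit_rate, INTERNAL_CACHE, DRAM)
    external = memory_speedup(hit_rate, EXTERNAL_CACHE, DRAM)
    print(f"hit rate {hit_rate:.1f}: internal {internal:.2f}x, external {external:.2f}x")
```

Under this model the speed-up can never exceed the ratio of DRAM latency to cache latency, which is reached only at a hit rate of 1.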

Victim policy
LRU vs. Random
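
The two victim policies can be compared on a packet trace with a small simulation such as the one below. This is a sketch of a fully associative flow cache, not the FlowMon hardware design; the function names, the fixed random seed, and the toy trace are assumptions.

```python
import random
from collections import OrderedDict

def hit_rate_lru(trace, capacity):
    """Hit rate of a fully associative cache with LRU replacement."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used
            cache[key] = True
    return hits / len(trace)

def hit_rate_random(trace, capacity, seed=0):
    """Hit rate of the same cache with a randomly chosen victim."""
    rng = random.Random(seed)               # seeded for repeatable runs
    slots = []                              # resident keys, one per cache line
    resident = set()
    hits = 0
    for key in trace:
        if key in resident:
            hits += 1
            continue
        if len(slots) >= capacity:
            victim_index = rng.randrange(len(slots))
            resident.discard(slots[victim_index])
            slots[victim_index] = key       # overwrite the victim's line
        else:
            slots.append(key)
        resident.add(key)
    return hits / len(trace)

# Toy trace (assumption): each letter is the flow key of one packet.
demo_trace = ["A", "B", "A", "C", "A", "B"]
lru_rate = hit_rate_lru(demo_trace, capacity=2)
random_rate = hit_rate_random(demo_trace, capacity=2)
```

Both functions expect a non-empty trace. Random replacement needs no per-access bookkeeping, which is why it is attractive in hardware even when LRU achieves a higher hit rate.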

Entering policy
– Sample&Hold [Estan, Varghese]
– Targets elephant flows only
– Makes sense only for a really small cache
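
A simplified sketch of a Sample&Hold entering policy: a flow without an entry is admitted with some probability on each of its packets, and once admitted every later packet of that flow is counted. Estan and Varghese define the sampling per byte; the per-packet sampling used here, the function name, and the parameter values are simplifying assumptions.

```python
import random

def sample_and_hold(flow_keys, sample_probability, seed=0):
    """Return {flow: packets counted since the flow was admitted}."""
    rng = random.Random(seed)               # seeded for repeatable runs
    held = {}
    for key in flow_keys:
        if key in held:
            held[key] += 1                  # hold: count every later packet
        elif rng.random() < sample_probability:
            held[key] = 1                   # sample: admit the flow
    return held

# Toy trace (assumption): "E" is an elephant flow, the rest are mice.
demo_trace = ["E", "m1", "E", "E", "m2", "E", "E", "m3", "E", "E"]
admitted = sample_and_hold(demo_trace, sample_probability=0.3)
```

Because a flow gets one admission chance per packet, flows with many packets are far more likely to obtain an entry than short ones, which is how the policy keeps mice out of a small cache.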

Conclusion
– Pseudo-time locality of flows
– Measurements on real samples
– So far, on-chip cache only
– Speed-up 1.7x: memory architecture described in VHDL and used for the FlowMon probe on COMBO6X cards
Future work:
– Correlation with timestamps
– Implement LRU or Sample&Hold