I2.1: In-Network Storage PI Presentation by Arun Iyengar NS-CTA INARC Meeting 23-24 March 2011 Cambridge, MA.



Key Aspects of this Work Make better use of space within network storage – Remove redundant content From communications between network nodes From network storage itself From images which may have similar content – Only store most relevant content Use proper cache replacement policies Handle disruptions and failures – Nodes may fail, become unreachable – Packet losses may occur Data Issues – Dealing with data provenance – Data consistency How provenance affects data consistency decisions

Key Problem Nodes within a network need adequate storage and memory – Mobile devices may have limited storage/memory – Content may be replicated for high availability, increasing storage/memory requirements – Even if persistent storage is sufficient, maintaining as much content as possible in main memory may be desirable for performance reasons Our work – Redundancy elimination at multiple levels – Making better use of existing memory/storage space

Redundancy Elimination at Multiple Levels File Level (Disk) Page Level (main memory) Application Level (e.g. cached items) Communication Protocol Level Semantic Level (e.g. redundancy in visual images) Redundancy elimination at different levels can cause Mutual interference, complicated behavior

Redundancy Elimination at Communication Protocol Level In-network caching algorithms to reduce network utilization by removing redundant bytes communicated [Figure: gateway (GW)]

Motivation Several in-network caching algorithms previously suggested (e.g. Spring, Wetherall, SIGCOMM 2000) However, evaluations mainly conducted on packet traces When deploying those algorithms on protocols (TCP/UDP/IP), several issues can arise – E.g., protocol correctness (e.g., termination) In addition, packet-trace studies are restrictive – Can only analyze certain metrics (e.g., byte savings) – Cannot evaluate other metrics (e.g., TCP delay) and interactions between mechanisms (e.g., TCP exponential backoff)

Goals Analyze Protocol Correctness Conduct a more comprehensive evaluation of performance in real environments – Packet losses, packet re-ordering, etc. – Bytes, Delay Design new algorithms that are more robust Develop analytical models to rigorously analyze and predict performance

Redundancy Elimination at Communication Protocol Level Communications between nodes over a network may contain redundant content – Removing those redundancies can reduce network utilization and congestion Fingerprint: number computed from a byte string using a one-way function. – Fingerprint consumes less space than byte string – Low probability of different strings having same fingerprints Rabin fingerprint for string of bytes $t_1, t_2, t_3, \ldots, t_n$: – $RF(t_1, t_2, t_3, \ldots, t_n) = (t_1 p^{n-1} + t_2 p^{n-2} + \cdots + t_{n-1} p + t_n) \bmod M$ – $p$ and $M$ are constant integers Computationally cheap to compute next fingerprint from previous one: – $RF(t_{i+1}, \ldots, t_{n+i}) = ((RF(t_i, \ldots, t_{n+i-1}) - t_i p^{n-1}) \cdot p + t_{n+i}) \bmod M$ – For faster execution, all values of $t_i p^{n-1}$ can be precomputed, stored in a table
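The rolling update described on this slide can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the constants `P` and `M` are arbitrary choices (the slide only says they are constant integers), and the sketch uses the convention in which the last byte of the window has coefficient 1, so the leading byte carries the factor p^(n-1).

```python
# Illustrative constants (assumptions); the slide does not give values for p and M.
P = 257
M = (1 << 61) - 1


def fingerprint(window: bytes, p: int = P, m: int = M) -> int:
    """Evaluate t_1*p^(n-1) + ... + t_(n-1)*p + t_n (mod m) with Horner's rule."""
    fp = 0
    for byte in window:
        fp = (fp * p + byte) % m
    return fp


def rolling_fingerprints(data: bytes, n: int, p: int = P, m: int = M):
    """Yield (offset, fingerprint) for every n-byte window of data."""
    if n <= 0 or len(data) < n:
        return
    top = pow(p, n - 1, m)
    # Precomputed table: contribution of each possible leading byte value.
    table = [(value * top) % m for value in range(256)]
    fp = fingerprint(data[:n], p, m)
    yield 0, fp
    for i in range(len(data) - n):
        # Drop the leading byte, shift by p, append the next byte.
        fp = ((fp - table[data[i]]) * p + data[i + n]) % m
        yield i + 1, fp
```

Each step after the first window costs one table lookup, one multiplication, and one addition, independent of the window length n.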

Redundancy Elimination Algorithm Maintain caches (at both sender and receiver) of recent packets indexed by fingerprints Generate representative fingerprints for packets being sent Look up each fingerprint in cache If match found, compare bytes to make sure that there was no fingerprint collision If bytes match, expand match to largest possible matching string Update sender cache For byte strings with matching content, send tokens identifying fingerprint instead of actual bytes
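The steps on this slide can be sketched as a small encoder/decoder pair. Everything concrete here is an assumption made for illustration: the window length, the constants `P` and `M`, the rule that a fingerprint is "representative" when its low bits are zero, the token layout, and the names `PacketCache`, `encode`, and `decode`. Fingerprints are computed directly per window to keep the sketch short; a real encoder would roll them.

```python
WINDOW = 8                      # bytes per fingerprint window (assumption)
P, M = 257, (1 << 61) - 1       # illustrative fingerprint constants (assumption)
MASK = 0x3                      # representative if fp & MASK == 0 (assumption)


def window_fps(data: bytes):
    """Yield (offset, fingerprint) for each WINDOW-byte window."""
    for i in range(len(data) - WINDOW + 1):
        fp = 0
        for byte in data[i:i + WINDOW]:
            fp = (fp * P + byte) % M
        yield i, fp


class PacketCache:
    """Recent packets indexed by representative fingerprints."""

    def __init__(self):
        self.packets = []       # payloads; list index serves as packet id
        self.index = {}         # fingerprint -> (packet id, offset)

    def add(self, payload: bytes):
        pid = len(self.packets)
        self.packets.append(payload)
        for off, fp in window_fps(payload):
            if fp & MASK == 0:
                self.index[fp] = (pid, off)


def encode(payload: bytes, cache: PacketCache):
    """Replace byte runs already in the cache with ('ref', id, offset, length)."""
    tokens, lit_start, i = [], 0, 0
    fps = dict(window_fps(payload))
    while i + WINDOW <= len(payload):
        fp = fps[i]
        hit = cache.index.get(fp) if fp & MASK == 0 else None
        if hit is not None:
            pid, off = hit
            old = cache.packets[pid]
            # Compare bytes to rule out a fingerprint collision.
            if old[off:off + WINDOW] == payload[i:i + WINDOW]:
                length = WINDOW
                # Expand the match to the right.
                while (i + length < len(payload) and off + length < len(old)
                       and payload[i + length] == old[off + length]):
                    length += 1
                # Expand the match to the left, into not-yet-emitted literals.
                while i > lit_start and off > 0 and payload[i - 1] == old[off - 1]:
                    i, off, length = i - 1, off - 1, length + 1
                if lit_start < i:
                    tokens.append(("lit", payload[lit_start:i]))
                tokens.append(("ref", pid, off, length))
                i += length
                lit_start = i
                continue
        i += 1
    if lit_start < len(payload):
        tokens.append(("lit", payload[lit_start:]))
    cache.add(payload)          # update sender cache
    return tokens


def decode(tokens, cache: PacketCache) -> bytes:
    """Rebuild the payload at the receiver and update the receiver cache."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out += token[1]
        else:
            _, pid, off, length = token
            out += cache.packets[pid][off:off + length]
    payload = bytes(out)
    cache.add(payload)
    return payload
```

The decoder only works while the two caches stay identical, which is why a lost or re-ordered packet matters so much in the slides that follow.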

Our Implementation Original Spring-Wetherall paper did not implement a system with this redundancy scheme Our implementation encountered the following issues

Illustration of Interactions RE creates dependencies between packets Wireless links are loss-prone Chains of packets may be undecodable TCP performance may be severely affected (e.g., exponential backoff) [Figure: gateway (GW) forwarding packets IP 1, IP 2, IP 3 over a lossy link; after a loss (x), IP 2 and IP 3 cannot be decoded] In-network caching algorithms create dependencies between IP packets, which increase correlated losses and trigger TCP algorithms (e.g., exponential backoff)

Illustration of Protocol Correctness Violation [Figure: packet sequence through gateway (GW) for a single object; a packet is lost (x), IP 1 is re-transmitted, and IP 2, IP 3, and IP 4 in turn cannot be decoded] A single lost (or re-ordered) packet can stall a TCP connection!

New Algorithms Algorithm 1: – Check TCP sequence numbers – Packets only encoded with previous packets – Variant: I-P frames (similar to MPEG) Algorithm 2: – Flush caches upon packet retransmissions
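One possible reading of the two algorithms is sketched below. The slide gives no details, so the class `FlowCache`, the per-flow bookkeeping, and the retransmission test (a sequence number no higher than one already seen) are all assumptions. Sequence-number wraparound and the I-P frame variant are ignored.

```python
class FlowCache:
    """Per-flow packet cache keyed by TCP sequence number (hypothetical sketch)."""

    def __init__(self):
        self.entries = {}        # sequence number -> payload
        self.highest_seq = -1    # highest sequence number seen so far

    def observe(self, seq: int, payload: bytes) -> bool:
        """Record a packet; return True if it looked like a retransmission."""
        retransmission = seq <= self.highest_seq
        if retransmission:
            # Algorithm 2 reading: flush the cache on a retransmission.
            self.entries.clear()
        self.entries[seq] = payload
        self.highest_seq = max(self.highest_seq, seq)
        return retransmission

    def candidates(self, seq: int) -> dict:
        """Algorithm 1 reading: only earlier packets may be used for encoding."""
        return {s: p for s, p in self.entries.items() if s < seq}
```

Flushing trades some byte savings for safety: after a flush, no later packet can depend on content the receiver may never have decoded.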

Results

Results: I/P frames

Memory Deduplication Memory system, cache, file system may have multiple entities which are identical – For example, two pages in a memory system may be identical Challenge is to identify duplicate entities, combine them so that only one copy is stored and shared Maintain a hash of stored entities. When two items hash to same value, do a byte-by-byte comparison to verify that the entities are in fact identical – Use of hash function significantly reduces number of comparison operations to check for identical entities
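A minimal sketch of the hash-then-compare scheme, assuming SHA-256 as the hash function and an in-memory dictionary as the store (the slide names neither). The class `DedupStore` and its methods are hypothetical.

```python
import hashlib


class DedupStore:
    """Store entities so that identical content is kept only once."""

    def __init__(self):
        self._by_hash = {}   # digest -> list of distinct blobs with that digest
        self._refs = {}      # entity name -> shared blob

    def put(self, name: str, data: bytes) -> bool:
        """Store data under name; return True only if new content was stored."""
        digest = hashlib.sha256(data).digest()
        bucket = self._by_hash.setdefault(digest, [])
        for stored in bucket:
            # Byte-by-byte comparison confirms the entities are identical.
            if stored == data:
                self._refs[name] = stored
                return False
        bucket.append(data)
        self._refs[name] = data
        return True

    def get(self, name: str) -> bytes:
        return self._refs[name]

    def unique_count(self) -> int:
        return sum(len(bucket) for bucket in self._by_hash.values())
```

Only entities that land in the same hash bucket are ever compared in full, which is where the reduction in comparison operations comes from.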

Delta Encoding Multiple objects within a cache may be similar but not identical – Deduplication will not work Identify similar cached objects by taking Rabin fingerprints of parts of cached objects, looking for objects with similar Rabin fingerprints For objects with similar content, some of the objects o_1, o_2, …, o_n can be stored as differences from one or more base objects – No need to store complete data for o_1, o_2, …, o_n If overhead of decoding a differenced object is an issue, delta encoding can be restricted to cached objects which are infrequently requested
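The idea can be sketched with the Python standard library. Two substitutions are made for brevity and should be read as assumptions: similarity is estimated from raw overlapping byte windows rather than Rabin fingerprints of them, and the delta is derived with `difflib.SequenceMatcher` rather than a purpose-built differencing algorithm. The function names are hypothetical.

```python
import difflib


def similarity(a: bytes, b: bytes, k: int = 4) -> float:
    """Jaccard similarity of the sets of k-byte windows (stand-in for fingerprints)."""
    shingles_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    shingles_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = shingles_a | shingles_b
    return len(shingles_a & shingles_b) / len(union) if union else 0.0


def make_delta(base: bytes, target: bytes):
    """Describe target as copies from base plus literal data."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:                      # 'replace' or 'insert'
            ops.append(("data", target[j1:j2]))
        # 'delete' contributes nothing to the target
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target object from the base object and its delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

A cache could keep the base object whole and store only `make_delta(base, o_i)` for each similar object, paying the `apply_delta` cost on access.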

Compression Cached objects can be compressed to further decrease memory/storage consumption If computational overhead of compression is a problem, compression should only be applied to cached objects which are infrequently accessed.
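A small sketch of selective compression, assuming `zlib` as the compressor and a simple hit counter as the measure of access frequency. The class `CompressingCache` and the `hot_threshold` parameter are hypothetical.

```python
import zlib


class CompressingCache:
    """Keep rarely accessed objects compressed and hot objects uncompressed."""

    def __init__(self, hot_threshold: int = 3):
        self.hot_threshold = hot_threshold
        self._items = {}     # key -> [is_compressed, blob, hit count]

    def put(self, key, data: bytes):
        # New objects start compressed; they are expanded once they prove hot.
        self._items[key] = [True, zlib.compress(data), 0]

    def get(self, key) -> bytes:
        entry = self._items[key]
        entry[2] += 1
        data = zlib.decompress(entry[1]) if entry[0] else entry[1]
        if entry[0] and entry[2] >= self.hot_threshold:
            # Frequently accessed: store uncompressed to avoid repeated CPU cost.
            entry[0], entry[1] = False, data
        return data

    def stored_bytes(self) -> int:
        return sum(len(entry[1]) for entry in self._items.values())
```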

Cache Replacement Determine how to make best use of cache space LRU, Greedy-dual-size have been used in the past We have developed new cache replacement algorithms which are superior for DTNs

Caching: Basic Idea Utility-based data placement – A unified probabilistic framework – Ensure that the more popular data is always cached nearer to the brokers Data utility is calculated based on – Its chance to be forwarded to the brokers – Its popularity in the network Every pair of caching nodes optimizes its data placement upon contact

Data Utility Data specifications – Data i is destined for brokers B_1, …, B_k – Data i has a finite lifetime T The utility u_ij of data i at node j evaluates the benefit of caching data i at node j – c_ij: The probability that i can be forwarded to the brokers within T – w_i: The probability that data i will be retrieved in the future

Cached Data Placement Whenever two nodes come into contact, they exchange their cached data to maximize the utilities of the data they cache – Hard replacement: only for data which is currently requested by the brokers – Soft replacement: for the other unrequested data Hard replacement is prioritized to ensure that the requested data is forwarded to the brokers

Unified Knapsack Formulation When nodes A and B come into contact, put the data cached on both A and B into the same selection pool with size k – u_iA: utility of data i at A – s_i: size of data i – S_A: buffer size of A Similar for node B
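The formula itself is not present in this transcript. Assuming the standard 0/1 knapsack reading of the symbols (choose a subset of the pooled data that maximizes total utility at A subject to total size not exceeding A's buffer), the selection step could look like the sketch below. Integer sizes, the function name, and the example values are assumptions.

```python
def select_for_buffer(items, capacity: int):
    """0/1 knapsack by dynamic programming.

    items: list of (name, utility, size) with positive integer sizes
    capacity: buffer size of the node
    Returns (total utility, list of chosen names).
    """
    # best[c] holds the best (utility, chosen names) using at most c units of space.
    best = [(0.0, [])] * (capacity + 1)
    for name, utility, size in items:
        # Iterate capacities downward so each item is used at most once.
        for cap in range(capacity, size - 1, -1):
            candidate = best[cap - size][0] + utility
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - size][1] + [name])
    return best[capacity]


# Hypothetical pool after nodes A and B meet: (data id, utility at A, size).
pool = [("d1", 0.9, 4), ("d2", 0.7, 3), ("d3", 0.5, 2), ("d4", 0.2, 1)]
total, chosen = select_for_buffer(pool, capacity=5)
```

The same routine could be run for node B with its own utilities and buffer size; how the pool is divided between the two nodes is not specified on the slide.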

Hit Rate

Data access delay

Overhead

Military Relevance Our in-network storage techniques are important for communications between soldiers in real deployments – Handling disruptions – Dealing with failed/captured nodes and unreliable network links – Handling lost communications, packet losses Collaborations with Robert Cole (CERDEC), John Hancock (ArtisTech/IRC), Matthew Aguirre (ArtisTech/IRC), Alice Leung (BBN/IRC) – They are studying how our techniques can be applied to military scenarios – Studying our DTN caching techniques to run experiments on traces typical of military scenarios Joint work with Guohong Cao’s group, CNARC

Impact and Collaborations ICDCS 2011 paper co-authored with Guohong Cao (Penn State) of CNARC Collaborations with Robert Cole (CERDEC), John Hancock (ArtisTech/IRC), Matthew Aguirre (ArtisTech/IRC), Alice Leung (BBN/IRC) – They have used our DTN caching code to run experiments on traces typical of military scenarios

Future Work Refine techniques on redundancy elimination in network communications Study redundancy at other levels in the system Study interactions, interferences, and synergies between redundancy elimination at different levels of the system

Summary and Conclusion New techniques for redundancy elimination – Reduces space requirements for in-network storage Redundancy elimination being performed at multiple levels of the entire system New methods for making better use of caches in DTNs – These methods are of interest to other collaborators in NS-CTA

Arun Iyengar’s NS-CTA Contributions I2.1 - In-Network Storage is the only task funding Arun’s research Research contributions in caching and redundancy elimination are summarized in previous slides Publications: – “Supporting Cooperative Caching in Disruption Tolerant Networks”. W. Gao, G. Cao, A. Iyengar, M. Srivatsa. Accepted in ICDCS 2011 – “Social-Aware Data Diffusion in Delay Tolerant MANETs”. Yang Zhang, Wei Gao, Guohong Cao, Tom La Porta, Bhaskar Krishnamachari, Arun Iyengar. Book chapter to appear in Handbook of Optimization in Complex Networks: Communication and Social Networks, Springer. – “Provenance driven data dissemination in disruption tolerant networks”. M. Srivatsa, W. Gao and A. Iyengar. Under submission, Fusion 2011 – “Resolving Negative Interferences between In-Network Caching Methods”. F. Le, M. Srivatsa, A. Iyengar and G. Cao. Under preparation Patent Application: – “System and method for caching provenance information”. Wei Gao, Arun Iyengar (lead inventor), Mudhakar Srivatsa, IBM patent application. Initiated and established collaboration with Guohong Cao’s group (CNARC). – Mentor for Wei Gao (Guohong Cao’s PhD student) for internship at IBM. Initiated and established collaboration with Robert Cole (CERDEC), John Hancock (ArtisTech/IRC), Matthew Aguirre (ArtisTech/IRC), Alice Leung (BBN/IRC)

Data Consistency Problem: How to make sure cached data is current – Resolving inconsistencies between different copies Data consistency in DTNs – Limited connectivity can make it difficult to achieve strong consistency – Expiration times can be used for maintaining cache consistency
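Expiration-based consistency can be sketched as a cache whose entries carry a time-to-live. The class `ExpiringCache`, the injectable `clock` parameter, and the choice to drop expired entries on lookup are assumptions made for illustration.

```python
import time


class ExpiringCache:
    """Cache in which each entry is considered current only until it expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so behavior can be checked without waiting
        self._items = {}         # key -> (value, expiration time)

    def put(self, key, value, ttl: float):
        self._items[key] = (value, self._clock() + ttl)

    def get(self, key):
        """Return the cached value, or None if it is missing or has expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Stale copy: discard it rather than serve possibly outdated data.
            del self._items[key]
            return None
        return value
```

This gives weak consistency only: a node may serve a copy that has changed at the source as long as the copy has not yet expired, which fits settings where nodes cannot reliably be reached for invalidation.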