Coupling Prefix Caching and Collective Downloads for Remote Scientific Data

Presentation transcript:

OAK RIDGE NATIONAL LABORATORY, U.S. DEPARTMENT OF ENERGY
Coupling Prefix Caching and Collective Downloads for Remote Scientific Data
Xiaosong Ma (1,2), Sudharshan Vazhkudai (1), Vincent Freeh (2), Tyler Simon (2), Tao Yang (2), and Stephen Scott (1)
(1) Oak Ridge National Laboratory; (2) North Carolina State University
ICS'06 Technical Paper Presentation, Session: Memory I
June 30, 2006, Cairns, Australia

Outline
- Problem space: Client-side caching
- The prefix caching problem
- FreeLoader backdrop
- Prefix caching
  - Architecture
  - Model
- Collective downloads
- Performance

Problem Space: Client-side Caching
- HTTP caches
  - Proxy caches (Squid), CDNs (Akamai)
  - Benefits
    - Reduces server bandwidth consumption, load and latency
    - Improves client-perceived throughput
    - Helps exploit locality
    - Benefits amplified for large media downloads
- What of scientific data, then?
  - Data deluge!
  - User access traits on large scientific data
    - Local processing/visualization of data
    - Implies downloading remote data (FTP, GridFTP, HSI, wget)
    - Shared interest among groups of researchers
    - A bioinformatics group collectively analyzes and visualizes a sequence database for a few days: locality of interest!
    - More and more, applications are latency intolerant
    - Transient in nature
  - Examples: FreeLoader (ORNL/NCSU), IBP (UTK), DataCapacitor (IU), TSS (UND)
Intermediate data cache exploits this area

The Prefix Caching Problem
- HTTP prefix caching
  - Multimedia, streaming data delivery
  - BitTorrent P2P system: leechers can download and yet serve
- Benefits
  - Bootstrapping the download process
  - Store more datasets
  - Allows for efficient cache management
- Enabling trends: scientific data properties
  - Usually write-once-read-many
  - Remote source copy held elsewhere
  - Primarily sequential accesses
- Challenges
  - Clients should be oblivious to a dataset being only partially available
    - Performance hit?
  - How much of the prefix of a dataset to cache?
    - So that client accesses can progress seamlessly
  - Online patching issues
    - Mismatch between client access I/O and remote patching I/O
    - Wide-area download vagaries
Can we do something similar for large scientific data accesses?

Backdrop: FreeLoader Collaborative Desktop Storage Cache

Prefix Caching Architecture
- Capability-based resource aggregation
  - Persistent-storage donors and bandwidth-only (BW-only) donors
- Client serving: parallel get
- Remote patching using URIs
- Better cache management
  - Stripe the entire dataset when space is available
  - When eviction is needed, stripe only a prefix of the dataset
  - Victims chosen by LRU:
    - Evict chunks from the tail of a dataset until only a prefix remains
    - Entire datasets are evicted only after all such tails have been evicted
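The cache-management policy on this slide can be sketched in a few lines. The following is a minimal illustration, not FreeLoader's actual code: the class `PrefixCache`, its method names, and the per-dataset `prefix_chunks` parameter are hypothetical, and the cache is modelled as a plain count of chunks rather than as storage striped across donors.

```python
from collections import OrderedDict


class PrefixCache:
    """Toy model of tail-first LRU eviction (hypothetical names throughout)."""

    def __init__(self, capacity_chunks):
        self.capacity = capacity_chunks
        # name -> [cached_chunks, prefix_chunks]; least recently used first.
        self.cached = OrderedDict()

    def used(self):
        return sum(kept for kept, _prefix in self.cached.values())

    def touch(self, name):
        """Record an access so the dataset becomes most recently used."""
        self.cached.move_to_end(name)

    def insert(self, name, total_chunks, prefix_chunks):
        if not 0 <= prefix_chunks <= min(total_chunks, self.capacity):
            raise ValueError("prefix must fit in both the dataset and the cache")
        free = self.capacity - self.used()
        if free >= total_chunks:
            keep = total_chunks          # space available: stripe entirely
        else:
            keep = prefix_chunks         # eviction needed: stripe only a prefix
            self._evict(keep - free)
        self.cached[name] = [keep, prefix_chunks]

    def _evict(self, needed):
        # Pass 1: trim tails down to the prefix, least recently used first.
        for entry in self.cached.values():
            if needed <= 0:
                return
            trim = min(entry[0] - entry[1], needed)
            entry[0] -= trim
            needed -= trim
        # Pass 2: only once every tail is gone, drop whole datasets in LRU order.
        while needed > 0 and self.cached:
            _name, (kept, _prefix) = self.cached.popitem(last=False)
            needed -= kept


cache = PrefixCache(capacity_chunks=10)
cache.insert("a", total_chunks=6, prefix_chunks=2)   # fits entirely
cache.insert("b", total_chunks=4, prefix_chunks=1)   # fits entirely, cache now full
cache.insert("c", total_chunks=5, prefix_chunks=3)   # trims the tail of "a"
print({name: kept for name, (kept, _p) in cache.cached.items()})
```

In this toy run the third insert is satisfied entirely by trimming the tail of the least recently used dataset, so no dataset is dropped. A whole dataset is removed only when every cached dataset has already been reduced to its prefix.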

Prefix Size Prediction
- Goal: eliminate client-perceived delay in data access
  - What is the optimal prefix size to hide the cost of suffix patching?
- Prefix size depends on:
  - Dataset size, $S$
  - In-cache data access rate by the client, $R_{\text{client}}$
  - Suffix patching rate, $R_{\text{patch}}$
  - Initial latency in suffix patching, $L$
- The time the client takes to read the whole dataset is the time available to patch the suffix:
  $S / R_{\text{client}} = L + (S - S_{\text{prefix}}) / R_{\text{patch}}$
- Thus:
  $S_{\text{prefix}} = S \, (1 - R_{\text{patch}} / R_{\text{client}}) + L \, R_{\text{patch}}$
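As a quick numeric check of the model, the formula can be evaluated directly. The sketch below assumes sizes in MB, rates in MB/s and latency in seconds. The function names and the clamping of the result to the range [0, S] are assumptions added for illustration and do not appear on the slide, and the input values in the example are made up rather than taken from the talk's measurements.

```python
def prefix_size(size_mb, r_client, r_patch, latency_s):
    """Prefix size (MB) from S_prefix = S * (1 - R_patch / R_client) + L * R_patch.

    The clamp to [0, size_mb] is an added assumption: a patching link faster
    than the client needs no prefix, and a prefix cannot exceed the dataset.
    """
    if size_mb < 0 or r_client <= 0 or r_patch <= 0 or latency_s < 0:
        raise ValueError("size and latency must be >= 0, rates must be > 0")
    raw = size_mb * (1.0 - r_patch / r_client) + latency_s * r_patch
    return min(size_mb, max(0.0, raw))


def prefix_ratio(size_mb, r_client, r_patch, latency_s):
    """Fraction of the dataset to keep in cache."""
    return prefix_size(size_mb, r_client, r_patch, latency_s) / size_mb


# Illustrative values only: a 1000 MB dataset read at 50 MB/s from the cache,
# patched at 10 MB/s over the wide area with a 5 s start-up latency.
print(prefix_size(1000, 50, 10, 5))
print(prefix_ratio(1000, 50, 10, 5))
```

With these inputs the client needs 20 s to read the dataset, so the patching side can fetch 10 MB/s × (20 s − 5 s) = 150 MB of suffix in that time, leaving 850 MB (a ratio of 0.85) that must already be cached.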

OAK RIDGE NATIONAL LABORATORY U. S. DEPARTMENT OF ENERGY Collective Download  Why?  Wide-area transfer reasons:  Storage systems and protocols for HEC are tuned for bulk transfers (GridFTP, HSI)  Wide-area transfer pitfalls: high latency, connection establishment cost  Client’s local-area cache access reasons:  Client accesses to the cache use a smaller stripe size (e.g., 1MB chunks in FreeLoader)  Finer granularity for better client access rates  Can we derive from collective I/O in parallel I/O

Collective Download Implementation
- Patching nodes perform bulk, remote I/O; ~256 MB per request
- Reducing multiple authentication costs per dataset
  - Automated interactive session with "Expect" for single sign-on
  - FreeLoader patching framework instrumented with Expect
  - Protocol needs to allow sessions (GridFTP, HSI)
- Need to reconcile the mismatch between the client access stripe size and the bulk, remote I/O request size
- Shuffling
  - The p patching nodes redistribute the downloaded chunks among themselves according to the client's striping policy
  - Redistribution enables round-robin client access
  - Each patching node redistributes (p - 1)/p of the data it downloaded
  - Shuffling is done in memory, to motivate BW-only donors
- Thus, client serving, collective download and shuffling are all overlapped
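The (p - 1)/p figure for shuffling can be checked with a small sketch. It assumes, for illustration only, that each of the p patching nodes downloads one contiguous run of chunks per bulk request and that the client's striping policy is plain round-robin (chunk i belongs on node i mod p). The function names are hypothetical, and the 256-chunk request size simply combines the ~256 MB request and 1 MB chunk sizes quoted on the slides.

```python
def shuffle_plan(p, chunks_per_request):
    """Map each source node to {destination node: [chunk ids to send]}.

    Assumes node k downloaded the contiguous chunk range
    [k * chunks_per_request, (k + 1) * chunks_per_request) and that chunk i
    must end up on node i % p (round-robin striping).
    """
    plan = {}
    for src in range(p):
        start = src * chunks_per_request
        outgoing = {}
        for chunk in range(start, start + chunks_per_request):
            dst = chunk % p
            if dst != src:               # chunks already in place are not sent
                outgoing.setdefault(dst, []).append(chunk)
        plan[src] = outgoing
    return plan


def moved_fraction(p, chunks_per_request):
    """Fraction of all downloaded chunks that must change nodes."""
    plan = shuffle_plan(p, chunks_per_request)
    moved = sum(len(ids) for out in plan.values() for ids in out.values())
    return moved / (p * chunks_per_request)


# Four patching nodes, each fetching 256 chunks of 1 MB in one bulk request.
plan = shuffle_plan(4, 256)
print(sorted(plan[0]))                   # destinations node 0 sends to
print(moved_fraction(4, 256))            # (p - 1)/p = 0.75 for p = 4
```

When the request size is a multiple of p, every node keeps exactly 1/p of what it downloaded and forwards the rest, which matches the fraction stated on the slide. For other request sizes the fraction is only approximately (p - 1)/p.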

Testbed and Experiment Setup
- UberFTP stateful client to GridFTP servers at TeraGrid-PSC and TeraGrid-ORNL
- HSI access to HPSS
  - Cold data from tapes
- FreeLoader patching framework deployed in this setting

Collective Download Performance
[Table, PW=10, I/O=256M. Columns: Download; Download + Shuffle; Client access. Rows: HPSS; Tera-ORNL; Tera-PSC. Cell values are missing from the transcript.]

Prefix Size Model Verification
Data sources: HPSS-ORNL, Tera-ORNL, Tera-PSC
- R_client (MB/s): 52.2
- R_patch (MB/s): values missing from the transcript
- L (s): values missing from the transcript
- Predicted ratio: 95% (HPSS-ORNL), 24.6% (Tera-ORNL), 81% (Tera-PSC)

Impact of Prefix Caching on Cache Hit Rate
Jefferson Lab Asynchronous Storage Manager (JASMine)
  No. of days: 19.1
  No. of accesses: 4000
  No. of unique datasets: 1686
- Tera-ORNL will see improvements around the 0.2 and 0.4 curves (308% and 176% for 20% and 40% prefix ratios)
- Tera-PSC sees up to 76% improvement in hit rate with an 80% prefix ratio

Summary
- Demonstrated prefix caching for large scientific datasets
- Novel techniques to overlap remote I/O with cache I/O
- A simple prefix prediction model
- Patching with different storage transfer protocols
- Rich resource aggregation model
- Impact on cache hit ratio, providing a "virtual cache"
- In summary, a novel combination of techniques from the fields of HTTP multimedia streaming and parallel I/O
- Future:
  - Use patching cost in conjunction with frequency of accesses to determine which datasets, and how much of each, to keep in cache: latency-based cache replacement