Optimizing End-User Data Delivery Using Storage Virtualization

Presentation transcript:

Optimizing End-User Data Delivery Using Storage Virtualization
Sudharshan Vazhkudai, Oak Ridge National Laboratory
Ohio State University Systems Group Seminar
October 20th, 2006, Columbus, Ohio

Outline
- Problem space: Client-side caching
- Storage Virtualization: FreeLoader Desktop Storage Cache
- A Virtual cache: Prefix caching
- End on a funny note!!

Problem Domain
- Data deluge
  - Experimental facilities: SNS, LHC (PBs/yr)
  - Observatories: sky surveys, world-wide telescopes
  - Simulations from NLCF end-stations
  - Internet archives: NIH GenBank (serves 100 gigabases of sequence data)
- Typical user access traits on large scientific data
  - Download remote datasets using favorite tools: FTP, GridFTP, hsi, wget
  - Shared interest among groups of researchers: a bioinformatics group collectively analyzes and visualizes a sequence database for a few days; locality of interest!
  - Oftentimes, the original datasets are discarded after interest dissipates

So, what's the problem with this story?
- Wide-area data movement is full of pitfalls
  - Server bottlenecks, bandwidth/latency fluctuations
  - GridFTP-like tuned tools not widely available; popular Internet repositories are still served through modest transfer tools!
  - User applications are often latency intolerant, e.g., real-time visualization rendering of a Microsoft TerraServer map on ORNL's tiled display!
- Why can't we address this with the current storage landscape?
  - Shared storage: limited quotas
  - Dedicated storage: SAN storage is a non-trivial expense! (4 TB disk array ~ $40K)
  - Local storage: usually not enough for such large datasets
  - Archiving in mass storage for future accesses: high latency
- Upshot: retrieval rates are significantly lower than local I/O or LAN throughput

Is there a silver lining at all? (Desktop Traits)
- Desktop capabilities are better than ever before
  - The ratio of space used to available storage is significantly low in academic and industry settings
- Increasing numbers of workstations are online most of the time
  - At ORNL-CSMD, ~600 machines are estimated to be online at any given time
  - At NCSU, >90% availability across 500 machines
- Well-connected, secure LAN settings
  - A high-speed LAN connection can stream data faster than local disk I/O

Storage Virtualization?
Can we use novel storage abstractions to provide:
- More storage than is locally available
- Better performance than local or remote I/O
- A seamless architecture for accessing and storing transient data

Desktop Storage Scavenging as a means to virtualize I/O access: FreeLoader
- Imagine Condor for storage: harness the collective storage potential of desktop workstations, much like harnessing idle CPU cycles
- Increased throughput due to striping: split large datasets into pieces, morsels, and stripe them across desktops (see the sketch below)
- Scientific data trends
  - Usually write-once-read-many
  - Remote copy held elsewhere
  - Primarily sequential accesses
- Data trends + LAN/desktop traits + user access patterns make collaborative caches built on storage scavenging a viable alternative!
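A minimal Python sketch of the morsel-striping idea. The donor names, helper names, and the plain round-robin policy are illustrative assumptions, not FreeLoader's actual code; the 1 MB chunk size is the one mentioned later in the talk.

# Round-robin striping of a dataset into fixed-size "morsels" across donor nodes.
MORSEL_SIZE = 1 << 20  # 1 MB per morsel (stripe unit assumed from the talk)

def stripe_dataset(dataset_size, donors):
    """Return a placement map: morsel index -> (donor, byte offset within dataset)."""
    placement = {}
    num_morsels = (dataset_size + MORSEL_SIZE - 1) // MORSEL_SIZE
    for i in range(num_morsels):
        donor = donors[i % len(donors)]          # round-robin across donor workstations
        placement[i] = (donor, i * MORSEL_SIZE)
    return placement

if __name__ == "__main__":
    donors = ["desk01", "desk02", "desk03"]      # hypothetical scavenged workstations
    for idx, (donor, offset) in stripe_dataset(5 * MORSEL_SIZE, donors).items():
        print(f"morsel {idx} -> {donor} @ offset {offset}")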

Old wine in a new bottle…?
- Key strategies derived from "best practices" across a broad range of storage paradigms:
  - Desktop storage scavenging from P2P systems
  - Striping and parallel I/O from parallel file systems
  - Caching from cooperative Web caching
- Applied to scientific data management for access locality, I/O and network bandwidth aggregation, and data sharing
- Posing new challenges and opportunities: heterogeneity, striping, volatility, donor impact, cache management, and availability

FreeLoader Environment

FreeLoader Architecture
- Lightweight UDP
- Scavenger device: metadata bitmaps, morsel organization
- Morsel service layer
- Monitoring and impact control
- Global free space management
- Metadata management
- Soft-state registrations (see the sketch below)
- Data placement
- Cache management
- Profiling
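To illustrate one of these pieces, here is a minimal Python sketch of how soft-state registrations could work: benefactors periodically re-register, and the manager expires any donor whose registration lapses. The TTL value and data structures are assumptions, not FreeLoader's actual protocol.

import time

REGISTRATION_TTL = 30.0  # seconds a registration stays valid without renewal (assumed)

class BenefactorRegistry:
    def __init__(self):
        self._last_seen = {}   # benefactor id -> last registration timestamp
        self._capacity = {}    # benefactor id -> donated bytes

    def register(self, benefactor_id, donated_bytes):
        """Called on every (re-)registration message from a benefactor."""
        self._last_seen[benefactor_id] = time.time()
        self._capacity[benefactor_id] = donated_bytes

    def live_benefactors(self):
        """Expire stale entries, then return the currently live donors."""
        now = time.time()
        stale = [b for b, t in self._last_seen.items() if now - t > REGISTRATION_TTL]
        for b in stale:
            del self._last_seen[b]
            del self._capacity[b]
        return dict(self._capacity)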

Testbed and Experiment Setup
- FreeLoader installed in a user's HPC setting
- GridFTP access to NFS
- GridFTP access to PVFS
- hsi access to HPSS: cold data from tapes, hot data from disk caches
- wget access to an Internet archive

Comparing FreeLoader with other storage systems

Optimizing access to the cache: Client Access-pattern Aware Striping
- The uploading client is likely to access the dataset most frequently, so let's optimize data placement for that client!
- Keep part of the data local to the client and overlap network I/O with local I/O
- What is the optimal local:remote data ratio? Model (see the sketch below)
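A minimal sketch of one plausible form of this model, assuming the split is chosen so that concurrent local-disk and network reads finish at the same time; the balancing assumption and the example rates are mine, not necessarily the model used in the talk.

def local_fraction(local_rate, remote_rate):
    """Fraction of a dataset to keep on the uploading client's local disk.

    Assumes local and remote portions are read concurrently, so the split
    that hides the slower path makes both finish at the same time:
        (f * S) / local_rate = ((1 - f) * S) / remote_rate
    which gives f = local_rate / (local_rate + remote_rate).
    Rates in MB/s; the numbers below are hypothetical.
    """
    return local_rate / (local_rate + remote_rate)

if __name__ == "__main__":
    f = local_fraction(local_rate=50.0, remote_rate=80.0)   # hypothetical rates
    print(f"store {f:.0%} locally, stripe {1 - f:.0%} across remote donors")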

Philosophizing…
- What the scavenged storage "is not":
  - Not a file system, not a replacement for high-end storage
  - Not intended for wide-area resource integration
- What it "is":
  - A low-cost, best-effort storage cache for scientific data sources
  - Intended to facilitate transient access to large, read-only datasets and data sharing within an administrative domain
  - To be used in conjunction with higher-end storage systems

Towards a "virtual cache"
- Scientific data caches typically host complete datasets
- Not always feasible in our environment, since:
  - Desktop workstations can fail, or space contributions can be withdrawn, leaving partial datasets
  - There may not be enough space in the cache to host a new dataset in its entirety
  - Cache evictions can leave partial copies of datasets
- Can we host partial copies of datasets and yet serve client accesses to the entire dataset?
- Analogy: FreeLoader is to the remote data source as a file system's buffer cache is to the disk

The Prefix Caching Problem: Impedance Matching on Steroids!!
- HTTP prefix caching: multimedia, streaming data delivery
- BitTorrent P2P system: leechers can download and yet serve
- Benefits
  - Bootstrapping the download process
  - Store more datasets
  - Allows for efficient cache management
- Oh…, those scientific data trends again (how convenient…): immutable data, remote source copy, primarily sequential accesses
- Challenges
  - Clients should be oblivious to a dataset being only partially available; performance hit?
  - How much of a dataset's prefix to cache, so that client accesses can progress seamlessly?
  - Online patching issues: mismatch between client-access I/O and remote-patching I/O, wide-area download vagaries

Virtual Cache Architecture
- Capability-based resource aggregation: persistent-storage donors & bandwidth-only donors
- Client serving: parallel get
- Remote patching using URIs
- Better cache management (see the eviction sketch below)
  - Stripe a dataset entirely when space is available
  - When eviction is needed, only stripe a prefix of the dataset
  - Victims chosen by LRU: evict chunks from the tail until only a prefix remains
  - Entire datasets are evicted only after all such tails have been evicted
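A minimal Python sketch of this eviction policy, assuming each cached dataset tracks how many chunks it holds, a predicted prefix floor, and an LRU timestamp; the data structures and chunk size are illustrative assumptions.

# Tail-first eviction: free space by trimming chunks from the tail (suffix) of
# least recently used datasets, so prefixes stay cached as long as possible.
# Whole datasets go only after all tails have been trimmed down to their prefixes.
CHUNK = 1 << 20  # 1 MB chunks (assumed stripe unit)

class CachedDataset:
    def __init__(self, name, cached_chunks, prefix_chunks, last_access):
        self.name = name
        self.cached_chunks = cached_chunks    # chunks currently held, from the front
        self.prefix_chunks = prefix_chunks    # predicted prefix to retain if possible
        self.last_access = last_access        # LRU timestamp

def evict(datasets, bytes_needed):
    """Free at least bytes_needed; returns the number of bytes actually freed."""
    freed = 0
    # Pass 1: trim suffixes down to each dataset's prefix, LRU datasets first.
    for ds in sorted(datasets, key=lambda d: d.last_access):
        while ds.cached_chunks > ds.prefix_chunks and freed < bytes_needed:
            ds.cached_chunks -= 1             # evict one chunk from the tail
            freed += CHUNK
    # Pass 2: if still short, evict whole datasets (prefixes included), LRU first.
    for ds in sorted(datasets, key=lambda d: d.last_access):
        if freed >= bytes_needed:
            break
        freed += ds.cached_chunks * CHUNK
        ds.cached_chunks = 0
    datasets[:] = [d for d in datasets if d.cached_chunks > 0]
    return freed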

Prefix Size Prediction
- Goal: eliminate client-perceived delay in data access
- What is the optimal prefix size to hide the cost of suffix patching?
- The prefix size depends on:
  - Dataset size, S
  - In-cache data access rate by the client, R_client
  - Suffix patching rate, R_patch
  - Initial latency in suffix patching, L
- The client access rate is indicative of the time available to patch: suffix patching must finish by the time the client has read through the whole dataset, so
  S / R_client = L + (S - S_prefix) / R_patch
  Thus, S_prefix = S * (1 - R_patch / R_client) + L * R_patch
  (a worked sketch follows below)
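The same formula as a small Python sketch; the numeric inputs are hypothetical, chosen only to show the calculation.

def prefix_size(S, R_client, R_patch, L):
    """Predicted prefix size (same units as S) so that suffix patching finishes
    just as the client reaches it: S_prefix = S*(1 - R_patch/R_client) + L*R_patch.
    A non-positive result means the whole suffix can be patched on the fly."""
    return max(0.0, S * (1.0 - R_patch / R_client) + L * R_patch)

if __name__ == "__main__":
    # Hypothetical example: a 2000 MB dataset, client reads the cache at 52 MB/s,
    # remote patching runs at 10 MB/s with 4 s startup latency.
    S, R_client, R_patch, L = 2000.0, 52.0, 10.0, 4.0
    sp = prefix_size(S, R_client, R_patch, L)
    print(f"cache a {sp:.0f} MB prefix ({sp / S:.0%} of the dataset)")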

Collective Download
- Why?
  - Wide-area transfer reasons: storage systems and protocols for HEC are tuned for bulk transfers (GridFTP, HSI); wide-area transfer pitfalls include high latency and connection-establishment cost
  - Client's local-area cache access reasons: client accesses to the cache use a smaller stripe size (e.g., 1 MB chunks in FreeLoader); finer granularity gives better client access rates
- Can we derive from collective I/O in parallel I/O?

Collective Download Implementation
- Patching nodes perform bulk, remote I/O; ~256 MB per request
- Reducing multiple authentication costs per dataset
  - Automated interactive session with "Expect" for single sign-on
  - FreeLoader patching framework instrumented with Expect
  - Protocol needs to allow sessions (GridFTP, HSI)
- Need to reconcile the mismatch between the client access stripe size and the bulk, remote I/O request size: Shuffling (see the sketch below)
  - The p patching nodes redistribute the downloaded chunks among themselves according to the client's striping policy
  - Redistribution enables round-robin client access
  - Each patching node redistributes (p - 1)/p of the downloaded data
  - Shuffling is done in memory, to motivate BW-only donors
- Thus, client serving, collective download, and shuffling are all overlapped
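A minimal sketch of the shuffle step, assuming p patching nodes, ~256 MB bulk downloads, and round-robin ownership of 1 MB client-visible chunks; those sizes come from the slides, while the function and ownership rule are illustrative assumptions.

# Shuffle after collective download: a patching node downloads a contiguous bulk
# region, keeps the 1 MB chunks that round-robin striping assigns to it, and
# forwards the rest ((p-1)/p of the data) to the owning nodes.
CHUNK = 1 << 20          # 1 MB client-visible chunk (stripe unit)
BULK = 256 * (1 << 20)   # ~256 MB bulk remote I/O request

def shuffle_plan(region_offset, region_len, node_id, p):
    """Return (kept, forwarded) lists of (chunk offset, owner) for one bulk region."""
    kept, forwarded = [], []
    for off in range(region_offset, region_offset + region_len, CHUNK):
        owner = (off // CHUNK) % p           # round-robin chunk ownership
        (kept if owner == node_id else forwarded).append((off, owner))
    return kept, forwarded

if __name__ == "__main__":
    p = 4                                    # hypothetical number of patching nodes
    kept, forwarded = shuffle_plan(region_offset=0, region_len=BULK, node_id=0, p=p)
    total = len(kept) + len(forwarded)
    print(f"kept {len(kept)} chunks, forwarded {len(forwarded)} "
          f"(~{len(forwarded) / total:.0%}, i.e. (p-1)/p)")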

Testbed and Experiment Setup
- UberFTP stateful client to GridFTP servers at TeraGrid-PSC and TeraGrid-ORNL
- HSI access to HPSS: cold data from tapes
- FreeLoader patching framework deployed in this setting

Collective Download Performance (PW = 10; I/O = 256 MB per request)

Source       Download   Download + Shuffle   Client access
HPSS         13.6       12.3 (-9.6%)         11.7 (-4.9%)
Tera-ORNL    79.7       75.1 (-5.8%)         74.7 (-1.3%)
Tera-PSC     21.9       20.2 (-7.8%)         20 (-1.0%)

Prefix Size Model Verification

R_client (MB/s): 52.2

Data source   R_patch (MB/s)   L (s)   Predicted prefix ratio
HPSS-ORNL     7.6              31.4    95%
Tera-ORNL     42               3       24.6%
Tera-PSC      10.8             3.9     81%

Impact of Prefix Caching on Cache Hit Rate
- Jefferson Lab Asynchronous Storage Manager (JASMine) trace:
  - No. of days: 19.1
  - No. of accesses: 4000
  - No. of unique datasets: 1686
- Tera-ORNL will see improvements around the 0.2 and 0.4 curves (308% and 176% for 20% and 40% prefix ratios)
- Tera-PSC sees up to 76% improvement in hit rate with an 80% prefix ratio

Let me philosophize again…
- Novel storage abstractions as a means to:
  - Provide performance impedance matching: overlap remote I/O, cache I/O, and local I/O into a seamless "data pathway"
  - Provide rich resource aggregation models
  - Provide a low-cost, best-effort architecture for "transient" data
- A combination of best practices from parallel I/O, P2P scavenging, cooperative caching, and HTTP multimedia streaming, brought to bear on "scientific data caching"
- Intermediate data caching exploits this area

Let me advertise…
- http://www.csm.ornl.gov/~vazhkuda/Storage.html
- Email: vazhkudaiss@ornl.gov
- Collaborator: Xiaosong Ma (NCSU)
- Funding: DOE ORNL LDRD (Terascale & Petascale initiatives)
- Interested in joining our team? Full-time positions and summer internships available

More slides
- Some performance numbers
- Impact studies

Striping Parameters

Client-side Filters

Computation Impact

Network Activity Test

Disk-intensive Task

Impact Control