FreeLoader: Scavenging Desktop Storage Resources for Scientific Data

Presentation transcript:

FreeLoader: Scavenging Desktop Storage Resources for Scientific Data

Sudharshan Vazhkudai (1), Xiaosong Ma (1,2), Vincent Freeh (2), Jonathan Strickland (2), Nandan Tammineedi (2), and Stephen Scott (1)
(1) Oak Ridge National Laboratory   (2) North Carolina State University

SC|05 Technical Paper Presentation, Session: Storage and Data
November 17, 2005, Seattle, WA

Outline
- Problem space
- Desktop storage scavenging for scientific data
- FreeLoader architecture
- FreeLoader performance in a user's HPC setting
- Philosophizing...
- Wrap up on a funny note!

Problem Domain
- Data deluge
  - Experimental facilities: SNS, LHC (PBs/yr)
  - Observatories: sky surveys, world-wide telescopes
  - Simulations from NLCF end-stations
  - Internet archives: NIH GenBank (serves 100 gigabases of sequence data)
- Typical user access patterns for large scientific data
  - Download remote datasets using favorite tools: FTP, GridFTP, hsi, wget
  - Shared interest among groups of researchers: a bioinformatics group collectively analyzes and visualizes a sequence database for a few days; locality of interest!
  - Datasets are often discarded once interest dissipates

So, what's the problem with this story?
- Wide-area data movement is full of pitfalls
  - Server bottlenecks, bandwidth/latency fluctuations
  - Tuned tools like GridFTP are not widely available; popular Internet repositories are still served through modest transfer tools!
  - User applications are often latency intolerant, e.g., real-time visualization rendering of a Microsoft TerraServer map on ORNL's tiled display!
- Why can't we address this with the current storage landscape?
  - Shared storage: limited quotas
  - Dedicated storage: SAN storage is a non-trivial expense (a 4 TB disk array ~ $40K)
  - Local storage: usually not enough for such large datasets
  - Archiving in mass storage for future accesses: high latency
- Upshot: retrieval rates are significantly lower than local I/O or LAN throughput

Is there a silver lining at all? (Desktop traits)
- Desktop capabilities are better than ever before
  - The ratio of used space to available storage is quite low in academic and industry settings
  - Increasing numbers of workstations are online most of the time
    - At ORNL-CSMD, ~600 machines are estimated to be online at any given time
    - At NCSU, >90% availability across 500 machines
  - Well-connected, secure LAN settings: a high-speed LAN connection can stream data faster than local disk I/O

Desktop Storage Scavenging?
- FreeLoader: imagine Condor for storage
  - Harness the collective storage potential of desktop workstations, analogous to harnessing idle CPU cycles
  - Increased throughput due to striping: split large datasets into pieces, called morsels, and stripe them across desktops (see the sketch below)
- Scientific data trends
  - Usually write-once-read-many
  - A remote copy is held elsewhere
  - Accesses are primarily sequential
- Data trends + LAN and desktop traits + user access patterns make collaborative caches built on storage scavenging a viable alternative!
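
The morsel layout described above can be made concrete with a small sketch. The following is a hypothetical illustration of round-robin morsel placement; the morsel size, class, and function names are assumptions for illustration, not FreeLoader's actual data structures.

```python
# Hypothetical sketch of morsel-based striping (not the actual FreeLoader code).
# A dataset is cut into fixed-size morsels and assigned round-robin to donor nodes.

from dataclasses import dataclass

MORSEL_SIZE = 1 << 20  # assume 1 MB morsels for illustration

@dataclass
class Morsel:
    dataset: str
    index: int      # position within the dataset
    donor: str      # workstation chosen to hold this morsel
    offset: int     # byte offset in the dataset
    length: int     # morsel length in bytes

def stripe_dataset(name: str, size: int, donors: list[str]) -> list[Morsel]:
    """Assign each morsel of a dataset to a donor, round-robin."""
    morsels = []
    n = (size + MORSEL_SIZE - 1) // MORSEL_SIZE
    for i in range(n):
        offset = i * MORSEL_SIZE
        morsels.append(Morsel(
            dataset=name,
            index=i,
            donor=donors[i % len(donors)],
            offset=offset,
            length=min(MORSEL_SIZE, size - offset),
        ))
    return morsels

# Example: a 10 MB dataset striped across three donor workstations.
layout = stripe_dataset("genbank-slice", 10 * (1 << 20), ["ws01", "ws02", "ws03"])
print(layout[0], layout[-1])
```

Round-robin placement is only one option; the access-pattern aware striping discussed in later slides biases placement toward the uploading client.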

Old wine in a new bottle?
- Key strategies derived from "best practices" across a broad range of storage paradigms:
  - Desktop storage scavenging, from P2P systems
  - Striping and parallel I/O, from parallel file systems
  - Caching, from cooperative Web caching
- Applied to scientific data management for access locality, aggregation of I/O and network bandwidth, and data sharing
- Posing new challenges and opportunities: heterogeneity, striping, volatility, donor impact, cache management, and availability

FreeLoader Environment (figure)

FreeLoader Architecture
- Lightweight UDP-based communication
- Scavenger device: metadata bitmaps, morsel organization
- Morsel service layer
- Monitoring and impact control
- Global free-space management
- Metadata management
- Soft-state registrations (see the sketch after this slide)
- Data placement
- Cache management
- Profiling
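
The transcript lists soft-state registrations without detail. Below is a minimal, hypothetical sketch of how a benefactor could periodically re-register its free space with the manager over lightweight UDP; the message format, endpoint, and timeout values are assumptions, not FreeLoader's actual wire protocol.

```python
# Hypothetical soft-state registration sketch (not FreeLoader's actual protocol).
# A donor periodically announces its free space over UDP; the manager expires
# donors whose announcements stop arriving.

import json
import socket
import time

MANAGER_ADDR = ("127.0.0.1", 9000)   # assumed manager endpoint
HEARTBEAT_PERIOD = 10                # seconds between registrations (assumption)
EXPIRY = 3 * HEARTBEAT_PERIOD        # drop donors silent for three periods

def donor_heartbeat(donor_id: str, free_bytes: int) -> None:
    """Send one soft-state registration message to the manager."""
    msg = json.dumps({"donor": donor_id, "free": free_bytes, "ts": time.time()})
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(msg.encode(), MANAGER_ADDR)

def expire_stale(registry: dict[str, dict], now: float) -> None:
    """Manager side: drop donors that have not re-registered recently (soft state)."""
    for donor, info in list(registry.items()):
        if now - info["ts"] > EXPIRY:
            del registry[donor]

# Example: register once, then prune a toy registry with a stale entry.
donor_heartbeat("ws01", 42 * (1 << 30))
registry = {"ws01": {"free": 42 * (1 << 30), "ts": time.time() - 60}}
expire_stale(registry, time.time())   # ws01 is dropped: 60 s > EXPIRY (30 s)
print(registry)
```

Soft state keeps the manager's view self-correcting: a crashed or withdrawn donor simply stops refreshing and ages out, with no explicit deregistration required.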

Testbed and Experiment Setup
- FreeLoader installed in a user's HPC setting
- GridFTP access to NFS
- GridFTP access to PVFS
- hsi access to HPSS
  - Cold data from tapes
  - Hot data from disk caches
- wget access to an Internet archive

Comparing FreeLoader with other storage systems (figure)

Client Access-Pattern Aware Striping
- The uploading client is likely to access the dataset most frequently, so optimize data placement for that client
- Overlap network I/O with local I/O
- What is the optimal local:remote data ratio? (see the model sketch below)
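
The model itself is not reproduced in the transcript. A back-of-the-envelope version, assuming local and remote retrievals fully overlap, balances the two retrieval times: with local bandwidth b_local and aggregate remote bandwidth b_remote, total retrieval time is max(p*S/b_local, (1-p)*S/b_remote), which is minimized when the local fraction is p = b_local / (b_local + b_remote). The sketch below illustrates this simplified reasoning, not the paper's exact formulation.

```python
# Back-of-the-envelope local:remote split (a simplified sketch, not the
# paper's exact striping model).
# If local and remote reads fully overlap, retrieval time for a dataset of
# size S is max(p*S/b_local, (1-p)*S/b_remote), minimized when both terms
# are equal.

def optimal_local_fraction(b_local: float, b_remote: float) -> float:
    """Fraction of each dataset to keep on the uploading client's own disk."""
    return b_local / (b_local + b_remote)

# Example: 50 MB/s local disk, 90 MB/s aggregate remote (LAN-limited) bandwidth.
p = optimal_local_fraction(50.0, 90.0)
print(f"store {p:.0%} locally, stripe {1 - p:.0%} across remote donors")
```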

Striping Parameters (figure)

Client-side Filters (figure)

Computation Impact (figure)

Network Activity Test (figure)

Disk-intensive Task (figure)

Impact Control (figure)

Philosophizing...
- What scavenged storage "is not":
  - Not a file system, and not a replacement for high-end storage
  - Not intended for wide-area resource integration
- What it "is":
  - A low-cost, best-effort storage cache for scientific data sources
  - Intended to facilitate transient access to large, read-only datasets and data sharing within an administrative domain
  - To be used in conjunction with higher-end storage systems
