1 FreeLoader: Lightweight Data Management for Scientific Visualization
Vincent Freeh (1), Xiaosong Ma (1,2), Nandan Tammineedi (1), Jonathan Strickland (1), Sudharshan Vazhkudai (2)
1. North Carolina State University  2. Oak Ridge National Laboratory
September, 2004
2 Roadmap
- Motivation
- FreeLoader architecture
- Initial design and optimization
- Preliminary results
- In-progress and future work
3 Motivation: Data Avalanche
- More data to process: science, industry, government
- Example: scientific data
  - Better observational instruments
  - Better experimental instruments
  - More simulation power
(Pictures: Space Telescope; P&E Gene Sequencer, from http://www.genome.uci.edu/. Picture courtesy: Jim Gray, SLAC Data Management Workshop)
4 Motivation: Needs for Remote Data
- Data acquisition, reduction, analysis, visualization, storage
(Diagram labels: Data Acquisition System; raw data; metadata; remote storage; High Speed Network; supercomputers; local users; remote users; remote users with local computing and storage)
5 Motivation: Remote Data Sources
- Supercomputing centers: shared file systems, archiving systems
- Data centers
- Internet: World Wide Telescope, Virtual Observatory, NCBI bio databases
- Tools used in access: FTP, GridFTP, Grid file systems, customized data migration programs, web browsers
6 Motivation: Insufficient Local Storage
- End user consumes data locally: convenience and control, better CPU/memory configurations
- Problem 1: needs local space to hold the data
- Problem 2: getting data from remote sources is slow
- Dataset characteristics:
  - Write-once, read-many (or a few)
  - Raw data often discarded
  - Shared interest in the same data among groups
  - Primary copy archived somewhere
7 Condor for Storage?
Harnessing storage resources of individual workstations ~ harnessing idle CPU cycles
8 Why would it work, and work well?
- Average workstations have more and more GBs, and half of that space sits idle
- Even a modest contribution (contribution << available space) from many machines amasses a staggering aggregate capacity (e.g., 10 GB donated by each of 500 workstations is 5 TB)
- Increasing numbers of workstations are online most of the time [desktop-grid research]
- Access locality, aggregate I/O and network bandwidth, data sharing
9 Use Cases
FreeLoader storage cloud as a:
- Cache
- Local, client-side scratch space
- Intermediate hop
- Grid replica
- RAS for terascale supercomputers
10 Related Work and Design Issues
Related work:
- Network/distributed file systems (NFS, LOCUS)
- Parallel file systems (PVFS, XFS)
- Serverless file systems (FARSITE, xFS, GFS)
- Peer-to-peer storage (OceanStore, PAST, CFS)
- Grid storage services (LegionFS, SRB, IBP, SRM, GASS)
Design issues & assumptions:
- Scalability: O(100) or O(1000) nodes
- Commodity components
- User autonomy
- Security and trust
- Heterogeneity
- Large, "write once, read many" datasets
- Transparent naming
- Grid awareness
11 Intended Role of FreeLoader
What the scavenged storage is not:
- Not a replacement for high-end storage
- Not a file system
- Not intended for integrating resources at wide-area scale
What it is:
- A low-cost, best-effort alternative to scientific data sources
- Intended to facilitate transient access to large, read-only datasets and data sharing within an administrative domain
- To be used in conjunction with higher-end storage systems
12 FreeLoader Architecture
(Architecture diagram: Grid data access tools, management layer, storage layer of pools)
- Management layer: data placement, replication, Grid awareness, metadata management
- Storage layer: pools (pool A, ..., pool m, pool n) register with the management layer; morsel access, data integrity, non-invasiveness
13 Storage Layer
Benefactors:
- Morsels as the unit of contribution
- Basic morsel operations [new(), free(), get(), put(), ...] (sketched below)
- Space reclaim: user withdrawal / space shrinkage
- Data integrity through checksums
- Performance history
Pools:
- Benefactor registrations (soft state)
- Dataset distributions
- Metadata
- Selection heuristics
(Diagram: morsels of dataset 1 and dataset n spread across the pool's benefactors)
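To make the benefactor-side morsel interface concrete, here is a minimal Python sketch of the four basic operations named above, with checksums for data integrity. The class, the fixed 1 MB morsel size, and the SHA-1 choice are illustrative assumptions, not the actual FreeLoader implementation.

```python
import hashlib

MORSEL_SIZE = 1 << 20  # assumed 1 MB morsel; the real unit size is a tunable

class Benefactor:
    """Illustrative benefactor that donates space in fixed-size morsels."""

    def __init__(self, donated_bytes):
        self.capacity = donated_bytes // MORSEL_SIZE   # morsels contributed
        self.store = {}                                # morsel_id -> bytes
        self.checksums = {}                            # morsel_id -> sha1 hex digest
        self.next_id = 0

    def new(self):
        """Allocate an empty morsel; fails when the donation is exhausted."""
        if len(self.store) >= self.capacity:
            raise RuntimeError("no free morsels on this benefactor")
        mid, self.next_id = self.next_id, self.next_id + 1
        self.store[mid] = b""
        self.checksums[mid] = hashlib.sha1(b"").hexdigest()
        return mid

    def put(self, mid, data):
        """Fill a morsel and record its checksum for later integrity checks."""
        assert len(data) <= MORSEL_SIZE
        self.store[mid] = data
        self.checksums[mid] = hashlib.sha1(data).hexdigest()

    def get(self, mid):
        """Return morsel data, verifying it against the stored checksum."""
        data = self.store[mid]
        if hashlib.sha1(data).hexdigest() != self.checksums[mid]:
            raise IOError("morsel %d failed integrity check" % mid)
        return data

    def free(self, mid):
        """Release a morsel, e.g., on space reclaim or user withdrawal."""
        self.store.pop(mid, None)
        self.checksums.pop(mid, None)
```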
14 Management Layer
Manager:
- Pool registrations
- Metadata: datasets-to-pools, pools-to-benefactors, etc.
- Availability: redundant array of replicated morsels; minimum replication factor for morsels; where to replicate, and which morsel replica to choose? (one possible policy sketched below)
- Grid awareness: information providers, space reservations, transfer protocols
- Transparent access: namespace
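The slide leaves "where to replicate" and "which replica to choose" open. Below is a minimal sketch of one plausible policy: keep every morsel at or above a minimum replication factor and read from the replica whose benefactor has the best observed bandwidth. The metadata maps, the bandwidth history, and the policy itself are assumptions for illustration, not the manager's actual algorithm.

```python
# Illustrative manager-side metadata; the maps mirror the datasets-to-pools /
# pools-to-benefactors relations named on the slide.
dataset_to_morsels = {"climate.nc": [0, 1, 2, 3]}                    # dataset -> morsel ids
morsel_replicas = {0: ["b1", "b3"], 1: ["b2"], 2: ["b1", "b2"], 3: ["b3"]}
benefactor_bw = {"b1": 9.5, "b2": 11.2, "b3": 4.0}                   # observed MB/s (performance history)

MIN_REPLICAS = 2  # assumed minimum replication factor

def choose_replica(morsel_id):
    """Pick the replica hosted by the benefactor with the best history."""
    return max(morsel_replicas[morsel_id], key=lambda b: benefactor_bw[b])

def needs_replication(morsel_id):
    """Flag morsels that have fallen below the minimum replication factor."""
    return len(morsel_replicas[morsel_id]) < MIN_REPLICAS

for mid in dataset_to_morsels["climate.nc"]:
    print(mid, "read from", choose_replica(mid), "under-replicated:", needs_replication(mid))
```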
15 Dataset Striping
- Stripe datasets across benefactors; the morsel doubles as the basic unit of striping (see the sketch below)
- Multiple-fold benefits: higher aggregate access bandwidth, better resource usage, lower impact per benefactor
- Tradeoff between access rates and availability
- Need to consider: heterogeneity and network connections; working together with replication; serving partial datasets
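A minimal sketch of striping with the morsel as the stripe unit, using round-robin placement. The prototype implements "simple data striping" (slide 16); the round-robin order and the 1 MB morsel size here are assumptions for illustration.

```python
MORSEL_SIZE = 1 << 20  # assumed 1 MB stripe unit

def stripe_dataset(data, benefactors):
    """Cut a dataset into morsels and assign them round-robin to benefactors.

    Returns a layout map {benefactor: [(morsel_index, bytes), ...]} that a
    manager could record as the dataset's distribution.
    """
    layout = {b: [] for b in benefactors}
    for offset in range(0, len(data), MORSEL_SIZE):
        idx = offset // MORSEL_SIZE
        target = benefactors[idx % len(benefactors)]
        layout[target].append((idx, data[offset:offset + MORSEL_SIZE]))
    return layout

# A client can then fetch morsels from all benefactors concurrently and
# reassemble them in index order, aggregating bandwidth across 100 Mb links.
layout = stripe_dataset(b"x" * (5 * MORSEL_SIZE + 123), ["b1", "b2", "b3"])
print({b: [idx for idx, _ in morsels] for b, morsels in layout.items()})
```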
16 Current Status
(Prototype diagram: application client, manager, and benefactor OSes; client-side I/O interface open()/close()/read()/write(); client API reserve()/cancel()/store()/retrieve()/delete(); benefactor morsel interface new()/free()/get()/put())
- (A) services (UDP): dataset creation/deletion, space reservation
- (B) services (UDP/TCP): dataset retrieval, hints
- (C) services (UDP): registration; benefactor alerts, warnings, alarms to the manager
- (D) services (UDP/TCP): dataset store, morsel requests
- Simple data striping (an end-to-end retrieval sketch follows below)
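To show how the services above might fit together, here is a hedged end-to-end sketch: the client obtains a dataset's morsel layout from the manager (a (B)-style retrieval hint) and then pulls each morsel from the benefactor holding it. The stub classes, method names, and in-process calls are assumptions; the prototype exchanges these messages over UDP/TCP between separate processes.

```python
# Hypothetical, in-process stand-ins for the manager and benefactor services.
class ManagerStub:
    def __init__(self):
        self.layouts = {}                       # dataset -> [(morsel_idx, benefactor_id)]

    def store(self, name, layout):              # (A)/(D)-style dataset store
        self.layouts[name] = layout

    def retrieve(self, name):                   # (B)-style retrieval hint
        return self.layouts[name]

class BenefactorStub:
    def __init__(self):
        self.morsels = {}                       # morsel_idx -> bytes

    def put(self, idx, data):
        self.morsels[idx] = data

    def get(self, idx):
        return self.morsels[idx]

def client_read(manager, benefactors, name):
    """Client-side read(): get the layout hint, fetch morsels, reassemble in order."""
    pieces = {idx: benefactors[bid].get(idx) for idx, bid in manager.retrieve(name)}
    return b"".join(pieces[i] for i in sorted(pieces))

# Tiny usage example with two benefactors and a three-morsel dataset.
m, bs = ManagerStub(), {"b1": BenefactorStub(), "b2": BenefactorStub()}
for idx, bid, data in [(0, "b1", b"AAA"), (1, "b2", b"BBB"), (2, "b1", b"CCC")]:
    bs[bid].put(idx, data)
m.store("demo", [(0, "b1"), (1, "b2"), (2, "b1")])
print(client_read(m, bs, "demo"))               # b"AAABBBCCC"
```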
17 Preliminary Results: Experiment Setup
FreeLoader prototype running at ORNL
Client box:
- AMD Athlon 700 MHz, 400 MB memory
- Gig-E card
- Linux 2.4.20-8
Benefactors:
- Group of heterogeneous Linux workstations
- Contributing 7 GB to 30 GB each
- 100 Mb cards
18 Sample Data Sources
Local GPFS:
- Attached to ORNL SPs
- Accessed through GridFTP (1 MB TCP buffer, 4 parallel streams)
Local HPSS:
- Accessed through the HSI client, highly optimized
- Hot: data in disk cache, no tape unloading; Cold: data purged, retrievals done at large intervals
Remote NFS:
- At the NCSU HPC center
- Accessed through GridFTP (1 MB TCP buffer, 4 parallel streams)
19 FreeLoader Data Retrieval Performance
(Chart: throughput in MB/s)
20 Impact Tests
- How uncomfortable might donors feel?
- A set of tests at NCSU: a benefactor performs local tasks while a client retrieves datasets from it at a given rate (see the pacing sketch below)
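One way to drive such a test is to pace the client's morsel requests toward a target rate, as in the assumed sketch below; this is illustrative, not the harness used for the measurements that follow.

```python
import time

MORSEL_SIZE = 1 << 20  # assumed 1 MB request unit

def paced_retrieval(fetch_morsel, n_morsels, rate_mb_per_s):
    """Retrieve n_morsels, sleeping between requests so the average
    client-imposed load on the benefactor stays near rate_mb_per_s."""
    interval = (MORSEL_SIZE / (1 << 20)) / rate_mb_per_s   # seconds per morsel
    start = time.time()
    for i in range(n_morsels):
        fetch_morsel(i)                                     # e.g., a benefactor get()
        delay = start + (i + 1) * interval - time.time()
        if delay > 0:
            time.sleep(delay)

# Usage: measure local-task slowdown (compile time, download time, disk
# throughput) while paced_retrieval() runs against the same benefactor.
paced_retrieval(lambda i: None, n_morsels=5, rate_mb_per_s=10)
```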
21 CPU-intensive Task
(Chart: task completion time in seconds)
22 Network-intensive Task
(Chart: normalized download time)
23 Disk-intensive Task
(Chart: throughput in MB/s)
24 Mixed Task: Linux Kernel Compilation
(Chart: compilation time in seconds)
25 In-progress and Future Work
In progress:
- APIs for use as scratch space
- Windows support
Future:
- Complete pool structure and registration
- Intelligent data distribution, service profiling
- Benefactor impact control, self-configuration
- Naming and replication
- Grid awareness
Potential extensions:
- Harnessing local storage at cluster nodes?
- Complementing commercial storage servers?
26 Further Information
http://www.csm.ornl.gov/~vazhkuda/Morsels/