CSCE 990: Advanced Distributed Systems

Prof. Ying Lu, Spring 2017
Usage Pattern-Driven Dynamic Data Layout Reorganization
by Sai Suman
Department of Computer Science, University of Nebraska-Lincoln

What problem is the paper trying to address?
Scientific applications work on massive amounts of data.
The layout in which data is written often does not match the pattern in which it is read, and this leads to poor performance.
Example:
  Scientific simulations write the data of all variables at each time step (write layout).
  Analysis and visualization applications read a subset of variables over several time steps (read pattern).
  The mismatch between write layout and read pattern causes poor performance.

The solution is to increase contiguous I/O accesses.
Layout reorganization methods are one way to do that:
  Create multiple full replicas of the data, or
  Merge multiple non-contiguous data blocks into a single contiguous chunk.
But the framework needs to support multiple read access patterns, dynamically.

Dynamic data reorganization framework:
  Support for multiple read access patterns
  Redirection of read accesses to a favorable layout
Also:
  Dynamic tracing of data access patterns
  Efficient storage of replicas (partial replicas)

Approach
Dynamic pattern identification
  Trace read accesses at runtime
  Identify data usage patterns of the application
Flexible multi-layout management with storage budgets
  Evaluate user-specified storage constraints and current usage patterns
  Support multiple layout reorganization techniques
  Select the most suitable technique
Runtime decision making with partial match and redirection

Supported reorganization layouts
Figure: data reorganization techniques that the framework supports. The numbers in each cell are the starting offsets of the original data, and the arrow lines show the order of the reorganized offsets: (a) original row-major layout, (b) column-major (transposition) layout, (c) blocked (chunking) layout, used as a pre-processing step before applying (d) and (e), (d) z-curve, (e) Hilbert curve, (f) custom merging of a subset (data at offsets 5, 9, 13, 7, 11, 15).
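The layouts in the figure can be illustrated on a small grid. The sketch below assumes a 4x4 array and hypothetical helper names (`column_major`, `z_curve`, `merge_subset`); the traversal orientation of the z-curve is an assumption and may differ from the one drawn in the figure. Each function returns the original row-major offsets in the order a reorganized replica would store them.

```python
def column_major(rows, cols):
    """Original row-major offsets visited column by column (transposition)."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def morton_key(r, c, bits=16):
    """Interleave the bits of (r, c) to get a position along a z-curve."""
    key = 0
    for b in range(bits):
        key |= ((c >> b) & 1) << (2 * b)        # column bits go to even positions
        key |= ((r >> b) & 1) << (2 * b + 1)    # row bits go to odd positions
    return key


def z_curve(rows, cols):
    """Original row-major offsets visited in z-curve (Morton) order."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    cells.sort(key=lambda rc: morton_key(rc[0], rc[1]))
    return [r * cols + c for r, c in cells]


def merge_subset(offsets):
    """Custom merging: map each selected original offset to its slot
    in a new contiguous chunk, keeping the given order."""
    return {orig: new for new, orig in enumerate(offsets)}


if __name__ == "__main__":
    print(column_major(4, 4))
    print(z_curve(4, 4))
    print(merge_subset([5, 9, 13, 7, 11, 15]))
```

Neighbouring cells in a 2x2 block end up adjacent in the z-curve order, which is why such curves help reads that touch small multi-dimensional regions.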

Approach (contd…)
Runtime decision making with partial match and redirection
  Page-level cost model to estimate data access cost during decision making
  Automatic read redirection to the selected layout
  Partial matches allowed between read patterns and reorganized replicas
Allowing partial matches between read patterns and the reorganized replicas extends the usability of existing layouts compared to the exact-match strategy of previous work.

Design
Trace Analyzer
  Traces I/O read calls and identifies data access patterns
Layout Decision Maker
  Analyzes the cost of access over the available layouts for the requested data
  Selects the layout with the best access performance
Pattern and Layout Knowledge Base
  Holds metadata for the available layouts and the data access pattern history
Data Reorganization Manager
  Reorganizes and replicates the data with optimized layouts
  When data is present in different layouts, directs reads to the layout with the best performance

Design

Trace analysis and pattern recognition
To understand the data usage of applications, trace I/O read calls and identify patterns.
Data usage patterns:
  Variables within the dataset being accessed
  Variables within the accessed region
  Size of the request
Identify metadata from the read calls (HDF5) issued by the application and store it in an auxiliary data structure.
Analyze the data selection information:
  Element (point) selections
  Bounding boxes

Bounding box selection:
  The application reads data from a variable within a region bounded by spatial locations.
  Example: a sub-plane in a 2D array, a sub-cube in a 3D array
  Called a hyperslab in the HDF5 I/O library
  Users can select multiple bounding boxes using set operations.
  Useful when the data of a dataset is scattered in a file
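To see why the shape of a bounding box matters, the following sketch counts the contiguous byte runs a 2D bounding-box read produces on a row-major layout. It is an illustration only: `contiguous_runs` is a hypothetical helper, offsets are in elements, and no HDF5 call is involved.

```python
def contiguous_runs(shape, start, count):
    """Return (offset, length) runs, in elements, for a 2D bounding box
    read from a row-major array of the given shape."""
    rows, cols = shape
    r0, c0 = start
    nr, nc = count
    if r0 + nr > rows or c0 + nc > cols:
        raise ValueError("bounding box exceeds the array")
    if nc == cols:
        # Full-width rows are adjacent in the file, so they form one run.
        return [(r0 * cols, nr * cols)]
    # Otherwise every selected row is a separate run.
    return [((r0 + i) * cols + c0, nc) for i in range(nr)]


if __name__ == "__main__":
    shape = (1000, 1000)
    one_row = contiguous_runs(shape, (10, 0), (1, 1000))
    one_column = contiguous_runs(shape, (0, 10), (1000, 1))
    print(len(one_row), "run for a row;", len(one_column), "runs for a column")
```

Both selections return 1000 elements, but the column read needs 1000 small non-contiguous accesses, which is the case a transposed replica would turn into a single run.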

Element selection:
  Uses a query to find the data
  Requires the coordinates of the scattered data elements
  But a large number of non-contiguous reads with small request sizes results in low I/O throughput.
  Hence the need for optimization

Generic optimization: optimize without requiring the high-level selection criteria
  Does not change existing indexing techniques
Example: data of interest = high-energy particles, i.e. Energy > some x
  A small subset of elements would be accessed repeatedly.
  Placing those elements in a contiguous data chunk saves access time for future accesses.
  Range query
  Clustering
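The idea of packing repeatedly accessed elements into one contiguous chunk might look like the sketch below. The threshold `min_hits`, the trace format (a list of offset lists), and both function names are assumptions made for illustration; the slides do not say how the framework decides which elements are hot.

```python
from collections import Counter


def hot_elements(traced_reads, min_hits=2):
    """Offsets that appear in at least `min_hits` traced element selections."""
    counts = Counter(off for read in traced_reads for off in read)
    return sorted(off for off, n in counts.items() if n >= min_hits)


def build_merged_chunk(data, hot):
    """Copy hot elements into one contiguous chunk and keep an index that
    maps each original offset to its position in the chunk."""
    chunk = [data[off] for off in hot]
    index = {off: pos for pos, off in enumerate(hot)}
    return chunk, index


if __name__ == "__main__":
    data = list(range(0, 1000, 10))             # stand-in for a particle variable
    traces = [[3, 17, 42], [17, 42, 99], [42, 3]]
    hot = hot_elements(traces)
    chunk, index = build_merged_chunk(data, hot)
    print(hot, chunk, index[42])
```

No knowledge of the query (for example the energy threshold) is needed; only the traced offsets are used, which matches the "generic" framing above.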

Layout decision making
Find the best-matching replica using the pattern detection information, in 3 steps.
Step 1: Candidate selection
  A) Coarse-grained pruning: prune layouts that don't satisfy the request
  B) Fine-grained pruning: load the metadata of the remaining layouts and compare it with the requested regions

Replicas that only partially overlap with the requested data are considered not eligible, because the overlapping regions cannot be estimated accurately without loading the metadata of a replica.
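Step 1 could be sketched as two filters. Everything here is assumed for illustration: the `Box` and `Replica` types, the use of the variable name as the coarse-grained test, and full containment as the fine-grained test (so replicas that only partially overlap the request are dropped).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    lo: tuple   # inclusive lower corner
    hi: tuple   # exclusive upper corner

    def contains(self, other):
        """True if `other` lies entirely inside this box."""
        return all(a <= c and d <= b
                   for a, b, c, d in zip(self.lo, self.hi, other.lo, other.hi))


@dataclass(frozen=True)
class Replica:
    name: str
    variable: str
    region: Box


def select_candidates(replicas, variable, request):
    # Coarse-grained pruning: cheap test that needs no replica metadata
    # beyond the variable name (an assumption of this sketch).
    coarse = [r for r in replicas if r.variable == variable]
    # Fine-grained pruning: compare the stored region with the request and
    # keep only replicas that cover it completely.
    return [r for r in coarse if r.region.contains(request)]


if __name__ == "__main__":
    replicas = [
        Replica("r1", "energy", Box((0, 0), (100, 100))),
        Replica("r2", "energy", Box((50, 50), (100, 100))),   # partial overlap
        Replica("r3", "x", Box((0, 0), (100, 100))),          # other variable
    ]
    request = Box((40, 40), (60, 60))
    print([r.name for r in select_candidates(replicas, "energy", request)])
```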

Step 2: Layout ranking via cost model
  Use a page-level cost model to estimate the file read cost.
  T_r is the estimated read time for replica r across multiple Object Storage Targets (OSTs); O denotes the OSTs.
  The requested region is converted to linear space, and N_i^pg and N_i^chk are calculated within an OST.
  Select the replica with the smallest T_r.
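The cost formula itself did not survive in this transcript, so the sketch below is not the paper's model. It only illustrates the ranking step under a stand-in assumption: each OST's cost is a per-page term plus a per-chunk term, OSTs are read in parallel, and the slowest OST determines the replica's estimated time. The constants `page_cost` and `chunk_cost` and both function names are hypothetical.

```python
def estimate_read_time(per_ost, page_cost=1e-4, chunk_cost=5e-3):
    """per_ost: list of (n_pages, n_chunks) pairs, one per OST.
    Assumed model: OSTs work in parallel, so the slowest one dominates."""
    return max(
        (n_pg * page_cost + n_chk * chunk_cost for n_pg, n_chk in per_ost),
        default=0.0,
    )


def rank_replicas(candidates, **costs):
    """candidates: dict mapping replica name to its per-OST counts.
    Returns replica names ordered from cheapest to most expensive."""
    return sorted(candidates,
                  key=lambda name: estimate_read_time(candidates[name], **costs))


if __name__ == "__main__":
    candidates = {
        "original": [(100, 100), (100, 100)],   # same pages, many small chunks
        "merged": [(100, 1), (100, 1)],         # same pages, one chunk per OST
    }
    print(rank_replicas(candidates))
```

With equal page counts, the replica that serves the request from fewer separate chunks ranks first, which is the behaviour a reorganized layout is meant to exploit.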

Step 3: I/O redirection in HDF5
  Automatically redirect the read to the replica selected by the cost model, using the replica's metadata (file, variable name and path, etc.).
  When the model finds no suitable replica, the default HDF5 read is used.

Pattern and Layout Knowledge Base
Offline incremental analysis to extract and analyze data usage patterns
On a read call, analyze:
  When and how many processes are issuing read requests together
  Total size and I/O throughput
For each read, a usage pattern is generated and stored in the pattern history:
  {variable name/path, selection type and spatial region, process IDs, start/end time, total size, I/O rate}
Merge local patterns to create global patterns and store them in the knowledge base.
Also store each replica's original file and reorganization information.
The merge is done as a preprocessing step.
Global patterns help in identifying "hot" data.
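A usage pattern record and the local-to-global merge could be represented roughly as follows. The field names mirror the tuple listed above; the merge rule (group by variable and selection type, take the union bounding box, union the process IDs, sum the sizes) is an assumption of this sketch, since the slides do not spell it out.

```python
from dataclasses import dataclass


@dataclass
class UsagePattern:
    variable: str            # variable name/path
    selection_type: str      # e.g. "bounding_box" or "elements"
    region: tuple            # ((lo...), (hi...)) corners of the spatial region
    process_ids: set
    start: float             # seconds
    end: float               # seconds
    total_size: int          # bytes

    @property
    def io_rate(self):
        duration = self.end - self.start
        return self.total_size / duration if duration > 0 else 0.0


def merge_patterns(local_patterns):
    """Merge per-process (local) patterns into global ones."""
    merged = {}
    for p in local_patterns:
        key = (p.variable, p.selection_type)
        g = merged.get(key)
        if g is None:
            merged[key] = UsagePattern(p.variable, p.selection_type, p.region,
                                       set(p.process_ids), p.start, p.end,
                                       p.total_size)
            continue
        lo = tuple(min(a, b) for a, b in zip(g.region[0], p.region[0]))
        hi = tuple(max(a, b) for a, b in zip(g.region[1], p.region[1]))
        g.region = (lo, hi)
        g.process_ids |= p.process_ids
        g.start = min(g.start, p.start)
        g.end = max(g.end, p.end)
        g.total_size += p.total_size
    return list(merged.values())


if __name__ == "__main__":
    p1 = UsagePattern("/energy", "bounding_box", ((0, 0), (10, 10)), {0, 1}, 0.0, 2.0, 100)
    p2 = UsagePattern("/energy", "bounding_box", ((5, 5), (20, 8)), {1, 2}, 1.0, 5.0, 300)
    print(merge_patterns([p1, p2]))
```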

Data Layout Reorganization
3 separate tasks
1) Replica creation: when and how to create a new layout, using knowledge base information
3 scenarios:
  a) The original dataset can be reorganized, and limited additional storage space is allowed.
  b) The original dataset cannot be modified, but unlimited storage space is allowed.
  c) The original dataset can be reorganized, but no additional storage is allowed.

2) Replica eviction
Replicas are ranked according to a combination of:
  Recent usage
  Size effectiveness (performance improvement, time_old / time_new)
Evict older and less effective replicas.
More details are needed here, but the paper does not provide them.
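Since the exact ranking is not given, the following is only one way such an eviction policy might be put together. The score (speedup divided by one plus the age since last use), the dictionary fields, and the storage-budget loop are all assumptions of this sketch, not the paper's method.

```python
def keep_score(replica, now):
    """Higher means more worth keeping. Hypothetical combination of
    effectiveness (time_old / time_new) and recency of use."""
    age = now - replica["last_used"]
    speedup = replica["time_old"] / replica["time_new"]
    return speedup / (1.0 + age)


def evict_to_budget(replicas, budget, now):
    """Evict the lowest-scoring replicas until the total size fits the budget.
    Returns (kept, evicted)."""
    kept = sorted(replicas, key=lambda r: keep_score(r, now), reverse=True)
    evicted = []
    while kept and sum(r["size"] for r in kept) > budget:
        evicted.append(kept.pop())      # last entry has the lowest score
    return kept, evicted


if __name__ == "__main__":
    replicas = [
        {"name": "A", "last_used": 90, "time_old": 10.0, "time_new": 1.0, "size": 40},
        {"name": "B", "last_used": 10, "time_old": 2.0, "time_new": 1.0, "size": 40},
        {"name": "C", "last_used": 95, "time_old": 4.0, "time_new": 2.0, "size": 40},
    ]
    kept, evicted = evict_to_budget(replicas, budget=80, now=100)
    print([r["name"] for r in kept], [r["name"] for r in evicted])
```

Replica B is both the least recently used and the least effective, so it is the one removed when the budget only allows two replicas.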

3) OST-Aware Replica Placement
Figure:
  BAD: (a) Each process accesses data from all OSTs. (b) Each process accesses data from a subset of OSTs.
  IDEAL: (a) The number of processes equals the number of OSTs. (b) The number of processes is larger than the number of OSTs.

3) OST-Aware Replica Placement
  Stripe count = number of OSTs to access
  Stripe size = size of the data to write to an OST
  When multiple processes read contiguous data of size ~ (stripe count) * (stripe size), each OST is accessed by several processes, which causes contention.
  To avoid OST contention, each OST should be contacted by as few processes as possible.
  No details are provided on how this is done (?)
  OSTs are the storage devices.
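The contention argument can be checked with a small calculation. The sketch assumes simple round-robin striping (stripe k of a file lives on OST k mod stripe count) and hypothetical helper names; it does not reproduce how the framework actually places replicas, which the slides leave open.

```python
from collections import defaultdict

MIB = 1 << 20


def osts_touched(offset, length, stripe_size, stripe_count):
    """OST indices hit by one contiguous read, assuming round-robin striping."""
    if length <= 0:
        return set()
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    n = min(last - first + 1, stripe_count)   # beyond this, OSTs just repeat
    return {(first + k) % stripe_count for k in range(n)}


def readers_per_ost(requests, stripe_size, stripe_count):
    """requests: one (offset, length) pair per process.
    Returns {ost: number of distinct processes reading from it}."""
    load = defaultdict(set)
    for pid, (offset, length) in enumerate(requests):
        for ost in osts_touched(offset, length, stripe_size, stripe_count):
            load[ost].add(pid)
    return {ost: len(pids) for ost, pids in sorted(load.items())}


if __name__ == "__main__":
    # Four processes, each reading its own contiguous 4 MiB block.
    requests = [(i * 4 * MIB, 4 * MIB) for i in range(4)]
    print(readers_per_ost(requests, stripe_size=1 * MIB, stripe_count=4))
    print(readers_per_ost(requests, stripe_size=4 * MIB, stripe_count=4))
```

With 1 MiB stripes every process reads (stripe count) * (stripe size) bytes and therefore touches all four OSTs; with 4 MiB stripes each OST serves exactly one process, which corresponds to the IDEAL case where the number of processes equals the number of OSTs.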

Experimental comparisons
Plasma physics particle data: a subset of a trillion-particle dataset with Energy > 1.1
Queries are selected using Energy > some X.
1.3 to 90 times speedup overall

Electrocorticography (ECoG) data: records the electrical activity of the cerebral cortex while the patient reads different words at different times
2 to 6 times speedup overall
Here, the concatenation technique is used for layout reorganization.
Figure: read time comparison over different query selections

Adaptive Mesh Refinement (AMR) data: the user specifies a multi-dimensional region and reads the corresponding data of all levels
1.8 to 4 times speedup overall

Conclusion and future work
Data usage patterns are used to select the most suitable layout.
Storage-efficient optimizations
Improves the read performance of several scientific applications involving data analysis
Future work: data compression techniques for reducing the size of data accesses

Questions?
