1
Year 2 Updates
2
Outline
- Cost Model Based Optimization of Distributed Joins (under submission to SC 2016)
- Modeling In-Situ Analytics Using SKOPE and an analytical model (paper in preparation)
3
DistriPlan - Context
Processing (scientific) array data:
- Geographically distributed
- Processed using structured operators
- Stored in file data stores (NetCDF/HDF5/...)
What we needed: an optimizer and an execution engine that can execute queries remotely and accumulate results.
What complicates it: joins! Many practical operations come down to a join over values/dimensions.
4
Motivation
[Figure: processing a join on one node vs. on two nodes]
The number of candidate plans grows exponentially with the number of nodes: far too many for current optimizers!
5
Introducing DistriPlan
A Cost-Based Optimizer (CBO) for a distributed array data query engine. It introduces two new features:
- Isomorphic plan pruning
- A cost model for distributed data
6
DistriPlan (Isomorphism Pruning)
Two isomorphic plans are the SAME PLAN!! They are counted as one plan, saving engine processing.
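The pruning idea above can be sketched as follows. This is an illustrative example, not DistriPlan's actual implementation: each join plan tree gets a canonical key in which the two inputs of a (commutative) join are sorted, so mirror-image plans collapse to one representative.

```python
def canonical_key(plan):
    """Return a canonical string key for a join plan tree.

    `plan` is either a leaf (a relation name, str) or a tuple
    ('join', left_subplan, right_subplan).
    """
    if isinstance(plan, str):            # leaf: a base relation
        return plan
    _, left, right = plan
    # Sort the subtree keys so mirrored (isomorphic) plans get one key.
    a, b = sorted((canonical_key(left), canonical_key(right)))
    return f"join({a},{b})"

def prune_isomorphic(plans):
    """Keep one representative per isomorphism class."""
    seen = {}
    for p in plans:
        seen.setdefault(canonical_key(p), p)
    return list(seen.values())

plans = [
    ('join', 'A', ('join', 'B', 'C')),
    ('join', ('join', 'C', 'B'), 'A'),   # mirror image of the first
    ('join', ('join', 'A', 'B'), 'C'),
]
print(len(prune_isomorphic(plans)))      # 2 distinct plans remain
```

Deduplicating by canonical key means the optimizer costs each isomorphism class only once instead of enumerating every mirrored variant.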
7
DistriPlan (Cost Model)
Node's Expected Results
Node's Evaluated Cost
8
Performance: Feasible Problem Scale
9
Performance
Always reaches performance better than (or equal to) non-CBO-optimized plans.
10
Performance
Accurate parameters for the CBO equations are essential for correct evaluation.
11
Summary
- Isomorphism pruning is essential
- CBO improves performance tremendously
- So far, only emulating wide-area networks
- Next step: start using models developed by others?
12
Modeling In-Situ Analytics
Recap: Smart is a MapReduce-like framework for developing in-situ analytics.
Modeling problem: predict the time certain analytics will take.
- Modeled a typical loop using SKOPE
- Developed an analytical model
13
Canonical Random Write Loop
A canonical random write loop has the following characteristics*:
- Reduction objects are updated only by associative and commutative operations;
- There are no other loop-carried dependencies;
- The element(s) of the reduction object updated in a particular iteration cannot be determined statically or with inexpensive runtime preprocessing.
Structure of a typical canonical random write loop for a data mining task:
* Jin, Ruoming, and Gagan Agrawal. "Performance prediction for random write reductions: a case study in modeling shared memory programs." ACM SIGMETRICS Performance Evaluation Review, Vol. 30, No. 1. ACM, 2002.
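A histogram reduction is a standard instance of this loop shape (an illustrative sketch following the characterization above, not code from the paper): the reduction object is updated only with `+` (associative and commutative), and the bin touched in each iteration depends on the data value, so it cannot be determined statically.

```python
import random

def histogram(data, num_bins, lo, hi):
    # Reduction object: bins, updated only with '+', which is
    # associative and commutative.
    bins = [0] * num_bins
    width = (hi - lo) / num_bins
    for x in data:
        # The bin written in this iteration depends on the data value,
        # so it cannot be determined statically: a random write.
        idx = min(int((x - lo) / width), num_bins - 1)
        bins[idx] += 1
    return bins

data = [random.uniform(0.0, 1.0) for _ in range(10_000)]
bins = histogram(data, num_bins=8, lo=0.0, hi=1.0)
# Every element lands in exactly one bin.
assert sum(bins) == len(data)
```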
14
A Closed-Form Expression Model for the Canonical Loop
Parameter          Description
S_elem             size of input elements (GB)
S_obj              size of reduction objects (KB)
N_iter             number of iterations
N_node             number of nodes
a, b, c, d, e, f   factors determined by regression
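To illustrate how such a model is used once the regression factors are known: the slide gives the parameters but not the expression itself, so the functional form and the factor values below are placeholders only, showing a closed-form prediction evaluated from S_elem, S_obj, N_iter, and N_node.

```python
def predict_time(s_elem_gb, s_obj_kb, n_iter, n_node,
                 a=0.9, b=0.002, c=0.01, d=1.5, e=0.05, f=0.3):
    """Evaluate a placeholder closed-form time model.

    a..f play the role of the regression-fitted factors; the actual
    expression in the paper may differ.
    """
    per_iter = (a * s_elem_gb / n_node   # compute scales with data per node
                + b * s_obj_kb           # reduction-object update cost
                + c)                     # fixed per-iteration overhead
    # Hypothetical synchronization + startup terms.
    return n_iter * per_iter + d * n_node ** 0.5 * e + f

t = predict_time(s_elem_gb=4.0, s_obj_kb=64.0, n_iter=10, n_node=4)
print(f"predicted time: {t:.3f} s")
```

In practice the factors would be fit by regressing measured runtimes of calibration runs against these parameter combinations.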
15
Modeling Cache Performance
For a two-level cache hierarchy, the memory access time can be modeled as:
T_mem = N_mem · (t_L1-hit + r_L1-miss · t_L2-hit + r_L2-miss · L_mem)
For the applications used in our experiments, the cache miss rate depends on the dataset size (D), the data access stride (s), the cache capacities (C1 and C2 for L1 and L2, respectively), and the block sizes (B1 and B2 for L1 and L2, respectively).
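A numeric sketch of the model above (the parameter values are illustrative, not measured):

```python
def mem_access_time(n_mem, t_l1_hit, r_l1_miss, t_l2_hit, r_l2_miss, l_mem):
    """T_mem = N_mem * (t_L1-hit + r_L1-miss * t_L2-hit + r_L2-miss * L_mem).

    Times are in nanoseconds; rates are fractions of all accesses.
    """
    return n_mem * (t_l1_hit + r_l1_miss * t_l2_hit + r_l2_miss * l_mem)

# Example: 1e9 accesses, 1 ns L1 hit, 5% L1 misses served by L2 at
# 10 ns, and 1% of all accesses missing L2 with 100 ns memory latency.
t = mem_access_time(1e9, t_l1_hit=1.0, r_l1_miss=0.05, t_l2_hit=10.0,
                    r_l2_miss=0.01, l_mem=100.0)
print(t / 1e9, "seconds")   # 1e9 * (1 + 0.5 + 1.0) ns = 2.5 s
```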
16
Modeling Cache Performance
By defining parameters in the hardware model in SKOPE, and collecting memory access information (reads, writes, access strides) from code skeletons, we can use SKOPE to predict cache performance.
Cache Performance Model for a Two-Level Cache Hierarchy
17
Modeling Page Fault Penalty
When the dataset size exceeds a threshold (~10 GB), the execution time shows a super-linear growth trend and the number of page faults increases markedly.
Execution Time and Page Faults of In-Situ Analytics with Varying Output Dataset Size (per time step)
18
Modeling Page Fault Penalty
The page fault penalty can be modeled as:
T_pagefault = N_mem · r_pagefault · L_pagefault
The page fault rate depends on the memory required by the application (M_req) and the available physical memory capacity (M):
r_pagefault = (M_req − M) / M_req
Memory allocation information can be collected from code skeletons, and the other parameters can be defined in hardware models.
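The two formulas above combine into a short calculation (illustrative values; the per-fault latency and access count are assumptions, not measurements):

```python
def page_fault_penalty(n_mem, m_req, m_avail, l_pagefault):
    """T_pagefault = N_mem * r_pagefault * L_pagefault,
    with r_pagefault = (M_req - M) / M_req (zero when the data fits).
    """
    r_pagefault = max(0.0, (m_req - m_avail) / m_req)
    return n_mem * r_pagefault * l_pagefault

# 16 GB required vs. 12 GB physical memory -> fault rate (16-12)/16 = 0.25,
# applied to 1e8 memory accesses at an assumed 8 ms per fault.
penalty = page_fault_penalty(n_mem=1e8, m_req=16.0, m_avail=12.0,
                             l_pagefault=8e-3)
print(penalty, "seconds")
```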
19
Predicting Scalability of In-Situ Analytics (Smart)
Predicting Scalability of Smart (conducted on the TACC Stampede cluster using 4 nodes)
20
Predicting Scalability of In-Situ Analytics (Smart)
For applications with better scalability, such as histogram and moving average, both the prediction framework and the analytical model produce accurate predictions. For K-means, which incurs more synchronization overhead, the extended SKOPE framework outperforms the analytical model approach.
21
Predicting Performance of In-Situ Analytics (Smart) over Large Dataset Involving Page Faults
Comparison of Predicted Performance between Original SKOPE and Extended SKOPE (conducted on one node of OSU RI cluster using 4 threads, memory capacity = 12 GB)
22
Proposed Work: Predicting performance for stencil computation
- Abstracting the data access pattern of stencil computation
- Modeling cache performance
- Modeling performance of optimization approaches such as tiling