RAMSES@Ohio-State Year 2 Updates
Outline
- Cost-Model-Based Optimization of Distributed Joins (under submission to SC 2016)
- Modeling In-Situ Analytics Using SKOPE and an Analytical Model (paper in preparation)
DistriPlan - Context
- Processing (scientific) array data:
  - Geographically distributed
  - Processed using structured operators
  - Stored in file data stores (NetCDF/HDF5/...)
- What we needed: an optimizer and an execution engine that can execute queries remotely and accumulate the results
- What complicates it: joins! Many practical operations come down to a join over values or dimensions
Motivation
- A join can be processed on one node or split across nodes (figure: candidate data-movement options between nodes)
- In total there are 2^n options: too many for current optimizers to enumerate!
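A minimal sketch of why the plan space explodes. The model below is an assumption for illustration (each of n operators is independently assigned to one of two sites); real DistriPlan plans also vary in join order and data-movement direction, so the true space is at least this large.

```python
from itertools import product

def enumerate_placements(n_operators, sites=("local", "remote")):
    """Enumerate every site assignment for n join operators.

    Illustrative only: even this simplified binary choice already
    yields len(sites) ** n_operators candidate plans, doubling with
    every additional operator.
    """
    return list(product(sites, repeat=n_operators))

# Three operators already give 2**3 = 8 candidate placements.
```

With n in the dozens, exhaustive enumeration is infeasible, which motivates the pruning and cost model that follow.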
Introducing DistriPlan
- A Cost-Based Optimizer (CBO) for a distributed array-data querying engine
- Introduces two new features:
  - Isomorphic plan pruning
  - A cost model for distributed data
DistriPlan (Isomorphism Pruning)
- Isomorphic plans are the same plan: they are counted once, saving engine processing
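A sketch of one way isomorphism pruning can work, assuming joins are commutative so that mirrored subtrees are equivalent. The canonicalization rule below (sort the two children of each join) is a simplified stand-in, not DistriPlan's actual algorithm.

```python
def canonical(plan):
    """Return a canonical form for a join-plan tree.

    A plan is either a leaf (a base array name) or a tuple
    ("join", left, right). Sorting the canonicalized children makes
    isomorphic (mirrored) plans compare equal.
    """
    if isinstance(plan, str):      # leaf: a base array
        return plan
    op, left, right = plan
    l, r = sorted((canonical(left), canonical(right)), key=repr)
    return (op, l, r)

def prune_isomorphic(plans):
    """Keep one representative per isomorphism class."""
    seen, kept = set(), []
    for p in plans:
        key = canonical(p)
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept
```

Deduplicating on the canonical key is what lets the optimizer cost each equivalence class once instead of once per mirrored variant.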
DistriPlan (Cost Model)
- Each plan node's expected result size and evaluated cost (formulas on slide)
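A hedged sketch of the two quantities named above. The selectivity constant, the per-row CPU and network costs, and the plan representation are all illustrative assumptions, not DistriPlan's actual formulas; the structure (recursive cost plus a transfer term when a result crosses sites) is the general idea.

```python
def expected_rows(node):
    """Expected result size of a plan node (hypothetical model).

    Leaves carry known cardinalities; a join's output is estimated
    with a fixed selectivity factor -- a textbook estimate, not the
    paper's exact formula.
    """
    SELECTIVITY = 0.1
    if "rows" in node:                     # leaf
        return node["rows"]
    return SELECTIVITY * expected_rows(node["left"]) * expected_rows(node["right"])

def evaluated_cost(node, cpu_per_row=1e-8, net_per_row=1e-6):
    """Recursive cost: children's cost, plus local join work, plus
    shipping the result when the node's output crosses sites."""
    if "rows" in node:                     # leaf: no work modeled
        return 0.0
    left, right = node["left"], node["right"]
    cost = evaluated_cost(left) + evaluated_cost(right)
    cost += cpu_per_row * expected_rows(left) * expected_rows(right)
    if node.get("remote"):                 # result shipped over the WAN
        cost += net_per_row * expected_rows(node)
    return cost
```

The optimizer then simply keeps the plan with the minimum evaluated cost.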
Feasible Problem Scale
(figures: performance and feasible problem scale)
Performance
- CBO-optimized plans always perform at least as well as non-CBO plans
Performance
- Accurate parameters for the CBO equations are essential for correct cost evaluation
Summary
- Isomorphism pruning is essential
- The CBO improves performance tremendously
- So far we have only emulated wide-area networks
- Next: start using models developed by others?
Modeling In-Situ Analytics
- Recap: Smart is a MapReduce-like framework for developing in-situ analytics
- Modeling problem: predict the time a given analytics task will take
- Modeled a typical loop using SKOPE
- Developed an analytical model
Canonical Random Write Loop
A canonical random write loop has the following characteristics*:
- The reduction objects are updated only by associative and commutative operations
- There are no other loop-carried dependencies
- The element(s) of the reduction object updated in a particular iteration cannot be determined statically or with inexpensive runtime preprocessing
Structure of a typical canonical random write loop for a data mining task (shown on slide).
* Jin, Ruoming, and Gagan Agrawal. "Performance Prediction for Random Write Reductions: A Case Study in Modeling Shared Memory Programs." ACM SIGMETRICS Performance Evaluation Review 30.1 (2002).
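The loop structure referenced above can be sketched with a histogram reduction, a common data-mining kernel; this is an illustrative example, not the code from the slide. It satisfies all three characteristics: the update (`+= 1`) is associative and commutative, there are no other loop-carried dependencies, and the bin touched in each iteration depends on the data value, so it cannot be determined statically.

```python
def histogram(data, n_bins, lo, hi):
    """Canonical random write loop: which element of the reduction
    object is updated depends on the input value, so the write
    target is unknown until runtime."""
    bins = [0] * n_bins                               # the reduction object
    width = (hi - lo) / n_bins
    for x in data:                                    # no other loop-carried deps
        idx = min(int((x - lo) / width), n_bins - 1)  # random write target
        bins[idx] += 1                                # associative/commutative update
    return bins
```

Because the update is associative and commutative, the loop can be parallelized with per-thread replicas of `bins` that are merged at the end, which is exactly what makes this pattern amenable to MapReduce-style frameworks like Smart.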
A Closed-Form Expression Model for the Canonical Loop

Parameter          Description
S_elem             size of input elements (GB)
S_obj              size of reduction objects (KB)
N_iter             number of iterations
N_node             number of nodes
a, b, c, d, e, f   factors determined by regression
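The slide's closed-form expression itself is not reproduced above, so the sketch below only illustrates how the regression factors could be fit from measured runs. The feature set (per-node compute term, reduction-object term, node-count term, constant) is an assumption for the example, not the paper's actual expression.

```python
import numpy as np

def fit_model_factors(runs):
    """Fit factors for a linear-in-features performance model.

    `runs` is a list of (S_elem, S_obj, N_iter, N_node, time)
    tuples. The features below are illustrative assumptions.
    """
    X, y = [], []
    for s_elem, s_obj, n_iter, n_node, t in runs:
        X.append([s_elem * n_iter / n_node,   # per-node compute term
                  s_obj * n_iter,             # reduction-object term
                  n_node,                     # communication/sync term
                  1.0])                       # constant overhead
        y.append(t)
    factors, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return factors                            # fitted (a, b, c, d)
```

Given enough measured runs spanning the parameter ranges, ordinary least squares recovers the factors; predictions for unseen configurations then come from plugging new parameters into the same expression.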
Modeling Cache Performance
For a two-level cache hierarchy, the memory access time can be modeled as:

    T_mem = N_mem * (t_L1-hit + P_L1-miss * t_L2-hit + P_L2-miss * L_mem)

For the applications used in our experiments, the cache miss rate depends on the dataset size (D), the data access stride (s), the cache capacities (C1 and C2 for L1 and L2, respectively), and the block sizes (B1 and B2 for L1 and L2, respectively).
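The access-time formula above is direct to evaluate; the sketch below also includes a miss-rate estimate for a strided streaming access, which is an assumption for illustration (one miss per cache block touched once the working set exceeds capacity), not the paper's exact miss-rate model.

```python
def mem_access_time(n_mem, t_l1_hit, p_l1_miss, t_l2_hit, p_l2_miss, l_mem):
    """Two-level cache model from the slide:
    T_mem = N_mem * (t_L1-hit + P_L1-miss * t_L2-hit + P_L2-miss * L_mem)
    """
    return n_mem * (t_l1_hit + p_l1_miss * t_l2_hit + p_l2_miss * l_mem)

def strided_miss_rate(dataset_bytes, stride, capacity, block):
    """Illustrative miss-rate estimate (an assumption): if the
    working set fits in the cache, reuse makes misses negligible;
    otherwise roughly one miss per block, i.e. min(1, stride/block)."""
    if dataset_bytes <= capacity:
        return 0.0
    return min(1.0, stride / block)
```

Feeding `strided_miss_rate` results for each level into `mem_access_time` is the kind of composition SKOPE performs once the hardware parameters and skeleton-derived access pattern are known.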
Modeling Cache Performance
By defining the parameters in SKOPE's hardware model and collecting memory access information (reads, writes, access stride) from code skeletons, we can use SKOPE to predict cache performance.
(figure: cache performance model for a two-level cache hierarchy)
Modeling Page Fault Penalty
When the dataset size exceeds a threshold (~10 GB), the execution time grows super-linearly and the number of page faults increases markedly.
(figure: execution time and page faults of in-situ analytics with varying output dataset size per time step)
Modeling Page Fault Penalty
The page fault penalty can be modeled as:

    T_pagefault = N_mem * P_pagefault * L_pagefault

The page fault rate depends on the memory required by the application (M_req) and the available physical memory capacity (M):

    P_pagefault = (M_req - M) / M_req

Memory allocation information can be collected from code skeletons; the other parameters can be defined in hardware models.
Predicting Scalability of In-Situ Analytics (Smart)
(figure: predicted scalability of Smart, conducted on the TACC Stampede cluster using 4 nodes)
Predicting Scalability of In-Situ Analytics (Smart)
- For applications with better scalability, such as histogram and moving average, both the prediction framework and the analytical model make accurate predictions
- For k-means, which incurs more synchronization overhead, the extended SKOPE framework outperforms the analytical-model approach
Predicting Performance of In-Situ Analytics (Smart) over Large Datasets Involving Page Faults
(figure: comparison of predicted performance between the original and extended SKOPE, conducted on one node of the OSU RI cluster using 4 threads, memory capacity = 12 GB)
Proposed Work
- Predicting performance for stencil computation:
  - Abstracting the data access pattern of stencil computation
  - Modeling cache performance
  - Modeling the performance of optimization approaches such as tiling