1
Year 2 Updates
2
Outline
- Cost Model Based Optimization of Distributed Joins (under submission to SC 2016)
- Modeling In-Situ Analytics Using SKOPE and an analytical model (paper in preparation)
3
DistriPlan - Context
Processing (scientific) array data:
- Geographically distributed
- Processed using structured operators
- Stored in file data stores (NetCDF/HDF5/...)
What we needed: an optimizer and an execution engine that can execute queries remotely and accumulate results.
What complicates it: joins! Many practical operations come down to a join over values/dimensions.
4
Motivation
[Figure: processing a join on one node vs. on two nodes]
The number of candidate plans grows exponentially with the number of nodes: far too many for current optimizers!
5
Introducing DistriPlan
A Cost-Based Optimizer (CBO) for a distributed array data query engine. It introduces two new features:
- Isomorphic plan pruning
- A cost model for distributed data
6
DistriPlan (Isomorphism Pruning)
Two isomorphic plans are the SAME PLAN!! They are counted as one plan, saving engine processing.
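The pruning idea above can be sketched as follows. This is an illustrative example, not DistriPlan's actual implementation: each join plan tree gets a canonical key in which the two inputs of a (commutative) join are sorted, so mirror-image plans collapse to one representative.

```python
def canonical_key(plan):
    """Return a canonical string key for a join plan tree.

    `plan` is either a leaf (a relation name, str) or a tuple
    ('join', left_subplan, right_subplan).
    """
    if isinstance(plan, str):            # leaf: a base relation
        return plan
    _, left, right = plan
    # Sort the subtree keys so mirrored (isomorphic) plans get one key.
    a, b = sorted((canonical_key(left), canonical_key(right)))
    return f"join({a},{b})"

def prune_isomorphic(plans):
    """Keep one representative per isomorphism class."""
    seen = {}
    for p in plans:
        seen.setdefault(canonical_key(p), p)
    return list(seen.values())

plans = [
    ('join', 'A', ('join', 'B', 'C')),
    ('join', ('join', 'C', 'B'), 'A'),   # mirror image of the first
    ('join', ('join', 'A', 'B'), 'C'),
]
print(len(prune_isomorphic(plans)))      # 2 distinct plans remain
```

Deduplicating by canonical key means the optimizer costs each isomorphism class only once instead of enumerating every mirrored variant.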
7
DistriPlan (Cost Model)
Node's Expected Results
Node's Evaluated Cost
8
Performance: Feasible Problem Scale
9
Performance
Always reaches performance better than (or equal to) non-CBO-optimized plans.
10
Performance
Accurate parameters for the CBO equations are essential for correct evaluation.
11
Summary
- Isomorphism pruning is essential
- CBO improves performance tremendously
- So far, only emulating wide-area networks
- Next step: start using models developed by others?
12
Modeling In-Situ Analytics
Recap: Smart is a MapReduce-like framework for developing in-situ analytics.
Modeling problem: predict the time certain analytics will take.
- Modeled a typical loop using SKOPE
- Developed an analytical model
13
Canonical Random Write Loop
A canonical random write loop has the following characteristics*:
- Reduction objects are updated only by associative and commutative operations;
- There are no other loop-carried dependencies;
- The element(s) of the reduction object updated in a particular iteration cannot be determined statically or with inexpensive runtime preprocessing.
Structure of a typical canonical random write loop for a data mining task:
* Jin, Ruoming, and Gagan Agrawal. "Performance prediction for random write reductions: a case study in modeling shared memory programs." ACM SIGMETRICS Performance Evaluation Review, Vol. 30, No. 1. ACM, 2002.
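A histogram reduction is a standard instance of this loop shape (an illustrative sketch following the characterization above, not code from the paper): the reduction object is updated only with `+` (associative and commutative), and the bin touched in each iteration depends on the data value, so it cannot be determined statically.

```python
import random

def histogram(data, num_bins, lo, hi):
    # Reduction object: bins, updated only with '+', which is
    # associative and commutative.
    bins = [0] * num_bins
    width = (hi - lo) / num_bins
    for x in data:
        # The bin written in this iteration depends on the data value,
        # so it cannot be determined statically: a random write.
        idx = min(int((x - lo) / width), num_bins - 1)
        bins[idx] += 1
    return bins

data = [random.uniform(0.0, 1.0) for _ in range(10_000)]
bins = histogram(data, num_bins=8, lo=0.0, hi=1.0)
# Every element lands in exactly one bin.
assert sum(bins) == len(data)
```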
14
A Closed-Form Expression Model for the Canonical Loop
Parameter          Description
S_elem             size of input elements (GB)
S_obj              size of reduction objects (KB)
N_iter             number of iterations
N_node             number of nodes
a, b, c, d, e, f   factors determined by regression
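To illustrate how such a model is used once the regression factors are known: the slide gives the parameters but not the expression itself, so the functional form and the factor values below are placeholders only, showing a closed-form prediction evaluated from S_elem, S_obj, N_iter, and N_node.

```python
def predict_time(s_elem_gb, s_obj_kb, n_iter, n_node,
                 a=0.9, b=0.002, c=0.01, d=1.5, e=0.05, f=0.3):
    """Evaluate a placeholder closed-form time model.

    a..f play the role of the regression-fitted factors; the actual
    expression in the paper may differ.
    """
    per_iter = (a * s_elem_gb / n_node   # compute scales with data per node
                + b * s_obj_kb           # reduction-object update cost
                + c)                     # fixed per-iteration overhead
    # Hypothetical synchronization + startup terms.
    return n_iter * per_iter + d * n_node ** 0.5 * e + f

t = predict_time(s_elem_gb=4.0, s_obj_kb=64.0, n_iter=10, n_node=4)
print(f"predicted time: {t:.3f} s")
```

In practice the factors would be fit by regressing measured runtimes of calibration runs against these parameter combinations.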
15
Modeling Cache Performance
For a two-level cache hierarchy, the memory access time can be modeled as:
T_mem = N_mem · (t_L1-hit + r_L1-miss · t_L2-hit + r_L2-miss · L_mem)
For the applications used in our experiments, the cache miss rate depends on the dataset size (D), the data access stride (s), the cache capacities (C1 and C2 for L1 and L2, respectively), and the block sizes (B1 and B2 for L1 and L2, respectively).
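A numeric sketch of the model above (the parameter values are illustrative, not measured):

```python
def mem_access_time(n_mem, t_l1_hit, r_l1_miss, t_l2_hit, r_l2_miss, l_mem):
    """T_mem = N_mem * (t_L1-hit + r_L1-miss * t_L2-hit + r_L2-miss * L_mem).

    Times are in nanoseconds; rates are fractions of all accesses.
    """
    return n_mem * (t_l1_hit + r_l1_miss * t_l2_hit + r_l2_miss * l_mem)

# Example: 1e9 accesses, 1 ns L1 hit, 5% L1 misses served by L2 at
# 10 ns, and 1% of all accesses missing L2 with 100 ns memory latency.
t = mem_access_time(1e9, t_l1_hit=1.0, r_l1_miss=0.05, t_l2_hit=10.0,
                    r_l2_miss=0.01, l_mem=100.0)
print(t / 1e9, "seconds")   # 1e9 * (1 + 0.5 + 1.0) ns = 2.5 s
```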
16
Modeling Cache Performance
By defining parameters in the hardware model in SKOPE, and collecting memory access information (reads, writes, access strides) from code skeletons, we can use SKOPE to predict cache performance.
Cache Performance Model for a Two-Level Cache Hierarchy
17
Modeling Page Fault Penalty
When the dataset size exceeds a threshold (~10 GB), the execution time shows a super-linear growth trend and the number of page faults increases markedly.
Execution Time and Page Faults of In-Situ Analytics with Varying Output Dataset Size (per time step)
18
Modeling Page Fault Penalty
The page fault penalty can be modeled as:
T_pagefault = N_mem · r_pagefault · L_pagefault
The page fault rate depends on the memory required by the application (M_req) and the available physical memory capacity (M):
r_pagefault = (M_req − M) / M_req
Memory allocation information can be collected from code skeletons, and the other parameters can be defined in hardware models.
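The two formulas above combine into a short calculation (illustrative values; the per-fault latency and access count are assumptions, not measurements):

```python
def page_fault_penalty(n_mem, m_req, m_avail, l_pagefault):
    """T_pagefault = N_mem * r_pagefault * L_pagefault,
    with r_pagefault = (M_req - M) / M_req (zero when the data fits).
    """
    r_pagefault = max(0.0, (m_req - m_avail) / m_req)
    return n_mem * r_pagefault * l_pagefault

# 16 GB required vs. 12 GB physical memory -> fault rate (16-12)/16 = 0.25,
# applied to 1e8 memory accesses at an assumed 8 ms per fault.
penalty = page_fault_penalty(n_mem=1e8, m_req=16.0, m_avail=12.0,
                             l_pagefault=8e-3)
print(penalty, "seconds")
```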
19
Predicting Scalability of In-Situ Analytics (Smart)
Predicting Scalability of Smart (conducted on the TACC Stampede cluster using 4 nodes)
20
Predicting Scalability of In-Situ Analytics (Smart)
For applications with better scalability, such as histogram and moving average, both the prediction framework and the analytical model produce accurate predictions. For K-means, which incurs more synchronization overhead, the extended SKOPE framework outperforms the analytical model approach.
21
Predicting Performance of In-Situ Analytics (Smart) over Large Dataset Involving Page Faults
Comparison of Predicted Performance between Original SKOPE and Extended SKOPE (conducted on one node of OSU RI cluster using 4 threads, memory capacity = 12 GB)
22
Proposed Work: Predicting performance for stencil computation
- Abstracting the data access pattern of stencil computation
- Modeling cache performance
- Modeling performance of optimization approaches such as tiling