
1 MATRIX MULTIPLY WITH DRYAD B649 Course Project Introduction

2 Matrix Multiplication
A fundamental kernel algorithm used by many applications.
Examples: graph theory, physics, electronics.
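
For reference, a minimal sketch of the kernel in C#: the classic O(n^3) triple loop. The method name and the square-matrix assumption are illustrative, not from the slides.

```csharp
// Naive matrix multiply: C = A * B for square n x n matrices.
// Illustrative sketch; names are not from the slides.
static double[,] MultiplyNaive(double[,] a, double[,] b)
{
    int n = a.GetLength(0);              // assume square matrices
    var c = new double[n, n];
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
        {
            double sum = 0.0;
            for (int k = 0; k < n; k++)  // dot product of row i and column j
                sum += a[i, k] * b[k, j];
            c[i, j] = sum;
        }
    return c;
}
```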

3 Matrix Multiply Approaches

Programming Model              | Algorithm                                         | Customized Libraries                                 | User Implementation
Sequential                     | Naïve approach, tiled matrix multiply, BLAS dgemm | Vendor-supplied packages (e.g., Intel and AMD BLAS), ATLAS | Fortran, C, C++, C#, Java
Shared-memory parallelism      | Row partition                                     | ATLAS                                                | Multi-threading, TPL, PLINQ, OpenMP
Distributed-memory parallelism | Row/column partition, Fox algorithm               | ScaLAPACK                                            | OpenMPI, Twister, Dryad, Hadoop
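
As a hedged illustration of the table's "tiled matrix multiply" entry: loop over fixed-size blocks so the working tiles of A, B, and C stay cache-resident. The tile size of 64 and the method name are assumptions for the sketch.

```csharp
using System;

// Tiled (blocked) matrix multiply: process the matrices tile by tile so each
// working block fits in cache. Tile size is an assumption, not from the slides.
static double[,] MultiplyTiled(double[,] a, double[,] b, int tile = 64)
{
    int n = a.GetLength(0);
    var c = new double[n, n];
    for (int ii = 0; ii < n; ii += tile)
        for (int kk = 0; kk < n; kk += tile)
            for (int jj = 0; jj < n; jj += tile)
                for (int i = ii; i < Math.Min(ii + tile, n); i++)
                    for (int k = kk; k < Math.Min(kk + tile, n); k++)
                    {
                        double aik = a[i, k];            // reused across the j loop
                        for (int j = jj; j < Math.Min(jj + tile, n); j++)
                            c[i, j] += aik * b[k, j];
                    }
    return c;
}
```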

4 Single Node Test

     | Bare Metal      | VM Node
Java | 1 node, 8 cores | c1.xlarge
C    | 1 node, 8 cores | c1.xlarge

Bare metal in FutureGrid: cache = 8 * 8 MB = 64 MB.
Working memory of matrix multiply for 8 threads on a 1500x1500 matrix: WM = 3 * 1500 * 1500 * 8 bytes = 54 MB (three 1500x1500 matrices of doubles at 8 bytes each).

VM node:
Class     | Slots | Cores | Memory | Disk
c1.xlarge | 50    | 8     | 20000  | 20

5 Result of Total Time
(figure: total-time charts for C and Java, on bare metal and on a VM node)

6 Bare Metal
(figures: speed-up and parallel-efficiency charts for C and Java)

7 VM Node
(figures: speed-up and parallel-efficiency charts for C and Java)

8 Pleasingly Parallel Programming Patterns
(diagram: the application program or legacy code is wrapped in a user-defined function and invoked from a DryadLINQ query, which fans out as subqueries)
Sample applications:
1. SAT problem
2. Parameter sweep
3. BLAST, SW-G bio
Issue: scheduling resources at the granularity of a node rather than a core leads to relatively low system utilization.

9 Hybrid Parallel Programming Pattern
(diagram: a DryadLINQ query dispatches subqueries to nodes; inside each node, the user-defined function runs in parallel via PLINQ or TPL)
Sample applications:
1. Matrix multiplication
2. GTM and MDS
Solves the previous issue by using PLINQ, TPL, and thread-pool technologies; a sketch of the intra-node layer follows.
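
A minimal sketch of the intra-node half of this pattern, assuming the outer DryadLINQ query has already handed a node its operands: the row loop of the multiply runs on all cores via TPL's Parallel.For. The distributed DryadLINQ layer is not shown, and the method name is illustrative.

```csharp
using System.Threading.Tasks;

// Intra-node layer of the hybrid pattern: TPL parallelizes the row loop of
// the multiply across all cores. The outer DryadLINQ query (not shown here)
// would dispatch one such call per node as its user-defined function.
static double[,] MultiplyRowsParallel(double[,] a, double[,] b)
{
    int n = a.GetLength(0);
    var c = new double[n, n];
    Parallel.For(0, n, i =>              // each row is an independent task
    {
        for (int j = 0; j < n; j++)
        {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i, k] * b[k, j];
            c[i, j] = sum;               // rows are disjoint, so no locking needed
        }
    });
    return c;
}
```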

10 Implementation and Performance

Hardware configuration:

TEMPEST cluster:
            | TEMPEST     | TEMPEST-CNXX
CPU         | Intel E7450 | Intel E7450
Cores       | 24          | 24
Memory      | 24.0 GB     | 50.0 GB
Memory/Core | 1 GB        | 2 GB

STORM cluster:
            | STORM-CN01,CN02,CN03 | STORM-CN04,CN05 | STORM-CN06,CN07
CPU         | AMD 2356             | AMD 8356        | Intel E7450
Cores       | 8                    | 16              | 24
Memory      | 16 GB                | 16 GB           | 48 GB
Memory/Core | 2 GB                 | 1 GB            | 2 GB

Software:
1. DryadLINQ CTP (released December 2010)
2. Windows HPC R2 SP2
3. .NET 4.0, Visual Studio 2010

11 Matrix Multiplication Performance Results
1 core on 16 nodes vs. 24 cores on 16 nodes

12 Matrix Multiply with Different Runtimes
Implemented with different runtimes:
1. Dryad
2. MPI
3. Twister
4. Hadoop
Implemented the Fox algorithm (see the sketch below); run on a 4x4 mesh of nodes in both Windows and HPC environments.
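
A single-process simulation of the Fox algorithm's block schedule, assuming a q x q mesh (the slides use 4x4) and square blocks. In a real run, the A-block broadcast along each mesh row and the upward shift of B blocks happen over the network; the direct array indexing here only simulates that communication. All names are illustrative.

```csharp
// Fox algorithm, simulated in one process: matrices are pre-split into a
// q x q grid of bs x bs blocks, where A[i, j] is the block held by mesh
// node (i, j). At step s, each mesh row i uses the broadcast block
// A[i, (i+s) mod q]; real implementations then shift B blocks up one row.
static double[,][,] FoxMultiply(double[,][,] A, double[,][,] B, int q, int bs)
{
    var C = new double[q, q][,];
    for (int i = 0; i < q; i++)
        for (int j = 0; j < q; j++)
            C[i, j] = new double[bs, bs];    // result blocks start at zero

    for (int s = 0; s < q; s++)
        for (int i = 0; i < q; i++)
        {
            int k = (i + s) % q;             // A block broadcast along mesh row i
            for (int j = 0; j < q; j++)
                AccumulateBlock(C[i, j], A[i, k], B[k, j], bs);
        }
    return C;
}

// C += A * B for one bs x bs block triple.
static void AccumulateBlock(double[,] c, double[,] a, double[,] b, int bs)
{
    for (int i = 0; i < bs; i++)
        for (int k = 0; k < bs; k++)
        {
            double aik = a[i, k];
            for (int j = 0; j < bs; j++)
                c[i, j] += aik * b[k, j];
        }
}
```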

13 Dryad and DryadLINQ
(diagram: a .NET program on the client machine invokes a LINQ query expression; DryadLINQ compiles it into a distributed query plan and vertex code, the job manager (JM) executes the plan on the HPC cluster over input and output DryadTables, and results return to the program as .NET objects via ToTable/foreach)

References:
Isard, Michael, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly (2007). Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks.
Yu, Yuan, Michael Isard, Dennis Fetterly, and Mihai Budiu (2008). DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language.
Li, Hui, Yang Ruan, Yuduo Zhou, Judy Qiu, and Geoffrey Fox (2011). Design Patterns for Scientific Applications in DryadLINQ CTP. DataCloud-SC11.

14 Key Features: Dryad CTP vs. Hadoop 0.20.0

Programming Interface

1. Execution model
   Dryad CTP: DAG of data flowing between operations [1,2,3].
   Hadoop 0.20.0: Map, Shuffle, Merge, and Reduce stages [16].
   Comment: a DAG is more flexible than MapReduce for expressing data-flow processing.

3. Programming interface
   Dryad CTP: based on the LINQ model [2,3], with interface extensions for Dryad; able to use the relational operators defined in LINQ.
   Hadoop 0.20.0: Map and Reduce classes [16]; does not natively support relational operations over multiple heterogeneous input data sets.
   Comment: there is no public documentation of the raw Dryad API.

4. Higher-level programming language
   Dryad CTP: DryadLINQ lets developers use the standard query operators defined in LINQ, such as Select, Join, and GroupBy (illustrated after this table); query evaluations are converted into a DAG.
   Hadoop 0.20.0: Pig also gives Hadoop developers relational queries, but it has been found inefficient [9]; YSmart is another SQL-to-MapReduce translator, and it outperforms Pig.
   Comment: DryadLINQ outperforms Pig when processing relational data sets.

Performance Issues

7. Data movement and communication
   Dryad CTP: DryadLINQ provides three channel protocols: file (the default), TCP pipe, and shared-memory FIFO [1]. (Note: RDMA is available in Windows 8.)
   Hadoop 0.20.0: uses HTTP to transfer data between Map tasks and Reduce tasks during shuffling [15].
   Comment: Dryad provides better data-transfer approaches than Hadoop.

9. Pipelining between jobs (iterative MapReduce)
   Dryad CTP: chains the execution of multiple queries using lazy evaluation, TCP pipes, and shared-memory FIFOs [2,3].
   Hadoop 0.20.0: cannot pipeline job execution, since it must materialize the output of a MapReduce job to disk (HDFS) when the job finishes [6,7].
   Comment: in Dryad, pipelining is broken when queries are explicitly evaluated or results are materialized to disk.
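
To ground row 4: the operators the table names are the standard LINQ ones, so matrix multiply itself can be phrased relationally over (row, col, value) triples. This sketch runs on plain LINQ-to-Objects; DryadLINQ's contribution is compiling the same query shape into a distributed DAG. The tiny matrices and field names are made up for illustration.

```csharp
using System.Linq;

// Relational matrix multiply over sparse triples, using the standard LINQ
// operators (Join, GroupBy, Select) that DryadLINQ distributes as a DAG.
var a = new[] { (i: 0, k: 0, v: 1.0), (i: 0, k: 1, v: 2.0) };  // entries of A
var b = new[] { (k: 0, j: 0, v: 3.0), (k: 1, j: 0, v: 4.0) };  // entries of B

var c = a.Join(b, x => x.k, y => y.k,                      // match on inner index k
               (x, y) => (i: x.i, j: y.j, v: x.v * y.v))   // partial products
         .GroupBy(t => (i: t.i, j: t.j))                   // group per output cell
         .Select(g => (g.Key.i, g.Key.j, v: g.Sum(t => t.v)));
// c enumerates C = A*B as (i, j, value) triples: here the single entry (0, 0, 11.0).
```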

15 Backup slides

16 Dryad Job Execution Flow

17 Performance for Multithreaded MM
Test done on one node of TEMPEST (24 cores)

