PARALLEL TREE MANIPULATION Islam Atta. Sources Islam Atta, Hisham El-Shishiny. System and method for parallel processing. 20110016153 TX, US, 2010. Experimental.

PARALLEL TREE MANIPULATION Islam Atta

Sources
- Islam Atta, Hisham El-Shishiny. System and method for parallel processing. 20110016153 TX, US, 2010.
- Experimental evaluation was done as part of course work at UofT (ECE1749H, ECE1755H).
Copyright © 2010-2012 Islam Atta

Trees…?
- Widely used
  - Hierarchical data (financial data, NLP, machine vision, GIS, DNA, protein sequences…)
  - Indexing/hashing (search engines…)
- Tree manipulation context
  - Full traversal of a tree (or sub-tree) for read or write accesses
  - E.g. CBIR, DNA sequence alignment

Problem
- Trees are categorized as Non-Uniform Random Access structures
  - Bad spatial locality
  - Incurring high miss rates
- Worse for multiprocessing (Berkeley, 2006)
  - Requires high on-/off-chip bandwidth

Multiprocessing Platforms
- Non-shared memory architectures (Cell BE, Blue Gene, Intel SCC)
  - Explicit message passing → many small messages
- Shared memory architectures (Intel Quad-core)
  - Coherent cache banks → cache blocks grabbed when referenced
[Figure: tree nodes A through V]
Optimal Solution
- Reallocate tree elements in memory to form contiguous memory regions
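The reallocation idea can be illustrated with a small sketch. It assumes a depth-first (pre-order) layout, which is only one way to make sub-trees contiguous; the slides do not say which ordering the authors use. The names `Node`, `LinearTree`, `linearize` and `subtree_slice` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Conventional pointer-based node; children may live anywhere on the heap."""
    value: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class LinearTree:
    """Tree stored as parallel arrays in depth-first (pre-order) order (assumed layout)."""
    values: List[str]
    subtree_size: List[int]  # number of nodes in the sub-tree rooted at index i


def linearize(root: Node) -> LinearTree:
    """Copy a pointer-based tree into contiguous pre-order arrays."""
    values: List[str] = []
    sizes: List[int] = []

    def visit(node: Node) -> int:
        index = len(values)
        values.append(node.value)
        sizes.append(1)  # placeholder, fixed once all children are placed
        total = 1
        for child in node.children:
            total += visit(child)
        sizes[index] = total
        return total

    visit(root)
    return LinearTree(values, sizes)


def subtree_slice(tree: LinearTree, index: int) -> List[str]:
    """A whole sub-tree is one contiguous slice of the value array."""
    return tree.values[index:index + tree.subtree_size[index]]
```

With this layout, a sub-tree rooted at any index occupies one contiguous range, so it could be shipped as a single large message or read with good spatial locality instead of being chased node by node.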

- Non-shared memory architectures
  - Explicit message passing → few large messages
- Shared memory architectures
  - Minimal false sharing
  - Spatial locality
[Figure: tree nodes A through V]

Scheduling Algorithm
- Designed for Cell BE and Blue Gene/L
  - Message passing (DMA, MPI, mailboxes)
- Challenges
  - Unbalanced trees with varying computation complexity
  - Limited local storage
  - Larger data chunks, 128-byte aligned
- Algorithm properties
  - Master-slave
  - Dynamic scheduling of sub-workloads
  - Work-stealing: coordinated by the master
  - Double buffering
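A minimal sketch of master-slave dynamic scheduling over contiguous sub-workloads follows. It is not the authors' algorithm: it assumes the tree is already stored as a flat array, replaces message passing with a shared in-process queue, and omits work-stealing, alignment and double buffering. `split_into_chunks`, `run_master_slave` and `max_chunk` (standing in for the local-storage limit) are hypothetical names. Python threads only illustrate the control flow here; they do not give real parallel speedup for CPU-bound work.

```python
import queue
import threading
from typing import Callable, List, Tuple


def split_into_chunks(n_nodes: int, max_chunk: int) -> List[Tuple[int, int]]:
    """Cut the index range [0, n_nodes) into contiguous (start, end) sub-workloads."""
    return [(s, min(s + max_chunk, n_nodes)) for s in range(0, n_nodes, max_chunk)]


def run_master_slave(data: List[int], work: Callable[[int], int],
                     n_workers: int, max_chunk: int) -> List[int]:
    """Master enqueues chunks; each worker pulls the next one as soon as it is idle."""
    tasks: queue.Queue = queue.Queue()
    results = [0] * len(data)

    def worker() -> None:
        while True:
            chunk = tasks.get()
            if chunk is None:  # sentinel from the master: no more work
                return
            start, end = chunk
            for i in range(start, end):
                # Chunks are disjoint, so workers never write the same slot.
                results[i] = work(data[i])

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for chunk in split_into_chunks(len(data), max_chunk):
        tasks.put(chunk)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Because idle workers take the next chunk themselves, a worker that lands on cheap nodes simply processes more chunks, which is the basic load-balancing effect dynamic scheduling aims for on unbalanced trees.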

Methodology
- Application: sequence alignment problem
  - DNA, RNA, protein, NLP, financial data
- Implementation: pthreads on x86 Intel machines
  - UG: quad-core
  - Kodos: 2-socket quad-core
  - Kang: 4-socket dual-core
- Data cache simulation
- In-memory trees

Memory Access Time
- Naïve sequence alignment consists of only read/write operations.
- Random
  - Sub-linear increase up to 4 threads
  - Saturates after 4 threads
- Linear
  - Sequential: 2.7X gain
  - Hits the memory wall

Other Experimental Results

Metric                          Random    Linear
Miss rate (L2)                  14%       1.6%
Sequential/parallel fractions   Sequential is 10%, with minor improvement for Linear
Load balancing                  Maximum 4% deviation (no work-stealing required)
Stalling on locks               No difference
Memory size ratio               1         0.32
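To put the 10% sequential fraction in context, Amdahl's law gives an upper bound on the speedup that threads alone can deliver. The sketch below is a standard textbook calculation, not a result from the experiments, and `amdahl_speedup` is a hypothetical helper name. It ignores memory effects, so measured speedups can fall well below this bound.

```python
def amdahl_speedup(sequential_fraction: float, n_threads: int) -> float:
    """Upper bound on speedup when a fixed fraction of the work stays sequential."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / n_threads)


if __name__ == "__main__":
    # Sequential fraction of 10%, as reported in the results table.
    for n in (1, 2, 4, 8):
        print(n, round(amdahl_speedup(0.10, n), 2))
```

With a 10% sequential fraction the bound is roughly 3.1x at 4 threads and can never exceed 10x, however many threads are added.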

Discussion
- Limitations
  - Only shared memory architectures
  - Max tree size: 4G bytes, 47M nodes
- Compression
  - First-child references can be reduced to 1 bit per node.
  - Use delta distance instead of full address.
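One plausible reading of the two compression ideas is sketched below, assuming nodes are stored in depth-first (pre-order) order. Under that assumption a node's first child, if it has one, always sits at the next index, so a single has-child bit replaces the reference, and the next sibling can be stored as a small forward distance rather than a full address. The input format and the names `encode` and `children_of` are hypothetical; the slides do not give the actual encoding.

```python
from typing import List, Tuple

# Hypothetical input format: a tree as nested (value, [children]) tuples.
Tree = Tuple[str, list]


def encode(root: Tree) -> Tuple[List[str], List[bool], List[int]]:
    """Return pre-order values, a has-child bit per node, and a sibling delta per node."""
    values: List[str] = []
    has_child: List[bool] = []
    sibling_delta: List[int] = []  # 0 means "no next sibling"

    def visit(node: Tree, is_last: bool) -> None:
        index = len(values)
        value, children = node
        values.append(value)
        has_child.append(bool(children))
        sibling_delta.append(0)
        for k, child in enumerate(children):
            visit(child, k == len(children) - 1)
        if not is_last:
            # The next sibling starts right after this node's whole sub-tree.
            sibling_delta[index] = len(values) - index

    visit(root, True)
    return values, has_child, sibling_delta


def children_of(index: int, has_child: List[bool],
                sibling_delta: List[int]) -> List[int]:
    """Recover child indices from the 1-bit flag and the sibling deltas."""
    if not has_child[index]:
        return []
    out = [index + 1]  # first child is always the next slot in pre-order
    while sibling_delta[out[-1]]:
        out.append(out[-1] + sibling_delta[out[-1]])
    return out
```

For the tree A(B(D, E), C(F)) this yields the order A, B, D, E, C, F with sibling deltas 0, 3, 1, 0, 0, 0, and `children_of(0, ...)` recovers indices 1 and 4 without storing any absolute address.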

Next…
- Path #1: Implement and evaluate a commercial/scientific workload
  - Developing a library/framework for parallel tree manipulation
- Path #2: Algorithm evaluation for non-shared memory architectures
  - E.g. Blue Gene, Intel SCC
- Both

Conclusion
- Tree manipulation using typical data representation is not well suited for parallel processing.
- We propose and evaluate a technique for parallel tree manipulation
  - Performance gain for sequential and parallel processing
  - Saves memory and bandwidth
  - Scalable
- For our experiments, on-chip communication with fewer cores is superior to off-chip communication.

QUESTIONS
Fact: 42,270 runs were executed in the experimentation using 91 TBs of data.
Thank You. Please send me your comments: iatta@eecg.toronto.edu
