Published by Charles Tate. Modified over 9 years ago.
1
GEM: A Framework for Developing Shared-Memory Parallel GEnomic Applications on Memory Constrained Architectures
Mucahid Kutlu, Gagan Agrawal
Department of Computer Science and Engineering, The Ohio State University
ICPP 2015, Beijing, China
2
Motivation
- Sequencing costs are decreasing
- Available data is increasing!
- Parallel processing is inevitable!
*Adapted from www.genome.gov/sequencingcosts
*Adapted from www.nlm.nih.gov/about/2015CJ.html
3
New Trends in Computational Technologies
Besides sequencing technologies, computational technologies are also developing fast.
New technology trend: many cores but limited memory per core.
A prominent example: Intel Xeon Phi (MIC) architectures
- 61 cores and 16 GB memory in the 7100 series
- Many advantages: SIMD vector operations, compatibility with CPUs
Challenges:
- Load balancing
- Memory over-consumption or disk thrashing
- I/O contention
4
Proposed Middleware System
We propose GEM for developing shared-memory parallel genomic applications on memory-constrained many-core architectures.
Runs on MIC but is not designed specifically for MIC
- Does not use MIC-specific features (512-bit SIMD instruction set)
Supports two execution models, similar to other middleware systems for genomic data processing (GATK and PAGE)
5
Inter-dependent Processing of Genomes
6
Independent Processing of Genomes
7
Load Tasks
- Read data chunks from the disk
- Generate the Genome Matrix data structure
  - Locus-based Genome Matrix
  - Sequence-based Genome Matrix
8
Enhancements on Load Tasks
To decrease the memory requirement:
- Selective Loading
  - The SAM format has 11 mandatory data fields, and many types of analyses do not need all of them.
  - The user defines the data fields needed for processing, and only those fields are kept.
- Compact Storage
  - We modify the Samtools libraries to decrease the number of bits used for certain fields.
  - For example, in practice the number of alignments at a particular locus does not exceed 2^16 − 1 in a single genomic dataset, so we use 16 bits instead of a full integer for that field.
To decrease the overhead of load tasks:
- Finding a specific region in genomic data is time-consuming.
  - Too many load tasks increase the overhead.
  - Too few load tasks can hurt load balance.
- Subchunking: each load task fills multiple GMs (genome matrices).
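The two memory-saving ideas above can be illustrated with a minimal Python sketch. This is not GEM's code (GEM modifies the Samtools libraries); the function names `select_fields` and `compact_counts` are hypothetical, and the sketch only assumes the standard tab-separated layout of the 11 mandatory SAM fields.

```python
from array import array

# The 11 mandatory SAM fields, in column order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

def select_fields(sam_line, wanted):
    """Selective loading (hypothetical helper): keep only the requested fields."""
    columns = sam_line.rstrip("\n").split("\t")
    return {name: columns[SAM_FIELDS.index(name)] for name in wanted}

def compact_counts(counts):
    """Compact storage (hypothetical helper): per-locus counts as unsigned shorts."""
    if any(c < 0 or c > 2**16 - 1 for c in counts):
        raise ValueError("count does not fit in 16 bits")
    return array("H", counts)  # "H" = unsigned short, 2 bytes on common platforms

line = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
kept = select_fields(line, ["POS", "SEQ"])
depths = compact_counts([0, 17, 65535])
```

Here `kept` holds only the position and base sequence, and `depths` stores each count in an unsigned short rather than a wider integer.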
9
Map & Reduce Tasks
Map Tasks
- Defined by the user
- Take a genome matrix as input
Intermediate Results
- The user can optionally define a combine function to reduce memory consumption
- The user can choose where to keep them: in memory or on disk
Reduce Tasks
- Defined by the user
- Take a list of intermediate results
- Intermediate results should be removed from memory by the user
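As a rough illustration of the three user-defined pieces, the sketch below counts nucleotide bases. The function names and the representation of a genome matrix as a plain list of base sequences are assumptions made for the example, not GEM's actual interface.

```python
from collections import Counter

def map_task(genome_matrix):
    """Map: count bases in one genome matrix (assumed here to be a list of sequences)."""
    counts = Counter()
    for sequence in genome_matrix:
        counts.update(sequence)
    return counts

def combine(accumulated, new_result):
    """Optional combine: fold a new intermediate result into a running total."""
    accumulated.update(new_result)
    return accumulated

def reduce_task(intermediate_results):
    """Reduce: merge all intermediate results, then free them."""
    total = Counter()
    for result in intermediate_results:
        total.update(result)
    intermediate_results.clear()  # the user is responsible for releasing memory
    return total

intermediates = [map_task(["ACGT", "AAC"]), map_task(["GG"])]
total = reduce_task(intermediates)
```

Using `combine` after each map task keeps a single running `Counter` in memory instead of a growing list of intermediate results.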
10
Scheduling Scheme
Load tasks increase memory consumption by loading data into memory.
Map tasks decrease memory consumption by removing genome matrices from memory.
If we assign load tasks to all cores, I/O contention increases and memory can be over-consumed.
Our goal is to schedule the tasks such that
- Memory is not over-consumed
- Concurrently running map and load tasks are balanced
- Load balance is maintained
We use the following thresholds
- Maximum number of concurrently running load tasks
- Maximum number of genome matrices in memory
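One way the two thresholds could drive a scheduling decision is sketched below. The policy (prefer a load task while both thresholds allow it, otherwise map an available genome matrix, otherwise idle) and the name `next_task` are assumptions for illustration; the slides do not spell out GEM's exact rule.

```python
def next_task(running_loads, gms_in_memory, available_gms,
              max_load_tasks, max_gms, data_left):
    """Pick the next task for an idle worker (hypothetical policy)."""
    can_load = (data_left
                and running_loads < max_load_tasks  # threshold 1: concurrent loads
                and gms_in_memory < max_gms)        # threshold 2: GMs in memory
    if can_load:
        return "load"
    if available_gms > 0:
        return "map"  # mapping consumes a GM and so frees memory
    return "idle"

decision = next_task(running_loads=2, gms_in_memory=3, available_gms=1,
                     max_load_tasks=2, max_gms=6, data_left=True)
```

In this call the load-task threshold is already reached, so the worker is given a map task instead of adding to I/O contention.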
11
Genome Matrix Container
[Figure: a master and workers W1, W2, W3, each idle or running a load or map task, sharing a container of six genome matrices (GMs). GM states: Empty, Loading, Available, Being Processed. A load worker can also run a temporary map.]
Max. Number of Load Tasks: 2
Max. Number of GMs: 6
Max. Number of Elements in GMs: 100
- If there is no empty GM, a load task can temporarily process a GM
- When a load task loads all data in a region, a map task is assigned if there is no available GM
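The four GM states and the temporary-map rule can be sketched as a small state machine. This is a single-threaded illustration with hypothetical class and method names; a real shared-memory implementation would need locking around the container.

```python
from enum import Enum

class GMState(Enum):
    EMPTY = "empty"
    LOADING = "loading"
    AVAILABLE = "available"
    BEING_PROCESSED = "being processed"

class GMContainer:
    """Fixed-size pool of genome matrix slots (hypothetical sketch)."""
    def __init__(self, max_gms):
        self.slots = [GMState.EMPTY] * max_gms

    def _claim(self, wanted, new_state):
        # Find the first slot in the wanted state and move it to new_state.
        for i, state in enumerate(self.slots):
            if state is wanted:
                self.slots[i] = new_state
                return i
        return None

    def start_load(self):
        return self._claim(GMState.EMPTY, GMState.LOADING)

    def finish_load(self, i):
        self.slots[i] = GMState.AVAILABLE

    def start_map(self):
        return self._claim(GMState.AVAILABLE, GMState.BEING_PROCESSED)

    def finish_map(self, i):
        self.slots[i] = GMState.EMPTY

def load_worker_step(container):
    """A load worker fills an empty GM, or temporarily maps one if none is empty."""
    slot = container.start_load()
    if slot is not None:
        return ("load", slot)
    slot = container.start_map()
    if slot is not None:
        return ("temp_map", slot)
    return ("wait", None)
```

With a two-slot container, two consecutive calls to `load_worker_step` claim both slots for loading; once one of them finishes loading, the next call returns a temporary map on that slot because no empty GM remains.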
12
Sample Implementation with GEM
Parameters defined by the user
- Execution model: Independent
- Selective Loading: Base sequences
- Where to keep intermediate results: In memory
13
Experiments
Architecture: Xeon Phi SE10P
- Number of cores: 61
- Processor speed: 1.1 GHz
- Memory: 8 GB
Applications
- SNP Calling: closely follows VarScan's algorithm.
- Locus-based Statistical Analysis (LSA): a simplified version of the DepthOfCoverage tool of GATK.
- Statistical Analysis Per Genome (SAG): performs various statistical analyses (such as finding the number of sequences in a given list of genomic regions and the count of each nucleotide base) for each genome separately.
Parameter Configuration (based on executions performed on a training set)
- Maximum number of load tasks: 40 (decreased to 20 when the input size is 20 files, due to I/O contention)
- Inter-dependent processing (SNP Calling and LSA)
  - Region length: 12M
  - Maximum size of genome matrices: 2400
- Independent processing (SAG)
  - Region length: 64M
  - Maximum size of genome matrices: 800
14
Parallel Scalability
[Figure: speedup over the basic method, three panels. Reported GEM scalability values: 14.4x, 15.4x, and 12.4x.]
15
Comparison with Other Middleware Systems
Architecture: CPU with 8 cores and 12 GB memory
Applications: two GATK tools, CountBases and CountLoci
[Figure: Execution Time of GATK, PAGE and GEM with Varying Data Sizes]
16
Summary
We developed a middleware system for developing parallel genomic applications on memory-constrained many-core architectures.
- Decreases memory requirements of tasks
- Prevents over-consumption of memory
- Decreases I/O contention
Good scalability results
GEM outperforms GATK and PAGE on the two GATK tools compared.
17
Thank you!