Adopting OpenCAPI for High Bandwidth Database Accelerators
Authors: Jian Fang (1), Yvo T.B. Mulder (1), Kangli Huang (1), Yang Qiao (1), Xianwei Zeng (1), Jan Hidders (2), Jinho Lee (3), H. Peter Hofstee (1,3)
SC'17, Denver, USA
Speaker: Jian Fang, November 17th, 2017
Netezza Data Appliance Architecture
Source: The Netezza data appliance architecture: A platform for high performance data warehousing and analytics
S-Blade in Netezza (data flow: Compressed Data -> Decompressed Data -> Filtered Data)
Netezza Data Appliance Architecture
Commercial products such as Kickfire and Netezza have shown the feasibility of this approach. In Netezza, FPGAs are placed in the data path to act as a decompress-filter, indirectly increasing the effective disk bandwidth. The improvement comes from relieving the disk-bandwidth bottleneck.
Source: The Netezza data appliance architecture: A platform for high performance data warehousing and analytics
DB with FPGAs: What is new?
Databases are moving from disk to memory, and from disk to flash. Do FPGAs still help?
√ Faster data movement: "Neo4j Touts 10x Performance Boost of Graphs on IBM Power FPGAs"
FADS: Failed Aspiration on Database Systems?
Source: neo4j-touts-10x-performance-boost-of-graphs-on-ibm-power-fpgas/
OpenCAPI Helps: High Bandwidth, Low Latency, Shared Memory
High bandwidth: OpenCAPI brings FPGAs memory-scale bandwidth
OpenCAPI 3.0 (x8) -> 25 GB/s per channel; 100 GB/s in total with 4 channels
OpenCAPI 4.0 (x32) -> 100 GB/s per channel; 100 GB/s in total with 1 channel
Shared memory: address translation saves extra copies
Low latency
Targets more than computation-intensive applications
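The bandwidth figures above follow from a simple multiplication of per-channel rate by channel count. A minimal sketch (the per-channel numbers are taken from the slide; the helper name is my own):

```python
# Back-of-the-envelope check of the slide's bandwidth totals.
def total_bandwidth_gbs(per_channel_gbs, channels):
    """Aggregate link bandwidth in GB/s across identical channels."""
    return per_channel_gbs * channels

# OpenCAPI 3.0 (x8): 25 GB/s per channel, 4 channels -> 100 GB/s total
opencapi_3 = total_bandwidth_gbs(25, 4)
# OpenCAPI 4.0 (x32): 100 GB/s per channel, 1 channel -> 100 GB/s total
opencapi_4 = total_bandwidth_gbs(100, 1)
```

Both configurations reach the same 100 GB/s aggregate; OpenCAPI 4.0 simply does it with a single, wider channel.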
Accelerating DBs with OpenCAPI
Decompress-Filter, Hash-Join, Merge-Sorter
Each has different buffering requirements, which are challenging at this speed. Requirements vary from having to hide latency to the number of read ports.
Decompress-Filter: do we need compression & decompression?
Parquet format: partitionable; supports GZIP, LZO, Snappy, ...
Snappy (de)compression algorithm: based on LZ77, byte-oriented; low compression ratio, but fast (de)compression speed
Computation-bound: highly data dependent; multiple engines are needed to keep up with the bandwidth. Trade-off between stronger but fewer engines and simpler but more engines (64 KB history for each engine)
Memory access pattern: sequential read for each stream (engine)
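The data dependence that makes LZ77-family decompression hard to speed up can be seen in a tiny decoder sketch. This uses a hypothetical token format, not the real Snappy wire format: each token is either a run of literals or a copy that references bytes already produced, so every copy depends on earlier output and each engine must keep a history window (64 KB in the design above).

```python
# Minimal LZ77-style decoder sketch (illustrative token format, not Snappy's
# actual encoding). Tokens are ("lit", bytes) or ("copy", offset, length),
# where offset counts back from the end of the output produced so far.
def lz77_decode(tokens, window=64 * 1024):
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        else:
            _, offset, length = tok
            assert 0 < offset <= min(len(out), window)
            for _ in range(length):          # byte-by-byte, because a copy
                out.append(out[-offset])     # may overlap its own output
    return bytes(out)
```

For example, `lz77_decode([("lit", b"ab"), ("copy", 2, 4)])` yields `b"ababab"`: the copy re-reads bytes it has just written, which is exactly why the decoder's throughput is bounded by this serial history dependence rather than by raw arithmetic.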
Hash-Join: memory-bound
Low locality of the data and multiple passes of data transfers; the internal memory (BRAM) is too small to store the hash table
Memory access pattern: sequentially read the relations; randomly write/read the hash table. Granularity matters during random accesses
40% waste: a 40 B tuple is required, a 64 B cache line is accessed, so 24 B are wasted per access
Each has different buffering requirements, which are challenging at this speed. Requirements vary from having to hide latency to the number of read ports.
Fang J, et al. Analyzing In-Memory Hash Joins: Granularity Matters, ADMS 2017.
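The "40% waste" figure is back-of-the-envelope cache-line arithmetic: every random hash-table access fetches at least one whole cache line, but only the tuple's bytes are useful. A small sketch of that calculation (helper name is my own):

```python
import math

# Fraction of fetched bytes that are wasted when a tuple of `tuple_bytes`
# is read via whole cache lines of `line_bytes`.
def wasted_fraction(tuple_bytes, line_bytes):
    lines = math.ceil(tuple_bytes / line_bytes)  # cache lines touched
    fetched = lines * line_bytes                 # bytes actually moved
    return (fetched - tuple_bytes) / fetched     # bytes thrown away

# Slide's numbers: 40 B tuple, 64 B cache line -> 24/64 = 0.375,
# i.e. roughly 40% of random-access bandwidth is spent on unused bytes.
```

This is why access granularity matters: either the accelerator gathers useful bytes from the wasted portion of each line, or it pays the ~40% bandwidth tax on every random probe.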
Merge-Sorter: need a strong sorter for the final pass
Memory access pattern: sequential read within each stream, but the next stream is chosen randomly (data-dependently)
Solutions: an even-odd sorter to continuously produce multiple tuples per cycle; multi-stream buffering to feed this beast
[Figure: even-odd sorter example, turning the unsorted inputs into the sorted outputs 1-8 and 9-16 over alternating even and odd cycles.]
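The final merge pass can be sketched in software as a k-way merge: reads are sequential within each sorted stream, but which stream is read next depends on the data, which is the random inter-stream pattern the multi-stream buffers have to hide. This is only a software analogue (a heap stands in for the hardware comparator network; the function name is my own):

```python
import heapq

# k-way merge of sorted streams, modelling the merge-sorter's final pass.
# Heap entries are (value, stream_index, position_in_stream).
def multi_stream_merge(streams):
    heap = [(s[0], i, 0) for i, s in enumerate(streams) if s]
    heapq.heapify(heap)
    out = []
    while heap:
        val, i, pos = heapq.heappop(heap)
        out.append(val)
        if pos + 1 < len(streams[i]):
            # next *sequential* read, but from a data-dependent stream i
            heapq.heappush(heap, (streams[i][pos + 1], i, pos + 1))
    return out
```

For instance, `multi_stream_merge([[1, 4, 7], [2, 5], [3, 6, 8]])` returns `[1, 2, 3, 4, 5, 6, 7, 8]`; note how the pops bounce between streams unpredictably, which is what forces per-stream buffering to keep the sorter fed.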
Summary: databases have, and need, ever-faster rates of moving data
With OpenCAPI, FPGAs can help DBs more
Challenges of high-bandwidth accelerator design: three examples
You can use OpenCAPI by joining the OpenPOWER community.
Authors:
Jian Fang, TU Delft
Yvo T.B. Mulder, TU Delft
Kangli Huang, TU Delft
Yang Qiao, TU Delft
Xianwei Zeng, TU Delft
Jan Hidders, Vrije Universiteit Brussel
Jinho Lee, IBM Research
H. Peter Hofstee, TU Delft & IBM Research
Thank You
More detail: "Progress with Power Systems and CAPI"; "Leveraging the bandwidth of OpenCAPI with reconfigurable logic"
Contact Me: