A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu

Motivation  Today’s Database Management Systems (DBMS): software running on a standard operating system on a general purpose CPU  DBMS frequently used in analytics and scientific computing, but bottlenecked by:  Processor speed, software overhead, latency & bandwidth  Proposal: FPGA Based Relational Algebra Processor Host PC (DBMS) FPGA Relational Algebra Processor Physical Storage 2

Background |Relational Algebra (RA)  Many database queries are fundamentally decomposable to five basic RA operators  Although SQL is capable of much more Operator Functions SelectionFilter rows based on a Boolean condition ProjectionEliminate selected attributes (columns) of a table; remove duplicated results Cartesian Product Combine several tables with unique attributes UnionCombine several tables with the same attributes DifferenceSelect rows of several tables where the rows do not match Design dedicated processors on the FPGA for each operator 3

Project Goal  Design and implement an in-memory relational algebra processor on the FPGA  Explore the types of queries that can benefit from FPGA acceleration  Secondary: Outperform SQLite!  Some assumptions:  32-bit wide table entries  Tables fit in memory  Max number of columns is 32  Read only 4

Microarchitecture | Host Software 5 FPGA

Microarchitecture | Top-Level RAProcessor Host PC (C++ functions) RA Processor DRAM PCIe Host PC (DBMS) RA Processor Physical Storage 6

Microarchitecture | Row Marshaller 7  Exposes a simple interface for operators to access tables in DRAM  Address translation, burst aggregation, truncation & alignment  Multiplexes requests  Table values sent/received as 32-bit bursts

Microarchitecture | Selection 8  Filters rows based on predicates (e.g. age < 40)  16 predicate evaluators  Internally comparators  A tree of gates to qualify the predicates  Max: 4 ORs of 4 ANDs

Microarchitecture | Projection 9  Select columns of a table  Column mask one-hot encoded  Do not need to buffer row; operate directly on data bursts

Microarchitecture | Binary Operators 10  Cartesian Product, Union, Difference and Deduplication  Nested loop implementation

Microarchitecture| Inter-operator Bypassing 11  Operators enabled concurrently; data passed between operators  No intermediate storage  Conditions: 1. A singly link of unary operators 2. Each operator has a single target output 3. No structural hazard  Software reorders and schedules the RA commands  Data source/destination encoded in command

Microarchitecture| Inter-operator Bypassing 12  Multiple 32-bit wide output FIFOs to other operators

Implementation Evaluation 13  Timing  Maximum Frequency: 55.786MHz  Critical Path: Row Marshaller mux  Area  Slice Registers: 50%  LUTs: 85%  BRAM/FIFOs: 47% ModulesSlice RegistersLUTsBRAM/FIFOs TOTAL 34649 (50%)59328 (85%)71 (47%) Row Mashaller280466270 Controller4570627729 Selection3137196330 Projection7396540 Cartesian Product193514780 Union193919830 Difference187519490 Deduplication182219700

Performance Benchmark | Setup 14  SQLite  Internal SQLite timer to report execution time of the query  Thinkpad T430, Core i7-3520M @ 2.90Ghz, 1x8GB DDR3-1600  RA Processor  Performance counters: cycles from start to ack of an operator TableRelational Algebra QuerySQL Query 1 table 100k x 30 SELECT,starLong,tableOut, mass,>,80000,AND,pos_x,>,10, OR,pos_x,<,pos_z, OR,col12,>,col14, AND,col20,<,col21 SELECT * FROM starLong WHERE mass > 80000 AND pos_x > 10 OR pos_x < pos_z OR col12 > col14 AND col20 < col21; 1 table 100k x 30 PROJECT,starLong,tableOut, pos_x,col19,col25,col29 SELECT pos_x,col19, col25, col29 FROM starLong; 2 tables 1k x 30 UNION,starMed1,starMed2,starUnionSELECT * FROM starMed1 UNION SELECT * FROM starMed2; 2 tables 1k x 30 XPROD,starMed1,starMed2,starXprod RENAME,starXprod,0,iOrder0,1,mass0,8,phi0 SELECT,starXprod,starFiltered, iOrder0,=,iOrder, AND,phi0,>,1, AND,mass0,>,mass PROJECT,starFiltered,starOut,mass0 SELECT s1.mass FROM starMed1 s1, starMed2 s2 WHERE s1.vx > s2.vx AND s1.phi > 1 AND s1.mass > s2.mass;

Performance Benchmark | Results 15  Limitation: Memory Bandwidth: 200MB/s vs 12.8GB/s

Performance Benchmark | Results 16  Select operator most competitive with SQLite  What happens with more predicates?

Improvements 17  Increasing data burst width  32-bit to 256-bit: potential 8x speedup  Area/critical path increase  Maximizing memory bandwidth  Additional row buffers to buffer data from DDR2 Memory  Larger, faster DRAM; Higher clock speed

Conclusion & Future Work 18  Complex filtering operations performs well on the FPGA  Better than SQLite with sufficient memory bandwidth  Data intensive operators do not perform well  Future opportunities:  An accelerator alongside SQLite  Integration with HDD/SSD controller

A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

Similar presentations

Presentation on theme: "A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu.

Similar presentations

Presentation on theme: "A Relational Algebra Processor 6.375 Final Project Ming Liu, Shuotao Xu."— Presentation transcript:

Similar presentations

About project

Feedback