1
Fast Data Analytics with FPGAs
Louis Woods, supervised by Gustavo Alonso
Systems Group, Department of Computer Science, ETH Zurich
11 April 2011
2
FPGAs in the Data Path
Field-Programmable Gate Array (FPGA)
- Reconfigurable digital logic devices → soft hardware
Benefits of FPGAs
- Affordable custom hardware solutions
- Massive parallelism
- Multi-gigabit I/O transceivers
- Low power consumption / security
FPGAs in the data path
- Process data close to the source
  → directly from the network
  → directly from the storage device
Outline of talk
- Complex event processing at wire speed with FPGAs
- FPGAs for dynamic (XML) query workloads
3
High-Speed Complex Event Detection
What is the problem?
- Detect complex events on high-rate data streams
- Example: financial services industry
  - Risk analysis
  - Algorithmic trading
FPGA programming is difficult
- Use a query-to-circuit compiler (query → VHDL/Verilog)
Supported queries
- Based on the Match-Recognize clause [F. Zemke et al., Pattern Matching in Sequences of Rows, 2007]
- Specify complex events with regular expression operators
Example query:
  PATTERN A B* C
  DEFINE A AS (tuple.price = 20)
         B AS (tuple.price > 20)
         C AS (tuple.price < 20)
More info → Complex Event Detection at Wire Speed with FPGAs (VLDB’10)
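The example query can be read as a small automaton with a single state bit. The following Python sketch only illustrates that reading; it is not the compiler-generated circuit. The function name `detect_a_bstar_c` and the choice to report the index at which each match completes are assumptions made for illustration.

```python
def detect_a_bstar_c(prices):
    """Return the stream positions at which PATTERN A B* C completes.

    A: price == 20, B: price > 20, C: price < 20 (predicates from the slide).
    """
    after_a = False  # state bit: "an A was seen, followed by zero or more Bs"
    matches = []
    for i, price in enumerate(prices):
        a = price == 20
        b = price > 20
        c = price < 20
        # A match completes when a C arrives while the state bit is set.
        if after_a and c:
            matches.append(i)
        # Next state: a fresh A starts a match; a B keeps a running one alive.
        after_a = a or (after_a and b)
    return matches


result = detect_a_bstar_c([20, 25, 30, 10, 20, 5, 7])
```

Each loop iteration corresponds to one tuple arriving on the stream; in hardware the state bit would be a flip-flop updated once per tuple.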
4
ICDE’11 Demo: Real-Time Pattern Matching with FPGAs
We demonstrate an FPGA-based complex event processor (Demo 1A/1B, 11-12:30)
Interactive demo
- Click stream monitoring application
- Multi-step web form
Figure labels: Client A, Client B, Client C, Switch, Web Server, Serial Link, FPGA; form steps F1 F2 F3 C T1 T2 T3
Click pattern queries:
  PARTITION BY tuple.src-ip
  PATTERN direct-buy (F1 F2? F3 C T)
  PATTERN indirect-buy (F1 F2? F3 C ([F1-F3]+ C)+ T)
  PATTERN aborted-buy ([F1-C]+ O)
  DEFINE F1 AS (tuple.payload = /Regex/)
         …
  e.g. (GET|POST) \/form1\.html
5
Architecture of CEP Engine
Figure: data path through the FPGA (labels recovered from the diagram)
- Ethernet frames
- Hard Ethernet MAC Core
- Network Packet Decoder: network headers, source IP address, payload
- Stream Partitioner
- Predicate Decoder
- Complex Event Detection Engine: sub-stream state vector, basic events vector, compiler-generated circuits
- Notify
Query template (shown four times in the diagram):
  PARTITION BY ...
  PATTERN A B* C
  DEFINE A AS (...)
         B AS (...)
         C AS (...)
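As a rough software analogy of the components named above, the sketch below keeps one state bit per sub-stream, derives a basic events vector from each tuple, and emits a notification when the pattern completes. The field names `src_ip` and `price`, the function names, and the reuse of the A B* C price pattern are assumptions for illustration; the real engine is a circuit, not a dictionary lookup.

```python
from collections import defaultdict


def predicate_decoder(tup):
    """Basic events vector (A, B, C) for one tuple; predicates are assumed."""
    price = tup["price"]
    return (price == 20, price > 20, price < 20)


def step(state, events):
    """Advance the A B* C automaton by one tuple; returns (next_state, matched)."""
    a, b, c = events
    matched = state and c
    next_state = a or (state and b)
    return next_state, matched


def run(stream):
    """Process tuples in arrival order, keeping state per partition key."""
    states = defaultdict(bool)  # sub-stream state vector, one entry per source IP
    notifications = []
    for tup in stream:
        key = tup["src_ip"]  # stream partitioner: PARTITION BY source IP
        states[key], matched = step(states[key], predicate_decoder(tup))
        if matched:
            notifications.append(key)
    return notifications


stream = [
    {"src_ip": "10.0.0.1", "price": 20},
    {"src_ip": "10.0.0.2", "price": 20},
    {"src_ip": "10.0.0.1", "price": 30},
    {"src_ip": "10.0.0.2", "price": 5},
    {"src_ip": "10.0.0.1", "price": 10},
]
notified = run(stream)
```

The interleaved tuples of the two clients do not disturb each other because the automaton state is loaded and stored per partition key.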
6
Evaluation
- We saturated a gigabit Ethernet link with UDP packets (VLDB’2010)
- The FPGA fully sustains gigabit traffic
- We avoid the network-memory-CPU bottleneck by inserting the complex event processor directly in the data path
Figure: tuples/second processed vs. tuples/packet (tuple = 16 bytes)
- 90 tuples per packet: Ethernet frame = 1’486 bytes, bandwidth ≈ 1 Gbit/s
- 1 tuple per packet: Ethernet frame = 64 bytes, bandwidth ≈ 1 Gbit/s
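The two frame sizes give a back-of-the-envelope upper bound on the tuple rate a saturated gigabit link can deliver. The sketch below is not the measured result from the slide; it assumes the stated frame sizes include the frame check sequence and adds 20 bytes of per-frame wire overhead (preamble plus inter-frame gap), which is an assumption not stated in the talk.

```python
LINK_BITS_PER_SECOND = 1_000_000_000
# Assumption: 8 bytes preamble/start delimiter + 12 bytes inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def tuples_per_second(frame_bytes, tuples_per_packet,
                      overhead_bytes=WIRE_OVERHEAD_BYTES):
    """Upper bound on tuples/second for back-to-back frames on the link."""
    bits_per_packet = (frame_bytes + overhead_bytes) * 8
    packets_per_second = LINK_BITS_PER_SECOND / bits_per_packet
    return packets_per_second * tuples_per_packet


large_frames = tuples_per_second(frame_bytes=1486, tuples_per_packet=90)
small_frames = tuples_per_second(frame_bytes=64, tuples_per_packet=1)
print(f"90 tuples/packet: {large_frames / 1e6:.2f} million tuples/s")
print(f" 1 tuple/packet:  {small_frames / 1e6:.2f} million tuples/s")
```

Under these assumptions the large-frame case is bounded at roughly 7.5 million tuples per second and the small-frame case at roughly 1.5 million packets (and tuples) per second, so the small-frame workload stresses per-packet handling while the large-frame workload stresses per-tuple processing.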
7
FPGAs for Dynamic Query Workloads
- Previous example → queries are compiled off-line before being loaded into the FPGA
- Converting HDL to circuits is time-consuming (minutes to hours)
  Tool flow: Queries → VHDL/Verilog → Synthesis → Map → Place & Route → Bitstream → FPGA
- Hybrid CPU/FPGA settings with frequently changing query workloads may call for a more dynamic solution
- Example → an FPGA implementation for XML projection
- Idea: the FPGA filters the XML stream to reduce the load on the backend (CPU-based) XQuery processor
- Use the best of both worlds: efficient FPGA, flexible CPU
Figure labels: Network, unfiltered data stream, FPGA, filtered data stream, Commodity System (CPU-based)
8
XML Projection
What is XML projection? [A. Marian and J. Siméon, Projecting XML Documents, VLDB’03]
- Idea: prune irrelevant XML elements before query processing
- Complex queries → flexible CPU
- Simple projection paths → efficient FPGA
XQuery expression:
  for $i in //things//person
  return <person>
           { $i/name }
           <num-categories> { count($i/incategory) } </num-categories>
         </person>
XML document / stream:
  <things>
    <person>
      <name>John Doe</name>
      <incategory>x</incategory>
      <incategory>y</incategory>
    </person>
    <animal>Cat</animal>
    <animal>Dog</animal>
    <animal>Fish</animal>
    <fruit>Orange</fruit>
    <fruit>Apple</fruit>
    <fruit>Banana</fruit>
  </things>
Projection paths:
  (1) //things//person
  (2) //things//person/name #
  (3) //things//person/incategory
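To make the pruning concrete, here is a minimal software sketch that applies the three projection paths to the example document. It assumes that the `#` flag means "keep the whole subtree" and that a node matched without `#` is kept without its text; both readings, the `PATHS` encoding, and the helper names are assumptions for illustration, and the in-memory tree walk stands in for what the FPGA does on a stream.

```python
import copy
import xml.etree.ElementTree as ET

# Projection paths as (steps, keep_subtree); keep_subtree models the '#' flag.
PATHS = [
    ([("desc", "things"), ("desc", "person")], False),
    ([("desc", "things"), ("desc", "person"), ("child", "name")], True),
    ([("desc", "things"), ("desc", "person"), ("child", "incategory")], False),
]

DOC = (
    "<things><person><name>John Doe</name>"
    "<incategory>x</incategory><incategory>y</incategory></person>"
    "<animal>Cat</animal><animal>Dog</animal><animal>Fish</animal>"
    "<fruit>Orange</fruit><fruit>Apple</fruit><fruit>Banana</fruit></things>"
)


def matches(steps, tags, si=0, ti=0):
    """True if the root-to-node tag list `tags` satisfies all `steps`."""
    if si == len(steps):
        return ti == len(tags)
    if ti >= len(tags):
        return False
    axis, name = steps[si]
    if axis == "child":
        return tags[ti] == name and matches(steps, tags, si + 1, ti + 1)
    # Descendant axis: the step may match at this depth or any deeper one.
    return any(tags[j] == name and matches(steps, tags, si + 1, j + 1)
               for j in range(ti, len(tags)))


def project(elem, ancestors=()):
    """Return a pruned copy of `elem`, or None if nothing in it is needed."""
    tags = list(ancestors) + [elem.tag]
    flags = [keep for steps, keep in PATHS if matches(steps, tags)]
    if any(flags):                      # '#' path matched: keep full subtree
        return copy.deepcopy(elem)
    children = [p for p in (project(c, tags) for c in elem) if p is not None]
    if not flags and not children:      # neither matched nor an ancestor of a match
        return None
    pruned = ET.Element(elem.tag, dict(elem.attrib))
    pruned.extend(children)
    return pruned


projected = project(ET.fromstring(DOC))
print(ET.tostring(projected, encoding="unicode"))
```

The animal and fruit elements are dropped, the name keeps its text, and the incategory elements survive only as empty elements, which is enough for the `count()` in the query.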
9
XPath Matching in Hardware
- Non-deterministic finite automata (NFA) are efficient on FPGAs
- Example → //a/b/c//d
  root()/desc::a/child::b/child::c/desc::d
  Automaton (figure): q0 -<a>-> q1 -<b>-> q2 -<c>-> q3 -<d>-> q4, with * self-loops for the descendant axes
- What about closing tags and recursion, e.g., <a><b><a>…</a></b></a>?
  - Maintain a stack to keep track of open tags
- In hardware
  - XPath expression = sequence of runtime-(re)configurable segment matchers
  - A segment matcher corresponds to a node test + a navigation axis
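A software sketch of the stack-based NFA idea for //a/b/c//d follows. It keeps a set of active states, pushes that set on every opening tag, and restores it on every closing tag. The event encoding (`("open", tag)` / `("close", tag)`) and the function name are assumptions for illustration; the hardware design uses segment matchers rather than Python sets.

```python
# One (axis, node test) pair per segment: root()/desc::a/child::b/child::c/desc::d
STEPS = [("desc", "a"), ("child", "b"), ("child", "c"), ("desc", "d")]


def match_events(events, steps=STEPS):
    """Return the indices of opening-tag events that complete the path."""
    active = {0}   # state i means "the first i steps have matched"
    stack = []     # active sets of all currently open ancestors
    hits = []
    for idx, (kind, tag) in enumerate(events):
        if kind == "open":
            stack.append(active)
            nxt = set()
            for s in active:
                if s == len(steps):
                    continue                 # accepting state has no next step
                axis, name = steps[s]
                if axis == "desc":
                    nxt.add(s)               # self-loop: may match deeper down
                if tag == name:
                    nxt.add(s + 1)
            active = nxt
            if len(steps) in active:
                hits.append(idx)
        else:
            active = stack.pop()             # closing tag: back to parent's states
    return hits


def events_of(*tokens):
    """Helper: '+a' is an opening tag <a>, '-a' is a closing tag </a>."""
    return [("open" if t[0] == "+" else "close", t[1:]) for t in tokens]


nested = events_of("+a", "+b", "+c", "+x", "+d", "-d", "-x", "-c", "-b", "-a")
recursive = events_of("+a", "+b", "+a", "-a", "+c", "+d", "-d", "-c", "-b", "-a")
broken = events_of("+a", "+x", "+b", "+c", "+d", "-d", "-c", "-b", "-x", "-a")
```

In `recursive`, the inner `<a>…</a>` temporarily replaces the active set, and popping the stack at `</a>` restores the state reached after `<a><b>`, so the later `<c><d>` still matches.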
10
Runtime-(re)configuration
Node test
- Use fast on-chip memory to store node test information, e.g., tag names, in the segment matcher
- Parser (FSM): annotates raw XML input (SAX-like)
- Node test circuit → matches opening tags character by character as they stream by
Navigation axis
- Child axis
- Descendant axis
- Combined & configurable
Figure labels: Segment Matcher i (node test, axis), i-1; input "< foo > /" annotated with parser tokens
  τ1: TagStart
  τ2: TagNameChar
  τ3: OpeningTagEnd
  τ4: ClosingTagSlash
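The parser's annotation step can be pictured as a small character-level state machine. The sketch below tags each input character with one of the four token types from the slide, or with "Other". It assumes tags without attributes, comments, or self-closing forms, and it labels the characters of closing-tag names as "Other"; these simplifications and the state names are assumptions, not the actual parser design.

```python
def annotate(xml):
    """Return (character, token) pairs for a simple attribute-free XML string."""
    state = "text"
    out = []
    for ch in xml:
        if state == "text":
            if ch == "<":
                out.append((ch, "TagStart"))
                state = "tag_start"
            else:
                out.append((ch, "Other"))
        elif state == "tag_start":
            if ch == "/":
                out.append((ch, "ClosingTagSlash"))
                state = "closing"
            else:
                out.append((ch, "TagNameChar"))
                state = "open_name"
        elif state == "open_name":
            if ch == ">":
                out.append((ch, "OpeningTagEnd"))
                state = "text"
            else:
                out.append((ch, "TagNameChar"))
        else:  # inside a closing tag: skip until '>'
            out.append((ch, "Other"))
            if ch == ">":
                state = "text"
    return out


tokens = annotate("<foo>x</foo>")
```

A node test circuit would then only need to compare the characters tagged TagNameChar against the tag name stored in its on-chip memory.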
11
Multiple XPath Expressions
- Local match signal for each state in the automaton
- Global match signal that combines the local matches
- Various XPath expressions of different lengths are feasible
- The design suits the internal architecture of FPGAs well
Figure labels: XML Parser, Serializer
12
Layout Algorithms in 2D Space
FPGAs – under the hood
- IOB: input/output block
- CLB: configurable logic block
- Interconnect network
How do we lay out algorithms in 2-dimensional FPGA space?
- Long signal paths across the entire chip will degrade performance → slow clock
- Efficient designs have short communication paths
13
Evaluation
Scalability
- Clock frequency does not degrade even with high chip utilization
- Figure: clock frequency [MHz] vs. number of segment matchers
Application speedup (XMark)
- XML parsing cost is reduced significantly
14
Summary
- FPGAs are great for in-network processing to overcome the network-memory-CPU bottleneck
- Truly runtime-(re)configurable designs are feasible
- FPGAs impose very different design constraints than CPU-based solutions and call for new creative approaches
More info
- The work presented here is part of the Avalanche project: research/projects/avalanche
- Visit our demo: Real-Time Pattern Matching with FPGAs (Demo 1A/1B, 11-12:30)
Thank you!