Power-efficient range-match-based packet classification on FPGA


1 Power-efficient range-match-based packet classification on FPGA
2019/5/23
Authors: Yun R. Qu, Viktor K. Prasanna
Publisher: Field Programmable Logic and Applications (FPL), th International Conference on
Presenter: Kai-Hsun Li
Date: 2015/11/25
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.
CSIE CIAL Lab

2 Introduction
Many classification engines are optimized for prefix and exact match, while a range-to-prefix translation can lead to rule set expansion. Under a limited power budget, it is challenging to achieve high classification throughput. In this paper, we construct a modular Processing Element (PE); each PE compares a stride of the input packet header against a stride of a range boundary. Experimental results show that, for 4K 15-field rule sets, our prototype on a state-of-the-art FPGA can achieve 250 MPPS throughput.
Presenter notes: Range-to-prefix translation can lead to rule set expansion; TCAM, for example, has high cost and high power consumption. The paper therefore proposes a high-performance and power-efficient packet classification engine on FPGA. The approach supports 4K 15-field rule sets and reaches a throughput of 250 MPPS.
National Cheng Kung University CSIE Computer & Internet Architecture Lab CSIE CIAL Lab
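To make the rule set expansion concrete, the following is a minimal Python sketch of the textbook range-to-prefix split. The function name `range_to_prefixes` and the 4-bit example range are illustrative assumptions, not taken from the paper.

```python
def range_to_prefixes(lo, hi, width):
    """Split the inclusive range [lo, hi] of width-bit values into prefixes.

    Illustrative helper (not from the paper): each step emits the largest
    power-of-two block that is aligned at lo and still fits inside the range.
    """
    prefixes = []
    while lo <= hi:
        # Largest aligned block starting at lo (the whole space when lo == 0).
        size = lo & -lo if lo else 1 << width
        while size > hi - lo + 1:
            size >>= 1
        bits = size.bit_length() - 1          # number of wildcard bits
        head = format(lo >> bits, "b").zfill(width - bits) if width > bits else ""
        prefixes.append(head + "*" * bits)
        lo += size
    return prefixes


print(range_to_prefixes(1, 14, 4))
```

In this toy case a single 4-bit range [1, 14] turns into six prefix entries, which is the kind of growth a prefix-only engine has to absorb for every range field of every rule.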

3 Proposed Scheme(1/6)
This figure shows the internal structure of one PE (the number of PEs per field depends on the stride size). s is the stride length and c is the number of end points; here c = 2, so there are 2 sets of comparison logic. x is the s-bit stride of the input packet header, and y is the end point stored in memory.

4 Proposed Scheme(2/6)

5 Proposed Scheme (3/6) - Example 1
Suppose s = 2, c = 3. End points: 2 (000010), 4 (000100), 6 (000110). Input = 5 (000101).

6 Proposed Scheme (3/6) - Example
input = 5, split into strides 00 01 01
[Figure: three chained PEs, one per stride. Each PE holds the corresponding stride of the end points 2, 4 and 6 and produces the signals eqlk_j, lessk_j and enk_j for end point k at stride j.]
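The stride-by-stride comparison of Example 1 can be sketched in Python as follows. The transcript does not preserve the exact logic of the PE, so the meaning given to the two signals here is an assumption: `eql` means "all strides seen so far are equal" and `less` means "the input is already known to be smaller than the end point". The enable signal is left out, and `stride_compare` is a hypothetical name.

```python
def stride_compare(x, y, width, s):
    """Compare x against end point y one s-bit stride at a time, MSB first.

    Returns the (eql, less) pair after every stride. Assumes width is a
    multiple of s; signal semantics are assumed, not taken from the paper.
    """
    mask = (1 << s) - 1
    eql, less = True, False
    trace = []
    for shift in range(width - s, -1, -s):
        xs = (x >> shift) & mask
        ys = (y >> shift) & mask
        # A lower-order stride only matters while all earlier strides tied.
        less = less or (eql and xs < ys)
        eql = eql and xs == ys
        trace.append((eql, less))
    return trace


# Example 1: s = 2, end points 2, 4 and 6, input 5, all 6 bits wide.
for end_point in (2, 4, 6):
    print(end_point, stride_compare(5, end_point, width=6, s=2))
```

Under these assumptions only the comparison against end point 6 finishes with `less` set, which agrees with 5 being smaller than 6 but not smaller than 2 or 4.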

7 Proposed Scheme (4/6) - Example 2
Suppose s = 2, c = 2. End points: 3 (0011), 12 (1100). Input = 5 (0101).

8 Proposed Scheme (4/6) - Example2
input = 5 (0101), split into strides 01 01
[Figure: two chained PEs. The first holds the high-order strides 00 and 11 of the two end points, the second holds the low-order strides 11 and 00; each produces eql, less and en signals.]
Presenter notes: This is a case that goes wrong when the strides are compared in the ordinary, independent way. Here, however, each PE carries forward the result of the previous stride, so a false mismatch does not occur.
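A short Python sketch shows why Example 2 breaks an independent per-stride check but not a chained one. The function names are hypothetical, and the range [3, 12] is assumed to be inclusive on both ends; the chained version is one plausible reading of "carrying the previous result forward", not the paper's exact circuit.

```python
def strides(value, width, s):
    """Split value into s-bit strides, most significant first."""
    mask = (1 << s) - 1
    return [(value >> shift) & mask for shift in range(width - s, -1, -s)]


def naive_match(x, lo, hi, width, s):
    """Incorrect on purpose: tests every stride against the bounds in isolation."""
    return all(l <= v <= h for v, l, h in zip(
        strides(x, width, s), strides(lo, width, s), strides(hi, width, s)))


def chained_match(x, lo, hi, width, s):
    """Carries the 'still tied with the bound' state from stride to stride."""
    tied_lo = tied_hi = True
    for v, l, h in zip(strides(x, width, s), strides(lo, width, s),
                       strides(hi, width, s)):
        if (tied_lo and v < l) or (tied_hi and v > h):
            return False            # x dropped below lo or rose above hi
        tied_lo = tied_lo and v == l
        tied_hi = tied_hi and v == h
    return True


print(naive_match(5, 3, 12, width=4, s=2))    # low stride 01 is not in [11, 00]
print(chained_match(5, 3, 12, width=4, s=2))  # high stride already decided it
```

The high-order stride 01 lies strictly between 00 and 11, so the outcome is settled there and the low-order strides no longer need to be compared against the bounds.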

9 Proposed Scheme (5/6)
Fig. 2: Interconnections between modular PEs
Type I: for the same field, both PEs compare upper bounds or both compare lower bounds
Type II: for the same field, one PE compares the upper bound and the other compares the lower bound
Type III: for different fields

10 Proposed Scheme (6/6)
c is the number of end points held in one PE, so the number of PEs in the vertical direction is N / c. The number of PEs in the horizontal direction is 2 * W_m / s.
Fig. 3: Systolic array consisting of 2 rows and 6 columns, and matching M = 2 fields. PE[i, j] denotes the modular PE in the i-th row and j-th column.
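As a quick sanity check of the two formulas, the sketch below computes the array dimensions for one configuration that appears later in the experimental slides (N = 1024, W_m = 104, s = 4, c = 64). The function name `array_shape` and the rounding up for non-divisible sizes are assumptions; the factor 2 presumably covers the lower and the upper bound of each range.

```python
from math import ceil


def array_shape(num_rules, header_bits, s, c):
    """Return (rows, columns) of the PE array for the given parameters."""
    rows = ceil(num_rules / c)             # vertical direction: N / c
    columns = 2 * ceil(header_bits / s)    # horizontal direction: 2 * W_m / s
    return rows, columns


print(array_shape(num_rules=1024, header_bits=104, s=4, c=64))
```

For these values the array would have 16 rows and 52 columns.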

11 Power Optimizations(1/6) - Motivation
As can be seen in Figure 1, each enable signal is used to enable the corresponding data memory module in the next PE. The intuitions behind using the enable signals are:
1. If any field of the input packet header does not match a rule, then the packet is considered as not matching this rule.
2. If a packet header has been identified as not matching the rule in one field, then the PEs in other fields can clock-gate their data memory modules to save power.
Presenter notes: (1) If some field of the input packet does not match, the packet does not match the rule. (2) When that happens there is no need to keep comparing the remaining parts, which saves power.

12 Power Optimizations(2/6) - Self-enabled Power Gating
This technique has two properties:
Chaining: If one data memory module is deactivated, a chain of data memory modules in the following PEs will be deactivated, saving a great amount of power.
Fine-grained: Since the rules are independent, the activation / deactivation for different data memory modules is also independent.
Presenter notes: Chaining means that whether an earlier memory module is active determines whether the later memory modules are active. Fine-grained means that the memory modules of different rules are independent of each other.
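The two properties can be illustrated with a small behavioural model in Python. This is only a sketch under assumptions: each rule is modelled as one chain of PEs, the first PE is assumed to be always enabled, and `memory_reads` and the three toy rules are hypothetical, not taken from the paper.

```python
def memory_reads(stage_ok):
    """Count the data memory modules activated along one rule's PE chain.

    stage_ok[j] is True when PE j finds that the packet can still match.
    Each PE enables the next one only if it was enabled itself and saw no
    mismatch, which models the chaining property.
    """
    enabled, reads = True, 0
    for ok in stage_ok:
        if not enabled:
            break            # every later module in the chain stays gated
        reads += 1           # this PE reads its data memory module
        enabled = ok         # self-generated enable for the next PE
    return reads


# Fine-grained: every rule has its own chain, so gating is decided per rule.
rules = {
    "rule A": [True, True, True, True],    # matches in every stage
    "rule B": [True, False, True, True],   # mismatch found in the second PE
    "rule C": [False, True, True, True],   # mismatch found in the first PE
}
for name, chain in rules.items():
    print(name, memory_reads(chain), "of", len(chain), "modules active")
```

The earlier the mismatch is found, the fewer modules are read, which is the effect the later scheduling slides try to maximize.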

13 Proposed Scheme(3/6) - Example
input = 5, split into strides 00 01 01
[Figure: the three-PE example with signals eqlk_j, lessk_j and enk_j for end point k at stride j; two entries are marked X.]

14 Power Optimizations(4/6) - Entropy-based Scheduling
To make the self-enabled power gating more effective, for a given packet header and a given rule, we need to report the mismatch, if any, as early as possible. Consider the following two cases, as shown in this figure:
Case (1): The input packet header matches the rule in field m, but does not match this rule in field m'. The mismatch is identified in the higher order stride of field m'. This is quite late; the 3 data memory modules in the first 3 PEs are activated.
Case (2): The input packet header does not match the rule in field m'; the mismatch is identified in the higher order stride of field m'. This is early enough to have 3 data memory modules deactivated for this input packet header.
Presenter notes: The goal is to make the self-enabled power gating described earlier more effective.

15 Power Optimizations(5/6) - Entropy-based Scheduling
Presenter notes: Case 1 is the upper part of the figure. Only at the last PE does it become known that the data memory modules need not be activated, which means the mismatch occurred in the preceding PE but was discovered too late. Case 2 moves this situation to the front, so it is known earlier when the data memory modules can be deactivated, which saves power.

16 Power Optimizations(6/6) - Entropy-based Scheduling
Presenter notes: The idea just described is too complex to implement, so the authors instead compute entropy to achieve a similar effect.

17 Power Optimizations(6/6) - Entropy-based Scheduling
Step 1: Calculate H(m), ∀m, using the equation shown on the slide.
Step 2: Schedule all the fields in descending order of H(m); i.e., the fields with higher H(m) are always scheduled (early) in the front of the horizontal pipelines.
Presenter notes: Entropy is high when (1) for the same set of distinct values, the counts are evenly distributed, or (2) there are many distinct values with equal counts. Entropy is low when, for the same set of distinct values, the counts are unevenly distributed.
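The two steps can be sketched in Python as follows. The equation for H(m) is not preserved in this transcript, so the sketch assumes the standard Shannon entropy over the distinct ranges that the rule set uses in field m; the paper's definition may differ. The function names and the toy rule set are made up for illustration.

```python
from collections import Counter
from math import log2


def field_entropy(values):
    """Shannon entropy in bits of the values one field takes across all rules.

    Assumed stand-in for H(m); the paper's exact equation is not shown here.
    """
    counts = Counter(values)
    total = len(values)
    return -sum(n / total * log2(n / total) for n in counts.values())


def schedule(rule_set):
    """Step 2: order the field indices by descending entropy."""
    fields = list(zip(*rule_set))    # column m holds every rule's range in field m
    return sorted(range(len(fields)),
                  key=lambda m: field_entropy(fields[m]), reverse=True)


# Toy rule set (invented): 4 rules x 3 fields, each entry a (low, high) range.
rules = [
    ((0, 255), (80, 80), (6, 6)),
    ((0, 255), (443, 443), (6, 6)),
    ((0, 255), (53, 53), (17, 17)),
    ((0, 255), (22, 22), (6, 6)),
]
print(schedule(rules))
```

In this toy set field 1 has four distinct ranges and is scheduled first, while field 0 is identical in every rule and goes last, which fits the presenter's description of high and low entropy.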

18 EXPERIMENTAL RESULTS(1/5)
Environment
FPGA: Xilinx Virtex 7
Logic slices: 218800
I/O pins: 1100
BRAM: 68MB
Development tool: Xilinx Vivado

19 EXPERIMENTAL RESULTS(2/5)
Presenter notes: These are the results for different stride sizes (s) and different numbers of end points per PE (c); based on them, s = 4 and c = 64 are chosen.

20 EXPERIMENTAL RESULTS(3/5)
Presenter notes: Comparison of throughput for different numbers of rules.
Fig. 6: Throughput with respect to the length of the packet header (W_m) and the number of rules (N), s = 4, c = 64

21 EXPERIMENTAL RESULTS(3/5)
Presenter notes: Comparison of logic utilization for different numbers of rules.
Fig. 6: Logic slice utilization with respect to the length of the packet header (W_m) and the number of rules (N), s = 4, c = 64

22 EXPERIMENTAL RESULTS(4/5)
Presenter notes: Red is without any power optimization technique. Pink is self-enabled power gating with random scheduling of the packet header fields. Purple is self-enabled power gating along with entropy-based scheduling of the packet header fields. "Test" refers to the rules of a trace.
Fig. 7: Power consumption of 100 tests (each test processing 1K P packet headers) for N = 128, 256, 512, 1024 (i) M = 5, W_m = 104

23 EXPERIMENTAL RESULTS(4/5)
Fig. 7: Power consumption of 100 tests (each test processing 1K P packet headers) for N = 128, 256, 512, 1024 (ii) M = 15, W_m = 356

24 EXPERIMENTAL RESULTS(5/5)

