LZRW3 Data Compression Core. Project part B, final presentation. Shahar Zuta, Netanel Yamin. Advisor: Moshe Porian. December 2013.
Contents: Project goals & overview; Algorithm review; Architectures; GUI; Implementation; Test Plan; Methods; Project Movie.
Project Overview. Why do we need a hardware data compressor? Reduce storage capacity. Reduce power consumption. Reduce the amount of data to handle. Reduce communication costs. Speed.
Compression & Systems Efficiency. [Diagram: USER, LZRW3 COMPRESSOR, LAN/WAN NETWORK, LZRW3 DECOMPRESSOR; STORAGE with LZRW3 COMPRESSOR and LZRW3 DECOMPRESSOR.] Network: reduce traffic packets, reduce comm. resources, reduce expenses, better traffic reliability. Storage: less capacity, fewer storage discs, less area, less power, more environmental. Efficient system: fast lossless compression, full data recovery.
Project Goals: implementation of the LZRW3 data compression algorithm; high performance, with data transfer of 1 Gbps; adapted to data templates of 2 Kbyte to 32 Kbyte; internal memory on the FPGA only (Virtex-5), no interface to external memory; strong debugging capabilities via the GUI.
Lossless Data Compression. The following algorithm is lossless, which guarantees that the original data can be reconstructed from the compressed file. Lossy compression can give better compression ratios in exchange for data loss (JPEG, MP3). Well-known lossless applications: ZIP, GZIP and others. The algorithm uses dictionary coding, which means the compressor and decompressor maintain a data structure to help them find repeated strings.
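The lossless property can be illustrated with a short round trip. This is only a sketch: it uses Python's standard zlib module (DEFLATE) as a stand-in, not LZRW3, and the helper name roundtrip is hypothetical.

```python
import zlib

def roundtrip(data):
    """Compress and decompress with zlib (DEFLATE stand-in, NOT LZRW3)."""
    packed = zlib.compress(data)
    restored = zlib.decompress(packed)
    return packed, restored

if __name__ == "__main__":
    original = b"Expression_compression_" * 100   # input made of repeating strings
    packed, restored = roundtrip(original)
    # Lossless: the restored bytes are identical to the original bytes.
    print(len(original), "->", len(packed), "bytes; identical:", restored == original)
```

A lossy codec such as JPEG or MP3 would not pass the identity check at the end.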
Mechanism demonstration (1). Input file: "Expression_compress_ion". Hash table indices are shown as XXX, YYY, ZZZ, UUU. [Animation: "Exp" (offset 0) and "res" (offset 3) pass through the hash function, their offsets are stored in the table, and both are output as literal items.] Output: Exp (L.I), res (L.I). NOTE: the next 3 bytes should be "xpr", then "pre", and only then "res"; we didn't demonstrate all the actions, for simplicity. "L.I" stands for "Literal Item".
Mechanism demonstration (2). [Animation: "sio" and "n_c" (offset 9) are hashed, their offsets are stored in the table, and both are output as literal items.] Output: Exp (L.I), res (L.I), sio (L.I), n_c (L.I).
Mechanism demonstration (3). [Animation: "omp" is hashed and output as a literal item.] Output: Exp (L.I), res (L.I), sio (L.I), n_c (L.I), omp (L.I).
Mechanism demonstration (4). [Animation: "res" is hashed again to index XXX, which already holds an offset, so a copy item is output.] Output: Exp (L.I), res (L.I), sio (L.I), n_c (L.I), omp (L.I), 123 (C.I). "C.I" stands for "Copy Item".
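The mechanism demonstrated above can be sketched in software. This is a simplified model under stated assumptions, not the core's RTL and not the real LZRW3 file format: the hash function hash3 is a hypothetical stand-in, items are plain Python tuples rather than packed groups with a header, and the 4096-entry table and 18-byte maximum copy length are assumptions. A copy item names a hash table index, so the decompressor rebuilds the same table; table updates are deferred on both sides until the 3 hashed bytes are available to the decompressor.

```python
MAX_LEN = 18       # assumption: longest copy item
TABLE_BITS = 12    # assumption: 4096-entry hash table

def hash3(b):
    """Hypothetical stand-in hash: 3 bytes -> table index."""
    key = (b[0] << 8) ^ (b[1] << 4) ^ b[2]
    return ((40543 * key) >> 4) & ((1 << TABLE_BITS) - 1)

def _flush(pending, table, buf, pos):
    """Apply deferred table updates whose 3 bytes lie fully before pos."""
    while pending and pending[0] + 3 <= pos:
        p = pending.pop(0)
        table[hash3(buf[p:p + 3])] = p

def compress(data):
    """Return a list of ("lit", byte) and ("copy", table_index, length) items."""
    table, pending, items, pos = {}, [], [], 0
    while pos < len(data):
        _flush(pending, table, data, pos)
        length = 0
        if pos + 3 <= len(data):
            idx = hash3(data[pos:pos + 3])
            cand = table.get(idx)          # offset stored by an earlier item
            if cand is not None:
                while (length < MAX_LEN and pos + length < len(data)
                       and data[cand + length] == data[pos + length]):
                    length += 1
        pending.append(pos)                # table update for this item is deferred
        if length >= 3:
            items.append(("copy", idx, length))
            pos += length
        else:
            items.append(("lit", data[pos]))
            pos += 1
    return items

def decompress(items):
    """Rebuild the data by replaying the items with an identical table."""
    table, pending, out = {}, [], bytearray()
    for item in items:
        pos = len(out)
        _flush(pending, table, out, pos)
        pending.append(pos)
        if item[0] == "copy":
            _, idx, length = item
            src = table[idx]
            for i in range(length):        # byte by byte: source may overlap output
                out.append(out[src + i])
        else:
            out.append(item[1])
    return bytes(out)

if __name__ == "__main__":
    text = b"Expression_compress_ion"
    items = compress(text)
    print(items)
    print(decompress(items) == text)
```

Unlike the slides, which skip intermediate steps for simplicity, this sketch registers the start of every item in the table. Because both sides apply the same deferred updates to the same data prefix, their tables stay identical without any offsets being transmitted.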
File compression example. Input file analysis: the input file is made of repeating strings. Output file analysis: the output file is made of a header + groups.
Top Architecture. [Block diagram components: GUI, UART, Rx PATH, INPUT BLOCK memory, LZRW3 COMPRESSOR CORE, COMPRESSED FILE memory, Tx PATH; implemented on a Xilinx Virtex-5 on the XUPV505 board.]
Simulations. [Waveform annotations: INPUT BLOCK saves bytes from the RX PATH; compression and transfer to the output block; OUTPUT BLOCK receives the compressed file into a FIFO; OUTPUT BLOCK transfers data to the TX PATH; TX PATH performs data encapsulation and transmission; data out. Frame types 0, 1 and 2 are marked. Raw data and compressed data are shown. Zoom on compressor.]
Simulations (Cont.). Zoom on compressor. [Waveform annotations: compressor START, filling the pipeline; "FLY" mode, all stages enabled; EOF arrives, last 18 bytes; sending HEADER; sending DATA; compressor DONE; pipeline CLEAR; COMPRESSOR READY.]
Test Plan. The tests include different series of data insertion which are supposed to bring the core to its extreme cases. Whenever a change was made in the design, the validity of all the tests was reasserted.
Basic set examples. Random input (Length, Num Of Vars).
INPUT    VAR.   LENGTH               OUTPUT               CASE
RANDOM   10     32K                  23K                  Reasonable compression
RANDOM   10     32K                  7407                 Each variable repeated 18 times
RANDOM   1      (not recoverable)    (not recoverable)    Very high compression ratio, ~90% compression
RANDOM   256    32K                  = input              Output = input
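A random input of the kind listed above, parameterized by length and number of variables, could be generated as follows. This is a sketch, not the project's generator: the function name random_input and the seed parameter are hypothetical, and it assumes a "variable" is one distinct byte value and that 32K means 32 * 1024 bytes.

```python
import random

def random_input(length, num_vars, seed=None):
    """Return `length` random bytes drawn from `num_vars` distinct byte values."""
    rng = random.Random(seed)                     # seeded, so a test can be reproduced
    alphabet = rng.sample(range(256), num_vars)   # the distinct "variables"
    return bytes(rng.choice(alphabet) for _ in range(length))

if __name__ == "__main__":
    block = random_input(32 * 1024, 10, seed=1)
    print(len(block), "bytes,", len(set(block)), "distinct values")
```

With num_vars=1 the block is a single repeated byte, which corresponds to the very high compression case; with num_vars=256 there is little repetition to exploit.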
Test methods: compression core. [Block diagram: input buffer, hash function, hash table, comparator, output block; blocks marked "TB checked".]
Test methods: compression core. Simulation and comparison against a golden model. [Diagram: compression core inside the core periphery.]
Test methods: core periphery. Simulation adapted to the full chip and comparison against the golden model. Verification environment (GUI): simulating the full test plan + debug.
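The comparison against the golden model amounts to a byte-by-byte check of the design's output against the reference output. A minimal sketch, assuming both outputs are available as byte strings (the function name first_mismatch is hypothetical):

```python
def first_mismatch(dut_output, golden_output):
    """Return the index of the first differing byte, or None if the outputs match."""
    for i, (a, b) in enumerate(zip(dut_output, golden_output)):
        if a != b:
            return i
    if len(dut_output) != len(golden_output):
        # One output is a strict prefix of the other (for example, unreceived data).
        return min(len(dut_output), len(golden_output))
    return None

if __name__ == "__main__":
    print(first_mismatch(b"\x01\x02\x03", b"\x01\x02\x03"))   # None
    print(first_mismatch(b"\x01\x02", b"\x01\x02\x03"))       # 2
```

Reporting the first differing index, rather than only pass or fail, gives a starting point for tracing the problem in simulation.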
GUI (LZRW3 verification environment). [Screenshot callouts: direct file path insertion interface; manually inserted text interface.]
GUI (LZRW3 verification environment). [Screenshot callouts: console box; progress bar; manually inserted text interface.]
GUI (LZRW3 verification environment). [Screenshot callouts: random data generation characteristics; random data generation start button (click).]
GUI (LZRW3 verification environment). [Screenshot callout: start analyzing button (click).]
Work method: (1) Identify bugs, such as unreceived data or any difference from the golden model (GM), using the verification environment. (2) Trace the problem using ChipScope or the simulation environment. (3) Solve the issue and fix the code. (4) Assert the validity of the solution in the simulation environment. (5) Resynthesize the design and burn it to the FPGA. (6) Verify using the GUI. (7) Reassert the validity of former tests. (8) Progress.
Resource Utilization: PlanAhead synthesis result.
Implementation Results. The post-place-and-route report showed that the user timing constraint was not met. The project goal was 125 MHz; the achieved frequency of the full design is 88 MHz. The critical path was found in stage 4.
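The link between the clock frequency and the 1 Gbps goal can be worked out directly. The sketch below assumes the pipeline consumes one byte per clock cycle; that assumption is not stated in the slides, but it is consistent with a 125 MHz target for 1 Gbps. The function name throughput_bps is hypothetical.

```python
def throughput_bps(clock_hz, bytes_per_cycle=1):
    """Input throughput in bits per second, assuming a fixed number of bytes per cycle."""
    return clock_hz * bytes_per_cycle * 8

if __name__ == "__main__":
    for mhz in (125, 88):
        print(mhz, "MHz ->", throughput_bps(mhz * 1_000_000) / 1e6, "Mbps")
```

Under this assumption, 125 MHz corresponds to 1000 Mbps and the achieved 88 MHz to 704 Mbps.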
[Diagram, shown in two steps. Blocks: input file memory banks, comparator, counter, addresses alignment. Signals: Tentative Next address, offset TAG, Comprison_valid, Compare_success, Offset_tag, Tentative_tag, Tentative_taken, Compare_success_P, Item_length_p, Offset_valid, Older_byte_P, Continue, clk; bank 0, 1, 2 addresses.]
Performance-improving actions: analyzing critical paths and rewriting their logic; enabling the extra-effort flag in ISE and targeting maximum speed; trying to synthesize with third-party synthesis tools (Precision, Synplify Pro); replacing the simple RAM blocks implemented in VHDL code with blocks created by the Core Generator (Xilinx tool).
What have we learned? Planning and specifying a project. Considering the specifications for micro/macro architecture. RTL coding and "hardware thinking". Using software tools: Modelsim, ISE, PlanAhead, Synplify, ChipScope, CoreGenerator, Visual Studio (and learning C#)… Testing blocks and determining wanted results. Synthesizing the design and validating it using the GUI. Exploring FPGA-computer communication in depth. Protocols: UART, Wishbone.
Lab examination. All we saw this summer.
[Picture: XUP5 BOARD]
Project Movie: LZRW3 project movie on YouTube.