
Bryan Lahartinger

Objective Investigation
- “The Apriori algorithm is a fundamental correlation-based data mining [technique]”
- “Software implementations of the Apriori algorithm utilize… methods... for the support and candidate generation operations”
- “This paper demonstrates an efficient structure for computing the support of a set of candidates.”
- “…through the combination of Content-Addressable Memories (CAM)”
- “As far as we know, the Apriori algorithm has not been studied in any significant way for hardware implementation.”

Objective
To exploit parallelism in hardware to accelerate a bottleneck in the Apriori algorithm, with applications specifically to data mining.
- What is the Apriori algorithm?
- What is the bottleneck?
- How does hardware acceleration fit into the picture?

Paper Overview
- Background
- Apriori Algorithm
- Apriori bottleneck
- Bitmapped CAM
- Implementing Bitmap CAM
- Analysis of the Approach
- Results of software comparisons
- Conclusions

Apriori
Given transactions consisting of sets: {1,2,3,4}, {2,3,4}, {2,3}, {1,2,4}, {1,2,3,4}, and {2,4}

Itemset    Support
{1,2}      3
{1,3}      2
{1,4}      3
{2,3}      4
{2,4}      5
{3,4}      3

Itemset    Support
{1,2,4}    3
{2,3,4}    3
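The worked example on this slide can be reproduced with a short Python sketch of the level-wise Apriori loop: count the support of each candidate by scanning every transaction, keep the frequent candidates, and join them into the next level's candidates. The minimum support of 3 is an assumption (the slide does not state one, though 3 is consistent with which 3-itemsets appear), and the all-pairs join is a simplification of the usual prefix-based join; the subset-pruning step is what enforces the Apriori property. The `support` scan is the candidate-support computation the paper targets for hardware acceleration.

```python
from itertools import combinations

# Transactions from the slide's example.
TRANSACTIONS = [
    {1, 2, 3, 4},
    {2, 3, 4},
    {2, 3},
    {1, 2, 4},
    {1, 2, 3, 4},
    {2, 4},
]

# Assumed minimum support: not stated on the slide, but consistent
# with the 3-itemset table.
MIN_SUPPORT = 3


def support(candidate, transactions):
    """Count the transactions that contain every item of the candidate."""
    return sum(1 for t in transactions if candidate <= t)


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune any
    candidate with an infrequent (k-1)-subset (the Apriori property)."""
    frequent_set = set(frequent)
    candidates = set()
    for a in frequent:
        for b in frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent_set
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return sorted(candidates, key=sorted)


def apriori(transactions, min_support):
    """Return {k: {candidate itemset: support}} for each level."""
    items = sorted(set().union(*transactions))
    candidates = [frozenset([i]) for i in items]
    levels = {}
    k = 1
    while candidates:
        # Support counting: one full pass over the transactions per level.
        counts = {c: support(c, transactions) for c in candidates}
        levels[k] = counts
        frequent = [c for c, n in counts.items() if n >= min_support]
        k += 1
        candidates = generate_candidates(frequent, k)
    return levels


if __name__ == "__main__":
    for k, counts in apriori(TRANSACTIONS, MIN_SUPPORT).items():
        for itemset, n in counts.items():
            print(k, sorted(itemset), n)
```

The level-2 dictionary holds all six pair candidates (including {1,3}, whose support of 2 falls below the assumed threshold), matching the first table; level 3 holds only {1,2,4} and {2,3,4}, because every other triple has an infrequent pair.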

Bitmapped CAM
- Each candidate is addressed to a row of bits
- Each column indicates whether the candidate is included in that CAM entry
- Column bits can be summed to give the number of matching candidates
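As one possible reading of this slide, the Python model below treats each CAM entry (column) as holding one distinct item and gives each candidate a row of bits marking which entries it uses. A transaction produces a match vector over the entries, and a candidate's support increments when all of its row bits are matched; summing one column's bits counts how many candidates share that entry. The class and method names are hypothetical, and the sequential loop stands in for what the hardware would evaluate in parallel; this is not the paper's actual design.

```python
class BitmappedCAM:
    """Hypothetical software model of a bitmapped CAM (an interpretation
    of the slide, not the paper's hardware).

    Columns: one CAM entry per distinct item.
    Rows: one bit row per candidate; bit j is set when the candidate
    includes the item stored in CAM entry j.
    """

    def __init__(self, candidates):
        items = sorted(set().union(*candidates))
        # Assign each distinct item its own CAM entry (column index).
        self.column = {item: j for j, item in enumerate(items)}
        self.candidates = [frozenset(c) for c in candidates]
        self.rows = [self._bits(c) for c in self.candidates]
        self.support = [0] * len(self.candidates)

    def _bits(self, itemset):
        """Build a bit vector with one bit per CAM entry present in itemset;
        items with no CAM entry are ignored."""
        bits = 0
        for item in itemset:
            if item in self.column:
                bits |= 1 << self.column[item]
        return bits

    def column_count(self, item):
        """Sum one column's bits: how many candidates use this CAM entry."""
        j = self.column[item]
        return sum((row >> j) & 1 for row in self.rows)

    def process(self, transaction):
        """Compare one transaction against every candidate row."""
        match = self._bits(transaction)  # CAM match vector for this transaction
        for i, row in enumerate(self.rows):
            if row & match == row:  # every item of the candidate matched
                self.support[i] += 1
```

Feeding the six example transactions through a CAM built from the six pair candidates yields the same supports as the slide's 2-itemset table.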

Implemented CAM Bitmap
- Large LUT (lookup table) in memory
- Candidate 249 is frequently associated with candidates 1-11 but not 12…

Analysis of the Approach
- They varied the number of CAM elements relative to the number of candidates
- Max CAM blocks of Blocks fit most cases
- When they didn’t… Solution: stop adding candidates to the block when it is full [why?]
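The “stop adding candidates when full” policy can be sketched as a greedy packer. Here a block is assumed to be full when the next candidate would need more distinct items than a hypothetical `max_entries` CAM entries; the block is then closed and a new one is started. The capacity model and the function name are illustrative assumptions, since the slide gives neither the block size nor the exact fill criterion.

```python
def pack_candidates(candidates, max_entries):
    """Greedily assign candidates to CAM blocks (hypothetical capacity model).

    A block is treated as full when adding the next candidate would need
    more distinct items (CAM entries) than max_entries; that block is
    closed and the candidate starts a new one.
    """
    blocks = []
    current, entries = [], set()
    for cand in candidates:
        needed = entries | set(cand)
        if current and len(needed) > max_entries:
            # Block is full: stop adding candidates to it.
            blocks.append(current)
            current, entries = [], set()
            needed = set(cand)
        if len(needed) > max_entries:
            raise ValueError(
                f"candidate {sorted(cand)} needs more than {max_entries} entries"
            )
        current.append(cand)
        entries = needed
    if current:
        blocks.append(current)
    return blocks
```

With the six pair candidates from the Apriori example and a capacity of 3 entries, this produces three blocks of 2, 1, and 3 candidates. A greedy policy is simple to implement but can leave blocks partly empty, which may be part of what the “[why?]” note questions.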

Results
- VHDL architecture required only 10 cycles per CAM stage (Xilinx 7.2 on Virtex II)
- Max clock rate: 120 MHz
- Used standard datasets
- Compared against software on only one hardware platform
- Used half the logic cells per candidate compared to USC's FCCM05 work (half the FPGA area?)
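To put the two reported numbers together: if each CAM stage takes 10 cycles at the 120 MHz maximum clock, and assuming (this is not stated on the slide) that consecutive stages do not overlap, the time per stage and the resulting stage rate work out as follows.

$$
t_{\text{stage}} = \frac{10\ \text{cycles}}{120 \times 10^{6}\ \text{cycles/s}} \approx 83.3\ \text{ns},
\qquad
\frac{1}{t_{\text{stage}}} = \frac{120 \times 10^{6}}{10} = 12 \times 10^{6}\ \text{stages/s}
$$

If the stages are pipelined, throughput could be higher than this estimate.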

Conclusions
- CAM = awesome vs. software = sucks
- Allows similarities between candidates to be utilized
- Their previous paper's systolic-array hardware architecture for the Apriori algorithm would work even better with this improvement
- An ideal architecture combining both approaches will be constructed and tested

Criticisms
Pros
- Intro was unclear at first, i.e. NOT about Apriori, but about more general applications
- Reasonable explanation of Apriori and CAM

Cons
- No VHDL implementation details: “highly pipelined”, that’s it… for real
- Software only tested on one hardware platform: 2.8 GHz Xeon, 3 GB RAM

- Bad analysis of their methodology
- Hard to follow
- Unclear how to reproduce
- Unclear results
- Questionable standard datasets
- 120 MHz???
- 10 cycles/CAM stage?????