Processor Memory Networks Based on Steiner Systems
Tom VanCourt and Martin C. Herbordt
ECE Department, Boston University

Introduction (30 Jan)
- Problem: "Big Science" brought genomes to the desktop, but the primary analysis engine is still the PC!
- Our Goal: bring "Big Computation" to the desktop
- Proposed Solution: PC/workstation plus FPGA-based coprocessor

Desired Problem Characteristics
- Low-resolution data: data have few bits
- Large but manageable data sets: thousands of genes, thousands of samples (at a time)
- High-dimensional parameter set: must be searched or enumerated to identify solutions
- Simple performance criteria (score functions): evaluated for each candidate parameter vector
- Decomposable search strategy: multiple small problems to solve in parallel
- Heavy reuse of data: combinations, permutations, orientations

Biology/Chemistry Problems
- Sequence alignment: approximate string matching, dynamic programming
- Molecule interactions (voxel model): 3-axis rotation, 3D convolution
- Microarray data analysis: typically 10 to 10^2 samples, 10^3 to 10^4 genes per sample
- Hidden Markov models: Baum-Welch training, soft Viterbi decode
- Compute-intensive statistical analysis: bootstraps, sampling background models

Sample Task-Specific Processor
- Input: vectors V_1, V_2, …, V_v
- Query: which set of t vectors maximizes f(V_a, V_b, …)?
- Architecture: parallel PEs on the FPGA coprocessor
[Diagram: score function f applied to vectors V_a, V_b, V_c]
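The query on this slide can be stated as a brute-force search, which is what the PE array parallelizes in hardware. A minimal software sketch (the score function here is a made-up example, not the talk's):

```python
from itertools import combinations

def best_subset(vectors, t, f):
    """Exhaustively score every size-t subset of the input vectors
    and return the subset maximizing the score function f."""
    return max(combinations(vectors, t), key=lambda s: f(*s))

# Hypothetical score function: element-wise triple product, summed.
def score(va, vb, vc):
    return sum(a * b * c for a, b, c in zip(va, vb, vc))

vectors = [(1, 0, 2), (3, 1, 0), (0, 2, 2), (1, 1, 1)]
best = best_subset(vectors, 3, score)
print(best)  # ((1, 0, 2), (0, 2, 2), (1, 1, 1))
```

The FPGA design evaluates many such f(V_a, V_b, V_c) calls in parallel rather than in this sequential loop.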

Problem: Input Bandwidth
- Assume ~128 PEs × 3 inputs per PE = 384 values per cycle
- × 4 bits per value = 1536 bits per cycle
- Wide-word solution: INFEASIBLE
  - A ~400-ported RAM? Data would be fetched faster than the host can load it
  - 384 input values needed per cycle
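The slide's arithmetic can be checked directly; the figures below are the ones stated on the slide:

```python
# Naive input-bandwidth requirement of the PE array.
pes = 128              # processing elements
inputs_per_pe = 3      # each PE consumes a triple of values
bits_per_value = 4     # low-resolution data: few bits per value

values_per_cycle = pes * inputs_per_pe          # 384 values/cycle
bits_per_cycle = values_per_cycle * bits_per_value  # 1536 bits/cycle
print(values_per_cycle, bits_per_cycle)  # 384 1536
```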

Distribution Network
- X memory: k data values supply the PEs
  - k = 9 → 84 PEs, 252 PE X inputs, 28× reuse
  - k = 10 → 120 PEs, 360 PE X inputs, 36× reuse
- Generates all size-3 subsets of the k data values
[Diagram: Vector Data Memory and X memories X 1-10 feeding PE 0 … PE 119]
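The PE counts and reuse factors on this slide follow directly from the binomial coefficient C(k, 3), since one PE is instantiated per size-3 subset of the k broadcast values:

```python
from math import comb

# PE count and data-reuse factor as a function of k broadcast values.
stats = {}
for k in (9, 10):
    pes = comb(k, 3)        # one PE per size-3 subset of k values
    pe_inputs = 3 * pes     # total PE input ports to feed
    reuse = pe_inputs // k  # PE inputs served per broadcast value
    stats[k] = (pes, pe_inputs, reuse)

print(stats)  # {9: (84, 252, 28), 10: (120, 360, 36)}
```

This is the spatial reuse that makes the 1536-bits-per-cycle demand feasible: each value fetched once feeds 28 or 36 PE inputs.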

Vector Data Memory
- Steiner system S(v, k, t): divide v objects into subsets of size k so that every size-t subset lies in exactly one size-k subset
- t = 3 (triplets), k = number of X memories, v = total genes
- Host selects vector sets {m_1, m_2, …} via RAM content
[Diagram: Vector Select RAM (indx 1 … indx m, indx SEL) addressing X memories X_1, X_2, …, X_m; ~100 vectors per X memory]
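The defining property of S(v, k, t) is easy to verify mechanically. As a small illustrative instance (not one of the systems used in the talk), S(8, 4, 3) can be built from the Fano plane S(7, 3, 2): every 3-subset of 8 points lies in exactly one of 14 blocks of size 4.

```python
from itertools import combinations

def is_steiner(points, blocks, t):
    """Check that every size-t subset of points lies in exactly one block."""
    for sub in combinations(points, t):
        hits = sum(1 for b in blocks if set(sub) <= b)
        if hits != 1:
            return False
    return True

# Fano plane S(7, 3, 2): the 7 lines of the projective plane of order 2.
fano = [{1,2,3}, {1,4,5}, {1,6,7}, {2,4,6}, {2,5,7}, {3,4,7}, {3,5,6}]

# S(8, 4, 3): each Fano line extended by a new point 8, plus each line's
# complement within {1..7}.
points = set(range(1, 9))
blocks = [L | {8} for L in fano] + [set(range(1, 8)) - L for L in fano]
print(len(blocks), is_steiner(points, blocks, 3))  # 14 True
```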

Memory and Data Distribution
- Two-level data reuse:
  - Temporal reuse by the Vector Select RAM
  - Spatial reuse by the Distribution Network
- Whole VDM duplicated
- Double buffering overlaps reload with reading
- Divide v objects into subsets of size k … so that every size-3 subset is in just one size-k subset
[Diagram: k copies of the Vector Data Memory (VDM)]
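A minimal sketch of the double-buffering idea, using a hypothetical software timing model rather than the authors' hardware design: while the PEs read one buffer, the host refills the other, and the roles swap each pass.

```python
def double_buffered(chunks, process):
    """Process a stream of data chunks with two buffers: while the PEs
    'read' the active buffer, the host refills the inactive one."""
    if not chunks:
        return
    buffers = [chunks[0], None]    # initial load into buffer 0
    active = 0
    for nxt in chunks[1:]:
        buffers[1 - active] = nxt  # host reload of the idle buffer...
        process(buffers[active])   # ...overlaps PE reads of the active one
        active = 1 - active        # swap buffer roles
    process(buffers[active])       # drain the final buffer

seen = []
double_buffered(["block A", "block B", "block C"], seen.append)
print(seen)  # ['block A', 'block B', 'block C']
```

In hardware the reload and the reads happen in the same cycles; the sequential loop above only shows the buffer-swapping discipline.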

Conditions for Success
- Data must be strings/vectors
  - If data were scalar, direct host-to-PE transfer (with the Vector Select RAM) would be enough
- Longer vectors are better
  - Longer vector → more time per Vector Select word → longer time between reloads
- Narrow data words are better
  - Fewer bits per vector → more vectors within the available bandwidth

Open Problems
- Steiner systems are special cases: k-sets that contain each t-set exactly once
  - Theory guarantees large numbers of cases
- Set-covering problem in other cases: k-sets that contain each t-set at least once
- Finding collections of k-sets is hard: believed NP-hard
- Constructive forms of existence theorems?
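When no Steiner system exists for the desired (v, k, t), the fallback named on the slide is set covering: choose k-sets so every t-set appears at least once, accepting some duplication. A simple greedy heuristic (illustrative only, not a construction the talk endorses) on v = 8, k = 4, t = 3:

```python
from itertools import combinations

v, k, t = 8, 4, 3
points = range(1, v + 1)
uncovered = {frozenset(s) for s in combinations(points, t)}   # all 56 triples
candidates = [frozenset(b) for b in combinations(points, k)]  # all 70 4-sets

cover = []
while uncovered:
    # Pick the k-set covering the most still-uncovered triples.
    best = max(candidates, key=lambda b: sum(1 for s in uncovered if s <= b))
    cover.append(best)
    uncovered -= {s for s in uncovered if s <= best}
```

Each 4-set covers C(4, 3) = 4 triples, so at least 56 / 4 = 14 blocks are needed; a Steiner system S(8, 4, 3) achieves exactly that bound, while greedy covering may use more.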