Efficient Memoization Strategies for Object Recognition with a Multi-Core Architecture
George Viamontes, Mohammed Amduka, Jon Russo, Matthew Craven, Thanhvu Nguyen
Lockheed Martin Advanced Technology Laboratories, 3 Executive Campus, 6th Floor, Cherry Hill, NJ
Phone (856) Fax (856) Email: {gviamont, mamduka, jrusso, mcraven,
11th Annual Workshop on High Performance Embedded Computing (HPEC), MIT Lincoln Laboratory, September 2007
This work was sponsored by DARPA/IPTO under the Architectures for Cognitive Information Processing program, contract number FA C-0266.

Object Recognition
Goal: efficiently identify objects in a 2D image.
Good techniques must account for translational, rotational, and size variance, as well as partial obscurement, when comparing a real object to a library.
Examples include:
- The Chunky SLD algorithm from Sandia, which is representative of the best statistical methods and has been ported to FPGAs
- Geometric hashing, which uses simple distance calculations in "hash space" to account for image variations
Object recognition techniques are highly parallelizable and require only simple calculations, but they are extremely memory intensive.
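As a concrete illustration of that last point, the C sketch below (with made-up image and template sizes; it is not the Chunky SLD algorithm) runs a sliding-window sum of absolute differences against a single library template. Every candidate window is an independent, trivially simple computation, yet the template and image rows are re-read from memory at every position, which is where the memory pressure comes from.

/* Minimal sketch: sliding-window sum of absolute differences (SAD)
 * between an image and one library template. Each window position is
 * independent (easy to parallelize), but the data is re-fetched from
 * memory over and over, so the kernel is memory-bound, not compute-bound. */
#include <stdint.h>
#include <stdio.h>

#define IMG_W 64
#define IMG_H 64
#define TPL_W 8
#define TPL_H 8

static uint32_t sad_at(uint8_t img[IMG_H][IMG_W],
                       uint8_t tpl[TPL_H][TPL_W], int x0, int y0)
{
    uint32_t sum = 0;
    for (int y = 0; y < TPL_H; y++)
        for (int x = 0; x < TPL_W; x++) {
            int d = (int)img[y0 + y][x0 + x] - (int)tpl[y][x];
            sum += (uint32_t)(d < 0 ? -d : d);
        }
    return sum;
}

int main(void)
{
    static uint8_t img[IMG_H][IMG_W];   /* image under test (all zeros here) */
    static uint8_t tpl[TPL_H][TPL_W];   /* one library template              */

    uint32_t best = UINT32_MAX;
    int best_x = 0, best_y = 0;

    /* Scan every window position and keep the closest match. */
    for (int y = 0; y + TPL_H <= IMG_H; y++)
        for (int x = 0; x + TPL_W <= IMG_W; x++) {
            uint32_t s = sad_at(img, tpl, x, y);
            if (s < best) { best = s; best_x = x; best_y = y; }
        }

    printf("best match at (%d,%d), SAD=%u\n", best_x, best_y, best);
    return 0;
}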

Memory Bottleneck in Multi-Core
Statistical methods require only simple computations on individual pixels, but the image library in memory must be read and compared against many times.
Geometric hashing applies a hash function to pixels or small groups of pixels (features) to map them to locations in memory; simple distance calculations in hash space then account for image variations.
In both types of algorithm, memory access, not processing power, is the bottleneck.
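The sketch below illustrates the lookup pattern just described, under assumed details (the hash function, bucket layout, and tolerance are placeholders, not the authors' design): a feature is mapped to a coordinate in hash space, quantized to a bucket, and matched against stored library entries by a small distance check. The arithmetic is trivial; the dominant cost is the scattered reads into the hash table.

/* Illustrative geometric-hashing lookup (assumed details, not the
 * authors' code). A feature lands at a hash-space coordinate (u, v);
 * the lookup quantizes it to a bucket and keeps the closest stored
 * entry within a tolerance. */
#include <stdio.h>

#define BUCKETS 256
#define BUCKET_CAP 4

typedef struct { float u, v; int model_id; } Entry;

static Entry table[BUCKETS][BUCKET_CAP];   /* library entries, built offline */
static int   count[BUCKETS];               /* entries per bucket             */

/* Quantize a hash-space coordinate (u, v) in [0,1) to a bucket index. */
static int bucket_of(float u, float v)
{
    int bu = (int)(u * 16.0f) & 15;
    int bv = (int)(v * 16.0f) & 15;
    return bu * 16 + bv;
}

/* Return the model id of the closest stored entry within 'tol', or -1. */
static int lookup(float u, float v, float tol)
{
    int best = -1;
    float best_d2 = tol * tol;
    int b = bucket_of(u, v);                /* one scattered memory access */
    for (int i = 0; i < count[b]; i++) {    /* then a memory-bound scan    */
        float du = table[b][i].u - u;
        float dv = table[b][i].v - v;
        float d2 = du * du + dv * dv;       /* squared distance in hash space */
        if (d2 < best_d2) { best_d2 = d2; best = table[b][i].model_id; }
    }
    return best;
}

int main(void)
{
    /* Store one library feature, then query a slightly perturbed version. */
    int b = bucket_of(0.30f, 0.70f);
    table[b][count[b]++] = (Entry){ 0.30f, 0.70f, 42 };

    printf("match: model %d\n", lookup(0.31f, 0.69f, 0.05f));
    return 0;
}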

Programmable Objective Evaluation Memory (POEM) for Hardware-Level Memoization
Programmable objective:
- Inexact associative lookup over vector chunks and operands
- Room for additional variables (random, decay)
Example: conditional sum of absolute differences, computed SIMD-parallel across vectors and chunks
8 x N dual-port memory channels; the vector operation is parallelized and pipelined
Trained to optimize spanning coverage of correct lookups
Performance:
- FPGA: 50 MegaChunks per second
- Projected chip: 200 MegaChunks per second
[Block diagram: the operand and query chunks (Chunk 1 … Chunk G, …, Chunk k*G+1 … Chunk N) are compared against stored contents (Content 1 … Content G) through a pipeline of 4x8 evaluation cores and 3 minimization cores, with "< reference" threshold checks selecting the matching content.]
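The scalar C sketch below is a software analogue of the POEM lookup described above, under stated assumptions: "conditional" SAD is read here as saturating large per-element differences, and the hardware's SIMD evaluation and pipelined minimization cores are replaced by plain loops. It shows the inexact associative behavior: the query returns the stored chunk with the smallest conditional SAD, provided that score falls below a reference threshold.

/* Software analogue (scalar sketch, not the POEM hardware) of an inexact
 * associative lookup with a conditional sum of absolute differences. */
#include <stdint.h>
#include <stdio.h>

#define CHUNK_LEN 8     /* elements per stored key (one "chunk")  */
#define N_CHUNKS  4     /* number of stored key/content pairs     */

static uint32_t cond_sad(const uint8_t *a, const uint8_t *b, uint8_t cond)
{
    uint32_t sum = 0;
    for (int i = 0; i < CHUNK_LEN; i++) {
        int d = (int)a[i] - (int)b[i];
        if (d < 0) d = -d;
        /* One plausible reading of "conditional": cap outlier differences
         * at 'cond' so a few badly mismatched elements (e.g. obscured
         * pixels) do not dominate the score. */
        if (d > (int)cond) d = (int)cond;
        sum += (uint32_t)d;
    }
    return sum;
}

/* Inexact associative lookup: index of the closest key scoring below the
 * reference threshold, or -1 if nothing qualifies. */
static int poem_lookup(const uint8_t keys[N_CHUNKS][CHUNK_LEN],
                       const uint8_t *query, uint8_t cond, uint32_t reference)
{
    int best = -1;
    uint32_t best_score = reference;    /* only accept scores < reference */
    for (int k = 0; k < N_CHUNKS; k++) {
        uint32_t s = cond_sad(keys[k], query, cond);
        if (s < best_score) { best_score = s; best = k; }
    }
    return best;
}

int main(void)
{
    static const uint8_t keys[N_CHUNKS][CHUNK_LEN] = {
        { 10, 10, 10, 10, 10, 10, 10, 10 },
        { 50, 52, 49, 51, 50, 50, 48, 53 },
        { 90, 90, 90, 90, 90, 90, 90, 90 },
        {  0, 255, 0, 255, 0, 255, 0, 255 },
    };
    const uint8_t query[CHUNK_LEN] = { 51, 50, 50, 52, 49, 51, 47, 52 };

    int hit = poem_lookup(keys, query, /*cond=*/32, /*reference=*/64);
    printf("closest chunk: %d\n", hit);     /* expected: 1 */
    return 0;
}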

Proposed Architecture
- Each memory/processor chunk is POEM-based to implement hash-space calculations
- The memory bottleneck is avoided by distributing the image across POEM chunks
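A minimal sketch of one plausible partitioning scheme (static interleaving of hash buckets across chunks; the slides do not specify the actual mapping) shows how each lookup could be routed to the POEM chunk owning that slice of the distributed library, keeping accesses local rather than contending for a shared memory.

/* Assumed partitioning scheme, for illustration only: interleave hash
 * buckets across POEM chunks so each lookup touches one chunk's local
 * memory. */
#include <stdio.h>

#define N_POEM_CHUNKS 8
#define BUCKETS 256

/* Each hash bucket is owned by exactly one POEM chunk. */
static int owner_of_bucket(int bucket)
{
    return bucket % N_POEM_CHUNKS;      /* simple static interleaving */
}

int main(void)
{
    /* Count how many buckets each chunk owns (the split should be balanced). */
    int load[N_POEM_CHUNKS] = { 0 };
    for (int b = 0; b < BUCKETS; b++)
        load[owner_of_bucket(b)]++;

    for (int c = 0; c < N_POEM_CHUNKS; c++)
        printf("chunk %d owns %d buckets\n", c, load[c]);
    return 0;
}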