Image Understanding Architecture: Exploiting Potential Parallelism in Machine Vision (04/04/2007)

Overview
Heterogeneous parallel processor
  – Three distinct layers
Meet the real-time computational requirements of computer vision systems
Exploit the various forms of parallelism within a computer vision algorithm suite
  – Data parallelism
  – Control parallelism
Collaborative effort
  – University of Massachusetts, Amherst (UMASS)
  – Hughes Research Laboratory
Circa 1980s
  – Part of the DARPA image understanding research initiative

Background
What is computer vision?
Process images (or streams of images) with the intent of
  – Object recognition
  – Vehicle guidance
  – Manufacturing
  – etc.
Create a system that extracts information (not just data) from a picture

An Image (as we see it)

Mechanization of Processing

An Image (as the computer sees it)

What gets processed?
Gradients (edges)
Color
Transformations
  – Rotation
  – Translation
  – Stretching
Texture
Shading
Shape
Context
All of these are based on searching for patterns in the image colors
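A minimal sketch of the first item, gradients: edge strength estimated with central differences over a small grayscale image. The function name, the nested-list image format, and the 5 x 5 test image are assumptions for illustration, not part of the original slides.

```python
def gradient_magnitude(image):
    """Approximate edge strength with central differences.

    `image` is a list of rows of gray levels. Border pixels are left at 0.0
    because they lack a neighbor on one side.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0  # horizontal change
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0  # vertical change
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


# Hypothetical 5x5 image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 0, 9, 9] for _ in range(5)]
edges = gradient_magnitude(image)
```

Every output pixel depends only on its immediate neighbors, which is the kind of data-parallel work a one-processor-per-pixel array suits.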

How should a computer process visual scenes?
As it turns out, mimicking a biological system in circuits and software is extremely difficult
  – Cameras are not as sophisticated as the eye
  – Processors/software are not as sophisticated as the brain

How should a computer process visual scenes?
Preprocessing
  – Image conditioning (number crunching)
Low Level Vision
  – Feature extraction (number crunching)
Mid Level Vision
  – Feature description (symbolic processing)
High Level Vision
  – Object recognition (advanced data structures)

Three distinct “levels”
Data intensive
  – Preprocessing
  – Low Level Vision
Semi-data intensive, semi-control intensive
  – Mid Level Vision
Control intensive
  – High Level Vision

How do we process visual scenes?
Preprocessing
  – Image conditioning
Low Level Vision
  – Feature extraction
Mid Level Vision
  – Feature description
High Level Vision
  – Object recognition
[Figure: example region graph with nodes R1, R2, R3, and G1 and edges labeled Intersects(70), Intersects(40), and Bounds]
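The step from low-level to mid-level vision, pixels in and symbolic records out, can be sketched as follows. This is an illustrative flood-fill region labeler, not the IUA's algorithm; the record fields (`name`, `area`, `bbox`) and the example mask are assumptions.

```python
from collections import deque


def describe_regions(mask):
    """Label 4-connected regions of 1-pixels and return one record per region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            rs = [p[0] for p in pixels]
            cs = [p[1] for p in pixels]
            regions.append({
                "name": f"R{len(regions) + 1}",  # hypothetical naming scheme
                "area": len(pixels),
                "bbox": (min(rs), min(cs), max(rs), max(cs)),
            })
    return regions


# Hypothetical binary feature mask produced by a low-level stage.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
regions = describe_regions(mask)
```

The input is a dense pixel array and the output is a short list of named records, which is why the later levels lean toward control-intensive, symbolic processing.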

Basic Philosophy
Create a computer composed of three architectures, each suited to one of the levels of a computer vision application
Heterogeneous architecture
  – Multiple types of processing elements
  – Multiple interconnect topologies
  – Multiple programming languages

The Architecture

The Architecture
Content Addressable Array Parallel Processor (CAAPP)
  – SIMD
  – Configurable into separate groups
Intermediate Communication Associative Processor (ICAP)
  – MIMD
  – SPMD (Single Program Multiple Data): all PEs have the same program, but each has its own program counter (asynchronous SIMD)
Symbolic Processing Array (SPA)
  – MIMD
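The difference between SIMD and SPMD can be sketched in a few lines. This is a sequential simulation for illustration only; the activity mask, the three-instruction program, and all function names are assumptions rather than details of the CAAPP or ICAP.

```python
def simd_step(values, op, active):
    """SIMD: one broadcast instruction; every active PE applies it in lockstep."""
    return [op(v) if on else v for v, on in zip(values, active)]


# SPMD: each instruction returns (new_value, next_program_counter).
def branch_if_even(v, pc):
    return v, (pc + 2 if v % 2 == 0 else pc + 1)


def triple_plus_one(v, pc):
    return 3 * v + 1, pc + 1


def halve(v, pc):
    return v // 2, pc + 1


PROGRAM = [branch_if_even, triple_plus_one, halve]


def spmd_run(program, values):
    """SPMD: the same program on every PE, each with its own program counter."""
    results, steps = [], []
    for v in values:  # PEs are simulated one after another here
        pc, count = 0, 0
        while pc < len(program):
            v, pc = program[pc](v, pc)
            count += 1
        results.append(v)
        steps.append(count)
    return results, steps


simd_out = simd_step([1, 2, 3, 4], lambda v: v + 10, [True, False, True, False])
spmd_out, spmd_steps = spmd_run(PROGRAM, [3, 4])
```

In the SIMD call every PE sees the same instruction at the same time and can only opt out. In the SPMD run the two PEs take different paths through the same program and finish after different numbers of steps.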

CAAPP
Bit-serial processors
ALU
320 bits of cache
32 Kbits of main memory
Instructions come “from above”
  – Array Control Unit (ACU)
Communication
  – Configurable mesh (coterie network)
  – A “coterie” is a group of PEs that work [somewhat] independently (still only 1 instruction stream)
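What "bit-serial" means in practice: a 1-bit ALU handles one bit position per step, so a multi-bit add takes as many steps as the operands have bits. The sketch below simulates that for unsigned integers. The 8-bit default width and the function name are assumptions, not CAAPP specifics.

```python
def bit_serial_add(a, b, width=8):
    """Add two unsigned integers one bit per step, the way a 1-bit ALU would.

    Returns (sum modulo 2**width, carry out of the top bit).
    """
    carry = 0
    result = 0
    for i in range(width):  # one step per bit position, least significant first
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                    # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))  # full-adder carry bit
        result |= s << i
    return result, carry


total, carry_out = bit_serial_add(100, 55)
wrapped, overflow = bit_serial_add(200, 100)
```

In a SIMD array every PE would run the same bit step on its own pixel, so the step count depends on operand width rather than on image size.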

ICAP
Digital Signal Processor (DSP)
  – Specialty architecture for performing numerical transforms
320 bits of cache
256 KBytes of main memory (128K program, 128K data)
Communication
  – Cross-bar switch

SPA
Not fully specified
  – Commercially available multi-processor
  – Networked workstations

CAAPP to ICAP Communication
One ICAP PE is responsible for (communicates with) 64 CAAPP PEs (8 x 8 mesh)
Communication is via a dual-port [shared] memory

ICAP to SPA Communication
One SPA PE is responsible for (communicates with) 64 ICAP PEs (8 x 8 mesh)
Communication is via a dual-port [shared] memory

Full System Specification
64 SPA PEs
  – MIMD
  – RISC processing architecture
4K ICAP PEs (64 x 64)
  – MIMD/SPMD
  – Digital Signal Processor (DSP)
256K CAAPP PEs (512 x 512)
  – SIMD
  – 1-bit processing architecture
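The layer sizes are consistent with the 8 x 8 fan-out described for the inter-layer communication, as the sketch below checks. The coordinate-to-parent mapping is hypothetical: it assumes aligned, contiguous 8 x 8 tiles and row-major numbering of SPA PEs, neither of which the slides state.

```python
CAAPP_SIDE = 512  # 512 x 512 CAAPP PEs
ICAP_SIDE = 64    # 64 x 64 ICAP PEs
SPA_COUNT = 64    # SPA PEs
BLOCK = 8         # each parent PE serves an 8 x 8 block of the layer below


def icap_for_caapp(row, col):
    """ICAP grid coordinates serving a CAAPP PE (assumes aligned 8x8 tiles)."""
    return row // BLOCK, col // BLOCK


def spa_for_icap(irow, icol):
    """SPA index serving an ICAP PE (hypothetical row-major numbering)."""
    blocks_per_row = ICAP_SIDE // BLOCK
    return (irow // BLOCK) * blocks_per_row + (icol // BLOCK)


caapp_total = CAAPP_SIDE * CAAPP_SIDE       # 262144 = 256K
icap_total = ICAP_SIDE * ICAP_SIDE          # 4096 = 4K
caapp_per_icap = caapp_total // icap_total  # 64
icap_per_spa = icap_total // SPA_COUNT      # 64

corner_icap = icap_for_caapp(511, 511)
corner_spa = spa_for_icap(*corner_icap)
```

Each level is 64 times smaller than the one below it: 256K CAAPP PEs, 4K ICAP PEs, 64 SPA PEs.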

1st Generation
Proof of concept
  – 4096 CAAPP processors (64 x 64)
  – 64 ICAP processors
  – 1 SPA processor
[Figures: CAAPP chip, ICAP board, system chassis]

2nd Generation
1/16th of a full scale system
  – 16K CAAPP processors
  – 64 ICAP processors
      – Commercial chips: TI TMS320C40 32-bit processor
      – Communication is token-ring over 2 x 2 meshes plus inter-processor DMA channels
  – 4 SPA processors
      – Networked workstations

2nd Generation
[Figures: CAAPP chip, ICAP board, ICAP communication topology]

Programming
Required writing separate code for each level
In the beginning there were 3 different programming languages involved
  – Forth, C, Assembly
Plan was to move to C/C++/Lisp with parallel extensions (class libraries)
  – Ada was planned
Goal was to develop a single language compiler for the entire system

Interesting Bits
This group actually started from the problem specification and set out to build an architecture to support it
  – Contrary to other parallel processor developments of the time
Programming proved very difficult
  – Required intimate knowledge of the architecture (especially the coterie network) and the algorithms
  – A simulator of the full system existed for program development and research

End Notes
Emphasis was on
  – Proof of concept
  – Mapping algorithms to the architecture
  – Fabricating chips
Cancelled 1995
  – Various chips/boards fabricated and tested
  – Various software components developed and tested