A new concept to use 3D vertical integration technology for fast pattern recognition Ted Liu, Jim Hoff, Grzegorz Deptuch, Ray Yarema Fermilab Questions.

Presentation transcript:

A new concept to use 3D vertical integration technology for fast pattern recognition Ted Liu, Jim Hoff, Grzegorz Deptuch, Ray Yarema Fermilab Questions or Comments:

Introduction and Outline The development of 3D technology for the solution of the fast pattern recognition problem is part of a broader, ongoing R&D effort that includes both 2D and 3D solutions. This talk will cover: ◦ An introduction to the problem ◦ A description of the Associative Memory solution ◦ A new concept – VIPRAM – that uses emerging 3D technology

The Obvious Problem… [Figure: simulation at a luminosity quoted in cm^-2 s^-1; value not recoverable] There are enormous challenges in implementing pattern recognition for a tracking trigger at the LHC (L1 & L2), due to: 1. The much higher occupancy and event rates at the LHC 2. The much more massive detectors 3. The larger number of channels in their tracking volumes. There is a clear need to develop and improve hardware-based pattern recognition technology to advance the state of the art for the future.

The Challenges To increase the pattern density by 3 orders of magnitude (from the original AMchips) and increase the speed by more than a factor of 3 while reducing power consumption (or at least dramatically reducing the rate of increase of power consumption) [1]. [1] Based on the extensive simulation studies by the ATLAS FTK Collaboration

Some Obvious Questions… Can’t we just use what we currently have and just make bigger PC boards or more of them? ◦ No. This results in severe speed bottlenecks and power issues. Can’t we just use commercial CAMs? ◦ No. CAMs are part of the fast pattern recognition process, but not all of it. Alone, CAMs lack certain necessary features, making them unsuitable for fast track triggering.

It’s not a CAM; it’s a PRAM A CAM (Content Addressable Memory) is a classical digital system building block. It handles one pattern at a time: each CAM cell responds or does not respond to the current pattern, and there is no memory of previous matches. [Figure labels: Pattern 1 Match; Pattern 3 Match; Pattern 7 Match]

It’s not a CAM; it’s a PRAM A PRAM, on the other hand, is a Pattern Recognition Associative Memory. [Figure labels: Layer 1 Address 4 Match; Layer 3 Address 7 Match; Layer 3 Address 9 Match; Layer 2 Address 1 Match; Layer 4 Address 4 Match; Layer 2 Address 4 Match; Road!]
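The difference between the two slides above can be sketched in a few lines of Python. This is only a behavioural toy, not the AMchip or VIPRAM logic: the class name `PramPattern`, the function `find_roads`, the 0-based layer indices and the stored addresses are all hypothetical. The point it illustrates is that each per-layer compare is CAM-like, but the result is latched, so a road can fire after hits from different layers have arrived at different times.

```python
class PramPattern:
    """One stored pattern: one expected hit address per detector layer (hypothetical model)."""

    def __init__(self, addresses):
        self.addresses = tuple(addresses)          # addresses[i] belongs to layer i
        self.matched = [False] * len(addresses)    # per-layer match memory

    def present(self, layer, address):
        # CAM-style compare, but the result is remembered instead of forgotten.
        if self.addresses[layer] == address:
            self.matched[layer] = True

    def is_road(self, required):
        # A road fires once enough layers have latched a match.
        return sum(self.matched) >= required


def find_roads(patterns, hits, required):
    """Feed (layer, address) hits to every pattern; return indices of patterns that fired."""
    for layer, address in hits:
        for pattern in patterns:
            pattern.present(layer, address)
    return [i for i, p in enumerate(patterns) if p.is_road(required)]


# Hits loosely modelled on the figure labels (layers renumbered from 0).
hits = [(0, 4), (2, 7), (2, 9), (1, 1), (3, 4), (1, 4)]
patterns = [PramPattern((4, 4, 7, 4)), PramPattern((4, 1, 9, 5))]
print(find_roads(patterns, hits, required=4))
```

A plain CAM would correspond to calling `present` and reading the single compare result immediately, with no `matched` list to carry state between hits.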

History and the “traditional” effort The AMchips were invented and developed in Italy resulting in the AMchip03 which is currently being used by CDF. There is an ongoing effort, led by Italians, to improve on the AMchip03 design. We are now a part of this collaboration. The idea, of course, is to increase pattern density and speed and to optimize the performance. Design in deep sub-micron processes. The current target is 65nm.

Limitations in 2D…

A Single PRAM Cell (in 2 dimensions) [Figure labels: CAM Cells; Match Storage; Match lines; Glue Logic] In the older version of the AMchip, the match lines were a source of speed limitation because of their length and capacitance. The Glue Logic was large and slow. Length -> Capacitance -> Reduced Speed

THE CONCEPT – VIPRAM: Vertically Integrated Pattern Recognition Associative Memory. A reduced footprint and therefore greater pattern density. Shorter match lines and therefore greater speed. Less capacitance and therefore reduced power consumption. Each detector layer corresponds to a single tier. All communication goes from the “CAM Tiers” to the single “Control Tier”. The PRAM concept is tailor-made for 3D design. [Figure label: Much Shorter]

Another Single CAM Cell (this time in 3 dimensions) Viewing this structure as a pseudo-layout, some of the aforementioned benefits become even more obvious. The 3-dimensional design of the VIPRAM makes the PRAM appear like a 2-dimensional array of “tubes”, each dedicated to a single pattern. Communication with the outside world during normal operation is done solely through the Control Tier (the blue tier on top).

Pattern recognition for tracking is naturally a task in 3D track road

Majority Logic – Old Version [Figure labels: Match Lines; User-defined Threshold; Road Flag; Adder; Digital Comparator]
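Going only by the figure labels on this slide, the old majority logic can be read as an adder that counts asserted match lines followed by a digital comparator against a user-defined threshold. A minimal behavioural sketch under that assumption (the function name `road_flag_adder` is hypothetical, and nothing here reflects the actual gate-level design):

```python
def road_flag_adder(match_lines, threshold):
    """Adder + comparator reading of the old majority logic (assumed behaviour)."""
    # Adder: count how many layer match lines are asserted.
    count = sum(1 for line in match_lines if line)
    # Digital comparator: raise the road flag if the count reaches the threshold.
    return count >= threshold


print(road_flag_adder([1, 1, 0, 1], threshold=3))
print(road_flag_adder([1, 0, 0, 1], threshold=3))
```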

Majority Logic – New Version [Figure labels: 0, 1, Sel; Match1, Match2, Match3, Match4; Match Pattern; Pass Transistor Logic]

Majority Logic – New Approach
Majority Pattern    Meaning
111                 Perfect Match
011                 1 Missing Layer
001                 2 Missing Layers
000                 3 or More Missing Layers
[Figure labels: Stage Input; Stage Output: Match; Stage Output: Mismatch; For each stage…; In the end…]
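One way to read the majority-pattern table is as a short thermometer code that is passed from stage to stage: a stage whose layer matched forwards its input unchanged, and a stage whose layer did not match shifts a 0 into the pattern. The slides do not spell out the stage circuit, so the shift-on-mismatch rule and the name `majority_chain` below are assumptions chosen to reproduce the table, not a description of the real pass-transistor logic.

```python
def majority_chain(match_lines, width=3):
    """Propagate a thermometer-coded majority pattern through one stage per layer."""
    pattern = [1] * width                    # "111": perfect match so far
    for matched in match_lines:
        if not matched:
            # Stage output on mismatch: shift a 0 in, dropping the last bit.
            pattern = [0] + pattern[:-1]
        # Stage output on match: pattern passes through unchanged.
    return "".join(str(bit) for bit in pattern)


for lines in ([1, 1, 1, 1], [1, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]):
    print(lines, majority_chain(lines))
```

In this reading, no adder or comparator is needed at the end: the final pattern already encodes how many layers were missing, saturating at "3 or more".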

Can 3D exploit even more advantages from the new Majority Logic? Yes. We have divided the 3D design by detector layer (i.e. each CAM Tier is dedicated to one detector layer). Therefore, any logical division by detector layer results in functions that can be subdivided by tier.

Can 3D exploit even more advantages from the new Majority Logic? [Figure labels: 0, 1, Sel; Match1, Match2, Match3, Match4; Match Pattern]

Readout The top tier (a.k.a. the Control Tier) is a two-dimensional array of elements, each of whose position indicates its address and each of which records whether or not a road was found. Compare this with a pixel array, which is a two-dimensional array of elements, each of whose position indicates its address and each of which records whether or not a hit was found. In other words, high-speed readout architectures for pixel arrays can and should be used for VIPRAM readout.

Design for Simplicity The VIPRAM has two types of tiers, CAM and Control. In the final design, there will be several CAM tiers and only one Control tier. Each CAM tier is functionally identical to the others, but must maintain a unique relationship to the Control tier in order to work. In other words, patterns that come into the Control Tier from Detector “1” must be sent to the CAM tier dedicated to Detector “1”. Similarly, when data is sent from CAM tier #3, the Control Tier must know it came from CAM tier #3 and not some other CAM tier. How can this be done without requiring unique mask sets for each CAM tier?

Great minds think alike? Having gone part-way through this design procedure, the collaboration had the opportunity to meet with Bob Patti of Tezzaron who has been involved in 3D memory design from the beginning. Tezzaron’s 3D Memories follow exactly this arrangement of Control Tier and (in Tezzaron’s case) Memory Tier. In other words, we are following a beaten path, not blazing a new trail.

The Diagonal Via The Diagonal Via was patented by Bob Patti and Tezzaron. It converts vertical position to horizontal position and allows a common mask set to provide unique access to each layer.
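The idea of converting vertical position into horizontal position can be illustrated with a toy model. Assume (this is not taken from the patent or from Tezzaron's layouts) that the Control Tier drives one select line per CAM tier at adjacent horizontal positions, that every tier listens only at position 0, and that every tier's identical via pattern hands the line arriving at position p+1 to position p on the next tier. The function names are hypothetical.

```python
def lines_seen_by_tier(control_lines, tier):
    """Return the select lines as they appear at a given tier (0 = nearest the Control Tier)."""
    lines = list(control_lines)
    for _ in range(tier):
        # Identical diagonal-via pattern on every tier: shift all lines by one position.
        lines = lines[1:] + [None]
    return lines


def tier_select(control_lines, tier):
    """Every tier uses the same mask set, so every tier taps position 0."""
    return lines_seen_by_tier(control_lines, tier)[0]


control = ["sel0", "sel1", "sel2", "sel3"]
for k in range(4):
    print(k, tier_select(control, k))
```

Because the shift accumulates with stacking height, identical tiers end up tapping different Control Tier lines, which is the property the slide asks for.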

Conclusions and Future Work The VIPRAM is a new concept and now we are developing a collaboration with Fermilab, University of Chicago, INFN and Argonne. ◦ The immediate goal is a proof of principle ◦ The ultimate goal is a 3 order-of-magnitude increase in performance (density+speed). At present, we are seeking funding for the VIPRAM development. You will hear from us again at the next TIPP (please pick a nice place for my wife…)

Background

Figure 13 - Pass Transistor Multiplexors in the Majority Logic

VIPRAM – A Vertically Integrated PRAM Modern technology provides us with another approach…and another dimension. At first, the idea was extremely simple – increase the pattern density by stacking otherwise normal AMchips. The outputs of existing AMchips are already in a daisy chain. The stacked AMchips would not need to “know” that they were part of a stack.

VIPRAM – A Vertically Integrated PRAM This was necessarily modified to include “wrapping” an AMchip in circuitry that dealt with the 3D stacking, leaving an AMchip core that was identical to the 2D AMchips that are under development.

Not the first to consider 3D Content Addressable Memory Oh and Franzon [1] first suggested the advantages of 3D design on CAMs in 2007. Their idea involved vertically integrating the CAM cell itself so that the Matchline was vertical. This minimized its length and therefore its capacitance. The method is highly impractical since it requires f(N) 3D layers, where N is the number of bits in the CAM cell. [Figure labels: one CAM Bit Cell on each of 3D Layer 1, 3D Layer 2, 3D Layer 3, … 3D Layer N, joined by a vertical Matchline] [1] E.C. Oh and P.D. Franzon, “Design Considerations and Benefits of Three-Dimensional Ternary Content Addressable Memory”, IEEE Custom Integrated Circuits Conference, 2007, p. 591

Again, this is a PRAM not a CAM There is a perfectly natural, 3D functional division in a PRAM. Each detector layer gets its own 3D layer. The vertical interconnect is not the CAM match line, but the Road line. Moreover, each detector layer has independent data lines for both pattern matching and pattern loading, and this is a natural consequence of this architecture. [Figure labels: Match; Road!; 3D Layer 1, 3D Layer 2, 3D Layer 3, … 3D Layer N]

How can we improve on this design? [Figure labels: 4 blocks of layer patterns; ~80% AM bank; ~20% control & interface: move to another tier in 3D]

How can we fundamentally improve on this design?

The majority block, still in standard cells (~30%), can also be moved to the control/interface tier in 3D within each pattern block

Fischer Tree (Mephisto Logic) P. Fischer introduced the Mephisto readout architecture [1]. We found “Fischer Tree” easier to say. It is a self-selecting, self-addressing priority encoding architecture that performs the task in log(N) time. [1] “First implementation of the MEPHISTO binary readout architecture for strip detectors”, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, Volume 461, Issues 1-3, 1 April 2001, Pisa Meeting on Advanced Detectors
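The "priority encoding in log(N) time" claim can be made concrete with a generic binary priority-encoder tree. The sketch below is a software analogue under stated assumptions, not the MEPHISTO circuit: each tree node ORs its two children, and the address of the first set flag is assembled by walking from the root to a leaf, always preferring the lower-numbered child. The names `build_or_tree`, `first_hit` and `read_out` are hypothetical, and the input length is assumed to be a power of two.

```python
def build_or_tree(flags):
    """Level 0 holds the leaves; each higher level ORs pairs from the level below."""
    levels = [[bool(f) for f in flags]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] or prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def first_hit(flags):
    """Index of the lowest-numbered set flag, or None. Uses log2(N) decisions."""
    n = len(flags)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two number of flags"
    levels = build_or_tree(flags)
    if not levels[-1][0]:
        return None                      # root is 0: nothing to read out
    index = 0
    for level in reversed(levels[:-1]):  # walk from just below the root to the leaves
        index *= 2                       # move to the left child
        if not level[index]:
            index += 1                   # left subtree is empty, take the right child
    return index


def read_out(flags):
    """Repeatedly select the highest-priority hit, record its address, and clear it."""
    flags = list(flags)
    addresses = []
    while (i := first_hit(flags)) is not None:
        addresses.append(i)
        flags[i] = 0
    return addresses


print(read_out([0, 0, 1, 0, 0, 1, 0, 0]))
```

Each call to `first_hit` makes one decision per tree level, so the address of a hit is produced in a number of steps proportional to log2(N) rather than N.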

Fischer Tree (Mephisto Logic) Fischer Trees can be stacked if need be, so the two-dimensional array in the Control Tier can be handled this way. An alternate approach could take each output and push it into a stack. [Figure labels: a Fischer Tree over Col 1, Col 2, Col 3, … Col N, feeding a further Fischer Tree]