Outline The Pattern Matching and the Associative Memory (AM)

Presentation transcript:

Outline
  The Pattern Matching and the Associative Memory (AM)
  Why a denser AM is better
  Associative memory architecture
  How chips are put together: LAMB → AMBoard → crate
  The Tree Search Processor (TSP) & its location

TRACKING WITH PATTERN MATCHING
  [Figure: The Event compared against The Pattern Bank ...]

The Associative Memory – AM = Bingo
  Dedicated device, maximum parallelism:
    Each pattern has a private comparator
    Track search during detector readout
  [Figure: Bingo scorecard]
  Pattern capacity by technology (kpat/chip; 6L = 6-layer patterns, 8L = 8-layer patterns):
    Full custom     700 nm           0,128    6L
    FPGA            350 nm           0,128    6L
    Standard cell   180 nm           5,0      6L
    New for FTK     90 nm            ~60      8L
    New for FTK     65 nm            ~120     8L
    2 tiers         65 nm, 2,5 D     240      8L
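The "each pattern with a private comparator" idea can be illustrated in software. The following is a minimal sketch, not the chip logic: the hardware compares all patterns in parallel while this code loops over them, and the names `match_patterns`, `bank` and `min_layers` are hypothetical. The optional `min_layers` threshold is an assumption added for illustration; by default every layer must match.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Hit = Tuple[int, int]  # (layer index, super strip id); hypothetical encoding


def match_patterns(bank: Sequence[Sequence[int]],
                   hits: Iterable[Hit],
                   min_layers: Optional[int] = None) -> List[int]:
    """Return the indices of the patterns matched by the incoming hits.

    Each pattern stores one super strip id per layer. A per-layer flag
    (the flip-flop in the schematic) is set when a hit on that layer
    equals the stored word. A pattern fires when at least `min_layers`
    flags are set (assumption: all layers unless stated otherwise).
    """
    n_layers = len(bank[0])
    threshold = n_layers if min_layers is None else min_layers
    flags = [[False] * n_layers for _ in bank]
    for layer, ss in hits:              # hits arrive one by one, as in readout
        for pattern_flags, pattern in zip(flags, bank):
            if pattern[layer] == ss:    # the "private comparator"
                pattern_flags[layer] = True
    return [i for i, f in enumerate(flags) if sum(f) >= threshold]


bank = [(1, 2, 3, 4), (1, 5, 3, 7), (9, 9, 9, 9)]
hits = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]
full_matches = match_patterns(bank, hits)
loose_matches = match_patterns(bank, hits, min_layers=3)
```

As with a Bingo scorecard, no pattern has to wait for the end of the event: the flags are updated as each hit is read out.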

A schematic drawing of the AM
  [Figure labels: ONE PATTERN; Layer 1 to Layer 4; Cell 0 to Cell 3; word; FF; Output Bus; one HIT input per layer]

The more powerful the AM, the better. WHY?
  Tracking in 2 steps:
    1. find Roads first (Pattern Matching with the Associative Memory, AM)
    2. then find Tracks inside each Road (fit by the Track Fitter, TF)
  [Figure: Hits → Data Organizer (DO); Super Strip (SS) hits → Associative Memory (AM) → Roads; Roads + hits → Track Fitter (TF) → Track parameters (d, pT, φ, η, z); label: hot point at high occupancy]
  Track fitting uses the full resolution of the detector (Full Resolution Hits).
  Large SS: a lot of fakes + combinatorics inside roads.
  Road size: a parameter to balance the AM size against the DO-TF workload.
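The trade-off between Super Strip size and fit combinatorics can be sketched as follows. This is an illustrative model under stated assumptions: hits are integer coordinates, a Super Strip is a fixed-width bin of that coordinate, and the function names (`superstrip_id`, `organize_hits`, `combinations_in_road`) are hypothetical, not taken from the FTK software.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def superstrip_id(coordinate: int, ss_width: int) -> int:
    """Coarse-grain a full-resolution coordinate into a Super Strip id."""
    return coordinate // ss_width


def organize_hits(hits: Sequence[Tuple[int, int]],
                  ss_width: int) -> Dict[Tuple[int, int], List[int]]:
    """Data Organizer sketch: store full-resolution hits by (layer, SS)."""
    store: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for layer, coord in hits:
        store[(layer, superstrip_id(coord, ss_width))].append(coord)
    return store


def hits_in_road(store, road: Sequence[int]) -> List[List[int]]:
    """Retrieve the full-resolution hits of a road (one SS id per layer)."""
    return [store.get((layer, ss), []) for layer, ss in enumerate(road)]


def combinations_in_road(store, road: Sequence[int]) -> int:
    """Number of hit combinations the fitter would have to try.

    Assumption: a layer without hits counts as one (missing) choice.
    """
    n = 1
    for layer_hits in hits_in_road(store, road):
        n *= max(len(layer_hits), 1)
    return n


hits = [(0, 10), (0, 13), (1, 21), (2, 35), (2, 36)]
wide = organize_hits(hits, ss_width=8)
narrow = organize_hits(hits, ss_width=2)
wide_combinations = combinations_in_road(wide, (1, 2, 4))
narrow_combinations = combinations_in_road(narrow, (5, 10, 17))
```

With the wide Super Strips one road holds four hit combinations; with the narrow ones the same region is covered by several roads holding one combination each, which shifts the load from the DO-TF to the AM.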

Which banks we would like to have
  What we have now: Standard Cell 180 nm
    5000 patterns/chip for 6-layer patterns, 2500 patterns/chip for 12-layer patterns
    "A VLSI Processor for Fast Track Finding Based on Content Addressable Memories", IEEE Transactions on Nuclear Science, Volume 53, Issue 4, Part 2, Aug. 2006, pages 2428-2433
  Starting from 2500 patterns/chip (12 layers):
    90 nm technology provides a factor 4 → 10000 patterns/chip
    Full custom cell provides at least a factor 2 → 20000 patterns/chip
    8 layers instead of 12 provides a factor 1,5 → 30000 patterns/chip
    1,5 x 1,5 cm**2 2D chip → 60000 patterns/chip
    Going to 65 nm → 120000 patterns/chip
  With a 2 D chip we gain a factor 50!
  1 AMBoard: 128 chips → ~15 Mpatterns per board
  1 crate: 16 AMBoards → ~245 Mpatterns per crate
  100 MHz running clock
  NEXT: NEW VERSION, for both L1 & L2
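The scaling chain on this slide is plain multiplication and can be checked with a few lines. The factors below are the ones quoted on the slide; the factor 2 for the larger chip and for the 65 nm step are read off the quoted patterns/chip numbers, and the names (`SCALING_STEPS`, `scale_capacity`) are hypothetical.

```python
# Factors quoted on the slide, applied to 2500 patterns/chip (12 layers).
SCALING_STEPS = [
    ("90 nm technology", 4.0),
    ("full custom cell", 2.0),
    ("8 layers instead of 12", 1.5),
    ("1.5 x 1.5 cm^2 chip", 2.0),   # inferred from 30000 -> 60000
    ("65 nm technology", 2.0),      # inferred from 60000 -> 120000
]


def scale_capacity(base_patterns, steps):
    """Return [(label, patterns/chip)] after each cumulative scaling step."""
    trail = []
    capacity = base_patterns
    for label, factor in steps:
        capacity *= factor
        trail.append((label, int(round(capacity))))
    return trail


trail = scale_capacity(2500, SCALING_STEPS)
patterns_per_chip = trail[-1][1]
total_gain = patterns_per_chip / 2500
patterns_per_board = patterns_per_chip * 128   # 128 chips per AMBoard
patterns_per_crate = patterns_per_board * 16   # 16 AMBoards per crate
```

The product of the factors is 48, which the slide rounds to "a factor 50"; the board and crate totals come out at 15,36 and 245,76 Mpatterns.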

The CDF final AMchip architecture
  [Figure labels: Pattern bank; Add encoder; kill; six hit buses Bus0[17:0] to Bus5[17:0]]

Power consumption
  Old chip: 180 nm, 1,8 V core, 40 MHz, 1x1 cm**2: 1,8 Watt
  New chip, cumulative scaling:
    Change                                 Corr. factor    Power
    Core voltage 1,8 V → 1 V (90 nm)       1/(1,8*1,8)     0,56 Watt
    Frequency 40 MHz → 100 MHz             100/40          1,39 Watt
    Area 1x1 cm**2 → 4 cm**2               4/1             5,56 Watt
    New: pre-match feature                 1/3 (1/2)       1,85 (2,78) Watt
  Per crate: 16 x 128 = 2048 chips → 3,8 (5,7) kW
  IF the pre-match feature brings the power down to 1/3, new 2D chip (1,85 W) ~ old chip (1,8 W).
  Any other idea that saves power increases the potential to grow in the third dimension.
  We would like 4 funding agencies to be involved.
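The power estimate above multiplies the old chip's consumption by one correction factor per change. A minimal sketch of that arithmetic, assuming power scales with the square of the core voltage and linearly with clock frequency and area, as the slide's factors imply; the function name `scaled_power` and its parameters are hypothetical.

```python
def scaled_power(old_watts, v_old, v_new, f_old, f_new,
                 area_old, area_new, prematch_factor=1.0):
    """Scale chip power by (V_new/V_old)^2 * (f_new/f_old) * (A_new/A_old).

    `prematch_factor` is the fraction of power left once the pre-match
    feature is active (1/3 or 1/2 on the slide).
    """
    return (old_watts
            * (v_new / v_old) ** 2
            * (f_new / f_old)
            * (area_new / area_old)
            * prematch_factor)


CHIPS_PER_CRATE = 16 * 128  # 16 AMBoards x 128 chips

no_prematch = scaled_power(1.8, 1.8, 1.0, 40, 100, 1, 4)
optimistic = scaled_power(1.8, 1.8, 1.0, 40, 100, 1, 4, prematch_factor=1 / 3)
pessimistic = scaled_power(1.8, 1.8, 1.0, 40, 100, 1, 4, prematch_factor=1 / 2)
crate_kw_optimistic = CHIPS_PER_CRATE * optimistic / 1000
crate_kw_pessimistic = CHIPS_PER_CRATE * pessimistic / 1000
```

The results reproduce the slide: 5,56 W without pre-match, 1,85 (2,78) W per chip with it, and 3,8 (5,7) kW per crate.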

LHC Schedule → Intermediate chip!
  17,6 pile-up events @ 2.6 × 10^33
  19,0 pile-up events @ 10^34
  Simulation with 75 pile-up events: after 2020!
  Concentrate now on 2013-2015 (17-19 pile-up events)
  Consider evolution up to 2019 (41,5 pile-up events << simulated 75 events) → Intermediate chip!
  2020 comes much later and will profit from a very advanced technology...
  Annovi, 27-09-2010

Our Schedule
  TSMC 65 nm, low power, available as mini@sic (Vcc_core = 1,2 V).
  65 nm mini@sic: 22,5 k€/block; 90 nm mini@sic: 18,6 k€/block.
  "Variable resolution" gives good results → early production of AM04.
  We missed the 90 nm September 2010 run.
  We propose to move directly to a 65 nm prototype.
  This is a preliminary schedule to produce new LAMBs for 2013:
    (1) Mini@sic submission: spring or October 2011
    (2) Delivery: ~February 2012
    (3) Tested: ~June 2012
    (4) MPW submission: from June 2012
    (5) Delivery: from November 2012
    (6) Tested: from February 2013
    (7) MPW production: from February 2013
    (8) Delivery: from July 2013
    (9) Mounted on new LAMBs: from autumn 2013

Costs
  2 blocks Mini@sic: paid by Italy
  MPW run:
    TSMC 2010: 12 mm^2, 80 kUSD → 6,7 kUSD/mm^2
    UMC 2010: 4 mm x 4 mm, 70 k€ → 4,37 k€/mm^2
  12 mm^2 ~ 1/8 of the AMchip03 area in CDF → 7500 patterns/chip → 960 kpatterns/AMBoard
  With 2 blocks: 160 kUSD → ~2 Mpatterns/AMBoard
  In 2012 it could cost less; Academia Sinica can help on the price.
  Italy, Germany, USA, Academia Sinica (reduction).
  For 2013: small production = 8+2 AMBoards = 1280 chips. How many wafers? How much for a wafer?
  We would like to be 4 funding agencies, especially for the final step.
  Whole-wafer mask, at the time when a large-area chip is needed:
    UMC 2010, 90 nm: 555 kUSD
    TSMC 2010, 65 nm: 1300-900 kUSD
    TSMC 2010, 65 nm MLM: 650-950 kUSD

Packaging chips together in the LAMB
  [Figure: pipelines of AM chips between add_in and add_out; AMchip control = GLUE]

  [Figure labels: LAMB; AM; INDI; AMTOP (Bus0, Bus1, Bus3, Bus2); AMBOTTOM (Bus5, Bus4); PAT_ADD_IN[17:0]; PAT_ADD_OUT; REV_EN; add_in; add_out]

  [Figure labels: 6 buses (108 bits!); GLUE; AM; INDI; four 8-chip (top-bottom) pipelines; FPGA; VME interface; I/O control; FIFOs; TRACKs; ADD OUT[30:0]; receivers & pipeline; LAMB drivers; registers; connectors (ROAD bus + 6 HIT buses); HIT[17:0]]

Packaging LAMBs together in the AMBoard
  Complementary functions in the AUX board
  [Figure: CDF AMBoard with 4 LAMBs and FTK AMBoard; labels: standard cell chip; LAMB; control FPGA; FPGA for Roads; FPGA for SS input; 40 MHz clock; P3 serial LVDS]
  16 AMBoards per "core" crate → 8 core crates in the system

Packaging AMBoards inside a crate
  [Figure: crate with CPU (VME) and Processing Units AM0+TSP+DO+TF+HW to AM7+TSP+DO+TF+HW, 11LayFit+HW, AM8+... to AM15+..., 11LayFit+HW final]
  [Figure: Processing Unit = AMBoard + AUX card.
    AMBoard labels: LAMB; standard cell chip; 40 MHz clock; FPGA for Roads; control; SS input; P3 serial LVDS.
    AUX card labels: hits on LVDS cables; input FIFOs; SSMAP; DO; TF; HW; connectors for DO+TF+HW; tracks output interface]
  We need 8 such crates. Why?

The whole system: Data Formatter + 8 core crates
  [Figure: Pixels & SCT → RODs → S-links → Data Formatter (DF): cluster finding, split by layer, overlap regions → HITS → 8 × η-φ towers, each a Core Crate (DO, TF, AM board, HW) → Second stage → Track data ROB; Raw data ROBs]
  ~Offline quality track parameters
  50~100 kHz event rate

FTK: 8 processors working in parallel because of the input bandwidth
  IEEE Trans. Nucl. Sci. 51, 391 (2004)
  6-12 logical layers: full η coverage (Pixel barrel, SCT barrel, Pixel disks)
  Divide into φ sectors with overlaps
  [Figure: two AMs, each covering 1/2 φ]
  6 18-bit buses, hit rate 40 MHz/bus: input bandwidth of 4 Gbit/s
  Goal for high luminosity: 8 φ sectors, 8 9U VME crates for the FTK core
  Overlaps require hits in a small region to be sent to two neighboring AMs
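The quoted input bandwidth follows directly from the bus parameters. A small check, assuming one 18-bit hit word per bus per clock cycle; the function name `input_bandwidth_bits_per_s` is hypothetical.

```python
def input_bandwidth_bits_per_s(n_buses=6, bus_width_bits=18, hit_rate_hz=40e6):
    """Raw input bandwidth of one AM system: buses x width x hit rate."""
    return n_buses * bus_width_bits * hit_rate_hz


total_bus_bits = 6 * 18                        # the "108 bits" of the six buses
bandwidth = input_bandwidth_bits_per_s()       # bits per second
bandwidth_gbit = bandwidth / 1e9
```

The result is 4,32 Gbit/s per processor, which the slide rounds to 4 Gbit/s; this per-processor limit is the stated reason for running 8 processors in parallel.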

Whatever the power of the AM we can build, we can do better with the TSP

The Tree Search Processor (TSP): binary search to go down to better SS resolutions
  [Figure labels: FAT ROAD found by the AM (default SS for example); Depth 0, Depth 1, Depth 2; PARENT; PATTERN BLOCK 1 to 8; THIN ROAD 1 to 4]
  Algorithm: NIM A287 (1990) 436-438, http://www.pi.infn.it/~paola/Tree_search_algorithm.pdf
  Tree Search Processor: NIM A 287, 431 (1990), http://www.pi.infn.it/~orso/ftk/NIMA287_431.pdf
  IEEE, Toronto, Canada, November 8-14, 1998, http://www.pi.infn.it/~paola/TSP_v14.pdf
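The refinement from a fat road to thin roads can be sketched as a descent through a tree of patterns. This is an illustration of the idea only, not the algorithm of the cited papers: it assumes that each depth halves every Super Strip (so a parent id is the child id shifted right by one bit), that only thin patterns present in the bank are stored, and that every layer must have a hit. All names (`ancestor`, `build_tree`, `tree_search`) are hypothetical.

```python
def ancestor(pattern, levels_up):
    """Coarser version of a pattern: drop `levels_up` low bits per layer."""
    return tuple(ss >> levels_up for ss in pattern)


def build_tree(thin_patterns, depth):
    """Map (level, parent node) -> set of child nodes one level deeper.

    Level 0 holds the fat roads, level `depth` the thin patterns.
    """
    children = {}
    for pattern in thin_patterns:
        for level in range(depth):
            parent = ancestor(pattern, depth - level)
            child = ancestor(pattern, depth - level - 1)
            children.setdefault((level, parent), set()).add(child)
    return children


def tree_search(fat_road, children, fine_hits, depth):
    """Return the thin roads inside `fat_road` compatible with the hits.

    `fine_hits` holds one set of finest-resolution SS ids per layer.
    """
    frontier = [fat_road]
    for level in range(depth):
        shift = depth - level - 1
        # Hits seen at the resolution of the next level down.
        hits_here = [{ss >> shift for ss in layer} for layer in fine_hits]
        next_frontier = []
        for node in frontier:
            for child in sorted(children.get((level, node), ())):
                if all(ss in hits_here[layer] for layer, ss in enumerate(child)):
                    next_frontier.append(child)
        frontier = next_frontier
    return frontier


thin_bank = [(4, 5, 6), (4, 6, 7), (7, 5, 4)]
tree = build_tree(thin_bank, depth=2)
event_hits = [{4}, {5, 6}, {6}]
thin_roads = tree_search((1, 1, 1), tree, event_hits, depth=2)
```

Here three thin patterns share the fat road (1, 1, 1); one branch is discarded at the first depth and another at the second, leaving a single thin road.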

Example: 2-Level TSP → divide each SS by 4
  Higher-resolution SS (sub-SS) are to be stored in the AM or in a Mini-DO, and the LSB bits should be provided to the TSP.
  For each found road the AM chip could provide:
    The Road IDentifier (address)
    The Bitmap: one bit per layer, saying which SSs are empty and which are full (11 bits, e.g. 11101111111)
    4 more bits for each layer (Sub-SS), saying which of the 4 SS subdivisions are empty and which are full (4 bits × 8 layers)
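The road output described above (identifier, layer bitmap, 4 sub-SS bits per layer) can be sketched as a small encoder. The field layout, the bit ordering (bit k of a mask stands for subdivision k, counted from the low edge of the SS) and the names `sub_ss_mask` and `encode_road` are assumptions for illustration; the slide does not fix them.

```python
def sub_ss_mask(coords, ss_width, n_sub=4):
    """Bit mask of the SS subdivisions that contain at least one hit.

    Assumes `ss_width` is a multiple of `n_sub`.
    """
    sub_width = ss_width // n_sub
    mask = 0
    for coord in coords:
        mask |= 1 << ((coord % ss_width) // sub_width)
    return mask


def encode_road(road_id, hits_per_layer, ss_width):
    """Build the per-road output: id, layer bitmap, sub-SS masks.

    `hits_per_layer` lists, for each layer, the full-resolution
    coordinates found in the road's Super Strip (empty list = no hit).
    """
    bitmap = "".join("1" if hits else "0" for hits in hits_per_layer)
    masks = [sub_ss_mask(hits, ss_width) for hits in hits_per_layer]
    return {"road_id": road_id, "bitmap": bitmap, "sub_ss": masks}


road = encode_road(road_id=42,
                   hits_per_layer=[[16], [17, 23], [], [20]],
                   ss_width=8)
```

With 8 layers this adds 8 × 4 = 32 sub-SS bits per road on top of the road address and the layer bitmap.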

Conclusions
  Applications at future instantaneous luminosities will require an extremely high-performance AM.
  Even if it performs extremely well, the AM work could be refined by the TSP, which could fit in the same package as the AM chip in a 2.5 D technology. This actually is NOT true any more, probably, before 2020.
  The AM could be used for both L1 and L2 applications.
  Any increase in AM pattern capacity would be an important advantage for both L1 and L2 tracking systems.

BACKUP

The FTK CHALLENGING PART: the NEW AMCHIP & the TSP
  [Figure labels: LAMB; standard cell chip; 40 MHz clock; FPGA; "+TSP?" marks]
  Where can we stack the TSP?
    In the AUX board, just after the AMBoard?
    In the AMBoard itself?
    In the LAMB, to reduce the number of roads early?
    Even better, in the AMchip 2.5 D!