Outline The Pattern Matching and the Associative Memory (AM)


1 Outline
Pattern matching and the Associative Memory (AM)
Why a denser AM is better
Associative memory architecture
How the chips are put together: LAMB → AMBoard → crate
The Tree Search Processor (TSP) and its location

2 Tracking with pattern matching
Figure labels: The Event; The Pattern Bank ...

3 The Associative Memory – AM = Bingo
Dedicated device with maximum parallelism: each pattern has a private comparator, like a Bingo scorecard. The track search runs during detector readout.
Full custom nm: 0,128 6L kpat/chip
FPGA nm: 0,128 6L kpat/chip
Standard cell nm: 5, L kpat/chip
New for FTK nm: ~ L kpat/chip
New for FTK nm: ~120 8L kpat/chip
2 tiers nm 2,5 D: L kpat/chip

4 A schematic drawing of the AM
Figure: ONE PATTERN is a row of stored words, one per layer (Layer 1 to Layer 4), each with a flip-flop (FF). A HIT bus feeds each layer, and the result is read on the Output Bus. Other labels: Cell 0, Cell 1, Cell 2, Cell 3.
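The schematic can be mimicked in software. The following is a minimal sketch, not the chip logic itself: the class name `AssociativeMemory`, its methods and the `min_layers` threshold are hypothetical names chosen for illustration. It assumes each pattern stores one word per layer and one match flag (the FF) per word, and that every incoming hit is compared with all patterns on that layer.

```python
from typing import List, Optional, Sequence


class AssociativeMemory:
    """Toy model of an AM: one stored word and one match flag (FF) per layer."""

    def __init__(self, patterns: Sequence[Sequence[int]]):
        self.patterns = [tuple(p) for p in patterns]
        self.n_layers = len(self.patterns[0])
        self.reset()

    def reset(self) -> None:
        # Clear every flip-flop before a new event.
        self.ff = [[False] * self.n_layers for _ in self.patterns]

    def load_hit(self, layer: int, word: int) -> None:
        # In hardware all comparators work in parallel; here we loop over patterns.
        for flags, pattern in zip(self.ff, self.patterns):
            if pattern[layer] == word:
                flags[layer] = True

    def roads(self, min_layers: Optional[int] = None) -> List[int]:
        # A pattern fires when enough of its layers have been matched.
        need = self.n_layers if min_layers is None else min_layers
        return [addr for addr, flags in enumerate(self.ff) if sum(flags) >= need]


am = AssociativeMemory([(1, 2, 3, 4), (1, 5, 3, 7), (9, 9, 9, 9)])
for layer, word in [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]:
    am.load_hit(layer, word)
print(am.roads())              # addresses of fully matched patterns
print(am.roads(min_layers=3))  # allow one missing layer
```

Because hits are loaded one at a time as they arrive, the matching is finished as soon as the last hit has been read, which is the point of the Bingo analogy.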

5 The more powerful the AM, the better. Why?
Tracking in 2 steps: first find Roads (pattern matching with the Associative Memory, AM), then find Tracks inside each Road (fit by the Track Fitter, TF).
Diagram: Hits → Associative Memory (AM) → Roads → Data Organizer (DO) → Roads + hits → Track Fitter (TF) → Track parameters (d, pT, φ, η, z). Other labels: Super Strip (SS); Full Resolution Hits; Hot point at high occupancy.
Track fitting uses the full resolution of the detector.
Large SS: a lot of fakes + combinatorics inside roads.
Road size: a parameter to balance the AM size against the DO-TF workload.
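The road-size trade-off can be illustrated with a small sketch. It assumes, purely for illustration, that a Super Strip is obtained by integer division of a full-resolution strip number by a width `ss_width`; the function names are hypothetical and the real SS mapping is not given in these slides.

```python
from collections import Counter
from typing import Dict, Iterable


def super_strip(strip: int, ss_width: int) -> int:
    """Map a full-resolution strip number to its Super Strip (assumed mapping)."""
    return strip // ss_width


def hits_per_super_strip(strips: Iterable[int], ss_width: int) -> Dict[int, int]:
    """Count how many full-resolution hits fall into each Super Strip."""
    return dict(Counter(super_strip(s, ss_width) for s in strips))


layer_hits = [3, 5, 17, 18, 40]
print(hits_per_super_strip(layer_hits, ss_width=8))   # narrow SS: few hits per SS
print(hits_per_super_strip(layer_hits, ss_width=32))  # wide SS: more hits per SS
```

A wider SS needs fewer patterns in the AM but leaves more hits inside each road, so the DO and TF have more combinations to try.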

6 Which pattern banks we would like to have
What we have now: Standard Cell mm pattern/chip for 6-layer patterns, 2500 patterns/chip for 12-layer patterns
“A VLSI Processor for Fast Track Finding Based on Content Addressable Memories”, IEEE Transactions on Nuclear Science, Volume 53, Issue 4, Part 2, Aug Page(s):
90 nm technology provides a factor → patterns/chip
Full custom cell provides at least a factor 2 → patterns/chip
8 layers instead of 12 provides a factor 1,5 → patterns/chip
1,5 x 1,5 cm^2 2D chip → patterns/chip
Going to 65 nm → patterns/chip
With a 2D chip we gain a factor 50!
1 AMBoard: 128 chips → ~15 Mpatterns per board
1 crate: 16 AMBoards → ~245 Mpatterns per crate
100 MHz running clock
NEXT: NEW VERSION for both L1 & L2

7 The CDF final AMchip architecture
Pattern bank Add encoder kill Bus0[17:0] Bus1[17:0] Bus2[17:0] Bus3[17:0] Bus4[17:0] Bus5[17:0]

8 Power consumption
Old chip (180 nm, 1,8 V core, 40 MHz, 1x1 cm^2): 1,8 Watt
New chip, 90 nm, 1 V core: corr. factor 1/(1,8*1,8) → 0,56 Watt
New chip, 100 MHz: corr. factor 100/40 → 1,39 Watt
New chip, 4 cm^2: corr. factor 4/1 → 5,56 Watt
New chip, pre-match feature: corr. factor 1/3 (1/2) → 1,85 (2,78) Watt
Per crate, 16 x 128 = 2048 chips: 3,8 (5,7) kW
IF the pre-match feature saves at least 1/3, the new 2D chip (1,85 W) ~ the old chip (1,8 W).
Any other idea to gain in power increases the potential to grow in the third direction.
We would like 4 funding agencies to be involved.
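The power estimate is a chain of multiplicative correction factors, which can be checked with a short sketch. The function name `scale_power` and the step labels are illustrative; the numbers are the ones quoted on the slide.

```python
from typing import List, Tuple


def scale_power(base_watt: float, factors: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Apply correction factors one after another, recording power after each step."""
    steps = []
    power = base_watt
    for label, factor in factors:
        power *= factor
        steps.append((label, power))
    return steps


OLD_CHIP_WATT = 1.8
CHIPS_PER_CRATE = 16 * 128  # 16 AMBoards x 128 chips

factors = [
    ("core voltage 1.8 V -> 1 V", 1 / (1.8 * 1.8)),
    ("clock 40 MHz -> 100 MHz", 100 / 40),
    ("area 1 cm^2 -> 4 cm^2", 4 / 1),
    ("pre-match feature saves 2/3", 1 / 3),
]

for label, watt in scale_power(OLD_CHIP_WATT, factors):
    print(f"{label}: {watt:.2f} W")

final_watt = scale_power(OLD_CHIP_WATT, factors)[-1][1]
print(f"per crate: {final_watt * CHIPS_PER_CRATE / 1000:.1f} kW")
```

Replacing the last factor with 1/2 reproduces the figures given in parentheses on the slide.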

9 LHC Schedule → Intermediate chip!
17,6 pile-up ev. @ 19,0 pile-up ev. @ 10^34
The simulation with 75 pile-up events applies after 2020! Concentrate now on (17-19 pile-up events). Consider the evolution up to 2019 (41,5 pile-up events << the simulated 75 events) → Intermediate chip! 2020 comes much later and will profit from a much more advanced technology.

10 Our Schedule
TSMC 65 nm, low power, available as (Vcc_core=1,2 V).
65 nm: 22,5 k€/block; 90 nm: 18,6 k€/block.
"Variable resolution" gives good results → early production of AM04.
We missed the 90 nm September run, so we propose to move directly to a 65 nm prototype.
This is a preliminary schedule to produce new LAMBs for 2013:
(1) Submission: spring or October 2011
(2) Delivery: ~February 2012
(3) Tested: ~June 2012
(4) MPW submission: from June 2012
(5) Delivery: from November 2012
(6) Tested: from February 2013
(7) MPW production: from February 2013
(8) Delivery: from July 2013
(9) Mounted on new LAMBs: from autumn 2013

11 Costs
2 blocks Mini@sic: paid by Italy.
MPW run:
TSMC 2010: 12 mm^2 kUSD → 6,7 kUSD/mm^2
UMC: 4 mm x 4 mm, 70 k€ → 4,37 k€/mm^2
12 mm^2 ~ 1/8 of the AMchip03 area in CDF → 7500 patterns/chip → 960 kpatterns/AMBoard
With 2 blocks kUSD → ~2 Mpatterns/AMBoard
In 2012 it could cost less: Academia Sinica can help on the price. Italy – Germany – USA – Academia Sinica (reduction).
For 2013: small production = 8+2 AMBoards = 1280 chips. How many wafers? How much for a wafer?
We would like 4 funding agencies, especially for the final step.
Whole wafer, when a large-area chip is needed: UMC nm: kUSD; TSMC nm: kUSD; TSMC nm MLM: kUSD
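The arithmetic behind the figures that survive on this slide can be reproduced with a short sketch. The helper names are hypothetical; the 128 chips per AMBoard comes from the pattern-bank slide earlier in the talk.

```python
CHIPS_PER_AMBOARD = 128  # from "1 AMboard: 128 chips"


def cost_per_mm2(total_cost: float, width_mm: float, height_mm: float) -> float:
    """Cost per unit area of a rectangular block."""
    return total_cost / (width_mm * height_mm)


def patterns_per_board(patterns_per_chip: int, chips: int = CHIPS_PER_AMBOARD) -> int:
    """Total pattern capacity of one AMBoard."""
    return patterns_per_chip * chips


def chips_for_boards(n_boards: int, chips: int = CHIPS_PER_AMBOARD) -> int:
    """Number of AM chips needed to populate n_boards AMBoards."""
    return n_boards * chips


print(cost_per_mm2(70, 4, 4))    # UMC block: k-euro per mm^2
print(patterns_per_board(7500))  # patterns on one AMBoard
print(chips_for_boards(8 + 2))   # chips for the small 2013 production
```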

12 Packaging chips together in the LAMB
Figure labels: pipelines of AM chips; add_in; add_out; AMchip control = GLUE.

13 LAMB
Figure labels: AM; INDI; AMTOP (Bus0, Bus1, Bus3, Bus2); AMBOTTOM (Bus5, Bus4); PAT_ADD_IN [17:0]; PAT_ADD_OUT; REV_EN; add_in; add_out.

14 6 buses (108 bits!)
Figure labels: GLUE; AM; INDI; four 8-chip (top-bottom) pipelines; FPGA; VME INTERFACE; ROAD CONNECTOR; I/O control; FIFOs; TRACKs; ADD OUT [30:0]; RECEIVERs & PIPELINE; LAMB DRIVERs; REGISTERs; CONNECTORs (ROAD bus + 6 HIT buses); HIT [17:0].

15 Packaging LAMBs together in the AMBoard
CDF AMBoard with 4 LAMBs; FTK AMBoard.
Figure labels: Standard cell chip, LAMB, Control, FPGA, FPGA for Roads, 40 MHz clock, FPGA for SS Input, P3 serial LVDS.
Complementary functions in the AUX board.
16 AMBoards per “core” crate → 8 core crates in the system.

16 Packaging AMBoards inside a crate
Crate slots: CPU (VME); AM0+TSP+DO+TF+HW to AM7+TSP+DO+TF+HW; 11LayFit+HW; AM8+….. to AM15+…..; 11LayFit+HW final.
Processing Unit: AMBoard and AUX card.
Figure labels: Hits; LVDS cables; connectors for DO+TF+HW; HW; TF; DO; INPUT FIFOs; tracks output interface; SSMAP; LAMB; standard cell chip; 40 MHz clock; FPGA for Roads; P3 serial LVDS; Control; SS Input.
We need 8 such crates. Why?

17 The whole system: Data Formatter + 8 core crates
Pixels & SCT → RODs → S-links → Data Formatter (DF): cluster finding, split by layer, overlap regions → HITS → 8x η-φ towers (core crate: AM board, DO, TF, HW) → second stage → track data ROB.
Raw data go to the ROBs. 50~100 kHz event rate. ~Offline-quality track parameters.

18 FTK: 8 processors working in parallel because of input bandwidth
6-12 logical layers, full η coverage: Pixel barrel, SCT barrel, Pixel disks.
Divide into φ sectors with overlaps (figure labels: 1/2 φ AM, 1/2 φ AM). IEEE Trans. Nucl. Sci. 51, 391 (2004).
6 18-bit buses, hit rate 40 MHz/bus: input bandwidth of 4 Gbit/s.
Goal at high luminosity: 8 φ sectors → 8 9U VME crates for the FTK core.
Overlaps require hits in a small region to be sent to two neighboring AMs.
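The quoted input bandwidth follows directly from the bus parameters, as the sketch below shows. It assumes one 18-bit hit word per bus per clock cycle; the function name is illustrative.

```python
def input_bandwidth_bits_per_s(n_buses: int = 6,
                               bus_width_bits: int = 18,
                               hit_rate_hz: int = 40_000_000) -> int:
    """Peak input bandwidth, assuming one hit word per bus per clock cycle."""
    return n_buses * bus_width_bits * hit_rate_hz


total_bits = 6 * 18  # width of the 6 hit buses together
bandwidth = input_bandwidth_bits_per_s()
print(total_bits, "bits in parallel")
print(bandwidth / 1e9, "Gbit/s")
```

The result, 4,32 Gbit/s, matches the "4 Gbit/s" on the slide and the "6 bus (108 bits!)" label on the LAMB diagram.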

19 Whatever the power of the AM we can build,
we can do better with the TSP

20 The Tree Search Processor (TSP): binary search to go down to better SS resolutions
Figure: a FAT ROAD found by the AM (default SS, for example) sits at Depth 0. Each PARENT points to a PATTERN BLOCK of finer patterns (1 2 3 4 5 6 7 8) at Depth 1 and Depth 2, down to a THIN ROAD (1 2 3 4).
Algorithm: Tree Search Processor, NIM A 287, 431 (1990); IEEE Toronto, Canada, November
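The idea of descending from a fat road to thin roads can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of the cited paper: Super Strips are assumed to halve at each depth (so an SS index at one level is the finer index shifted right by one bit), and `coarsen`, `build_tree` and `tree_search` are hypothetical names.

```python
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[int, ...]


def coarsen(pattern: Pattern, shift: int) -> Pattern:
    """Pattern seen at a resolution 2**shift times coarser (assumed binary split)."""
    return tuple(ss >> shift for ss in pattern)


def build_tree(thin_patterns: Sequence[Pattern], depth: int) -> List[Dict[Pattern, Set[Pattern]]]:
    """children[level][parent] is the block of patterns one level finer than parent."""
    children: List[Dict[Pattern, Set[Pattern]]] = [dict() for _ in range(depth)]
    for thin in thin_patterns:
        for level in range(depth):
            parent = coarsen(thin, depth - level)
            child = coarsen(thin, depth - level - 1)
            children[level].setdefault(parent, set()).add(child)
    return children


def tree_search(fat_road: Pattern, hits_by_layer: Sequence[Sequence[int]],
                children: List[Dict[Pattern, Set[Pattern]]], depth: int) -> List[Pattern]:
    """Descend from a fat road, keeping only children whose SSs all contain a hit."""
    candidates = {fat_road}
    for level in range(depth):
        shift = depth - level - 1
        fired = [{h >> shift for h in hits} for hits in hits_by_layer]
        candidates = {
            child
            for parent in candidates
            for child in children[level].get(parent, ())
            if all(ss in fired[layer] for layer, ss in enumerate(child))
        }
    return sorted(candidates)


bank = [(4, 5, 6), (4, 6, 7), (7, 7, 7), (8, 8, 8)]  # thin patterns, 3 layers
tree = build_tree(bank, depth=2)
hits = [[4], [5, 6], [6]]                             # hits in thin-SS units
print(tree_search((1, 1, 1), hits, tree, depth=2))    # thin roads inside the fat road
```

Only the pattern blocks under surviving parents are visited, so the search cost grows with the number of matching branches rather than with the size of the thin pattern bank.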

21 Example: 2-level TSP → divide each SS by 4
Higher-resolution SSs (sub-SSs) must be stored in the AM or in a Mini-DO, and the LSBs should be provided to the TSP.
For each found road the AM chip could provide:
The road identifier (address)
The bitmap: one bit per layer, saying which SSs are empty and which are full (e.g. 11 bits)
4 more bits for each layer (sub-SS), saying which of the 4 SS subdivisions are empty and which are full (4 bits × 8 layers)
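A possible encoding of the per-road output described here is sketched below. The bit ordering and the function name `road_bitmaps` are assumptions made for illustration; the slide only specifies one bit per layer plus 4 sub-SS bits per layer.

```python
from typing import List, Sequence, Tuple


def road_bitmaps(fat_road: Sequence[int],
                 thin_hits_by_layer: Sequence[Sequence[int]],
                 subdivisions: int = 4) -> Tuple[int, List[int]]:
    """Return (layer bitmap, list of sub-SS bitmaps) for one found road.

    Hits are given in sub-SS units, so hit // subdivisions is the SS number
    and hit % subdivisions selects one of the 4 subdivisions (assumed layout).
    """
    layer_bits = 0
    sub_bits = []
    for layer, (ss, hits) in enumerate(zip(fat_road, thin_hits_by_layer)):
        mask = 0
        for hit in hits:
            if hit // subdivisions == ss:
                mask |= 1 << (hit % subdivisions)
        sub_bits.append(mask)
        if mask:
            layer_bits |= 1 << layer  # this layer's SS is "full"
    return layer_bits, sub_bits


layer_bits, sub_bits = road_bitmaps((1, 1, 1), [[4], [5, 6], [9]])
print(bin(layer_bits), [bin(m) for m in sub_bits])
```

With 8 layers this gives 8 + 4 × 8 = 40 bits of bitmap per road in addition to the road address.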

22 Conclusions
Applications at future instantaneous luminosities will require an extremely high-performance AM.
Even if extremely performing, the AM's work could be refined by the TSP, which could fit in the same package as the AM chip in a 2.5D technology. This actually is NOT true any more, probably, before 2020.
The AM could be used for both L1 and L2 applications.
Any increase in AM pattern capacity would be an important advantage for both L1 and L2 tracking systems.

23 BACKUP

24 The FTK challenging part: the new AMchip & the TSP
Figure labels: LAMB; standard cell chip; 40 MHz clock; FPGA; +TSP?
Where can we stack the TSP? In the AUX board just after the AMBoard? In the AMBoard itself? In the LAMB, to reduce the number of roads early? Even better, in the AMchip with 2.5D!

