FTK Update
- Approved by TDAQ in April
- Ongoing activity: AMchip, AMboard, TSP, simulation
- Near future: mezzanine for 2-D clustering, Vertical Slice
- USA funds
AMchip: mini-ASIC ordered (20 k€), June or September run
We (L. Sartori, Pisa; M. Beretta and E. Bossini, Frascati; F. Crescioli and I. Sacco, Pisa) are working on the 2010 mini-ASIC prototype, which includes a full custom cell (M. Beretta, Frascati), with 2 main goals:
- Area reduction, to obtain higher pattern densities
- Power consumption reduction, to be able to use large silicon areas
Pattern density and power with respect to the CDF chip:
- 90 nm against 180 nm → factor ~4 reduction for area and power consumption, (V90/V180)**2
- Full custom cell: a factor 2 gain for both area and consumption
Future collaboration with Fermilab (Ted Liu and Ray Yarema's group) to stack 2 or 4 tiers with 3D technology
The AMchip: 90 nm mini-ASIC
What we have now:
- Standard cell, 180 nm: […] patterns/chip for 6-layer patterns, 2500 patterns/chip for 12-layer patterns
- "A VLSI Processor for Fast Track Finding Based on Content Addressable Memories", IEEE Transactions on Nuclear Science, Volume 53, Issue 4, Part 2, Aug. 2006, pages 2428-2433
Expected gains:
- 90 nm technology provides a factor 4 → 10000 patterns/chip
- Full custom cell provides at least a factor 2 → 20000 patterns/chip
- 8 layers instead of 12 provides a factor 1.5 → 30000 patterns/chip
- 1.5 x 1.5 cm**2 2D chip = 2 tiers of 1 x 1 cm**2 → 60000 patterns/chip. With a 2D chip we gain a factor 25!
- 2 tiers of 1.5 x 1.5 cm**2 or 4 tiers of 1 x 1 cm**2 → 120000 patterns/chip. With a 2-tier 2.5D chip we gain a factor 50!
- 65 nm?? Available as a mini-ASIC!
- 100 MHz running clock
NEXT: NEW VERSION, for both L1 and L2
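The chain of gain factors quoted above can be checked with a few lines of arithmetic. This is only a sketch: the factors are the slide's own estimates, the names `BASELINE_PATTERNS`, `GAIN_FACTORS` and `density_chain` are illustrative, and the 2500 patterns/chip baseline is the 12-layer figure for the 180 nm chip.

```python
# Pattern-density scaling chain, using the estimates quoted on the slide.
BASELINE_PATTERNS = 2500  # 180 nm standard-cell chip, 12-layer patterns

# (label, multiplicative gain); the last step treats the larger die
# (or 2 stacked tiers) as a factor 2, as the slide does.
GAIN_FACTORS = [
    ("90 nm technology", 4),
    ("full custom cell", 2),
    ("8 layers instead of 12", 1.5),
    ("1.5 x 1.5 cm**2 die, or 2 tiers of 1 x 1 cm**2", 2),
]


def density_chain(baseline, factors):
    """Return (label, patterns_per_chip) after each cumulative gain."""
    steps = []
    patterns = baseline
    for label, factor in factors:
        patterns *= factor
        steps.append((label, int(patterns)))
    return steps


steps = density_chain(BASELINE_PATTERNS, GAIN_FACTORS)
for label, patterns in steps:
    print(f"{label:48s} -> {patterns} patterns/chip")

total_gain = steps[-1][1] / BASELINE_PATTERNS
print(f"overall gain: x{total_gain:.0f}")  # x24, quoted as ~25 on the slide
```

Doubling once more (2 tiers of the larger die, or 4 tiers of the smaller one) gives 120000 patterns/chip, a gain of 48, which the slide quotes as a factor 50.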
[Figure: chip layouts, 180 nm chip compared with today's 90 nm mini-ASIC. Credits: L. Sartori, M. Beretta, E. Bossini, F. Crescioli, I. Sacco]
May: defining an Italy-USA collaboration for a DOE application to Fermilab (tomorrow)
May: defining an Italy-USA collaboration for a DOE application to generic R&D funds (ATLAS FTK and Fermilab CMS are both interested)
AMboard: Pisa-Milano
- Board power consumption: 230 W, i.e. 128 A at 1.8 V (today) → 230 A at 1 V (Phase 1)
- 16 AMBoards per "core" crate → 8 core crates in the system
[Figure: CDF AMBoard with 4 LAMBs compared with the FTK AMBoard. Labels: LAMB cooling; cooling; fast and dense connectors; thin, powerful FPGAs; standard cell chip; LAMB; control FPGA; FPGA for roads; FPGA for SS; 40 MHz clock; input P3 serial LVDS]
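The current figures on this slide follow directly from I = P / V at constant board power. A minimal sketch, where the function and constant names are illustrative and only the numbers come from the slide:

```python
def supply_current(power_w, core_voltage_v):
    """Current drawn by the board at a given core voltage (I = P / V)."""
    return power_w / core_voltage_v


BOARD_POWER_W = 230.0  # board power quoted on the slide

current_today = supply_current(BOARD_POWER_W, 1.8)   # 1.8 V core
current_phase1 = supply_current(BOARD_POWER_W, 1.0)  # 1 V core (Phase 1)
print(f"today: {current_today:.0f} A, Phase 1: {current_phase1:.0f} A")

# System size: 16 AMBoards per core crate, 8 core crates.
BOARDS_PER_CRATE = 16
CORE_CRATES = 8
total_boards = BOARDS_PER_CRATE * CORE_CRATES
print(f"{total_boards} AMBoards in the system")
```

At the same power, lowering the core voltage from 1.8 V to 1 V raises the current by a factor 1.8, which is why the board's power distribution needs attention in Phase 1.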
However powerful an AM we can build, we can do better by complementing the AM with a TSP.
Milano: starting from an existing mezzanine, will build a TSP prototype to test its performance and extrapolate its cost.
The Tree Search Processor (TSP): binary search to go down to better SS resolutions
[Figure: pattern tree. Labels: FAT ROAD found by AM (default SS, for example); Depth 0, Depth 1, Depth 2; PATTERN BLOCK; PARENT; children 1-8; THIN ROAD 1-4]
References:
- Algorithm: NIM A287 (1990) 436-438, http://www.pi.infn.it/~paola/Tree_search_algorithm.pdf
- Tree Search Processor: NIM A 287, 431 (1990), http://www.pi.infn.it/~orso/ftk/NIMA287_431.pdf
- IEEE, Toronto, Canada, November 8-14, 1998, http://www.pi.infn.it/~paola/TSP_v14.pdf
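The idea of descending from a fat road to thin roads can be illustrated in software. The sketch below is not the hardware algorithm of the cited papers. It assumes that each SS splits in two at every depth (so a child SS index is `2*ss` or `2*ss + 1`) and that the bank is stored as thin patterns from which coarser levels are derived by dropping low bits. The names `build_levels` and `tree_search` are hypothetical.

```python
from itertools import product


def build_levels(thin_patterns, max_depth):
    """levels[d] = set of patterns at depth d (depth 0 = fat roads).

    Coarser levels are obtained by dropping the low bits of each SS index.
    """
    levels = []
    for depth in range(max_depth + 1):
        shift = max_depth - depth
        levels.append({tuple(ss >> shift for ss in p) for p in thin_patterns})
    return levels


def tree_search(fat_road, hits_per_layer, levels):
    """Refine one fat road into the thin roads compatible with the hits.

    hits_per_layer: one set of SS indices per layer, at the finest resolution.
    """
    max_depth = len(levels) - 1
    candidates = [fat_road]
    for depth in range(1, max_depth + 1):
        shift = max_depth - depth
        fired = [{h >> shift for h in layer} for layer in hits_per_layer]
        refined = []
        for parent in candidates:
            # Each SS of the parent splits in two at the next depth.
            for bits in product((0, 1), repeat=len(parent)):
                child = tuple(2 * ss + b for ss, b in zip(parent, bits))
                in_bank = child in levels[depth]
                has_hits = all(c in f for c, f in zip(child, fired))
                if in_bank and has_hits:
                    refined.append(child)
        candidates = refined
    return candidates


# Toy bank: 4 layers, 2 levels of subdivision below the fat road.
thin_bank = {(4, 5, 6, 7), (4, 4, 6, 6), (0, 1, 2, 3)}
levels = build_levels(thin_bank, max_depth=2)
hits = [{4}, {5}, {6}, {7, 3}]

thin_roads = tree_search((1, 1, 1, 1), hits, levels)
print(thin_roads)  # [(4, 5, 6, 7)]
```

In this toy example the fat road (1, 1, 1, 1) contains two thin patterns, but only one of them is compatible with the hits, so the second is rejected before reaching the track fitter. The loop over `product` grows as 2 to the number of layers per parent, which is acceptable for a sketch but is exactly the kind of work dedicated hardware would do in parallel.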
Tests: the Vertical Slice, starting next June
- Bologna (EDRO) and Pisa (AMBoard)
- When ready: Frascati (pixel clustering), Milano (TSP), Pavia (AMboard and software)
[Figure: Vertical Slice data flow. Labels: Pixels; fibers; pixel clustering; fired channels; PC + pseudo-HOLA; DO; roads; hits; S-link; S-link to PC]
Simulation activity: Pisa-Frascati, with a lot of room for new people
- Efficiency and tracking improvements and studies
- Efforts to reduce the number of matched roads and to optimize the banks
- Algorithm improvement studies
- Comparison of 11L, Option B and Option A: select the baseline design
- Description and study of the evolution of the system for luminosities from 10**33 up to 3×10**34 and above
- Trigger strategy developments
- Integrate FTKSim into ATHENA
- Compare simulation with real data
USA funds
- Funds have been moved from the LHC Upgrade to generic R&D → we will apply for the AMchip
- DOE: must fund the development costs urgently (the work must start in June). Funds are limited, but FTK is high priority together with IBL; they are optimistic
- Production: they will apply this year to NSF for MRI funds, to be funded next year
Conclusions
- Application at future instantaneous luminosities will require an extremely high-performance AM: R&D is ongoing.
- Even with an extremely high-performance AM, its work could be refined by the TSP, which could fit in the same package as the AM chip in 2.5D technology. Milan has expressed interest in TSP R&D.
- The Vertical Slice will bring together the Bologna-Frascati-Milan-Pavia-Pisa efforts.
- A lot of work remains to be done in the simulation area, including data analysis in the FTK areas of interest.
BACKUP
Example: 2-level TSP → divide each SS by 4
- Higher-resolution SSs (sub-SSs) to be stored in the AM or in a mini-DO, and the LSBs should be provided to the TSP
For each found road, the AM chip could provide:
- The road identifier (address)
- The bitmap: one bit per layer, saying which SSs are empty and which are full (11 bits, e.g. 11101111111)
- 4 more bits for each layer (sub-SS), saying which of the 4 SS subdivisions are empty and which are full (4 bits × 8 layers)
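A small sketch of how the per-road bitmaps described above could be built from the fired sub-SSs. The bit ordering (first layer as the leftmost bit) and the function name `road_summary` are assumptions made for illustration; the slide does not specify a layout.

```python
def road_summary(sub_ss_hits, n_layers):
    """Build the layer bitmap and the per-layer sub-SS masks for one road.

    sub_ss_hits: dict mapping layer index -> iterable of fired sub-SS
                 indices (0..3, for a 2-level TSP dividing each SS by 4).
    Returns (layer_bitmap, sub_masks), where sub_masks[layer] is a 4-bit mask.
    Assumed convention: layer 0 is the most significant bit of the bitmap.
    """
    layer_bitmap = 0
    sub_masks = []
    for layer in range(n_layers):
        mask = 0
        for sub in sub_ss_hits.get(layer, ()):
            mask |= 1 << sub
        sub_masks.append(mask)
        if mask:  # the SS is "full" if any of its subdivisions fired
            layer_bitmap |= 1 << (n_layers - 1 - layer)
    return layer_bitmap, sub_masks


# 11 layers, every layer fired in sub-SS 2 except the fourth layer (index 3).
example_hits = {layer: [2] for layer in range(11) if layer != 3}
bitmap, masks = road_summary(example_hits, n_layers=11)
print(format(bitmap, "011b"))  # 11101111111
print([format(m, "04b") for m in masks])
```

With this convention the example reproduces the 11-bit bitmap quoted on the slide, with the empty layer appearing as the single 0.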
A new idea that could have a large impact: variable-precision patterns
- Number of kids (combinations of high-resolution SSs): 256 = 2**8
- If only 1, 2 or 3 of the possible 256 kids are good tracks → it is convenient to insert the high-resolution patterns into the AM
- Parent pattern with 1, 2 or 3 kids: the number of patterns does not increase much, and the probability to fire as a fake road is smaller
- Parent pattern with more kids: the number of patterns does increase if we go deeper, and the probability to fire as a fake is about the same
- The SS will have enough bits to match all patterns at high resolution, but the least significant bits can be set as DON'T CARE and not take part in the matching (the kids are not loaded into the AM if DON'T CARE is set)
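The DON'T CARE mechanism amounts to comparing SS addresses with their low bits masked out. The following sketch shows the comparison in software; the pattern representation (one `(ss_value, dont_care_bits)` pair per layer) and the function names are assumptions for illustration, not the chip's actual encoding.

```python
def ss_matches(hit_ss, pattern_ss, dont_care_bits):
    """Compare two SS addresses, ignoring the lowest `dont_care_bits` bits."""
    care_mask = ~((1 << dont_care_bits) - 1)
    return (hit_ss & care_mask) == (pattern_ss & care_mask)


def pattern_matches(hits_per_layer, pattern):
    """True if every layer of the pattern is matched by at least one hit.

    pattern: one (ss_value, dont_care_bits) pair per layer.
    hits_per_layer: one collection of high-resolution SS addresses per layer.
    """
    return all(
        any(ss_matches(hit, value, dont_care) for hit in hits)
        for hits, (value, dont_care) in zip(hits_per_layer, pattern)
    )


# Layer 0 is stored at low precision (1 DON'T CARE bit), layer 1 at full precision.
pattern = [(0b1010, 1), (0b0111, 0)]
print(pattern_matches([{0b1011}, {0b0111}], pattern))  # True
print(pattern_matches([{0b1011}, {0b0110}], pattern))  # False
```

A layer with one DON'T CARE bit behaves like a parent SS covering both of its kids, so a single stored pattern replaces two high-resolution ones on that layer, while layers kept at full precision retain the better fake rejection.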
[Figure: AM board block diagram. Labels: 6 buses (108 bits!); GLUE; AM; INDI; four 8-chip (top-bottom) pipelines; FPGA; VME interface; road connector; I/O control; FIFOs; TRACKs; ADD OUT [30:0]; receivers & pipeline; LAMB drivers; registers; connectors (ROAD bus + 6 HIT buses); HIT [17:0]]
The CDF final AMchip architecture
[Figure: chip block diagram. Labels: pattern bank; Add encoder; kill; input buses Bus0[17:0] to Bus5[17:0]]
Power consumption
Old chip: 1.8 W

Item                Old chip         New chip                 Corr. factor    Power
Technology, core    180 nm, 1.8 V    90 nm, 1 V               1/(1.8*1.8)     0.56 W
Frequency           40 MHz           100 MHz                  100/40          1.39 W
Area                1x1 cm**2        4 cm**2                  4/1             5.56 W
Pre-match feature   -                new                      1/3 (1/2)       1.85 (2.78) W
Per crate                            16 x 128 = 2048 chips                    3.8 (5.7) kW

If the pre-match feature saves at least 1/3, the new 2D chip (1.85 W) ~ the old chip (1.8 W).
Any other idea to gain in power increases the potential to grow in the third dimension.
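The power estimate above is a product of correction factors applied to the old chip's 1.8 W. A short sketch reproducing the numbers, using only the slide's factors (the names `CORRECTIONS`, `new_chip_power` and `crate_power_kw` are illustrative):

```python
OLD_CHIP_POWER_W = 1.8  # 180 nm chip, 1.8 V core, 40 MHz, 1x1 cm**2

# Correction factors quoted on the slide, applied cumulatively.
CORRECTIONS = [
    ("core voltage 1.8 V -> 1 V", 1 / (1.8 * 1.8)),
    ("frequency 40 MHz -> 100 MHz", 100 / 40),
    ("area 1 cm**2 -> 4 cm**2", 4 / 1),
]

CHIPS_PER_CRATE = 16 * 128  # 2048 chips


def new_chip_power(prematch_factor):
    """Estimated new-chip power in watts for a given pre-match factor."""
    power = OLD_CHIP_POWER_W
    for _label, factor in CORRECTIONS:
        power *= factor
    return power * prematch_factor


def crate_power_kw(prematch_factor):
    """Estimated AM chip power per crate, in kW."""
    return new_chip_power(prematch_factor) * CHIPS_PER_CRATE / 1000


for prematch in (1 / 3, 1 / 2):
    print(
        f"pre-match factor {prematch:.2f}: "
        f"{new_chip_power(prematch):.2f} W/chip, "
        f"{crate_power_kw(prematch):.1f} kW/crate"
    )
```

With a pre-match factor of 1/3 the new chip lands at about 1.85 W, essentially the same as the old chip despite holding many more patterns and running at 2.5 times the clock frequency.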
A schematic drawing of the AM
[Figure: AM schematic. Labels: ONE PATTERN; Layer 1 to Layer 4; Cell 0 to Cell 3; each cell has a stored word and a flip-flop (FF); one HIT bus per layer; output bus]
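The behavior suggested by the schematic (a stored word and a flip-flop per layer, hits broadcast on per-layer buses, matched patterns read out by address) can be mimicked in a few lines. This is a software sketch under that reading of the drawing, not a description of the chip logic. The class name and the `max_missing` option (accepting roads with some empty layers) are assumptions.

```python
class AssociativeMemory:
    """Toy model: each pattern holds one SS word and one flip-flop per layer."""

    def __init__(self, patterns):
        self.patterns = [tuple(p) for p in patterns]
        self.reset()

    def reset(self):
        """Clear all flip-flops (start of a new event)."""
        self.flags = [[False] * len(p) for p in self.patterns]

    def load_hit(self, layer, ss):
        """Broadcast one hit on a layer bus; every matching word sets its FF."""
        for words, ff in zip(self.patterns, self.flags):
            if words[layer] == ss:
                ff[layer] = True

    def roads(self, max_missing=0):
        """Addresses of patterns with at most `max_missing` unset flip-flops."""
        return [
            address
            for address, ff in enumerate(self.flags)
            if ff.count(False) <= max_missing
        ]


am = AssociativeMemory([(1, 2, 3, 4), (1, 2, 3, 5), (9, 9, 9, 9)])
for layer, ss in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    am.load_hit(layer, ss)

print(am.roads())               # [0]
print(am.roads(max_missing=1))  # [0, 1]
```

In software the broadcast is a loop over all patterns; in the chip every pattern compares the incoming hit at the same time, which is what makes the pattern matching finish as soon as the last hit has been loaded.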
The more powerful the AM, the better. Why?
Tracking in 2 steps:
- Find roads first (pattern matching with the Associative Memory, AM)
- Then find tracks inside the roads (fit by the Track Fitter, TF)
[Figure: data flow. Labels: hits; Associative Memory (AM); Data Organizer (DO); Super Strip (SS); roads; roads + hits; full-resolution hits; Track Fitter (TF); track parameters (d, pT, φ, η, z); hot point at high occupancy]
- Track fitting uses the full resolution of the detector
- Large SS: a lot of fakes + combinatorics inside roads
- Road size: a parameter to balance the AM size against the DO-TF workload
The whole system: Data Formatter + 8 core crates
[Figure: system diagram. Labels: Pixels & SCT; RODs; 50-100 kHz event rate; S-links; raw data ROBs; Data Formatter (DF): cluster finding, split by layer, overlap regions; 8 η-φ towers; core crate: HITS, DO, TF, AM board; HW second stage; track data ROB; ~offline-quality track parameters]