1 Olav Torheim University of Bergen / IPHC Conceptual design of a 3D integrated pixel detector for the CBM MVD Is it possible to run the experiment at.

1 Olav Torheim, University of Bergen / IPHC. Conceptual design of a 3D integrated pixel detector for the CBM MVD. Is it possible to run the experiment at the full interaction rate? 10^9 heavy ions passing through a 1% interaction target => 10^7 collisions + delta electrons of 10^9 heavy ions. 10^6 collisions/sec considered sufficient for open charm physics.

2 Simulation results of first MVD station, 5 cm from target Close to beam pipe, hit density dominated by delta-electrons knocked out of target (Figure shows results with 2 us integration time).

3 Arrangement of sensors The sensors are arranged to point away from the beam pipe, with only a small part of the sensor covering the hottest areas close to the beam pipe.

4 Imposed constraints with 10^6 collisions/sec and 2 us integration time: A sensor architecture is needed that on average can process and transfer hits from 3.35 particles/sq.mm/frame, while also handling a hit density roughly twice as large (7.2 particles/sq.mm/frame) in smaller segments of the chip.

5 Requirements on spatial resolution To reconstruct the secondary decay vertex of open charm particles: => 50 micrometers resolution required to distinguish from background. => 5 micrometers resolution required for detectors of 1st MVD. Pitch < 18 micrometers imposed for pure binary resolution. Slightly larger pitch allowed due to charge sharing.

6 Requirements on time resolution The high-level trigger of CBM uses the STS to provide initial tracks. The trajectories of these tracks are refined using the MVD. To separate the tracks, the MVD occupancy should not exceed one percent. Pixel pitch: 18 micrometers. Frame readout time: 2 microseconds. Expected occupancy in the tightest hotspots (a multiplicity of 5 is assumed for standard MAPS; it will be smaller for high-resistivity MAPS): => 2 us frame readout time: 1.16% occupancy in the hotspots.
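The occupancy formula on this slide did not survive in the transcript. A minimal sketch of how the quoted figure can be reproduced, assuming occupancy = hit density x cluster multiplicity x pixel area and taking the hotspot density of 7.2 particles/sq.mm/frame from slide 4 (the function name and the formula itself are reconstructions, not taken from the slides):

```python
def occupancy(hit_density_per_mm2, multiplicity, pitch_um):
    """Fraction of pixels fired per frame (assumed formula, see lead-in)."""
    pixel_area_mm2 = (pitch_um * 1e-3) ** 2  # pitch converted from um to mm
    return hit_density_per_mm2 * multiplicity * pixel_area_mm2

# Hotspot density from slide 4, multiplicity of 5 for standard MAPS, 18 um pitch.
hotspot = occupancy(7.2, 5, 18.0)
print(f"hotspot occupancy: {hotspot:.4%}")
```

Under these assumptions the result lands at roughly 1.17%, in line with the 1.16% quoted on the slide and slightly above the one-percent target.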

7 Requirements on radiation hardness Non-ionizing: At the border of the beam hole of the detector stations at 5 cm distance from the target, there will be up to 2 · 10^15 neq/cm^2 per year. Collision rate reduced to 10^6: Fluence of 2 · 10^14 neq/cm^2. XFAB detectors have excellent charge collection efficiency all the way up to 10^13 neq/cm^2 before starting to slightly deteriorate (C. Hu, Vertex-2009). Should be possible to operate high-res. MAPS up to 10^14 neq/cm^2. [Figure: Landau MP (in electrons) versus cluster size for fluences of 0, 0.3 x 10^13, 1.3 x 10^13 and 3 x 10^13 neq/cm^2. For comparison, "standard" non-depleted EPI substrate: MIMOSA15, pitch = 20 um, before and after 5.8 x 10^12 neq/cm^2.] Saturation => more than 90% of the charge is collected in 3 pixels => very low charge spread for depleted substrate.

8 Requirements on radiation hardness Ionizing: Hottest MVD regions => TID < 340 MRad. Chartered wafers, analog and digital: Can withstand doses up to 400 MRad (Godiot et al, "3D electronics for hybrid pixel detectors", TWEPP-09).

9 Requirements on power consumption The power consumption of the detectors should not exceed 2 W/cm^2. Power saving approach: Rolling shutter. Only one line in a MAPS sensor is activated at a time, thus saving the power that is otherwise consumed in the amplification circuitry. To meet a given time resolution, the number of lines processed by the rolling shutter cannot exceed: time resolution / line processing time.

10 Requirements on power consumption First 3D chip with rolling shutter operation, designed by Yavuz Degerli at IRFU. Shared column discriminators => single discriminator inside every pixel cell. Line processing time: 200 ns => 50 ns. Power consumption: 500 uW/pixel => 100 uW/pixel. It is expected that newer versions of the architecture can be optimized to reach a line processing time of 40 ns. To improve the time resolution further, the rolling shutter chip can be split into segments.

11 Conclusion on requirements Recent technologies such as 3D integration and sensors with fully depleted epi open the prospect of running CBM at a collision rate of 10^6 collisions per second. A conceptual design for a detector based on these technologies, and targeted for CBM at 10^6 collisions per second, is therefore presented: Detector tier: XFAB 0.6, bonded to amplifier tier with Ziptronix DBI. Analog amplifier tier: Tezzaron/Chartered. Digital tiers: Tezzaron/Chartered.

12 3D-integration – two approaches -Folding of electronics: Starting point is a pixel detector with readout electronics in the periphery (MIMOSA26). Readout electronics move to the top of the active surface to remove dead area. -Completely new architecture: Fully utilizes the new degrees of freedom given by 3D integration. -The latter approach has been followed.

13 Starting point – Pixel matrix of 500 rows x 1024 columns -3D integrated rolling shutter chip divides into subframes to meet timing req. -2 us time resolution => Height limited to 2 us / 40 ns = 50 lines. => Pixel sensor of 500 lines divides into 10 submatrices.
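The subdivision above is a short piece of arithmetic. A minimal sketch, with hypothetical function names, that reproduces the 50-line limit and the 10 submatrices and shows how the limit changes with the line processing times quoted on slide 10:

```python
import math

def max_lines_per_submatrix(time_resolution_ns, line_time_ns):
    """Lines a rolling shutter can sweep within one time-resolution window."""
    return time_resolution_ns // line_time_ns

def submatrix_count(total_lines, time_resolution_ns, line_time_ns):
    """Independent rolling-shutter segments needed to cover the full sensor."""
    limit = max_lines_per_submatrix(time_resolution_ns, line_time_ns)
    return math.ceil(total_lines / limit)

# 2 us time resolution, 40 ns line processing time, 500 lines.
lines = max_lines_per_submatrix(2000, 40)
count = submatrix_count(500, 2000, 40)
print(lines, count)
```

With the 200 ns and 50 ns line times mentioned earlier, the same function gives limits of 10 and 40 lines, which is why the 40 ns target matters for keeping the number of submatrices low.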

14 Starting point – rolling shutter with 40 ns line processing time What to place on top of the rolling shutter?

15 Data reduction – Clusterization or state encoding? -Standard MAPS: Assumed cluster multiplicity of five, with 3 states necessary to code a regular cluster. Beneficial compared to state encoding that for the same cluster requires three states. -Fully depleted MAPS: Much smaller clusters assumed. Test results for MIMOSA25 show an avg. multiplicity of four and an avg. number of lines per cluster of 2. NB! Data analysis performed with a low cut of 80 electrons for seed pixels and 40 electrons for surrounding pixels. For online data processing, there is only one threshold. => Even smaller clusters! If the avg. number of lines per cluster becomes 1.5, the amount of data due to state encoding and the amount of data due to clusterization become exactly the same. Conclusion: We want state encoding, not clusterization.

16 Rolling-shutter compliant token injection scheme -Tokens enter in parallel into each row of one half. -Hits are injected sequentially into the other half.

17 Distributed row-wise zero suppression (state encoding) -Each pixel communicates with its nearest neighbours to identify and encode states:

18 Token passing between pixels in every line -Pixels identifying themselves as central pixels in a state may grab a token to transfer hit information: Condition for being central: Hit in Pixel N while no hit in Pixel N-1 (although slightly modified to avoid redundant states).

19 Token passing between pixels in every line -Pixels holding the token put address and data on a shared line bus (multiplexed): The hit information placed on the bus is stored in buffers at the end of the line.
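The row-wise state encoding and token passing of slides 17 to 19 can be pictured with a small software model. This is only a sketch under stated assumptions: the token is modeled as a left-to-right scan, a pixel is "central" when it is hit and its left neighbour is not (the basic condition from slide 18, without the modification that avoids redundant states), and the data word is assumed to be the length of the run of adjacent hits. The function name and the word format are hypothetical.

```python
def encode_line(hits):
    """Return one (address, run_length) state per run of adjacent hit pixels.

    `hits` is a sequence of 0/1 discriminator outputs for one line.
    """
    states = []
    for n, hit in enumerate(hits):
        # Central pixel: hit in pixel N while no hit in pixel N-1.
        is_central = bool(hit) and (n == 0 or not hits[n - 1])
        if is_central:
            length = 1
            # Look at the neighbours to the right to encode the pattern.
            while n + length < len(hits) and hits[n + length]:
                length += 1
            # The pixel grabs the token and puts address and data on the bus.
            states.append((n, length))
    return states

print(encode_line([0, 1, 1, 0, 0, 1, 0, 0]))  # [(1, 2), (5, 1)]
```

Pixels without hits never take the token, so the number of bus transfers scales with the number of states in the line rather than with the line width.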

20 With the above mentioned structures, an abstract view of a pixel cell with its pinning will look like the following: Pixel cells successfully placed and routed on pitches of 18.45 and 14.76 um.

21 How to read out the column buffers? A common state machine is responsible for reading out the upper and lower half of the matrix and for providing the two halves with their appropriate clocks and control signals:

22 Dividing into line segments: -With 2 us frame readout time, there is 1 us for the token to fly through a line. To meet the timing requirements, it is proposed to split every line into 4 line segments, with a separate token injected into every line segment.

23 Distributed state encoding and token passing between pixels in every line At the end of every line segment, buffers store the hit information placed on the data bus. The structure is similar to the end-of-column buffer of ATLAS FE-I3, but where the ATLAS FE-I3 has the end-of-column buffers in the periphery, 3D integration allows the end-of-line buffers to be put on top, avoiding dead area.

24 How many buffers per line segment? For a given average number of hits, the cumulative Poisson distribution can be used to find the maximum number of hits: We want to arrive at the maximum number of hits per line segment that corresponds to a probability less than 10^-3!

25 How many buffers per line segment? -Design entire chip for an average of 3.35 hits/mm^2/frame. -Design each line segment to cover a hotspot of up to 7.2 hits/mm^2/frame. Pitch of 18 um and segment width of 256 pixels give a segment area of: ((18*10^-6)*(18*10^-6)*256) / 10^-6 = 0.083 square millimeters. 7.2 hits/mm^2/frame gives an average of: hit_density*line_segment_area = 7.2*0.083 = 0.60 hits/segment/frame. => 4 hits must be accounted for to keep the probability of missing hits below 10^-3 (using the cumulative Poisson distribution function). Assuming 2 states per hit, 8 buffers are required for each line segment!
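The buffer count can be checked numerically. A minimal sketch, assuming "probability of missing hits below 10^-3" means that the Poisson tail P(X > k) must fall below 10^-3; the function names are hypothetical:

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson-distributed hit count with the given mean."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))

def hits_to_cover(mean, miss_probability=1e-3):
    """Smallest k such that P(X > k) is below the allowed miss probability."""
    k = 0
    while 1.0 - poisson_cdf(k, mean) >= miss_probability:
        k += 1
    return k

segment_area_mm2 = (18e-3) ** 2 * 256   # 18 um pitch, 256 pixels: about 0.083 mm^2
mean_hits = 7.2 * segment_area_mm2      # about 0.60 hits/segment/frame
hits = hits_to_cover(mean_hits)
buffers = 2 * hits                      # 2 states per hit, as assumed on the slide
print(hits, buffers)
```

The same function can be reused for any other mean hit count, for example when dimensioning the readout for a half submatrix.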

26 Basic building blocks of the line buffers For every token clock cycle, the content on line data bus is stored into leftmost buffer, with the content of all the buffers shifted from one buffer to the next. After 8 token clocks, up to 8 valid states have been stored in each their buffer. A hit counter increments the number of valid hits as long as the token injected into the line has not been returned.
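A behavioural sketch of the shift operation may help to picture it. It is a simplification under assumptions not stated on the slide: the token is taken to return right after the last state has been put on the bus, empty bus cycles are represented by None, and the function name is hypothetical.

```python
from collections import deque

def fill_line_buffer(states, depth=8):
    """Shift bus words into a chain of `depth` buffers, one per token clock.

    Returns the buffer contents (leftmost first) and the hit counter value.
    """
    slots = deque([None] * depth, maxlen=depth)
    valid_hits = 0
    for clock in range(depth):
        token_out = clock < len(states)      # token not yet returned
        word = states[clock] if token_out else None
        slots.appendleft(word)               # new word enters leftmost, rest shift right
        if token_out:
            valid_hits += 1                  # counter enabled until the token returns
    return list(slots), valid_hits

contents, count = fill_line_buffer(["s0", "s1", "s2"])
print(contents, count)
```

After the 8 token clocks the valid words sit at the right-hand end of the chain, and the counter tells the readout how many of the buffers hold valid states.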

27 How to read out the column buffers? The column buffers are implemented as scan flip-flops, and their content is shifted from line to line during readout. Pipeline advances when all states in a line are read => No limitation in number of states per line, only max. limit of states per frame.

28 Operating frequency of the readout FSM Depends on the number of states to read and the number of states selected per clock cycle. Assuming a pixel pitch of 18 micrometers and a matrix size of 1024*512, the matrix area in square millimeters becomes: ((18*10^-6)*(18*10^-6)*1024*512) / 10^-6 = 169.87 square millimeters. 3.35 particles/mm^2/frame: (hit density/frame)*matrix_area = 3.35*169.87 = 569 hits per chip per frame. For a half submatrix, this number translates into an average of 28.45 hits, which for a hit miss probability below 10^-3 expands to 47 hits => 94 states. In addition come 25 line addresses, in total 119 words to process.

29 Operating frequency of the readout FSM A relatively high clock frequency may be allowed for the FSM, as this frequency does not affect the number of clock pulses given to the flip flops of the line buffers, which will always remain 25 per submatrix during 2 us: In total 119 words must be transferred during 1 us, so if two words are selected at each clock cycle, then the clock frequency of the FSM can be set to 70 MHz. With four words being selected, the clock frequency can be relaxed to 35 MHz.
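The word count and the resulting clock requirement can be recomputed as follows. This is a sketch with hypothetical names; it takes the 47 hits per half submatrix from slide 28 as given and only derives the minimum frequency, so the 70 MHz and 35 MHz quoted above would correspond to operating points with some margin on top of that minimum (the margin is an interpretation, not stated on the slides).

```python
import math

pitch_mm = 18e-3
matrix_area_mm2 = pitch_mm**2 * 1024 * 512        # about 169.87 mm^2
hits_per_frame = 3.35 * matrix_area_mm2           # about 569 hits per chip per frame
avg_hits_half_submatrix = hits_per_frame / 20     # 10 submatrices x 2 halves

# 47 hits (value from slide 28) => 94 states, plus 25 line addresses.
words = 2 * 47 + 25

def min_clock_mhz(n_words, words_per_cycle, window_us):
    """Lowest clock (MHz) that moves n_words within the window."""
    cycles = math.ceil(n_words / words_per_cycle)
    return cycles / window_us

print(words, min_clock_mhz(words, 2, 1.0), min_clock_mhz(words, 4, 1.0))
```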

30 Floorplanning and layout An overview of the line buffer organization in a submatrix is shown in the figure below. By varying the direction of the token, one has some degree of freedom on where to locate the buffers:

31 Floorplanning and layout With the location of the column buffers put into place, it is possible to locate the other necessary modules:

32 3D stack for the entire architecture With the digital pixel in one tier and the line buffer, readout FSM and memory together in another tier:

33 Power consumption Pixel cells: As long as there are no hits, there are no signals switching in the pixel matrix, except the 8 token clocks sent to the token flip flops during 2 us. Line buffers: 8 token clocks and 25 scan clocks are provided during 2 us. => Should imply low power consumption, even with high logic density.

34 Power consumption Estimations from Encounter, power analysis based on VCD files from simulation: Pixel cells: 80 mW. Line buffer registers: 138 mW. Line buffer counters: 6 mW. In total 224 mW. => Should imply low power consumption, even with high logic density.

35 Number of required serial transmission links Typical delay for Chartered/Artisan standard IO cell: 2 ns => 500 MHz. => 8b10b serial transmission => 100 bytes (50 words) in 2 us. 569 hits per chip on average => 1138 states per chip on average. 1138 states + 500 line addresses => 1638 words to transmit. (Assume infinitely large FIFO => data rate corresponding to avg. hit rate.) 1638/50 = 33 serial links (theoretical minimum).
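The link budget can be written out step by step. A minimal sketch, assuming 16-bit words (inferred from "100 bytes (50 words)") and one bit per clock on a 500 MHz link; the function name is hypothetical:

```python
import math

def words_per_link_per_frame(bit_rate_mhz=500, frame_us=2.0, word_bits=16):
    """Payload words one 8b10b-coded serial link carries in one frame."""
    raw_bits = bit_rate_mhz * frame_us        # bits on the wire per frame
    payload_bits = raw_bits * 8 / 10          # 8b10b: 8 payload bits per 10 line bits
    return int(payload_bits // word_bits)

capacity = words_per_link_per_frame()         # 50 words per link per frame
states = 2 * 569                              # 2 states per hit, 569 hits per chip
words = states + 500                          # plus one address per line
links = math.ceil(words / capacity)
print(capacity, words, links)
```

This is the average-rate bound only; it ignores queueing, which is why it is a theoretical minimum.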

36 Number of required serial transmission links Using a separate FIFO for each submatrix, and modeling the FIFO as an M/D/1 queue, one arrives at a necessary FIFO depth of 62 words to have 4 serial links. Sufficiently small for register file => No memory necessary.

37 Scalability issues 8 buffers per line segment sufficient to meet simulated CBM requirements, BUT… Given the timing constraints, one can increase to: 4 segments of 18 buffers => Tolerated line hit densities of 28.12 hits/mm^2/frame. 8 segments of 32 buffers => Tolerated line hit densities of 133.18 hits/mm^2/frame. Balance buffers and readout FSM optimally to: -Cover expected density in entire chip. -Cover worst hotspots in part of chip. Very high hit densities in small segments of the chip can be covered by increasing the number of buffers and segments, while the readout FSM continues to run at a relaxed frequency.

38 A folded SUZE (MIMOSA26) on top of the rolling shutter 3D rolling shutter has a discriminator in every pixel. Sequential sparsified readout requires a shared discriminator bus for each column. => Preferably in the rolling shutter chip, or in a separate dedicated tier. Only 50 lines per submatrix => Possible return to shared discriminator solution without too much performance loss. Preferably all in one tier, but also possible with one tier for SDS and MUX and another for memory and data transmission (due to sliced circuit shape).

39 A folded SUZE on top of the rolling shutter Line processing time: 200 ns => 40 ns. SUZE system clock: 80 MHz => 400 MHz! Possible, but a design challenge to meet timing requirements.

40 Area estimations Height of a submatrix with 50 lines: 18 um * 50 = 900 micrometers. Preliminary layout results, Chartered process with Artisan standard cells: SDS, MUX and memory management give a total height of 530 micrometers. Dual-port memory of depth 512 has a footprint with a height of 156.4 um and a width of 799.5 um. => Total height: 687 micrometers. => Possible to fit SUZE on top of a submatrix.

41 Dimensioning PLA and multiplexer for CBM Calculations show that it is necessary to account for the… Sparse scan: 3 hits (6 states, dimensioned for hot segment of 7.2 hits/sq.mm/frame). Multiplexer: 6 hits (12 states, dimensioned for hot line of 3.65 hits/sq.mm/frame). SDS structure already dimensioned for 6 states. Multiplexer must increase to account for 12 states.

42 Dimensioning the memory interfaces For the memory, up to 12 states and 1 line address are to be stored during one line. To store 13 words in 2 parallel memories during 40 ns: => Implement memory interface with 7 write operations / line ( >180 MHz), OR implement three memories and continue using 6-cycle CKREADPIX to write up to 6x3 words to memory.

43 Estimations on power consumption Rough estimations from the Encounter Power Analysis tool, assuming: 150 MHz CKREADPIX (6 cycles / 40 ns) Switching rate of 0.1 for SDS (most inputs without hits) Default switching rate of 0.2 for MUX and memory management: SDS: 64 x 0.30 mW = 19.2 mW MUX: 11.97 mW MEM_MAN: 11.14 mW Total power consumption / submatrix: 19.2 mW + 11.97 mW + 11.14 mW = 42.31 mW. Total power consumption / chip: 10*42.31 mW = 423.1 mW = 0.42 W

44 Estimations on power consumption - memory Two dual-port memories of size 512, with 7 write cycles during 40 ns => 175 MHz. For reading, one assumes 100 MHz. Write AC current: 11.153 mA => 1.5 V * 11.153 mA = 16.73 mW. Read AC current: 4.170 mA => 1.5 V * 4.170 mA = 6.255 mW. Total power of two memories in one submatrix: 2 x (16.73 + 6.255) mW = 45.97 mW. Total power of memories in 10 submatrices: 10 x 45.97 mW = 0.46 W. More than half of total power consumption due to the memories!
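The power figures of slides 43 and 44 can be recomputed from the quoted per-block values. A minimal sketch (variable names are hypothetical; all inputs are the numbers given on the two slides):

```python
# Logic per submatrix (slide 43): SDS banks, MUX, memory management, in mW.
logic_mw = 64 * 0.30 + 11.97 + 11.14

# Memory per submatrix (slide 44): two dual-port memories at 1.5 V.
write_mw = 1.5 * 11.153
read_mw = 1.5 * 4.170
memory_mw = 2 * (write_mw + read_mw)

submatrices = 10
chip_logic_mw = submatrices * logic_mw      # about 423 mW
chip_memory_mw = submatrices * memory_mw    # about 460 mW
memory_share = chip_memory_mw / (chip_logic_mw + chip_memory_mw)

print(f"logic {chip_logic_mw:.1f} mW, memory {chip_memory_mw:.1f} mW, "
      f"memory share {memory_share:.0%}")
```

With these inputs the memories account for slightly more than half of the total, which is the point made on the slide.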

45 Scalability issues With the 16 64-bit banks of the SUZE SDS: -Maximum hit density of 9.18 hits/mm^2/frame can be covered, but… -Must be followed by MUX selecting 20 states per line. With the 32 32-bit banks of the SUZE SDS: -Maximum hit density of 78 hits/mm^2/frame can be covered, but… -Requires MUX selecting 90 states per line => Practically unrealizable!

46 Comparison of solutions The parallel token-based approach gives a critical time per line of 1 us. Sequential sparse scan (SUZE) gives only 40 ns / line. => 1st approach requires only 4 line segments (2nd requires at least 16). => 1st approach has no limitation on how many states to read from a line (2nd approach has a fixed number of states per line). => By changing the number of buffers or number of segments, 1st approach can scale to much higher hit densities in small segments of the chip (2nd approach already close to end of scaling). => 1st approach does not require memory for buffering (strictly necessary for 2nd approach).

47 Backup slides Structure of 10-bit line buffer. Data going from the left to the right while clocked by the token clock, and going from top to bottom during scan mode.

48 Backup slides Structure of line buffer hit counter. Hit counter is enabled until the token has returned.
