Pixel Digital Simplification

Pixel Digital Simplification
Wei Wei
2019-01-07

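Outline
  - Pixel digital/encoder simplification
  - Tried 4 schemes
  - Some issues of the address buffer
  - Design proposal
  Note: many designs were done in parallel and the simulations were rather rushed; corrections are welcome at any time!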

To simplify the pixel digital/address design
  - Motivation: reduce the duplicated logic in the pixel digital and the encoder
    - Priority logic exists in both the pixel digital and the encoder; keep only one
    - Is there any possibility to reduce the complexity of the encoder layout?
  - Scheme 1: keep the encoder unchanged, simplify the pixel digital
    - Proposed in the last meeting by Xiaomin
  - Two other schemes were also tried
    - Scheme 2: keep the pixel digital, simplify the encoder
    - Scheme 3: simplify the encoder further, based on scheme 2
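The duplicated priority logic mentioned above can be illustrated with a small behavioural model. This is only a sketch: the function names `priority_encode` and `read_out` are hypothetical, and treating index 0 as the highest priority is an assumption, not something the slides specify.

```python
def priority_encode(hits):
    """Return (valid, address) for the first asserted hit flag.

    Index 0 is treated as the highest priority (an assumption).
    """
    for address, hit in enumerate(hits):
        if hit:
            return True, address
    return False, None


def read_out(hits):
    """Read all hits in priority order, resetting each hit after readout."""
    pending = list(hits)
    order = []
    while True:
        valid, address = priority_encode(pending)
        if not valid:
            return order
        order.append(address)
        pending[address] = 0  # hit reset once the address has been read
```

Since this selection only needs to happen once per readout, keeping it in either the pixel digital or the encoder (but not both) is enough, which is the idea behind schemes 1 and 2.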

Scheme 1: less digital, same encoder
  - Keep the encoder unchanged
  - Delete the fastor, priority, and read token logic in the pixel digital; keep hit gen and hit reset (the proposed edge detection is not included, but can be added)
  - Sim: uses real analog, real encoder (4-bit), ideal periphery. Bus RC loads were included
  - Function works

Scheme 2a: same digital, less encoder
  - Keep the digital unchanged
  - In the encoder, delete the sync reset logic and simplify the priority logic to inverters; these functions are already included in the pixel digital
  - The real encoder is kept unchanged
  - Sim: same as previous
  - Function works
  (Figure labels: "Simplify to inv", "same")

Scheme 2b: to ease the layout complexity
  - Based on scheme 2a
  - In layout, 2a will have some inter-pixel connections due to the grouping of the encoder
    - The lower rows have to deal with many cross-connection buses
  - Try to ease the layout complexity further
    - Separate the encoder into independent bits -> every pixel now only faces its own bits
    - Actually based on tristate buffers
  - Function works; simulation results omitted
  - Disadvantages:
    - Every pixel has to lay out all 9 bits of the 512-row address
    - It looks very much like the pull-up scheme
  (Figure label: Pixel addr=2'b10)
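A minimal behavioural sketch of the bit-sliced tristate idea, to show why 512 rows need 9 address lines and why only the selected pixel may drive them. The names `pixel_drive` and `resolve_bus` are hypothetical, and `None` is used here to stand for a high-Z line.

```python
ROWS = 512
ADDR_BITS = (ROWS - 1).bit_length()  # 9 bits are needed for 512 row addresses


def pixel_drive(row, selected):
    """Per-bit outputs of one pixel: its address bits if selected, else high-Z."""
    if not selected:
        return [None] * ADDR_BITS
    return [(row >> bit) & 1 for bit in range(ADDR_BITS)]


def resolve_bus(drivers):
    """Resolve each shared bit line; None means nobody drives it (high-Z)."""
    bus = []
    for bit in range(ADDR_BITS):
        values = {d[bit] for d in drivers if d[bit] is not None}
        if len(values) > 1:
            raise ValueError("bus contention on bit %d" % bit)
        bus.append(values.pop() if values else None)
    return bus
```

With row 2 selected (address 2'b10 in the two lowest bits), `resolve_bus([pixel_drive(2, True), pixel_drive(5, False)])` gives 0 on bit 0 and 1 on bit 1; with no pixel selected every line stays at `None`, which mirrors the high-Z state of the bus.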

Scheme 2c: further ease the layout complexity
  - Based on scheme 2b
  - Simplify the logic of every bit into one transistor
  - Easier layout
    - Although every pixel still has to place all 9 bits, every bit has only one transistor
    - 13 transistors are shared by 4 pixels (lower group level), but space has to be left for the higher-level transistors
    - There is no inter-pixel connection now, only global addr buses
  - Intrinsic pull-up connection, not a high-Z bus anymore
  - Function works (same sim. condition)
  - Question: is the power consumption as big as mentioned?
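The pull-up bus can be sketched behaviourally as follows. This is an assumption-heavy illustration: the slides do not state the polarity, so the model assumes each address line is pulled up to 1 and a selected pixel's single transistor pulls the line low wherever its address bit is 0. The names `pullup_bus` and `decode` are hypothetical.

```python
ADDR_BITS = 9  # 512 row addresses


def pullup_bus(selected_rows):
    """Wired bus: every line idles at 1 and is pulled low by any selected
    pixel whose address bit is 0 (assumed polarity)."""
    bus = [1] * ADDR_BITS
    for row in selected_rows:
        for bit in range(ADDR_BITS):
            if not (row >> bit) & 1:
                bus[bit] = 0
    return bus


def decode(bus):
    """Convert the list of line levels (LSB first) back to an integer."""
    return sum(level << bit for bit, level in enumerate(bus))
```

In this model the bus always has a defined level (all ones when idle), so there is no high-Z state. It also shows why only one pixel may be selected at a time: two selected pixels would combine as a bitwise AND of their addresses, for example rows 6 and 3 would read back as 2.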

Comparison of the candidate schemes
  - Sch 2b is not considered anymore
  - Power consumption depends on the basic blocks, but also on the bus buffers and the bus RC load
    - Scheme 2c doesn't need buffers, whereas schemes 1 & 2a need buffers
  - Power and delay (TDA) are evaluated under the same conditions
    - Read is given after 8 segs of buffer + RC load
      - Buffer from std cell lib: buffd1
    - Encoder output with 8 segs of buffer + RC load
      - Buffer designed by Tianya: wire_buffer
    - Pull-up scheme output with 8 segs of RC_only
      - Terminated with bufbd1
  (Figure labels: 460ohm, 400f, X8)
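For a rough feeling of what an unbuffered 8-segment RC line costs, an Elmore estimate can be sketched as below. Everything about the numbers is an assumption: the figure labels are read here as 460 ohm and 400 fF per segment, with 8 identical segments, and the driver resistance and the terminating buffer load are ignored. It is not the simulated TDA value.

```python
def elmore_delay(r_seg_ohm, c_seg_farad, n_segments):
    """Elmore time constant of a uniform RC ladder.

    Node k (1-based) sees k series resistors, so the sum is
    R * C * (1 + 2 + ... + N) = R * C * N * (N + 1) / 2.
    """
    return sum(k * r_seg_ohm * c_seg_farad for k in range(1, n_segments + 1))


# Assumed per-segment values taken from the figure labels (460ohm, 400f, X8).
R_SEG = 460.0      # ohm
C_SEG = 400e-15    # farad, assuming "400f" means 400 fF
N_SEG = 8

tau = elmore_delay(R_SEG, C_SEG, N_SEG)
print("Elmore time constant: %.3f ns" % (tau * 1e9))
```

Because the estimate grows with N * (N + 1) / 2, a long unbuffered line quickly becomes slow, which is the usual reason for inserting buffers in schemes 1 and 2a; scheme 2c trades that for an RC-only line.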

Power consumption
  (Figure panels: Sch 1, Sch 2a, Sch 2c)
  - The total current of the encoder was checked; some issues:
    - Sch 1 & 2a: due to the high-Z state at the initial state, the input of the wire buffer suffers from a large current (23uA@tt); after the first readout, the current becomes normal (nA~pA)
  - The average current was used to judge the power consumption:

    Window, corner         Sch 1      Sch 2a     Sch 2c
    200ns~600ns, tt        66.07uA    65.44uA    43.62uA
    200ns~2us, tt          14.67uA    14.54uA     9.69uA
    200ns~600ns, ss, 50    97.00uA    97.48uA    43.61uA
    200ns~8us, ss, 50       4.97uA     5.00uA     2.24uA

  - When the std_cell tristate buffer was used, the trend was the same
  - That means: sch 2c is not really more power consuming, because it doesn't need buffers
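The averages above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers on the slide; the helper name `rescale` is hypothetical, and the interpretation (that almost all of the charge is drawn inside the 200ns~600ns window) is an inference from the numbers rather than something the slide states.

```python
# Average currents in uA, copied from the slide.
short_window = {
    "tt": {"sch1": 66.07, "sch2a": 65.44, "sch2c": 43.62},   # 200ns~600ns
    "ss": {"sch1": 97.00, "sch2a": 97.48, "sch2c": 43.61},   # 200ns~600ns
}
long_window = {
    "tt": {"sch1": 14.67, "sch2a": 14.54, "sch2c": 9.69},    # 200ns~2us
    "ss": {"sch1": 4.97, "sch2a": 5.00, "sch2c": 2.24},      # 200ns~8us
}
long_length_ns = {"tt": 2000 - 200, "ss": 8000 - 200}
SHORT_LENGTH_NS = 600 - 200


def rescale(avg_ua, corner):
    """Average over the long window if no charge were drawn after 600 ns."""
    return avg_ua * SHORT_LENGTH_NS / long_length_ns[corner]


for corner in ("tt", "ss"):
    for scheme, avg in short_window[corner].items():
        predicted = rescale(avg, corner)
        reported = long_window[corner][scheme]
        print(corner, scheme, "predicted %.2f uA, reported %.2f uA" % (predicted, reported))
    ratio = short_window[corner]["sch2c"] / short_window[corner]["sch1"]
    print(corner, "sch2c / sch1 = %.2f" % ratio)
```

The rescaled short-window averages land within about 0.02 uA of the reported long-window values, which is consistent with the current being concentrated around the readout. The sch 2c to sch 1 ratio differs between the two corners, so a single reduction factor should be quoted with care.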

Timing performance: delay@TDA
  - TDA (read from periphery -> addr arrives at periphery) was used to evaluate the timing performance
  - Buffer and RC load were included, but the encoder only used the 4-bit block, followed by the 8 buffer segments

    Corner     Sch 1      Sch 2a     Sch 2c
    tt         16.03ns    16.91ns    17.89ns
    ss, 50     20.91ns    21.97ns    22.33ns

  - This means: sch 1 is the fastest as expected, but all 3 schemes are at the same level
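To put "at the same level" into numbers, the delays can be expressed relative to sch 1. This is a small sketch using only the values on the slide; the function name `slowdown_percent` is hypothetical.

```python
# TDA delays in ns, copied from the slide.
delays_ns = {
    "tt": {"sch1": 16.03, "sch2a": 16.91, "sch2c": 17.89},
    "ss, 50": {"sch1": 20.91, "sch2a": 21.97, "sch2c": 22.33},
}


def slowdown_percent(corner):
    """Extra delay of each scheme relative to sch 1, in percent."""
    reference = delays_ns[corner]["sch1"]
    return {
        scheme: (delay / reference - 1.0) * 100.0
        for scheme, delay in delays_ns[corner].items()
    }


for corner in delays_ns:
    for scheme, percent in slowdown_percent(corner).items():
        print("%-7s %-6s +%.1f%%" % (corner, scheme, percent))
```

In both corners the slowest scheme stays within roughly 12% of sch 1, so timing alone does not separate the candidates strongly.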

Combined comparison

  Schemes:
    Sch 1  (less digital, same enc)
    Sch 2a (same digital, less enc)
    Sch 2c (same digital, + pull up)

  Power:             Sch 1: same | Sch 2a: same | Sch 2c: ~1/2 lower
  Timing:            Sch 1: 1st (16.03ns) | Sch 2a: 2nd (16.91ns) | Sch 2c: 3rd (17.89ns)
  Area:              Med
  Layout complexity: Easy
  Possible risk:     High Z, wire buffer; --; (?)
  Advantage:         Sch 1: keeps most of the design from MOST1 | Sch 2a: proven performance from ATLAS 25ns BX, with ALPIDE encoder | Sch 2c: easier layout; may save the power from buffers

  Note:
  - For the encoder schemes (1 & 2a), not all of the remaining logic and groups were included; they will add some more power and delay
  - The power consumption due to the high Z + wire buffer seems to be a potential issue and needs to be further verified by Tianya

  Conclusion & proposal
  - Sch 1, 2a, & 2c all seem to be effective simplifications of the original design
  - Maybe we should try parallel designs in this tapeout

Preliminary consideration of the schedule
  - From the MOST2 project: 05/2018 ~ 04/2019, the first MPW to be taped out