Optimizing the layout and error properties of quantum circuits


Optimizing the layout and error properties of quantum circuits. Professor John Kubiatowicz, University of California at Berkeley. September 28th, 2012. kubitron@cs.berkeley.edu, http://qarc.cs.berkeley.edu/

Quantum Circuits are Big! Some recent (naïve?) estimates for Ground-State Estimation (Level 3 Steane code): 209 logical qubits × 343 (EC) = 71,687 data qubits. Total operations: 10^11 to 10^17 (depending on type). 10^17 T gates × 117,000 ancillas/T gate ≈ 10^22 ancillas. 5 × 10^26 operations for SWAP (communication). And on… Shor’s Algorithm for factoring? 5 × 10^5 or more data qubits and 1.5 × 10^15 operations (or more). How can you possibly investigate such circuits? This is the realm of Computer Architecture and Computer Aided Design (CAD).
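
The arithmetic behind these estimates can be checked in a few lines. This is only a sanity check of the quoted figures; the variable names are illustrative, and the factor 343 is taken to be 7^3, i.e. three levels of the 7-qubit Steane code.

```python
# Back-of-the-envelope check of the resource estimates quoted on the slide.
STEANE_BLOCK = 7   # the Steane code is [[7,1,3]]: 7 physical qubits per encoded qubit
LEVELS = 3         # "Level 3" concatenation, as stated on the slide

physical_per_logical = STEANE_BLOCK ** LEVELS   # 7^3 = 343
data_qubits = 209 * physical_per_logical        # logical qubits x encoding overhead

t_gates = 10**17
ancillas_per_t_gate = 117_000
t_ancillas = t_gates * ancillas_per_t_gate      # 1.17e22, i.e. roughly 10^22

print(physical_per_logical, data_qubits, f"{t_ancillas:.2e}")
```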

Simple example of Why Architecture Studies are Important (2003). Consider a Kane-style quantum computing datapath: Qubits are embedded P+ impurities in a silicon substrate, and Qubit state is manipulated through the hyperfine interaction with electrodes above the embedded impurities. Obviously, it is important to have an efficient wire. Kane-style technology needs a sequence of SWAPs to communicate quantum state, so our group tried to figure out what is involved in providing a wire. Results: the swapping control circuit involves a complex pulse sequence between every pair of embedded ions. We designed a local circuit that could swap two Qubits (at < 4K). The area taken up by control was > 150x the area taken by bits! Conclusion: must at least have a practical WIRE! It is not clear that this technology meets this basic constraint.

Pushing Limits. Very interesting problems happen at scale! Small circuits become Computer Architecture: modular design, pipelining, communication infrastructure. Direct analogies to classical chip design apply: the physical organization of components matters (“Wires are expensive, adders are not”?). Important focus areas for the future: languages for describing quantum algorithms; optimal partitioning and layout; global communication scheduling; layout-driven error correction.

Expressing Quantum Algorithms

How to express Circuits/Algorithms? Graphically: schematic capture systems (several of these have been built). QASM: the quantum assembly language, with primitives for defining single Qubits and gates. C-like languages: Scaffold (some abstraction, modules, fixed loops). Embedded languages: use languages such as Scala or Ruby to build a Domain Specific Language (DSL) for quantum circuits. Can build up a circuit by overriding basic operators. Can introduce a “Reverse” operator to turn classical circuits into reversible quantum ones.
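
The embedded-language idea can be sketched in a few lines of Python (chosen here only for brevity; the talk itself names Scala and Ruby). The `Circuit` class, its method names, and the text it emits are all hypothetical: the output is a QASM-like listing for illustration, not the syntax of any real QASM dialect.

```python
from dataclasses import dataclass, field


@dataclass
class Circuit:
    """A netlist: an ordered list of (gate_name, qubit_indices) entries."""
    gates: list = field(default_factory=list)

    def add(self, name, *qubits):
        self.gates.append((name, qubits))
        return self  # allow chaining

    def __add__(self, other):
        # Overloaded "+" concatenates two circuits in time order.
        return Circuit(self.gates + other.gates)

    def reverse(self):
        # Reversing the gate order inverts the circuit ONLY because every
        # gate used in this sketch (CNOT, TOFFOLI) is its own inverse.
        return Circuit(list(reversed(self.gates)))

    def to_qasm(self):
        lines = []
        for name, qubits in self.gates:
            operands = ",".join(f"q{q}" for q in qubits)
            lines.append(f"{name.lower()} {operands}")
        return "\n".join(lines)


compute = Circuit().add("TOFFOLI", 0, 1, 2).add("CNOT", 2, 3)
roundtrip = compute + compute.reverse()  # compute, then undo
print(roundtrip.to_qasm())
```

Because the circuit is an ordinary host-language value, loops, functions and operators of the host language can be used to generate large netlists.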

Quantum Circuit Model. The quantum circuit model is a graphical representation: time flows from left to right; single wires are persistent Qubits, double wires are classical bits. Qubit: a coherent combination of 0 and 1, |ψ⟩ = α|0⟩ + β|1⟩. Universal gate set: sufficient to form all unitary transformations. Example: syndrome measurement (for 3-bit code); measurement (meter symbol) produces classical bits. Quantum CAD: circuit expressed as a netlist; computer-manipulated circuits and implementations.
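
As a rough illustration of syndrome measurement for a 3-bit code, the sketch below tracks only bit-flip errors on classical basis states of the 3-bit repetition code. This is a classical stand-in under that assumption: a real circuit measures the two parities through ancilla qubits without reading the data, and the function names here are illustrative.

```python
def syndrome(bits):
    """Two parity checks of the 3-bit repetition code: (b0 xor b1, b1 xor b2)."""
    b0, b1, b2 = bits
    return (b0 ^ b1, b1 ^ b2)


# Which bit to flip for each syndrome; (0, 0) means no error was detected.
CORRECTION = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}


def correct(bits):
    """Undo a single bit-flip error indicated by the syndrome."""
    flip = CORRECTION[syndrome(bits)]
    fixed = list(bits)
    if flip is not None:
        fixed[flip] ^= 1
    return tuple(fixed)


print(syndrome((0, 1, 0)), correct((0, 1, 0)))
```

The syndrome depends only on which bit was flipped, not on whether the codeword was 000 or 111, which is what lets the quantum version correct errors without learning the encoded state.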

Higher-Level Language: Chisel. A Scala-based language for digital circuit design. Takes high-level functional descriptions of circuits as input. Many outputs: for instance, direct production of Verilog. Used in the design of a new advanced RISC pipeline. Features: high-level abstraction; hierarchical design; abstractions build up the circuit (netlist). Example on the slide: an inner-product FIR digital filter.

Quantum Chisel. Simple additions to the Chisel code base. Addition of Classical → Quantum translation: produce ancilla, use Toffoli gates, CNOTs, etc. Reverse logic to automagically reverse netlists and produce reversible output. State machine transformation (using “shift registers” to keep extra state when needed). Because of the way Chisel is constructed, these can sit below the level of syntax (DSL) seen by the programmer, with the possible exception of an explicit REVERSE operator. Goal? Take classical circuits designed in Chisel and produce quantum equivalents: adders, multipliers, floating-point processors. Output: Quantum Assembly (QASM), which is input to other tools!
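
A minimal sketch of the classical-to-reversible translation, assuming the common compute/copy/uncompute pattern: an AND gate becomes a Toffoli onto a fresh ancilla, the result is copied out with a CNOT, and the compute step is reversed to return the ancilla to 0. This is not Chisel-Q's actual implementation; the gate names and helper functions are illustrative, and the simulator only handles classical basis states.

```python
def apply(gates, bits):
    """Run reversible gates on classical basis-state bits (a list of 0/1)."""
    bits = list(bits)
    for name, qubits in gates:
        if name == "CNOT":
            control, target = qubits
            bits[target] ^= bits[control]
        elif name == "TOFFOLI":
            a, b, target = qubits
            bits[target] ^= bits[a] & bits[b]
        else:
            raise ValueError(f"unknown gate {name}")
    return bits


def reversible_and(a, b, ancilla, out):
    """Netlist computing out ^= (a AND b) and leaving the ancilla clean."""
    compute = [("TOFFOLI", (a, b, ancilla))]  # ancilla (initially 0) := a AND b
    copy = [("CNOT", (ancilla, out))]         # copy the result to the output
    uncompute = list(reversed(compute))       # the "reverse" step resets the ancilla
    return compute + copy + uncompute


for a in (0, 1):
    for b in (0, 1):
        print(a, b, apply(reversible_and(0, 1, 2, 3), [a, b, 0, 0]))
```

For a single AND the ancilla is not strictly needed, but the same pattern is what allows larger classical blocks (adders, multipliers) to release their scratch qubits.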

One Sticky Issue: Error Correction

Quantum ECC (Concatenated Codes). [Figure: logical circuit with H, T and X gates on n physical Qubits per logical Qubit. Each gate is followed by a Correct block (syndrome computation from QEC ancilla, then error correction). The T gate is labeled “Not Transversal!” and uses an encoded π/8 (T) ancilla.] Quantum state is fragile ⇒ encode all Qubits. This uses many resources (e.g. 343 physical Qubits per logical Qubit)! Need to handle operations fault-tolerantly: some gates are simply “transversal” (identical operation on each bit); others (like the T gate) are much more complex (non-transversal). Finally, need to perform periodic error correction. Correct after every(?) gate, movement, long idle period. Correction reduces entropy ⇒ consumes ancilla bits.

Topological (Surface) Quantum ECC. [Figure: lattice with a rough boundary and a smooth boundary.] Physical Qubits sit on links in the lattice. Continuous measurement and correction: measuring stabilizers (groups of 4) yields error syndromes. Optimizations center on the decoding algorithm and the frequency of measurement.

Computation with Topological Codes. Each logical Qubit is represented by a pair of holes. Layout for a large algorithm: tile the lattice with paired holes. CNOT: move a smooth hole around a rough one. Complications: may need to transform a smooth hole into a rough one before performing a CNOT; there are rules for how to move holes (grow and shrink them). Again: some gates are easy, some are not (once again, T is messy).

Moving to the Realm of Quantum Computer Aided Design

Need for CAD: More than just Size. Data locality: where qubits “live” and how they move can make or break the ability of a quantum circuit to function. Movement carries risk and consumes time. Ancilla must be created close to where they are used. Communication must be minimized through routing optimization. Customized (optimal?) data movement ⇒ customized channel structure/quantum datapath; a one-size-fits-all topology is not necessarily the best. Parallelism: how to exploit parallelism in the dataflow graph. Partitioning and scheduling algorithms. Area-time tradeoff in ancilla generation. Customized circuits for pre-computing non-transversal ancilla; ancilla reuse? Error correction: one-size-fits-all is probably not desirable. Adapt the level of encoding in a circuit-dependent way. Corrections after every operation may not be necessary.

Quadence Design Tool. [Flow diagram. Inputs: Schematic Capture (Graphical Entry) or Quantum Assembly (QASM). CAD Tool Implementation: QEC Insertion, Partitioning, Layout, Network Insertion, Error Analysis, …, Optimization. Outputs: Custom Layout and Scheduling, Teleportation Network, Classical Control.]

Important Measurement Metrics. Traditional CAD metrics: Area: what is the total area of a circuit? Measured in macroblocks (ultimately μm² or similar). Latency (Latencysingle): what is the total latency to compute the circuit once? Measured in seconds (or μs). Probability of Success (Psuccess): not a common metric for classical circuits; accounts for the occurrence of errors and error correction. Quantum circuit metric: ADCR, Area-Delay to Correct Result, a probabilistic area-delay metric. ADCR = Area × E(Latency) = Area × Latencysingle / Psuccess. ADCRoptimal: best ADCR over all configurations. Optimization potential: equipotential designs. Trade area for lower latency. Trade lower probability of success for lower latency.
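
The metric can be written down directly. The sketch assumes a circuit is re-run until it succeeds, so the expected latency is Latencysingle / Psuccess; the three configurations and their numbers are made up purely to show how ADCRoptimal would be selected.

```python
def adcr(area, latency_single, p_success):
    """Area-Delay to Correct Result: area times expected latency.

    Assumes independent retries until success, so that
    E(Latency) = latency_single / p_success.
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return area * latency_single / p_success


def adcr_optimal(configs):
    """Return the configuration with the lowest ADCR."""
    return min(configs, key=lambda c: adcr(c["area"], c["latency"], c["p_success"]))


# Hypothetical design points (area in macroblocks, latency in seconds).
configs = [
    {"name": "small", "area": 100, "latency": 4.0, "p_success": 0.8},
    {"name": "wide", "area": 300, "latency": 1.0, "p_success": 0.9},
    {"name": "fast-risky", "area": 300, "latency": 0.8, "p_success": 0.5},
]

best = adcr_optimal(configs)
print(best["name"], adcr(best["area"], best["latency"], best["p_success"]))
```

In this toy data the lowest-latency design is not the best one, because its low probability of success inflates the expected latency.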

Quantum CAD flow. [Flow diagram. Circuit Synthesis: Input Circuit, QEC Insert, QEC Optimization, producing a Fault-Tolerant Circuit (no layout). Mapping, Scheduling, Classical control: Circuit Partitioning, Communication Estimation, Teleportation Network Insertion, taking the Partitioned Circuit to a Complete Layout and Functional System. Error Analysis: Hybrid Fault Analysis, Psuccess and ADCR computation, Output Layout. Feedback paths: ReSynthesis (ADCRoptimal) and ReMapping, driven by the Most Vulnerable Circuits.]

Optimizing Ancilla and Layout

An Abstraction of Ion Traps. Basic block abstraction: simplifies layout. Basic blocks: in/out ports, gate locations, straight, turn, 3-way, 4-way. Evaluation of layout through simulation: movement of ions can be simulated classically, which yields computation time and probability of success. Simple error model: depolarizing errors. Errors for every gate operation and unit of waiting. Ballistic movement error, two error models: (1) every hop/turn has a probability of error; (2) only accelerations cause error.
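
A minimal Monte Carlo sketch of this kind of error model, assuming the first movement model (every hop has a fixed error probability) and, pessimistically, that any single error makes the run fail (no error correction is modeled). The operation counts and error rates are hypothetical placeholders, not values from the talk.

```python
import random


def analytic_p_success(counts, rates):
    """Probability that no operation fails, assuming independent errors."""
    p = 1.0
    for kind, n in counts.items():
        p *= (1.0 - rates[kind]) ** n
    return p


def monte_carlo_p_success(counts, rates, trials=10_000, seed=0):
    """Estimate the same probability by sampling each operation's error."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        failed = any(
            rng.random() < rates[kind]
            for kind, n in counts.items()
            for _i in range(n)
        )
        if not failed:
            successes += 1
    return successes / trials


# Hypothetical circuit: 20 gates, 50 units of waiting, 30 hops.
COUNTS = {"gate": 20, "wait": 50, "hop": 30}
RATES = {"gate": 1e-2, "wait": 1e-3, "hop": 5e-3}

print(analytic_p_success(COUNTS, RATES), monte_carlo_p_success(COUNTS, RATES))
```

For this simple model the closed form is available, so sampling is only a cross-check; sampling becomes necessary once errors propagate through gates and error correction is included.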

Example Place and Route Heuristic: Collapsed Dataflow. Gate locations are placed in dataflow order, so qubits flow left to right. The initial dataflow geometry is folded and sorted, and channels are routed to reflect dataflow edges. If there are too many gate locations, collapse the dataflow: using scheduler feedback, identify latency-critical edges, merge critical node pairs, and reroute channels. Dataflow mapping allows pipelining of computation! Notes: the greedy heuristic places gates with no global knowledge of the circuit structure; this second heuristic tries to exploit spatial locality in the given circuit by laying out gates in roughly dataflow order. Initially the number of gate locations is the same as the number of circuit gates, so this does not work for circuits with millions of gates, but it is a start. Columns are sorted to maximize left-to-right motion without turns. As mentioned, qubits flow left to right naturally in this design, so this heuristic is especially well suited to circuits that operate repeatedly on streams of qubits at high throughput, i.e. inherently pipelined circuits. [Figure: qubits q0 to q3 shown at three stages of the layout.]

Quantum Logic Array (QLA). [Figure: a node with in-place storage for 2 logical Qubits (n physical Qubits each), a 1- or 2-Qubit logical gate, Correct stages fed by a syndrome ancilla factory, and a teleporter (TP) fed by EPR pairs; tile regions are labeled Anc, Comp, TP and EPR.] Basic unit: two-Qubit cell (logical) providing storage, compute, and correction. Connect units with teleporters, probably in a mesh topology, but details were never entirely clear from the original papers. First serious (large-scale) organization (2005): Tzvetan S. Metodi, Darshan Thaker, Andrew W. Cross, Frederic T. Chong, and Isaac L. Chuang.

Running Circuit at “Speed of Data”. Often, ancilla qubits are independent of data, so their preparation may be pulled offline. Very clear area/delay tradeoff: suggests automatic tradeoffs (CAD tool). Ancilla qubits should be ready “just in time” to avoid ancilla decoherence from idleness. [Figure: a two-qubit circuit (Q0, Q1) with H, CX and T gates, each followed by QEC that consumes QEC ancilla, with T gates also consuming T-ancilla; serial circuit latency is compared with parallel circuit latency when hardware is devoted to parallel ancilla generation.]

How much Ancilla Bandwidth Needed? 32-bit Quantum Carry-Lookahead Adder. Ancilla use is very uneven (e.g. zero and T ancilla). Performance is flat at the high end of ancilla generation bandwidth: can back off 10% and save orders of magnitude in area. Many bits are idle at any one time: need only enough ancilla to maintain state for these bits, and may not need to frequently correct idle errors. Conclusion: it makes sense to compute ancilla requirements and share the area devoted to ancilla generation. Can precompute ancilla for non-transversal gates!
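
Why performance flattens at high ancilla bandwidth can be seen in a toy model. The sketch assumes a fixed schedule in which step i needs `demand[i]` ancillas, a factory that produces `bandwidth` ancillas per time step, and unlimited buffering of finished ancillas (which ignores the decoherence of idle ancillas). The demand profile is invented for illustration.

```python
import math


def latency(demand, bandwidth):
    """Time steps needed to run the schedule at a given ancilla bandwidth."""
    t = 0
    needed = 0  # cumulative ancillas required so far
    for d in demand:
        needed += d
        # A step finishes no earlier than one step after the previous one,
        # and not before enough ancillas have been produced in total.
        t = max(t + 1, math.ceil(needed / bandwidth))
    return t


# Hypothetical, very uneven ancilla demand per schedule step.
demand = [0, 8, 0, 0, 2, 0, 6, 0]

for bandwidth in (1, 2, 4, 8):
    print(bandwidth, latency(demand, bandwidth))
```

In this example the latency reaches its minimum (the schedule length) at a bandwidth of 4, so doubling the factory area to a bandwidth of 8 buys nothing.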

Tiled Quantum Datapaths. [Figure: three tile types. Previous: QLA, LQLA (Anc, Comp, TP, EPR). Previous: CQLA, CQLA+ (Anc, Mem, Comp, TP, EPR). Our group: Qalypso (Anc, Comp, Mem, TP, EPR).] Several different datapaths are mappable by our CAD flow. Variations include hand-tuned ancilla generators/factories. Memory: storage for state that doesn’t move much; less/different requirements for ancilla; the original CQLA paper used a different QEC encoding. Automatic mapping must: partition the circuit among compute and memory regions; allocate ancilla resources to match demand (at the knee of the curve); configure and insert the teleportation network.

Which Datapath is Best? Random circuit generation, parameterized by a splitting factor (r) that measures the connectivity of the circuit (related to Rent’s factor). Qalypso is the clear winner: 4x lower latency than LQLA and 2x smaller area than CQLA+. Why Qalypso does well: shared, matched ancilla factories; automatic network sizing (rather than fixed teleportation); automatic identification of idle Qubits (memory). LQLA and CQLA+ come a close second: these are the originals supplemented with better ancilla generators, automatic network sizing, and idle Qubit identification. The original QLA and CQLA do very poorly for large circuits.

Optimizing Error Correction

Reducing QEC Overhead. [Figure: three H gates in sequence between two Correct blocks.] Standard idea: correct after every gate, every long communication, and every long idle time. This is the easiest for people to analyze, but it is suboptimal: not every bit has the same noise level! Different idea: identify critical Qubits. Try to identify paths that feed into the noisiest output bits, and place correction along these paths to reduce maximum noise.

1024-bit QRCA and QCLA adders. [Flow diagram: Input Circuit, QEC Optimization (iterating over EDistMAX), Partitioning and Layout, Fault Analysis, Optimized Layout.] Modified version of the retiming algorithm, called “recorrection”: find a minimal placement of correction operations that meets the specified MAX(EDist) ≤ EDistMAX. Probability of success is not always reduced for EDistMAX > 1, but operation count and area are drastically reduced. Uses actual layouts and fault analysis: optimization happens pre-layout and is evaluated post-layout.
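
The flavor of the placement problem can be shown on the simplest possible case. The talk does not define EDist, so the sketch assumes it is the number of gates a qubit has seen since its last correction, and it handles only a single qubit line rather than a general circuit; the retiming-based algorithm itself is not reproduced here. On a line, placing each correction as late as the bound allows gives a minimal placement.

```python
def place_corrections(num_gates, edist_max):
    """Return the 0-based gate indices after which a correction is inserted.

    Assumes EDist = number of gates since the last correction on one
    qubit line, and requires EDist <= edist_max everywhere.
    """
    if edist_max < 1:
        raise ValueError("edist_max must be at least 1")
    placements = []
    since_last = 0
    for i in range(num_gates):
        since_last += 1
        # Correct only when the bound is reached and more gates follow.
        if since_last == edist_max and i != num_gates - 1:
            placements.append(i)
            since_last = 0
    return placements


print(place_corrections(10, 1))  # correction between every pair of gates
print(place_corrections(10, 3))  # far fewer corrections for EDistMAX = 3
```

Raising the bound from 1 to 3 cuts the number of corrections on this 10-gate line from 9 to 3, which mirrors the reduction in operation count described above.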

Recorrection of 500-gate Random Circuit (r=0.5). [Plots: Probability of Success vs. Move Error Rate per Macroblock (EDistMAX=3); Probability of Success vs. Idle Error Rate per CNOT Time (EDistMAX=3).] Not all codes do equally well with recorrection: both [[23,1,7]] and [[7,1,3]] are reasonable candidates; [[25,1,5]] doesn’t seem to do as well. The cost of communication and idle errors is clear here! However, a real optimization would vary EDist to find the optimal point.

Investigating Larger Circuits

What does Quadence do? ECC insertion and optimization: Logical → Physical circuits, including encoding and correction; ECC recorrection optimization (more later). Circuit partitioning: find minimum places to cut a large circuit; compute ancilla needs; place physical qubits in proper regions of the grid. Communication estimation and insertion: generate a custom teleportation network. Schedule movement of bits: movement within ancilla generators (macros); movement within compute and memory regions; movement to and from teleportation stations. Simulation of the result to get timing for the full circuit. Monte Carlo simulation to get error analysis.

Possible 1024-bit adders. Quantum Ripple-Carry adder (QRCA): tradeoffs between area and parallelism, or between speed and circuit reuse; subadder: m-bit QRCA. Quantum Carry-Lookahead adder (QCLA): stronger tradeoff between area and parallelism; arity of carry-lookahead; subadder: m-bit QCLA.

Comparison of 1024-bit adders. [Plots: ADCRoptimal for 1024-bit QRCA and QCLA; ADCRoptimal for 1024-bit QCLA.] Carry-lookahead is better in all architectures. QEC optimization improves ADCR by an order of magnitude in some circuit configurations.

Area Breakdown for Adders. Error correction is not the predominant use of area: only 20-40% of area is devoted to QEC ancilla. For the optimized Qalypso QCLA, 70% of operations are for QEC ancilla generation, but these take only about 20% of area. T-ancilla generation is a major component, and is often overlooked. Networking is a significant portion of area (30%) when allowed to optimize for ADCR; the CQLA and QLA variants didn’t really allow for much flexibility.

Direct Comparison: Concatenated and Topological QECC

Ground State Estimation. [Figure: glycine molecule with C, O, N and H atoms labeled.] Find the ground state of glycine. Problem size: 50 basis functions. Result calculated with 5 bits of accuracy. 60 Qubits, 6.9 × 10^12 gates, parallelism: 2.5. Conceptual primitives: quantum simulation and phase estimation.

Properties of Quantum Technologies: Gate Times and Errors
Technologies: Supercond. Qubits (Primitive); Supercond. Qubits (Optimal); Ion Traps (Primitive); Ion Traps (Optimal); Neutral Atoms (Primitive); Neutral Atoms (Trotter)
Time (ns): 25; 28; 32,000; 14,818; 19,465
Gate Err: 1.0 × 10^-5; 6.6 × 10^-4; 3.2 × 10^-9; 2.9 × 10^-7; 8.1 × 10^-3; 1.5 × 10^-3
Mem Err: 2.5 × 10^-12; 0.0
Average gate time shown for simplicity. Actual gate times considered in all of our calculations. Ion traps are slower but more reliable than superconductors. Neutral atoms are unusable with concatenated codes.

Ground State Estimation, Multiple Technologies

                                  Neutral Atoms    Supercond. Qubits    Ion Traps
                                  (Trotter)        (Primitive)          (Primitive)
Gate error                        1 × 10^-3        1 × 10^-5            1 × 10^-9
Gate time                         19,000 ns        25 ns                32,000 ns
Surface Code     Time             10,883 years     4.5 years            5,588 years
                 Gates            2.0 × 10^24      3.5 × 10^22          3.9 × 10^22
                 Qubits           2.5 × 10^8       1.7 × 10^7           4.4 × 10^7
Bacon Shor Code  Time             -                4,229 years          128 years
                 Gates            -                9.5 × 10^32          1.5 × 10^19
                 Qubits           -                9.4 × 10^11          1.6 × 10^5
                 Concatenations   -                5                    1

Conclusion. How to express quantum algorithms? Embedded DSLs in higher-order languages. Size of quantum circuits ⇒ must optimize locality. Presented some details of a full CAD flow (partitioning, layout, simulation, error analysis). New evaluation metric: ADCR = Area × E(Latency). Full mapping and layout accounts for communication cost. Ancilla optimization is important: ancilla bandwidth varies widely; custom ancilla factories are sized to meet the needs of the circuit. “Recorrection” optimization for QEC: selective placement of error correction blocks; validation with full layout to find the optimal level of correction. Analysis of 1024-bit adder architectures: carry-lookahead adders are better than ripple-carry adders. Error correction is not the primary consumer of area!