Online Learning of Quantum States Scott Aaronson (UT Austin)


Online Learning of Quantum States Scott Aaronson (UT Austin)  Scott Aaronson (UT Austin) Joint work with Xinyi Chen, Elad Hazan, Satyen Kale, and Ashwin Nayak arXiv:1802.09025 / NeurIPS 2018

An n-qubit pure state |ψ⟩ requires 2^n complex numbers to specify, even approximately. Yet measuring |ψ⟩ yields at most n bits (Holevo’s Theorem). So should we say that the 2^n complex numbers are “really there” in a single copy of |ψ⟩—or “just in our heads”? A probability distribution over n-bit strings also involves 2^n real numbers. But probabilities don’t interfere!

Quantum State Tomography
Task: Given lots of copies of an unknown D-dimensional quantum mixed state ρ, produce an approximate classical description of ρ.
O’Donnell-Wright and Haah et al., STOC 2016: ~Θ(D²) copies of ρ are necessary and sufficient for this.
Experimental record (Song et al. 2017): 10 qubits, millions of measurement settings!
Keep in mind: D = 2^n

Quantum Occam’s Razor Theorem (A. 2006)
Let ρ be an unknown D-dimensional state. Suppose you just want to be able to estimate the acceptance probabilities of most measurements E drawn from some probability distribution 𝒟. Then it suffices to do the following, for some m = O(log D):
Choose m measurements independently from 𝒟
Go into your lab and estimate the acceptance probabilities of all of them on ρ
Find any “hypothesis state” σ approximately consistent with all the measurement outcomes
“Quantum states are PAC-learnable”
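Schematically, the guarantee being proved looks as follows. This is a paraphrase in my own notation, with constants and the exact dependence on ε, γ, δ (absorbed into the O(log D) on the slide) left to the paper:

```latex
% Quantum Occam's Razor, schematic form (paraphrase; see A. 2006 for the exact bounds).
% Draw E_1, ..., E_m ~ D i.i.d. with m = O(log D) (suppressing the dependence on
% epsilon, gamma, delta), and let sigma be any "hypothesis state" with
%   | Tr(E_i sigma) - Tr(E_i rho) | <= gamma/2   for all i = 1, ..., m.
% Then, with probability >= 1 - delta over the choice of the E_i,
\[
  \Pr_{E \sim \mathcal{D}} \Bigl[\, \bigl|\operatorname{Tr}(E\sigma) - \operatorname{Tr}(E\rho)\bigr| > \gamma \,\Bigr] \;\le\; \epsilon .
\]
```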

Can prove by combining two facts:
(1) The class of [0,1]-valued functions of the form f(E) = Tr(Eρ), where ρ is a D-dimensional mixed state, has γ-fat-shattering dimension O((log D)/γ²). (That dimension is the largest k for which we can find inputs x_1,…,x_k and values a_1,…,a_k ∈ [0,1] such that all 2^k possible behaviors—f(x_i) exceeding a_i by γ, or falling below a_i by γ—are realized by some f in the class.)
(2) Any class of [0,1]-valued functions is PAC-learnable using a number of samples linear in its fat-shattering dimension [Alon et al., Bartlett-Long]
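For concreteness, here is the standard formal definition of the γ-fat-shattering dimension that the parenthetical above paraphrases (general definition, not wording from the slide):

```latex
% gamma-fat-shattering dimension of a class F of [0,1]-valued functions:
% the largest k for which there exist inputs x_1,...,x_k and thresholds
% a_1,...,a_k in [0,1] such that every "above/below" pattern is realizable:
\[
  \forall S \subseteq \{1,\dots,k\}\ \ \exists f \in \mathcal{F}:\quad
  f(x_i) \ \ge\ a_i + \gamma \ \ (i \in S), \qquad
  f(x_i) \ \le\ a_i - \gamma \ \ (i \notin S).
\]
```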

To upper-bound the fat-shattering dimension of quantum states: use the lower bound for quantum random access codes [Nayak 1999]. Namely: you need Ω(n) qubits to encode n bits x_1,…,x_n into a state ρ_x, so that any x_i of your choice can later be recovered w.h.p. by measuring ρ_x. Then turn this lemon into lemonade!

How do we find the hypothesis state? Here’s one way: let b_1,…,b_m be the outcomes of measurements E_1,…,E_m. Then choose a hypothesis state σ to minimize the squared error Σ_i (Tr(E_iσ) − b_i)². This is a convex programming problem, which can be solved in time polynomial in D = 2^n (good enough in practice for n ≈ 15 or so). An optimized, linear-time iterative method for this problem: [Hazan 2008].
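A minimal numerical sketch of this step (my own illustration, not code from the talk or from [Hazan 2008]): generate random two-outcome measurements, record their acceptance probabilities on a hidden state, and fit a hypothesis state by projected gradient descent over density matrices. All function names are hypothetical, and the step size and iteration count are untuned.

```python
import numpy as np

def random_density_matrix(d, rng):
    # Sample a random mixed state via a Ginibre matrix: rho = G G^dagger / Tr(G G^dagger).
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def random_two_outcome_measurement(d, rng):
    # "Accept" operator E with 0 <= E <= I: here, a projector onto a random subspace.
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    k = rng.integers(1, d)
    V = Q[:, :k]
    return V @ V.conj().T

def project_to_density_matrices(M):
    # Euclidean projection onto {sigma : sigma >= 0, Tr(sigma) = 1}:
    # project the eigenvalues of the Hermitian part onto the probability simplex.
    H = (M + M.conj().T) / 2
    w, U = np.linalg.eigh(H)
    u = np.sort(w)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    k = np.max(ks[u - (css - 1) / ks > 0])
    tau = (css[k - 1] - 1) / k
    w_proj = np.maximum(w - tau, 0)
    return (U * w_proj) @ U.conj().T

def fit_hypothesis_state(Es, bs, d, steps=3000, lr=1e-3):
    # Minimize sum_i (Tr(E_i sigma) - b_i)^2 over density matrices by projected
    # gradient descent (the simplest choice; not the faster method of Hazan 2008).
    sigma = np.eye(d, dtype=complex) / d
    for _ in range(steps):
        grad = np.zeros((d, d), dtype=complex)
        for E, b in zip(Es, bs):
            residual = np.trace(E @ sigma).real - b
            grad += 2 * residual * E
        sigma = project_to_density_matrices(sigma - lr * grad)
    return sigma

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 2 ** 4                                  # D = 2^n with n = 4 qubits (toy size)
    rho = random_density_matrix(d, rng)
    Es = [random_two_outcome_measurement(d, rng) for _ in range(40)]
    bs = [np.trace(E @ rho).real for E in Es]   # idealized (noiseless) estimates
    sigma = fit_hypothesis_state(Es, bs, d)
    print("max error on training measurements:",
          max(abs(np.trace(E @ (sigma - rho)).real) for E in Es))
```

Projected gradient descent is just the most transparent choice here; the talk points to [Hazan 2008] for a faster iterative method tailored to this problem.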

You know, PAC-learning is so 1990s. Who wants to assume a fixed, unchanging distribution 𝒟 over the sample data? These days all the cool kids prove theorems about online learning. What’s online learning?

Online Learning
Two-outcome measurements E_1, E_2, … on a D-dimensional state ρ arrive one by one—chosen by an adaptive adversary. No more fixed distribution over measurements, no independence, no nothin’!
For each E_t, you—the learner—are challenged to guess Tr(E_tρ). If you’re off by more than ε/3, you’re then told the true value, or at least an ε/3-approximation to it.
Goal: Give a learning strategy that upper-bounds the total number of times your guess will ever be more than ε off from the truth (you can’t know in advance which times those will be…)
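To make the setup concrete, here is a minimal sketch (my own illustration, not from the talk) of the interaction protocol: an adversary supplies measurements one at a time, the learner predicts Tr(E_tρ), and feedback is revealed only when the prediction is off by more than ε/3. The Learner interface is hypothetical scaffolding.

```python
import numpy as np

EPS = 0.1  # target accuracy epsilon

class Learner:
    """Abstract online learner: predicts Tr(E_t rho), then sees feedback when wrong."""
    def predict(self, E: np.ndarray) -> float:
        raise NotImplementedError
    def update(self, E: np.ndarray, true_value: float) -> None:
        raise NotImplementedError

def run_online_protocol(learner, adversary_measurements, rho):
    """One run of the online learning game; returns the rounds with 'serious mistakes'."""
    mistakes = []
    for t, E in enumerate(adversary_measurements):
        guess = learner.predict(E)
        truth = np.trace(E @ rho).real
        if abs(guess - truth) > EPS / 3:
            # Learner is told (an approximation to) the true value and may update.
            learner.update(E, truth)
        if abs(guess - truth) > EPS:
            mistakes.append(t)        # the quantity the theorems bound
    return mistakes
```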

Three ways to prove it:
Postselected learning
Sequential fat-shattering dimension
Online convex optimization / Matrix Multiplicative Weights

Theorem 1: There’s an explicit strategy, for online learning of an n-qubit quantum state, that’s wrong by more than ε at most O(n/ε²) times (and this is tight).
Theorem 2: Even if the data the adversary gives you isn’t consistent with any n-qubit quantum state, there’s still an explicit strategy such that your total regret, after T iterations, is at most O(√(Tn)). Tight for L1, possibly not for L2.
Regret = (your total error) − (total error if you’d started with the best hypothesis state from the very beginning). Error can be either L1 or L2.

My Way: Postselected Learning 1 3 2 I/2n In the beginning, the learner knows nothing about , so he guesses it’s the maximally mixed state 0 = I/2n Each time the learner encounters a measurement Et on which his current hypothesis t-1 badly fails, he tries to improve—by letting t be the state obtained by starting from t-1, then performing Et and postselecting on getting the right outcome Amplification + Gentle Measurement Lemma are used to bound the damage caused by these measurements

Let ε = const for simplicity.
Crucial Claim: The iterative learning procedure must converge on a good hypothesis state after at most T = O(n log n) serious mistakes.
Proof: Let p_t = Pr[first t postselections all succeed]. Then p_t can never become too small, since the starting state σ_0 = I/2^n contains the true ρ with weight 2^{-n}. On the other hand, if p_t weren’t less than, say, (2/3)·p_{t-1}, learning would’ve ended! Solving, we find that T = O(n log n).
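A schematic version of the counting step (my own reconstruction; the formula that stood here on the slide is not reproduced, and the amplification overhead hidden in the exponent is what accounts for the log factor):

```latex
% Each serious mistake forces a postselection whose success probability, conditioned
% on all previous ones succeeding, is at most 2/3 -- otherwise learning would have ended.
% Meanwhile sigma_0 = I/2^n "contains" the true state rho with weight 2^{-n}, so
% (after amplification) all the postselections jointly succeed with probability at
% least roughly 2^{-O(n log n)}. Combining:
\[
  \left(\tfrac{2}{3}\right)^{T} \;\ge\; p_T \;\ge\; 2^{-O(n \log n)}
  \qquad\Longrightarrow\qquad
  T \;=\; O(n \log n).
\]
```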

Ashwin’s Way: Sequential Fat-Shattering Dimension
New generalization of Ashwin’s random access codes lower bound from 1999: you can store at most m = O(n) bits x = (x_1,…,x_m) in an n-qubit state ρ_x, in such a way that any one x_i of your choice can later be read out w.h.p. by measuring ρ_x—even if the measurement basis is allowed to depend on x_1,…,x_{i-1}.
Implies an O(n/γ²) upper bound for the “online / sequential” version of the γ-fat-shattering dimension.
Combined with a general result of [Rakhlin et al. 2015], this automatically implies an online learning algorithm + regret bound.

Elad, Xinyi, Satyen’s Way: Online Convex Optimization
Regularized Follow-the-Leader [Hazan 2015]
Gradient descent using the von Neumann entropy as regularizer
Matrix Multiplicative Weights [Arora and Kale 2016]
Technical work: generalize power tools that already existed for real matrices to complex Hermitian ones.
Yields the optimal mistake bound and regret bound.
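A minimal sketch of a Matrix-Multiplicative-Weights-style learner for this problem (my own illustration, plugging into the hypothetical Learner interface from the earlier sketch; the step size and the choice of L2 loss are simplified, not the tuned ones from the paper):

```python
import numpy as np
from scipy.linalg import expm

class MMWLearner:
    """Matrix-Multiplicative-Weights-style online learner over density matrices:
    maintain sigma_t proportional to exp(-eta * sum of loss gradients)."""
    def __init__(self, n_qubits, eta=0.1):
        self.d = 2 ** n_qubits
        self.eta = eta
        self.grad_sum = np.zeros((self.d, self.d), dtype=complex)
        self.sigma = np.eye(self.d, dtype=complex) / self.d   # start maximally mixed

    def predict(self, E):
        return np.trace(E @ self.sigma).real

    def update(self, E, true_value):
        # Gradient of the L2 loss (Tr(E sigma) - true_value)^2 with respect to sigma.
        residual = self.predict(E) - true_value
        self.grad_sum += 2 * residual * E
        # Matrix-exponential update, renormalized to trace 1.
        W = expm(-self.eta * self.grad_sum)
        self.sigma = W / np.trace(W).real
```

Plugged into the run_online_protocol sketch above, this gives a runnable toy version of the learner, at a cost exponential in n since it manipulates 2^n × 2^n matrices explicitly (exactly the computational issue flagged in the Open Problems slide).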

Application: Shadow Tomography
The Task (A. 2016): Let ρ be an unknown D-dimensional mixed state. Let E_1,…,E_M be known 2-outcome POVMs. Estimate Pr[E_i accepts ρ] to within ε for all i ∈ [M]—the “shadows” that ρ casts on E_1,…,E_M—with high probability, by measuring as few copies of ρ as possible.
Clearly k = O(D²) copies suffice (do ordinary tomography). Clearly k = O(M) copies suffice (apply each E_i to separate copies). But what if we wanted to know, e.g., the behavior of an n-qubit state on all accept/reject circuits with n² gates? Could we have k = poly(n), i.e., polylogarithmic in both D and M?

Theorem (A., STOC 2018): It’s possible to do Shadow Tomography using a number of copies that’s only polynomial in log M, log D, and 1/ε.
Proof (in retrospect): Just combine two ingredients!
The “Quantum OR Bound” [A. 2006, corrected by Harrow-Lin-Montanaro 2017], to repeatedly search for a measurement E_i such that Tr(E_iσ_t) is far from Tr(E_iρ), without much damaging the copies of ρ. Here σ_t is our current hypothesis state (initially, σ_0 = maximally mixed).
Our online learning theorem, to upper-bound the number of updates to σ_t until we converge on a hypothesis state σ_T such that Tr(E_iσ_T) ≈ Tr(E_iρ) for every i ∈ [M].
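Purely as a structural illustration (my own sketch): here the quantum “OR Bound” search is replaced by a direct classical scan for a violated measurement, which of course forfeits the sample-efficiency that is the whole point, but it shows how the online learner drives the update loop. It reuses the hypothetical MMWLearner from the sketch above.

```python
import numpy as np

def shadow_tomography_skeleton(learner, Es, rho, eps=0.1, max_updates=1000):
    """Skeleton of the STOC'18 argument: repeatedly find a measurement on which the
    current hypothesis is off by > eps (here by direct classical inspection, standing
    in for the Quantum OR Bound search), feed it to the online learner, and stop once
    the hypothesis agrees with rho on every E_i."""
    for _ in range(max_updates):
        violated = None
        for E in Es:
            truth = np.trace(E @ rho).real
            if abs(learner.predict(E) - truth) > eps:
                violated = (E, truth)
                break
        if violated is None:
            # The hypothesis now predicts Tr(E_i rho) to within eps for all i.
            return [learner.predict(E) for E in Es]
        learner.update(*violated)
    raise RuntimeError("did not converge within max_updates")
```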

A New Shadow Tomography Protocol! [A.-Rothblum 2019, coming any day to an arXiv near you]
Exploits a new connection that Guy and I discovered between gentle measurement of quantum states and differential privacy (an area of classical CS). Based on the “Private Multiplicative Weights” algorithm [Hardt-Rothblum 2010]. Uses a number of copies of ρ that’s again polynomial in log M, log D, and 1/ε, while also being online and gentle (properties we didn’t have before). Explicitly uses the online learning theorem as a key ingredient.

Open Problems
Generalize to k-outcome measurements
Optimal regret for L2 loss
What special cases of online learning of quantum states can be done with only poly(n) computation?
What’s the true sample complexity of shadow tomography?