Can small quantum systems learn? NATHAN WIEBE & CHRISTOPHER GRANADE, DEC 12 2015.

Presentation transcript:

Can small quantum systems learn? NATHAN WIEBE & CHRISTOPHER GRANADE, DEC 12 2015

Quantum information processing

Our question… Can small quantum agents learn?

Why do I care about this? 1) What does learning mean on a physical level? 2) What inference tasks can a quantum computer accelerate? 3) What are the ultimate limitations that physics places on learning?

The power of quantum systems At some level, the power of quantum systems arises from the fact that they can encode an exponentially large vector using linear memory.

Number of qubits    Dimension of vector
32                  4,294,967,296
64                  18,446,744,073,709,551,616

1) To what precision is the quantum state vector specified? 2) How do you read it? 3) How can you manipulate it?
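To make the table concrete, here is a minimal sketch (plain Python) of the classical cost of writing the same vector down; the byte count assumes complex128 amplitudes.

```python
# Classical cost of writing out an n-qubit state vector: 2**n complex amplitudes,
# i.e. 16 * 2**n bytes at complex128 precision.
for n in (8, 16, 32, 64):
    print(n, "qubits ->", 2 ** n, "amplitudes,", 16 * 2 ** n, "bytes")
```

For 32 qubits this is already 64 GiB; for 64 qubits it is far beyond any classical memory, even though the quantum device itself only needs 64 qubits.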

The uncertainty principle You cannot precisely know the position and momentum of a particle simultaneously. Measurement disturbs quantum systems; this is known as “wave function collapse”.

Wave function collapse

Leveraging Interference Interference can be used to reach the target state in 3 operations: 1) Prepare the initial state. 2) Reflect about the space perpendicular to the ideal state. 3) Reflect about the initial state. [Figure label: Ideal State.] This is a special case of Grover's search, the optimal quantum search algorithm.
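A minimal numerical sketch of the three steps, using a 4-element search space in which a single pair of reflections reaches the marked state exactly (the size and target index here are hypothetical, chosen only for illustration):

```python
import numpy as np

N = 4
target = 2                                    # index of the "ideal" state

# 1) Prepare the initial (uniform superposition) state.
initial = np.full(N, 1 / np.sqrt(N))

# 2) Reflect about the space perpendicular to the ideal state
#    (flip the sign of the target amplitude).
e_t = np.eye(N)[target]
oracle = np.eye(N) - 2 * np.outer(e_t, e_t)

# 3) Reflect about the initial state.
diffusion = 2 * np.outer(initial, initial) - np.eye(N)

state = diffusion @ oracle @ initial
print(np.abs(state) ** 2)                     # all probability on the target index
```

For N = 4 a measurement now returns the marked item with probability 1; for larger N the same pair of reflections must be repeated O(sqrt(N)) times, which is Grover's search.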

The no-cloning theorem Quantum operations are powerful, but quantum data is not robust. Can we make copies of quantum data to save it from damaging measurements? No: a map that copied arbitrary unknown states could not be unitary. [Figure contrasts a unitary map with the non-unitary map cloning would require.]
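A standard one-line argument (the usual textbook proof, not taken from the slide itself) shows why such a cloner cannot exist: a unitary preserves inner products, while cloning would square them. Suppose $U|\psi\rangle|0\rangle = |\psi\rangle|\psi\rangle$ for all states. Then

$$\langle\psi|\phi\rangle \;=\; \big(\langle\psi|\langle 0|\big)\,U^\dagger U\,\big(|\phi\rangle|0\rangle\big) \;=\; \big(\langle\psi|\langle\psi|\big)\big(|\phi\rangle|\phi\rangle\big) \;=\; \langle\psi|\phi\rangle^{2},$$

which forces $\langle\psi|\phi\rangle \in \{0,1\}$, so no single unitary can clone two non-orthogonal states.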

Bayesian inference Despite this tension between quantum fragility and flexibility, it may still be possible for small quantum devices to learn efficiently. To build intuition, let us consider a concrete form of learning: Bayesian inference.
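For reference, this is the classical task being discussed: sequential Bayesian updating over a discrete set of hypotheses. A minimal sketch in plain Python; the coin-flip likelihood is an illustrative stand-in, not the model from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
hypotheses = np.linspace(0, 1, 101)      # candidate values of an unknown bias p
prior = np.full(hypotheses.size, 1 / hypotheses.size)

true_p = 0.7
for _ in range(50):
    d = rng.random() < true_p            # observe one datum (a coin flip)
    likelihood = hypotheses if d else 1 - hypotheses
    prior = prior * likelihood           # Bayes' rule: posterior ∝ likelihood × prior
    prior /= prior.sum()                 # renormalize; posterior becomes the next prior

print(hypotheses[np.argmax(prior)])      # posterior mode, close to 0.7
```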

Abstracting the quantum problem We model the quantum learning agent's memory using three registers [listed on the slide]; all three registers have length that is logarithmic in the size of the problem. In this sense the device is a “small quantum system”.

Abstracting the quantum problem

Simple Bayesian inference algorithm (LYC2014)

Problems with this method

                                           Naïve Bayes    Quantum
Scaling with number of model parameters    exponential    polynomial
Scaling with number of updates             polynomial     exponential

This algorithm is optimal Proof: efficient quantum Bayesian inference is impossible in this black-box setting.

Sidestepping the theorem The most obvious way to sidestep this is to change the machine. If the prior is instead stored classically as a bitstring, it can be updated efficiently, but such a machine is no longer small. We therefore consider approximate learning, which need not violate Grover's lower bounds.

Approximate Quantum Inference You cannot exactly clone the posterior, but you can efficiently approximate it. We fight the failure probability by inferring a Gaussian approximation to the posterior. This requires non-negligible classical memory, but it is quadratically faster than the classical analogue.

What resampling looks like in practice

Repetition codes We can also make the system more robust to measurement by using a repetition code to protect the system. [Circuit labels: Ancilla, Likelihood, Variable; k copies.] Chernoff bound: the mean gives exponentially little information about the individual variables. This can be done without collapsing the state, but it requires many copies.

Conclusion We present a formal method for doing Bayesian inference in small quantum systems. We show that updating cannot be made efficient within this framework. Additional quantum or classical memory allows efficient approximate Bayesian inference in small systems. Is there a more general result? Can small quantum systems learn?

Simple quantum “Resampling” [Figure: initial prior → updated prior → sample from posterior.] The mean and the standard deviation can then be learned by sampling from the final posterior distribution. In practice, there are better ways of achieving this.
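A minimal classical sketch of this resampling step, under the reading above: draw samples from the current posterior, keep only their mean and standard deviation, and restart with the corresponding Gaussian prior. The function name and grid representation are hypothetical.

```python
import numpy as np

def gaussian_resample(hypotheses, posterior, n_samples=1000, seed=0):
    """Replace a posterior on a grid of hypotheses by a Gaussian fit to samples from it."""
    rng = np.random.default_rng(seed)
    samples = rng.choice(hypotheses, size=n_samples, p=posterior)
    mu, sigma = samples.mean(), samples.std()
    new_prior = np.exp(-0.5 * ((hypotheses - mu) / sigma) ** 2)
    return new_prior / new_prior.sum(), mu, sigma
```

The returned Gaussian seeds the next round of updates, so only two numbers (mu, sigma) ever need to be cached classically between rounds.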

Improved algorithm for resampling (1D)

Overall query complexity Scaling with D can be reduced by using sampling instead of amplitude estimation (AE), at the price of worse scaling in epsilon. The query complexity is independent of the number of hypotheses. Performing the same task classically (deterministically) requires O(exp(D)) queries.

Empirical results Focusing on the likelihood function [given on the slide], I find that the resampling process, for 16-bit x, is very robust to noise. The following uses 200 updates, with 10 updates per resample step. [Plots: 6%, 12%, and 25% noise in mean/sd.]

Empirical results The success probability (especially for the first several updates) is concentrated around ½. [Plots: 16-bit model and 8-bit model.]

Adaptive Learning Bayesian inference need not be performed on a batch of previously observed experiments; it can also be done on the fly as each datum is received. Processing data in this way allows experiments to be chosen to optimize learning.

Example Given this prior distribution, choosing experiments to distinguish between the peaks may be much better than performing a random experiment. The drawback is that finding the optimal experiment to distinguish them is computationally expensive. [Figure: bimodal prior with the two peaks we would like to distinguish.]

Formalizing the optimization

Optimizing experiments Locally optimal experiments can be found using gradient ascent, assuming the utility function can be evaluated on a mesh. [Plot: utility versus experimental parameter C.]

Quantum experiment optimization 1) Use the quantum computer to compute the utility function for an experiment. 2) Estimate the gradient via [expression given on the slide]. 3) For learning rate r, take a step in the direction of the gradient. 4) Repeat until convergence to a local optimum.
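A minimal sketch of this loop with a toy classical stand-in for the utility (in the talk the utility would be evaluated on the quantum computer; the finite-difference gradient here is only one reasonable way to fill in step 2):

```python
def optimize_experiment(utility, c0, rate=0.1, h=1e-3, steps=200):
    """Gradient ascent on a scalar experimental parameter c."""
    c = c0
    for _ in range(steps):
        grad = (utility(c + h) - utility(c - h)) / (2 * h)  # finite-difference gradient
        c += rate * grad                                     # step uphill with learning rate r
    return c

# Toy utility peaked at c = 2; the loop converges to the locally optimal experiment.
best_c = optimize_experiment(lambda c: -(c - 2.0) ** 2, c0=0.0)
print(best_c)   # ~2.0
```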

How do you compute the utility? The basic expression is [given on the slide]. Expanding the expression for the utility, we have [expansion on the slide]. The utility is found by computing each of these three terms.
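The slide's expressions are not captured in the transcript; for orientation, a common choice of utility in Bayesian experimental design (an assumption here, not necessarily the talk's exact definition) is the expected information gain of experiment c:

$$U(c) \;=\; \sum_{d} P(d \mid c)\,\Big[H\big(P(x)\big) - H\big(P(x \mid d, c)\big)\Big], \qquad P(d \mid c) \;=\; \sum_{x} P(d \mid x, c)\,P(x),$$

where $H$ is the Shannon entropy, $x$ ranges over hypotheses, and $d$ over possible experimental outcomes.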

Example: Computing [one of these terms; worked on the slide].

Summary Can perform Bayesian inference on a quantum computer using a number of queries that is independent of the number of hypotheses. Quantum distributions are ill-suited for Bayesian inference because the posterior distribution cannot be cloned. Quantum “resampling” strategies can be employed to classically cache the posterior distribution and remove the exponential decay. Numerical evidence suggests that the method works well in practice and resampling will often, but not always, suffice.

Approximate Bayesian inference is NP-hard

How does quantum computing work? Information is stored in quantum states of matter. Quantum states are complex unit vectors. [Figure: a qubit, and an example state in which the probability of measuring each value is 1/4.] More generally, n qubits can be in a state of the form [equation on the slide].
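The general form referred to at the end of this slide is not captured in the transcript; it is presumably the standard one:

$$|\psi\rangle \;=\; \sum_{x \in \{0,1\}^n} a_x\,|x\rangle, \qquad a_x \in \mathbb{C}, \qquad \sum_{x} |a_x|^2 = 1,$$

so that measuring in the computational basis returns $x$ with probability $|a_x|^2$.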

How we introduce interference [Gates shown: Controlled-NOT, Controlled-Controlled-NOT, Hadamard gate, T gate, and measurement.]
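Of these, the Hadamard gate is the simplest source of interference. A minimal numerical illustration (plain NumPy, not from the slides): applying H twice returns |0⟩ exactly, because the two paths into |1⟩ cancel while the paths into |0⟩ add.

```python
import numpy as np

H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)   # Hadamard gate
ket0 = np.array([1.0, 0.0])            # |0>

superposition = H @ ket0               # amplitudes (1/sqrt(2), 1/sqrt(2))
back = H @ superposition               # amplitudes (1, 0): interference, not randomness
print(superposition, back)
```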
