Can small quantum systems learn? NATHAN WIEBE & CHRISTOPHER GRANADE, DEC
Quantum information processing
Our question… Can small quantum agents learn?
Why do I care about this? 1) What does learning mean on a physical level? 2) What inference tasks can a quantum computer accelerate? 3) What are the ultimate limitations that physics places on learning?
The power of quantum systems
At some level, the power of quantum systems arises from the fact that they can encode an exponentially large vector using linear memory.
Number of qubits    Dimension of vector
32                  4,294,967,296
64                  18,446,744,073,709,551,616
1) What is the precision of the quantum state vector?
2) How do you read it?
3) How can you manipulate it?
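The growth in the table can be checked with a short sketch. The function name `state_dimension` is only an illustrative choice, not something from the talk.

```python
def state_dimension(n_qubits: int) -> int:
    """Number of complex amplitudes needed to describe n_qubits qubits."""
    return 2 ** n_qubits


# Memory (qubits) grows linearly while the vector dimension doubles each time.
for n in (1, 2, 32, 64):
    print(f"{n:>2} qubits -> dimension {state_dimension(n):,}")
```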
The uncertainty principle You cannot precisely know the position and momentum of a particle simultaneously. Measurement disturbs quantum systems; this disturbance is known as “wave function collapse”.
Wave function collapse
Leveraging Interference
Interference can be used to reach the target (ideal) state in 3 operations.
1) Prepare the initial state.
2) Reflect about the space perpendicular to the ideal state.
3) Reflect about the initial state.
This is a special case of Grover’s search (the optimal quantum search algorithm).
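The three operations can be simulated classically for a small case. The sketch below assumes a search over 4 items with a uniform initial state, a case in which a single round of the two reflections lands exactly on the ideal state. The function name `grover_step` and the choice of target index are illustrative assumptions.

```python
import numpy as np


def grover_step(n_items: int, target: int) -> np.ndarray:
    """Apply the three operations once and return the final amplitudes."""
    # 1) Prepare the initial state: a uniform superposition over all items.
    initial = np.full(n_items, 1.0 / np.sqrt(n_items))

    # The ideal state is the basis vector of the marked item.
    ideal = np.zeros(n_items)
    ideal[target] = 1.0

    # 2) Reflect about the space perpendicular to the ideal state
    #    (this flips the sign of the ideal component).
    reflect_perp_ideal = np.eye(n_items) - 2.0 * np.outer(ideal, ideal)

    # 3) Reflect about the initial state.
    reflect_initial = 2.0 * np.outer(initial, initial) - np.eye(n_items)

    return reflect_initial @ (reflect_perp_ideal @ initial)


final = grover_step(4, target=2)
probabilities = final ** 2  # amplitudes are real here
print(probabilities)
```

For larger search spaces, steps 2 and 3 have to be repeated several times; the 4-item case is used here only because it finishes in one round.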
The no-cloning theorem Quantum operations are powerful, but quantum data is not robust. Can we make copies of quantum data to save it from damaging measurements? No. [Figure labels: Unitary, Non-Unitary]
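The standard textbook argument for why cloning fails can be written in one line. It is given here as background and is not taken from the slide; $U$ denotes a hypothetical unitary cloner, and $|\psi\rangle$, $|\phi\rangle$ are two arbitrary states. Because unitaries preserve inner products, comparing the inner products before and after cloning gives:

```latex
U\,|\psi\rangle|0\rangle = |\psi\rangle|\psi\rangle, \qquad
U\,|\phi\rangle|0\rangle = |\phi\rangle|\phi\rangle
\;\Longrightarrow\;
\langle\psi|\phi\rangle = \langle\psi|\phi\rangle^{2}
\;\Longrightarrow\;
\langle\psi|\phi\rangle \in \{0, 1\}.
```

So a single unitary can copy only states that are identical or orthogonal, never an arbitrary unknown state.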
Bayesian inference Despite this tension between quantum fragility and flexibility, it could still be possible to learn efficiently for small quantum devices. To get an intuition about this let us consider a concrete form of learning: Bayesian inference.
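For reference, a classical Bayesian update over a discrete set of hypotheses can be sketched as follows. This is a minimal illustration, not the quantum algorithm discussed in the talk. The coin-bias hypotheses and the names `bayes_update`, `hypotheses`, and `likelihood_heads` are assumptions made for the example.

```python
import numpy as np


def bayes_update(prior: np.ndarray, likelihood: np.ndarray):
    """Return the posterior P(x|E) and the evidence P(E) for one datum E."""
    unnormalized = prior * likelihood      # P(E|x) P(x)
    evidence = unnormalized.sum()          # P(E) = sum_x P(E|x) P(x)
    return unnormalized / evidence, evidence


# Hypothetical example: three candidate coin biases, uniform prior.
hypotheses = np.array([0.1, 0.5, 0.9])
prior = np.full(3, 1.0 / 3.0)

# Observing "heads" has likelihood equal to the bias under each hypothesis.
likelihood_heads = hypotheses
posterior, evidence = bayes_update(prior, likelihood_heads)
print(posterior, evidence)
```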
Abstracting the quantum problem We model the quantum learning agent’s memory using three registers. All three registers have length that is logarithmic in the size of the problem; in this sense the device is a “small quantum system”.
Abstracting the quantum problem
Simple Bayesian inference algorithm (LYC2014)
Problems with this method
                                           Naïve Bayes    Quantum
Scaling with number of model parameters    exponential    polynomial
Scaling with number of updates             polynomial     exponential
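One way to read the exponential scaling with the number of updates is sketched below. It assumes (this is not stated on the slide) that each quantum update is a probabilistic step that succeeds with a fixed probability, and that any failure forces a restart from the initial prior. The function names and the value 0.5 are illustrative.

```python
def joint_success_probability(p_single: float, n_updates: int) -> float:
    """Probability that n_updates independent update steps all succeed."""
    return p_single ** n_updates


def expected_attempts(p_single: float, n_updates: int) -> float:
    """Mean number of full restarts until one run succeeds end to end."""
    return 1.0 / joint_success_probability(p_single, n_updates)


# Assumed per-update success probability of 1/2.
for k in (1, 10, 20):
    print(k, joint_success_probability(0.5, k), expected_attempts(0.5, k))
```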
This algorithm is optimal
Proof
Efficient quantum Bayesian inference is impossible in this black-box setting.
Sidestepping the theorem The most obvious way to sidestep this result is to change the machine. If the prior is stored as a bitstring, it can be efficiently updated, but the machine is then no longer small. We therefore consider approximate learning, which need not violate Grover’s bounds.
Approximate Quantum Inference You cannot exactly clone the posterior, but you can efficiently approximate it. We fight failure probability by inferring a Gaussian approximation to the posterior. This requires non-negligible classical memory. Quadratically faster than classical analogue.
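The idea of replacing the posterior by a Gaussian approximation can be illustrated classically. The sketch below is a stand-in under stated assumptions, not the quantum procedure: it assumes a one-dimensional model discretized on a grid, and the name `gaussian_resample` and the example posterior are hypothetical. Only the mean and standard deviation are kept as the classical record.

```python
import numpy as np


def gaussian_resample(grid: np.ndarray, posterior: np.ndarray):
    """Summarize a discrete posterior by (mean, std) and rebuild a Gaussian prior."""
    mean = np.sum(grid * posterior)
    std = np.sqrt(np.sum(posterior * (grid - mean) ** 2))
    # New prior: a discretized Gaussian with the same first two moments.
    new_prior = np.exp(-0.5 * ((grid - mean) / std) ** 2)
    return new_prior / new_prior.sum(), mean, std


# Hypothetical posterior peaked near 0.3 with width 0.05.
grid = np.linspace(0.0, 1.0, 101)
posterior = np.exp(-0.5 * ((grid - 0.3) / 0.05) ** 2)
posterior /= posterior.sum()

new_prior, mean, std = gaussian_resample(grid, posterior)
print(mean, std)
```

Storing two numbers per model dimension is what makes the classical memory cost non-negligible but modest.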
What resampling looks like in practice
Repetition codes We can also make the system more robust to measurement by protecting it with a repetition code. [Figure labels: Ancilla, Likelihood, Variable; K copies] Chernoff bound: the mean gives exponentially little information about the variables. This can be done without collapsing the state, but it requires many copies.
Conclusion We present a formal method for doing Bayesian inference in small quantum systems. We show that updating cannot be made efficient within this framework. Additional quantum or classical memory allows efficient approximate Bayesian inference in small systems. Is there a more general result? Can small quantum systems learn?
What resampling looks like in practice
Simple quantum “Resampling” [Figure stages: Initial Prior, Updated Prior, Sample from Posterior] The mean and the standard deviation can then be learned by sampling from the final posterior distribution. In practice, there are better ways of achieving this.
Improved algorithm for resampling (1D)
Overall query complexity Scaling with D can be reduced by using sampling instead of amplitude estimation (AE), at the price of worse scaling with epsilon. The query complexity is independent of the number of hypotheses. Performing the same task classically (deterministically) requires O(exp(D)) queries.
Empirical results Focusing on the likelihood function I find that the resampling process, for 16-bit x, is very robust to noise. The following uses 200 updates, with 10 updates per resample step. [Figure panels: 12% noise in mean/sd; 6% noise in mean/sd; 25% noise in mean/sd]
Empirical results The success probability (especially for the first several updates) is concentrated around ½. [Figure panels: 16-bit model; 8-bit model]
Adaptive Learning Bayesian inference does not necessarily need to be performed on a sequence of previously observed experiments. It can also be done on the fly as each datum is received. Processing data in this way allows experiments to be chosen to optimize learning.
Example Given this prior distribution, choosing experiments to distinguish between the peaks may be much better than performing a random experiment. The drawback is that finding the optimal experiment to distinguish them is computationally expensive. [Figure annotation: the peaks we would like to distinguish]
Formalizing the optimization
Optimizing experiments Locally optimal experiments can be found using gradient ascent, assuming the utility function can be evaluated on a mesh. [Figure axes: Utility versus Experimental Parameter C]
Quantum experiment optimization
1. Use the quantum computer to compute the utility function for an experiment.
2. Estimate the gradient of the utility.
3. For learning rate r, take a step in the direction of the gradient.
4. Repeat until convergence to a local maximum.
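The loop above can be sketched classically. The gradient estimator used here, a central finite difference, is an assumption, since the slide's own formula is not available. `toy_utility` is a hypothetical stand-in for the utility that the quantum computer would evaluate, and the sketch covers a single experimental parameter only.

```python
def finite_difference_gradient(utility, c: float, h: float = 1e-4) -> float:
    """Central-difference estimate of dU/dc (assumed estimator)."""
    return (utility(c + h) - utility(c - h)) / (2.0 * h)


def gradient_ascent(utility, c0: float, rate: float = 0.1,
                    tol: float = 1e-8, max_steps: int = 10_000) -> float:
    """Step uphill in the utility until the step size falls below tol."""
    c = c0
    for _ in range(max_steps):
        step = rate * finite_difference_gradient(utility, c)
        c += step
        if abs(step) < tol:
            break
    return c


def toy_utility(c: float) -> float:
    """Hypothetical utility with a single maximum at c = 2."""
    return -(c - 2.0) ** 2


c_star = gradient_ascent(toy_utility, c0=0.0)
print(c_star)
```

Because only local information is used, the result depends on the starting point when the utility has several peaks.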
How do you compute the utility? The basic expression is: Expanding the expression for the utility, we have Utility is found by computing each of these three terms.
Example: Computing
Summary Can perform Bayesian inference on a quantum computer using a number of queries that is independent of the number of hypotheses. Quantum distributions are ill-suited for Bayesian inference because the posterior distribution cannot be cloned. Quantum “resampling” strategies can be employed to classically cache the posterior distribution and remove the exponential decay. Numerical evidence suggests that the method works well in practice and resampling will often, but not always, suffice.
Approximate Bayesian inference is NP-hard
How does quantum computing work?
Information is stored in quantum states of matter.
Quantum states are complex unit vectors.
Example: a uniform two-qubit state, where the probability of measuring each value is 1/4.
More generally, n qubits can be in a superposition over all 2^n bit strings.
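As a small numerical check of these statements, the sketch below treats a state as a complex unit vector and computes measurement probabilities as squared magnitudes of the amplitudes. The helper name `measurement_probabilities` is illustrative.

```python
import numpy as np


def measurement_probabilities(state) -> np.ndarray:
    """Probability of each measurement outcome for a given state vector."""
    state = np.asarray(state, dtype=complex)
    if not np.isclose(np.linalg.norm(state), 1.0):
        raise ValueError("quantum states must be unit vectors")
    return np.abs(state) ** 2


# Uniform superposition over the four values of two qubits.
two_qubit_uniform = np.full(4, 0.5)
probs = measurement_probabilities(two_qubit_uniform)
print(probs)
```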
How we introduce interference [Gates shown: Controlled-NOT, Controlled-Controlled-NOT, Hadamard gate, Measurement, T gate]
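A minimal illustration of interference with one of these gates: applying the Hadamard gate twice to the state 0 first spreads the amplitude over both outcomes and then makes the two paths to outcome 1 cancel. The variable names are illustrative.

```python
import numpy as np

# Hadamard gate as a 2x2 matrix.
H = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)

zero = np.array([1.0, 0.0])

superposition = H @ zero    # equal amplitudes on outcomes 0 and 1
back = H @ superposition    # the amplitudes for outcome 1 cancel

print(superposition ** 2)   # measurement probabilities after one Hadamard
print(back ** 2)            # measurement probabilities after two Hadamards
```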