Quantum Boltzmann Machine. Mohammad Amin, D-Wave Systems Inc.
Not the only use of quantum annealing (QA). Maybe not the best use of QA.
Adiabatic Quantum Computation: H(t) = (1-s)H_D + s H_P, with s = t/t_f. [Figure: energy levels vs. s, evolving from the initial state at s = 0 to the solution at s = 1; minimum gap g_min; required annealing time t_f ~ (1/g_min)^2.]
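A minimal numerical sketch (not from the talk) of this interpolation for a toy 2-qubit problem, showing how the instantaneous spectrum and the minimum gap g_min can be computed; the driver and problem Hamiltonians below are illustrative assumptions.

    import numpy as np
    from functools import reduce

    sx = np.array([[0., 1.], [1., 0.]])
    sz = np.array([[1., 0.], [0., -1.]])

    def op(single, site, n):
        # embed a single-qubit operator at `site` in an n-qubit Hilbert space
        return reduce(np.kron, [single if k == site else np.eye(2) for k in range(n)])

    n = 2
    H_D = -sum(op(sx, a, n) for a in range(n))                  # driver Hamiltonian
    H_P = -op(sz, 0, n) @ op(sz, 1, n) - 0.5 * op(sz, 0, n)     # toy problem Hamiltonian

    gaps = []
    for s in np.linspace(0.0, 1.0, 101):
        evals = np.linalg.eigvalsh((1 - s) * H_D + s * H_P)     # instantaneous spectrum
        gaps.append(evals[1] - evals[0])
    g_min = min(gaps)
    print(f"g_min ~ {g_min:.3f}; t_f ~ 1/g_min^2 ~ {1/g_min**2:.1f} (arbitrary units)")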
Thermal Noise: system-bath interaction leads to dynamical freeze-out. [Figure: energy levels vs. s, with thermal energy scale k_B T; ground-state probability P_0 freezes out before s = 1.]
Open quantum calculations of a 16-qubit random problem. [Figure: classical energies.]
Equilibration Can Cause Correlation: correlation with simulated annealing. Hen et al., PRA 92, 042325 (2015).
Equilibration Can Cause Correlation: correlation with quantum Monte Carlo. Boixo et al., Nature Phys. 10, 218 (2014).
Equilibration Can Cause Correlation: correlation with spin vector Monte Carlo (SVMC). Shin et al., arXiv:1401.7087.
Equilibration Can Mask Quantum Speedup: quantum advantage is expected to be dynamical. Brooke et al., Science 284, 779 (1999).
Equilibration Can Mask Quantum Speedup: if what is measured is the equilibrated probability, the computation time is independent of the dynamics. Ronnow et al., Science 345, 420 (2014); Hen et al., arXiv:1502.01663; King et al., arXiv:1502.02098.
Residual Energy vs Annealing Time: 50 random problems, 100 samples per problem per annealing time. [Figure: bimodal problems (J = -1, +1; h = 0); mean residual energy and lowest residual energy vs. annealing time (ms).]
Residual Energy vs Annealing Time: 50 random problems, 100 samples per problem per annealing time. [Figures: frustrated loops (α = 0.25) and bimodal problems (J = -1, +1; h = 0); residual energy vs. annealing time (ms).]
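A minimal sketch (an assumed setup, not the talk's actual pipeline) of how the mean and lowest residual energies could be computed from a batch of annealer samples, taking the residual energy as the sampled energy minus the known ground-state energy.

    import numpy as np

    def residual_energies(sample_energies, ground_energy):
        # residual energy of each sample relative to the known ground-state energy
        res = np.asarray(sample_energies, dtype=float) - ground_energy
        return res.mean(), res.min()

    # hypothetical usage: 100 sampled energies for one problem instance
    rng = np.random.default_rng(0)
    sample_energies = -42.0 + rng.exponential(scale=2.0, size=100)  # placeholder data
    mean_res, lowest_res = residual_energies(sample_energies, ground_energy=-42.0)
    print(mean_res, lowest_res)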
Quantum Boltzmann Distribution? Boltzmann sampling is #P-hard (harder than NP). What can we do with a quantum Boltzmann distribution?
arXiv:1601.02036, with Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko.
Machine Learning in our Daily Life
Introduction to Machine Learning: Data → Model → Unseen data.
Probabilistic Models: the model is a probability distribution P_θ(z) over the data variables z, with parameters θ. Training: tune θ such that P_θ approximates the data distribution.
Boltzmann Machine: the model is a Boltzmann distribution (β = 1) over the variables z, with parameters θ: P_θ(z) = e^{-E_z} / Z.
Boltzmann Machine: Ising model energy E_z = -Σ_a b_a z_a - Σ_{a,b} w_ab z_a z_b, with spins z_a = ±1 and parameters θ = (b_a, w_ab).
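A minimal sketch of this model, assuming a small fully visible Ising energy with hypothetical random biases b and couplings w, and building the Boltzmann distribution (β = 1) by exact enumeration.

    import numpy as np
    from itertools import product

    def ising_energy(z, b, w):
        # E_z = -sum_a b_a z_a - sum_{a<b} w_ab z_a z_b, spins z_a = +/-1
        z = np.asarray(z, dtype=float)
        return -b @ z - z @ np.triu(w, 1) @ z

    def boltzmann_distribution(b, w):
        # exact P(z) = exp(-E_z)/Z over all 2^N configurations (beta = 1)
        n = len(b)
        states = np.array(list(product([-1, 1], repeat=n)), dtype=float)
        energies = np.array([ising_energy(z, b, w) for z in states])
        p = np.exp(-(energies - energies.min()))   # shift for numerical stability
        return states, p / p.sum()

    # hypothetical 4-spin model with random parameters
    rng = np.random.default_rng(1)
    b = rng.normal(0.0, 0.5, 4)
    w = rng.normal(0.0, 0.5, (4, 4))
    states, probs = boltzmann_distribution(b, w)
    print(probs.sum())   # 1.0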
Fully Visible BM: a generic distribution over N variables z_a needs O(2^N) parameters to be fully described, while H_z has only O(N^2) parameters.
Adding Hidden Variables: z_a = (z_ν, z_i), where z_ν are visible and z_i are hidden.
Training a BM: tune θ such that P_θ(v) approximates the data distribution. Maximize the log-likelihood (or, equivalently, minimize the negative log-likelihood) with a gradient descent technique, δθ = η ∂_θ L, where η is the training rate. We need an efficient way to calculate the gradients ∂_θ L.
Calculating the Gradient: the gradient is the difference between an average with the visibles clamped to the data and an unclamped average.
Training Ising Hamiltonian Parameters: δb_a = η(⟨z_a⟩_clamped − ⟨z_a⟩), δw_ab = η(⟨z_a z_b⟩_clamped − ⟨z_a z_b⟩), i.e., a clamped average minus an unclamped average. The gradients can be estimated using sampling!
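A minimal sketch of these update rules for a small fully visible BM (hypothetical parameters; the clamped averages reduce to data averages, and the unclamped averages are computed by exact enumeration in place of sampling).

    import numpy as np
    from itertools import product

    def model_averages(b, w):
        # unclamped <z_a> and <z_a z_b> under P(z) = exp(-E_z)/Z, exact for small N
        n = len(b)
        states = np.array(list(product([-1, 1], repeat=n)), dtype=float)
        energies = -(states @ b) - np.einsum('ka,ab,kb->k', states, np.triu(w, 1), states)
        p = np.exp(-(energies - energies.min()))
        p /= p.sum()
        return p @ states, np.einsum('k,ka,kb->ab', p, states, states)

    def bm_gradient_step(b, w, data, eta=0.1):
        # one ascent step: clamped (here, data) averages minus unclamped model averages
        data = np.asarray(data, dtype=float)
        data_z = data.mean(axis=0)
        data_zz = data.T @ data / len(data)
        mean_z, mean_zz = model_averages(b, w)
        return b + eta * (data_z - mean_z), w + eta * np.triu(data_zz - mean_zz, 1)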
Question: is it possible to train a quantum Boltzmann machine? Replace the Ising Hamiltonian with a transverse Ising Hamiltonian.
Transverse Ising Hamiltonian: H = -Σ_a Γ_a σ^x_a - Σ_a b_a σ^z_a - Σ_{a,b} w_ab σ^z_a σ^z_b.
Quantum Boltzmann Distribution: density matrix ρ = e^{-H} / Tr[e^{-H}]. Boltzmann probability of a visible state v: P_v = Tr[Λ_v ρ], where Λ_v = |v⟩⟨v| ⊗ 1 is the projection operator onto the visible state v times the identity matrix on the hidden qubits.
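A sketch of these objects for a small system, assuming exact diagonalization with numpy; the qubit ordering (visible qubits first) and function names are illustrative choices, not from the paper.

    import numpy as np
    from functools import reduce

    sx = np.array([[0., 1.], [1., 0.]])
    sz = np.array([[1., 0.], [0., -1.]])

    def op(single, site, n):
        return reduce(np.kron, [single if k == site else np.eye(2) for k in range(n)])

    def transverse_ising(gamma, b, w):
        # H = -sum_a Gamma_a sx_a - sum_a b_a sz_a - sum_{a<b} w_ab sz_a sz_b
        n = len(b)
        H = np.zeros((2**n, 2**n))
        for a in range(n):
            H -= gamma[a] * op(sx, a, n) + b[a] * op(sz, a, n)
            for c in range(a + 1, n):
                H -= w[a, c] * op(sz, a, n) @ op(sz, c, n)
        return H

    def density_matrix(H):
        # rho = exp(-H) / Tr exp(-H), via eigendecomposition (beta = 1)
        vals, vecs = np.linalg.eigh(H)
        expH = vecs @ np.diag(np.exp(-(vals - vals.min()))) @ vecs.T
        return expH / np.trace(expH)

    def visible_probability(rho, v, n_hidden):
        # P_v = Tr[Lambda_v rho], Lambda_v = |v><v| (x) identity on the hidden qubits
        # convention: spin +1 maps to basis state |0> (sigma^z eigenvalue +1)
        idx = int(''.join('0' if s == 1 else '1' for s in v), 2)
        ket = np.zeros(2 ** len(v)); ket[idx] = 1.0
        lam = np.kron(np.outer(ket, ket), np.eye(2 ** n_hidden))
        return float(np.trace(lam @ rho))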
Gradient Descent: classically, the gradient is a clamped average minus an unclamped average.
Calculating the Gradient: quantum mechanically, the two terms are no longer equal to a clamped average and an unclamped average, so the gradient cannot be estimated using sampling!
Two Useful Properties of Trace. Golden-Thompson inequality: Tr[e^{A+B}] ≤ Tr[e^A e^B] for Hermitian matrices A and B.
Finding Lower Bounds: the Golden-Thompson inequality gives a lower bound for the log-likelihood.
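A quick numerical check of the Golden-Thompson inequality on random Hermitian matrices (an illustration added here, not part of the talk).

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(0)

    def random_hermitian(n):
        m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
        return (m + m.conj().T) / 2

    A, B = random_hermitian(6), random_hermitian(6)
    lhs = np.trace(expm(A + B)).real
    rhs = np.trace(expm(A) @ expm(B)).real
    print(lhs <= rhs)   # Golden-Thompson: always True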
Calculating the Gradients: minimize the upper bound (on the negative log-likelihood) instead; its gradient involves an unclamped average and a clamped term defined via the clamped Hamiltonian below.
Clamped Hamiltonian H_v: infinite energy penalty for visible states different from v; the visible qubits are clamped to their classical values given by the data.
Estimating the Steps: with a clamped average and an unclamped average, we can now use sampling to estimate the steps.
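A sketch, for exposition only, of one such step in the fully visible case, computed with exact density matrices in place of hardware sampling; here the clamped averages reduce to data averages, and the function names and parameters are illustrative assumptions.

    import numpy as np
    from functools import reduce

    sx = np.array([[0., 1.], [1., 0.]])
    sz = np.array([[1., 0.], [0., -1.]])

    def op(single, site, n):
        return reduce(np.kron, [single if k == site else np.eye(2) for k in range(n)])

    def qbm_bound_step(gamma, b, w, data, eta=0.1):
        # one bound-gradient step for a fully visible QBM:
        # clamped averages reduce to data averages; unclamped averages are Tr[rho sz ...]
        n = len(b)
        H = sum(-gamma[a] * op(sx, a, n) - b[a] * op(sz, a, n) for a in range(n))
        H = H + sum(-w[a, c] * op(sz, a, n) @ op(sz, c, n)
                    for a in range(n) for c in range(a + 1, n))
        vals, vecs = np.linalg.eigh(H)
        rho = vecs @ np.diag(np.exp(-(vals - vals.min()))) @ vecs.T
        rho /= np.trace(rho)
        data = np.asarray(data, dtype=float)
        sz_avg = np.array([np.trace(rho @ op(sz, a, n)) for a in range(n)])
        szsz_avg = np.array([[np.trace(rho @ op(sz, a, n) @ op(sz, c, n)) for c in range(n)]
                             for a in range(n)])
        db = eta * (data.mean(axis=0) - sz_avg)
        dw = eta * np.triu(data.T @ data / len(data) - szsz_avg, 1)
        return b + db, w + dw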
Training the Transverse Field (Γ_a): two problems. Minimizing the upper bound gives a clamped ⟨σ^x⟩ term that vanishes for all visible qubits, and ⟨σ^x_a⟩ cannot be estimated from measurements; thus Γ_a cannot be trained using the bound.
Example: 10-Qubit QBM. Graph: fully connected (K10), fully visible.
Example: 10-Qubit QBM. Training set: M-modal distribution. Single mode: probability decays with Hamming distance from a random spin orientation, with p = 0.9. Multi-mode: M = 8 modes.
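One plausible construction of such a training set (the exact recipe in the talk may differ): each of M = 8 modes is a random spin configuration, the probability of a visible state decays with its Hamming distance d from a mode as p^(N-d) (1-p)^d averaged over modes, and the KL divergence is the natural figure of merit.

    import numpy as np
    from itertools import product

    def multimodal_data_distribution(n=10, modes=8, p=0.9, seed=0):
        # P_data(v) proportional to the mode-average of p^(N-d) (1-p)^d,
        # d = Hamming distance between v and the mode
        rng = np.random.default_rng(seed)
        mode_states = rng.choice([-1, 1], size=(modes, n))
        states = np.array(list(product([-1, 1], repeat=n)))
        d = (states[:, None, :] != mode_states[None, :, :]).sum(axis=-1)
        probs = (p ** (n - d) * (1 - p) ** d).mean(axis=1)
        return states, probs / probs.sum()

    def kl_divergence(p_data, p_model, eps=1e-12):
        # KL(P_data || P_model), the training quality measure quoted on the next slide
        return float(np.sum(p_data * (np.log(p_data + eps) - np.log(p_model + eps))))

    states, p_data = multimodal_data_distribution()
    print(p_data.sum())   # 1.0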
Exact Diagonalization Results: KL-divergence for a classical BM, a QBM trained with the bound gradient (Δ = 2), and a QBM trained with the exact gradient (Δ is trained; final Δ = 2.5).
Sampling from D-Wave: probabilities cross at the anticrossing. Dickson et al., Nat. Commun. 4, 1903 (2013).
Conclusions: a quantum annealer can provide fast samples of a quantum Boltzmann distribution; a QBM can be trained by sampling; a QBM may learn some distributions better than a classical BM. See arXiv:1601.02036.