On an Integral Geometry Inspired Method for Conditional Sampling from Gaussian Ensembles
Alan Edelman, Oren Mangoubi, Bernie Wang
Mathematics, Computer Science & AI Labs
January 13, 2014

Talk Sandwich
Stories "Lost and Found": Random Matrices in the years 1955-1965
Integral Geometry Inspired Method for Conditional Sampling from Gaussian Ensembles
Demo: On the higher order correction of the distribution of the smallest singular value

Stories “Lost and Found” Random Matrices in the Years 1955-1965

Lost and Found
Wigner thanks Narayana. Ironically, Narayana (1930-1987) probably never knew that his polynomials are the moments for Laguerre (Catalan : Hermite :: Narayana : Laguerre).
The statistics/physics links were severed. Wigner knew Wishart matrices, and even dubbed the GOE ``the Wishart set''.
Numerical simulation was common (starting 1958); the art of simulation seems lost for many decades and then refound.

In the beginning… Statisticians found the Laguerre and Jacobi Ensembles
John Wishart 1898-1956; Sir Ronald Aylmer Fisher 1890-1962; Samarendra Nath Roy 1906-1964; Pao-Lu Hsu 1909-1970
Joint eigenvalue densities: real Laguerre and Jacobi Ensembles, 1939 etc.; joint element density

1951: Bargmann, Von Neumann carry the “Wishart torch” to Princeton [Goldstine and Von Neumann, 1951] Statistical Properties of Real Symmetric Matrices with Many Dimensions [Wigner, 1957]

Wigner referencing Wishart 1955-1957 GOE [Wigner, 1957]

Wigner and Narayana [Wigner, 1957] (Narayana was 27)
Marchenko-Pastur = limiting density for Laguerre; its moments are Narayana polynomials!
Narayana probably would not have known.

Dyson (unlike Wigner) not concerned with statisticians
Papers concern β = 1, 2, 4 Hermite (lost touch with Laguerre and Jacobi)
Terms like Wishart, MANOVA, Gaussian Ensembles probably severed ties; Hermite, Laguerre, Jacobi unify

Dyson’s Needle in the Haystack

Dyson’s: Wishart Reference (We’d call it GOE) Dyson Brownian Motion

1964: Harvey Leff

RMT Monte Carlo Computation Goes Way Back
First semicircle plot (GOE) by Porter and Rosenzweig, 1960; later semicircle plot by Porter, 1963
Charles Porter (1927-1964), PhD MIT 1953 (Los Alamos, Brookhaven National Laboratory)
Norbert Rosenzweig (1925-1977), PhD Cornell 1951 (Argonne National Lab)

First MC Experiments (1958) [Rosenzweig, 1958] [Blumberg and Porter, 1958]

Early Computations: especially level density & spacings

Computer    Year   Facility     FLOPS   Reference
GEORGE      1957   Argonne      ?       (Rosenzweig, 1958)
IBM 704     1954   Los Alamos   12k     (Blumberg and Porter, 1958), (Porter and Rosenzweig, 1960)
IBM 7090    1959   Brookhaven   100k    (Porter et al., 1963)

Figure      n    # matrices   Spacings = # x (n-1)   Eigenvector components = # x n^2
14          2    966          966 x 1 = 966          966 x 4 = 3,864
15          3    5117         5117 x 2 = 10,234      5117 x 9 = 46,053
16          4    1018         1018 x 3 = 3,054       1018 x 16 = 16,288
17          5    1573         1573 x 4 = 6,292       1573 x 25 = 39,325
18          10   108          108 x 9 = 972          108 x 100 = 10,800
19,20,21    20   181          181 x 11 = 1991        N/A
22          40   1            1 x 39 = 39

[Porter and Rosenzweig, 1960]

More Modern Spacing Plot 5000 60 x 60 matrices

Random Matrix Diagonalization: 1962 Fortran Program [Fuchel, Greibach and Porter, Brookhaven NL-TR BNL 760 (T-282), 1962]
The QR algorithm was just being invented at this time.

On an Integral Geometry Inspired Method for Conditional Sampling from Gaussian Ensembles

Outline
Motivation: General β Tracy-Widom
Crofton's Formula
The Algorithm for Conditional Probability
Special Case: Density Estimation
Code
Application: General β Tracy-Widom

Motivating Example: General β Tracy-Widom
[Figure: general β Tracy-Widom densities; legend labels α = 0, α = 2/β, α = .02, .04, .06; β = 1, 2, 4]

Motivating Example: General β Tracy-Widom (Persson, Sutton, Edelman, 2013)
Small α: constant-coefficient convection-diffusion.
Key fact: can march forward in time by adding a new [constant x dW] to the operator.
Mystery: how to march forward the law itself. (This talk: a new tool; the mystery persists.)
Question: conditioned on starting at a point, how do we diffuse?
[Figure: densities for α = 0, α = 2/β, α = .02, .04, .06; β = 1, 2, 4]
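For orientation, here is a hedged reconstruction (mine, not copied from the slide image) of the stochastic operator this example refers to, as it appears in the stochastic-operator literature (Ramírez-Rider-Virág; Persson-Sutton-Edelman): the largest eigenvalue has the general β Tracy-Widom law, and moving to a smaller β amounts to adding an independent white-noise term, which is the "constant x dW" march mentioned above.

```latex
% Hedged reconstruction, not verbatim from the slide:
\mathcal{A}_\beta = \frac{d^{2}}{dx^{2}} - x + \frac{2}{\sqrt{\beta}}\,W'(x),
\qquad \lambda_{\max}(\mathcal{A}_\beta) \sim \mathrm{TW}_\beta,
\qquad \frac{2}{\sqrt{\beta_2}}\,W' \;\overset{d}{=}\;
   \frac{2}{\sqrt{\beta}}\,W' + \sqrt{\tfrac{4}{\beta_2}-\tfrac{4}{\beta}}\,\widetilde{W}'
   \quad (\beta_2 < \beta).
```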

Need Algorithms for cases such as
Non-random: same matrix; nonrandom perturbation.
Random: random scalar perturbation; random vector perturbation.
Sampling constraint (what we condition on); derived statistics (what we histogram).
Can we do better than naïvely discarding data?

The Competition: Markov Chain Monte Carlo?
MCMC: design a Markov chain whose stationary distribution is the conditional probability for a very small bin.
Need an auxiliary distribution.
Designing a Markov chain with fast mixing can be very tricky.
Difficult to tell how many steps the Markov chain needs to (approximately) converge.
A nonlinear solver is needed, unless we can march along the constraint surface somehow.

Conditional Probability on a Sphere
Conditional probability comes with a thickness: e.g. the set where the conditioned quantity lies between -3 and -3+ε is a ribbon surface on the sphere.

Crofton Formula for hypersurface volume
Random great circle (uniform); fixed manifold 𝑴.
Ambient dim = n: for n = 3, 𝑴 is a curve meeting the great circle; for n = 4, a surface; for n = 5, a hypersurface.
Morgan Crofton (1826-1915)
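As a reference point, one standard statement of the great-circle Crofton identity from the integral-geometry literature (a reconstruction under the assumption that M is a codimension-one submanifold of the unit sphere S^{n-1}; the slide's own formula was an image and is not quoted here):

```latex
\mathbb{E}\,\#\bigl(M \cap C\bigr) \;=\;
  2\,\frac{\operatorname{vol}_{n-2}(M)}{\operatorname{vol}_{n-2}\bigl(S^{\,n-2}\bigr)},
\qquad C \text{ a uniformly random great circle in } S^{\,n-1}.
```

For n = 3 this reduces to the classical fact that a curve of length L on S^2 meets a random great circle L/π times on average.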

Ribbon Areas
Conditional probability comes with a thickness: e.g. the constraint set between levels -3 and -3+ε is a ribbon surface, with local thickness = 1/||gradient||.
Ribbon areas follow from Crofton + the Layer Cake Lemma.
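A short sketch, in my own notation, of the layer-cake/coarea step behind "thickness = 1/gradient", assuming g is the smooth sampling constraint and ε is the bin width:

```latex
\operatorname{vol}\{x : c \le g(x) \le c+\varepsilon\}
  \;\approx\; \varepsilon \int_{\{g = c\}} \frac{dA(x)}{\|\nabla g(x)\|},
```

so each point of the level set contributes a ribbon of local thickness ε/‖∇g‖.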

Solving on Great Circles
e.g. A = tridiagonal with random diagonal: the random diagonal is spherically symmetric and concentrates on a sphere.
Generate a random great circle on that sphere; every point on the circle is a candidate A; solve for the points on the circle where the sampling constraint holds.
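A minimal Julia sketch of this step (not the authors' code), assuming the Gaussian randomness has been rescaled to its concentration sphere of radius √n and that g(x) = 0 encodes the sampling constraint; the function names and the use of Roots.jl are my choices:

```julia
# Sketch: a uniformly random great circle on the radius-sqrt(n) sphere, and the
# angles at which a constraint g changes sign along it.
using LinearAlgebra, Roots

function random_great_circle(n)
    u = randn(n); u ./= norm(u)                    # first orthonormal direction
    v = randn(n); v .-= (u'v) .* u; v ./= norm(v)  # second direction, orthogonal to the first
    θ -> sqrt(n) .* (cos(θ) .* u .+ sin(θ) .* v)   # point on the circle at angle θ
end

function circle_roots(g, circle; grid = 400)
    θs = range(0, 2π; length = grid + 1)
    vals = [g(circle(θ)) for θ in θs]
    # one bracketed root per sign change of g along the circle
    [find_zero(θ -> g(circle(θ)), (θs[i], θs[i+1])) for i in 1:grid if vals[i] * vals[i+1] < 0]
end
```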

The Algorithm at Work
[Figure sequence: successive frames showing the algorithm at work.]

Nonlinear Solver

Conditional Probability
Every point on the ribbon is weighted by the thickness.
Don't need to remember how many great circles were used.
Let f be any derived statistic (e.g., the quantity we histogram).
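A sketch of the weighted average these bullets describe, with x_1, …, x_k the constraint points pooled over however many great circles were generated (their number never enters):

```latex
\mathbb{E}\bigl[f \,\bigl|\, g = c\bigr] \;\approx\;
  \frac{\sum_{i=1}^{k} f(x_i)\,/\,\|\nabla g(x_i)\|}
       {\sum_{i=1}^{k} 1\,/\,\|\nabla g(x_i)\|}.
```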

Special Case: Density Estimation
Want to compute the probability density at a single point for some random variable, say λmax.
Naïve approach: use Monte Carlo and see what fraction of points land in a small bin around that point. Very slow if the bin is small.
Say you want the n = 216 truncation here.
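For contrast, a minimal Julia sketch of the naïve binned estimate for a toy tridiagonal model (all sizes and the target point are illustrative, not from the talk); the point is that only draws landing in the tiny bin contribute, so the cost per useful sample blows up as the bin shrinks:

```julia
# Sketch: naive Monte Carlo density estimate of lambda_max at one point.
using LinearAlgebra

function naive_density(; n = 50, nsamples = 100_000, target = 3.0, halfwidth = 0.01)
    hits = 0
    for _ in 1:nsamples
        a = randn(n)                                            # random diagonal
        λmax = maximum(eigvals(SymTridiagonal(a, ones(n - 1))))
        hits += abs(λmax - target) <= halfwidth                 # count draws inside the bin
    end
    hits / (nsamples * 2halfwidth)                              # fraction in bin ÷ bin width
end
```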

Special Case: Density Estimation
Conditional probability comes with a thickness: e.g. the constraint set between levels -3 and -3+ε is a ribbon surface, with local thickness = 1/||gradient||; ribbon areas follow from Crofton + the Layer Cake Lemma.

A good computational trick is also a good theoretical trick….

Integral Geometry and Crofton's Formula
Rich history in random polynomials / complexity theory / Bézout theory: Kostlan, Shub, Smale, Rojas, Malajovich, and more recent works…
We used it in: How many roots of a random real-coefficient polynomial are real?
It should find a better place in random matrix theory.
(Notes: Bézout's theorem V (Shub and Smale); Larry Guth; what manifold? what exact application? Edelman, Kostlan, Shub: generalized eigenvalue problems.)

Our Algorithm

Using the Algorithm in code
Step 1: sampling constraint
Step 2: derived statistic
Step 3: ||gradient(sampling constraint)||
Step 4: parameters
Step 5: run the algorithm
(Notes: use a separate f for the derived statistic; include the other example derived statistics, but commented out; change gradient to norm of gradient.)

Using the Algorithm in code. Step 1: sampling constraint.

Using the Algorithm in code. Step 2: derived statistic.

Using the Algorithm in code. Step 3: ||gradient(sampling constraint)||.

Using the Algorithm in code. Step 4: parameters.

Using the Algorithm in code. Step 5: run the algorithm. Note: here r = -2.338*3/2.
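Putting the five steps together, a hypothetical end-to-end Julia sketch for a small tridiagonal model; it reuses random_great_circle and circle_roots from the earlier sketch, and every name, parameter value, and the illustrative conditioning level r are mine rather than the authors' actual code:

```julia
# Sketch of Steps 1-5 for a toy model: tridiagonal matrix with random diagonal a.
using LinearAlgebra

n = 50
T(a) = SymTridiagonal(a, ones(n - 1))

r = 3.0                                  # illustrative conditioning level for this toy
                                         # (the talk uses r = -2.338*3/2 for the Airy discretization)

g(a) = maximum(eigvals(T(a))) - r        # Step 1: sampling constraint, condition on g(a) = 0
f(a) = eigvals(T(a))[end - 1]            # Step 2: derived statistic (here: second-largest eigenvalue)

function gradnorm_g(a)                   # Step 3: ||gradient(sampling constraint)||
    λ, V = eigen(T(a))
    v = V[:, end]                        # eigenvector of the largest eigenvalue
    norm(v .^ 2)                         # first-order perturbation: ∂λmax/∂a_i = v_i^2
end

ncircles = 2_000                         # Step 4: parameters

function run_algorithm(ncircles)         # Step 5: run the algorithm
    num = den = 0.0
    for _ in 1:ncircles
        circle = random_great_circle(n)          # helper from the earlier sketch
        for θ in circle_roots(g, circle)
            a = circle(θ)
            w = 1 / gradnorm_g(a)                # thickness weight, 1/||gradient||
            num += w * f(a); den += w
        end
    end
    num / den                                    # ≈ E[f | g = 0]
end

println("E[f | g = 0] ≈ ", run_algorithm(ncircles))
```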

Conditional Probability Example: Evolving Tracy-Widom
Evolving the law amounts to adding fresh noise to the operator; discretized, this is a tridiagonal matrix:

using LinearAlgebra
N = 10^4; beta = 2; beta2 = 1; h = N^(-1/3)
x = collect(0:h:10); n = length(x)
b = (1/h^2)*ones(n-1)
A = -(2/h^2)*ones(n) - x
# a_t: the conditioned noise vector from the beta = 2 sample (assumed; not defined on this slide)
T = SymTridiagonal(A + (2/sqrt(beta))*a_t*sqrt(h)/h +
                   sqrt(4/beta2 - 4/beta)*randn(n)*sqrt(h)/h, b)

Step 1: We can condition on the largest eigenvalue.
Step 2: We can add to the diagonal and histogram the new eigenvalue.
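A short follow-up sketch (the helper name evolved_lambda1 is mine, and this assumes A, b, a_t, h, n, beta, beta2 from the snippet above): each call builds a freshly evolved operator and returns its largest eigenvalue, which is what gets histogrammed for the conditional density.

```julia
using LinearAlgebra

function evolved_lambda1()
    noise = sqrt(4/beta2 - 4/beta) * randn(n) * sqrt(h) / h     # fresh dW term: beta = 2 -> beta2 = 1
    d = A + (2/sqrt(beta)) * a_t * sqrt(h) / h + noise          # evolved diagonal
    maximum(eigvals(SymTridiagonal(d, b)))                      # largest eigenvalue to histogram
end

samples = [evolved_lambda1() for _ in 1:5_000]                  # histogram these for the conditional law
```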

Conditional Probability Example: Numerical Results
Want the conditional density 𝑓(𝜆(𝛽=1) | 𝜆(𝛽=2) = -2.338).
By "evolving" the same samples that we used for estimating the density, we can also generate a histogram of the conditional density.
[Figure: conditioned Tracy-Widom histogram with TW2 superimposed; conditioning at ½ the Airy root, 0, and 1.5 times the Airy root, with shifted Painlevé (TW2) curves for comparison.]

Condition on λ1 at β=2; evolve the spike to β=1
[Figure: conditioning at β=2 on λ1 = TW2+ζ/2, TW2, TW2-ζ/2, and TW2-ζ (the last just for reference; significance of λ1 = ζ), with TW2 translated to the diffusion from β=2 to β=1 shown for reference; watch the blue curves convect and diffuse from the black spikes, with strong convection / weak diffusion at one end and weak convection / strong diffusion at the other.]

Complexity Comparison
Suppose we reduce the bin size (imagine conditioning on some physical catastrophic-system-failure event).
[Figure, log scale: naïve algorithm vs. great circle algorithm.]
Note: r = -10*error, where error = ½ bin size.
Note: the smallest two bins are extrapolated for the naïve algorithm, but all bin sizes are computed for the great circle algorithm.
Smaller bin sizes make the naïve algorithm very wasteful; the great circle algorithm hardly cares.
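A rough accounting (my sketch, not a claim from the slide) of why the naïve curve blows up: with bin half-width ε and target density p, the naïve method needs on the order of

```latex
N_{\text{naive}} \;\approx\; \frac{1}{2\varepsilon\,p}
```

draws per accepted sample, which diverges as ε → 0, while each great circle yields constraint points whose number does not depend on ε, so the great-circle cost is essentially bin-size independent.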

Possible Extension: Conditioning on large numbers of variables Higher Dimensional versions of Crofton’s formula Intersections of higher dimensional spheres with lower dimensional manifolds

Applications
MLE for covariance matrix rank estimation (most covariance matrix models do not have an analytical solution for eigenvalue densities)
Heavy-tailed random matrices
Molecular interaction simulations (conditioning on the rare phase change)
Stochastic PDEs (also functions of )
Weather simulation (conditioning on today's incomplete weather, what is the probability of rain tomorrow?)
Probability of an airplane crashing (rare event)
Deriving theoretical bounds for conditional probability?? Other theory??

Acknowledgements
NDSEG Fellowship
Air Force Office of Scientific Research
NSF DMS 1035400 and DMS 1016125