Variance Reduction via Lattice Rules, by Pierre L'Ecuyer and Christiane Lemieux. Presented by Yanzhi Li.


1 Variance Reduction via Lattice Rules
By Pierre L'Ecuyer and Christiane Lemieux
Presented by Yanzhi Li

2 Outline
- Motivation
- Lattice rules
- Functional ANOVA decomposition
- Lattice selection criterion
- Random shifts
- Examples
- Conclusions

3 Motivation - MC
- μ = E[f(U)] = ∫_{[0,1)^t} f(u) du, where U is a t-dimensional vector of i.i.d. Unif(0,1) random variables.
- Monte Carlo method (MC): sample n points u_0, …, u_{n-1} independently and uniformly in [0,1)^t and estimate μ by the average Q_n = (1/n) Σ_{i=0}^{n-1} f(u_i).

4 Motivation - MC
- MC gives a convergence rate of O(n^{-1/2}) for the error (see the sketch below).
- MC performs worse for larger t in practice.
- A large number of points is required before the sample covers [0,1)^t at all uniformly.
- Can we do better? Quasi-Monte Carlo (QMC): constructs the point set P_n more evenly and uses a relatively small number of points.
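For reference in the comparisons that follow, here is a minimal crude-MC sketch of the estimator Q_n from the previous slides. The vectorized test integrand, the sample size, and the seed are arbitrary illustrations, not taken from the slides.

```python
import numpy as np

def mc_estimate(f, t, n, seed=None):
    """Crude Monte Carlo: average f over n i.i.d. uniform points in [0,1)^t.
    The standard error of the estimate decreases like O(n^(-1/2))."""
    rng = np.random.default_rng(seed)
    U = rng.random((n, t))                 # n points, each a t-dimensional Unif(0,1) vector
    y = f(U)                               # f is vectorized: maps an (n, t) array to n values
    return y.mean(), y.std(ddof=1) / np.sqrt(n)

# Illustrative integrand on [0,1)^5 with exact integral 1.
f_test = lambda u: np.prod(1.0 + 0.5 * (u - 0.5), axis=1)
estimate, std_error = mc_estimate(f_test, t=5, n=100_000, seed=0)
```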

5 Given a fixed number of points, which one is better? [Figure: an MC point set and a QMC point set with the same number of points, shown side by side.]

6 Motivation - QMC
- D(P_n): a measure of the non-uniformity of P_n, i.e., the discrepancy between P_n and the uniform distribution.
- Error bound: |Q_n − μ| ≤ V(f) D*(P_n) = V(f) O(n^{-1} (ln n)^t), where V(f) is the total variation of f and D*(P_n) is the rectangular star discrepancy (the supremum, over rectangular boxes J anchored at the origin, of the absolute difference between the fraction of points of P_n falling in J and the volume of J).
- Hence QMC performs better than MC asymptotically (for fixed t).

7 Motivation - QMC
Drawbacks:
- For larger t, the convergence rate is better than that of MC only for impractically large values of n.
- D*(P_n) is difficult to compute.
- The bound is very loose for typical functions.
Good news:
- Low-discrepancy point sets seem to reduce the integration error effectively, even for larger t.

8 Motivation - Questions
- Why does QMC perform better than MC empirically?
  - Via the Fourier expansion: a variance-reduction viewpoint, rather than the traditional error bound.
- How to select the quasi-random point set P_n? In particular, how to select the integration lattice?

9 Lattice Rules
- (Integration) lattice L_t: a discrete subset of R^t, closed under addition and subtraction, that contains Z^t.
- Dual lattice: L_t^* = {h ∈ R^t : h·u ∈ Z for all u ∈ L_t}.
- Lattice rule: an integration method that approximates μ by Q_n using the node set P_n = L_t ∩ [0,1)^t.

10 Being a lattice does not by itself mean being well distributed. [Figure: a lattice point set whose points are poorly spread over the unit square.]

11 Lattice Rules - Integration error
- Fourier expansion of f: f(u) = Σ_{h ∈ Z^t} f̂(h) e^{2πi h·u}, with coefficients f̂(h) = ∫_{[0,1)^t} f(u) e^{−2πi h·u} du.
- For a lattice point set (assuming the expansion converges absolutely): Q_n − μ = Σ_{0 ≠ h ∈ L_t^*} f̂(h), a sum over the nonzero vectors of the dual lattice only.

12 Functional ANOVA Decomposition
- Writes f(u) as a sum of orthogonal functions: f(u) = Σ_{I ⊆ {1,…,t}} f_I(u), where f_∅ ≡ μ and each f_I depends only on the coordinates u_j with j ∈ I.
- The variance σ² decomposes accordingly as σ² = Σ_{∅ ≠ I ⊆ {1,…,t}} σ_I², where σ_I² = Var[f_I(U)].

13 Functional ANOVA Decomposition
- The best mean-square approximation of f(·) by a sum of functions of at most d variables is Σ_{|I| ≤ d} f_I(·).
- f has a low effective dimension in the superposition sense when this approximation is good for small d, which is frequent in practice.
- This suggests choosing the point set P_n on the basis of the quality of the distribution of its points over the projections (coordinate subsets) I that are deemed important.
- When |I| is small, it is possible to make sure that the projection P_n(I) covers its subspace very well.
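The slides do not give an estimator for the components σ_I², but a standard way to get at the first-order ones (and hence a rough feel for effective dimension) is the Sobol' "pick-freeze" identity σ_i² = Cov(f(X), f(Z_i)), where Z_i shares coordinate i with X and has the other coordinates independently resampled. The sketch below is an illustration of that technique under these assumptions, not a method from the paper.

```python
import numpy as np

def first_order_anova_variances(f, t, n=200_000, seed=None):
    """Rough Monte Carlo estimates of the first-order ANOVA components
    sigma_i^2 via the pick-freeze identity sigma_i^2 = Cov(f(X), f(Z_i)),
    where Z_i shares coordinate i with X and resamples the rest."""
    rng = np.random.default_rng(seed)
    X, Y = rng.random((n, t)), rng.random((n, t))
    fX, fY = f(X), f(Y)
    mu2_hat = fX.mean() * fY.mean()        # low-bias estimate of mu^2
    sigma2 = np.empty(t)
    for i in range(t):
        Z = Y.copy()
        Z[:, i] = X[:, i]                  # "freeze" coordinate i, resample the rest
        sigma2[i] = np.mean(fX * f(Z)) - mu2_hat
    return sigma2, fX.var(ddof=1)          # first-order components and total variance

# Example: f(u) = u_1 + u_2*u_3 on [0,1)^3; coordinate 1 carries most of the variance.
f_demo = lambda u: u[:, 0] + u[:, 1] * u[:, 2]
components, total = first_order_anova_variances(f_demo, t=3, seed=1)
```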

14 Lattice Selection Criterion
It is desirable that P_n is (for a rank-1 lattice):
- Fully projection-regular, i.e., for any non-empty I ⊆ {1,…,t}, the projection P_n(I) contains as many distinct points as P_n; or
- Dimension-stationary, i.e., P_n({i_1,…,i_d}) = P_n({i_1+j,…,i_d+j}) for all i_1,…,i_d and j.
One fully projection-regular example is P_n = {(j/n)v mod 1 : 0 ≤ j < n} for v = (1, a, a², …, a^{t−1}), where a is an integer with 0 < a < n and gcd(a, n) = 1 (the Korobov construction sketched below).
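A minimal sketch of that Korobov (rank-1) construction follows; the specific n and a are arbitrary illustrative choices satisfying gcd(a, n) = 1, not recommended parameters.

```python
import numpy as np

def korobov_lattice(n, t, a):
    """Rank-1 (Korobov) lattice point set from this slide:
    P_n = {(j/n) * v mod 1 : j = 0, ..., n-1} with v = (1, a, a^2, ..., a^(t-1)).
    a is assumed to satisfy 0 < a < n and gcd(a, n) = 1, so every
    one-dimensional projection has n distinct points."""
    v = np.array([pow(a, k, n) for k in range(t)])   # reducing v mod n changes nothing mod 1
    j = np.arange(n).reshape(-1, 1)
    return (j * v % n) / n                           # shape (n, t), all points in [0,1)^t

# Illustrative parameters only: n = 1021 (prime), a = 76, t = 5.
P = korobov_lattice(n=1021, t=5, a=76)
```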

15 Lattice Selection Criterion
- L_t(I) is a lattice => its points are contained in families of equidistant parallel hyperplanes.
- Take the family in which the hyperplanes are farthest apart and let d_t(I) be the distance between them.
- d_t(I) = 1/ℓ_I, where ℓ_I is the Euclidean length of the shortest nonzero vector in the dual lattice L_t^*(I); ℓ_I has a tight upper bound ℓ_d^*(n) = c_d n^{1/d}, where d = |I|.
- Define the figure of merit ℓ_I / ℓ_d^*(n), so that the quality of projections of different dimensions can be compared.
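To make ℓ_I and d_t(I) concrete, here is a brute-force sketch for a low-dimensional projection of a Korobov lattice: it searches exhaustively for the shortest nonzero dual-lattice vector. The parameters and the search bound are illustrative assumptions, and the exhaustive search is only usable for very small |I|.

```python
from itertools import product
from math import sqrt

def shortest_dual_vector_length(n, a, I, bound):
    """Brute-force l_I for the projection of a Korobov lattice
    (v = (1, a, a^2, ...)) onto the coordinate set I: the shortest nonzero
    integer vector h with sum_k h_k * a^(i_k - 1) = 0 (mod n).
    Exhaustive over |h_k| <= bound, so only practical for very small |I|."""
    gens = [pow(a, i - 1, n) for i in I]
    best = float("inf")
    for h in product(range(-bound, bound + 1), repeat=len(I)):
        if any(h) and sum(hk * g for hk, g in zip(h, gens)) % n == 0:
            best = min(best, sqrt(sum(hk * hk for hk in h)))
    return best

# Hyperplane spacing for the 2-dimensional projection I = {1, 2}
# of the illustrative lattice above (n = 1021, a = 76).
l_I = shortest_dual_vector_length(n=1021, a=76, I=(1, 2), bound=40)
d_I = 1.0 / l_I    # distance d_t(I) between the farthest-apart parallel hyperplanes
```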

16 Lattice Selection Criterion
- Minimizing d_t(I) ⇔ maximizing ℓ_I, i.e., maximizing the figure of merit ℓ_I / ℓ_d^*(n).
- Worst-case figure of merit: for arbitrary d ≥ 1 and t_1 ≥ … ≥ t_d ≥ d, define a criterion as the minimum of the normalized merits ℓ_I / ℓ_{|I|}^*(n) over a selected class of index sets I.
- This criterion takes into account the projections over s successive dimensions for all s ≤ t_1, and over no more than d non-successive dimensions that are not too far apart.

17 [Figure/table slide; no text content was captured for this slide.]

18 Random Shifts
- When P_n is deterministic, the integration error is also deterministic and hard to estimate. To estimate the error, we use independent random shifts.
- Generate a random vector U ~ Unif[0,1)^t and replace each u_i by u_i' = (u_i + U) mod 1.
- Let P_n' = {u_0', …, u_{n−1}'} and Q_n' = (1/n) Σ_{i=0}^{n−1} f(u_i').
- Repeat this m times, independently, with the same P_n, thus obtaining m i.i.d. copies of Q_n', denoted X_1, …, X_m (see the sketch below).
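A minimal sketch of this randomization procedure, reusing the illustrative Korobov node set and test integrand from the earlier sketches (both assumptions, not values from the slides):

```python
import numpy as np

def shifted_lattice_estimate(f, P, m=100, seed=None):
    """Randomly shifted lattice rule as described on this slide: apply m
    independent uniform shifts (mod 1) to the same node set P, average f
    over each shifted copy to get X_1, ..., X_m, and report their mean
    together with an estimate of its variance."""
    rng = np.random.default_rng(seed)
    n, t = P.shape
    X = np.empty(m)
    for r in range(m):
        U = rng.random(t)                    # one shift U ~ Unif[0,1)^t per replicate
        X[r] = f((P + U) % 1.0).mean()       # Q_n' for this shifted copy P_n'
    return X.mean(), X.var(ddof=1) / m       # estimate of mu and of Var[mean]

# Same illustrative Korobov node set and test integrand as above (exact integral 1).
v = np.array([pow(76, k, 1021) for k in range(5)])
P = (np.arange(1021).reshape(-1, 1) * v % 1021) / 1021
f_test = lambda u: np.prod(1.0 + 0.5 * (u - 0.5), axis=1)
estimate, variance_of_estimate = shifted_lattice_estimate(f_test, P, m=100, seed=2)
```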

19 Random Shifts
- Let X̄_m = (1/m) Σ_{j=1}^{m} X_j and let S² be the sample variance of X_1, …, X_m. We have E[X̄_m] = μ (the randomized estimator is unbiased), and S²/m estimates Var[X̄_m] = Var[Q_n']/m.
- If σ² < ∞, with the MC method, Var[Q_n] = σ²/n = (1/n) Σ_{0 ≠ h ∈ Z^t} |f̂(h)|².
- For a randomly shifted lattice rule, Var[Q_n'] = Σ_{0 ≠ h ∈ L_t^*} |f̂(h)|².

20 Random Shifts
- Since L_t^* contains exactly 1/n of the points of Z^t, the randomly shifted lattice rule reduces the variance compared with MC ⇔ the "average" squared Fourier coefficients are smaller over L_t^* than over Z^t, which is true for typical well-behaved functions.
- The selection criterion of the previous slides is also aimed at avoiding small vectors h in the dual lattice L_t^* for the sets I deemed important.

21 Example: Stochastic activity network
- Each arc k, 1 ≤ k ≤ N(A), is an activity with a random duration ~ F_k(·); N(A) is the number of activities and N(P) is the number of paths.
- Estimate θ = P[T > x], where T is the project completion time (the length of the longest path) and x is a given threshold.
- Generate N(A) Unif[0,1) r.v.'s, one per activity, and obtain the durations by inversion, so θ is an integral over [0,1)^t with t = N(A).
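Below is a purely illustrative sketch of this kind of experiment. The network (paths), the exponential durations, and the threshold x are hypothetical, not the network used in the paper; the point is only to show how T and the indicator f(u) = 1[T > x] become a function of uniforms, so that the lattice-rule machinery above applies directly.

```python
import numpy as np

# Hypothetical network: 5 activities, 3 paths (tuples of activity indices),
# exponential durations, and an arbitrary threshold x (all assumptions).
PATHS = [(0, 1, 4), (0, 2, 4), (3, 4)]
RATES = np.array([1.0, 0.5, 0.8, 1.2, 0.7])      # rate of each activity's duration
X_THRESHOLD = 6.0

def f_san(U):
    """Indicator that the project duration T (longest path) exceeds x,
    written as a function of the uniforms U (an (n, 5) array)."""
    D = -np.log1p(-U) / RATES                     # durations by inversion: F_k^{-1}(u)
    T = np.max([D[:, list(p)].sum(axis=1) for p in PATHS], axis=0)
    return (T > X_THRESHOLD).astype(float)

# Crude MC estimate of theta; f_san can equally be passed to the
# shifted_lattice_estimate sketch above to compare variances.
rng = np.random.default_rng(3)
theta_mc = f_san(rng.random((200_000, 5))).mean()
```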

22 Estimated Variance Reduction Factors w.r.t. MC
[Table of estimated variance reduction factors not reproduced here.]
Legend: MC = Monte Carlo; LR = randomly shifted lattice rule; CMC = conditional Monte Carlo; t = dimension of the integration; n = number of points in P_n; m = 100.

23 Conclusions
- Explains the success of QMC through variance reduction rather than the traditional discrepancy measure.
- Proposes a new way of generating lattices and choosing their parameters, paying more attention to the important low-dimensional projections (subspaces).
- Things we don't cover: rules of higher rank, polynomial lattice rules, and massaging the problem.

24 Variance Reduction via Lattice Rules Thank you! Q&A