
1 Quasi-random resampling
O. Teytaud*, S. Gelly*, S. Lallich**, E. Prudhomme**
*Equipe I&A-TAO, LRI, Université Paris-Sud, Inria, UMR-CNRS 8623
**Equipe ERIC, Université Lyon 2
Email: teytaud@lri.fr, gelly@lri.fr, stephane.lallich@eric.univ-lyon2.fr, eprudhomme@eric.univ-lyon2.fr

2 What is the problem? Many tasks in AI are based on random resamplings:
● cross-validation
● bagging
● bootstrap
● ...
Resampling is time-consuming:
● cross-validation for choosing hyper-parameters
● bagging on huge datasets
==> we want to get with n resamplings the same result as with N >> n resamplings

3 A typical example. You want to learn a relation x --> y on a huge dataset. The dataset is too large for your favorite learner. A traditional solution is subagging: average 100 learnings performed on random subsamples (1/20) of your dataset. We propose: use QR-sampling to average only 40 learnings.
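
As a point of comparison, here is a minimal sketch of the subagging baseline described above, assuming scikit-learn and NumPy are available; the base learner, the 1/20 fraction and the number of learnings are placeholders taken from the slide, and subagging_predict is a name introduced here.

```python
# Subagging sketch (assumption: scikit-learn and NumPy are available; the
# base learner, subsample fraction and number of learnings are placeholders).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def subagging_predict(X, y, X_test, n_learnings=100, fraction=1 / 20, seed=0):
    """Average predictions of learners trained on small random subsamples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    size = max(1, int(fraction * n))
    preds = []
    for _ in range(n_learnings):
        idx = rng.choice(n, size=size, replace=False)  # random 1/20 subsample
        model = DecisionTreeRegressor().fit(X[idx], y[idx])
        preds.append(model.predict(X_test))
    return np.mean(preds, axis=0)
```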

4 Organization of the talk
(1) why resampling is Monte-Carlo integration
(2) quasi-random numbers
(3) quasi-random numbers in strange spaces
(4) applying quasi-random numbers in resampling
(5) when does it work and when doesn't it work?

5 Why resampling is Monte-Carlo integration. What is Monte-Carlo integration: E f(x) ≈ (1/n) sum_i f(x(i)). What is cross-validation: error rate = E f(x) ≈ (1/n) sum_i f(x(i)), where f(x) = error rate with the partitioning x.
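
To make the analogy explicit, a minimal sketch (assuming scikit-learn; the learner and split fraction are arbitrary) of random-split cross-validation written as a Monte-Carlo average: each random partition is one sample x(i), its test error is f(x(i)), and the estimate is the mean of the f(x(i)).

```python
# Random-split cross-validation as Monte-Carlo integration (sketch):
# each random partitioning is one point x(i), f(x(i)) is its test error.
import numpy as np
from sklearn.linear_model import LogisticRegression

def random_cv_error(X, y, n_resamplings=30, test_fraction=0.2, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    n_test = int(test_fraction * n)
    errors = []
    for _ in range(n_resamplings):
        perm = rng.permutation(n)                  # one random partitioning = one x(i)
        test, train = perm[:n_test], perm[n_test:]
        model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        errors.append(np.mean(model.predict(X[test]) != y[test]))   # f(x(i))
    return float(np.mean(errors))                  # (1/n) sum_i f(x(i))
```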

6 An introduction to QR-numbers
(1) why resampling is Monte-Carlo integration
(2) quasi-random numbers
(3) quasi-random numbers in strange spaces
(4) applying quasi-random numbers in resampling
(5) when does it work and when doesn't it work?

7 QR-numbers. (2) quasi-random numbers (less randomized numbers). We have seen that resampling is Monte-Carlo integration; now we will see how Monte-Carlo integration has been strongly improved.

8 Quasi-random numbers? Random samples in [0,1]^d can be not-so-well distributed --> error in Monte-Carlo integration O(1/sqrt(n)), with n the number of points. Pseudo-random samples ≈ random samples (we try to be very close to pure randomness). Quasi-random samples: O(1/n) within logarithmic factors --> we don't try to be as close as possible to random --> the number of samples needed for a given precision is much smaller.
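
A small numerical illustration of this precision gap, assuming SciPy's scipy.stats.qmc module is available (scipy >= 1.7); the smooth integrand, the dimension and the sample size are arbitrary choices, not taken from the paper.

```python
# Compare random and quasi-random Monte-Carlo integration of a smooth
# function on [0,1]^d (assumption: scipy >= 1.7 for scipy.stats.qmc).
import numpy as np
from scipy.stats import qmc

d, n = 5, 2**12
f = lambda x: np.prod(1 + 0.5 * (x - 0.5), axis=1)   # exact integral = 1

random_pts = np.random.default_rng(0).random((n, d))
halton_pts = qmc.Halton(d=d, scramble=True, seed=0).random(n)

print("random error      :", abs(f(random_pts).mean() - 1.0))
print("quasi-random error:", abs(f(halton_pts).mean() - 1.0))
```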

9 Quasi-random = low discrepancy? Discrepancy = max |Area – Frequency|

10 A better discrepancy? Discrepancy_2 = mean(|Area – Frequency|^2)

11 Existing bounds on low-discrepancy Monte-Carlo. Random --> Discrepancy ~ sqrt(1/n). Quasi-random --> Discrepancy ~ log(n)^d / n. Koksma & Hlawka: error in Monte-Carlo integration < Discrepancy x V, where V = total variation (Hardy & Krause). (Many generalizations in Hickernell, A Generalized Discrepancy and Quadrature Error Bound, 1997.)

12 Which set do you trust?

13 Which quasi-random numbers? « Halton sequence with a simple scrambling scheme »:
● fast (as fast as pseudo-random numbers);
● easy to implement;
● available freely if you don't want to implement it.
(We will not detail how this sequence is built here.) (Also: Sobol sequence.)

14 What else besides Monte-Carlo integration? Thanks to various forms of quasi-random:
● Numerical integration [thousands of papers; Niederreiter 92]
● Learning [Cervellera et al., IEEE TNN 2004; Mary, PhD thesis 2005]
● Optimization [Teytaud et al., EA'2005]
● Modelization of random processes [Growe-Kruska et al., BPTP'03, Levy's method]
● Path planning [Tuffin]

15 ...and how to do it in strange spaces?
(1) why resampling is Monte-Carlo integration
(2) quasi-random numbers
(3) quasi-random numbers in strange spaces
(4) applying quasi-random numbers in resampling
(5) when does it work and when doesn't it work?

16 Have fun with QR in strange spaces. (3) quasi-random numbers in strange spaces. We have seen that resampling is Monte-Carlo integration, and how Monte-Carlo is replaced by Quasi-Random Monte-Carlo. But resampling is random in a non-standard space. We will see how to do Quasi-Random Monte-Carlo in non-standard spaces.

17 Quasi-random numbers in strange spaces. We have seen hypercubes:

18 ...but we need something else!
Sample of points ---> QR sample of points
Sample of samples ---> QR sample of samples

19 Quasi-random points in strange spaces. Fortunately, QR-points also exist in various other spaces.

20 Why not in something isotropic? How to do it on the sphere? Or for Gaussian distributions?

21 For the Gaussian: easy! Generate x in [0,1]^d by quasi-random. Build y such that P(N < y(i)) = x(i). It works because the distribution is the product of the distributions of the y(i). What about the general case?
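
A minimal sketch of this inverse-CDF construction for the Gaussian case, assuming SciPy for both the quasi-random points and the normal quantile function; the Halton generator is one possible choice.

```python
# Quasi-random Gaussian points: take a QR point x in [0,1]^d and set
# y(i) = Phi^{-1}(x(i)), i.e. P(N < y(i)) = x(i), coordinate by coordinate.
from scipy.stats import qmc, norm

d, n = 3, 64
x = qmc.Halton(d=d, scramble=True, seed=0).random(n)   # QR points in [0,1]^d
y = norm.ppf(x)                                         # QR points for N(0, I_d)
```

For a correlated Gaussian, y can additionally be multiplied by a Cholesky factor of the desired covariance matrix.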

22 Ok!
- generate x in [0,1]^d
- define y(i) such that P(t < y(i) | y(1), y(2), ..., y(i-1)) = x(i), i.e. push x(i) through the conditional inverse CDF.
Ok!

23 However, this is what we will do:
● We do not have anything better than this general method for the strange distributions in which we are interested
● At least we can prove the O(1/n) property (see the paper)
● Perhaps there is something much better
● Perhaps there is something much simpler

24 The QR-numbers in resampling. (4) applying quasi-random numbers in resampling. We have seen that resampling is Monte-Carlo integration, and that we are able to generate quasi-random points for any distribution on continuous domains ==> it should work ==> let's see in detail how to move the problem to the continuous domain.

25 QR-numbers in resampling. A very particular distribution for QR-points: bootstrap samples. How to move the problem to continuous spaces? y(i) = x(r(i)), where r(i) is uniformly distributed in [[1,n]] ==> this is discrete.

26 QR-numbers in resampling. A very particular distribution for QR-points: bootstrap samples. How to move the problem to continuous spaces? y(i) = x(r(i)), where r(i) is uniformly distributed in [[1,n]] --> many solutions exist. [Diagram: what we know (the rectangular uniform distribution, then any continuous distribution) versus what we need (our discrete distribution).]

27 What are bootstrap samples? Our technique works for various forms of resampling:
- subsamples without replacement (random-CV, subagging)
- subsamples with replacement (bagging, bootstrap)
- random partitioning (k-CV).
W.l.o.g., we present here the sampling of n elements from a sample of size n with replacement (= bootstrap resampling). (Useful in e.g. bagging, bias/variance estimation, ...)

28 A naive solution: y(i) = x(r(i)), with r(1), ..., r(n) = ceil(n x qr), where qr in [0,1]^n is QR in dimension n, with n the number of examples. Example with n = 5: qr = (0.1, 0.9, 0.84, 0.9, 0.7).

32 For qr = (0.1, 0.9, 0.84, 0.9, 0.7), ceil(5 x qr) = (1, 5, 5, 5, 4) ==> the count vector (1, 0, 0, 1, 3): the five examples are drawn 1, 0, 0, 1 and 3 times.

33 A naive solution: y(i) = x(r(i)), with r(1), ..., r(n) = ceil(n x qr), where qr in [0,1]^n is QR in dimension n. ==> All permutations of (0.1, 0.9, 0.84, 0.9, 0.7) lead to the same result!

34 ...which does not work. In practice it does not work better than random. Two very distinct QR-points can lead to very similar resamples (permutations of a point lead to the same sample). We have to remove this symmetry.

35 A less naive solution. z(i) = number of times x(i) appears in the bootstrap sample.
z(1) = binomial
z(2) | z(1) = binomial
z(3) | z(1), z(2) = binomial
...
z(n-1) | z(1), z(2), ..., z(n-2) = binomial
z(n) | z(1), z(2), ..., z(n-1) = constant
==> yes, it works!
==> moreover, it works for many forms of resampling and not only bootstrap!
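
A sketch of this chain of conditional binomials, driven by one quasi-random point per bootstrap resample; the inverse binomial CDF (scipy.stats.binom.ppf) turns each QR coordinate into a count. The Halton generator and the clipping constant are implementation choices, not prescriptions from the paper.

```python
# One bootstrap count vector z(1..n) from one quasi-random point u in [0,1]^(n-1):
# each conditional law z(i) | z(1..i-1) is binomial, and the QR coordinate u(i)
# is pushed through the inverse binomial CDF (sketch; assumes SciPy).
import numpy as np
from scipy.stats import binom, qmc

def qr_bootstrap_counts(u, n):
    """u: QR point in [0,1]^(n-1); returns z, the multiplicities of one bootstrap sample."""
    z = np.zeros(n, dtype=int)
    remaining = n                                    # draws still to allocate
    for i in range(n - 1):
        p = 1.0 / (n - i)                            # each remaining element is equally likely
        q = float(np.clip(u[i], 1e-12, 1 - 1e-12))   # keep the quantile strictly inside (0,1)
        z[i] = int(binom.ppf(q, remaining, p))
        remaining -= z[i]
    z[n - 1] = remaining                             # last count is deterministic (constant)
    return z

n = 20
u = qmc.Halton(d=n - 1, scramble=True, seed=0).random(10)   # 10 QR points = 10 resamples
counts = [qr_bootstrap_counts(point, n) for point in u]
```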

36 With dimension-reduction it's better. Put the x(i)'s in k clusters. z(i) = number of times an element of cluster i appears in the bootstrap sample.
z(1) = binomial
z(2) | z(1) = binomial
z(3) | z(1), z(2) = binomial
...
z(k-1) | z(1), z(2), ..., z(k-2) = binomial
z(k) | z(1), z(2), ..., z(k-1) = constant
(Then, randomly draw the elements in each cluster.)

37 Let's summarize. Put the x(i)'s in k clusters. z(i) = number of times an element of cluster i appears in the bootstrap sample.
z(1) = binomial
z(2) | z(1) = binomial
...
z(k) | z(1), z(2), ..., z(k-1) = constant
We quasi-randomize this z(1), ..., z(k). Then, we randomly draw the elements in each cluster.
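
A sketch of this cluster-level recipe: the k counts z(1), ..., z(k) come from one QR point pushed through the conditional binomial inverse CDFs, after which the members of each cluster are drawn uniformly. The k-means clustering, the Halton generator and all sizes are illustrative assumptions.

```python
# Dimension-reduced QR bootstrap (sketch): only the k cluster counts are
# quasi-randomized, through the chain of conditional binomials; the elements
# are then drawn at random inside each cluster.
import numpy as np
from scipy.stats import binom, qmc
from sklearn.cluster import KMeans

def qr_cluster_bootstrap(X, k=10, n_resamples=40, seed=0):
    n = len(X)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    members = [np.flatnonzero(labels == c) for c in range(k)]
    sizes = np.array([len(m) for m in members])
    u = qmc.Halton(d=k - 1, scramble=True, seed=seed).random(n_resamples)
    rng = np.random.default_rng(seed)
    resamples = []
    for point in u:
        remaining, weight_left, counts = n, n, []
        for c in range(k - 1):
            p = sizes[c] / weight_left                     # P(a remaining draw falls in cluster c)
            q = float(np.clip(point[c], 1e-12, 1 - 1e-12))
            z = int(binom.ppf(q, remaining, p))
            counts.append(z)
            remaining -= z
            weight_left -= sizes[c]
        counts.append(remaining)                           # last cluster count is deterministic
        idx = np.concatenate([rng.choice(members[c], size=counts[c], replace=True)
                              for c in range(k) if counts[c] > 0])
        resamples.append(idx)                              # indices of one bootstrap resample
    return resamples
```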

38 Let's conclude
(1) why resampling is Monte-Carlo integration
(2) quasi-random numbers
(3) quasi-random numbers in strange spaces
(4) applying quasi-random numbers in resampling
(5) when does it work and when doesn't it work?

39 Experiments. In our (artificial) experiments:
● QR-randomCV is better than randomCV
● QR-bagging is better than bagging
● QR-subagging is better than subagging
● QR-Bsfd is better than Bsfd (a bootstrap)
But QR-kCV is not better than kCV: kCV already has some derandomization, since each point appears the same number of times in learning.

40 A typical example. You want to learn a relation x --> y on a huge ordered dataset. The dataset is too large for your favorite learner. A traditional solution is subagging: average 100 learnings performed on random subsets (1/20) of your dataset. We propose: use QR-sampling to average only 40 learnings. Or do you have a better solution for choosing 40 subsets of size 1/20?

41 Conclusions. Therefore:
● perhaps simpler derandomizations are enough?
● perhaps in cases like CV, in which « symmetrizing » (picking each example the same number of times) is easy, this is useless?
For bagging, subagging and bootstrap, simplifying the approach is not so simple ==> now, we use QR-bagging, QR-subagging and QR-bootstrap instead of bagging, subagging and bootstrap.

42 Further work
● Real-world experiments (in progress, for DP-applications)
● Other dimension reductions (this one involves clustering)
● Simplified derandomization methods (jittering, antithetic variables, ...)
● Random clustering for dimension reduction? (yes, we have not tested it, sorry...)

43 Low discrepancy? Random --> Discrepancy ~ sqrt(1/n). Quasi-random --> Discrepancy ~ log(n)^d / n. Koksma & Hlawka: error in Monte-Carlo integration < Discrepancy x V, where V = total variation (Hardy & Krause). (Many generalizations in Hickernell, A Generalized Discrepancy and Quadrature Error Bound, 1997.)

44 Dimension 1 ● What would you do?

50 Dimension 1 ● What would you do? ● --> Van der Corput: write n in base 2 and mirror its digits about the radix point ● n=1, n=2, n=3, ... ● in binary: n=1, n=10, n=11, n=100, n=101, n=110, ... ● x=.1, x=.01, x=.11, x=.001, x=.101, ...
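
A sketch of the rule illustrated by these three lists (write n in base p and mirror its digits across the radix point); with base 2 it reproduces the values .1, .01, .11, ... shown above.

```python
# Van der Corput sequence: write n in base p and mirror its digits
# across the radix point (n=1,2,3,... in base 2 gives .1, .01, .11, ...).
def van_der_corput(n, base=2):
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += digit / denom
    return x

print([van_der_corput(i) for i in range(1, 8)])
# [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
```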

51 Dimension 1, more general ● base p=2, but also p=3, 4, ...; however, p=13 is not very nice:

52 Dimension n ● x --> (x,x) ?

53 Dimension n ● x --> (x,x') ?

54 Dimension n: Halton ● x --> (x, x') with prime numbers (one distinct prime base per coordinate)
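
A sketch of the Halton construction named on this slide: coordinate i of point number n is the radical inverse of n in the i-th prime base. The hard-coded list of primes is only for illustration.

```python
# Halton point number n in dimension d: one distinct prime base per coordinate.
def radical_inverse(n, base):
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += digit / denom
    return x

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]

def halton(n, d):
    return [radical_inverse(n, PRIMES[i]) for i in range(d)]

points = [halton(n, 2) for n in range(1, 6)]   # first 5 points in [0,1]^2
```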

55 Dimension n+1: Hammersley ● x --> (n/N, x, x'), but --> closed sequence (N must be fixed in advance)
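
The Hammersley variant, as a sketch reusing the radical_inverse helper and PRIMES list from the previous block: the first coordinate is n/N, so the total number of points N must be fixed in advance, which is why the sequence is closed.

```python
# Hammersley set of N points in dimension d: prepend n/N to d-1 Halton
# coordinates (reuses radical_inverse and PRIMES from the sketch above).
def hammersley(N, d):
    return [[n / N] + [radical_inverse(n, PRIMES[i]) for i in range(d - 1)]
            for n in range(N)]

points = hammersley(16, 3)   # 16 points in [0,1]^3
```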

56 Dimension n: the trouble ● There are not so many small prime numbers

57 Dimension n: scrambling (when random comes back) ● Pi(p): [1,p-1] --> [1,p-1] ● Pi(p) is applied to the coordinate built with prime p

58 Dimension n: scrambling ● Pi(p): [1,p-1] --> [1,p-1] (randomly chosen) ● Pi(p) is applied to the coordinate built with prime p (there exist much more complicated schemes)
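
A sketch of this simple scrambling: a randomly chosen permutation Pi(p) of the digits 1, ..., p-1 is applied to every base-p digit of the radical inverse, with 0 kept fixed so that finite expansions stay finite. This is one simple reading of the slide; the much more complicated schemes it alludes to are not shown.

```python
# Scrambled radical inverse (sketch): apply a randomly chosen permutation
# Pi(p) of the digits 1..p-1 to every base-p digit (0 stays fixed).
import random

def scrambled_radical_inverse(n, base, perm):
    """perm: list of length base with perm[0] == 0, a permutation of 0..base-1."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += perm[digit] / denom
    return x

base = 13
perm = [0] + random.sample(range(1, base), base - 1)   # Pi(13): permute 1..12
points = [scrambled_radical_inverse(n, base, perm) for n in range(1, 20)]
```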

