1
Recent Progress On Sampling Problem
Yin Tat Lee (MSR/UW), Santosh Vempala (Gatech)
2
My Dream: tell the complexity of a convex problem just by looking at its formulation.
Example, Minimum Cost Flow Problem: this is a linear program in which each row has two non-zeros. It can be solved in Õ(m√n) time [LS14], for a graph with m edges and n vertices (improving the previous best bound).
3
My Dream: tell the complexity of a convex problem just by looking at its formulation.
Example, Submodular Minimization: minimize f(S) over S ⊆ [n], where f satisfies diminishing returns, i.e. f(S∪{e}) − f(S) ≤ f(T∪{e}) − f(T) for all T ⊆ S and e ∉ S.
f can be extended to a convex function on [0,1]^n, and a subgradient of this extension can be computed in n^2 time.
It can be solved in Õ(n^3) [LSW15] (previous: Õ(n^5)).
Fundamental in combinatorial optimization; worth ≥ 2 Fulkerson prizes.
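As a concrete illustration of the extension step, here is a minimal Python sketch of the standard Lovász-extension subgradient (sort the coordinates, then take marginal gains along the resulting chain). The toy function f and the set-based oracle interface are illustrative assumptions, not taken from the talk.

```python
import numpy as np

# Minimal sketch of a Lovász-extension subgradient for a submodular f on {0,1}^n:
# sort the coordinates of x in decreasing order and take marginal gains along that chain.
# `f` takes a Python set of indices; this uses n+1 evaluations of f.

def lovasz_subgradient(f, x):
    order = np.argsort(-np.asarray(x))
    g = np.zeros(len(x))
    S = set()
    prev = f(S)
    for i in order:
        S = S | {int(i)}
        cur = f(S)
        g[int(i)] = cur - prev
        prev = cur
    return g

# toy example: f(S) = min(|S|, 2) is submodular
print(lovasz_subgradient(lambda S: min(len(S), 2), [0.3, 0.9, 0.1, 0.5]))
```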
4
Algorithmic Convex Geometry
To describe a formulation, we need some operations. Given a convex set K, we have the following operations:
Membership(x): check whether x ∈ K.
Separation(x): assert x ∈ K, or find a hyperplane separating x and K.
Width(c): compute min_{x∈K} cᵀx.
Optimize(c): compute argmin_{x∈K} cᵀx.
Sample(g): sample according to g(x)·1_K(x) (assume g is logconcave).
Integrate(g): compute ∫_K g(x) dx (assume g is logconcave).
Theorem: they are all equivalent up to polynomial-time algorithms.
One of the major sources of polynomial-time algorithms!
5
Algorithmic Convex Geometry
Traditionally viewed as impractical; now we have an efficient version of the ellipsoid method.
Why these operations? For any convex f, define the dual (conjugate) f*(θ) = max_x (θᵀx − f(x)), and let f_K be the convex indicator of K (0 on K, +∞ outside).
Membership: 1_K(x)
Width: f_K*(θ)
Separation: ∂f_K(x)
Optimization: ∂f_K*(θ)  (convex optimization)
Integration: ∫_K g(x) dx
Sample: ~ e^(−f)·1_K  (today's focus)
Progress: we are getting the tight polynomial equivalence between the first four.
6
Problem: Sampling
Input: a convex set K.
Output: a point sampled from the uniform distribution on K.
Generalized Problem:
Input: a logconcave distribution p.
Output: a point sampled according to p.
Why? Useful for optimization, integration/counting, learning, rounding. It is the best known way to minimize a convex function given only a noisy value oracle, and the only known way to compute the volume of a convex set.
7
Non-trivial application: Convex Bandit
Game: for each round t = 1, 2, …, T:
The adversary selects a convex loss function ℓ_t.
The player chooses (possibly randomly) x_t from the unit ball in n dimensions, based on past observations.
The player receives the loss/observation ℓ_t(x_t) ∈ [0,1]. Nothing else about ℓ_t is revealed!
Performance is measured by the regret: R_T = Σ_t ℓ_t(x_t) − min_x Σ_t ℓ_t(x).
There is a good fixed action, but we only learn one point per round, and the adversary can give confusing information!
Sébastien Bubeck, Ronen Eldan
The gold standard is Õ(√T) regret; namely, √T is better than T^(2/3).
8
Non-trivial application: Convex Bandit
Same game as before: the adversary picks a convex loss ℓ_t, the player picks x_t from the unit ball and observes only ℓ_t(x_t) ∈ [0,1].
After a decade of research, we obtain regret R_T = poly(n)·√T. (The first algorithm that runs in polynomial time and achieves √T regret.)
Sébastien Bubeck, Ronen Eldan
The gold standard is Õ(√T) regret; namely, √T is better than T^(2/3).
9
How to Input the Set
Oracle Setting:
A membership oracle: answers YES/NO to "x ∈ K".
A ball x_0 + rB such that x_0 + rB ⊆ K ⊆ x_0 + poly(n)·rB.
Explicit Setting: K is given explicitly, e.g. polytopes, spectrahedra, …
In this talk, we focus on the polytope {Ax ≥ b} (m = # constraints).
10
Outline
Oracle Setting:
  Introduce the ball walk
  The KLS conjecture and its related conjectures
  Main result
Explicit Setting (the originally promised talk):
  Introduce the geodesic walk
  Bound the number of iterations
  Bound the cost per iteration
11
Sampling Problem
Input: a convex set K given by a membership oracle.
Output: a point sampled from the uniform distribution on K.
Conjectured lower bound: n^2 oracle calls.
Generalized Problem: given a logconcave distribution p, sample x from p.
12
Conjectured Optimal Algorithm: Ball Walk
At x, pick a random y from x + δB_n; if y is in K, go to y; otherwise, sample again.
(This walk may get trapped on one side if the set is not convex.)
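A minimal Python sketch of the ball walk as described above, assuming a membership oracle `in_K`; the step size and iteration count are illustrative, not tuned values from the talk.

```python
import numpy as np

def ball_walk(in_K, x0, delta, steps, rng=None):
    """Ball walk sketch: from x, propose y uniform in x + delta*B^n,
    move only if y lies in K (otherwise stay and resample)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    n = x.size
    for _ in range(steps):
        # uniform point in the ball of radius delta: random direction, radius ~ U^(1/n)
        d = rng.standard_normal(n)
        d *= delta * rng.random() ** (1.0 / n) / np.linalg.norm(d)
        y = x + d
        if in_K(y):          # membership oracle
            x = y
    return x

# toy example: approximate uniform sampling from the hypercube [0,1]^n
if __name__ == "__main__":
    n = 10
    in_cube = lambda z: bool(np.all((z >= 0) & (z <= 1)))
    print(ball_walk(in_cube, x0=np.full(n, 0.5), delta=0.1, steps=5000))
```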
13
Isoperimetric constant
For any set K, we define the isoperimetric constant ψ_K by ψ_K = min_S Area(∂S) / min(vol(S), vol(K∖S)).
Theorem: given a random point in K, we can generate another in Õ(n/(δ^2 ψ_K^2) · log(1/ε)) iterations of the ball walk, where δ is the step size.
The larger ψ_K or δ, the faster the mixing; but δ cannot be too large, otherwise the failure (rejection) probability is ~1.
ψ large: hard to cut the set; ψ small: easy to cut the set.
14
Isoperimetric constant of Convex Set
Note that ψ_K is not affine invariant and can be arbitrarily small (e.g., a long thin body of length L has ψ_K ≈ 1/L). However, we can renormalize K so that Cov(K) = I.
Definition: K is isotropic if it has mean 0 and Cov(K) = I.
Theorem: if δ ≲ 1/√n, the ball walk stays inside the set with constant probability.
Theorem: given a random point in an isotropic K, we can generate another in Õ(n^2/ψ_K^2 · log(1/ε)) iterations.
To make the body isotropic, we can sample the body to estimate its covariance, as sketched below.
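A minimal sketch of the renormalization step: estimate the mean and covariance from (approximate) samples of K, then map K into near-isotropic position. The sample array and its provenance (some sampler for K) are assumed inputs.

```python
import numpy as np

def isotropic_transform(samples):
    """Given an (N, n) array of (approximate) samples from K with N >> n,
    return (mu, T) so that x_iso = T @ (x - mu) has mean ~0 and covariance ~I."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)     # n x n empirical covariance
    L = np.linalg.cholesky(cov)             # cov = L L^T
    T = np.linalg.inv(L)                    # then T cov T^T = I
    return mu, T
```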
15
KLS Conjecture
Kannan–Lovász–Simonovits Conjecture: for any isotropic convex K, ψ_K = Ω(1).
If this is true, the ball walk takes Õ(n^2) iterations for isotropic K (matching the believed information-theoretic lower bound).
To get the "tight" reduction from membership to sampling, it suffices to prove the KLS conjecture.
16
KLS conjecture and its related conjectures
Slicing Conjecture: any unit-volume convex set K has a hyperplane slice of volume Ω(1).
Thin-Shell Conjecture: for isotropic convex K, E((‖x‖ − √n)^2) = O(1).
Generalized Lévy concentration: for a logconcave distribution p and 1-Lipschitz f with Ef = 0, P(|f(x)| > t) ≤ exp(−Ω(t)).
Essentially, these conjectures ask whether all convex sets look like ellipsoids.
17
Main Result
What if we cut the body by spheres only? (Restricting to sphere cuts gives ψ_K^sphere ≥ ψ_K.)
[Lovász–Simonovits 93]: ψ = Ω(n^(−1/2))
[Klartag 2006]: ψ = Ω(n^(−1/2) log^(1/2) n)
[Fleury, Guédon, Paouris 2006]: ψ = Ω(n^(−1/2) log^(1/6) n (log log n)^(−2))
[Klartag 2006]: ψ = Ω(n^(−0.4))
[Fleury 2010], [Guédon, Milman 2010]: further improvements
[Eldan 2012]: ψ = Ω̃(n^(−1/3))
[Lee, Vempala 2016]: ψ = Ω(n^(−1/4))
In particular, we get Õ(n^2.5) mixing for the ball walk.
Do you know a better way to bound the mixing time of the ball walk?
18
Outline
Oracle Setting:
  Introduce the ball walk
  The KLS conjecture and its related conjectures
  Main result
Explicit Setting:
  Introduce the geodesic walk
  Bound the number of iterations
  Bound the cost per iteration
19
Problem: Sampling
Input: a polytope {Ax ≥ b} with m constraints and n variables.
Output: a point sampled from the uniform distribution on K.

                     | Iterations | Time per iteration
KN09  Dikin walk     | mn         | m·n^1.38
LV16  Ball walk      | n^2.5      | mn
LV16  Geodesic walk  | m·n^0.75   | m·n^1.38

The geodesic walk is the first sub-quadratic algorithm; m·n^1.38 is the cost of matrix inversion.
20
How does nature mix particles?
Brownian motion. It works for sampling on ℝ^n; however, a convex set has a boundary.
Option 1: reflect the motion when it hits the boundary. However, this needs tiny steps for the discretization.
21
How does nature mix particles?
Brownian motion. It works for sampling on ℝ^n; however, a convex set has a boundary.
Option 2: remove the boundary by blowing up the set. However, this requires explicit polytopes.
22
Blowing Up?
Original polytope: the uniform distribution on [0,1].
After blow-up: a non-uniform distribution on the real line.
The distortion makes the hard constraint become "soft".
23
Enter Riemannian manifolds
An n-dimensional manifold M is an n-dimensional surface. Each point p has a tangent space T_pM of dimension n, the local linear approximation of M at p; tangents of curves in M lie in T_pM. The inner product in T_pM depends on p: ⟨u, v⟩_p. Informally, you can think of it as assigning a unit ball to every point.
24
Enter Riemannian manifolds
Each point p has a linear tangent space T_pM. The inner product in T_pM depends on p: ⟨u, v⟩_p.
The length of a curve c: [0,1] → M is L(c) = ∫_0^1 ‖c′(t)‖_{c(t)} dt.
The distance d(x, y) is the infimum of L(c) over all paths c in M between x and y.
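A small numerical sketch of the curve-length formula above, assuming a user-supplied `metric(p)` that returns the positive-definite matrix defining ⟨·,·⟩_p; the discretization is a plain Riemann sum.

```python
import numpy as np

def curve_length(c, metric, num=1000):
    """Approximate L(c) = integral of ||c'(t)||_{c(t)} dt for a curve c: [0,1] -> R^n
    under a position-dependent metric, by summing local norms of the velocity."""
    t = np.linspace(0.0, 1.0, num + 1)
    pts = np.array([c(s) for s in t])
    total = 0.0
    for k in range(num):
        mid = 0.5 * (pts[k] + pts[k + 1])            # midpoint of the segment
        v = (pts[k + 1] - pts[k]) / (t[k + 1] - t[k])  # finite-difference velocity
        G = metric(mid)                               # metric matrix at the midpoint
        total += np.sqrt(v @ G @ v) * (t[k + 1] - t[k])
    return total
```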
25
βGeneralizedβ Ball Walk
At x, pick a random y from D_x, where D_x = {y : d(x, y) ≤ 1}.
26
Hessian manifold: a subset of ℝ^n with the inner product ⟨u, v⟩_p = uᵀ ∇²φ(p) v.
For the polytope {a_iᵀx ≥ b_i for all i}, we use the log barrier function φ(x) = Σ_{i=1}^m log(1/s_i(x)), where s_i(x) = a_iᵀx − b_i is the distance (slack) from x to constraint i.
φ blows up when x is close to the boundary, so our walk takes smaller steps near the boundary.
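A minimal sketch of the log-barrier metric for {Ax ≥ b}: the Hessian of the barrier is Aᵀ S(x)⁻² A with S(x) = diag(s_i(x)). The matrices A, b are assumed inputs, and x must be strictly inside the polytope.

```python
import numpy as np

def barrier_hessian(A, b, x):
    """Hessian of phi(x) = -sum_i log(a_i^T x - b_i), i.e. A^T S^{-2} A."""
    s = A @ x - b                      # slacks s_i(x); must all be > 0
    return A.T @ np.diag(1.0 / s**2) @ A

def local_norm(A, b, x, v):
    """Length of direction v in the metric at x: sqrt(v^T H(x) v)."""
    H = barrier_hessian(A, b, x)
    return float(np.sqrt(v @ H @ v))
```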
27
Suggested algorithm: at x, pick a random y from D_x, where D_x = {y : d(x, y) ≤ 1} is induced by the log barrier (D_x is called the Dikin ellipsoid).
It doesn't work! The walk converges to the boundary, since the volume of the "boundary" is +∞ in this metric.
[Figure: the original polytope, the corresponding Hessian manifold, and the induced random walk on the real line.]
28
Getting Uniform Distribution
Lemma: if p(x→y) = p(y→x), then the stationary distribution is uniform.
To make a Markov chain p symmetric, we use p̂(x→y) = min(p(x→y), p(y→x)) for x ≠ y, keeping the leftover probability mass at x.
To implement it, we sample y according to p(x→y); if p(x→y) < p(y→x), we go to y; otherwise we go to y with probability p(y→x)/p(x→y) and stay at x otherwise. A minimal sketch follows.
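A minimal sketch of this symmetrization (Metropolis) filter; `propose` and `density` are placeholder hooks for whatever proposal p(x→y) the walk uses, not names from the talk.

```python
import numpy as np

def metropolis_step(x, propose, density, rng=None):
    """One filtered step: draw y ~ p(x -> y), accept with prob min(1, p(y->x)/p(x->y))."""
    rng = np.random.default_rng() if rng is None else rng
    y = propose(x)
    ratio = density(y, x) / density(x, y)   # p(y -> x) / p(x -> y)
    if rng.random() < min(1.0, ratio):
        return y                            # accept
    return x                                # reject: stay at x
```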
29
Dikin Walk
At x, pick a random y from D_x; if x ∉ D_y, reject y; else, accept y with probability min(1, vol(D_x)/vol(D_y)).
[KN09] proved it takes Õ(mn) steps — better than the previous best Õ(n^2.5) for the oracle setting.
[Figure copied from KN09.]
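A minimal sketch of one Dikin-walk step along the lines described above, using the log-barrier Hessian and the determinant-ratio acceptance rule (vol(D_x)/vol(D_y) = sqrt(det H(y)/det H(x))). The radius r is an illustrative parameter, not a tuned constant from [KN09].

```python
import numpy as np

def hessian(A, b, x):
    s = A @ x - b                              # slacks, must be > 0
    return A.T @ np.diag(1.0 / s**2) @ A

def sample_dikin_ellipsoid(A, b, x, r, rng):
    """Uniform point in D_x = {y : (y-x)^T H(x) (y-x) <= r^2}."""
    n = x.size
    u = rng.standard_normal(n)
    u *= rng.random() ** (1.0 / n) / np.linalg.norm(u)   # uniform in the unit ball
    L = np.linalg.cholesky(hessian(A, b, x))             # H = L L^T
    return x + r * np.linalg.solve(L.T, u)               # affine image of the ball

def dikin_step(A, b, x, r=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    y = sample_dikin_ellipsoid(A, b, x, r, rng)
    if np.any(A @ y - b <= 0):                 # left the polytope: reject
        return x
    d = x - y
    if d @ hessian(A, b, y) @ d > r**2:        # x not in D_y: reject
        return x
    # accept with probability min(1, sqrt(det H(y) / det H(x)))
    _, logdet_x = np.linalg.slogdet(hessian(A, b, x))
    _, logdet_y = np.linalg.slogdet(hessian(A, b, y))
    if np.log(rng.random()) < 0.5 * (logdet_y - logdet_x):
        return y
    return x
```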
30
Dikin Walk and its Limitation
At x, pick a random y from D_x; if x ∉ D_y, reject y; else, accept y with probability min(1, vol(D_x)/vol(D_y)).
The Dikin ellipsoid is fully contained in K. Idea: pick the next step y from a blown-up Dikin ellipsoid; we can afford to blow up by ~√(n/log n), and then y ∈ K with high probability.
But in high dimension, the volume of D_x is not that smooth (the worst case is [0,1]^n), so any larger step makes the acceptance probability exponentially small!
[0,1]^n is the worst case for the ball walk, hit-and-run, and the Dikin walk.
31
Going back to Brownian Motion
At x, pick a random y from D_x; if x ∉ D_y, reject y; else, accept y with probability min(1, vol(D_x)/vol(D_y)).
The walk is not symmetric in the "space": it has a tendency to move towards the center. Taking the step size to 0, the Dikin walk becomes a stochastic differential equation dx_t = μ(x_t) dt + σ(x_t) dW_t, where σ(x_t) = (φ″(x_t))^(−1/2) and μ(x_t) is the drift towards the center.
[Figure: the original polytope and the corresponding Hessian manifold.]
32
What is the drift? Fokker-Planck equation
The probability density of the SDE dx_t = μ(x_t) dt + σ(x_t) dW_t evolves according to ∂p/∂t (x,t) = −∂/∂x (μ(x) p(x,t)) + (1/2) ∂²/∂x² (σ²(x) p(x,t)).
To make the stationary distribution constant, we need −∂/∂x (μ(x)) + (1/2) ∂²/∂x² (σ²(x)) = 0, i.e. μ(x) = (1/2)(σ²(x))′.
Hence, we have μ(x) = σ(x)σ′(x).
33
A New Walk
A new walk: x_{t+h} = x_t + h·μ(x_t) + σ(x_t)·w, with w ~ N(0, hI).
It doesn't make sense.
34
Exponential map
The exponential map exp_p : T_pM → M is defined by exp_p(v) = γ_v(1), where γ_v is the unique geodesic (locally shortest path) from p with initial velocity v.
35
Geodesic Walk
A new walk: x_{t+h} = exp_{x_t}(h/2 · μ(x_t) + σ(x_t)·w), with w ~ N(0, hI).
However, this walk has discretization error, so we apply a Metropolis filter afterwards. Since our walk is complicated, the filter is super complicated.
Any way to avoid using the filter?
36
Outline
Oracle Setting:
  Introduce the ball walk
  The KLS conjecture and its related conjectures
  Main result
Explicit Setting (the originally promised talk):
  Introduce the geodesic walk
  Bound the number of iterations
  Bound the cost per iteration
37
Geodesic Walk
A new walk: x_{t+h} = exp_{x_t}(h/2 · μ(x_t) + w), with w ~ N(0, hI).
The geodesic is better than a "straight line": it extends infinitely, and it gives a massive cancellation.
38
Key Lemma 1: Provable Long Geodesic
A straight line is only defined until it hits the boundary; a geodesic is defined for all time.
Thm [LV16]: for the manifold induced by the log barrier, a random geodesic γ starting from x satisfies a_iᵀ γ′(t) ≤ O(n^(−1/4)) · (a_iᵀ x − b_i) for all i and 0 ≤ t ≤ Õ(n^(1/4)). Namely, the geodesic is well behaved for a long time.
Remark: if the central path in interior point methods behaved like this, we would get a MaxFlow algorithm with running-time exponent 5/4!
39
Key Lemma 2: Massive Cancellation
Consider an SDE on the 1-dimensional real line (NOT a manifold): dx_t = μ(x_t) dt + σ(x_t) dW_t.
How good is the "Euler method", namely x_0 + h·μ(x_0) + √h·σ(x_0)·ξ?
By "Taylor" expansion, we have x_h = x_0 + h·μ(x_0) + √h·σ(x_0)·ξ + (h/2)·σ′(x_0)σ(x_0)(ξ² − 1) + o(h).
If σ′(x_0) ≠ 0, the error is O(h). If σ′(x_0) = 0, the error is O(h^1.5).
For the geodesic walk, σ′(x_0) = 0 (the Christoffel symbols vanish in normal coordinates).
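A minimal sketch contrasting one Euler–Maruyama step with a step that includes the (ξ² − 1) correction term above (the Milstein scheme); μ, σ and the derivative σ′ are assumed user-supplied functions.

```python
import numpy as np

# One discretization step for dx = mu(x) dt + sigma(x) dW with xi ~ N(0, 1).

def euler_step(x, h, mu, sigma, xi):
    return x + h * mu(x) + np.sqrt(h) * sigma(x) * xi

def milstein_step(x, h, mu, sigma, dsigma, xi):
    # extra (h/2) * sigma'(x) * sigma(x) * (xi^2 - 1) term reduces the local error
    return (x + h * mu(x) + np.sqrt(h) * sigma(x) * xi
            + 0.5 * h * dsigma(x) * sigma(x) * (xi**2 - 1))
```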
40
Convergence Theorem
Thm [LV16]: for the log barrier, the geodesic walk mixes in Õ(m·n^0.75) steps.
Thm [LV16]: for the log barrier on [0,1]^n, it mixes in Õ(n^(1/3)) steps. (The best bound for the ball walk, hit-and-run, and the Dikin walk on [0,1]^n is Õ(n^2) steps.)
Our walk is similar to the Milstein method. Are higher-order methods for SDEs used in MCMC?
41
Outline
Oracle Setting:
  Introduce the ball walk
  The KLS conjecture and its related conjectures
  Main result
Explicit Setting (the originally promised talk):
  Introduce the geodesic walk
  Bound the number of iterations
  Bound the cost per iteration
42
How to implement the algorithm
Can we simply do a Taylor expansion? In high dimension, it may take n^k time to compute the k-th derivatives.
The step: in the tangent plane at x, pick w ∼ N_x(0, I), i.e. a standard Gaussian in the local norm ‖·‖_x. Compute y = exp_x(h/2 · μ(x) + √h·w). Accept with probability min(1, p(y→x)/p(x→y)).
How do we compute the geodesic and the rejection probability? We need high accuracy for the rejection probability, due to the "directedness" of the walk.
The geodesic is given by the geodesic equation; the transition probability is given by a Jacobi field.
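A high-level sketch of one geodesic-walk step with the accept/reject structure above; `exp_map` and `transition_density` are placeholders for the geodesic-equation solver and the Jacobi-field computation described on these slides, not concrete implementations.

```python
import numpy as np

def geodesic_walk_step(x, h, mu, exp_map, transition_density, rng=None):
    """One step: move along the geodesic with a drift term, then Metropolis-filter."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    # w ~ N(0, I) in an orthonormal frame of the local metric (identified with R^n here)
    w = rng.standard_normal(n)
    y = exp_map(x, 0.5 * h * mu(x) + np.sqrt(h) * w)
    accept = min(1.0, transition_density(y, x) / transition_density(x, y))
    return y if rng.random() < accept else x
```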
43
Collocation Method for ODE A weakly polynomial time algorithm for some ODEs
Consider the ODE y′ = f(t, y(t)) with y(0) = y_0. Given a degree-d polynomial q and distinct points t_1, t_2, …, t_d, let T(q) be the unique degree-d polynomial p such that p′(t) = f(t, q(t)) at t = t_1, t_2, …, t_d and p(0) = q(0).
Lemma [LV16]: T is well defined. If the t_i are Chebyshev points on [0,1], then Lip(T) = O(Lip(f)).
Thm [LV16]: if Lip(f) ≤ 0.001, we can find a fixed point of T efficiently.
44
Collocation Method for ODEs: a weakly polynomial time algorithm for some ODEs.
Consider the ODE y′ = f(t, y(t)) with y(0) = y_0.
Thm [LV16]: suppose that Lip(f) ≤ 0.001 and there is a degree-d polynomial q such that ‖q′ − y′‖ ≤ ε. Then we can find a ȳ such that ‖ȳ − y‖_1 = O(ε), in time O(d log²(d/ε)) with O(d log(d/ε)) evaluations of f.
Remark: no need to compute f′! In general, the runtime is Õ(d · Lip(f)^O(1)) instead.
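A minimal sketch of the collocation/fixed-point idea on a toy ODE, using Chebyshev nodes on [0,1] and simple Picard iteration; the node and iteration counts are illustrative, and this is not the optimized method of [LV16].

```python
import numpy as np

def collocation_solve(f, y0, degree=16, iters=30):
    """Solve y' = f(t, y), y(0) = y0 on [0,1] by iterating
    y <- y0 + integral of the polynomial interpolating f(t, y(t)) at Chebyshev nodes."""
    k = np.arange(1, degree + 1)
    t = 0.5 * (1 - np.cos((2 * k - 1) * np.pi / (2 * degree)))   # Chebyshev nodes in (0,1)
    y = np.full_like(t, float(y0))                               # initial guess: constant y0
    for _ in range(iters):
        q = np.polynomial.Polynomial.fit(t, f(t, y), degree - 1) # interpolate f(t, y(t))
        p = q.integ()                                            # antiderivative
        y = y0 + p(t) - p(0)                                     # enforce y(0) = y0
    return t, y

# sanity check on y' = y, y(0) = 1 (exact solution e^t); f must accept array inputs
t, y = collocation_solve(lambda t, y: y, 1.0)
print(np.max(np.abs(y - np.exp(t))))   # should be close to machine precision
```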
45
How can I bound the higher-order derivatives?
For a function of one variable, we can estimate the k-th derivatives easily. Idea: reduce estimating the derivatives of a general function to the one-variable case.
In general, we write ‖F‖_x ≤ r to mean that the higher derivatives D^k F(x) are controlled in terms of r (and a base bound M_0).
Calculus rule: such bounds compose, i.e. a bound ‖F‖_x ≤ r together with a bound on G at F(x) yields a bound on G∘F at x.
46
Implementation Theorem
Using the trick above, we show that the geodesic can be approximated by a polynomial of degree Õ(1); hence, the collocation method finds it in Õ(1) steps.
Thm [LV16]: if h ≤ n^(−1/2), one step of the geodesic walk can be implemented in matrix multiplication time. For the hypercube, h ≤ Õ(1) suffices.
47
Questions
We have no background in numerical ODEs/SDEs or Riemannian geometry, so the running time should be easy to improve.
How can we avoid the filtering step?
Is there a way to tell whether a walk has mixed or not? (I.e., even if we cannot prove KLS, the algorithm could still stop early.)
Are higher-order methods for SDEs useful in MCMC?
Any other suggestions/heuristics for sampling from a convex set?