CSCI B609: "Foundations of Data Science"


1 CSCI B609: "Foundations of Data Science"
Lecture 5: Dimension Reduction, Separating and Fitting Gaussians
Grigory Yaroslavtsev

2 Gaussian Annulus Theorem
Gaussian in d dimensions (N_d(0_d, 1)):
Pr[x = (z_1, \dots, z_d)] = (2\pi)^{-d/2} e^{-(z_1^2 + z_2^2 + \dots + z_d^2)/2}
Nearly all mass lies in an annulus of radius \sqrt{d} and width O(1):
Thm. For any \beta \le \sqrt{d}, all but a 3 e^{-c \beta^2} fraction of the probability mass satisfies \sqrt{d} - \beta \le \|x\|_2 \le \sqrt{d} + \beta, for some constant c.
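The concentration in the theorem is easy to see empirically. A minimal sketch, assuming NumPy is available (dimension, sample count, and annulus width below are illustrative choices, not from the lecture):

```python
import numpy as np

# Empirical check of the Gaussian Annulus Theorem: for x ~ N_d(0, I_d),
# ||x||_2 concentrates in [sqrt(d) - beta, sqrt(d) + beta] for modest beta.
rng = np.random.default_rng(0)
d, n = 1000, 10_000
x = rng.standard_normal((n, d))      # n samples from N_d(0, I_d)
norms = np.linalg.norm(x, axis=1)    # Euclidean norms ||x||_2

beta = 5.0                           # annulus half-width, O(1) vs sqrt(d) ~ 31.6
inside = np.mean(np.abs(norms - np.sqrt(d)) <= beta)
```

With d = 1000 the norms cluster tightly around \sqrt{d} \approx 31.6 even though each coordinate is a standard Gaussian, so `inside` is essentially 1.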

3 Nearest Neighbors and Random Projections
Given a database A of n points in R^d:
Preprocess A into a small data structure D.
D should answer the following queries fast: given q \in R^d, find the closest x \in A, i.e. \arg\min_{x \in A} \|q - x\|_2.
Random projections: map each x \in A to f(x), where f: R^d \to R^k.
Pick k vectors u_1, \dots, u_k i.i.d.: u_i \sim N_d(0_d, 1).
f(v) = (\langle u_1, v \rangle, \dots, \langle u_k, v \rangle)
Will show that w.h.p. \|f(v)\|_2 \approx \sqrt{k} \|v\|_2.
Return: \arg\min_{x \in A} \|f(q) - f(x)\|_2 = \arg\min_{x \in A} \|f(q - x)\|_2; since \|f(q - x)\|_2 \approx \sqrt{k} \|q - x\|_2, this w.h.p. matches \arg\min_{x \in A} \|q - x\|_2.
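The scheme above can be sketched in a few lines, assuming NumPy (the database, query, and dimensions are made up for illustration; this is not the course's reference implementation):

```python
import numpy as np

# Nearest-neighbor search via random projection: project the database A
# and the query q with f(v) = (<u_1, v>, ..., <u_k, v>), u_i ~ N_d(0, I_d),
# then answer the query in the k-dimensional sketch space.
rng = np.random.default_rng(1)
d, n, k = 500, 200, 100
A = rng.standard_normal((n, d))            # database of n points in R^d
U = rng.standard_normal((k, d))            # rows are u_1, ..., u_k

def f(v):
    return U @ v                           # f(v) = (<u_1,v>, ..., <u_k,v>)

q = A[7] + 0.1 * rng.standard_normal(d)    # query near database point 7

exact = np.argmin(np.linalg.norm(A - q, axis=1))
fA = A @ U.T                               # projected database, shape (n, k)
approx = np.argmin(np.linalg.norm(fA - f(q), axis=1))
```

Because the query is much closer to one database point than to all others, the argmin in the projected space agrees with the exact one despite the distortion.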

4 Random Projection Theorem
Pick k vectors u_1, \dots, u_k i.i.d.: u_i \sim N_d(0_d, 1), and let f(v) = (\langle u_1, v \rangle, \dots, \langle u_k, v \rangle).
Will show that w.h.p. \|f(v)\|_2 \approx \sqrt{k} \|v\|_2.
Thm. Fix v \in R^d. Then \exists c > 0 such that for all \epsilon \in (0, 1):
Pr_{u_i \sim N_d(0_d, 1)}[\, |\|f(v)\|_2 - \sqrt{k} \|v\|_2| \ge \epsilon \sqrt{k} \|v\|_2 \,] \le 3 e^{-c k \epsilon^2}
Proof sketch. By scaling we may assume \|v\|_2 = 1.
Key fact: \langle u_i, v \rangle = \sum_{j=1}^{d} u_{ij} v_j \sim N(0, \|v\|_2^2) = N(0, 1).
So f(v) \sim N_k(0_k, 1); apply the Gaussian Annulus Theorem with k in place of d (and \beta = \epsilon \sqrt{k}).

5 Nearest Neighbors and Random Projections
Thm. Fix v \in R^d. Then \exists c > 0 such that for all \epsilon \in (0, 1):
Pr_{u_i \sim N_d(0_d, 1)}[\, |\|f(v)\|_2 - \sqrt{k} \|v\|_2| \ge \epsilon \sqrt{k} \|v\|_2 \,] \le 3 e^{-c k \epsilon^2}
Return: \arg\min_{x \in A} \|f(q) - f(x)\|_2, which w.h.p. equals \arg\min_{x \in A} \|q - x\|_2.
Fix q, let v = q - x_i for x_i \in A, and let k = O(\gamma \log n / \epsilon^2):
\|f(q) - f(x_i)\|_2 = (1 \pm \epsilon) \sqrt{k} \|q - x_i\|_2 (with probability 1 - n^{-\gamma}).
Union bound: for a fixed q, all n distances to A are preserved with probability 1 - n^{-\gamma + 1}.
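The union-bound claim can be checked numerically: one projection with k proportional to \log n / \epsilon^2 preserves all n distances at once. A sketch assuming NumPy; the constant 8 in k is hand-picked, not derived from the proof:

```python
import numpy as np

# One random projection preserves all n distances ||q - x_i||_2 up to a
# (1 +/- eps) factor after rescaling by sqrt(k), when k = O(log n / eps^2).
rng = np.random.default_rng(2)
d, n, eps = 400, 100, 0.2
k = int(8 * np.log(n) / eps ** 2)          # hand-picked constant 8
U = rng.standard_normal((k, d))
A = rng.standard_normal((n, d))
q = rng.standard_normal(d)

true_dists = np.linalg.norm(A - q, axis=1)
proj_dists = np.linalg.norm((A - q) @ U.T, axis=1) / np.sqrt(k)
ratios = proj_dists / true_dists
all_preserved = np.all((ratios >= 1 - eps) & (ratios <= 1 + eps))
```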

6 Separating Gaussians
One-dimensional mixture of Gaussians: p(x) = w_1 p_1(x) + w_2 p_2(x)
E.g. modeling heights of men/women.
Parameter estimation problem: given samples from a mixture of Gaussians. Q: Estimate the means and (co)variances.
Sample origin problem: given samples from well-separated Gaussians. Q: Did they come from the same Gaussian?

7 Separating Gaussians
Gaussian in d dimensions (N_d(0_d, 1)):
Pr[x = (z_1, \dots, z_d)] = (2\pi)^{-d/2} e^{-(z_1^2 + z_2^2 + \dots + z_d^2)/2}
Nearly all mass lies in an annulus of radius \sqrt{d} and width O(1).
Almost all mass lies in the slab \{x : -c \le x_1 \le c\} for c = O(1).
Pick x ~ Gaussian and rotate coordinates so that x points along the x_1 axis.
Pick y ~ Gaussian; w.h.p. the projection of y on x is in [-c, c], so x and y are nearly orthogonal:
\|x - y\|_2^2 \approx \|x\|_2^2 + \|y\|_2^2
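The near-orthogonality of two independent high-dimensional Gaussian samples is easy to verify numerically. A sketch assuming NumPy, with an illustrative dimension:

```python
import numpy as np

# For independent x, y ~ N_d(0, I_d), the projection of y on x is O(1),
# so ||x - y||^2 is close to ||x||^2 + ||y||^2 (near-orthogonality).
rng = np.random.default_rng(4)
d = 2000
x = rng.standard_normal(d)
y = rng.standard_normal(d)

proj = np.dot(x, y) / np.linalg.norm(x)   # projection of y on the direction of x
lhs = np.linalg.norm(x - y) ** 2
rhs = np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2
rel_err = abs(lhs - rhs) / rhs
```

`proj` behaves like a single standard Gaussian coordinate (O(1)), while lhs and rhs are both about 2d, so the relative error is small.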

8 Separating Gaussians
In coordinates:
x = (\sqrt{d} \pm O(1),\ 0, 0, \dots, 0)
y = (\pm O(1),\ \sqrt{d} \pm O(1),\ 0, \dots, 0)
W.h.p.: \|x - y\|_2^2 = 2d \pm O(\sqrt{d})

9 Separating Gaussians
Two spherical unit-variance Gaussians centered at p and q; let \delta = \|p - q\|_2 (x \sim N(p, 1), y \sim N(q, 1)).
In coordinates:
x = (\sqrt{d} \pm O(1),\ 0, 0, 0, \dots, 0)
y = (\pm O(1),\ \delta \pm O(1),\ \sqrt{d} \pm O(1),\ 0, \dots, 0)
\|x - y\|_2^2 = \delta^2 + 2d \pm O(\sqrt{d})

10 Separating Gaussians
Same Gaussian: \|x - y\|_2^2 = 2d \pm O(\sqrt{d})
Different Gaussians: \|x - y\|_2^2 = \delta^2 + 2d \pm O(\sqrt{d})
Separation requires: 2d + O(\sqrt{d}) < \delta^2 + 2d - O(\sqrt{d}), i.e. O(\sqrt{d}) < \delta^2, i.e. \delta = \omega(d^{1/4}).

11 Fitting Spherical Gaussian to Data
Given samples x_1, x_2, \dots, x_n. Q: What are the parameters of the best-fit N(\mu, \sigma^2)?
Pr[x_i = (z_1, \dots, z_d)] = (2\pi\sigma^2)^{-d/2} e^{-((\mu_1 - z_1)^2 + (\mu_2 - z_2)^2 + \dots + (\mu_d - z_d)^2)/2\sigma^2} = (2\pi\sigma^2)^{-d/2} e^{-\|\mu - z\|_2^2 / 2\sigma^2}
Pr[x_1 = z_1, x_2 = z_2, \dots, x_n = z_n] = (2\pi\sigma^2)^{-dn/2} e^{-(\|\mu - z_1\|_2^2 + \|\mu - z_2\|_2^2 + \dots + \|\mu - z_n\|_2^2)/2\sigma^2}

12 Maximum Likelihood Estimator
PDF: (2\pi\sigma^2)^{-dn/2} e^{-(\|\mu - x_1\|_2^2 + \|\mu - x_2\|_2^2 + \dots + \|\mu - x_n\|_2^2)/2\sigma^2}
Claim: the MLE for \mu is \hat{\mu} = (x_1 + x_2 + \dots + x_n)/n.
Proof: take the gradient of the log-likelihood w.r.t. \mu and set it to 0. Since \nabla_\mu \|\mu - x\|_2^2 = 2(\mu - x):
2(\mu - x_1) + 2(\mu - x_2) + \dots + 2(\mu - x_n) = 0, which gives \hat{\mu} = (x_1 + \dots + x_n)/n.
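A quick numerical check of this slide's conclusion, assuming NumPy (the true mean, dimension, and sample count are illustrative):

```python
import numpy as np

# The gradient condition 2(mu - x_1) + ... + 2(mu - x_n) = 0 gives
# mu_hat = (x_1 + ... + x_n) / n, the sample mean.
rng = np.random.default_rng(6)
d, n = 10, 100_000
true_mu = np.arange(d, dtype=float)         # arbitrary true mean
X = true_mu + rng.standard_normal((n, d))   # samples from N(true_mu, I_d)

mu_hat = X.mean(axis=0)                     # MLE: the sample mean
```

With n = 100{,}000 samples, each coordinate of `mu_hat` lies within a few thousandths of the true mean.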

13 MLE for Variance
Let a = \|\mu - x_1\|_2^2 + \|\mu - x_2\|_2^2 + \dots + \|\mu - x_n\|_2^2 and \nu = 1/2\sigma^2.
PDF: e^{-a\nu} / (\int_{x \in R^d} e^{-\nu \|x\|_2^2} dx)^n
log(PDF): -a\nu - n \ln \int_{x \in R^d} e^{-\nu \|x\|_2^2} dx
Differentiate w.r.t. \nu and set the derivative to 0.

14 MLE for Variance
log(PDF): -a\nu - n \ln \int_{x \in R^d} e^{-\nu \|x\|_2^2} dx
Setting the \nu-derivative to 0: -a + n (\int_{x \in R^d} \|x\|_2^2 e^{-\nu \|x\|_2^2} dx) / (\int_{x \in R^d} e^{-\nu \|x\|_2^2} dx) = 0
Substituting y = \sqrt{\nu} x: -a + (n/\nu) (\int_{y \in R^d} \|y\|_2^2 e^{-\|y\|_2^2} dy) / (\int_{y \in R^d} e^{-\|y\|_2^2} dy) = -a + (n/\nu) \cdot (d/2) = 0
Since \nu = 1/2\sigma^2, solving gives MLE(\sigma) = \sqrt{a/nd} = the sample standard deviation (per coordinate).

