
4th IMPRS Astronomy Summer School. Prof. William H. Press, The University of Texas at Austin (CS 395T, Spring 2008; IMPRS Summer School 2009).




1 4th IMPRS Astronomy Summer School. Drawing Astrophysical Inferences from Data Sets. William H. Press, The University of Texas at Austin. Lecture 2.

2 Many, though not all, common distributions look sort-of like this: Suppose we want to summarize p(x) by a single number a, its “center”. Let’s find the value a that minimizes the mean-square distance from the “typical” value x. We already saw the beta distribution with α, β > 0 as an example on the interval [0,1]. We’ll see more examples soon.

3 Expectation notation: ⟨anything⟩ ≡ ∫ (anything) p(x) dx. Expectation is linear, etc. Minimize: Δ² ≡ ⟨(x − a)²⟩ = ⟨x² − 2ax + a²⟩ = (⟨x²⟩ − ⟨x⟩²) + (⟨x⟩ − a)². The first term is the variance Var(x), but all we care about here is that it doesn’t depend on a. (In physics this is called the “parallel axis theorem”.) The minimum is obviously at a = ⟨x⟩. (Take the derivative with respect to a and set it to zero if you like mechanical calculations.)
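
The minimization above can be checked numerically. A minimal sketch in Python (assumed, not from the slides; the sample distribution and tolerances are illustrative), verifying both the parallel-axis identity and that a = ⟨x⟩ minimizes ⟨(x − a)²⟩:

```python
import random

# Hypothetical numeric check: for samples from any distribution, the
# mean-square deviation <(x - a)^2> is minimized at a = <x>.
random.seed(1)
xs = [random.gauss(2.0, 1.5) for _ in range(100_000)]
mean = sum(xs) / len(xs)

def msd(a):
    # sample estimate of <(x - a)^2>
    return sum((x - a) ** 2 for x in xs) / len(xs)

var = msd(mean)   # the minimum value is the variance

# parallel-axis identity: <(x - a)^2> = Var(x) + (<x> - a)^2
for a in (0.0, 1.0, 3.0):
    assert abs(msd(a) - (var + (mean - a) ** 2)) < 1e-7
    assert msd(a) >= var   # the minimum is at a = <x>
```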

4 Higher moments and centered moments are defined by μᵢ ≡ ⟨xⁱ⟩ = ∫ xⁱ p(x) dx and Mᵢ ≡ ⟨(x − ⟨x⟩)ⁱ⟩ = ∫ (x − ⟨x⟩)ⁱ p(x) dx. The centered second moment M₂, the variance, is by far the most useful: M₂ ≡ Var(x) ≡ ⟨(x − ⟨x⟩)²⟩ = ⟨x²⟩ − ⟨x⟩². Its square root, the “standard deviation” σ(x) ≡ √Var(x), summarizes a distribution’s half-width (r.m.s. deviation from the mean). The third and fourth moments also have “names”. But it is generally wise to be cautious about using high moments: otherwise perfectly good distributions don’t have them at all (the integrals diverge), and (relatedly) it can take a lot of data to measure them accurately.
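
A hedged sketch of these definitions in Python (not from the slides; the distribution and tolerances are assumed): estimate raw moments μᵢ and centered moments Mᵢ from samples of an Exponential(1) distribution and check M₂ = ⟨x²⟩ − ⟨x⟩².

```python
import random

random.seed(2)
xs = [random.expovariate(1.0) for _ in range(200_000)]
n = len(xs)

def mu(i):
    # raw moment <x^i>, estimated as a sample average
    return sum(x ** i for x in xs) / n

m1 = mu(1)

def M(i):
    # centered moment <(x - <x>)^i>
    return sum((x - m1) ** i for x in xs) / n

assert abs(M(2) - (mu(2) - m1 ** 2)) < 1e-7    # algebraic identity
# Exponential(1) has mean 1 and variance 1, up to sampling error:
assert abs(m1 - 1.0) < 0.02 and abs(M(2) - 1.0) < 0.05
```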

5 Mean and variance are additive over independent random variables (note the “bar” notation, equivalent to the angle brackets ⟨·⟩). Certain combinations of higher moments are also additive; these are called semi-invariants. Skew and kurtosis are dimensionless combinations of semi-invariants. A Gaussian has all of its semi-invariants higher than I₂ equal to zero. A Poisson distribution has all of its semi-invariants equal to its mean.
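
The additivity of mean and variance over independent random variables can be illustrated with a quick simulation (a sketch under assumed distributions; `statistics` is Python’s standard-library module):

```python
import random
import statistics

# Assumed illustration: for independent X and Y, mean(X+Y) = mean(X)+mean(Y)
# exactly, and Var(X+Y) = Var(X)+Var(Y) up to sampling error (the empirical
# covariance of finite samples is not exactly zero).
random.seed(3)
N = 200_000
x = [random.gauss(1.0, 2.0) for _ in range(N)]
y = [random.expovariate(0.5) for _ in range(N)]   # mean 2, variance 4
s = [a + b for a, b in zip(x, y)]

assert abs(statistics.fmean(s) - (statistics.fmean(x) + statistics.fmean(y))) < 1e-9
assert abs(statistics.pvariance(s) - (statistics.pvariance(x) + statistics.pvariance(y))) < 0.15
```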

6 This is a good time to review some standard (i.e., frequently occurring) distributions. Normal (Gaussian): tails fall off “as fast as possible”. Cauchy (Lorentzian): tails fall off “as slowly as possible”.

7 Student-t: “bell shaped”, but you get to specify the power with which the tails fall off. Normal and Cauchy are limiting cases. (It also occurs in some statistical tests.) We’ll see uses for “heavy-tailed” distributions later. Note that σ is not (quite) the standard deviation! “Student” was actually William Sealy Gosset (1876-1937), who spent his entire career at the Guinness brewery in Dublin, where he rose to become the company’s Master Brewer.
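
The remark that σ is not quite the standard deviation can be made concrete: for Student-t with ν degrees of freedom and scale σ, the standard deviation is σ·√(ν/(ν − 2)) for ν > 2. A Monte Carlo sketch (parameters and tolerances assumed), using the standard construction t = Z/√(V/ν) with Z ~ Normal(0,1) and V ~ χ²(ν):

```python
import math
import random

random.seed(4)
nu, sigma = 5, 1.0

def t_sample():
    # t = sigma * Z / sqrt(V/nu), with V a sum of nu squared unit normals
    z = random.gauss(0.0, 1.0)
    v = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return sigma * z / math.sqrt(v / nu)

xs = [t_sample() for _ in range(200_000)]
sd = math.sqrt(sum(x * x for x in xs) / len(xs))   # mean is 0 by symmetry
# the standard deviation exceeds sigma by the factor sqrt(nu/(nu-2)):
assert abs(sd - sigma * math.sqrt(nu / (nu - 2))) < 0.05
```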

8 Common distributions on the positive real line. Exponential:

9 Lognormal:

10 Gamma distribution: Gamma and Lognormal are both commonly used as convenient 2-parameter fitting functions for “peak with tail” positive distributions. Both have parameters for peak location and width. Neither has a separate parameter for how the tail decays. Gamma: exponential decay. Lognormal: long-tailed (exponential of the square of the log).
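
The tail contrast can be checked numerically. A sketch (the moment-matched parameters here are illustrative assumptions): with both densities tuned to mean 2 and variance 2, the lognormal/gamma density ratio grows without bound far out in the tail, since exp(−(log x)²) decays much more slowly than exp(−βx).

```python
import math

def gamma_pdf(x, alpha, beta):
    # Gamma density: beta^alpha x^(alpha-1) e^(-beta x) / Gamma(alpha)
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

def lognorm_pdf(x, m, s):
    # Lognormal density with log-mean m and log-sd s
    return math.exp(-((math.log(x) - m) ** 2) / (2 * s * s)) / (x * s * math.sqrt(2 * math.pi))

# both matched (hypothetically) to mean 2, variance 2:
alpha, beta = 2.0, 1.0                               # Gamma: mean a/b, var a/b^2
s2 = math.log(1 + 2 / 4)                             # lognormal moment match
m = math.log(2) - s2 / 2
ratios = [lognorm_pdf(x, m, math.sqrt(s2)) / gamma_pdf(x, alpha, beta)
          for x in (5, 10, 20, 40)]
assert all(b > a for a, b in zip(ratios, ratios[1:]))   # ratio grows in the tail
```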

11 Chi-square distribution (we’ll use this a lot!). It has only one parameter, ν, which determines both peak location and width. ν is often an integer, called the “number of degrees of freedom” or “DF”. The independent variable is χ², not χ. It’s actually just a special case of Gamma, namely Gamma(ν/2, 1/2).
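
The Gamma(ν/2, 1/2) identification is easy to verify by comparing densities directly (a sketch; the χ² and Gamma density formulas are standard, the test points are arbitrary):

```python
import math

def chi2_pdf(x, nu):
    # chi-square density: x^(nu/2-1) e^(-x/2) / (2^(nu/2) Gamma(nu/2))
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

def gamma_pdf(x, alpha, beta):
    # Gamma density: beta^alpha x^(alpha-1) e^(-beta x) / Gamma(alpha)
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

for nu in (1, 2, 3, 10):
    for x in (0.5, 1.0, 4.0, 9.0):
        assert abs(chi2_pdf(x, nu) - gamma_pdf(x, nu / 2, 0.5)) < 1e-12
```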

12 A deviate from Normal(0,1) is called a t-value. χ²(ν) is exactly the distribution of the sum of the squares of ν t-values. Let’s prove the case ν = 1: let Y = X² with X ~ Normal(0,1). Then p_Y(y) dy = 2 p_X(x) dx (the factor 2 because both x = +√y and x = −√y map to the same y), so p_Y(y) = y^(−1/2) p_X(y^(1/2)) = (1/√(2πy)) e^(−y/2). Why this will be important: we will know the distribution of any “statistic” that is the sum of a known number of t²-values.
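
The claim can be checked by simulation. A hedged sketch (sample size and tolerances assumed): sums of ν squared unit normals should reproduce the χ²(ν) mean ν and variance 2ν.

```python
import random

random.seed(5)
nu, N = 4, 200_000
# each deviate is a sum of nu squared t-values (squared unit normals)
s = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu)) for _ in range(N)]
mean = sum(s) / N
var = sum((x - mean) ** 2 for x in s) / N
assert abs(mean - nu) < 0.05      # E[chi-square(nu)] = nu
assert abs(var - 2 * nu) < 0.2    # Var[chi-square(nu)] = 2*nu
```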

13 Characteristic function of a distribution. The characteristic function of a distribution is its Fourier transform: φ_X(t) ≡ ∫_{−∞}^{∞} e^{itx} p_X(x) dx. (Statisticians often use the notational convention that X is a random variable, x its value, and p_X(x) its distribution.) Then φ_X(0) = 1, φ′_X(0) = ∫ i x p_X(x) dx = iμ, and −φ″_X(0) = ∫ x² p_X(x) dx = σ² + μ². So the coefficients of the Taylor series expansion of the characteristic function are the (uncentered) moments.
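
The Taylor-coefficient statement can be verified on a small discrete distribution (a hypothetical three-point p(x), chosen only for illustration), using central finite differences for φ′(0) and φ″(0):

```python
import cmath

# assumed discrete distribution p(x): value -> probability
px = {0.0: 0.2, 1.0: 0.5, 3.0: 0.3}

def phi(t):
    # characteristic function: sum of e^{itx} p(x)
    return sum(cmath.exp(1j * t * x) * p for x, p in px.items())

mu = sum(x * p for x, p in px.items())        # <x>
mu2 = sum(x * x * p for x, p in px.items())   # <x^2> = sigma^2 + mu^2

h = 1e-5
d1 = (phi(h) - phi(-h)) / (2 * h)             # phi'(0)
d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h ** 2   # phi''(0)
assert abs(phi(0.0) - 1.0) < 1e-12
assert abs(d1 - 1j * mu) < 1e-6               # phi'(0) = i*mu
assert abs(-d2 - mu2) < 1e-4                  # -phi''(0) = sigma^2 + mu^2
```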

14 Properties of characteristic functions. Addition of independent r.v.’s: let S = X + Y; then p_S(s) = ∫ p_X(u) p_Y(s − u) du and φ_S(t) = φ_X(t) φ_Y(t) (the Fourier convolution theorem). Scaling law for r.v.’s: the scaling law for characteristic functions is φ_{aX}(t) = ∫ e^{itx} (1/a) p_X(x/a) dx = ∫ e^{i(at)(x/a)} p_X(x/a) d(x/a) = φ_X(at).
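
The factorization φ_S = φ_X φ_Y shows up directly in empirical characteristic functions (an assumed illustration; the distributions, sample size, and tolerances are arbitrary choices):

```python
import cmath
import random

# Assumed numeric check: for independent X and Y, the empirical
# characteristic function of S = X + Y approximately factorizes,
# phi_S(t) ~ phi_X(t) * phi_Y(t).
random.seed(6)
N = 100_000
x = [random.gauss(0.0, 1.0) for _ in range(N)]
y = [random.expovariate(1.0) for _ in range(N)]

def ecf(samples, t):
    # empirical characteristic function: sample average of e^{itx}
    return sum(cmath.exp(1j * t * v) for v in samples) / len(samples)

s = [a + b for a, b in zip(x, y)]
for t in (0.3, 1.0):
    assert abs(ecf(s, t) - ecf(x, t) * ecf(y, t)) < 0.03
```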

15 Proof of the convolution theorem: start from the Fourier transform pair φ_X(t) ≡ ∫_{−∞}^{∞} e^{itx} p_X(x) dx.

16 What’s the characteristic function of a Gaussian? Tell Mathematica that sig is positive; otherwise it gives “cases” when taking the square root of sig^2.
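
The slide’s Mathematica derivation can be cross-checked in Python by Monte Carlo against the known closed form φ(t) = exp(iμt − σ²t²/2) (parameters and tolerances here are assumed):

```python
import cmath
import random

random.seed(7)
mu, sigma, N = 1.0, 2.0, 200_000
xs = [random.gauss(mu, sigma) for _ in range(N)]

def ecf(t):
    # empirical characteristic function: average of e^{itx} over the samples
    return sum(cmath.exp(1j * t * x) for x in xs) / N

def exact(t):
    # closed-form Gaussian characteristic function
    return cmath.exp(1j * mu * t - sigma ** 2 * t ** 2 / 2)

for t in (0.2, 0.5):
    assert abs(ecf(t) - exact(t)) < 0.02
```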

17 What’s the characteristic function of χ²(ν)? Since we already proved that χ²(1) is the distribution of a single t²-value, and the χ²(ν) characteristic function is the ν-th power of the χ²(1) characteristic function, this proves that the general case χ²(ν) is the distribution of the sum of ν t²-values.
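
A Monte Carlo sketch of the same result (sample size and tolerances assumed): χ²(ν) deviates built as sums of ν squared unit normals should match the characteristic function φ(t) = (1 − 2it)^(−ν/2).

```python
import cmath
import random

random.seed(8)
nu, N = 3, 200_000
# chi-square(nu) deviates as sums of nu squared unit normals
xs = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu)) for _ in range(N)]

def ecf(t):
    # empirical characteristic function: average of e^{itx} over the samples
    return sum(cmath.exp(1j * t * x) for x in xs) / N

def exact(t):
    # chi-square characteristic function, (1 - 2it)^(-nu/2)
    return (1 - 2j * t) ** (-nu / 2)

for t in (0.2, 0.5):
    assert abs(ecf(t) - exact(t)) < 0.02
```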

18 The Cauchy distribution has an ill-defined mean and infinite variance, but it has a perfectly good characteristic function. Matlab and Mathematica both sadly fail at computing the characteristic function of the Cauchy distribution, but you can use fancier methods* and get φ_Cauchy(t) = e^{iμt − σ|t|}. Note that it is non-analytic at t = 0. *If t > 0, close the contour in the upper half-plane with a big semicircle, which adds nothing. So the integral is just the residue at the pole (x − μ)/σ = i, which gives exp(−σt). Similarly, close the contour in the lower half-plane for t < 0, giving exp(σt). So the answer is exp(−σ|t|). The factor exp(iμt) comes from the change of variable from x to x − μ.
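
Although the Cauchy moments diverge, its empirical characteristic function converges perfectly well, since each term e^{itx} has modulus 1. A hedged Monte Carlo check against exp(iμt − σ|t|) (parameters and tolerances assumed), sampling by the inverse CDF x = μ + σ·tan(π(u − 1/2)):

```python
import cmath
import math
import random

random.seed(9)
mu, sigma, N = 1.0, 2.0, 200_000
# inverse-CDF sampling of Cauchy(mu, sigma)
xs = [mu + sigma * math.tan(math.pi * (random.random() - 0.5)) for _ in range(N)]

def ecf(t):
    # empirical characteristic function: average of e^{itx} over the samples
    return sum(cmath.exp(1j * t * x) for x in xs) / N

def exact(t):
    # Cauchy characteristic function, exp(i*mu*t - sigma*|t|)
    return cmath.exp(1j * mu * t - sigma * abs(t))

for t in (-0.5, 0.3):
    assert abs(ecf(t) - exact(t)) < 0.02
```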

