Presentation on theme: "Measuring Correlations" (Moritz Backes). Presentation transcript:

1 Moritz Backes Measuring Correlations

2 Motivation e.g.: To use two-dimensional sidebands in data-driven background estimation techniques, the control samples must be free of correlations. Correlations are usually measured with the correlation coefficient. BUT: the correlation coefficient catches only linear correlations. THUS: independent variables have correlation 0, BUT the converse is not true. How can we measure higher-order correlations? A lecture by Kyle Cranmer in the academic training lecture programme in February (http://indico.cern.ch/conferenceDisplay.py?confId=48425) suggested Mutual Information as a quantity to measure non-linear correlations. I tried to examine possible ways/quantities to measure correlations (linear, non-linear) and to compare their properties. 27/02/09 Moritz Backes 2

3 Correlation Coefficient Quantifies the linear dependence between two variables. Takes values between -1 and 1. Independent variables have correlation 0, but the converse is not true. Example scatter plots with correlation coefficients: -0.028, 0.000, 0.002, 0.999. 27/02/09 Moritz Backes 3
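The "converse is not true" case can be reproduced with a short toy (my own illustrative sketch, not code from the slides): y = 4x(1-x) is fully determined by x, yet its linear correlation with x is essentially 0, because the parabola is symmetric about x = 0.5.

```python
import numpy as np

# Toy dataset: x uniform on [0, 1], y a deterministic (quadratic)
# function of x with zero *linear* correlation.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 100_000)
y = 4.0 * x * (1.0 - x)

r = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
```

Despite the perfect functional dependence, `r` comes out consistent with zero, which is exactly why the correlation coefficient alone is not enough.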

4 Correlation Ratio and Profile Plots Quantifies any functional dependence between two variables. Takes values between 0 and 1. Closely related to profile plots: it is built from the conditional expectation of Y given X. Example: the function y = 4x(1-x) has zero linear correlation, but a correlation ratio of 1 (y is a deterministic function of x). Obviously not symmetric in X and Y. 27/02/09 Moritz Backes 4
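A binned estimate of the correlation ratio follows directly from the profile-plot picture (a sketch with my own naming, not the slides' code): take the bin means of Y in slices of X and compare their spread to the total spread of Y.

```python
import numpy as np

def correlation_ratio(x, y, n_bins=50):
    """Binned estimate of the correlation ratio eta(Y|X):
    eta^2 = Var(E[Y|X]) / Var(Y), using profile-plot bin means."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    y_mean = y.mean()
    between = 0.0                      # variance of the bin means
    for b in range(n_bins):
        sel = y[idx == b]
        if sel.size:
            between += sel.size * (sel.mean() - y_mean) ** 2
    return float(np.sqrt(between / (y.size * y.var())))

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 100_000)
y = 4.0 * x * (1.0 - x)               # zero linear correlation

eta_yx = correlation_ratio(x, y)      # near 1: y is a function of x
eta_xy = correlation_ratio(y, x)      # near 0: E[X|Y] is constant
```

The pair `eta_yx` vs `eta_xy` also shows the asymmetry mentioned on the slide: y is a function of x, but x is not a function of y.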

5 Shape Comparison of PDFs using Kolmogorov Test Compare the two-dimensional PDF Pxy(X,Y) with the product PDF of the two projections Px(X)*Py(Y). Check the two distributions for shape compatibility using a Kolmogorov-Smirnov test: compatible shapes mean no correlation. Does not seem to be 100% reliable; the KS test depends on binning, statistics, etc. 27/02/09 Moritz Backes 5
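A minimal numpy sketch of the idea (my own illustration, with two simplifications relative to the slide: the product PDF Px*Py is emulated by shuffling y to break correlations, and the KS comparison is done on a 1D projection, x + y, rather than on the full 2D shapes):

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a - F_b|
    over the empirical CDFs of the two samples."""
    pts = np.concatenate([a, b])
    cdf_a = np.searchsorted(np.sort(a), pts, side="right") / a.size
    cdf_b = np.searchsorted(np.sort(b), pts, side="right") / b.size
    return float(np.abs(cdf_a - cdf_b).max())

rng = np.random.default_rng(2)
x = rng.normal(size=20_000)
y = x + 0.5 * rng.normal(size=20_000)     # correlated toy pair
y_shuf = rng.permutation(y)               # emulates Px * Py

# Correlated data vs the decorrelated product: shapes differ.
d_corr = ks_statistic(x + y, x + y_shuf)
# Two decorrelated versions against each other: shapes agree.
d_null = ks_statistic(x + y_shuf, x + rng.permutation(y))
```

`d_corr` comes out much larger than `d_null`, so the projection already flags the correlation; as the slide cautions, the result of such comparisons depends on statistics (and, in binned variants, on binning).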

6 Mutual Information A lecture by Kyle Cranmer in the academic training lecture programme in February (http://indico.cern.ch/conferenceDisplay.py?confId=48425) briefly mentioned Mutual Information as a quantity to measure non-linear correlations. The quantity originates from information theory (it is closely related to entropy). Thus: Mutual Information is the reduction in the uncertainty of X due to the knowledge of Y. 27/02/09 Moritz Backes 6
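As an illustration (my own sketch; the slide gives only the verbal definition), a plug-in histogram estimate of I(X;Y) = sum over bins of p(x,y) ln[ p(x,y) / (p(x) p(y)) ], in nats:

```python
import numpy as np

def mutual_information(x, y, n_bins=40):
    """Plug-in (histogram) estimate of I(X;Y) in nats.
    Finite samples bias it upward, and the value depends on
    the binning -- the caveats discussed on the next slides."""
    pxy, _, _ = np.histogram2d(x, y, bins=n_bins)
    pxy /= pxy.sum()                       # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)    # marginal of X
    py = pxy.sum(axis=0, keepdims=True)    # marginal of Y
    nz = pxy > 0                           # convention: 0*log 0 = 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(3)
n = 200_000
x = rng.normal(size=n)
mi_dep = mutual_information(x, x + rng.normal(size=n))  # rho = 1/sqrt(2)
mi_ind = mutual_information(x, rng.normal(size=n))      # independent
```

For a bivariate Gaussian the exact value is I = -(1/2) ln(1 - rho^2), i.e. about 0.35 nats for the correlated pair above, while the independent pair gives a value near 0.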

7 Mutual Information for a number of Gaussian toys. 27/02/09 Moritz Backes 7

8 Estimating Mutual Information (Dependence on Dataset Size) Returns an absolute value; the values are arbitrary. Depends on the PDFs, not on the actual data: hence the dependency on the number of bins and the size of the dataset. The mutual information needs to be estimated from the distributions by smoothing. 27/02/09 Moritz Backes 8

9 Estimating Mutual Information (Dependence on Bin Number) Returns an absolute value; the values are arbitrary. Depends on the PDFs, not on the actual data: hence the dependency on the number of bins and the size of the dataset. The mutual information needs to be estimated from the distributions by smoothing. 27/02/09 Moritz Backes 9
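Both dependencies can be demonstrated directly (my own sketch, not the slides' study): for independent variables the true mutual information is 0, yet the plug-in histogram estimate carries a positive bias of roughly (n_bins - 1)^2 / (2N) nats, so it grows with the bin count and shrinks with the dataset size.

```python
import numpy as np

def plugin_mi(x, y, n_bins):
    """Plug-in histogram estimate of I(X;Y) in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=n_bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(4)

# Independent Gaussians: the ideal answer is 0 in every case.
# The bias grows with the bin count at fixed sample size ...
x, y = rng.normal(size=(2, 5_000))
by_bins = {b: plugin_mi(x, y, b) for b in (5, 20, 80)}

# ... and shrinks as the dataset grows, at a fixed 30x30 binning.
by_size = {n: plugin_mi(*rng.normal(size=(2, n)), 30)
           for n in (1_000, 10_000, 100_000)}
```

This is the quantitative version of the slide's warning: a raw plug-in number is meaningless without quoting the binning and the sample size, which is why smoothing (or toys) is needed on limited datasets.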

10 Mutual Information as a function of SM toys. 27/02/09 Moritz Backes 10

11 Comparison (linear correlation) 27/02/09 Moritz Backes 11

12 Comparison (anti-linear correlation) 27/02/09 Moritz Backes 12

13 Comparison (no linear correlation) 27/02/09 Moritz Backes 13

14 Some more examples. 27/02/09 Moritz Backes 14

15 Conclusions Several quantities/methods are available to measure higher-order correlations between variables: the correlation coefficient for linear correlations; the correlation ratio for functional dependencies; and shape comparison with a KS test, or Mutual Information, for non-functional dependencies. Mutual Information is not easy to interpret, and its calculation is based on the PDFs. Thus, from limited data one can only make estimates, which depend on the statistical power of the sample (binning and dataset size). On statistically limited datasets the distributions must be smoothed (e.g. by kernel estimation or by using toys) to get a useful number. 27/02/09 Moritz Backes 15

16 Examples (3) 27/02/09 Moritz Backes 16

17 Examples (2) 27/02/09 Moritz Backes 17

18 Examples (1) 27/02/09 Moritz Backes 18

19 Estimating Mutual Information of Statistically Limited Datasets Returns an absolute value. Depends on the PDFs, not on the actual data: hence the dependency on the number of bins and the size of the dataset. 27/02/09 Moritz Backes 19

20 Mutual Information Returns an absolute value; the values are arbitrary. Depends on the PDFs, not on the actual data: hence the dependency on the number of bins and the size of the dataset. The mutual information needs to be estimated from the distributions by smoothing. 27/02/09 Moritz Backes 20

