Measuring Correlations
Moritz Backes
Motivation
To use two-dimensional sidebands in data-driven background estimation techniques, the control samples must be free of correlations. Correlations are usually measured with the correlation coefficient. BUT: the correlation coefficient catches only linear correlations. THUS: independent variables have correlation 0, but the converse is not true. How can we measure higher-order correlations?
A lecture by Kyle Cranmer in the academic training lecture program in February (http://indico.cern.ch/conferenceDisplay.py?confId=48425) suggested Mutual Information as a quantity to measure non-linear correlations. I tried to examine possible ways/quantities to measure correlations (linear and non-linear) and to compare their properties.
Correlation Coefficient
Quantifies the linear dependence between two variables:
\rho(X,Y) = \mathrm{Cov}(X,Y) / (\sigma_X \sigma_Y)
Takes values between -1 and 1. Independent variables have correlation 0, but the converse is not true.
[Four example scatter plots with correlation coefficients -0.028, 0.000, 0.002, and 0.999]
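A minimal sketch of the "converse is not true" point (my own NumPy illustration, not from the original slides), using the function y = 4x(1-x) that reappears on the next slide:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
x = rng.uniform(0.0, 1.0, size=100_000)

# Y is fully determined by X, but symmetric about x = 0.5,
# so the *linear* correlation vanishes.
y = 4.0 * x * (1.0 - x)

# Pearson correlation coefficient: Cov(X,Y) / (sigma_X * sigma_Y).
rho = np.corrcoef(x, y)[0, 1]
print(f"corr(X, 4X(1-X)) = {rho:.4f}")  # ~0 despite full dependence
```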
Correlation Ratio and Profile Plots
Quantifies any functional dependency between two variables via the conditional expectation of Y given X:
\eta^2 = \mathrm{Var}(E[Y|X]) / \mathrm{Var}(Y)
Takes values between 0 and 1 and is closely related to profile plots. Example: the function y = 4x(1-x) has zero linear correlation but correlation ratio 1. Note that the quantity is obviously not symmetric in X and Y, as the sketch below shows.
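A hedged sketch of a binned estimator for the correlation ratio (the function name and binning choices are mine, not from the slides); this is essentially what a profile plot visualises:

```python
import numpy as np

def correlation_ratio(x, y, n_bins=50):
    """Binned estimate of eta^2 = Var(E[Y|X]) / Var(Y)."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=y, minlength=n_bins)
    filled = counts > 0
    cond_mean = sums[filled] / counts[filled]  # E[Y | X in bin]
    # Occupancy-weighted variance of the conditional means.
    var_between = np.average((cond_mean - y.mean()) ** 2, weights=counts[filled])
    return var_between / y.var()

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 50_000)
y = 4.0 * x * (1.0 - x) + rng.normal(0.0, 0.01, x.size)
print(f"eta^2(Y|X) = {correlation_ratio(x, y):.3f}")  # close to 1
print(f"eta^2(X|Y) = {correlation_ratio(y, x):.3f}")  # close to 0: not symmetric
```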
Shape Comparison of PDFs Using a Kolmogorov Test
Compare the two-dimensional PDF P_xy(X,Y) with the product PDF of the two projections, P_x(X) * P_y(Y), and check the two distributions for shape compatibility using a Kolmogorov-Smirnov test. Compatible shapes mean no correlations. This does not seem to be 100% reliable: the KS test depends on the binning, the available statistics, etc.
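A rough sketch of such a binned shape comparison (my own simplified reimplementation, not the original code; in particular, flattening the 2D bins into one fixed order is only one of several possible choices, which is part of why the method is binning-dependent):

```python
import numpy as np

def ks_independence_distance(x, y, n_bins=20):
    """KS-style max CDF distance between the joint histogram and the
    product of its projections (the 'no correlation' shape)."""
    joint, _, _ = np.histogram2d(x, y, bins=n_bins)
    joint = joint / joint.sum()
    # Product of the two marginal projections.
    product = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    # Maximum difference of the flattened cumulative distributions.
    return np.abs(np.cumsum(joint.ravel()) - np.cumsum(product.ravel())).max()

rng = np.random.default_rng(2)
x = rng.normal(size=20_000)
print("independent:", ks_independence_distance(x, rng.normal(size=20_000)))
print("correlated: ", ks_independence_distance(x, x + rng.normal(size=20_000)))
```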
Mutual Information
The lecture by Kyle Cranmer in the academic training lecture program in February (http://indico.cern.ch/conferenceDisplay.py?confId=48425) briefly mentioned Mutual Information as a quantity to measure non-linear correlations. The quantity originates from information theory and is closely related to entropy:
I(X;Y) = \sum_{x,y} p(x,y) \log\frac{p(x,y)}{p(x)\,p(y)} = H(X) - H(X|Y)
Thus: Mutual Information is the reduction in the uncertainty of X due to the knowledge of Y.
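A minimal histogram-based (plug-in) estimator of this definition, assuming NumPy (the function name and defaults are mine, not from the slides):

```python
import numpy as np

def mutual_information(x, y, n_bins=30):
    """Plug-in estimate of I(X;Y) in nats from a 2D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=n_bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)  # marginal p(y)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
print(mutual_information(x, rng.normal(size=100_000)))      # ~0: independent
# For this Gaussian case the analytic value is 0.5*ln(2) ~ 0.35 nats.
print(mutual_information(x, x + rng.normal(size=100_000)))  # > 0: correlated
```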
Mutual Information for a number of Gaussian toys
Estimating Mutual Information (Dependence on Dataset Size)
Mutual Information returns an absolute but essentially arbitrary value. It depends on the PDFs, not on the actual data; the estimate therefore depends on the number of bins and on the size of the dataset, and the Mutual Information needs to be estimated from the distributions by smoothing.
Estimating Mutual Information (Dependence on Bin Number)
The same caveats apply to the binning: because the estimate is built from binned PDFs rather than from the actual data, its value depends on the number of bins, and the distributions need to be smoothed before a stable Mutual Information can be extracted. The sketch below illustrates the binning dependence.
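A sketch of the binning dependence (my own illustration, reusing the mutual_information function from the earlier sketch): one independent Gaussian toy, estimated with different bin counts.

```python
import numpy as np

rng = np.random.default_rng(4)
x, y = rng.normal(size=(2, 5_000))  # independent: the true MI is 0

for n_bins in (10, 30, 100, 300):
    mi = mutual_information(x, y, n_bins=n_bins)
    # The estimate grows with the number of bins even though X and Y are
    # independent: finite statistics per bin bias the plug-in estimate upward.
    print(f"{n_bins:4d} bins -> MI estimate = {mi:.4f}")
```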
Mutual Information as a function of the number of SM toys
Comparison (linear correlation)
Comparison (negative linear correlation)
Comparison (no linear correlation)
Some more examples
Conclusions
Several quantities / methods are available to measure higher-order correlations between variables:
- Linear correlations: correlation coefficient
- Functional dependencies: correlation ratio
- Non-functional dependencies: shape comparison with a KS test, and Mutual Information
Mutual Information is not easy to interpret, and its calculation is based on PDFs. From limited data one can therefore only make estimates, which depend on the statistical power of the sample (binning and dataset size). On statistically limited datasets the distributions must be smoothed (e.g. by kernel estimation or by using toys) to get a useful number; one possible smoothing approach is sketched below.
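One possible reading of the kernel-estimation remark (my sketch, assuming SciPy's gaussian_kde is available; the grid size and function name are illustrative): smooth a statistically limited dataset with a Gaussian kernel density estimate, then evaluate the Mutual Information on a grid of the smoothed PDF instead of on raw histogram bins.

```python
import numpy as np
from scipy.stats import gaussian_kde

def mi_from_kde(x, y, grid_size=60):
    """MI in nats from a KDE-smoothed joint PDF evaluated on a grid."""
    kde_xy = gaussian_kde(np.vstack([x, y]))
    gx = np.linspace(x.min(), x.max(), grid_size)
    gy = np.linspace(y.min(), y.max(), grid_size)
    xx, yy = np.meshgrid(gx, gy, indexing="ij")
    pxy = kde_xy(np.vstack([xx.ravel(), yy.ravel()])).reshape(grid_size, grid_size)
    pxy /= pxy.sum()  # normalise to a discrete grid PDF
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(5)
x = rng.normal(size=500)  # deliberately small sample
print(mi_from_kde(x, 0.8 * x + rng.normal(size=500)))
```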
Examples (3)
Examples (2)
Examples (1)
Estimating Mutual Information of Statistically Limited Datasets
Mutual Information returns an absolute value that depends on the PDFs, not on the actual data; the estimate therefore depends on the number of bins and on the size of the dataset.