Developments in other math and statistical classes

Slides:



Advertisements
Similar presentations
Katherine Jenny Thompson
Advertisements

I OWA S TATE U NIVERSITY Department of Animal Science Using Basic Graphical and Statistical Procedures (Chapter in the 8 Little SAS Book) Animal Science.
A. The Basic Principle We consider the multivariate extension of multiple linear regression – modeling the relationship between m responses Y 1,…,Y m and.
Biomedical Statistics Testing for Normality and Symmetry Teacher:Jang-Zern Tsai ( 蔡章仁 ) Student: 邱瑋國.
Analysis of distorted waveforms using parametric spectrum estimation methods and robust averaging Zbigniew LEONOWICZ 13 th Workshop on High Voltage Engineering.
CmpE 104 SOFTWARE STATISTICAL TOOLS & METHODS MEASURING & ESTIMATING SOFTWARE SIZE AND RESOURCE & SCHEDULE ESTIMATING.
12-1 Multiple Linear Regression Models Introduction Many applications of regression analysis involve situations in which there are more than.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
12 Multiple Linear Regression CHAPTER OUTLINE
N.D.GagunashviliUniversity of Akureyri, Iceland Pearson´s χ 2 Test Modifications for Comparison of Unweighted and Weighted Histograms and Two Weighted.
x – independent variable (input)
Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson2-1 Lesson 2: Descriptive Statistics.
The Simple Regression Model
Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50.
15th September 2005PHYSTAT 05, Oxford 1 Statistics in ROOT René Brun, Anna Kreshuk, Lorenzo Moneta PH/SFT group, CERN ftp://root.cern.ch/root/phystat05.ppt.
1 BA 555 Practical Business Analysis Review of Statistics Confidence Interval Estimation Hypothesis Testing Linear Regression Analysis Introduction Case.
Chap 3-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 3 Describing Data: Numerical Statistics for Business and Economics.
Correlation & Regression
Hydrologic Statistics
Inference for regression - Simple linear regression
Class 4 Ordinary Least Squares SKEMA Ph.D programme Lionel Nesta Observatoire Français des Conjonctures Economiques
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-3 Regression.
STATISTICS: BASICS Aswath Damodaran 1. 2 The role of statistics Aswath Damodaran 2  When you are given lots of data, and especially when that data is.
CPE 619 Simple Linear Regression Models Aleksandar Milenković The LaCASA Laboratory Electrical and Computer Engineering Department The University of Alabama.
Simple Linear Regression Models
30th September 2005ROOT2005 Workshop 1 Developments in other math and statistical classes Anna Kreshuk, PH/SFT, CERN.
Numerical Descriptive Techniques
1 Least squares procedure Inference for least squares lines Simple Linear Regression.
6.1 What is Statistics? Definition: Statistics – science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively.
Summary statistics Using a single value to summarize some characteristic of a dataset. For example, the arithmetic mean (or average) is a summary statistic.
Stats for Engineers Lecture 9. Summary From Last Time Confidence Intervals for the mean t-tables Q Student t-distribution.
Statistical Methods Statistical Methods Descriptive Inferential
1 Chapter 12 Simple Linear Regression. 2 Chapter Outline  Simple Linear Regression Model  Least Squares Method  Coefficient of Determination  Model.
Multiple Regression and Model Building Chapter 15 Copyright © 2014 by The McGraw-Hill Companies, Inc. All rights reserved.McGraw-Hill/Irwin.
Variation This presentation should be read by students at home to be able to solve problems.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
Multiple Regression Petter Mostad Review: Simple linear regression We define a model where are independent (normally distributed) with equal.
A statistical test for point source searches - Aart Heijboer - AWG - Cern june 2002 A statistical test for point source searches Aart Heijboer contents:
Descriptive statistics Petter Mostad Goal: Reduce data amount, keep ”information” Two uses: Data exploration: What you do for yourself when.
1 Introduction to Statistics − Day 4 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability.
Inference for the mean vector. Univariate Inference Let x 1, x 2, …, x n denote a sample of n from the normal distribution with mean  and variance 
Regression Analysis Deterministic model No chance of an error in calculating y for a given x Probabilistic model chance of an error First order linear.
Shashi M. Kanbur SUNY Oswego Workshop on Best Practices in Astro-Statistics IUCAA January 2016.
G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.
Statistics 350 Lecture 2. Today Last Day: Section Today: Section 1.6 Homework #1: Chapter 1 Problems (page 33-38): 2, 5, 6, 7, 22, 26, 33, 34,
WARM UP: Penny Sampling 1.) Take a look at the graphs that you made yesterday. What are some intuitive takeaways just from looking at the graphs?
Statistics 350 Review. Today Today: Review Simple Linear Regression Simple linear regression model: Y i =  for i=1,2,…,n Distribution of errors.
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 1: INTRODUCTION.
Estimating standard error using bootstrap
Lecture Slides Elementary Statistics Twelfth Edition
Linear Regression Essentials Line Basics y = mx + b vs. Definitions
The simple linear regression model and parameter estimation
Advanced Quantitative Techniques
Chapter 4 Basic Estimation Techniques
AP Biology Intro to Statistics
Inference for the mean vector
Special Topics In Scientific Computing
Unfolding Problem: A Machine Learning Approach
Modelling data and curve fitting
Linear regression Fitting a straight line to observations.
BA 275 Quantitative Business Methods
Statistical Process Control
Pattern Recognition and Machine Learning
Section 2: Linear Regression.
6.1 Introduction to Chi-Square Space
15.1 The Role of Statistics in the Research Process
AP Statistics Chapter 12 Notes.
Diagnostics and Remedial Measures
Linear Regression and Correlation
Unfolding with system identification
Presentation transcript:

Developments in other math and statistical classes Anna Kreshuk, PH/SFT, CERN 30th September 2005 ROOT2005 Workshop

Contents News in fitting Multidimensional methods Linear fitter Robust fitter Fitting of multigraphs Multidimensional methods Robust estimator of multivariate location and scatter New methods in old classes Future plans 30th September 2005 ROOT2005 Workshop

Linear Fitter (0) To fit functions linear in parameters Polynomials, hyperplanes, linear combinations of arbitrary functions TLinearFitter can be used directly or through TH1, TGraph, TGraph2D::Fit interfaces When used directly, can fit multidimensional functions 30th September 2005 ROOT2005 Workshop

Linear Fitter (1) Special formula syntax: Linear parts separated by “++” signs: “1 ++ sin(x) ++ sin(2*x) ++ cos(3*x)” “[0] + [1]*sin(x) + [2]*sin(2*x) + [3]*cos(3*x)” Simple to use in multidimensional case “x0 ++ x1 ++ exp(x2) ++ log(x3) ++ x4” Polynomials (pol0, pol1…) and hyperplanes (hyp1, hyp2, …) are the fastest to compute By default, polynomials in TH1, TGraph::Fit functions now go through Linear Fitter Data to be used for fitting is not copied into the fitter 30th September 2005 ROOT2005 Workshop

Linear Fitter (2) Advantages in separating linear and non-linear fitting: Doesn’t require setting initial parameter values The gain in speed Function Linear fitter Minuit Pol3 in TGraphErrors 1000 fits of 1000 points Average CPU time 1.95 30.54 TMath::Sin(x) + TMath::Sin(2*x) 2.39 21.34 30th September 2005 ROOT2005 Workshop

Robust fitting (0) Least Trimmed Squares regression – extension of the TLinearFitter class Motivation: least-squares fitting is very sensitive to bad observations Robust fitter is used to fit datasets with outliers The algorithm tries to fit h points (out of N) that have the smallest sum of squared residuals 30th September 2005 ROOT2005 Workshop

Robust fitting (1) High breakdown point - smallest proportion of outliers that can cause the estimator to produce values arbitrarily far from the true parameters Graph.Fit(“pol3”, “rob=0.75”, -2, 2); 2nd parameter – fraction h of the good points 30th September 2005 ROOT2005 Workshop

TMultiGraph::Fit A multigraph is a collection of graphs Fit function, implemented in this class, allows to fit all graphs simultaneously, as if all the points belong to the same graph All options of TGraph::Fit supported 30th September 2005 ROOT2005 Workshop

Multivariate covariance Minimum Covariance Determinant Estimator – a highly robust estimator of multivariate location and scatter Motivation: arithmetic mean and regular covariance estimator are very sensitive to bad observations Class TRobustEstimator The algorithm tries to find a subset of h observations (out of N) with the minimal covariance matrix determinant 30th September 2005 ROOT2005 Workshop

Multivariate covariance High breakdown point Left – covariance ellipses of a 1000-point dataset with 250 outliers Right – distances of points from the robust mean, calculated using robust covariance matrix Indices of outlying points can be returned 30th September 2005 ROOT2005 Workshop

News in TMath More distribution functions, densities and quantile functions Median (for weighted observations) and K-th order statistic Kolmogorov test for unbinned data 30th September 2005 ROOT2005 Workshop

News in TH1 and TF1 TH1: TF1: Chi2 test Mean & RMS error, skewness and kurtosis TF1: Derivatives (1st, 2nd and 3rd) Improved minimization – a combination of grid search and Brent’s method (golden section search and parabolic interpolation) 30th September 2005 ROOT2005 Workshop

Future plans (0) Short term: sPlot – A statistical tool to unfold data distributions class TSPlot to be added soon Left – Projection plot cut on the likelihood ratio Excess of events – Signal? Background? Right – sPlot – no cut getting rid of background by statistical methods Signal!!! More about sPlot in Muriel Pivk’s presentation at 11:05 30th September 2005 ROOT2005 Workshop

Future plans (1) From PHYSTAT05 conference in Oxford, September 2005: Following a talk by Marc Paterno, an interface with R was discussed – ROOT Trees can already be read from R promt Following a talk by Jim Linnemann, a repository of physics-oriented statistical software in Fermilab was discussed. Rajendran Raja – Goodness of fit for unbinned likelihood fits Nikolai Gagunashvili – Chi2 test for comparison of weighted and unweighted histograms Martin Block – Outlier rejection and fitting with Lorentz weights F.Tegenfeldt & J.Conrad – More on confidence intervals Kyle Cranmer – More on hypothesis testing and confidence intervals A lot of other interesting suggestions… 30th September 2005 ROOT2005 Workshop

Future plans (2) Statistical plots Quantile-quantile plot – useful for determining if 2 samples come from the same distribution Boxplot Spiderplot 30th September 2005 ROOT2005 Workshop

Future plans (3) Loess – locally weighted regression FFT A procedure for estimating a regression surface by multivariate smoothing FFT Cluster analysis 30th September 2005 ROOT2005 Workshop