Statistical modelling Gil McVean, Department of Statistics Tuesday 24th Jan 2012

Questions to ask… How can we use models in statistical inference? Where do commonly-used models come from? What is a likelihood? How can we use likelihood to estimate parameters, measure certainty and test hypotheses? How do we add covariates to simple models? How do you construct and use linear models? What’s so general about generalised linear modelling? 2

3 What is a random variable? A random variable is a number associated with the outcome of a stochastic process –Waiting time for the next bus –Average number of hours of sunshine in May –Age of the current prime minister In statistics, we want to take observations of random variables and use these to make statements about the underlying stochastic process –Did this vaccine have any effect? –Which genes contribute to disease susceptibility? –Will it rain tomorrow? Parametric models provide much power in the analysis of variation (parameter estimation, hypothesis testing, model choice, prediction) –Statistical models of the random variables –Models of the underlying stochastic process

4 What is a distribution? A distribution characterises the probability (mass) associated with each possible outcome of a stochastic process Distributions of discrete data are characterised by probability mass functions (pmf) Distributions of continuous data are characterised by probability density functions (pdf) For RVs that map to the integers or the real numbers, the cumulative distribution function (cdf) is a useful alternative representation [Plots: example pmf and cdf]

5 Expectations and variances If we took a large sample from a particular distribution, we might want to summarise what observations look like ‘on average’ and how much variability there is The expectation of a distribution is the average value of a random variable over a large number of samples: E(X) = Σ x P(X = x) (or ∫ x f(x) dx for continuous X) The variance of a distribution is the average squared difference between randomly sampled observations and the expected value: Var(X) = E[(X − E(X))²]

6 iid In most cases, we assume that the random variables we observe are independent and identically distributed (iid) The iid assumption allows us to make all sorts of statements both about what we expect to see and how much variation to expect Suppose X, Y and Z are iid random variables and a and b are constants; then, for example, E(aX + bY) = aE(X) + bE(Y) and Var(aX + bY) = a²Var(X) + b²Var(Y)

7 Where do ‘commonly-used’ distributions come from? At the core of much statistical theory and methodology lie a series of key distributions (e.g. Normal, Poisson, Exponential, etc.) These distributions are closely related to each other and can be ‘derived’ as the limit of simple stochastic processes when the random variable can be counted or measured In many settings, more complex distributions are constructed from these ‘simple’ distributions –Ratios: E.g. Beta, Cauchy –Compound: E.g. Geometric, Beta –Mixture models

8 The simplest model Bernoulli trials –Outcomes that can take only two values, 0 and 1, with probabilities 1 − θ and θ respectively. E.g. coin flipping, indicator functions The likelihood function calculates the probability of the data What is the probability of observing a particular sequence of outcomes (if θ = 0.5)? Are two different sequences of the same length equally probable?
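A minimal R sketch of this calculation (the two sequences used are illustrative, not the ones shown on the original slide): the probability of any specific Bernoulli sequence is θ raised to the number of 1s times (1 − θ) raised to the number of 0s, so with θ = 0.5 every sequence of the same length has the same probability.

# Probability of a specific Bernoulli sequence (illustrative example sequences)
bernoulli_seq_prob <- function(x, theta) {
  prod(theta^x * (1 - theta)^(1 - x))
}
x1 <- c(1, 1, 1, 1, 0, 0, 0, 0)   # e.g. HHHHTTTT
x2 <- c(1, 0, 1, 0, 1, 0, 1, 0)   # e.g. HTHTHTHT
bernoulli_seq_prob(x1, 0.5)       # 0.00390625 = (1/2)^8
bernoulli_seq_prob(x2, 0.5)       # identical: the order does not matter when theta = 0.5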

9 The binomial distribution Often, we don’t care about the exact order in which successes occurred. We might therefore want to ask about the probability of k successes in n trials. This is given by the binomial distribution For example, the probability of exactly 3 heads in 4 coin tosses = –P(HHHT) + P(HHTH) + P(HTHH) + P(THHH) –Each order has the same Bernoulli probability = (1/2)^4 –There are 4 choose 3 = 4 orders Generally, if the probability of success is θ, the probability of k successes in n trials is P(k) = (n choose k) θ^k (1 − θ)^(n−k) The expected number of successes is nθ and the variance is nθ(1 − θ) [Plot: binomial distribution with n = 10, θ = 0.2]
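A short R check of the coin-toss example above, using R's built-in binomial mass function dbinom:

# Probability of exactly 3 heads in 4 tosses of a fair coin
choose(4, 3) * 0.5^3 * (1 - 0.5)^(4 - 3)   # 0.25, by the formula
dbinom(3, size = 4, prob = 0.5)            # 0.25, using the built-in pmf

# Mean and variance of a Binomial(n = 10, theta = 0.2), as in the slide's figure
n <- 10; theta <- 0.2
c(mean = n * theta, variance = n * theta * (1 - theta))   # 2 and 1.6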

10 The geometric distribution Bernoulli trials have a memory-less property –The probability of success ( X = 1 ) next time is independent of the number of successes in the preceding trials The number of trials between subsequent successes takes a geometric distribution –The probability that the first success occurs at the kth trial is P(k) = (1 − θ)^(k−1) θ You can expect to wait an average of 1/θ trials for a success, but the variance is (1 − θ)/θ² [Plots: geometric distributions for two values of θ, one of them θ = 0.5]

11 The Poisson distribution The Poisson distribution is often used to model ‘rare events’ It can be derived in two ways –The limit of the binomial distribution as θ→0 and n→∞ (with nθ = λ held fixed) –The number of events observed in a given time for a Poisson process (more later) It is parameterised by the expected number of events, λ –The probability of k events is P(k) = e^(−λ) λ^k / k! The expected number of events is λ, and the variance is also λ For large λ, the Poisson is well approximated by the normal distribution [Plot: red = Poisson(5), blue = Bin(100, 0.05)]
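The binomial-to-Poisson limit described above is easy to check numerically in R; this sketch reproduces the kind of comparison in the slide's figure (Bin(100, 0.05) against Poisson(5)):

# Compare Bin(n = 100, theta = 0.05) with Poisson(lambda = n * theta = 5)
k <- 0:15
binom_probs <- dbinom(k, size = 100, prob = 0.05)
pois_probs  <- dpois(k, lambda = 5)
round(cbind(k, binom_probs, pois_probs), 4)   # the two columns are very close

# Optional plot, in the spirit of the slide's figure
plot(k, pois_probs, type = "h", col = "red", ylab = "probability")
points(k + 0.2, binom_probs, type = "h", col = "blue")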

12 Going continuous In many situations, although the outcome space of a random variable may really be discrete (or at least only measurable discretely), it is convenient to treat the random variable as continuously distributed For example, the distribution of height in mm is actually discrete, but is well approximated by a continuous distribution (e.g. normal) Commonly-used continuous distributions arise as the limits of discrete processes

13 The Poisson process Consider a process in which, in every small unit of time, some event might occur E.g. every generation there is some chance of a gene mutating (with probability of approximately 1 in 100,000) The probability of exactly one change in a sufficiently small interval h ≡ 1/n is P = vh ≡ v/n, where P is the probability of one change and n is the number of trials The probability of two or more changes in a sufficiently small interval h is essentially 0 In the limit of the number of trials becoming large, the total number of events (e.g. mutations) follows the Poisson distribution [Diagram: a time line divided into small intervals of width h]

14 The exponential distribution In the Poisson process, the time between successive events follows an exponential distribution –This is the continuous analogue of the geometric distribution –It is memory-less, i.e. f(x + t | X > t) = f(x) [Plot: exponential density f(x)]

15 The gamma distribution The gamma distribution arises naturally as the distribution of the sum of a series of iid exponential random variables A Gamma(α, β) distribution has expectation α/β and variance α/β² More generally, α need not be an integer (for example, the Chi-square distribution with one degree of freedom is a Gamma(½, ½) distribution) [Plots: gamma densities with α = β = 0.5, α = β = 2, α = β = 10]

16 The beta distribution The beta distribution models random variables that take values in [0, 1] It arises naturally as the proportion X/(X + Y) of two independent gamma-distributed random variables The expectation is α/(α + β) In Bayesian statistics, the beta distribution is the natural (conjugate) prior for binomial proportions (beta-binomial) –The Dirichlet distribution generalises the beta to more than 2 proportions [Plots: beta densities with α = β = 0.5, α = β = 2, α = β = 10]

17 The normal distribution The normal distribution is related to most distributions through the central limit theorem The normal distribution naturally describes variation of characters influenced by a large number of processes (height, weight) or the distribution of large numbers of events (e.g. the limit of a binomial with large np, or of a Poisson with large λ) [Plot: blue = Poisson(100), red = N(100, 10)]
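A quick R sketch of the approximation mentioned above, overlaying a Poisson(100) mass function with a N(mean = 100, sd = 10) density (the values used in the slide's figure):

# Poisson(100) versus its normal approximation N(100, sd = sqrt(100) = 10)
k <- 50:150
plot(k, dpois(k, lambda = 100), type = "h", col = "blue",
     xlab = "k", ylab = "probability / density")
curve(dnorm(x, mean = 100, sd = 10), add = TRUE, col = "red", lwd = 2)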

18 The exponential family of distributions Many of the distributions covered (e.g. normal, binomial, Poisson, gamma) belong to the exponential family of probability distributions A k-parameter member of the family has a density or frequency function that can be written in the form f(x | θ) = exp[ c₁(θ)T₁(x) + … + c_k(θ)T_k(x) + d(θ) + S(x) ] E.g. the Bernoulli distribution (x = 0 or 1) is θ^x (1 − θ)^(1−x) = exp[ x log(θ/(1 − θ)) + log(1 − θ) ] Such distributions have the useful property that simple functions of the data, T(x), contain all the information about the model parameter –E.g. in the Bernoulli case T(x) = x

Estimating parameters from data 19

Parameter estimation We can formulate most questions in statistics in terms of making statements about underlying parameters We want to devise a framework for estimating those parameters and making statements about our certainty There are several different approaches to making such statements –Moment estimators –Likelihood –Bayesian estimation 20

Moment estimation Moment methods are the simplest way of estimating parameters In such techniques, parameter values are found that match the sample moments (mean, variance, etc.) to their expected values under the model E.g. for random variables X₁, X₂, … sampled from a N(μ, σ²) distribution, we set μ̂ = x̄ and σ̂² = (1/n) Σ (xᵢ − x̄)² 21
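A minimal R illustration of matching sample moments for a normal model; the data are simulated and the true values (mean 5, variance 4) are arbitrary choices for the example:

set.seed(1)
x <- rnorm(1000, mean = 5, sd = 2)   # simulate from N(mu = 5, sigma^2 = 4)

# Moment estimators: match the sample mean and sample variance to mu and sigma^2
mu_hat     <- mean(x)
sigma2_hat <- mean((x - mean(x))^2)  # divisor n, matching the population moment
c(mu_hat = mu_hat, sigma2_hat = sigma2_hat)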

Problems with moment estimation It is not always possible to exactly match sample moments with their expectation It is not clear when using moment methods how much of the information in the data about the parameters is being used –Often not much… Why should mean squared error (MSE) be the best way of measuring the value of an estimator? 22

Is there an optimal way to estimate parameters? For any model the maximum information about model parameters is obtained by considering the likelihood function The likelihood function is proportional to the probability of observing the data given a specified parameter value One natural choice for point estimation of parameters is the maximum likelihood estimate, the parameter values that maximise the probability of observing the data The maximum likelihood estimate (mle) has some useful properties (though it is not always optimal in every sense) 23

An intuitive view on likelihood 24

An example Suppose we have data generated from a Poisson distribution and we want to estimate the parameter λ of the distribution The probability of observing a particular value x is P(X = x) = e^(−λ) λ^x / x! If we have observed a series of iid Poisson RVs x₁, …, x_n, we obtain the joint likelihood by multiplying the individual probabilities together: L(λ) = Π e^(−λ) λ^(xᵢ) / xᵢ! ∝ e^(−nλ) λ^(Σ xᵢ) 25

Comments Note that in the (proportional) likelihood function the factorials have disappeared. This is because they provide a constant that does not influence the relative likelihood of different values of the parameter It is usual to work with the log-likelihood rather than the likelihood. Note that maximising the log-likelihood is equivalent to maximising the likelihood We can find the mle of the parameter analytically: take the natural log of the likelihood function, ℓ(λ) = −nλ + (Σ xᵢ) log λ + constant, and find where the derivative of the log-likelihood is zero, dℓ/dλ = −n + (Σ xᵢ)/λ = 0, giving λ̂ = (1/n) Σ xᵢ = x̄ Note that here the mle is the same as the moment estimator 26

Sufficient statistics In this example we could write the likelihood as a function of a simple summary of the data – the mean This is an example of a sufficient statistic. These are statistics that contain all the information about the parameter(s) under the specified model For example, suppose we have a series of iid normal RVs: the mean, (1/n) Σ xᵢ, and the mean square, (1/n) Σ xᵢ², together are sufficient for (μ, σ²) 27

Properties of the maximum likelihood estimate The maximum likelihood estimate can be found either analytically or by numerical maximisation The mle is consistent, in that it converges to the truth as the sample size gets infinitely large The mle is asymptotically efficient, in that it achieves the minimum possible variance (the Cramér-Rao lower bound) as n→∞ However, the mle is often biased for finite sample sizes –For example, the mle for the variance parameter in a normal distribution is the sample variance with divisor n (rather than n − 1), which on average underestimates σ² 28

Comparing parameter estimates Obtaining a point estimate of a parameter is just one problem in statistical inference We might also like to ask how good different parameter values are One way of comparing parameter values is through the relative likelihood, L(λ)/L(λ̂) For example, suppose we observe counts of 12, 22, 14 and 8 from a Poisson process. The maximum likelihood estimate is the sample mean, 14. The relative likelihood is given by L(λ)/L(14) 29

Using relative likelihood The relative likelihood and log-likelihood surfaces for these counts are shown below; an R sketch of how such curves can be computed follows 30
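A hedged R sketch of these curves for the counts 12, 22, 14 and 8 (this reconstructs the kind of plot described, not the original figure):

x <- c(12, 22, 14, 8)                 # observed Poisson counts
lambda <- seq(5, 30, by = 0.1)

# Log-likelihood and relative likelihood over a grid of lambda values
loglik  <- sapply(lambda, function(l) sum(dpois(x, l, log = TRUE)))
rel_lik <- exp(loglik - max(loglik))  # L(lambda) / L(lambda_hat)

lambda[which.max(loglik)]             # maximum at the sample mean, 14

par(mfrow = c(1, 2))
plot(lambda, rel_lik, type = "l", ylab = "relative likelihood")
plot(lambda, loglik,  type = "l", ylab = "log-likelihood")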

Interval estimation In most cases the chance that the point estimate you obtain for a parameter is actually the correct one is zero We can generalise the idea of point estimation to interval estimation Here, rather than estimating a single value of a parameter we estimate a region of parameter space –We make the inference that the parameter of interest lies within the defined region The coverage of an interval estimator is the fraction of times the parameter actually lies within the interval The idea of interval estimation is intimately linked to the notion of confidence intervals 31

Example Suppose I’m interested in estimating the mean of a normal distribution with known variance of 1 from a sample of 10 observations I construct an interval estimator of the form (x̄ − a, x̄ + a) The chart below shows how the coverage properties of this estimator vary with a If I choose a to be 0.62 (≈ 1.96/√10) I would have coverage of 95%; a simulation sketch follows 32
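A small simulation in R, under the stated assumptions (normal data, known variance 1, n = 10), confirming that the interval x̄ ± 0.62 has roughly 95% coverage:

set.seed(42)
true_mean <- 0; n <- 10; a <- 0.62    # a = qnorm(0.975)/sqrt(10), about 0.62

covered <- replicate(10000, {
  x <- rnorm(n, mean = true_mean, sd = 1)
  xbar <- mean(x)
  (xbar - a < true_mean) & (true_mean < xbar + a)
})
mean(covered)   # close to 0.95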

Confidence intervals It is a short step from here to the notion of confidence intervals We find an interval estimator of the parameter that, for any value of the parameter that might be possible, has the desired coverage properties We then apply this interval estimator to our observed data to get a confidence interval We can guarantee that among repeat performances of the same experiment the true value of the parameter would be in this interval 95% of the time We cannot say ”There is a 95% chance of the true parameter being in this interval” 33

Example – confidence intervals for the normal distribution Creating confidence intervals for the mean of a normal distribution is relatively easy because the coverage properties of interval estimators do not depend on the mean (for a fixed variance) For example, the interval estimator (x̄ − 1.96 σ/√n, x̄ + 1.96 σ/√n) has 95% coverage properties for any mean There is an intimate link between confidence intervals and hypothesis testing 34

Confidence intervals and likelihood Thanks to the central limit theorem there is another useful result that allows us to define confidence intervals from the log-likelihood surface Specifically, the set of parameter values for which the log-likelihood is not more than 1.92 less than the maximum defines an approximate 95% confidence interval (1.92 is half of 3.84, the 95% point of the chi-squared distribution with one degree of freedom) –In the limit of large sample size the likelihood-ratio test statistic is approximately chi-squared distributed under the null This is a very useful result, but shouldn’t be assumed to hold –i.e. check with simulation 35
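Continuing the Poisson example (counts 12, 22, 14 and 8), a brief R sketch of this likelihood-based interval: keep all values of λ whose log-likelihood is within 1.92 units of the maximum.

x <- c(12, 22, 14, 8)
lambda <- seq(5, 30, by = 0.01)
loglik <- sapply(lambda, function(l) sum(dpois(x, l, log = TRUE)))

# Approximate 95% CI: log-likelihood within qchisq(0.95, df = 1)/2 = 1.92 of the maximum
keep <- loglik > max(loglik) - qchisq(0.95, df = 1) / 2
range(lambda[keep])   # roughly lambda in (10.7, 18) for these counts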

Making simple models complex: Adding covariates 36

What is a covariate? A covariate is a quantity that may influence the outcome of interest –Genotype at a SNP –Age of mice when measurement was taken –Batch of chips from which gene expression was measured Previously, we have looked at using likelihood to estimate parameters of underlying distributions We want to generalise this idea to ask how covariates might influence the underlying parameters Much statistical modelling is concerned with considering linear effects of covariates on underlying parameters 37

What is a linear model? In a linear model, the expectation of the response variable is defined as a linear combination of explanatory variables, e.g. Y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + ε, with ε ~ N(0, σ²) –Y: response variable; β₀: intercept; β₁x₁ and β₂x₂: linear relationships with explanatory variables; β₃x₁x₂: interaction term; ε: Gaussian error Explanatory variables can include any function of the original data But the link between E(Y) and X (or some function of X) is ALWAYS linear and the error is ALWAYS Gaussian 38
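A brief R sketch of fitting such a model with lm(); the data are simulated and the coefficient values are arbitrary illustrations:

set.seed(7)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + 0.8 * x1 * x2 + rnorm(n, sd = 1)  # Gaussian error

# Linear model with main effects and an interaction term
fit <- lm(y ~ x1 * x2)   # shorthand for x1 + x2 + x1:x2
summary(fit)$coefficients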

What is a GLM? There are many settings where the error is non-Gaussian and/or the link between E(Y) and X is not necessarily linear –Discrete data (e.g. counts in multinomial or Poisson experiments) –Categorical data (e.g. disease status) –Highly-skewed data (e.g. income, ratios) Generalised linear models keep the notion of linearity, but enable the use of non-Gaussian error models: g(E(Y)) = β₀ + β₁x₁ + β₂x₂ + … g is called the link function –In linear models, the link function is the identity The response variable can be drawn from any distribution of interest (the distribution function) –In linear models this is Gaussian 39

Poisson regression In Poisson regression the expected value of the response variable is given by the exponential of the linear predictor: E(Y) = exp(β₀ + β₁x₁ + …) The link function is the log Note that several distribution functions are possible (normal, Poisson, binomial counts), though in practice Poisson regression is typically used to model count data (particularly when counts are low) 40
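A minimal simulated example of Poisson regression in R; the rate and coefficient values here are illustrative assumptions, not taken from the hospital data that follows:

set.seed(3)
n <- 100
x <- runif(n, 0, 2)
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))   # log link: E(Y) = exp(b0 + b1 * x)

fit <- glm(y ~ x, family = poisson)
coef(fit)        # estimates close to the simulated values 0.5 and 1.2
exp(coef(fit))   # multiplicative effects on the expected count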

Example: Caesarean sections in public and private hospitals 41

Boxplots of rates of C sections 42

Fitting a model without covariates 43

> analysis <- glm(d$Caes ~ d$Births, family = "poisson")
> summary(analysis)

Call:
glm(formula = d$Caes ~ d$Births, family = "poisson")

Deviance Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.132e e < 2e-16 ***
d$Births 4.406e e e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: on 19 degrees of freedom
Residual deviance: on 18 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 4

Implies an average of 13 per 1000 births

Fitting a model with covariates 44

Unexpectedly, this indicates that public hospitals actually have a higher rate of Caesarean sections than private ones

glm(formula = d$Caes ~ d$Births + d$Hospital, family = "poisson")

Deviance Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.351e e e-08 ***
d$Births 3.261e e e-08 ***
d$Hospital 1.045e e ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Implies an average of 15.2 per 1000 births in public hospitals and 5.4 per 1000 births in private ones

Checking model fit Look at the distribution of residuals and at how well observed values are predicted 45

What’s going on? The initial summary suggested the opposite result to the GLM analysis. Why? The relationship between the number of births and the number of C-sections does not appear to be linear Accounting for this removes most of the apparent differences between hospital types There is also one quite influential outlier 46

          Normal   Caesarean
Private   412      16
Public

Relative risk, Private (compared to Public) = 2.8

Simpson’s paradox Be careful about adding together observations – this can be misleading E.g. the Berkeley sex-bias case 47

         Applicants   % admitted
Men      8442         44%
Women    4321         35%

Major   Men applicants   Men % admitted   Women applicants   Women % admitted
A       825              62%              108                82%
B       560              63%              25                 68%
C       325              37%              593                34%
D       417              33%              375                35%
E       191              28%              393                24%
F       272              6%               341                7%

It appears that women have a lower overall success rate, but actually women are typically more successful at the departmental level; they just apply to more competitive subjects

Finding MLEs in GLM In linear modelling we can use the beautiful compactness of linear algebra to find MLEs and estimates of the variance for parameters Consider an n by k+1 data matrix, X, where n is the number of observations and k is the number of explanatory variables, and a response vector Y –the first column of X is ‘1’ for the intercept term The MLEs for the coefficients (β) can be estimated using β̂ = (XᵀX)⁻¹XᵀY In GLMs, there is usually no such compact analytical expression for the MLEs –Use numerical methods to maximise the likelihood 49
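A short R check of the matrix expression above against lm(), using simulated data (the design and coefficient values are arbitrary for illustration):

set.seed(11)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                          # n by (k+1) design matrix, first column = 1
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^{-1} X'Y
cbind(matrix_formula = as.vector(beta_hat),
      lm_fit = coef(lm(y ~ x1 + x2)))          # identical estimates (up to rounding)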

Testing hypotheses in GLMs For the parameters we are interested in we typically want to ask how much evidence there is that these are different from zero For this we need to construct confidence intervals for the parameter estimates We could estimate the confidence interval by finding all parameter values with log-likelihood no greater than 1.92 units worse than the MLE (2-unit support interval) Alternatively, we might appeal to the central limit theorem (CLT) and use bootstrap techniques to estimate the variance of parameter estimates However, we can also appeal to theoretical considerations of likelihood (again based on the CLT) that show that parameter estimates are asymptotically normal with variance described by the Fisher information matrix Informally, the information matrix describes the sharpness of the likelihood curve around the MLE and the extent to which parameter estimates are correlated 50

Logistic regression When only two types of outcome are possible (e.g. disease/not-disease) we can model counts by the binomial If we want to perform inference about the factors that influence the probability of ‘success’ it is usual to use the logistic model: log(p / (1 − p)) = β₀ + β₁x₁ + … The link function here is the logit 51

Example: testing for genotype association In a cohort study, we observe the number of individuals in a population that get a particular disease We want to ask whether a particular genotype is associated with increased risk The simplest test is one in which we consider a single coefficient for the genotypic value

Genotype                  AA          Aa           aa
Genotypic value           0           1            2
Frequency in population   (1 − f)²    2f(1 − f)    f²
Probability of disease    p₀          p₁           p₂

(f is the frequency of the risk allele a; illustrative values β₀ = −4, β₁ = 2) 52

A note on the model Note that each copy of the risk allele contributes in an additive way to the exponent This does not mean that each allele ‘adds’ a fixed amount to the probability of disease Rather, each allele contributes a fixed amount to the log-odds of disease This has the effect of maintaining Hardy-Weinberg equilibrium within both the cases and the controls 53

Cont. Suppose in a given study we observe the following counts of disease status by genotype We can fit a GLM using the logit link function and binomial probabilities We have genotype data stored in the vector gt and disease status in the vector status Using R, this is specified by the command –glm(formula = status ~ gt, family = binomial)

Genotype                 0     1     2
Counts with disease
Counts without disease
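The study's actual counts are not reproduced in this transcript, but a hedged R sketch shows the workflow: simulate genotype and disease status under the additive log-odds model above (using the illustrative values β₀ = −4 and β₁ = 2 from the earlier slide, and an assumed risk-allele frequency of 0.3), then fit the logistic GLM.

set.seed(2012)
n <- 2000
f <- 0.3                                    # assumed risk-allele frequency (illustrative)
gt <- rbinom(n, size = 2, prob = f)         # genotypes 0/1/2 under Hardy-Weinberg
p  <- plogis(-4 + 2 * gt)                   # additive effect on the log-odds scale
status <- rbinom(n, size = 1, prob = p)

table(gt, status)                           # counts with/without disease by genotype
fit <- glm(status ~ gt, family = binomial)
summary(fit)$coefficients                   # estimates should be near -4 and 2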

Interpreting results

Call:
glm(formula = status ~ gt, family = binomial)

Deviance Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) <2e-16 ***
gt <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: on 1999 degrees of freedom
Residual deviance: on 1998 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 6

Reading the output:
–Estimate: MLE for coefficients
–Std. Error: standard error for estimates
–z value: estimate/std. error
–Pr(>|z|): P-value from normal distribution
–Deviance residuals: measure of contribution of individual observations to overall goodness of fit (for MLE model)
–Null deviance: measure of goodness of fit of null (compared to saturated model)
–Residual deviance: measure of goodness of fit of fitted model
–AIC: penalised likelihood used in model choice
–Number of Fisher Scoring iterations: number of iterations used to find MLE
55

Adding in extra covariates Adding additional explanatory variables in a GLM is essentially the same as in linear model analysis. Likewise, we can look at interactions. In the disease study we might want to consider age as a potentially important covariate

glm(formula = status ~ gt + age, family = binomial)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) <2e-16 ***
gt <2e-16 ***
age *
56

Adding model complexity In the disease status analysis we might want to generalise the fitted model to one in which each genotype is allowed its own risk

glm(formula = status ~ g1 + g2 + age, family = binomial)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) e-14 ***
g1TRUE e-06 ***
g2TRUE < 2e-16 ***
age *
---
Null deviance: on 1999 degrees of freedom
Residual deviance: on 1996 degrees of freedom
AIC:

Model choice It is worth remembering that we cannot simply identify the ‘significant’ parameters and put them in our chosen model Significance for a parameter tests the marginal null hypothesis that the coefficient associated with that parameter is zero If two explanatory variables are highly correlated then marginally neither may be significant, yet in a linear model containing only one of them that variable would be highly significant There are several ways to choose appropriate models from data. These typically involve adding in parameters one at a time and applying some penalty to avoid over-fitting. 58