Sequence Kernel Association Tests (SKAT) for the Combined Effect of Rare and Common Variants
2013.06.17 Statistics paper, Narahara
The American Journal of Human Genetics (2013)
SKAT
- Wu, M.C., Lee, S., Cai, T., Li, Y., Boehnke, M., and Lin, X. (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93.
- Developed for rare-variant analysis
Background of rare variant analysis
Classic: burden tests
- Collapsing method: presence or absence (+/-) of any rare variant in a region
- Counts of rare alleles
- Combined multivariate and collapsing (CMC) method: rare variants are collapsed and each common variant forms a separate group; the groups are combined by Hotelling's T2 statistic
- Weighted sum
Non-burden tests
- C-alpha test
- Sequence kernel association test (SKAT)
Problem of burden tests
- Burden tests assume that all rare variants influence the phenotype in the same direction with the same magnitude of effect (after weighting).
- --> Need for methods that are robust to different directions and magnitudes of effects
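The three burden-style summaries listed above (carrier indicator, rare-allele count, weighted sum) can be sketched as follows. This is only an illustration: the function name, the toy genotype matrix and the weights are hypothetical, and genotypes are assumed to be coded as minor-allele counts (0, 1 or 2).

```python
def burden_scores(genotypes, weights=None):
    """Collapse each subject's rare-variant genotypes into single scores.

    genotypes: list of rows, one per subject; each entry is a minor-allele
               count (0, 1 or 2) at one rare variant in the region.
    weights:   optional per-variant weights for the weighted sum
               (hypothetical values; defaults to 1 for every variant).
    """
    n_variants = len(genotypes[0])
    if weights is None:
        weights = [1.0] * n_variants
    scores = []
    for row in genotypes:
        scores.append({
            # Collapsing method: does the subject carry any rare allele?
            "carrier": int(any(g > 0 for g in row)),
            # Count of rare alleles across the region
            "count": sum(row),
            # Weighted sum of rare alleles
            "weighted": sum(w * g for w, g in zip(weights, row)),
        })
    return scores


# Toy data (made up for illustration): 3 subjects, 4 rare variants
G = [
    [0, 0, 0, 0],
    [1, 0, 0, 2],
    [0, 1, 0, 0],
]
scores = burden_scores(G, weights=[1.0, 2.0, 0.5, 0.5])
for s in scores:
    print(s)
```

Each score is then used as a single predictor in a regression on the phenotype, which is why opposite-direction effects within the region can cancel out.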
Development of SKAT
SKAT (2011)
- A kernel regression approach: non-parametric, non-linear regression
- Flexible weighting function: weights based on minor allele frequency or on SNP functional annotation
- Wide range of application
  - binary/continuous traits
  - adjustment for covariates
  - both rare and common variants (up-weighting rare variants)
- Efficient computation: score test for the variance component in a linear mixed model
Development of SKAT (2)
SKAT-O: Optimal unified approach (AJHG, 2012)
- Combination of a burden test and SKAT
- Burden test: optimal when most variants in a region are causal and the effects are in the same direction
- SKAT: optimal when a large fraction of the variants in a region are non-causal or the effects of causal variants are in different directions
Extension of SKAT-O to testing a combined effect of rare and common variants (AJHG, 2013)
Methods
Linear mixed model
- Genetic effect: random effect
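For reference, the SKAT model can be written roughly as below. The notation follows my reading of Wu et al. (2011) rather than the slide itself, so the symbols (X_i for covariates, G_i for the genotypes of the p variants in the region, w_j for the variant weights, τ for the variance component) are assumptions of this sketch.

```latex
% Continuous trait; for a binary trait the left-hand side becomes logit P(y_i = 1).
\[
  y_i = \alpha_0 + \boldsymbol{\alpha}^{\top}\mathbf{X}_i
        + \boldsymbol{\beta}^{\top}\mathbf{G}_i + \varepsilon_i ,
  \qquad i = 1, \dots, n
\]
% Genetic effects are treated as random: mean 0, variance proportional to the weight.
\[
  \mathrm{E}(\beta_j) = 0, \qquad \mathrm{Var}(\beta_j) = w_j \tau ,
  \qquad j = 1, \dots, p
\]
% "No genetic effect" is then a test on a single variance component.
\[
  H_0 : \tau = 0
\]
```

Treating the β_j as random turns a p-parameter test into a test of one variance component, which is what makes the score test applicable.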
Variance component score test
Choice of a kernel function
- weighted linear
- weighted quadratic
- weighted IBS
- Kernel function = (weighted) genetic similarity between subjects
Choice of weights
- Weights are a Beta(a1, a2) density function of MAF; typical parameters: a1 = 1, a2 = 25
P value given by the Davies method
- Approximation of the distribution of the Q statistic
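A minimal sketch of the Q statistic with the weighted linear kernel, assuming a continuous trait and an intercept-only null model (so the fitted null mean is just the sample mean). The weight convention sqrt(w_j) = Beta(MAF_j; a1, a2) follows my reading of Wu et al. (2011); the function names and toy data are hypothetical. The p-value step (Davies method) is not implemented here.

```python
import math


def beta_density(x, a, b):
    """Beta(a, b) probability density evaluated at x (0 < x < 1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def skat_q(genotypes, phenotypes, a1=1.0, a2=25.0):
    """Q statistic with a weighted linear kernel, intercept-only null model.

    Assumes a continuous trait without covariates. Weights follow
    sqrt(w_j) = Beta(MAF_j; a1, a2). Genotypes are coded 0/1/2.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    # Minor allele frequency of each variant
    maf = [sum(row[j] for row in genotypes) / (2 * n) for j in range(p)]
    w = [beta_density(m, a1, a2) ** 2 for m in maf]
    mean_y = sum(phenotypes) / n
    resid = [y - mean_y for y in phenotypes]
    # Weighted linear kernel: K[i][k] = sum_j w_j * G_ij * G_kj
    kernel = [[sum(w[j] * genotypes[i][j] * genotypes[k][j] for j in range(p))
               for k in range(n)] for i in range(n)]
    # Q = resid' K resid
    q = sum(resid[i] * kernel[i][k] * resid[k]
            for i in range(n) for k in range(n))
    return q, kernel, w, resid


# Toy data (made up for illustration): 6 subjects, 3 variants
G = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 2],
]
y = [1.2, 0.4, -0.3, 2.1, 0.9, 1.7]
q, kernel, w, resid = skat_q(G, y)
print(f"Q = {q:.4f}")
```

With the linear kernel, Q equals the weighted sum of squared per-variant score statistics, so the kernel matrix never has to be formed explicitly in practice; it is built here only to show the "similarity between subjects" view.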
Optimal unified approach, SKAT-O
Next, the authors unified the burden test and SKAT to optimize rare-variant analysis.
- Burden test: more powerful than SKAT when most variants in a region are causal and the effects are in the same direction
- SKAT: more powerful than the burden test when a large fraction of the variants in a region are non-causal or the effects of causal variants are in different directions
Weighted burden test statistic: aggregates the variants before regression
SKAT statistic: first regresses on each variant, then aggregates the individual variant statistics
Unifying two test statistics
- The optimal value of ρ is determined by grid search.
- Qρ is equivalently calculated by the formula of the score test statistic.
- ρ: correlation between different βj's
  - ρ = 0: regression coefficients are not correlated with each other --> SKAT
  - ρ = 1: regression coefficients are perfectly correlated --> Burden test
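The unified statistic can be sketched as Qρ = (1 - ρ) Q_SKAT + ρ Q_burden. The forms used below, Q_burden = (Σ w_j S_j)^2 and Q_SKAT = Σ (w_j S_j)^2 with S_j the score statistic of variant j, follow my reading of the SKAT-O paper, in which the SKAT statistic uses squared weights; the function names and toy numbers are hypothetical. The second function checks the equivalent quadratic form with correlation matrix R_ρ = (1 - ρ) I + ρ 11'.

```python
def score_per_variant(genotypes, resid):
    """S_j = sum_i G_ij * (y_i - mu_i): score statistic of variant j."""
    p = len(genotypes[0])
    return [sum(row[j] * r for row, r in zip(genotypes, resid))
            for j in range(p)]


def q_rho(scores, weights, rho):
    """Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden."""
    q_skat = sum((w * s) ** 2 for w, s in zip(weights, scores))
    q_burden = sum(w * s for w, s in zip(weights, scores)) ** 2
    return (1 - rho) * q_skat + rho * q_burden


def q_rho_kernel(scores, weights, rho):
    """Same statistic as a quadratic form z' R_rho z, with z_j = w_j * S_j."""
    z = [w * s for w, s in zip(weights, scores)]
    p = len(z)
    # R_rho has 1 on the diagonal and rho everywhere else
    return sum(z[j] * ((1 - rho) * (j == k) + rho) * z[k]
               for j in range(p) for k in range(p))


# Toy data (made up): 4 subjects, 3 variants, residuals from a null model
G = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1]]
resid = [0.5, -1.0, 0.2, 0.3]
weights = [1.0, 2.0, 0.5]
S = score_per_variant(G, resid)
for rho in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(rho, q_rho(S, weights, rho), q_rho_kernel(S, weights, rho))
```

In this toy example the variant scores have mixed signs, so the burden end (ρ = 1) is much smaller than the SKAT end (ρ = 0). Selecting ρ over the grid additionally requires the p-value of each Qρ, which this sketch does not compute.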
Rare and common variants together in the SKAT-O framework
- Different weighting functions are defined for rare and common variants.
- The effects of rare and common variants are fitted together using separate random effect terms.
Model
Statistic
- Weighted sum of the statistics of rare and common variants
Predefined parameters
Weights (as functions of MAF)
- Rare variants: Beta(1, 25)
- Common variants: Beta(0.5, 0.5)
Contribution, φ
- Equal contribution, or searching for the optimal value of φ
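A small sketch of the two weighting functions and of the weighted-sum statistic. The combination is assumed here to take the form Qφ = (1 - φ) Q_rare + φ Q_common; the rare/common threshold `maf_cutoff` and all function names are hypothetical choices for this illustration, not values taken from the slides.

```python
import math


def beta_density(x, a, b):
    """Beta(a, b) probability density evaluated at x (0 < x < 1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def variant_weight(maf, maf_cutoff=0.01):
    """Beta(1, 25) weight for rare variants, Beta(0.5, 0.5) for common ones.

    maf_cutoff is a hypothetical rare/common threshold for this sketch.
    """
    if maf < maf_cutoff:
        return beta_density(maf, 1.0, 25.0)
    return beta_density(maf, 0.5, 0.5)


def combined_q(q_rare, q_common, phi):
    """Assumed weighted sum of the rare- and common-variant statistics."""
    return (1 - phi) * q_rare + phi * q_common


# Beta(1, 25) strongly up-weights very rare variants; Beta(0.5, 0.5) is
# much flatter over the common-variant range.
for maf in [0.001, 0.005, 0.05, 0.3, 0.5]:
    print(f"MAF = {maf:<5}  weight = {variant_weight(maf):.3f}")

# phi = 0 uses only the rare-variant statistic, phi = 1 only the common one
print(combined_q(4.0, 2.0, 0.5))
```

The value of φ is either fixed in advance so that both parts contribute equally, or searched over, as stated on the slide.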
Appendix
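Kernel in statistics
In Bayesian statistics
- The kernel of a probability density function (PDF) or probability mass function (PMF) is the form of the PDF or PMF in which any factors that are not functions of the variables in the domain (i.e., the normalization factor) are omitted.
- Ex. kernel of a normal distribution
  - PDF: $p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
  - Kernel: $p(x \mid \mu, \sigma^2) \propto \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$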
Kernel in statistics (2)
In non-parametric statistics
- A kernel is a weighting function.
- Usage
  - Kernel density estimation, to estimate the density functions of random variables
  - Kernel regression, to estimate the conditional expectation of a random variable
  - Time series, to estimate the spectral density
  - Estimation of a time-varying intensity for a point process
- Definition: a kernel is a non-negative, real-valued, integrable function K satisfying the following two requirements:
  - $\int_{-\infty}^{\infty} K(u)\,du = 1$ --> A kernel is a PDF.
  - $K(-u) = K(u)$ for all u --> A kernel is symmetric about u = 0.
- If K is a kernel, then so is the function K* defined by K*(u) = λK(λu), where λ > 0.
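The definition can be checked numerically with a Gaussian kernel, used here only as one convenient example of a kernel. The sketch verifies that the rescaled kernel K*(u) = λK(λu) still integrates to 1 and shows a bare-bones kernel density estimate; the function names, integration range, sample and bandwidth are all hypothetical choices.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: non-negative, symmetric, integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def scaled_kernel(kernel, lam):
    """K*(u) = lam * K(lam * u), which is again a kernel for lam > 0."""
    return lambda u: lam * kernel(lam * u)


def integrate(f, lo=-20.0, hi=20.0, steps=40000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h


def kde(x, sample, bandwidth, kernel=gaussian_kernel):
    """Kernel density estimate at x: average of kernels centred on the data."""
    return sum(kernel((x - xi) / bandwidth) for xi in sample) / (len(sample) * bandwidth)


k3 = scaled_kernel(gaussian_kernel, 3.0)
print(integrate(gaussian_kernel), integrate(k3))

# Toy sample (made up); the estimate is itself a density
sample = [-1.0, 0.0, 2.0]
print(integrate(lambda x: kde(x, sample, 0.5)))
```

The bandwidth in `kde` plays the role of 1/λ: each data point contributes a rescaled copy of the same kernel.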
Kernel regression
- Kernel regression is a non-parametric approach to finding a non-linear relation between a pair of random variables X and Y.
- The goal is to estimate a function m that gives the conditional expectation of Y given X: $m(x) = E[Y \mid X = x]$
- A kernel is used to estimate the function m.
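One common way to estimate m with a kernel is the Nadaraya-Watson estimator, a kernel-weighted average of the observed Y values. The slide does not say which estimator is meant, so this choice, the Gaussian kernel, the bandwidth and the toy data are all assumptions of the sketch.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Estimate m(x0) = E[Y | X = x0] as a kernel-weighted average of ys.

    Uses an (unnormalized) Gaussian kernel; the normalizing constant
    cancels between numerator and denominator.
    """
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Toy data (made up): a non-linear relation y = x^2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]

for x0 in [0.5, 2.0, 3.5]:
    print(x0, nadaraya_watson(x0, xs, ys, bandwidth=0.5))
```

Observations close to x0 receive large weights and distant ones almost none, so no parametric form for m has to be specified; the bandwidth controls how local the average is.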
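Kernel trick
- The kernel trick implicitly maps data into a higher-dimensional space, where data that are not linearly separable in the original space can be separated by a hyperplane (non-linear --> linear). Only inner products in that space are needed, and the kernel function supplies them without computing Φ explicitly.
- Kernel function: K(x, z) = <Φ(x), Φ(z)>
  - Φ(·): a function to project data into the higher-dimensional space
  - <·, ·>: inner product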
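A standard textbook illustration of K(x, z) = <Φ(x), Φ(z)> is the homogeneous quadratic kernel on 2-D inputs, for which the feature map Φ can be written out by hand. The example is not from the slides; the function names and test points are hypothetical.

```python
def phi(x):
    """Explicit feature map for 2-D input: (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, 2 ** 0.5 * x1 * x2, x2 * x2)


def dot(a, b):
    """Ordinary inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, z):
    """Homogeneous quadratic kernel K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


x = (1.0, 2.0)
z = (3.0, -1.0)
# Both routes give the same number, but the kernel never builds phi(x)
print(poly_kernel(x, z), dot(phi(x), phi(z)))
```

The kernel evaluates the inner product in the 3-D feature space using only a 2-D dot product, which is the saving that makes very high-dimensional feature spaces practical.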
Applications of kernel methods
- Kernel PCA (non-linear PCA)
- Kernel CCA
- Support vector machine
Support vector machine
- SVM is a machine learning approach that uses a kernel function to map data into a higher-dimensional space in which the classes can be separated by a hyperplane.
- With a non-linear kernel, SVM is a non-linear classifier.
Variance component score test
Lin, X. (1997). Variance component testing in generalised linear models with random effects. Biometrika 84, 309–326.
Variance component tests in linear mixed model
- Likelihood ratio test
- Score statistic: computationally efficient
- Wald statistic