1
GxG and GxE
2
Top 4 SNPs for r_met
The top 4 SNPs for r_met are:
chr9 rs
chr9 rs
chr9 rs
chr5 rs26411
We want to test SNP-by-SNP epistasis for the top 4 SNPs for r_met using PLINK.
3
PLINK Input Files
MAP file: top4SNPs.map
PED file: top4SNPs.ped
Phenotype data: r_met.txt
4
SNP by SNP Interaction (GxG)
PLINK builds a model based on the allele dosage for each SNP, A and B, and fits the model in the form of
Y ~ β0 + β1·A + β2·B + β3·AB + e
The test for interaction is based on the coefficient β3.
See reference:
PLINK commands:
plink --noweb --file top4SNPs --epistasis --epi1 1 --pheno conty.txt --out younameit
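The interaction model above can be sketched in a few lines of Python. This is only an illustration of the regression form, not PLINK's implementation: the function names `solve_linear` and `fit_interaction` and the toy data are assumptions made for this example, and no test statistic or p-value is computed.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_interaction(dose_a, dose_b, y):
    """Least-squares fit of y = b0 + b1*A + b2*B + b3*A*B.

    dose_a and dose_b are allele dosages (0, 1 or 2) per individual.
    Returns [b0, b1, b2, b3]; b3 is the interaction coefficient.
    """
    rows = [[1.0, a, b, a * b] for a, b in zip(dose_a, dose_b)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    return solve_linear(xtx, xty)


# Toy, noise-free data (made up for illustration): all 9 dosage combinations.
dose_a = [a for a in (0, 1, 2) for _ in (0, 1, 2)]
dose_b = [b for _ in (0, 1, 2) for b in (0, 1, 2)]
y = [1.0 + 0.5 * a - 0.2 * b + 0.3 * a * b for a, b in zip(dose_a, dose_b)]

betas = fit_interaction(dose_a, dose_b, y)
print("interaction coefficient b3 =", round(betas[3], 6))
```

Because the toy phenotype is generated without noise, the fitted coefficients match the values used to simulate it; with real data the interaction term would be judged by its test statistic, as in the PLINK output.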
5
SNP by SNP Interaction (GxG)
The output is in the form:
CHR1 Chromosome of first SNP
SNP1 Identifier for first SNP
CHR2 Chromosome of second SNP
SNP2 Identifier for second SNP
OR_INT Odds ratio for interaction (reported as BETA_INT, the regression coefficient, for a quantitative trait)
STAT Chi-square statistic, 1df
P Asymptotic p-value
6
SNP by SNP Interaction (GxG)
Results:
CHR1 SNP1 CHR2 SNP2 BETA_INT STAT P
5 rs26411 9 rs 0.85 rs 0.9064 rs 0.946 0.2549 1.007 0.3157 0.1224 0.7264 0.8518
7
SNP by SNP Interaction (GxG)
The output can be controlled via
plink --noweb --file top4SNPs --epistasis --epi1 <threshold> --out younameit
which records only results that are significant at p <= <threshold>. (This prevents too much output from being generated.)
8
Covariate File
PLINK provides the ability to test for a difference in association with a quantitative trait between two environments (or, more generally, two groups).
Covariate file: gender.txt
Col 1 is family ID, Col 2 is sample ID, Col 3 is gender (male: 1; female: 2)
9
Quantitative Trait Interaction (GxE)
PLINK commands:
plink --noweb --file top4SNPs --gxe --covar gender.txt --pheno r_met.txt --out younameit
The output is in the form:
CHR Chromosome number
SNP SNP identifier
NMISS1 Number of non-missing genotypes in first group (1)
BETA1 Regression coefficient in first group
SE1 Standard error of coefficient in first group
NMISS2 As above, second group
BETA2 As above, second group
SE2 As above, second group
Z_GXE Z score, test for interaction
P_GXE Asymptotic p-value for this test
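A test of this kind compares the two group-specific regression coefficients. The sketch below uses the standard Z test for a difference between two independent estimates; it is assumed here to correspond to Z_GXE and P_GXE, but it is not taken from PLINK's source, and the function name `gxe_z_test` and the input values are made up for illustration.

```python
import math


def gxe_z_test(beta1, se1, beta2, se2):
    """Z test for a difference between two independent regression coefficients.

    Returns (z, p), where p is the two-sided normal p-value.
    """
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical coefficients for group 1 (e.g. males) and group 2 (e.g. females).
z, p = gxe_z_test(beta1=0.50, se1=0.10, beta2=0.20, se2=0.12)
print(f"Z = {z:.3f}, two-sided p = {p:.4f}")
```

A small p-value would suggest that the SNP's effect on the trait differs between the two groups.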
10
Quantitative Trait Interaction (GxE)
Results:
CHR SNP NMISS1 BETA1 SE1 NMISS2 BETA2 SE2 Z_GXE P_GXE
5 rs26411 280 63 0.171 0.8765
9 rs 281 0.5774 0.1412 0.8819 0.2821 0.3344
rs 278 0.5029 0.1111 0.6459 0.2565 0.609
rs 0.7273 0.1418 0.6243
11
Population Stratification Correction Using EIGENSTRAT
12
EIGENSTRAT
The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker's variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations.
The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes. The package is based on ideas from Price et al. See
13
EIGENSTRAT Input Files (PED Format)
genotype file: the same as PLINK PED file *** file name MUST end in .ped ***
snp file: the same as PLINK MAP file *** file name MUST end in .pedsnp ***
indiv file: the first six columns of PLINK PED file *** file name MUST end in .pedind ***
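The three input files described above can be derived from an existing PLINK PED/MAP pair. The following is a minimal sketch under that assumption; the helper names `first_six_columns` and `make_eigenstrat_inputs` are hypothetical, and the PED file itself is left untouched since it already has the required .ped extension.

```python
import shutil


def first_six_columns(ped_line):
    """Return the first six whitespace-separated columns of a PED line."""
    return " ".join(ped_line.split()[:6])


def make_eigenstrat_inputs(ped_path, map_path, out_prefix):
    """Write <out_prefix>.pedsnp and <out_prefix>.pedind next to an existing PED file.

    The snp file is a copy of the MAP file; the indiv file holds the first
    six columns of each PED line.
    """
    shutil.copyfile(map_path, out_prefix + ".pedsnp")
    with open(ped_path) as src, open(out_prefix + ".pedind", "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(first_six_columns(line) + "\n")
```

For the files used earlier, a call such as `make_eigenstrat_inputs("top4SNPs.ped", "top4SNPs.map", "top4SNPs")` would produce top4SNPs.pedsnp and top4SNPs.pedind.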
14
Run PCA on Input Genotype Data
We call smartpca.pl to run PCA on input genotype data.
Options:
-i example.ped : genotype file
-a example.pedsnp : snp file
-b example.pedind : indiv file
-k k : number of principal components to output (default is 10)
-o example.pca : output file of principal components
-p example.plot : prefix of output plot files of top 2 principal components (individuals are labeled according to the labels in the indiv file)
-e example.eval : output file of all eigenvalues
-l example.log : output logfile
15
Run PCA on Input Genotype Data
Commands:
smartpca.pl -i genotype.ped -a genotype.pedsnp -b genotype.pedind -k 10 -o genotype.pca -p genotype.plot -e genotype.eval -l genotype.log
Main outputs:
genotype.pca
genotype.plot.pdf
16
Test the Significance of PCs
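Phenotype data: r_met.txt
PC data: pc.txt
# read the phenotype and the principal components
y=read.table("r_met.txt")
pc=read.table("pc.txt")
# convert the data frames to matrices so they can be used in the formula
y=as.matrix(y)
pc=as.matrix(pc)
# regress the phenotype on all PCs and inspect the p-value of each PC
fit=lm(y~pc)
summary(fit)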
17
Genotype Imputation Using IMPUTE2
18
IMPUTE2
IMPUTE version 2 (also known as IMPUTE2) is a genotype imputation and haplotype phasing program based on ideas from Howie et al. 2009:
B. N. Howie, P. Donnelly, and J. Marchini (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genetics 5(6): e
See
19
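IMPUTE2 Input Files
Genotype file (specified in -g)
Suppose you want to create a genotype file for 2 individuals at 5 SNPs whose genotypes are
SNP 1 : AA AA
SNP 2 : GG GT
SNP 3 : CC CT
SNP 4 : CT CT
SNP 5 : AG GG
The correct genotype file would be
SNP1 rs A C
SNP2 rs G T
SNP3 rs C T
SNP4 rs C T
SNP5 rs A G
In the full format, each line holds the SNP ID, rsID, base-pair position, allele A, allele B, and then three genotype probabilities (AA, AB, BB) for each individual.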
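The genotype file layout can be illustrated by encoding the five example SNPs programmatically. In this sketch the rsIDs and base-pair positions are placeholders invented for the example (the slide's own values are not reproduced here), and `genotype_probs` and `gen_line` are hypothetical helper names. Each hard-called genotype becomes a triple of probabilities for AA, AB and BB, where A and B are the first and second listed alleles.

```python
def genotype_probs(genotype, allele_a, allele_b):
    """Encode a hard-called genotype such as "GT" as probabilities (AA, AB, BB)."""
    if len(genotype) != 2 or any(x not in (allele_a, allele_b) for x in genotype):
        raise ValueError(f"genotype {genotype!r} does not match alleles {allele_a}/{allele_b}")
    count_b = sum(1 for x in genotype if x == allele_b)  # 0, 1 or 2 copies of allele B
    probs = [0, 0, 0]
    probs[count_b] = 1
    return probs


def gen_line(snp_id, rs_id, position, allele_a, allele_b, genotypes):
    """Build one line of a genotype file: 5 leading fields, then 3 probabilities per individual."""
    fields = [snp_id, rs_id, str(position), allele_a, allele_b]
    for g in genotypes:
        fields.extend(str(p) for p in genotype_probs(g, allele_a, allele_b))
    return " ".join(fields)


# Alleles and genotypes from the example; rsIDs and positions are placeholders.
snps = [
    ("SNP1", "rsPLACEHOLDER1", 1000, "A", "C", ["AA", "AA"]),
    ("SNP2", "rsPLACEHOLDER2", 2000, "G", "T", ["GG", "GT"]),
    ("SNP3", "rsPLACEHOLDER3", 3000, "C", "T", ["CC", "CT"]),
    ("SNP4", "rsPLACEHOLDER4", 4000, "C", "T", ["CT", "CT"]),
    ("SNP5", "rsPLACEHOLDER5", 5000, "A", "G", ["AG", "GG"]),
]
for snp in snps:
    print(gen_line(*snp))
```

For example, the second SNP (GG for the first individual, GT for the second) yields the probabilities `1 0 0 0 1 0` after the five leading fields.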
20
IMPUTE2 Input Files Map file (specified in -m)
This file should have three columns: physical position (in base pairs), recombination rate between current position and next position in map (in cM/Mb), and genetic map position (in cM). The file should also have a header line with an unbroken character string for each column (e.g., "position COMBINED_rate(cM/Mb) Genetic_Map(cM)"). All IMPUTE2 reference panel download packages come with appropriate recombination map files.
21
IMPUTE2 Input Files File of known haplotypes (specified in -h)
The file contains known haplotypes, with one row per SNP and one column per haplotype. All alleles must be coded as 0 or 1, and each -h file must be provided with a corresponding legend file. IMPUTE2 provides formatted haplotypes from the HapMap Project and the 1,000 Genomes Project in the reference panel download packages.
22
IMPUTE2 Input Files Legend files (specified in -l)
Legend file(s) contain information about the SNPs in the -h file(s). Each file should have four columns: rsID, physical position (in base pairs), allele 0, and allele 1. The last two columns specify the alleles underlying the 0/1 coding in the corresponding -h file; these alleles can take values in {A,C,G,T}. Each legend file should also have a header line with an unbroken character string for each column (e.g., "rsID position a0 a1"). IMPUTE2 provides legend files for data from the HapMap Project and the 1,000 Genomes Project in its reference panel download packages. When using two -h files with IMPUTE2, you must supply the corresponding legend files in the same order, i.e., the file with more SNPs comes first.
23
Basic Commands
Genomic interval to use for inference
-int <lower> <upper> specifies the genomic interval to use for inference, given by <lower> and <upper> boundaries in base pair position. The boundaries can be expressed either in long form (e.g., -int ) or in exponential notation (e.g., -int 5.42e e6). This option is particularly useful for restricting test jobs to small regions or splitting whole-chromosome analyses into manageable chunks, as discussed in the section on analyzing whole chromosomes.
Effective size of the population
-Ne specifies the "effective size" of the population from which your dataset was sampled. IMPUTE2 suggests setting -Ne to in the majority of modern imputation analyses.
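Splitting a whole chromosome into chunks amounts to generating a series of non-overlapping <lower> <upper> pairs for -int. The sketch below shows one way to do that; the 5 Mb chunk size and the 12 Mb region length are assumptions chosen for illustration, and `chunk_intervals` is a hypothetical helper, not part of IMPUTE2.

```python
def chunk_intervals(start, end, chunk_size):
    """Split the base-pair range [start, end] into consecutive, non-overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    intervals = []
    lower = start
    while lower <= end:
        upper = min(lower + chunk_size - 1, end)  # last chunk may be shorter
        intervals.append((lower, upper))
        lower = upper + 1
    return intervals


# Hypothetical 12 Mb region split into 5 Mb chunks.
for lower, upper in chunk_intervals(1, 12_000_000, 5_000_000):
    print(f"-int {lower} {upper}")
```

Each printed pair could then be passed to a separate IMPUTE2 run, with the results concatenated afterwards.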
24
Strand Alignment Options
-strand_g specifies file showing the strand orientation of the SNP allele codings in the -g file, relative to a fixed reference point. Each SNP occupies one line, and the file should have two columns: (i) the base pair position of the SNP and (ii) the strand orientation ('+' or '-') of the alleles in the genotype file; the columns should be separated by a single space.
25
Output Files
The main output file follows the same format as the -g file. Use -o to specify the name of the main output file.
26
Example
This is the most common genotype imputation scenario: we want to impute untyped SNPs in a study dataset from a panel of reference haplotypes. The following command shows how to run this kind of analysis with IMPUTE2, using the example data that come with the program download:
./impute2 \
 -m ./Example/example.chr22.map \
 -h ./Example/example.chr22.1kG.haps \
 -l ./Example/example.chr22.1kG.legend \
 -g ./Example/example.chr22.study.gens \
 -strand_g ./Example/example.chr22.study.strand \
 -int 20.4e6 20.5e6 \
 -Ne \
 -o ./Example/example.chr22.one.phased.impute2
27
Sample Size and Power Calculation
28
Sample Size and Power Calculation
Power analysis is an important aspect of experimental design. It allows us to determine the sample size required to detect an effect of a given size with a given degree of confidence. Conversely, it allows us to determine the probability of detecting an effect of a given size with a given level of confidence, under sample size constraints. If the probability is unacceptably low, we would be wise to alter or abandon the experiment.
29
Sample Size and Power Calculation
The following four quantities have an intimate relationship:
sample size
effect size
significance level = P(Type I error) = probability of finding an effect that is not there
power = 1 - P(Type II error) = probability of finding an effect that is there
Given any three, we can determine the fourth.
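The relationship among these four quantities can be made concrete with a small sketch. It assumes a two-sided, two-sample comparison of means with equal group sizes and uses a normal (known-variance) approximation, so the numbers are approximate rather than exact t-test results; the function names are hypothetical. The first function computes power from the other three quantities, and the second inverts it to find the sample size.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_sample(effect_size, n_per_group, sig_level):
    """Approximate power of a two-sided, two-sample z test.

    effect_size is the standardized mean difference (difference / standard deviation).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - sig_level / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability of rejecting in either tail under the alternative.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def sample_size_two_sample(effect_size, sig_level, power):
    """Smallest per-group sample size whose approximate power reaches the target."""
    if effect_size == 0:
        raise ValueError("effect_size must be non-zero")
    n = 2
    while power_two_sample(effect_size, n, sig_level) < power:
        n += 1
    return n


n = sample_size_two_sample(effect_size=0.5, sig_level=0.05, power=0.80)
print("per-group n:", n, "achieved power:", round(power_two_sample(0.5, n, 0.05), 3))
```

Fixing any three of the arguments and solving for the fourth, as done here for the sample size, is the same pattern that dedicated power-analysis tools follow.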
30
Power Analysis in R The pwr package in R implements power analysis.
For each function, you enter three of the four quantities (effect size, sample size, significance level, power) and the fourth is calculated. See reference page:
31
Power Analysis in R
Example: Using a two-tailed test of proportions, and assuming a significance level of 0.01 and a common sample size of 30 for each proportion, what effect size can be detected with a power of .75?
library(pwr)
pwr.2p.test(n=30,sig.level=0.01,power=0.75)
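To see what such a calculation involves, the same question can be approached in Python with a normal approximation on Cohen's effect size h = 2·asin(√p1) − 2·asin(√p2). This is a sketch under that assumption and may differ in detail from what pwr.2p.test does internally; `power_two_proportions` and `detectable_effect_size` are hypothetical names, and the effect size is found by simple bisection.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_proportions(h, n, sig_level):
    """Approximate power of a two-sided test of two proportions (n per group, effect size h)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - sig_level / 2)
    shift = h * math.sqrt(n / 2)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def detectable_effect_size(n, sig_level, power, upper=10.0, iterations=60):
    """Find the effect size h giving the requested power, by bisection on [0, upper]."""
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if power_two_proportions(mid, n, sig_level) < power:
            lo = mid  # power too low: need a larger effect
        else:
            hi = mid
    return (lo + hi) / 2


h = detectable_effect_size(n=30, sig_level=0.01, power=0.75)
print("detectable effect size h:", round(h, 3))
```

Under this approximation the detectable effect size comes out a little above 0.8, which is a large effect on Cohen's scale; a sample of 30 per group can therefore only detect substantial differences between proportions at this significance level.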
32
Sample Size Calculation Using Quanto
Download page:
Suppose that, in a matched case-control study, DNA samples have been collected to determine the effect of each SNP on the risk of cardiovascular disease. We are interested in calculating the sample size needed to detect an effect size (odds ratio) over a range of values with at least 80 percent power under a dominant model. The minor allele frequency is taken to be 10 percent, and the type 1 error level is 0.05.
33
Sample Size Calculation Using Quanto
Under the Parameters option:
i. Select Outcome/Design>Disease>Case-control (Matched).
ii. Select Hypothesis>Gene only.
iii. Click on Gene G and type 0.1 in the allele frequency box. Select the dominant inheritance mode. Click Ok.
iv. Under Outcome model, specify the baseline disease risk, which is the disease risk in unexposed, genetically normal subjects. For this study, let's consider the baseline disease risk as 0.1. In the Genetic effect box, specify the effect size. In this case, consider 1.3 to 3.0 with an interval range of 0.5.
v. In the Power window, specify power as 0.8 and click ok to calculate sample size. Type 0.05 in the type 1 error rate box. Click ok.
vi. Click the Calculate button.
34
Sample Size Calculation Using Quanto
The following output will be displayed.
RG Gene kP
The column "Gene" gives the number of case-control pairs needed. P0 is the baseline disease risk specified, and kP is the overall disease risk in the general population (calculated by the software). For a range of odds ratios (RG), Quanto provides the number of case-control pairs required for the desired power.
35
Power software
36
piface.jar by Lenth (2006)
Link:
Select the two-sample t test
sigma1 and sigma2: standard deviation for each group
Set the true difference of means
Solve for power by setting the sample size
37
Microarray power/sample size estimation
Link:
Set the accepted number of false positives and the fold difference (FC)
Set the estimated standard deviation of the gene intensity measurements on the base-two logarithmic scale (0.7 recommended)
Solve for sample size and per-gene alpha
38
RnaSeqSampleSize
URL:
Use sample size estimation from prior data (say, TCGA data)
Use a large repNumber to get a more precise estimate (50 may be enough)