Single nucleotide polymorphisms
Usman Roshan
SNPs
DNA sequence variations that occur when a single nucleotide is altered.
A variant must be present in at least 1% of the population to be called a SNP.
SNPs occur every 100 to 300 bases along the 3-billion-base human genome.
Many have no effect on cell function, but some may affect disease risk and drug response.
Toy example
SNPs on the chromosome
Perl exercise
Determining SNPs from a pairwise genome alignment:
–Can we solve this problem with a Perl script?
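One way to approach this exercise is sketched below in Python rather than Perl. It assumes the alignment is given as two equal-length strings with gaps written as '-', reports 1-based positions, and skips gap columns because an insertion or deletion is not a single-base substitution. The function name `pairwise_snps` is hypothetical. Note that a difference between two genomes is only a candidate SNP: the 1% population-frequency criterion cannot be checked from a single pair.

```python
# Sketch: candidate SNP positions from a pairwise alignment.
# Assumptions (not from the slides): two equal-length aligned strings,
# gaps written as '-', positions reported 1-based.

def pairwise_snps(seq1: str, seq2: str) -> list[tuple[int, str, str]]:
    """Return (position, base1, base2) for every aligned column where both
    sequences have a nucleotide and the nucleotides differ."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    for pos, (b1, b2) in enumerate(zip(seq1.upper(), seq2.upper()), start=1):
        # Skip gap columns: an indel is not a single-base substitution.
        if b1 == "-" or b2 == "-":
            continue
        if b1 != b2:
            snps.append((pos, b1, b2))
    return snps


if __name__ == "__main__":
    print(pairwise_snps("ACGTTAGC", "ACGATA-C"))
```

Here column 4 (T vs A) is reported and column 7 is skipped because it contains a gap.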
Bi-allelic SNPs
Most SNPs have one of two nucleotides at a given position. For example:
–A/G denotes that the varying nucleotide is either A or G. We call each of these an allele.
–Most SNPs have two alleles (bi-allelic).
Perl exercise
Determining the SNP type from a multiple genome alignment.
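A possible sketch of this exercise, again in Python and assuming equal-length aligned strings with '-' for gaps: for each column, collect the nucleotides, keep those whose frequency reaches a threshold, and report the column's type as the sorted alleles joined by '/'. The function `snp_types` and its `min_freq` parameter are hypothetical; `min_freq=0.01` would mirror the 1% rule.

```python
from collections import Counter

def snp_types(alignment: list[str], min_freq: float = 0.0) -> dict[int, str]:
    """Map each varying 1-based column to its SNP type, e.g. 'A/G'.
    Alleles are the distinct nucleotides in the column (gaps ignored)
    whose frequency within the column is at least min_freq."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have equal length")
    types = {}
    for col in range(length):
        bases = [s[col].upper() for s in alignment if s[col] != "-"]
        if not bases:
            continue  # column is all gaps
        counts = Counter(bases)
        alleles = sorted(b for b, n in counts.items() if n / len(bases) >= min_freq)
        if len(alleles) > 1:  # a SNP needs at least two alleles
            types[col + 1] = "/".join(alleles)
    return types
```

For example, `snp_types(["ACGT", "ACGA", "GCGT", "ACTA"])` returns `{1: 'A/G', 3: 'G/T', 4: 'A/T'}`; with `min_freq=0.3` only column 4 remains, because G in column 1 and T in column 3 each occur in just 1 of 4 sequences.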
SNP genotype
We inherit two copies of each chromosome (one from each parent).
For a given SNP, the genotype defines the alleles we carry.
Example: for the SNP A/G, one's genotype may be
–AA if both copies of the chromosome have A
–GG if both copies of the chromosome have G
–AG or GA if one copy has A and the other has G
–The first two cases are called homozygous and the latter two heterozygous
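The homozygous/heterozygous distinction above reduces to comparing the two alleles. A minimal Python sketch, with the hypothetical helper name `zygosity` and the assumption that a genotype is written as two letters:

```python
def zygosity(genotype: str) -> str:
    """Classify a two-letter genotype such as 'AG' (one allele per chromosome copy)."""
    if len(genotype) != 2:
        raise ValueError("a genotype has exactly two alleles, one per chromosome copy")
    first, second = genotype.upper()
    # Same allele on both copies -> homozygous; different -> heterozygous.
    return "homozygous" if first == second else "heterozygous"
```

Both `zygosity("AG")` and `zygosity("GA")` give "heterozygous", since the order of the two copies does not matter.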
SNP genotyping
Perl exercise
SNP encoding:
–Convert a SNP genotype from a character sequence to a numeric one
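One common numeric encoding counts copies of a chosen allele, giving 0, 1 or 2 per SNP. The sketch below assumes (hypothetically) that the allele counted is the second one listed in each SNP type, so for A/G: AA→0, AG or GA→1, GG→2. Choosing the minor allele instead is an equally valid convention; `encode_genotypes` is an illustrative name.

```python
def encode_genotypes(snp_types: list[str], genotypes: list[str]) -> list[int]:
    """Encode each genotype as the number of copies (0, 1 or 2) of the
    second allele listed in its SNP type (assumed convention)."""
    if len(snp_types) != len(genotypes):
        raise ValueError("need one genotype per SNP")
    codes = []
    for snp_type, genotype in zip(snp_types, genotypes):
        allele1, allele2 = snp_type.upper().split("/")
        genotype = genotype.upper()
        # Reject genotypes that are not made of this SNP's two alleles.
        if len(genotype) != 2 or any(b not in (allele1, allele2) for b in genotype):
            raise ValueError(f"genotype {genotype!r} does not match SNP {snp_type!r}")
        codes.append(genotype.count(allele2))
    return codes
```

For instance, `encode_genotypes(["A/G", "C/T", "A/G"], ["AA", "TC", "GG"])` returns `[0, 1, 2]`.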
Real SNPs
SNP consortium: snp.cshl.org
SNPedia:
Application of SNPs: association with disease
Experimental design to detect cancer-associated SNPs:
–Pick random humans with and without cancer (say, breast cancer)
–Perform SNP genotyping
–Look for associated SNPs
–This design is also called a genome-wide association study (GWAS)
Case-control example
Study of 100 people:
–Case: 50 subjects with cancer
–Control: 50 subjects without cancer
Count the number of dominant and recessive alleles and form a contingency table:

             #Recessive alleles   #Dominant alleles
Case                10                   40
Control              2                   48
Perl exercise
Contingency table:
–Compute the contingency table given case and control SNP genotype data
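A Python sketch of the contingency-table exercise, assuming all genotypes belong to one bi-allelic SNP, that the caller names the recessive allele, and that every allele of every subject is counted (two per genotype). The function name `contingency_table` is hypothetical.

```python
def contingency_table(case_genotypes: list[str], control_genotypes: list[str],
                      recessive: str) -> list[list[int]]:
    """Return [[case_recessive, case_dominant], [control_recessive, control_dominant]].
    Each two-letter genotype contributes two alleles; for a bi-allelic SNP
    every allele that is not the recessive one is counted as dominant."""
    table = []
    for group in (case_genotypes, control_genotypes):
        alleles = "".join(g.upper() for g in group)
        n_recessive = alleles.count(recessive.upper())
        table.append([n_recessive, len(alleles) - n_recessive])
    return table
```

With cases `["AG", "GG", "AA"]`, controls `["AA", "AA", "AG"]` and recessive allele "G", the result is `[[3, 3], [1, 5]]`.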
Odds ratio
Odds of recessive in cancer = a/b = e
Odds of recessive in no-cancer = c/d = f
Odds ratio of recessive in cancer vs no-cancer = e/f

             #Recessive alleles   #Dominant alleles
Cancer               a                    b
No cancer            c                    d
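The definition above translates directly into code. This Python sketch uses the slide's a, b, c, d cell names and intermediate e and f; it does not guard against zero cells, which raise `ZeroDivisionError`.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio of the recessive allele in cancer vs no-cancer.
    a, b = recessive/dominant counts in cancer; c, d = same in no-cancer."""
    e = a / b  # odds of recessive in cancer
    f = c / d  # odds of recessive in no-cancer
    return e / f
```

`round(odds_ratio(15, 35, 2, 48), 1)` gives 10.3, matching the worked example.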
Risk ratio (relative risk)
Probability of recessive in cancer = a/(a+b) = e
Probability of recessive in no-cancer = c/(c+d) = f
Risk ratio of recessive in cancer vs no-cancer = e/f

             #Recessive alleles   #Dominant alleles
Cancer               a                    b
No cancer            c                    d
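The risk ratio differs from the odds ratio only in its denominators: each row total replaces the dominant count. A Python sketch with the same a, b, c, d cell names:

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Risk ratio of the recessive allele in cancer vs no-cancer."""
    e = a / (a + b)  # probability of recessive in cancer
    f = c / (c + d)  # probability of recessive in no-cancer
    return e / f
```

`round(risk_ratio(15, 35, 2, 48), 2)` gives 7.5, matching the worked example.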
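Odds ratio vs risk ratio
The risk ratio has a natural interpretation since it is based on probabilities.
In a case-control study we cannot calculate the probability of cancer given the recessive allele: subjects are chosen based on disease status, not allele type, so the fraction of cases in the study is fixed by the design.
The odds ratio avoids this problem because it is symmetric: (a/b)/(c/d) = (a/c)/(b/d) = ad/bc, so the odds ratio of the allele given disease equals the odds ratio of disease given the allele.
The odds ratio also shows up in logistic regression models.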
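The effect of case-control sampling can be checked numerically. The sketch below takes the example table and imagines recruiting k times as many controls (a hypothetical change to the design). The odds ratio does not move, while a naive "risk of cancer given allele" read straight off the table changes with k, showing why that quantity is not estimable here. Function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    # Cross-product form; equals (a/b)/(c/d) and also (a/c)/(b/d).
    return (a * d) / (b * c)

def naive_disease_risk_ratio(a, b, c, d):
    # P(cancer | recessive) / P(cancer | dominant) computed from the table.
    # Only meaningful if the table were a random sample of the population.
    return (a / (a + c)) / (b / (b + d))

def compare_under_sampling(a, b, c, d, factors):
    """For each factor k, scale the control row by k and recompute both measures."""
    return [(k, odds_ratio(a, b, k * c, k * d),
             naive_disease_risk_ratio(a, b, k * c, k * d)) for k in factors]

if __name__ == "__main__":
    for k, odds, risk in compare_under_sampling(15, 35, 2, 48, (1, 2, 10)):
        print(k, round(odds, 2), round(risk, 2))
```

Scaling the control row multiplies both c and d by k, which cancels in ad/bc; the odds ratio stays at about 10.29 while the naive risk ratio moves from about 2.09 (k = 1) to about 6.31 (k = 10).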
Example
Odds of recessive in case = 15/35
Odds of recessive in control = 2/48
Odds ratio of recessive in case vs control = (15/35)/(2/48) ≈ 10.3
Risk of recessive in case = 15/50
Risk of recessive in control = 2/50
Risk ratio of recessive in case vs control = (15/50)/(2/50) = 15/2 = 7.5

             #Recessive alleles   #Dominant alleles
Case                15                   35
Control              2                   48
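The same numbers can be checked with the cross-product form of the odds ratio, which follows from dividing the two odds. Because both groups have the same total (50), the denominators cancel in the risk ratio, which is why it reduces to 15/2.

```latex
\mathrm{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}
            = \frac{15 \cdot 48}{35 \cdot 2} = \frac{720}{70} \approx 10.29,
\qquad
\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)} = \frac{15/50}{2/50} = \frac{15}{2} = 7.5
```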
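Odds ratios in genome-wide association studies
A higher odds ratio (further above 1) means a stronger association.
Therefore SNPs with the highest odds ratios can be used as predictors or risk estimators of disease.
The odds ratio is generally higher than the risk ratio (when both exceed 1).
The two are similar when the probabilities are small (a ≪ b and c ≪ d), since then a/(a+b) ≈ a/b and c/(c+d) ≈ c/d.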
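Selecting SNPs by odds ratio amounts to computing one ratio per SNP table and sorting. A Python sketch, assuming each SNP's table is given as a hypothetical (a, b, c, d) tuple keyed by an illustrative SNP id; tables with b·c = 0 get an infinite ratio, and tables where the ratio is undefined (0/0) are dropped. Ranking by odds ratio alone ignores how much data supports each estimate; the statistical test of association addresses that.

```python
import math

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio ad/(bc), with zero-cell handling."""
    if b * c == 0:
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)

def rank_snps_by_odds_ratio(tables: dict[str, tuple[int, int, int, int]]) -> list[tuple[str, float]]:
    """Return (snp_id, odds_ratio) pairs sorted from highest to lowest odds ratio."""
    scored = [(snp, odds_ratio(*counts)) for snp, counts in tables.items()]
    scored = [(snp, r) for snp, r in scored if not math.isnan(r)]  # drop undefined ratios
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With hypothetical tables `{"snp1": (15, 35, 2, 48), "snp2": (10, 40, 2, 48), "snp3": (5, 45, 5, 45)}`, the ranking is snp1 (≈10.29), snp2 (6.0), snp3 (1.0).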
Statistical test of association (P-values)
P-value = probability of the observed data, or data more extreme, under the null hypothesis
Example:
–Suppose we are given a series of coin tosses
–We suspect that a biased coin produced the tosses
–We can ask: if a fair coin produced the tosses, what is the probability of a result at least this extreme?
–If this probability is very small, the observed tosses are unlikely under a fair coin, which is evidence against the fair-coin explanation
–In this example the null hypothesis is the fair coin and the alternative hypothesis is the biased coin
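For the coin example, the p-value can be computed exactly from the binomial distribution. This Python sketch assumes a one-sided alternative (a coin biased toward heads), so it sums the probabilities of getting at least the observed number of heads under a fair coin; a two-sided test would also count equally extreme results in the tails direction. `binomial_p_value` is an illustrative name.

```python
from math import comb

def binomial_p_value(heads: int, tosses: int, p: float = 0.5) -> float:
    """One-sided p-value: P(at least `heads` heads in `tosses` tosses)
    under the null hypothesis that P(heads) = p."""
    return sum(comb(tosses, k) * p**k * (1 - p)**(tosses - k)
               for k in range(heads, tosses + 1))
```

Nine heads in ten tosses gives (10 + 1)/1024 ≈ 0.011, a small probability under a fair coin, while five heads in ten gives about 0.62.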