Linkage. Announcements Problem set 1 is available for download. Due April 14. class videos are available from a link on the schedule web page, and at.

Presentation transcript:

Linkage

Announcements: Problem set 1 is available for download. Due April 14. Class videos are available from a link on the schedule web page, and at. Open up today’s lecture in PowerPoint; there are tables to fill out. Stuart office hours: Monday 4-6 pm.

Class GWAS: Go to genotation.stanford.edu. Go to “traits”, then “GWAS”. Look up your SNPs. Fill out the table. Submit information.

Terminology
Genotype frequency: the frequency of a particular genotype in the population, e.g. A/a B/b. If the SNPs segregate randomly, you can calculate this by multiplying the genotype frequencies at each SNP, which in turn come from the allele frequencies; e.g. the expected frequency of A/A B/B is (A*A) * (B*B).
Linkage disequilibrium: if the SNPs segregate randomly, they are said to be in equilibrium. If they do not segregate randomly, they are in linkage disequilibrium.
Haplotype: a set of markers that co-segregate with each other, e.g. abc/abc or abc/ABC or ABC/ABC.
Phase: refers to whether the alleles are in cis or in trans, e.g. ab/AB or aB/Ab.

[Figure: Scenario 1 and Scenario 2, each showing a C/T SNP and a G/A SNP; labels: Chrom 1, Chrom 2]

Data 1: click on “LD Blocks exercise”
rs (A/G): AA 0, AG 15, GG 20
rs (C/T): CC 0, CT 12, TT 23
| rs / rs | count | frequency |
| TT / GG | 13 | 0.37 |
| TT / AG | 10 | 0.29 |
| TT / AA | 0 | 0 |
| CT / GG | 7 | 0.2 |
| CT / AG | 5 | 0.14 |
| CT / AA | 0 | 0 |
| CC / GG | 0 | 0 |
| CC / AG | 0 | 0 |
| CC / AA | 0 | 0 |

Plan A: Plan B: Scenario 1 or 2?

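Data 1: click on “LD Blocks exercise”
Genotype counts: rs (A/G): AA 0, AG 15, GG 20. rs (C/T): CC 0, CT 12, TT 23
Allele counts (70 alleles in 35 people): A 15, G 55. C 12, T 58
Allele frequencies: A .21, G .79. C .17, T .83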

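Allele frequencies: rs (A/G): A .21, G .79. rs (C/T): C .17, T .83
| rs / rs | observed | expected |
| TT / GG | .37 | (T*T) * (G*G) |
| TT / AG | .29 | (T*T) * 2 * (A*G) |
| TT / AA | 0 | |
| CT / GG | .20 | |
| CT / AG | .14 | |
| CT / AA | 0 | |
| CC / GG | 0 | |
| CC / AG | 0 | |
| CC / AA | 0 | |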
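The expected column can be filled in with a short script. This is only a sketch: it assumes Hardy-Weinberg proportions within each SNP (the same assumption behind the (T*T) * (G*G) formula), and the function names `hw_genotype_freqs` and `expected_pair_freqs` are made up for illustration. Allele frequencies are computed directly from the Data 1 genotype counts.

```python
def hw_genotype_freqs(allele_freqs):
    """Hardy-Weinberg genotype frequencies from a two-allele frequency dict."""
    (a, p), (b, q) = sorted(allele_freqs.items())
    return {a + a: p * p, a + b: 2 * p * q, b + b: q * q}


def expected_pair_freqs(snp1, snp2):
    """Expected frequency of every genotype pair if the two SNPs segregate independently."""
    g1, g2 = hw_genotype_freqs(snp1), hw_genotype_freqs(snp2)
    return {(x, y): fx * fy for x, fx in g1.items() for y, fy in g2.items()}


# Data 1: 35 people, so 70 alleles per SNP (counts taken from the genotype table).
snp_ct = {"C": 12 / 70, "T": 58 / 70}
snp_ag = {"A": 15 / 70, "G": 55 / 70}
observed = {
    ("TT", "GG"): 13 / 35,
    ("TT", "AG"): 10 / 35,
    ("CT", "GG"): 7 / 35,
    ("CT", "AG"): 5 / 35,
}

expected = expected_pair_freqs(snp_ct, snp_ag)
for pair, obs in observed.items():
    print(pair, f"observed={obs:.2f}", f"expected={expected[pair]:.2f}")
```

If observed and expected frequencies are close for every genotype pair, the data look like the SNPs segregate independently.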

Genetic Linkage 1 rs rs Chr. 4 Chr. 12

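Data 2
rs (C/G): GG 8, CG 14, CC 6
rs (A/G): AA 8, AG 12, GG 8
| rs / rs | count | frequency |
| GG / AA | 7 | 0.25 |
| GG / AG | 1 | 0.04 |
| GG / GG | 0 | 0 |
| CG / AA | 1 | 0.04 |
| CG / AG | 11 | 0.39 |
| CG / GG | 2 | 0.07 |
| CC / AA | 0 | 0 |
| CC / AG | 0 | 0 |
| CC / GG | 6 | 0.21 |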

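Allele frequencies: rs (C/G): G .54, C .46. rs (A/G): A .50, G .50
| rs / rs | frequency | expected |
| GG / AA | 0.25 | (G*G) * (A*A) |
| GG / AG | 0.04 | (G*G) * 2 * (A*G) |
| GG / GG | 0 | |
| CG / AA | 0.04 | |
| CG / AG | 0.39 | |
| CG / GG | 0.07 | |
| CC / AA | 0 | |
| CC / AG | 0 | |
| CC / GG | 0.21 | |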

Genetic Linkage 2 rs rs Chr kb $R^2 = .901$

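Data 3
Allele frequencies: rs (C/T): C .56, T .44. rs (A/G): A .34, G .66
| rs / rs | frequency | expected |
| TT / AA | 0.06 | |
| TT / AG | 0 | |
| TT / GG | 0.26 | |
| CT / AA | 0.06 | |
| CT / AG | 0.03 | |
| CT / GG | 0.13 | |
| CC / AA | 0.06 | |
| CC / AG | 0.26 | |
| CC / GG | 0.13 | |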
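One way to put a single number on a table like this is the correlation between the two genotypes. The sketch below is an assumption-laden illustration, not the method used in class: it codes each genotype as an allele dosage (0, 1 or 2 copies) and computes a Pearson correlation from the joint frequencies. This is a genotype-level correlation, which is related to but not the same as the haplotype-based R defined later in the lecture. The name `dosage_correlation` is hypothetical.

```python
from math import sqrt


def dosage_correlation(table):
    """Pearson correlation of allele dosages (0, 1, 2) from a 3x3 joint frequency table.

    table[i][j] is the frequency of dosage i at SNP 1 together with dosage j at SNP 2.
    Frequencies are renormalized, so rounded tables that sum to 0.99 are fine.
    """
    total = sum(sum(row) for row in table)
    cells = [(i, j, table[i][j] / total) for i in range(3) for j in range(3)]
    mean1 = sum(i * w for i, _, w in cells)
    mean2 = sum(j * w for _, j, w in cells)
    cov = sum((i - mean1) * (j - mean2) * w for i, j, w in cells)
    var1 = sum((i - mean1) ** 2 * w for i, _, w in cells)
    var2 = sum((j - mean2) ** 2 * w for _, j, w in cells)
    return cov / sqrt(var1 * var2)


# Data 3: rows are TT, CT, CC (0, 1, 2 copies of C); columns are AA, AG, GG (0, 1, 2 copies of G).
data3 = [
    [0.06, 0.00, 0.26],
    [0.06, 0.03, 0.13],
    [0.06, 0.26, 0.13],
]
r = dosage_correlation(data3)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

A value near 0 is what independent SNPs would give; values near 1 or -1 mean one genotype largely predicts the other.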

Genetic Linkage 3: Ear wax SNP (rs), TT -> dry earwax, Chr. 16. Lactase SNP (rs), GG -> lactose intolerance, Chr. 2.

Sequence APOA2 in 72 people Look at patterns of polymorphisms

Find polymorphisms at these positions. Reference sequence is listed.

Sequence of the first chromosome. Circle is same as reference.

slide created by Gonçalo Abecasis

| | 2818 C | 2818 T | |
| 3027 T | | | .87 T alleles |
| 3027 C | | | .13 C alleles |
| | .92 C allele | .08 T allele | |

Expected haplotype frequencies if unlinked
| | 2818 C | 2818 T | |
| 3027 T | .87 x .92 = .80 | .87 x .08 = .07 | .87 T alleles |
| 3027 C | .13 x .92 = .12 | .13 x .08 = .01 | .13 C alleles |
| | .92 C allele | .08 T allele | |

Haplotype frequencies for 2818 (.92 C allele, .08 T allele) and 3027 (T alleles, C alleles): expected if unlinked vs. observed

R, the correlation coefficient: $R = \dfrac{P_{AB} - P_A P_B}{\sqrt{P_A \, P_a \, P_B \, P_b}}$

Calculate R: $R = \dfrac{.86 - (.87)(.92)}{\sqrt{.87 \times .13 \times .92 \times .08}} = \dfrac{.0596}{\sqrt{8.3 \times 10^{-3}}} = \dfrac{.0596}{.0912} = .653$

slide created by Gonçalo Abecasis

$R^2 = (.653)^2 = .43$
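The R calculation is easy to get wrong by hand, so it can help to check it numerically. A minimal sketch, assuming the inputs are the haplotype frequency P_AB and the two allele frequencies P_A and P_B from the slides (the function name `ld_r` is made up):

```python
from math import sqrt


def ld_r(p_ab, p_a, p_b):
    """Correlation coefficient R between two biallelic sites.

    p_ab is the frequency of the haplotype carrying allele A and allele B;
    p_a and p_b are the frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b  # excess of the AB haplotype over the unlinked expectation
    return d / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


# APOA2 example: 3027 T (.87), 2818 C (.92), observed T-C haplotype frequency .86.
r = ld_r(p_ab=0.86, p_a=0.87, p_b=0.92)
print(f"R = {r:.3f}, R^2 = {r * r:.3f}")
```

When the observed haplotype frequency equals the product of the allele frequencies, `ld_r` returns 0, which is the unlinked case.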

Haplotype blocks

slide created by Gonçalo Abecasis

Published Genome-Wide Associations through 07/2012. Published GWA at $p \le 5 \times 10^{-8}$ for 18 trait categories. NHGRI GWA Catalog

Genome Wide Association Studies. [Figure: genotype of SNPxxx in two groups of people; one group is mostly G, the other is mostly A.] G is risk, A is protective

Colorectal cancer 1057 cases 960 controls 550K SNPs

1027 colorectal cancer cases, 960 controls. Cancer: 0.57 G, 0.43 T. Controls: 0.49 G, 0.51 T. Colorectal cancer data from rs

Cancer: 0.57G 0.43T controls: 0.49G 0.51T Are these different? Chi squared

Chi squared

Chi squared = 31. P value = $10^{-7}$
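For readers who want to see how such a test runs, here is a sketch of an allele-count chi-squared test using only the standard library. The allele counts are an assumption: they are rebuilt from the rounded frequencies (.57, .49) and the 1027 and 960 sample sizes given earlier, so the result need not match the value of 31 on the slide, which may come from the actual genotype counts or a different test. The function name `chi_squared_2x2` is hypothetical.

```python
from math import erfc, sqrt


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p value (1 degree of freedom) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMED counts: 2 alleles per person, rounded from the reported frequencies.
cases = (1171, 883)     # G, T alleles among 1027 cases (2054 alleles)
controls = (941, 979)   # G, T alleles among 960 controls (1920 alleles)
chi2, p_value = chi_squared_2x2((cases, controls))
print(f"chi squared = {chi2:.1f}, p = {p_value:.1e}")
```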

Stuart’s genotype Homozygous bad allele 

Other models. Dominant: assume G is dominant. GG or GT vs TT
| | GG or GT | TT |
| Cases | | |
| Controls | 706 | 254 |

Other models. Recessive: assume G is recessive. GG vs GT or TT
| | GG | GT or TT |
| Cases | | |
| Controls | 235 | 725 |
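The dominant and recessive tables are the same three genotype counts grouped two different ways. A small sketch of that grouping follows; the helper name `collapse_genotypes` is made up, and the control GT count of 471 is not on the slides but follows from the two control rows (706 - 235).

```python
def collapse_genotypes(gg, gt, tt, model):
    """Collapse GG/GT/TT counts into two groups under a dominant or recessive model for G."""
    if model == "dominant":
        # One copy of G is enough: GG or GT vs TT.
        return gg + gt, tt
    if model == "recessive":
        # Two copies of G are needed: GG vs GT or TT.
        return gg, gt + tt
    raise ValueError(f"unknown model: {model!r}")


# Control genotype counts (GT derived as 706 - 235).
controls = {"gg": 235, "gt": 471, "tt": 254}
print(collapse_genotypes(**controls, model="dominant"))
print(collapse_genotypes(**controls, model="recessive"))
```

Doing the same for the case counts gives the second row of each 2x2 table, which can then be tested with chi-squared.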

Other models. Additive: GG > GT > TT. Do linear regression. 3 genotypes x 2 groups

[Plot: % cancer against genotype (TT, GT, GG)] $\%\,\text{cancer} = \beta\,(\text{genotype}) + \mu$

Allelic odds ratio: the allele ratio in the cases divided by the allele ratio in the controls. How different is this SNP in the cases versus the controls? Cancer: .57 G / .43 T = 1.33. Control: .49 G / .51 T = 0.96. Allelic odds ratio = (.57/.43) / (.49/.51) = 1.38
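The same arithmetic as a short function, as a sketch. It takes the frequency of the risk allele in each group and assumes only two alleles, so the other allele's frequency is one minus that; the name `allelic_odds_ratio` is made up for illustration.

```python
def allelic_odds_ratio(case_freq, control_freq):
    """Odds of the risk allele in cases divided by its odds in controls."""
    case_odds = case_freq / (1 - case_freq)
    control_odds = control_freq / (1 - control_freq)
    return case_odds / control_odds


# G allele frequency: .57 in cancer cases, .49 in controls.
odds_ratio = allelic_odds_ratio(0.57, 0.49)
print(f"allelic odds ratio = {odds_ratio:.2f}")
```

An odds ratio of 1 means the allele is equally common in both groups; the further above 1, the more enriched it is in cases.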

Allelic odds ratio*: the allele ratio in the cases divided by the allele ratio in the entire population (need the allele ratio from the entire population to do this). How different is this SNP in the cases versus everyone?

Likelihood ratio: What is the likelihood of seeing a genotype given the disease compared to the likelihood of seeing the genotype given no disease? Increased Risk: What is the likelihood of seeing a trait given a genotype compared to overall likelihood of seeing the trait in the population?

Multiple hypothesis testing. P = .05 means that there is a 5% chance of seeing a result this strong by chance when there is no real association. If you try 100 times, you will get about 5 hits. If you try 547,647 times, you should expect 547,647 x .05 = 27,382 hits. So 27,673 (observed) is about the same as one would expect by chance. “Of the 547,647 polymorphic tag SNPs, 27,673 showed an association with disease at P < .05.”

Multiple hypothesis testing. Here, we have 547,647 SNPs = # hypotheses. Corrected p value = q = p x # hypotheses. This is called the Bonferroni correction. It controls the chance of getting even one false positive across all of the tests, which is not the same thing as the false discovery rate. Want q = .05. This means there is at most a .05 chance that any SNP passes the cutoff by chance. At q = .05, p = .05 / 547,647 = $.91 \times 10^{-7}$. This is the p value cutoff used in the paper.
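Both numbers on these slides, the expected count of chance hits and the Bonferroni cutoff, are one-line calculations. A sketch with made-up function names:

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Number of tests expected to pass the cutoff by chance if nothing is truly associated."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p value cutoff that keeps the overall chance of any false positive near alpha."""
    return alpha / n_tests


n_snps = 547_647
print(f"expected chance hits at P < .05: {expected_false_positives(n_snps):.0f}")
print(f"Bonferroni cutoff: {bonferroni_threshold(n_snps):.2e}")
```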

Multiple hypothesis testing. The Bonferroni correction is too conservative. It treats all of the tests as if they were independent. But the SNPs are linked in haplotype blocks, so there really are fewer independent hypotheses than SNPs. Another way to correct is to permute the data many times, and see how many times a SNP comes up in the permuted data at a particular threshold.
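A permutation correction can be sketched in a few lines. This is one common variant (the max-statistic approach), chosen here for illustration and not necessarily the procedure used in the paper: shuffle the case/control labels, record the strongest association seen across all SNPs in each shuffle, and ask how often chance alone beats each real SNP. The statistic (difference in mean allele dosage), the function names and the toy data are all assumptions.

```python
import random


def dosage_difference(genotypes, labels):
    """Absolute difference in mean allele dosage between cases (1) and controls (0)."""
    cases = [g for g, lab in zip(genotypes, labels) if lab == 1]
    controls = [g for g, lab in zip(genotypes, labels) if lab == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_values(snps, labels, n_perm=1000, seed=0):
    """Corrected p value per SNP: how often a label shuffle gives a max statistic at least as large."""
    rng = random.Random(seed)
    observed = [dosage_difference(snp, labels) for snp in snps]
    shuffled = list(labels)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps the number of cases and controls fixed
        max_stats.append(max(dosage_difference(snp, shuffled) for snp in snps))
    return [(1 + sum(m >= obs for m in max_stats)) / (1 + n_perm) for obs in observed]


# Toy data: 10 cases then 10 controls; dosages are copies of the risk allele (0, 1, 2).
labels = [1] * 10 + [0] * 10
associated = [2] * 10 + [0] * 10   # perfectly separates cases from controls
unrelated = [0, 1] * 10            # same mean dosage in both groups
p_values = permutation_p_values([associated, unrelated], labels, n_perm=200)
print(p_values)
```

Because the shuffles keep the real correlation between SNPs intact, linked SNPs are not counted as separate independent tests, which is the advantage over Bonferroni described above.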

Summary
Are the SNPs linked? Calculate correlation.
Is the SNP associated with a disease? Chi-squared.
Is the SNP genome-wide significant? Correct for multiple hypothesis testing.
How big is the effect of the SNP? Odds ratio, increased likelihood.