Ivan P. Gorlov, Olga Y. Gorlova & Christopher I. Amos.

Slides:



Advertisements
Similar presentations
A simple algebraic cancer equation: calculating how cancers may arise with normal mutation rates Peter Calabrese & Darryl Shibata Question: A fundamental.
Advertisements

April 5 th, 2011 FAMILY HISTORY AND THE PUBLICS HEALTH.
The Five Factors of Evolution
CZ5225 Methods in Computational Biology Lecture 9: Pharmacogenetics and individual variation of drug response CZ5225 Methods in Computational Biology.
Which Phenotypes Can be Predicted from a Genome Wide Scan of Single Nucleotide Polymorphisms (SNPs): Ethnicity vs. Breast Cancer Mohsen Hajiloo, Russell.
Ferdinand van ’t Hooft Cardiovascular Genetics and Genomics Group Karolinska Institutet, Stockholm, Sweden Genome-Wide Association Study GWAS
Gene 210 Cancer Genomics May 5, Key events in investigating the cancer genome M R Stratton Science 2011;331:
CS177 Lecture 9 SNPs and Human Genetic Variation Tom Madej
What is Evolution? Variation exists in all populations Variation is inherited Evolution is heritable changes in a population over many generations. Descent.
Positional Cloning LOD Sib pairs Chromosome Region Association Study Genetics Genomics Physical Mapping/ Sequencing Candidate Gene Selection/ Polymorphism.
ATM, Ataxia Telangiectasia, & Cancer
AP Chapter Mutation  Variation  Natural Selection  Speciation Organisms better suited to the environment SURVIVE & REPRODUCE at a greater rate.
BioVision Alexandria 2010 Linking Genes to Disease: Leveraging the Human Genome Doug Brutlag Professor Emeritus Biochemistry & Medicine (by courtesy)
BRCA Genes Dallas Henson.
Doug Brutlag 2011 Genomics & Medicine Doug Brutlag Professor Emeritus of Biochemistry &
Introduction to BST775: Statistical Methods for Genetic Analysis I Course master: Degui Zhi, Ph.D. Assistant professor Section on Statistical Genetics.
Rs Vanessa Burns Gene 210 SNP. rs Involved in lung vascular development and skeletogenesis PrrX1-/- mice are embryonic lethal – Cleft palate.
Genetics-multistep tumorigenesis genomic integrity & cancer Sections from Weinberg’s ‘the biology of Cancer’ Cancer genetics and genomics Selected.
Doug Brutlag 2011 Genomics & Medicine Doug Brutlag Professor Emeritus of Biochemistry &
SNPs Daniel Fernandez Alejandro Quiroz Zárate. A SNP is defined as a single base change in a DNA sequence that occurs in a significant proportion (more.
The medical relevance of genome variability Gabor T. Marth, D.Sc. Department of Biology, Boston College Medical Genomics Course – Debrecen,
©Edited by Mingrui Zhang, CS Department, Winona State University, 2008 Identifying Lung Cancer Risks.
Conservation of genomic segments (haplotypes): The “HapMap” n In populations, it appears the the linear order of alleles (“haplotype”) is conserved in.
CS177 Lecture 10 SNPs and Human Genetic Variation
SNPs and the Human Genome Prof. Sorin Istrail. A SNP is a position in a genome at which two or more different bases occur in the population, each with.
Carlos Alvarez BCMM/CMHG/CGT Joint Meeting September 15 th, 2011 Multidimensional genomics in mammalian cohorts in mammalian cohorts.
From Genome-Wide Association Studies to Medicine Florian Schmitzberger - CS 374 – 4/28/2009 Stanford University Biomedical Informatics
blank  The Movement of Alleles  The Movement of Alleles Migration and Inter-breeding.
Genes in human populations n Population genetics: focus on allele frequencies (the “gene pool” = all the gametes in a big pot!) n Hardy-Weinberg calculations.
Lecture 6. Functional Genomics: DNA microarrays and re-sequencing individual genomes by hybridization.
Interleukin 6 pathway and coronary heart disease
EVOLUTION Inheritable Variation. Where does variation come from? Remember that inheritable variation comes from mutations and gene shuffling Inheritable.
Analyzing DNA using Microarray and Next Generation Sequencing (1) Background SNP Array Basic design Applications: CNV, LOH, GWAS Deep sequencing Alignment.
GENETICS Dr. Samar Saleh Assiss. Lecturer Mosul Medical College Pathology3 rd year.
Genes and Variation Genotypes and phenotypes in evolution Natural selection acts on phenotypes and does not directly on genes. Natural selection.
약물유전체학 Pharmacogenomics Kangwon National Univ School of Medicine Hee Jae Lee PhD.
Robert E. Schoen, MD MPH Associate Professor of Medicine and Epidemiology Division of Gastroenterology University of Pittsburgh Hereditary Colorectal Cancer:
Increasing Power in Association Studies by using Linkage Disequilibrium Structure and Molecular Function as Prior Information Eleazar Eskin UCLA.
Nucleotide variation in the human genome
Chapter 16 Section 1 Genes and Variation
Introduction to bioinformatics lecture 11 SNP by Ms.Shumaila Azam
Genetic Diseases By: Mr. Hunter.
Chapter 14 Microevolution in Modern Human Populations
Evolutionary Change in Populations
Genetic Variation Within Populations
Counseling Patients About Germline BRCA Mutations
Nat. Rev. Endocrinol. doi: /nrendo
HARDY-WEINBERG and GENETIC EQUILIBRIUM
Epigenetic heterogeneity is associated with genetic heterogeneity in CLL samples. Epigenetic heterogeneity is associated with genetic heterogeneity in.
Stephen M Prescott, Raymond L White  Cell 
From Prescription to Transcription: Genome Sequence as Drug Target
Figure 1 Allele frequency and effect size for ALS-associated genes
Genomic alterations in breast cancer cell line MDA-MB-231.
Summary of the population genetics measures used in this study.
Gene Discovery for Complex Traits: Lessons from Africa
Genetic Disorders.
Ivan P. Gorlov, Olga Y. Gorlova, Shamil R. Sunyaev, Margaret R
Wenting Wu, Christopher I. Amos, Jeffrey E. Lee, Qingyi Wei, Kavita Y
Specific Tumor Suppressor Genes
Genetic Variation MT: Allele Frequency.
Hunting for Celiac Disease Genes
The principles of genetic association
Significance of the principal genetic variants associated with familial pulmonary fibrosis by their strength and frequency. Significance of the principal.
A population shares a common gene pool.
Figure 2 Distribution of DEPDC5 variants in patients and controls
The I1307K APC gene variant was identified using primers designed to detect the I1307K. The I1307K APC gene variant was identified using primers designed.
Prevalence of germline T790M.
Concordance between the genomic landscape identified by whole-exome sequencing of plasma cfDNA and tumor; DNA and recurrence of KDR/VEGFR2 oncogenic mutations.
Germline variants influencing primary tumor type.
Model for distinct pathways of CCA tumorigenesis.
Presentation transcript:

Ivan P. Gorlov, Olga Y. Gorlova & Christopher I. Amos

Colon cancer results from genetic alterations in multiple genes Inherited mutations in the APC gene dramatically increase risk of colon cancer Cancer development reflects early germline susceptibility as well as subsequent mutations that promote cancer

Lu et al., 2014

The continuum of variation Chung et al.Carcinogenesis 31: , 2010 LinkageSequencingAssociation

SNPs are the most common type of polymorphism in the human genome SNPs occur ~ every 400 bases Minor allele frequency describes prevalence of rarer variant (0,0.5] Very Few (<0.01%) SNPs are associated with diseases SNPs are the bread of Genome Wide Association Studies (GWAS) Success requires inference about effects from Causal Variants

GWAS Have Been Highly Successful in Finding Novel Assocations

Manhattan plot of all lung cancers from 1000 Genomes Imputation TP63 hTERT hMSH5 BRCA2 CHRNA5 CHEK2

There are multiple signals on Chr 15q.25 - sequencing of 96 people Observed in 4 cases Intron Additional variation (small microsatellites and in/dels) in the promoters of both CHRNA5 and CHRNA3

sdSNPs are deleterious enough to impair gene function and increase disease risk. Their effects, however, are not strong enough for selection to eliminate them from population (genetic drift, founder effects, and population bottlenecks are factors that help retain sdSNPs).

We estimated the proportion of nonsynonymous SNPs predicted to be functional by PolyPhen and SIFT in different MAF categories PolyPhen SIFT Proportions of functional SNPs in different MAF categories

The Erudite Hypothesis:  Large fraction of the genetic susceptibility influenced by rare (<5%) variants with relatively strong effect size  Targeting rarer variants for analysis may detect causal variants when sample size is large. The majority of the significant SNPs detected by GWAS are SNPs tagging untyped causal variants – greatly reduces power – but indicates region for further study Brute force analysis is not going to be sufficient to overcome power loss for multiple comparison – have to upweight analyses for most likely causal variants

MAF and a Required Sample Size The numbers of cases and controls that are required in an association study to detect disease variants with allelic odds ratios of 1.2 (red), 1.3 (blue), 1.5 (yellow), and 2 (black). Numbers shown are for a statistical power of 80% at a significance level of P <10-6.

SNP Replication Rate GWAS replication rate is low: about 5% This suggest that GWAS produce a considerable number of false discoveries Development of methods to predict which SNP will be replicated and which will not is important – relevant for seq. We evaluated this approach using the results of published GWASs

Evaluating SNP replications Retrieved all data from the Catalog of Published GWA Studies ( Sep 13 Restricted analysis to SNPs mapping to a single gene and associated with a disease rather than variation SNPs related to 106 diseases were studied, 2659 SNPs from 512 studies SNP findings sorted by date with the first report denoted as a discovery and subsequent studies may be replications Reproducibility score modeled as the ratio of successful replications over the total number of subsequent studies.

Characteristics Studied 3' UTR, 3' Downstream, 5' Upstream, 5' UTR, Coding nonsynonymous, Coding synonymous, Intergenic, Intronic, Non-coding, and Non-coding intronic.

Univariate Test Results

Further evaluating SNP Type (1) All pair-wise comparisons inside the group should be insignificant; and (2) All pairwise comparisons between the groups should be significant. SNP reproducibility was lowest in the group 1 (5’ UTR, Intergenic, 5’ Upstream, Non-coding intronic, Intronic), intermediate in the group 2 (Coding synonymous, 3’ Downstream, Non-coding), and highest in the group 3 (Coding nonsynonymous, 5’UTR). 43 SNPs had >1 annotation

Univariate Predictors of Replication

Multivariable Analysis of Predictors CharacteristicBse(B)p-value SNP_TYPE E-08 Pvalue_mlog E-07 5_MAF_Groups E-05 OMIM Nuclear localization Kinases Conservation index Growth factor Tissue_specific eQTL HapMap Receptors Transcription factors Gene Size

Interactions among factors

Identification of Causal Variants Showing a variants has a causal effect on a disease process is challenging Good candidate for the causal variant should be linked to relevant biological mechanism. For example it may have effect on gene expression or protein structure/folding. Functional analysis is needed to prove that (1) SNP change the important biological function, and that (2) alteration of function is associated with disease risk.

Conclusions Multivariable analysis showed that 11% of variability was explained by SNP predictors with 5% attributed to –log(P) value alone. SNP characteristics are second most prominent, followed by MAF (in wrong direction for weighting) Could also define profile measurement

Number of GWA Studies Evaluated Number of Different Diseases Studied Number of Replicates

Selecting SNPs for Validation Selection SNPs for validation stage is tricky The common approach is based on the ranking SNPs based on P-values from the discovery stage and select the top SNPs People also take into account the region: SNPs located in the region detected by linkage analysis of associated with genes from candidate pathway are considered to be a good candidate for replication

Are GWAS-Detected SNPs Tagging or Causal Variants Even large genotyping studies e.g. genotyping 1,000,000 SNPs cover only a fraction of genetic variation in the human genome: there are more than 70,000,000 SNPs in the human genome Causal variant may or may not be on a genotyping platform but is likely to be captured by sequencing studies Short repeats are not detectable Some regions of the genome are not yet mapped Inversions are difficult to identify

Improving power to detect uncommon variants Single variant as disease locus* Hardy-Weinberg Equilibrium in controls Affection status modeled by a penetrance table. case-control samples High risk allele A with frequency p Low risk allele B with frequency q Power calculation actual sample size needed for genotype-phenotype relationship to risk Peng et al. Hum Genet Jun;127(6): *Multiple variants in a region can be modeled using burden tests

Power = 0.8 Relative risk A: 4, Disease prevalence: 0.05,Disease allele frequency: p=0.01, MAF 0.01, significance 1 x 10 -7

Discovery, Expansion, and Replication Find new associations through pooled or meta-analysis Independent replication studies to confirm genotype-phenotype associations Fine-map of association signals Discovery, Expansion, and Replication Find new associations through pooled or meta-analysis Independent replication studies to confirm genotype-phenotype associations Fine-map of association signals Biological Studies Identify risk-enhancing genetic variants Examine functional consequences of a genetic variant Determine biological mechanisms of risk enhancement Biological Studies Identify risk-enhancing genetic variants Examine functional consequences of a genetic variant Determine biological mechanisms of risk enhancement Epidemiologic Studies Evaluate gene-gene and gene- environment interactions Assess penetrance and population attributable risk Develop complex risk models Evaluate clinical/analytic validity of risk models in observational studies Epidemiologic Studies Evaluate gene-gene and gene- environment interactions Assess penetrance and population attributable risk Develop complex risk models Evaluate clinical/analytic validity of risk models in observational studies GAME-ON Integrative Projects

Steering Committee (policies and processes) 2 voting members from each U19 Working Groups Chairs NCI Program Officers Next Generation Genomics Technologies Analytic and Risk Modeling Functional Assays Epidemiology and Clinical External Advisory Committee LungProstateBreastColonOvarian Working Groups in areas of interest U19- based on Cancer Sites GAME-ON Organizational Structure Executive Committee (decisional body) 5 P.I.s NCI program Officers EpigeneticsTERT-CLPTM1L

Clinical and Epidemiological Working Group Established July 2010; Chairs – Roz Eeles, Rayjean Hung Major initiatives ‒ Pan-cancer meta-analysis across sites, ‒ Inflammation pathway, ‒ pleiotropy analysis (with PAGE) Inventory of data and biospecimens ‒ 96,205 cases/241,880 controls ‒ whole blood derived DNA for cases and 81,673 controls ‒ 3317 fresh frozen tumors ‒ 14,150 FFPE tumors Virtual pathology network Managing data harmonization across sites

Analytical and Risk Modeling Working Group Established July 2010, Chair – Peter Kraft Held regular webinars about innovative methods Provided leadership for integration of data from HapMap 2 Provided guidelines for ongoing imputations of 1000 Genomes imputation Developed and implemented novel meta-analytical methods Evaluated approaches for detecting gene x environment effects

Sporadic Lung Cancer Genome Wide Chr 5p Association Wang, et al. Nature Genetics 40(12): 1407–1409 (2008)

There is Marked Heterogeneity in Associations for Chromosome 5p region

Evaluation of 5p Genes as Potential Lung Tumorigenesis Modifiers in a shRNA/ KRAS LSL-G12D/+ Mouse Model

Loss of CLPTM1L sensitizes lung tumor cells to genotoxic agents

Prevention and Cessation The Public Health Mission Initiation Cigarette Use Nicotine Dependence Cessation and Remission

The Tobacco and Genetics Consortium (2010) Nature Genetics Chromosome 15q25 Is Important for Smoking CHRNA5- A3-B4

Genetics of Smoking Cessation Though the strong genetic effect of the  5 nicotinic receptor contributes to heavy smoking, little to no effect is seen for smoking cessation. University of Wisconsin Timothy Baker and Colleagues Survival Analysis Smoking Cessation N=1,015 Chen et al., 2012

Genetic Effect Seen in the Placebo Group Treatment group (N=898) Placebo group (N=117) Entire Sample (N=1015) UW-TTURC study. Haplotypes based on 2 SNPs (rs , rs680244) H1=GC20.8% H2=GT43.7% H3=AC35.5% Chen et al., 2012

OR (Abstinence) Haplotypes A Significant Genotype by Treatment Interaction Reference Haplotypes (rs , rs680244) H1=GC(20.8%) H2=GT(43.7%) H3=AC(35.5%) Chen et al., 2012

OR (Abstinence) Haplotypes Haplotypes predict abstinence in individuals receiving placebo medication Chen et al., 2012

OR (Abstinence) Haplotypes A Significant Genotype by Treatment Interaction Reference (X 2 =8.97, df=2, p=0.01) Chen et al., 2012

OR (Abstinence) Haplotypes Haplotypes do NOT predict abstinence in individuals receiving pharmacologic treatment Reference Chen et al., 2012

OR (Abstinence) Haplotypes Smokers with the high risk haplotype are more likely to respond to pharmacologic treatment Reference Chen et al., 2012

OR (Abstinence) Haplotypes Smokers with the low risk haplotype do NOT benefit from pharmacologic treatment Reference Chen et al., 2012

The genotype by treatment effect do not differ across pharmacologic treatments. Abstinence(Proportion) No difference in haplotypic risks on cessation across medication groups (wald=1.16, df=3, p=0.88) Chen et al., 2012

Conclusions GWAS Studies have been amazingly productive in identifying new genetic architecture of common influences for complex diseases Evaluating the actual functional effects of identified variants can be challenging, relatively few causal variants are directly identified Follow-up studies have identified significant effects of genetic loci on cancer development Genetic loci identified by GWAS analyses have provided insights into cancer prevention Sequencing is useful but expensive, targeted applications are best

U19OtherSourcesCIDRGenotype Lung7,000040,000Toronto, China Ovary5,000040,000Cambridge Colorectal5,000040,000USC Breast6,00077,000Genome Canada EU CR-UK DKFZ BCFR MDEIE 40,000Cambridge, Genome Quebec Prostate45,00046,000NCI BH Movember 40,000Cambridge, USC, NHGRI BRCA1/2 Carriers 020,00010,000Genome Quebec, Cambridge TOTAL68,000143,000210,000 Oncochip - $49/sample