1
DNA Repair SNPs Associated with Breast Cancer
By: Brittany Duncan
Mentors: Janet Sinsheimer PhD (UCLA), Mary Sehl M.D. (UCLA)
2
What We Aim to Do
To ultimately determine:
Which SNPs and environmental factors contribute to breast cancer
Whether a combination of SNPs acting independently might be significant
Whether SNP-SNP interactions are associated with breast cancer
3
Why Is This Important?
Medical: Determining SNP associations with breast cancer would help predict and prevent future cases
Bioinformatics: Comparing two analysis techniques will help create a generalized method for analyzing future SNP interactions
4
SNP: Single Nucleotide Polymorphism
A single nucleotide change at one particular locus
Must be present in at least 1% of the population
Can result in genotypic and phenotypic effects
ACCGTTGTGACCTGCAGTGGAAACAGTATGA
ACCATTGTGACATGCAGTGGAAACAGTGTGA
Image source: www.dnalandmarks.com/.../marker_systems_snp.html
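As a small illustration of the definition, the following Python sketch compares the two example sequences from this slide and lists the positions where they differ. The function name `mismatch_positions` is hypothetical and positions are 0-based. Whether a given difference qualifies as a SNP would still depend on its population frequency, which this sketch does not check.

```python
def mismatch_positions(seq_a: str, seq_b: str) -> list:
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# The two sequences shown on the slide
seq_1 = "ACCGTTGTGACCTGCAGTGGAAACAGTATGA"
seq_2 = "ACCATTGTGACATGCAGTGGAAACAGTGTGA"

for pos in mismatch_positions(seq_1, seq_2):
    # Print the position and the nucleotide found in each sequence
    print(pos, seq_1[pos], "->", seq_2[pos])
```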
5
Mechanisms of DNA Repair
NER = nucleotide-excision repair
BER = base-excision repair
MMR = mismatch repair
DSBR = double-strand break repair
DRCCD = damage recognition cell cycle delay response
NHEJ = non-homologous end-joining
HR = homologous recombination
6
DSBR pathway
Double-strand break repair pathway
One mechanism responsible for repairing DNA and maintaining its integrity
BRCA1 and BRCA2 are key elements in this pathway
Vulnerability to breast cancer may depend on an individual's capacity to repair damaged DNA
7
Steps to Success
Recreate the data found in the previous paper
Implement the Cordell and Clayton stepwise regression method
Write up results and create tables
Future direction: compare results to the Lasso method
8
UCLA Cancer Registry
UCLA familial cancer registry
Participants may or may not have cancer but must meet these criteria:
Be 18 years or older
Have two family members with the same type of cancer or related cancers
Or have a family history of cancer susceptibility
Mutation in the BRCA1 or BRCA2 gene
http://www.registry.mednet.ucla.edu/
9
Preliminary Work
Case/control study
399 unrelated Caucasian women were chosen for the study
104 SNPs in 17 genes of the DSBR pathway were chosen
Logistic regression analysis was conducted on each SNP to determine associations with breast cancer
Models were adjusted to include covariates
Findings: 12 significant SNPs
10
Confirming Data: The Process
11
First Step: Defining Variables
Example: SNP rs16889040 on the RAD21 gene, Chromosome 5

Genotype   Frequency   DV (Additive)   DV (Dominant)
G-G        199         +0              +0
A-G        143         +1              +1
A-A        19          +2              +1

Additive: the A allele confers risk of breast cancer, and A-A even more so
Dominant: the A allele confers risk of breast cancer regardless of the number of copies
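The two coding schemes in the table can be sketched in a few lines of Python. This is only an illustration of the coding rule, not the code used in the study. The function name `code_genotype` and the `"A-G"` string format are assumptions made for the example.

```python
def code_genotype(genotype: str, risk_allele: str = "A") -> tuple:
    """Return (additive, dominant) codes for a genotype such as 'A-G'."""
    alleles = genotype.replace(" ", "").split("-")
    copies = alleles.count(risk_allele)   # 0, 1, or 2 copies of the allele
    additive = copies                     # each copy adds one unit
    dominant = 1 if copies > 0 else 0     # any number of copies counts the same
    return additive, dominant


for g in ("G-G", "A-G", "A-A"):
    print(g, code_genotype(g))
```

Under the additive coding the model estimates one effect per copy of the allele, while the dominant coding only distinguishes carriers from non-carriers.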
12
Example output from logistic regression: dominant model, rs16889040
$\operatorname{logit}(Y) = B_0 + B_1 X_1 + \dots + B_n X_n$

Coefficients:
                    Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)         -1.42388   0.72444      -1.965    0.049358
age                  0.04464   0.01305       3.419    0.000628
brca1                0.49067   0.39063       1.256    0.209079
brca2               -0.11683   0.49631      -0.235    0.813896
EDUCATION1           0.08139   0.33849       0.240    0.809976
EDUCATION2           0.28671   0.34757       0.825    0.409424
Ashkenazi_status    -0.68789   0.28608      -2.405    0.016192
SNP                 -0.76382   0.27855      -2.742    0.006104
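To show how the columns of this output relate to each other, here is a minimal Python sketch that recomputes the z value, the two-sided p-value, and the odds ratio for the SNP row from its estimate and standard error. It assumes the table reports standard Wald z-tests, as is usual for this kind of output. The function name `wald_summary` is hypothetical.

```python
import math


def wald_summary(estimate: float, std_error: float) -> dict:
    """Wald z statistic, two-sided normal p-value, and odds ratio for one coefficient."""
    z = estimate / std_error
    # Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    odds_ratio = math.exp(estimate)   # e^B, the odds ratio for a one-unit change
    return {"z": z, "p_value": p_value, "odds_ratio": odds_ratio}


# Estimate and standard error taken from the SNP row of the output
snp = wald_summary(-0.76382, 0.27855)
print(snp)
```

The recomputed z and p-value should agree with the SNP row up to rounding.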
13
[Figure: DSBR pathway diagram. A double-strand break leads to repaired DNA through Non-Homologous End Joining or Homologous Recombination. Genes shown: MRE11A, NBS1, RAD50, ATM, BRCA1, XRCC6, XRCC5, DNA-PK, XRCC4, LIG4, ZNF350, BRIP1, RAD51, BRCA2, RAD54L, RAD52, XRCC2, XRCC3, TP53, H2AX, RAD21]
14
Cordell and Clayton Method: Stepwise Logistic Regression
15
Stepwise Logistic Regression
Cordell and Clayton method
Used the 8 genes that contained significant SNPs
Ran forward regression analysis on each gene
Performed a likelihood ratio test (LRT) and obtained a p-value from it
16
Cumulative Effects
Cumulative effects: several SNPs are in the model but act independently
Findings: no accumulation of SNPs was found to be significant
17
Interactive Effects
Multiplicative effects: interaction between SNPs
Findings: the RAD21 gene is interesting, but there is not enough information for it to be considered significant
Pairwise interactions: SNPd:SNPf, SNPd:SNPg, SNPf:SNPg
The three-way interaction was found to be not significant
SNPd = rs16888927
SNPf = rs16888997
SNPg = rs16889040
18
SNP Interactions
Using a p-value threshold of 0.05

SNPs         OR ($e^{\beta}$)   p-value
SNPd:SNPf    1.81212            0.090404
SNPd:SNPg    1.76986            0.096392
SNPf:SNPg    1.78383            0.090659
19
Special Thanks
To my amazing mentors at UCLA:
Janet Sinsheimer PhD, Biostatistics lab
Mary Sehl M.D., Dr. Sinsheimer's lab, UCLA
For making the SoCalBSI program possible:
The wonderful mentors at California State Los Angeles: Dr. Momand, Dr. Warter Perez, Dr. Sharp, Dr. Johnston, Mr. Johnston, Dr. Huebach, Dr. Krilowicz
Program Coordinator Ronnie Cheng
Funding:
American Society of Clinical Oncology (Mary Sehl)
National Science Foundation (SoCalBSI)
National Institutes of Health (SoCalBSI)
Economic and Workplace Development (SoCalBSI)
20
Question Slides
Recoding for Education
Why Use Education?
Why Only Caucasian Women?
LRT/Chi^2
NHEJ and HR
Multiple vs Independent
LRT Test
Three Way Interaction
OR
Lasso Method
21
Recoding for Education
Education: answers 1-8 in a survey
1-3: highest education is high school (control)
4-5: some college
6-8: higher education

Answer   Educ1   Educ2   Group mean
1-3      0       0       $\mu_1 = \mu + 0\cdot\alpha_1 + 0\cdot\alpha_2$
4-5      1       0       $\mu_2 = \mu + 1\cdot\alpha_1 + 0\cdot\alpha_2$
6-8      0       1       $\mu_3 = \mu + 0\cdot\alpha_1 + 1\cdot\alpha_2$

Logistic Regression
The outcome is coded 0 or 1, $Y \in \{0, 1\}$, so the linear model is transformed into a logistic one
Linear: $Y = B_0 + B_1 X_1 + \dots + B_n X_n$
Logistic: $\ln\left[\frac{\pi}{1-\pi}\right] = B_0 + B_1 X_1 + \dots + B_n X_n$
The left-hand side is the log of the odds
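The recoding of the survey answer into the two indicator variables can be sketched as follows. This is an illustration of the table only. The function name `code_education` is hypothetical, and the real analysis may have built these variables differently.

```python
def code_education(answer: int) -> tuple:
    """Map a survey answer (1-8) to the (Educ1, Educ2) indicator pair."""
    if not 1 <= answer <= 8:
        raise ValueError("answer must be between 1 and 8")
    educ1 = 1 if 4 <= answer <= 5 else 0   # some college
    educ2 = 1 if answer >= 6 else 0        # higher education
    return educ1, educ2                    # (0, 0) is the reference (control) group


for answer in range(1, 9):
    print(answer, code_education(answer))
```

With this coding, the coefficient on each indicator compares that education group with the high-school reference group.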
22
Why Use Education as a Covariate?
It is routine to include at least one socioeconomic covariate
Education is included not necessarily because it is statistically interesting, but because other studies have repeatedly found it significant
23
Why Only White Women?
Homogeneous population
In different populations (men and other ethnicities), different genes may be involved
Not enough sampling of any other group
How the data were collected:
Registry website and questionnaire are in English
Location of UCLA
Etc.
24
LRT
The LRT statistic is approximately chi-squared distributed
$\chi^2 = 3.84$ for 1 df corresponds to a p-value of 0.05
http://www.union.edu/PUBLIC/BIODEPT/chi.html
25
Cell Cycle with NHEJ and HR
NHEJ: alignment and ligation of termini at the DSB
HR:
GC: uses the sister chromatid as a template
SSA: homologous sequences are aligned; residues no longer present are deleted
Sources: http://www2.mrc-lmb.cam.ac.uk/personal/sl/Html/Graphics/CellCycle.gif; Lord, Garret, Ashworth, Clin Cancer Res 2006; 12(15)
26
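Multiple vs. Acting Independently
Cumulative: $\operatorname{logit}(P(Y)) = \alpha + \beta^{T} z + \gamma_1\,\mathrm{SNP1} + \gamma_2\,\mathrm{SNP2}$
Multiplicative: $\operatorname{logit}(P(Y)) = \alpha + \beta^{T} z + \gamma_1\,\mathrm{SNP1} + \gamma_2\,\mathrm{SNP2} + \gamma_3\,\mathrm{SNP1} \times \mathrm{SNP2}$
$\beta^{T} z$: covariates
$\gamma_1\,\mathrm{SNP1} + \gamma_2\,\mathrm{SNP2}$: SNPs acting independently
$\gamma_3\,\mathrm{SNP1} \times \mathrm{SNP2}$: combination of the two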
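The difference between the two models comes down to one extra term. The Python sketch below evaluates both linear predictors for a single individual. All coefficient values are made up for illustration and are not estimates from the study, and the function names are hypothetical.

```python
import math


def logit_cumulative(alpha, covariate_term, g1, g2, snp1, snp2):
    """alpha + beta^T z + gamma1*SNP1 + gamma2*SNP2 (SNPs act independently)."""
    return alpha + covariate_term + g1 * snp1 + g2 * snp2


def logit_multiplicative(alpha, covariate_term, g1, g2, g3, snp1, snp2):
    """Cumulative model plus the interaction term gamma3*SNP1*SNP2."""
    return logit_cumulative(alpha, covariate_term, g1, g2, snp1, snp2) + g3 * snp1 * snp2


def inverse_logit(x):
    """Convert a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Illustrative values only (not from the study); covariate_term stands for beta^T z
alpha, covariate_term = -1.0, 0.3
g1, g2, g3 = 0.2, 0.4, 0.6

for snp1, snp2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    cum = logit_cumulative(alpha, covariate_term, g1, g2, snp1, snp2)
    mult = logit_multiplicative(alpha, covariate_term, g1, g2, g3, snp1, snp2)
    print(snp1, snp2, round(inverse_logit(cum), 3), round(inverse_logit(mult), 3))
```

The two models give the same prediction unless both SNP codes are nonzero, which is why the interaction coefficient is the quantity tested when looking for SNP-SNP interactions.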
27
LRT Test
Tests which model fits the data better
$\mathrm{LRT} = 2\ln\left(\frac{L(H_A)}{L(H_0)}\right)$
For 1 df, a value of 3.84 or higher corresponds to a p-value of 0.05 or lower: the alternative model fits the data significantly better
A value less than 3.84: the alternative model is not a significant improvement, so the null model is kept
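As a worked example of this test, the sketch below computes the LRT statistic from two log-likelihoods and converts it to a p-value for 1 degree of freedom. It uses the fact that a chi-squared variable with 1 df is the square of a standard normal, so the upper-tail probability is `erfc(sqrt(x / 2))`. The log-likelihood values are hypothetical and the function names are assumptions for the example.

```python
import math


def lrt_statistic(loglik_null: float, loglik_alt: float) -> float:
    """LRT = 2 * ln(L(HA) / L(H0)) = 2 * (loglik_alt - loglik_null)."""
    return 2.0 * (loglik_alt - loglik_null)


def chi2_pvalue_1df(statistic: float) -> float:
    """Upper-tail p-value of a chi-squared distribution with 1 df."""
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# The 3.84 threshold quoted on the slide
print(chi2_pvalue_1df(3.84))

# Hypothetical log-likelihoods for a null and an alternative model
stat = lrt_statistic(loglik_null=-250.0, loglik_alt=-247.5)
print(stat, chi2_pvalue_1df(stat))
```

This shortcut only covers the 1 df case. Comparing models that differ by more than one parameter would need the general chi-squared distribution.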
28
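Three Way Interaction
$\operatorname{logit}(P(Y)) = \alpha + \beta^{T} z + \mathrm{SNPd} + \mathrm{SNPf} + \mathrm{SNPg} + \mathrm{SNPd} \times \mathrm{SNPf} \times \mathrm{SNPg}$
$\beta^{T} z$: covariates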
29
Odds Ratio
The outcome is coded 0 or 1, $Y \in \{0, 1\}$, so the linear model is transformed into a logistic one
Linear: $Y = B_0 + B_1 X_1 + \dots + B_n X_n$
Logistic: $\ln\left[\frac{\pi}{1-\pi}\right] = B_0 + B_1 X_1 + \dots + B_n X_n$
The odds ratio is $e^{B}$ because of logistic regression's transformed form
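The step from the logistic form to the odds ratio can be spelled out. As a sketch in the slide's notation, assume one predictor $X_k$ increases by one unit while the others are held fixed, and read $\pi$ as $P(Y = 1)$:

```latex
\begin{align*}
\ln\frac{\pi}{1-\pi} &= B_0 + B_1 X_1 + \dots + B_n X_n \\
\frac{\pi}{1-\pi} &= e^{B_0 + B_1 X_1 + \dots + B_n X_n} \\
\mathrm{OR}_k = \frac{\mathrm{odds}(X_k + 1)}{\mathrm{odds}(X_k)}
  &= \frac{e^{B_0 + \dots + B_k (X_k + 1) + \dots + B_n X_n}}
          {e^{B_0 + \dots + B_k X_k + \dots + B_n X_n}}
   = e^{B_k}
\end{align*}
```

All terms except $B_k$ cancel in the ratio, so each coefficient can be read directly as a log odds ratio.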
30
Lasso Penalized Regression
Exploratory method used when there is a large number of predictors and a small amount of data
Penalizes the model for having too many borderline-significant predictors
$F(\theta) = \frac{1}{2}\sum_{i}\left(y_i - \mu - \sum_{j} x_{ij}\beta_j\right)^2 + \lambda\sum_{j}|\beta_j|$
First term: least squares
Second term: penalty term
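To make the two terms of the objective concrete, here is a minimal Python sketch that evaluates it for given parameter values. It only computes the objective and does not perform the minimization. The toy data and the function name `lasso_objective` are assumptions made for the example.

```python
def lasso_objective(y, X, mu, beta, lam):
    """F = 1/2 * sum_i (y_i - mu - sum_j x_ij * beta_j)^2 + lam * sum_j |beta_j|."""
    least_squares = 0.0
    for y_i, row in zip(y, X):
        fitted = mu + sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
        least_squares += (y_i - fitted) ** 2
    penalty = lam * sum(abs(b_j) for b_j in beta)
    return 0.5 * least_squares + penalty


# Toy data for illustration only: 3 observations, 2 predictors
y = [1.0, 2.0, 3.0]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print(lasso_objective(y, X, mu=0.5, beta=[0.5, 1.5], lam=0.0))  # least-squares term only
print(lasso_objective(y, X, mu=0.5, beta=[0.5, 1.5], lam=2.0))  # with the penalty added
```

Raising the penalty weight makes large coefficients more costly, which is what pushes borderline predictors toward zero when the objective is minimized.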