Zhi Yang, MS Department of Preventive Medicine, USC Jul 29, 2018


Slide 1. Statistical Approach for Investigating Change in Mutational Processes During Cancer Growth and Development

Hello everyone. My name is Zhi, and I am a third-year PhD student in the biostatistics program. In project four, we also use hierarchical modeling, but this time in tumors, using somatic mutations. More specifically, we describe somatic mutations in terms of mutational signatures, a concept I will introduce later in the talk. We then use hierarchical modeling of mutational signatures in tumors to capture how they change as the tumor grows.

Slide 2. A Unifying Model to Test Difference?

HiLDA = "Hierarchical Latent Dirichlet Allocation"
Two-step route: somatic mutations → pmsignature → estimated proportions, $\mathbf{q}$ → are the $\mathbf{q}$ different in the two groups? Regress $\mathbf{q}$ on $G$ (0 = branch, 1 = trunk).
Uncertainty in proportions
Unified route: HiLDA

To infer a difference in signature proportions, one could take point estimates from any current method, for example the R package pmsignature, treating the estimates as independent, and then regress those fractions on the group indicator variable (1 = trunk).
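
The two-step approach described above can be sketched as an ordinary least-squares fit of one signature's estimated fraction on a 0/1 group indicator. This is a minimal sketch, assuming the point estimates are already available (for example, exported from pmsignature) and are treated as fixed numbers; the function name and the example values are hypothetical. With a single binary regressor, the OLS slope reduces to the mean of the group coded 1 minus the mean of the group coded 0.

```python
import math


def two_step_group_difference(fractions, groups):
    """Hypothetical helper: OLS of estimated signature fractions on a 0/1 group indicator.

    fractions -- point estimates of one signature's proportion per sample,
                 treated as known values (the two-step assumption)
    groups    -- 0/1 group indicator for each sample
    Returns (slope, standard error, t statistic).
    """
    y0 = [y for y, g in zip(fractions, groups) if g == 0]
    y1 = [y for y, g in zip(fractions, groups) if g == 1]
    m0, m1 = sum(y0) / len(y0), sum(y1) / len(y1)
    # With one binary regressor plus an intercept, the OLS slope is the difference in group means.
    slope = m1 - m0
    # Residual sum of squares around each group's fitted value (its mean).
    rss = sum((y - m0) ** 2 for y in y0) + sum((y - m1) ** 2 for y in y1)
    df = len(fractions) - 2  # n minus intercept and slope
    sigma2 = rss / df
    se = math.sqrt(sigma2 * (1 / len(y0) + 1 / len(y1)))
    return slope, se, slope / se


# Hypothetical fractions for six samples, three per group
slope, se, t = two_step_group_difference([0.1, 0.2, 0.3, 0.5, 0.6, 0.7], [0, 0, 0, 1, 1, 1])
```

Because each fraction enters as a fixed number, the standard error reflects only the spread between samples, not the uncertainty in estimating each fraction; that uncertainty is what the unified model is meant to carry through.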

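Slide 3. Hierarchical Latent Dirichlet Allocation

[Figure: plate diagram of HiLDA. Signatures $\mathbf{f}_k$ ($k = 1 \dots K$), with hyperprior $\boldsymbol{\delta}_k$, are shared across tumors. For each tumor $i = 1 \dots N$, the branch ($g = 1$) and trunk ($g = 0$) mutations each have their own proportions $\mathbf{q}_i^{g}$ (hyperprior $\mathbf{p}_i^{g}$), latent signature assignments $Z_{i,j}^{g}$, and observed mutations $X_{i,j}^{g}$, $j = 1 \dots n_i^{g}$.]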
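
The generative structure in the plate diagram can be sketched as follows: for one tumor in one group, each mutation first draws a latent signature assignment from that group's proportions, then draws the observed mutation from the assigned signature. This is a simplified sketch, assuming each signature is a single categorical distribution over a small set of mutation categories (pmsignature's own signature representation differs) and taking the proportions as given rather than drawing them from the hyperprior; the function name and all numbers are hypothetical.

```python
import random


def simulate_tumor_mutations(q, signatures, n_mutations, seed=0):
    """Hypothetical sketch: draw Z and X for one tumor in one group.

    q           -- signature proportions q_i^g for this tumor and group (sums to 1)
    signatures  -- list of K categorical distributions f_k over mutation categories
    n_mutations -- number of mutations n_i^g
    """
    rng = random.Random(seed)
    k_range = range(len(signatures))
    categories = range(len(signatures[0]))
    # Latent signature assignment: Z_{i,j}^g ~ Categorical(q_i^g)
    z = rng.choices(k_range, weights=q, k=n_mutations)
    # Observed mutation: X_{i,j}^g ~ Categorical(f_{Z_{i,j}^g})
    x = [rng.choices(categories, weights=signatures[k])[0] for k in z]
    return z, x


# Two hypothetical signatures over six mutation categories
f = [[0.5, 0.1, 0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]]
z_trunk, x_trunk = simulate_tumor_mutations([0.8, 0.2], f, 100, seed=1)
z_branch, x_branch = simulate_tumor_mutations([0.3, 0.7], f, 100, seed=2)
```

Because the signatures $\mathbf{f}_k$ are shared, a difference between trunk and branch shows up only through the proportions $\mathbf{q}_i^{g}$, which is the quantity the model compares.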

Slide 4. Methods: HiLDA

Entries are log(fractions) of signature $k$ relative to the 1st signature, $\log(q_{i,k}^{g}/q_{i,1}^{g})$; the 1st signature is the reference.

Group $g$ | Tumor $i$ | 1st sig | 2nd sig | 3rd sig
Trunk ($g = 0$) | 1 | reference | $\alpha_2 + \gamma_{1,2}^{0}$ | $\alpha_3 + \gamma_{1,3}^{0}$
 | 2 | reference | $\alpha_2 + \gamma_{2,2}^{0}$ | $\alpha_3 + \gamma_{2,3}^{0}$
 | … | … | … | …
 | 16 | reference | $\alpha_2 + \gamma_{16,2}^{0}$ | $\alpha_3 + \gamma_{16,3}^{0}$
Branch ($g = 1$) | 1 | reference | $\alpha_2 + \beta_2 + \gamma_{1,2}^{1}$ | $\alpha_3 + \beta_3 + \gamma_{1,3}^{1}$
 | 2 | reference | $\alpha_2 + \beta_2 + \gamma_{2,2}^{1}$ | $\alpha_3 + \beta_3 + \gamma_{2,3}^{1}$
 | … | … | … | …
 | 16 | reference | $\alpha_2 + \beta_2 + \gamma_{16,2}^{1}$ | $\alpha_3 + \beta_3 + \gamma_{16,3}^{1}$

$$q_{i,k}^{g} = \frac{\phi_{i,k}^{g}}{\sum_{k'} \phi_{i,k'}^{g}}, \qquad \log\frac{q_{i,k}^{g}}{q_{i,1}^{g}} = \alpha_k + \beta_k\, g + \gamma_{i,k}^{g}$$

Here $\alpha_k$ is the baseline difference between the 1st and $k$th signature, $\beta_k$ is the difference between the two groups in the $k$th signature, and $\gamma_{i,k}^{g}$ is the variation for the $k$th signature of the $i$th tumor in the $g$th group.

Branch - Trunk | 2nd sig | 3rd sig
Coefficient | -0.786 | 0.984
SE | 0.152 | 0.587
P value | <0.001 | 0.094
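
The log-ratio parameterization can be checked numerically. This sketch assumes $\phi_{i,k}^{g} = \exp(\alpha_k + \beta_k g + \gamma_{i,k}^{g})$, with every parameter fixed at 0 for the 1st (reference) signature, which is one choice consistent with both formulas above; the function name and parameter values are hypothetical.

```python
import math


def signature_proportions(alpha, beta, gamma, g):
    """Hypothetical sketch: map log-ratio parameters to proportions q_{i,k}^g for one tumor.

    alpha, beta, gamma -- length-K lists; index 0 is the reference (1st) signature,
                          so alpha[0], beta[0] and gamma[0] are 0
    g                  -- group indicator: 0 = trunk, 1 = branch
    """
    # Linear predictor: log(q_k / q_1) = alpha_k + beta_k * g + gamma_{i,k}^g
    eta = [a + b * g + c for a, b, c in zip(alpha, beta, gamma)]
    # Unnormalized weights phi_{i,k}^g, then normalize so the proportions sum to 1
    phi = [math.exp(e) for e in eta]
    total = sum(phi)
    return [p / total for p in phi]


# Hypothetical parameters for three signatures
q_trunk = signature_proportions([0.0, 0.5, -1.0], [0.0, -0.8, 1.0], [0.0, 0.1, 0.2], g=0)
q_branch = signature_proportions([0.0, 0.5, -1.0], [0.0, -0.8, 1.0], [0.0, 0.1, 0.2], g=1)
```

Holding the tumor-level variation fixed, moving from trunk ($g = 0$) to branch ($g = 1$) multiplies $q_{i,k}/q_{i,1}$ by $e^{\beta_k}$, so $\beta_k = 0$ corresponds to no group difference for signature $k$.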

Slide 5. Results: Two-step Method vs. HiLDA

Branch - Trunk | Two-step: 2nd sig | Two-step: 3rd sig | HiLDA: 2nd sig | HiLDA: 3rd sig
Coefficient | -0.786 | 0.984 | -0.795 | 3.417
SE | 0.152 | 0.587 | 0.179 | 1.424
P value | <0.001 | 0.094 | | 0.016

Using the new model (HiLDA), which accounts for uncertainty in the proportions, the new signature (3rd signature) tends to appear significantly more often in branch mutations (p = 0.016).
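
A Wald test is one common way to turn a coefficient and its SE into a p-value. The source does not say how its p-values were computed, so treat this as a consistency check under that assumption: a two-sided test against a standard normal reference. The function name is hypothetical; the inputs are the 3rd-signature coefficients and SEs from the results table.

```python
import math


def wald_p_value(coef, se):
    """Hypothetical helper: two-sided p-value for H0: coef = 0 with a standard normal reference."""
    z = coef / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|))
    return math.erfc(abs(z) / math.sqrt(2))


# 3rd signature, Branch - Trunk, from the results table
p_two_step = wald_p_value(0.984, 0.587)
p_hilda = wald_p_value(3.417, 1.424)
```

Under this assumption the two p-values round to 0.094 and 0.016, matching the table: HiLDA's larger SE is outweighed by a much larger coefficient, giving $z \approx 2.40$ versus $z \approx 1.68$ for the two-step estimate.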