Overviews of Multifactor Dimensionality Reduction, Structural Equation Modeling, and Stochastic Modeling BST 764 – FALL 2014 – DR. CHARNIGO – UNIT FOUR.


1 Overviews of Multifactor Dimensionality Reduction, Structural Equation Modeling, and Stochastic Modeling BST 764 – FALL 2014 – DR. CHARNIGO – UNIT FOUR

2 Contents: Multifactor dimensionality reduction … slides 3-11; Stochastic modeling … slides 12-27; Structural equation modeling … slides 28-48

3 Multifactor dimensionality reduction Consider the following scenario: 1. We have a dichotomous outcome variable Y; for example, a person either does (Y=1) or does not (Y=0) have a certain disease. 2. We have several categorical predictor variables X1, X2, …, XM. For simplicity here, we will suppose that they are all dichotomous, but that is not necessary for multifactor dimensionality reduction. A given predictor variable may represent the alleles at a particular genetic locus (for instance, X1 = 1 for AA and Aa, X1 = 0 for aa), an environmental risk factor, or a behavioral risk factor.

4 Multifactor dimensionality reduction Exercise: Suppose you decide to fit a logistic regression model. Is there any merit to including, say, X1^2 in the model ? What about including X1 X2 ? Exercise: In practice, how do you decide whether to include an interaction term in a regression model (logistic or other) ? Exercise: What other methods (besides logistic regression and the multifactor dimensionality reduction to be discussed presently) might you consider for data analysis if interactions were potentially of interest ?

5 Multifactor dimensionality reduction The basic idea behind multifactor dimensionality reduction (MDR) [Ref1] is as follows. Prepare a 2 x 2 contingency table describing the distribution of Y in relation to the values of X1 and X2. Here is a hypothetical example, with each cell showing the count having Y=1 over the cell total:

X1 \ X2  |  0             |  1
0        |  20/80 (25%)   |  45/100 (45%)
1        |                |  90/120 (75%)

Ref1: Ritchie, M. et al (2001). Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. American Journal of Human Genetics, 69, 138.

6 Multifactor dimensionality reduction Identify all cells in the table for which more than half of the observations had Y=1. Then define a new dichotomous variable, which I will call Z12, by setting Z12 = 1 when the X1 and X2 values correspond to cells so identified and Z12 = 0 otherwise.

X1 \ X2  |  0             |  1
0        |  20/80 (25%)   |  45/100 (45%)
1        |                |  90/120 (75%)
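A minimal sketch of this collapse step in Python, using the counts from the table above (the placement of the 90/120 cell is an assumption, since one cell of the slide's table did not survive transcription):

```python
# Each key of the X1-by-X2 table is a cell; each value is
# (count with Y=1, cell total). The missing cell is omitted.
table = {
    (0, 0): (20, 80),    # 25% have Y=1
    (0, 1): (45, 100),   # 45% have Y=1
    (1, 1): (90, 120),   # 75% have Y=1
}

# Z12 = 1 exactly for the cells where more than half the observations have Y=1.
z12 = {cell: int(cases > total / 2) for cell, (cases, total) in table.items()}
print(z12)  # {(0, 0): 0, (0, 1): 0, (1, 1): 1}
```

Only the 75% cell exceeds one half, so only that combination of X1 and X2 maps to Z12 = 1.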

7 Multifactor dimensionality reduction Thus, we have collapsed X1 and X2 into a single variable Z12, which constitutes a reduction from two dimensions to one. Exercise: If we had highlighted the two bottom cells in our table, what could you say about Z12 ? Exercise: What is “lost” in the reduction from two dimensions ? Exercise: What is the estimated odds ratio describing the relationship between Y and Z12 ?

8 Multifactor dimensionality reduction Similar computations can be performed based on X1 and X3, to yield a new variable Z13. In fact, we can go through all possible pairs of X1, X2, …, XM. Exercise: How many such pairs are there ? Exercise: In a way, we seem to have made matters worse; unless M ≤ 3 (in which case we probably didn’t need MDR in the first place), there are more Z’s than X’s. But we can prioritize the Z’s and choose only the “best” one with which to work. How might we do this ?
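The enumeration of candidate pairs can be sketched directly; the value M = 10 below is hypothetical, and how to score each resulting Z (e.g. by classification accuracy under cross-validation) is a choice the slide leaves open:

```python
from itertools import combinations

# Each pair (j, k) of predictor indices would yield one collapsed
# variable Zjk, constructed as on the previous slides.
M = 10  # hypothetical number of predictors
pairs = list(combinations(range(1, M + 1), 2))
print(len(pairs))  # M*(M-1)/2 pairs; 45 when M = 10
```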

9 Multifactor dimensionality reduction Suppose Z* is “best”. We can then adopt Z* as our predictor of Y. However, assessing the statistical significance of Z* is a delicate matter, first because the values of Y were consulted in the definition of Z* and second because the values of Y were consulted in the selection of Z* as “best”. The former issue means that: (i) different cells might have been highlighted with a different sample; and, (ii) the observations in any stratum defined by Z* are not statistically independent. Point (ii) has been a “deal-breaker” for conventional methodology like chi-square testing or making a Wald interval for the (log) odds ratio, but resampling has been employed.

10 Multifactor dimensionality reduction To understand the latter issue, let us perform a thought experiment about 16 stock traders during the past 4 years… In fact, the latter issue applies (along with a concern about multiple testing) whenever we employ an automated variable selection procedure in regression, such as backward elimination. Yet, for most data analysts, the latter issue by itself would not be a “deal-breaker” for applying conventional methodology. Exercise: Assuming that Z* really is “best” and we can ascertain its statistical significance, can you see any problem with choosing Z* as our predictor of Y ?

11 Multifactor dimensionality reduction One can proceed similarly in considering three-way interactions. First define a variable W123 based on a 2 x 2 x 2 contingency table derived from X1, X2, and X3. Then define a variable W124 based on X1, X2, and X4. In fact, one can define lots of W variables, and identify a best one, W*. Exercise: How might we decide whether to use Z* or W* as our predictor of Y ? Let us now discuss Ref1.

12 Stochastic modeling If you took a course in ordinary differential equations, you may be familiar with the linear equation, dy/dt = -c y, and the logistic equation, dy/dt = c y(k – y). Above, c and k are positive constants. The former is a model for the (macroscopic) amount of a substance present under radioactive decay, and the latter is a model for the size of a population in an environment that cannot sustain growth indefinitely.

13 Stochastic modeling In each case, you can easily write out an exact formula for y as a function of t, given the value of y when t = 0 (an “initial condition”). Exercise: Do so for the first model. How does the constant c relate to the “half life” of the substance ? Of course, whether the models comport with reality is another question, but the point is that the models do not explicitly account for randomness. Thus, these models may be called deterministic.
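For reference, both models admit closed-form solutions; a sketch (not from the slides), with initial condition y(0) = y_0, is:

```latex
\frac{dy}{dt} = -c\,y
\;\Longrightarrow\;
y(t) = y_0\, e^{-ct},
\qquad
t_{1/2} = \frac{\ln 2}{c}
\quad \text{(the half life, from } y(t_{1/2}) = y_0/2 \text{)},
```

and, for the logistic equation,

```latex
\frac{dy}{dt} = c\,y\,(k - y)
\;\Longrightarrow\;
y(t) = \frac{k\,y_0}{y_0 + (k - y_0)\,e^{-ckt}},
```

which rises from y_0 toward the carrying capacity k as t grows.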

14 Stochastic modeling In contrast, a stochastic model is one that explicitly accounts for randomness. If we interpret this broadly, just about any statistical model used for data analysis is a stochastic model. Exercise: In nonparametric regression, we are trying to estimate a mean response function μ(x). But μ(x) itself is not random. Can we still regard this as a stochastic model ? Exercise: If considering an individual atom subject to radioactive decay, the time at which decay occurs may be described by an exponential random variable. Can this be reconciled with our earlier representation of such decay ?

15 Stochastic modeling Yet, books and semester-long courses in stochastic modeling may emphasize a concept called a Markov chain. [Ref2] A (discrete time) Markov chain with a finite state space S is a sequence X_0, X_1, X_2, etc., supported on S and observed at successive time points, with the following property: P(X_{n+1} = s_{n+1} | X_n = s_n, X_{n-1} = s_{n-1}, …, X_0 = s_0) = P(X_{n+1} = s_{n+1} | X_n = s_n) for all s_{n+1}, s_n, …, s_0 in S. Ref2: Bremaud, P. (1999). Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, New York.

16 Stochastic modeling The aforementioned property means that where you go next depends on where you are now but not (also) on where you were before. A rather frivolous example is the random walk of an inebriated individual. A somewhat less frivolous (but still something of a toy) example is the movement of a laboratory rat through a maze. Exercise: What can you do if you want X_{n+1} to depend on X_n and X_{n-1}, not just X_n ?
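A minimal simulation sketch may make the definition concrete. The first two rows of the matrix below are consistent with the transition probabilities quoted on the later slides for the "second" chain (0.6/0.2/0.2 out of state 0; 0.5 to stay and 0.5 to move out of state 1); the third row is a made-up assumption, since the slide images are not in the transcript:

```python
import random

# Transition matrix on S = {0, 1, 2}: row s gives the distribution
# of X_{n+1} given X_n = s.
P = [[0.6, 0.2, 0.2],
     [0.0, 0.5, 0.5],
     [0.3, 0.3, 0.4]]   # hypothetical row for state 2

def simulate(P, x0, n_steps, seed=42):
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        # Where we go next depends only on where we are now:
        path.append(rng.choices(range(len(P)), weights=P[path[-1]], k=1)[0])
    return path

path = simulate(P, x0=0, n_steps=20)
print(path)  # a length-21 trajectory through {0, 1, 2}
```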

17 Stochastic modeling More creative examples arise in music. [Ref3] If you are a musician and you see an F minor chord followed by a Bb major chord, that information alone may allow you to guess that the next chord will be another F minor chord or an Eb major chord. An example of interest and relevance in public health is how elderly people may transition among states such as healthy, mild cognitive impairment, dementia, and death. [Ref4] Ref3: Franz, D. (1998). Markov Chains as Tools for Jazz Improvisation Analysis. Master’s thesis, Virginia Tech. Ref4: Tyas et al (2007). Transitions to mild cognitive impairments, dementia, and death: Findings from the Nun Study. American Journal of Epidemiology, 165, 1231-1238.

18 Stochastic modeling [This slide displayed two example Markov chains, referred to below as the first and second Markov chains, with transition probability matrices P1 and P2.]

19 A transient state is one to which a return occurs with less than probability 1, while a recurrent state is one to which a return occurs with probability 1. A state from which one can never escape is called absorbing. Exercise: Which states in the first Markov chain are transient, recurrent, and absorbing ? The second Markov chain ? Often one is interested in the length of time required to “hit” a particular state. For instance, in the second Markov chain, we may ask how much time will be required to “hit” state 2.

20 Stochastic modeling Clearly, the answer may depend on the state in which one begins. Moreover, the answer may be random, so we may be content to report an expected value. Considering the second Markov chain, if we begin in state 1, there is a 50% chance we’ll move to state 2 in just one step, a 25% chance we’ll move to state 2 in exactly two steps, a 12.5% chance we’ll move to state 2 in exactly three steps, and so forth. Exercise: With what type of distribution are we dealing, and what is its expected value ?

21 Stochastic modeling Still considering the second Markov chain, if we begin in state 0, addressing the question is more difficult, but we can use a technique called first step analysis, which is basically a form of conditional expectation. Let T be the first nonzero time that the Markov chain is in state 2. We have E[T | X_0 = 0] = 0.6 E[T | X_0 = 0, X_1 = 0] + 0.2 E[T | X_0 = 0, X_1 = 1] + 0.2 E[T | X_0 = 0, X_1 = 2] = 0.6 E[T | X_1 = 0] + 0.2 E[T | X_1 = 1] + 0.2 = 0.6 E[(T+1) | X_0 = 0] + 0.2 E[(T+1) | X_0 = 1] + 0.2.

22 Stochastic modeling We know from a previous slide that E[T | X_0 = 1] = 2, whence E[(T+1) | X_0 = 1] = 3, which yields E[T | X_0 = 0] = 0.6 E[T | X_0 = 0] + 0.6 + 0.6 + 0.2. We then have a linear equation for E[T | X_0 = 0], which is solved to yield 3.5.
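First step analysis amounts to solving a small linear system, which can be sketched numerically. The rows below use the transition probabilities quoted on the slides (0.6/0.2/0.2 out of state 0; 0.5 to stay and 0.5 to move out of state 1); the row for state 2 never enters the computation:

```python
import numpy as np

# h(s) = E[T | X_0 = s], the expected time to hit state 2, satisfies
# h(s) = 1 + sum over s' != 2 of P[s, s'] * h(s')  for s in {0, 1}.
P = np.array([[0.6, 0.2, 0.2],
              [0.0, 0.5, 0.5]])   # rows for the non-target states 0 and 1

A = np.eye(2) - P[:, :2]          # coefficients of h(0) and h(1)
b = np.ones(2)
h = np.linalg.solve(A, b)
print(h)  # h[0] = 3.5 and h[1] = 2, matching the values derived above
```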

23 Stochastic modeling Sometimes there is interest in knowing whether the probability distribution associated with a Markov chain stabilizes over time. For example, for a Markov chain with state space S = {0,1,2}, does π(s) := lim_{n→∞} (P(X_n = 0 | X_0 = s), P(X_n = 1 | X_0 = s), P(X_n = 2 | X_0 = s))^T exist, and does this limit actually depend on s ? Exercise: Answer “by inspection” for the Markov chain whose transition probability matrix was P1.

24 Stochastic modeling

25 When π does exist, it will satisfy the following matrix-vector equation (why ?): π = P^T π. Writing the equation as (I – P^T) π = 0, where I denotes the identity matrix of appropriate dimension, we see that π is an appropriately scaled (how ?) nonzero eigenvector of the matrix I – P^T corresponding to the eigenvalue 0. So ascertaining π amounts to solving a linear algebra problem. Using, e.g., the eigen function in R, one can find that π = (0.51, 0.20, 0.29)^T for the Markov chain whose transition probability matrix is P2 as defined on a previous slide.

26 Stochastic modeling Note that P is not in general symmetric, so writing π = P π is a big mistake. If one makes this mistake, one calculates (wrongly in general) that all components of the vector π are equal. Exercise: In what special case will such a calculation actually be correct ? One can, however, write π = π P if π is defined as a row vector rather than as a column vector.
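The eigenvector computation mirrors the slides' use of R's eigen function; a sketch with a hypothetical 3-state transition matrix (not the slides' P2, whose entries are not in the transcript):

```python
import numpy as np

# A hypothetical transition matrix; each row sums to 1.
P = np.array([[0.6, 0.2, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

# pi = P^T pi, so pi is the eigenvector of P^T for eigenvalue 1,
# rescaled so its entries sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))   # locate the eigenvalue 1
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()                     # scale to a probability vector

print(pi)            # satisfies pi = P^T pi
print(pi @ P - pi)   # approximately zero: pi = pi P as a row vector
```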

27 Stochastic modeling

28 Structural equation modeling Structural equation modeling is useful for data analysis entailing one or both of two features that render ordinary regression modeling inapplicable: 1. There is not a clear dichotomy of “independent variables” versus “dependent variables”, because some variables both predict and are predicted by other variables in the modeling.

29 Structural equation modeling This may happen in a longitudinal study if the value of a variable at an intermediate time point both predicts its value at a later time point and is predicted by its value at an earlier time point. This may happen in a cross-sectional study or a longitudinal study if a variable is a “mediator”. (Can you think of an example from your own research or reading experience ?) So, rather than using “independent variable” and “dependent variable”, we will refer to variables predicted by other variables in the modeling as “endogenous” and to variables not predicted by other variables in the modeling as “exogenous”.

30 Structural equation modeling 2. What we are mainly interested in is not directly observable but is measured, with error, using one or more instruments. Such instruments are often scales obtained by summations of Likert items. (Can you think of an example from your own research or reading experience ?) In this context, that which is not directly observable is often called a “latent construct”, while a corresponding instrument is often referred to as an “observable indicator”.

31 Structural equation modeling Let X be a vector of observed exogenous variables; ξ a vector of latent exogenous variables; δ a vector of error terms in relating X to ξ; Y a vector of observed endogenous variables; η a vector of latent endogenous variables; ε a vector of error terms in relating Y to η; and ζ a vector of error terms in relating η to both itself and ξ.

32 Structural equation modeling Consider a longitudinal study of college students who have a history of drinking in high school. (Why consider having that as an inclusion criterion ?) The goal of the study is to ascertain whether sensation seeking may influence drinking, whether drinking may influence sensation seeking, or both. Let ξ1 denote sensation seeking at the beginning of college and ξ2 drinking at the beginning of college. These are exogenous variables in that they will not be predicted by any other variables that we will include in our structural equation model. They are also latent variables because we do not observe them.

33 Structural equation modeling Let η1 denote sensation seeking after one year of college and η2 drinking after one year of college. Let η3 denote sensation seeking after two years of college and η4 drinking after two years of college. These are endogenous variables in that they will be predicted by other variables that we will include in our structural equation model. They are also latent variables because we do not observe them.

34 Structural equation modeling Let X1 be the score observed on a scale of novelty seeking; X2 the score observed on a scale of impulsivity; X3 the score observed on a scale of drinking frequency; and X4 the score observed on a scale of drinking intensity, with all of these scales administered at the beginning of college. Thus, X1 and X2 are observable indicators of the latent construct ξ1, while X3 and X4 are observable indicators of the latent construct ξ2.

35 Structural equation modeling Let Y1 be the score observed on a scale of novelty seeking; Y2 the score observed on a scale of impulsivity; Y3 the score observed on a scale of drinking frequency; and Y4 the score observed on a scale of drinking intensity, with all of these scales administered after one year of college. Let Y5 be the score observed on a scale of novelty seeking; Y6 the score observed on a scale of impulsivity; Y7 the score observed on a scale of drinking frequency; and Y8 the score observed on a scale of drinking intensity, with all of these scales administered after two years of college.

36 Structural equation modeling The next slide depicts relationships among ξ, η, X, and Y as well as the error terms in δ, ε, and ζ. Assuming for convenience that all variables are centered to have zero means, we have X1 = β1 ξ1 + δ1 and X2 = β2 ξ1 + δ2 for some coefficients β1 and β2; X3 = β3 ξ2 + δ3 and X4 = β4 ξ2 + δ4; Y1 = α1 η1 + ε1, Y2 = α2 η1 + ε2, Y3 = α3 η2 + ε3, and Y4 = α4 η2 + ε4; Y5 = α5 η3 + ε5, Y6 = α6 η3 + ε6, Y7 = α7 η4 + ε7, and Y8 = α8 η4 + ε8; η1 = λ1 ξ1 + λ2 ξ2 + ζ1 and η2 = λ3 ξ1 + λ4 ξ2 + ζ2; and η3 = κ1 η1 + κ2 η2 + ζ3 and η4 = κ3 η1 + κ4 η2 + ζ4.

37 Structural equation modeling [Path diagram relating the latent constructs ξ1, ξ2, η1, η2, η3, and η4 to the observable indicators X1 through X4 and Y1 through Y8, with error terms δ1 through δ4, ε1 through ε8, and ζ1 through ζ4.]

38 In general, let β, α, λ, and κ denote coefficient matrices such that X = β ξ + δ, Y = α η + ε, and η = κ η + λ ξ + ζ. Also, let Φ denote the covariance matrix of ξ; let Ψ denote the covariance matrix of ζ; and let Θ denote the covariance matrix of the concatenation of δ and ε.

39 Structural equation modeling People generally assume that ξ, ζ, δ, and ε are normally distributed; that ξ is uncorrelated with the others; and that ζ is uncorrelated with the others. People sometimes assume that δ and ε are uncorrelated with each other, so that Θ is block diagonal. (Is this realistic ?) Letting I denote an identity matrix of appropriate dimension, we also assume I – κ to be nonsingular. (Why is that needed ?)

40 Structural equation modeling In addition, constraints are generally necessary to ensure model identifiability. Typically, one of the coefficients linking a latent construct to its observable indicators is fixed at unity. While constraining some of the components of α and β may seem untoward, the scientific questions of interest are generally more related to whether the components of κ and λ are nonzero. In our example, testing κ3 = λ3 = 0 is asking whether sensation seeking influences drinking, while testing κ2 = λ2 = 0 is asking… what ?

41 Structural equation modeling Coefficients and covariance matrix parameters are typically estimated by maximum likelihood. Coefficient estimates are then divided by standard errors to obtain approximate Z statistics from which the statistical significance of the coefficient estimates or lack thereof may be ascertained. (What method of testing is this ?)

42 Structural equation modeling Let Σ denote the covariance matrix of the concatenation of X and Y. Note that, according to the structural equation model, Σ is a function of β, α, λ, κ, Φ, Ψ, and Θ. So, there are actually two ways to estimate Σ. One, we can plug in the estimates of β, α, λ, κ, Φ, Ψ, and Θ. This is referred to as a “model-implied” estimate of Σ. Two, we can use the ordinary sample covariance matrix.
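The model-implied Σ can be computed directly from the parameter matrices. A sketch with small, hypothetical dimensions and values (two ξ's, two η's, four X's, four Y's; all numbers made up), using X = β ξ + δ, Y = α η + ε, and η = (I − κ)^{-1}(λ ξ + ζ):

```python
import numpy as np

# Hypothetical parameter values for illustration only.
beta  = np.array([[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.7]])
alpha = np.array([[1.0, 0.0], [0.9, 0.0], [0.0, 1.0], [0.0, 0.6]])
lam   = np.array([[0.5, 0.2], [0.1, 0.6]])
kappa = np.array([[0.0, 0.3], [0.2, 0.0]])
Phi   = np.array([[1.0, 0.4], [0.4, 1.0]])   # Cov(xi)
Psi   = 0.3 * np.eye(2)                      # Cov(zeta)
Theta = 0.5 * np.eye(8)                      # Cov of (delta, eps), block diagonal

A = np.linalg.inv(np.eye(2) - kappa)         # (I - kappa)^{-1}, assumed nonsingular
cov_eta    = A @ (lam @ Phi @ lam.T + Psi) @ A.T   # Cov(eta)
cov_xi_eta = Phi @ lam.T @ A.T                     # Cov(xi, eta)

# Model-implied covariance of the concatenation of X and Y:
Sigma = np.block([
    [beta @ Phi @ beta.T,            beta @ cov_xi_eta @ alpha.T],
    [alpha @ cov_xi_eta.T @ beta.T,  alpha @ cov_eta @ alpha.T],
]) + Theta
print(Sigma.shape)  # (8, 8); Sigma is symmetric positive definite
```

Plugging in maximum likelihood estimates for the parameters would give the “model-implied” estimate of Σ, to be compared with the ordinary sample covariance matrix.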

43 Structural equation modeling The discrepancy between the model-implied estimate and the sample covariance matrix may be used to assess the model’s “goodness of fit”. In particular, a chi-square statistic may be defined and compared to an upper quantile of a chi-square distribution on degrees of freedom determined by the dimensions of X, Y, β, α, λ, κ, Φ, Ψ, and Θ.

44 Structural equation modeling Caution: In my experience, such a comparison almost invariably results in rejection of the implicit null hypothesis at the conventional 0.05 significance level. So, I generally prefer to look at the ratio of the chi-square statistic to its degrees of freedom. As a rough rule of thumb, I worry about goodness of fit when this ratio exceeds 2 and especially when it exceeds 3. Also, several other measures of goodness of fit are routinely reported by software implementing structural equation modeling. They are usually calibrated between 0 and 1, with values below 0.05 or above 0.95 representing satisfactory goodness of fit, according to whether 0 or 1 is optimal for the measure in question.

45 Structural equation modeling The chi-square statistic also provides a mechanism for testing hypotheses involving multiple coefficients, such as κ3 = λ3 = 0 or κ2 = λ2 = 0. In particular, one can re-fit the structural equation model imposing the constraints required by the hypotheses and then compare the change in the chi-square statistic to an upper quantile of a chi-square distribution on degrees of freedom equal to the number of constraints.
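A sketch of the difference test with hypothetical fit statistics (the chi-square values 84.1 and 92.7 below are made up, not from the slides):

```python
import math

# Suppose re-fitting with the q = 2 constraints (e.g. kappa3 = lam3 = 0)
# raises the model chi-square from 84.1 to 92.7.
chisq_full, chisq_constrained, q = 84.1, 92.7, 2

diff = chisq_constrained - chisq_full   # compare to chi-square on q df

# Upper tail probability of a chi-square with even df (here df = q = 2):
# P(X > x) = exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
p = math.exp(-diff / 2) * sum((diff / 2) ** i / math.factorial(i)
                              for i in range(q // 2))
print(round(diff, 1), round(p, 4))  # reject the constraints when p < 0.05
```

Here the p-value is about 0.014, so the two constraints would be rejected jointly at the 0.05 level.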

46 Structural equation modeling In my experience, goodness of fit is often substantially affected by the assumptions one makes on which elements of Φ, Ψ, and Θ are zero. This is problematic because, while subject matter theory may suggest the locations of zeroes in κ and λ, subject matter theory may not be informative about the locations of zeroes in Φ, Ψ, and Θ. On the other hand, leaving Φ, Ψ, and Θ unconstrained may lead to over-fitting or to failures in the estimation algorithm. (What sort of failures ?)

47 Structural equation modeling A possible strategy is to begin structural equation modeling with a fairly sparse configuration of nonzero elements in Φ, Ψ, and Θ. Then, using “modification indices” provided from the software implementing structural equation modeling, one can add nonzero elements to Φ, Ψ, and Θ until a point of diminishing returns is reached. This is analogous to stepwise selection of independent variables in ordinary regression modeling and is therefore subject to the same criticism. (What criticism ?)

48 Structural equation modeling Some references are as follows: Bollen and Long (1993). Testing Structural Equation Models. SAGE, Newbury Park CA. Hoyle (1995). Structural Equation Modeling: Concepts, Issues, and Applications. SAGE, Thousand Oaks CA. Kline (2010). Principles and Practice of Structural Equation Modeling, 3rd Edition. Guilford, New York NY. Mueller (1996). Basic Principles of Structural Equation Modeling. Springer-Verlag, New York NY.

