Published by Gregory Banks. Modified over 8 years ago.
1
Identifying Differentially Regulated Genes
Nirmalya Bandyopadhyay, Manas Somaiya, Sanjay Ranka, and Tamer Kahveci
Bioinformatics Lab., CISE Department, University of Florida
2
Gene interaction through regulatory networks
– Gene networks: genes are nodes and interactions are directed edges.
– Neighbors: each gene has incoming neighbors and outgoing neighbors.
– A gene can change the state of other genes through activation or inhibition.
[Figure: example gene network with K-Ras, Raf, MEK, ERK, JNK, RalGDS, Ral, RalBP1, PLD1, Cdc42, and Rac]
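A gene network like this can be stored as a set of directed edges, and each gene's incoming and outgoing neighbors can be derived from that set. The minimal sketch below uses hypothetical genes `g1` to `g5`, hypothetical edges, and hypothetical activation/inhibition labels. None of them are taken from the figure.

```python
from collections import defaultdict

# Hypothetical toy network: each directed edge (source, target) is labeled
# with its interaction type, "activation" or "inhibition".
EDGES = {
    ("g1", "g2"): "activation",
    ("g1", "g3"): "inhibition",
    ("g2", "g4"): "activation",
    ("g3", "g4"): "activation",
    ("g4", "g5"): "inhibition",
}


def build_neighbors(edges):
    """Return (incoming, outgoing) maps from each gene to its neighbor set."""
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for src, dst in edges:  # iterating the dict yields (source, target) keys
        outgoing[src].add(dst)
        incoming[dst].add(src)
    return incoming, outgoing


incoming, outgoing = build_neighbors(EDGES)
print(sorted(incoming["g4"]))  # ['g2', 'g3']
print(sorted(outgoing["g1"]))  # ['g2', 'g3']
print(EDGES[("g1", "g3")])     # inhibition
```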
3
Perturbation experiments
In a perturbation experiment, a stimulant (e.g., radiation, a toxic agent, or a medication), also called a perturbation, is applied to tissues. Gene expression is measured before and after the perturbation. A gene can change its expression as a result of the perturbation:
– Differentially expressed (DE) gene: its expression changes.
– Equally expressed (EE) gene: its expression does not change.
[Figure: the example gene network under perturbation, with the differentially expressed genes highlighted]
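The slide does not say how a gene is called DE or EE within a single group. One common approach is to compare the replicate measurements taken before and after the perturbation with a two-sample t statistic. The sketch below uses Welch's variant. The cutoff of 3.0 is a hypothetical stand-in for a proper p-value threshold, and the expression values are made up.

```python
import math
from statistics import mean, variance


def welch_t(before, after):
    """Welch two-sample t statistic for the change from `before` to `after`."""
    se = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
    return (mean(after) - mean(before)) / se


def call_state(before, after, threshold=3.0):
    """Label a gene DE if |t| exceeds a hypothetical threshold, else EE."""
    return "DE" if abs(welch_t(before, after)) > threshold else "EE"


# Hypothetical replicate measurements for two genes.
print(call_state([5.0, 5.1, 4.9, 5.0], [7.0, 7.2, 6.9, 7.1]))  # DE
print(call_state([5.0, 5.2, 4.8, 5.0], [5.1, 4.9, 5.0, 5.0]))  # EE
```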
4
Perturbation experiment: single dataset
– Primarily affected genes: directly affected by the perturbation.
– Secondarily affected genes: affected indirectly, because primarily affected genes in turn change the state of other genes.
[Figure: the example gene network under perturbation, distinguishing primarily and secondarily affected genes]
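The next sketch shows why the two kinds of effect are hard to separate. If a DE gene has no DE incoming neighbor, only the perturbation can explain its change. If a DE gene does have a DE incoming neighbor, it may be either primarily or secondarily affected. The helper `split_candidates` is a hypothetical simplification that ignores noise and unmeasured regulators. It is not the authors' method, which treats the question probabilistically.

```python
def split_candidates(de_genes, edges):
    """Hypothetical heuristic: split DE genes by whether any incoming neighbor is DE.

    A DE gene with no DE incoming neighbor can only be explained by the
    perturbation itself. A DE gene with a DE incoming neighbor may be
    primarily or secondarily affected, so it is left ambiguous.
    """
    de_genes = set(de_genes)
    incoming = {}
    for src, dst in edges:
        incoming.setdefault(dst, set()).add(src)
    primary, ambiguous = set(), set()
    for gene in de_genes:
        if incoming.get(gene, set()) & de_genes:
            ambiguous.add(gene)
        else:
            primary.add(gene)
    return primary, ambiguous


edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g3")]
primary, ambiguous = split_candidates({"g1", "g2", "g5"}, edges)
print(sorted(primary), sorted(ambiguous))  # ['g1', 'g5'] ['g2']
```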
5
Differentially and equally regulated genes
Some datasets inherently have two groups, e.g., fasting vs. non-fasting, or Caucasian American vs. African American. For such datasets, a gene is:
– Differentially regulated (DR): DE in one group and EE in the other.
– Equally regulated: DE in both groups or EE in both groups.
– Example: gene $g_1$ is DE in data group $D_A$ and EE in $D_B$, so it is DR.
[Figure: genes $g_1$ to $g_5$ in data groups $D_A$ and $D_B$, each marked as differentially or equally expressed]
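The definition above can be written as a small function that takes a gene's per-group states and returns its regulation class. The name `regulation` and the string labels are illustrative choices.

```python
def regulation(state_a, state_b):
    """Classify a gene from its states ('DE' or 'EE') in groups A and B."""
    for s in (state_a, state_b):
        if s not in ("DE", "EE"):
            raise ValueError(f"unknown state: {s!r}")
    # DR: DE in one group and EE in the other; otherwise equally regulated.
    return "DR" if state_a != state_b else "equally regulated"


print(regulation("DE", "EE"))  # DR  (like g_1: DE in D_A, EE in D_B)
print(regulation("EE", "EE"))  # equally regulated
```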
6
Two datasets: primary and secondary effects
– Primarily differentially regulated (PDR) genes: differentially regulated as a direct effect of the perturbation.
– Secondarily differentially regulated (SDR) genes: differentially regulated because primarily affected genes change the state of other genes.
[Figure: genes $g_0$ to $g_5$ in data groups $D_A$ and $D_B$, marked as primarily differentially expressed, secondarily differentially expressed, or equally expressed]
7
Problem & method
Input: gene expression (control and non-control) of two data groups, $D_A$ and $D_B$.
Problem: analyze the primary and secondary effects of the perturbation.
– Estimate the probability that a gene is differentially regulated because of the perturbation or because of other genes (its incoming neighbors).
– Identify the primarily differentially regulated genes.
Method
– A probabilistic Bayesian method that employs a Markov random field to leverage domain knowledge.
8
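Notation
Observed variables
– Microarray datasets: two data groups, $D_A$ and $D_B$. These include the expression of a single gene $g_i$ in group $C$ ($C \in \{A, B\}$) and the expression of all genes in group $A$.
– Neighborhood variables
Hidden variables
– State variables: $S_{Ci} \in \{\text{DE}, \text{EE}\}$, the state of gene $g_i$ in group $C$
– Regulation variables: $Z_i$
– Interaction variables: $X_{ij}$
$Z_i$ encodes the pair $(S_{Ai}, S_{Bi})$, and $X_{ij}$ jointly encodes $Z_i$ and $Z_j$ as $X_{ij} = 4(Z_i - 1) + Z_j$:

S_Ai  S_Bi  S_Aj  S_Bj  Z_i  Z_j  X_ij
DE    DE    DE    DE    1    1    1
DE    DE    DE    EE    1    2    2
DE    DE    EE    DE    1    3    3
DE    DE    EE    EE    1    4    4
DE    EE    DE    DE    2    1    5
DE    EE    DE    EE    2    2    6
DE    EE    EE    DE    2    3    7
DE    EE    EE    EE    2    4    8
EE    DE    DE    DE    3    1    9
EE    DE    DE    EE    3    2    10
EE    DE    EE    DE    3    3    11
EE    DE    EE    EE    3    4    12
EE    EE    DE    DE    4    1    13
EE    EE    DE    EE    4    2    14
EE    EE    EE    DE    4    3    15
EE    EE    EE    EE    4    4    16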
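The encoding in the notation table can be checked mechanically. The sketch below maps each $(S_A, S_B)$ pair to a $Z$ code, combines two codes into $X_{ij}$, decodes $X_{ij}$ back, and regenerates all 16 rows. The names `Z_CODE`, `encode_x`, and `decode_x` are hypothetical.

```python
from itertools import product

# Z encodes a gene's (state in group A, state in group B), in table order.
Z_CODE = {("DE", "DE"): 1, ("DE", "EE"): 2, ("EE", "DE"): 3, ("EE", "EE"): 4}
Z_STATES = {z: pair for pair, z in Z_CODE.items()}


def encode_x(z_i, z_j):
    """Combine Z_i and Z_j into the interaction value X_ij in 1..16."""
    return 4 * (z_i - 1) + z_j


def decode_x(x_ij):
    """Recover (Z_i, Z_j) from X_ij."""
    return (x_ij - 1) // 4 + 1, (x_ij - 1) % 4 + 1


# Rebuild the 16 rows: S_Ai S_Bi S_Aj S_Bj Z_i Z_j X_ij
for z_i, z_j in product(range(1, 5), repeat=2):
    print(*Z_STATES[z_i], *Z_STATES[z_j], z_i, z_j, encode_x(z_i, z_j))
```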
9
Problem formulation
Input to the problem:
– Microarray expression: $Y$
– Gene network $V = \{G, W\}$, where $G = \{g_0, g_1, g_2, \ldots, g_M\}$ and $g_0$ is the metagene.
Goal:
– Estimate the density $p(X_{ij} \mid X \setminus X_{ij}, Y, V, W_{ij} = 1)$ for every edge with $W_{ij} = 1$. This density gives the probability that a gene is DR because of the perturbation or because of an incoming neighbor gene.
– Note: a higher value of $p(X_{ij} \in \{2, 3\} \mid X \setminus X_{ij}, Y, V, W_{ij} = 1)$ indicates a higher chance that $g_j$ is affected by $g_i$.
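Under the encoding $X_{ij} = 4(Z_i - 1) + Z_j$ from the notation table, the event $X_{ij} \in \{2, 3\}$ means $Z_i = 1$ (DE in both groups) and $Z_j \in \{2, 3\}$ ($g_j$ is DR). If the conditional density is held as a table over the 16 values, the probability of that event is just a sum over two entries. The distribution in the sketch below is hypothetical.

```python
def decode_x(x_ij):
    """Recover (Z_i, Z_j) from X_ij = 4 * (Z_i - 1) + Z_j."""
    return (x_ij - 1) // 4 + 1, (x_ij - 1) % 4 + 1


def prob_event(dist, values):
    """Probability that X_ij takes one of `values`, given dist: value -> probability."""
    return sum(dist.get(v, 0.0) for v in values)


# Hypothetical conditional distribution of one X_ij (values not from the slides).
dist = {1: 0.1, 2: 0.5, 3: 0.2, 4: 0.1, 16: 0.1}
print([decode_x(x) for x in (2, 3)])  # [(1, 2), (1, 3)]
print(prob_event(dist, {2, 3}))
```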
10
Bayesian distribution
We propose a Bayesian model because it allows us to incorporate our beliefs into the model.
– The joint probability distribution over $X$ follows Bayes' rule: the posterior density is proportional to the likelihood density times the prior density.
– We can derive the density of $X_{ij}$, $p(X_{ij} \mid X \setminus X_{ij}, Y, V, W_{ij} = 1)$, from the joint density function.
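For a single $X_{ij}$, with all other variables held fixed, applying Bayes' rule amounts to multiplying the likelihood by the prior for each of the 16 values and normalizing. The sketch below shows that step with hypothetical numbers. It does not reproduce the authors' likelihood or prior.

```python
def posterior(likelihood, prior):
    """Bayes' rule over a discrete variable: normalize likelihood * prior."""
    unnorm = {x: likelihood[x] * prior[x] for x in prior}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("likelihood and prior give zero total mass")
    return {x: w / total for x, w in unnorm.items()}


# Hypothetical numbers: a uniform prior over the 16 values of X_ij and a
# likelihood that favors X_ij = 2.
prior = {x: 1 / 16 for x in range(1, 17)}
likelihood = {x: 0.1 for x in range(1, 17)}
likelihood[2] = 0.9
post = posterior(likelihood, prior)
print(round(post[2], 3))  # 0.375
```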
11
Prior density function: Markov random field
The MRF is an undirected graph $\Psi = (X, E)$.
– $X = \{X_{ij}\}$: each node $X_{ij}$ represents an edge of the gene network.
– $E = \{(X_{ij}, X_{pj}) \mid W_{pi} = W_{ij} = 1\} \cup \{(X_{ij}, X_{ik}) \mid W_{jk} = W_{ij} = 1\}$
An edge in the MRF corresponds to two edges in the gene network. For example, $(X_{23}, X_{25})$ corresponds to $(g_2, g_3)$ and $(g_3, g_5)$.
[Figure: (a) gene network over $g_0$ to $g_5$ in data groups $D_A$ and $D_B$; (b) the corresponding Markov random field, with nodes $X_{01}$ (2), $X_{02}$ (1), $X_{03}$ (1), $X_{04}$ (4), $X_{05}$ (3), $X_{12}$ (5), $X_{13}$ (5), $X_{14}$ (8), $X_{23}$ (1), $X_{25}$ (7), $X_{35}$ (3)]
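The definition of $E$ can be applied directly to an edge list. The sketch below reads the gene-network edges from the node labels $X_{ij}$ in the MRF figure and builds the MRF edges. It assumes there are no self-loops, and the helper name `mrf_edges` is hypothetical. Each feed-forward triangle $a \to b \to c$ with $a \to c$ contributes one left-type and one right-type MRF edge.

```python
def mrf_edges(gene_edges):
    """MRF edges over nodes X_ij, one node per gene-network edge (i, j)."""
    w = set(gene_edges)
    edges = set()
    for i, j in w:
        for p, q in w:
            # Left type: (X_ij, X_pj) when W_pi = W_ij = 1
            if q == j and p != i and (p, i) in w:
                edges.add(frozenset({(i, j), (p, j)}))
            # Right type: (X_ij, X_ik) when W_jk = W_ij = 1 (here k = q)
            if p == i and q != j and (j, q) in w:
                edges.add(frozenset({(i, j), (i, q)}))
    return edges


# Gene-network edges read off the node labels X_ij of the MRF figure.
GENE_EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5),
              (1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 5)]
E = mrf_edges(GENE_EDGES)
print(frozenset({(2, 3), (2, 5)}) in E)  # True
print(len(E))  # 16
```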
12
Prior density function: Feature functions
Three beliefs are relevant to our model:
– In a data group, the metagene $g_0$ can affect the states of all other genes. This is modeled by adding directed edges from $g_0$ to all other genes.
– In a data group, a gene can affect the state of its outgoing neighbors.
– A gene has a high probability of being equally regulated.
We incorporate these beliefs into the MRF using seven feature functions. A feature function is a unary or binary function over the nodes of the MRF, and it lets us introduce a belief into the graph.
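The first belief is encoded structurally by connecting the metagene to every gene. The sketch below shows that step on an edge set. The function name and the `"g0"` label are illustrative.

```python
def add_metagene(genes, edges, metagene="g0"):
    """Return a new edge set with a directed edge from the metagene to every gene."""
    return set(edges) | {(metagene, g) for g in genes if g != metagene}


edges = add_metagene(["g1", "g2", "g3"], {("g1", "g2"), ("g2", "g3")})
print(sorted(edges))
```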
13
Feature functions
– Unary: capture the frequency of each value of $X_{ij}$.
– Binary (left and right external equality): encapsulate the second belief, that in a data group a gene can affect the state of its outgoing neighbors.
– Unary (left and right internal equality): capture the third belief, that a gene has a high probability of being equally regulated.
[Figure: the prior density function written in terms of the feature functions]
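The slide's prior formula did not survive. A common way to turn weighted feature functions into an MRF prior is the log-linear (Gibbs) form $p(X) \propto \exp\left(\sum_k \lambda_k F_k(X)\right)$. Whether the authors use exactly this form is an assumption. The sketch implements only two features, modeled on the internal-equality belief, and the weights are hypothetical. It shows that the normalizing constant cancels when two assignments are compared.

```python
import math


def z_pair(x_ij):
    """Recover (Z_i, Z_j) from X_ij = 4 * (Z_i - 1) + Z_j."""
    return (x_ij - 1) // 4 + 1, (x_ij - 1) % 4 + 1


def gi_equally_regulated(x):
    """Count MRF nodes whose source gene g_i has Z_i in {1, 4}."""
    return sum(1 for v in x.values() if z_pair(v)[0] in (1, 4))


def gj_equally_regulated(x):
    """Count MRF nodes whose target gene g_j has Z_j in {1, 4}."""
    return sum(1 for v in x.values() if z_pair(v)[1] in (1, 4))


def log_prior_unnormalized(x, features, weights):
    """log p(X) up to a constant under an assumed log-linear (Gibbs) prior."""
    return sum(weights[name] * f(x) for name, f in features.items())


FEATURES = {"gi_eq": gi_equally_regulated, "gj_eq": gj_equally_regulated}
WEIGHTS = {"gi_eq": 0.5, "gj_eq": 0.8}  # hypothetical weights

x1 = {(0, 1): 1, (1, 2): 4}  # every gene involved is equally regulated
x2 = {(0, 1): 6, (1, 2): 7}  # every gene involved is differentially regulated
s1 = log_prior_unnormalized(x1, FEATURES, WEIGHTS)
s2 = log_prior_unnormalized(x2, FEATURES, WEIGHTS)
# The normalizing constant cancels in a ratio of prior probabilities.
print(math.exp(s1 - s2))
```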
14
Binary: External feature functions
The external feature functions encapsulate the belief that, in a data group, a gene can affect the state of its outgoing neighbors.
– Left equality: for the MRF edge $(X_{ij}, X_{pj})$, $Z_i = Z_p$
– Right equality: for the MRF edge $(X_{ij}, X_{ik})$, $Z_j = Z_k$
[Figure: (a) gene network over $g_1$ to $g_4$; (b) MRF with nodes $X_{12}$, $X_{13}$, $X_{23}$, $X_{24}$, $X_{34}$, showing the left and right equality for $X_{23}$]
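Both external features compare one component of each endpoint's encoding. $X_{ij}$ carries $(Z_i, Z_j)$, $X_{pj}$ carries $(Z_p, Z_j)$, and $X_{ik}$ carries $(Z_i, Z_k)$. The sketch assumes the features are 0/1 indicators. The slide states only the equality that each feature checks.

```python
def z_pair(x_ij):
    """Recover (Z_i, Z_j) from X_ij = 4 * (Z_i - 1) + Z_j."""
    return (x_ij - 1) // 4 + 1, (x_ij - 1) % 4 + 1


def left_equality(x_ij, x_pj):
    """Assumed indicator: 1 if Z_i (from X_ij) equals Z_p (from X_pj), else 0."""
    return int(z_pair(x_ij)[0] == z_pair(x_pj)[0])


def right_equality(x_ij, x_ik):
    """Assumed indicator: 1 if Z_j (from X_ij) equals Z_k (from X_ik), else 0."""
    return int(z_pair(x_ij)[1] == z_pair(x_ik)[1])


print(left_equality(5, 5), left_equality(5, 9))    # 1 0
print(right_equality(5, 5), right_equality(5, 6))  # 1 0
```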
15
Unary: Internal feature functions
The internal feature functions represent the belief that a gene has a high probability of being equally regulated.
– $g_i$ is equally regulated: $X_{ij} \in \{1, 2, 3, 4\}$, i.e., $Z_i = 1$ (DE in both groups), or $X_{ij} \in \{13, 14, 15, 16\}$, i.e., $Z_i = 4$ (EE in both groups).
– $g_j$ is equally regulated: $X_{ij} \in \{1, 5, 9, 13\}$, i.e., $Z_j = 1$ (DE in both groups), or $X_{ij} \in \{4, 8, 12, 16\}$, i.e., $Z_j = 4$ (EE in both groups).
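The value sets on this slide follow from the encoding $X_{ij} = 4(Z_i - 1) + Z_j$. The sketch writes the two internal features as 0/1 indicators, which is an assumed form, and checks the sets against the encoding.

```python
def gi_equally_regulated(x_ij):
    """1 if X_ij is in {1..4} (Z_i = 1) or {13..16} (Z_i = 4), else 0."""
    return int(x_ij in {1, 2, 3, 4, 13, 14, 15, 16})


def gj_equally_regulated(x_ij):
    """1 if X_ij is in {1, 5, 9, 13} (Z_j = 1) or {4, 8, 12, 16} (Z_j = 4), else 0."""
    return int(x_ij in {1, 5, 9, 13, 4, 8, 12, 16})


# Cross-check against the encoding X_ij = 4 * (Z_i - 1) + Z_j.
for x in range(1, 17):
    z_i, z_j = (x - 1) // 4 + 1, (x - 1) % 4 + 1
    assert gi_equally_regulated(x) == int(z_i in (1, 4))
    assert gj_equally_regulated(x) == int(z_j in (1, 4))
print("internal features consistent with the encoding")
```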
16
Objective function optimization
– Obtain an initial estimate of the state variables.
– Estimate the parameters of the likelihood density.
– Estimate the parameters that maximize the prior density.
– Estimate the parameters that maximize the pseudo-likelihood density.
– Rank the DE genes by their likelihood with respect to the metagene.
Techniques used in these steps: ICM (iterated conditional modes), differential evolution, and Student's t.
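ICM is one of the techniques named on this slide. It visits each node in turn and sets the node to the state that maximizes its local score, with all other nodes held fixed, repeating until nothing changes. The sketch below is a generic ICM over a discrete MRF. The two-node model and its unary and pairwise scores are hypothetical and are not the authors' objective.

```python
def icm(nodes, states, unary, pairwise, neighbors, init, max_iters=20):
    """Iterated conditional modes: update each node to its best local state."""
    labels = dict(init)
    for _ in range(max_iters):
        changed = False
        for n in nodes:
            def local(s):
                # Local score of state s at node n, holding all other labels fixed.
                return unary(n, s) + sum(pairwise(s, labels[m]) for m in neighbors[n])

            best = max(states, key=local)
            if local(best) > local(labels[n]):
                labels[n] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical two-node MRF: node "a" prefers state 1; neighbors reward agreement.
result = icm(
    nodes=["a", "b"],
    states=[0, 1],
    unary=lambda n, s: 2.0 if (n == "a" and s == 1) else 0.0,
    pairwise=lambda s, t: 1.0 if s == t else 0.0,
    neighbors={"a": ["b"], "b": ["a"]},
    init={"a": 0, "b": 0},
)
print(result)  # {'a': 1, 'b': 1}
```

ICM converges only to a local optimum, so the initial estimate of the state variables (the first step above) affects the result.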
17
Dataset and experimental setup
Dataset
– Real: adapted from Smirnov et al., generated by applying 10 Gy of ionizing radiation to immortalized B cells obtained from 155 donors.
– Real/synthetic: we created synthetic data that simulate the perturbation experiment based on the real dataset. The simulation model is taken from "Modeling of Multiple Valued Gene Regulatory Networks" by Garg et al.
– Gene regulatory network: 24,663 genetic interactions over 2,335 genes, collected from the KEGG database.
Experimental setup
– We implemented our method in MATLAB and Java.
– We ran our code on a quad-core 2 GHz AMD Opteron workstation with 32 GB of memory.
18
Comparison with other methods
We compared our method with three other methods:
– SMRF: our earlier method, developed to analyze the effect of an external perturbation on a single data group.
– SSEM: a method that differentiates between the primary and secondary effects of a perturbation on a gene expression dataset.
– Two-sample t-test (Student's t test).
19
Comparison with other methods
20
Conclusions
– Our method found primarily affected genes with high accuracy.
– It achieved significantly better accuracy than SMRF, SSEM, and Student's t test.
– Our method produces a probability distribution rather than a fixed binary decision.
21
Acknowledgement
This work was supported in part by the NSF under grants CCF-0829867 and IIS-0845439.
22
Thank you!