1 Identifying Differentially Regulated Genes Nirmalya Bandyopadhyay, Manas Somaiya, Sanjay Ranka, and Tamer Kahveci Bioinformatics Lab., CISE Department,

Presentation transcript:

1 Identifying Differentially Regulated Genes Nirmalya Bandyopadhyay, Manas Somaiya, Sanjay Ranka, and Tamer Kahveci Bioinformatics Lab., CISE Department, University of Florida

2 Gene interaction through regulatory networks
Gene networks: genes are nodes and interactions are directed edges.
Neighbors: each gene has incoming neighbors and outgoing neighbors.
A gene can change the state of other genes:
– Activation
– Inhibition
[Figure: example gene network with nodes K-Ras, Raf, MEK, ERK, JNK, RalGDS, Ral, RalBP1, PLD1, Cob42, Rac]

3 Perturbation experiments
In a perturbation experiment, a stimulant (radiation, a toxic element, medication), also known as a perturbation, is applied to tissues.
Gene expression is measured before and after the perturbation.
A gene can change its expression as a result of the perturbation:
– Differentially expressed gene (DE).
– Equally expressed gene (EE).
[Figure: the gene network (K-Ras, Raf, MEK, ERK, JNK, RalGDS, Ral, RalBP1, PLD1, Cob42, Rac) with the perturbation and the differentially expressed genes marked]

4 Perturbation experiment: single dataset
Primarily affected genes: directly affected by the perturbation.
Secondarily affected genes: affected indirectly, because primarily affected genes affect some other genes.
[Figure: the gene network (K-Ras, Raf, MEK, ERK, JNK, RalGDS, Ral, RalBP1, PLD1, Cob42, Rac) with the perturbation, the primarily affected genes, and the secondarily affected genes marked]

5 Differentially and equally regulated
Some datasets inherently have two groups.
– Fasting vs. non-fasting, Caucasian American vs. African American
For these datasets, a gene is
– Differentially regulated (DR): DE in one group and EE in the other.
– Equally regulated: DE in both groups or EE in both groups.
– Here, gene g_1 is DE in data D_A and EE in D_B. Hence, it is DR.
[Figure: the same gene network (g_1 to g_5) shown for data groups D_A and D_B, with each gene marked as differentially expressed or equally expressed]

6 Two datasets: primary and secondary effects
Primarily differentially regulated genes (PDR): directly affected by the perturbation.
Secondarily differentially regulated genes (SDR): affected indirectly, because primarily affected genes affect some other genes.
[Figure: the gene network (g_0 to g_5) shown for data groups D_A and D_B, with each gene marked as primarily differentially expressed, secondarily differentially expressed, or equally expressed]

7 Problem & method
Input: gene expression (control and non-control) of two data groups D_A and D_B.
Problem: analyzing the primary and secondary effects of the perturbation.
– Estimate the probability that a gene is differentially regulated because of the perturbation or because of other genes (its incoming neighbors).
– What are the primarily differentially regulated genes?
Method
– A probabilistic Bayesian method, in which we employ a Markov random field to leverage domain knowledge.

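8 Notation
Observed variables
– Microarray datasets:
  Two data groups: D_A, D_B
  A single gene g_i in group C (C ∈ {A, B}): [symbol not preserved]
  All genes in group A: [symbol not preserved]
– Neighborhood variables
Hidden variables
– State variables: S_Ai, S_Bi (DE or EE state of gene g_i in groups A and B)
– Regulation variables: Z_i
– Interaction variables: X_ij

S_Ai  S_Bi  S_Aj  S_Bj  Z_i  Z_j  X_ij
DE    DE    DE    DE    1    1    1
DE    DE    DE    EE    1    2    2
DE    DE    EE    DE    1    3    3
DE    DE    EE    EE    1    4    4
DE    EE    DE    DE    2    1    5
DE    EE    DE    EE    2    2    6
DE    EE    EE    DE    2    3    7
DE    EE    EE    EE    2    4    8
EE    DE    DE    DE    3    1    9
EE    DE    DE    EE    3    2    10
EE    DE    EE    DE    3    3    11
EE    DE    EE    EE    3    4    12
EE    EE    DE    DE    4    1    13
EE    EE    DE    EE    4    2    14
EE    EE    EE    DE    4    3    15
EE    EE    EE    EE    4    4    16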
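The slide-8 table can be read as a simple encoding: each pair of states (group A, group B) maps to a regulation value Z, and each pair (Z_i, Z_j) maps to an interaction value X_ij = 4(Z_i − 1) + Z_j. The sketch below illustrates that reading; the function names and the closed-form formula are assumptions inferred from the table rows, not taken from the authors' code.

```python
# Assumed encoding inferred from the slide-8 table (not the authors' code).
STATE_PAIR_TO_Z = {
    ("DE", "DE"): 1,
    ("DE", "EE"): 2,
    ("EE", "DE"): 3,
    ("EE", "EE"): 4,
}


def regulation(state_a, state_b):
    """Map a gene's states in groups A and B to its regulation value Z."""
    return STATE_PAIR_TO_Z[(state_a, state_b)]


def interaction(z_i, z_j):
    """Map the regulation values of g_i and g_j to the interaction value X_ij."""
    return 4 * (z_i - 1) + z_j


def is_differentially_regulated(z):
    """A gene is DR when it is DE in one group and EE in the other."""
    return z in (2, 3)


# Example: g_i is DE in A and EE in B; g_j is DE in both groups.
z_i = regulation("DE", "EE")
z_j = regulation("DE", "DE")
x_ij = interaction(z_i, z_j)
```

With this reading, the 16 rows of the table are exactly the 16 values of `interaction(z_i, z_j)` for Z_i and Z_j in 1..4.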

9 Problem formulation
Input to the problem:
– Microarray expression: Y
– Gene network V = {G, W}, where G = {g_0, g_1, g_2, …, g_M} and g_0 is the metagene.
Goal:
– Estimate the density p(X_ij | X − X_ij, Y, V, W_ij = 1) for every pair with W_ij = 1. This density estimates the probability that a gene is DR due to the perturbation or due to an incoming neighbor gene.
– Note: a higher value of p(X_ij = {2, 3} | X − X_ij, Y, V, W_ij = 1) indicates a higher chance that g_j is affected by g_i.

10 Bayesian distribution
We propose a Bayesian model because it allows us to incorporate our beliefs into the model.
– The joint probability distribution over X
  [Equation: posterior density ∝ likelihood density × prior density]
– We can derive the density of X_ij, p(X_ij | X − X_ij, Y, V, W_ij = 1), from the joint density function.
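The equation on this slide did not survive in the transcript; only its three labels did. A generic form consistent with those labels, and with the conditional density named in the goal, is sketched below. The exact conditioning used by the authors is an assumption here.

```latex
% Posterior = likelihood x prior (up to a normalizing constant)
p(X \mid Y, V) \;\propto\; p(Y \mid X, V)\, p(X \mid V)

% Conditional density of a single interaction variable, obtained from the joint
% by normalizing over the 16 possible values of X_{ij}
p(X_{ij} \mid X \setminus X_{ij}, Y, V)
  = \frac{p(X_{ij},\, X \setminus X_{ij} \mid Y, V)}
         {\sum_{x=1}^{16} p(X_{ij} = x,\, X \setminus X_{ij} \mid Y, V)}
```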

11 Prior density function: Markov random field
The MRF is an undirected graph Ψ = (X, E).
– X = {X_ij}: each node X_ij represents an edge of the gene network.
– E = {(X_ij, X_pj) | W_pi = W_ij = 1} ∪ {(X_ij, X_ik) | W_jk = W_ij = 1}
An edge in the MRF corresponds to two edges in the gene network.
– (X_23, X_25) corresponds to (g_2, g_3) and (g_3, g_5).
[Figure: (a) gene network over g_0 to g_5 for data groups D_A and D_B; (b) the corresponding Markov random field with nodes X_01 (2), X_02 (1), X_03 (1), X_04 (4), X_05 (3), X_12 (5), X_13 (5), X_14 (8), X_23 (1), X_25 (7), X_35 (3)]
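As an illustration of the definition of E, the sketch below builds the MRF from a list of directed gene-network edges. The edge list is read off the node labels in the slide's figure; the function name `build_mrf` and the extra requirement that both endpoints of an MRF edge are themselves gene-network edges are assumptions, not the authors' implementation.

```python
def build_mrf(gene_edges):
    """Build MRF nodes and edges from directed gene-network edges (i, j).

    Nodes: one X_ij per gene-network edge.
    Edges: (X_ij, X_pj) when W_pi = W_ij = 1, and
           (X_ij, X_ik) when W_jk = W_ij = 1,
    keeping only pairs whose two endpoints are both MRF nodes (assumption).
    """
    nodes = set(gene_edges)
    genes = {g for edge in nodes for g in edge}
    mrf_edges = set()
    for (i, j) in nodes:
        for p in genes:
            # Left side: g_p -> g_i and g_i -> g_j link X_ij with X_pj.
            if p != i and (p, i) in nodes and (p, j) in nodes:
                mrf_edges.add(frozenset({(i, j), (p, j)}))
        for k in genes:
            # Right side: g_i -> g_j and g_j -> g_k link X_ij with X_ik.
            if k != j and (j, k) in nodes and (i, k) in nodes:
                mrf_edges.add(frozenset({(i, j), (i, k)}))
    return nodes, mrf_edges


# Gene-network edges taken from the node labels in the slide-11 figure.
gene_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5),
              (1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 5)]
nodes, mrf_edges = build_mrf(gene_edges)
```

On this input, `frozenset({(2, 3), (2, 5)})` is among the MRF edges, matching the slide's example that (X_23, X_25) arises from (g_2, g_3) and (g_3, g_5).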

12 Prior density function: feature functions
Three beliefs are relevant to our model:
– In a data group, the metagene g_0 can affect the states of all other genes (modeled by adding directed edges from g_0 to all other genes).
– In a data group, a gene can affect the state of its outgoing neighbors.
– A gene has a high probability of being equally regulated.
We incorporate these beliefs into the MRF graph using seven feature functions.
Feature function: a unary or binary function over the nodes of the MRF. A feature function allows us to introduce our beliefs on the graph.

13 Feature functions
Unary: capture the frequency of X_ij.
Binary: encapsulate the second belief, that in a data group a gene can affect the state of its outgoing neighbors.
Unary: capture the third belief, that a gene has a high probability of being equally regulated.
[Equation: prior density function, annotated with the feature functions Left External Equality, Right External Equality, Left Internal Equality, and Right Internal Equality]

14 Binary: external feature functions
The external feature functions encapsulate the belief that, in a data group, a gene can affect the state of its outgoing neighbors.
Left equality
– X_ij = X_pj, which means Z_i = Z_p
Right equality
– X_ij = X_ik, which means Z_j = Z_k
[Figure: (a) gene network over g_1 to g_4; (b) MRF network with nodes X_12, X_13, X_23, X_24, X_34, showing the left equality and the right equality for X_23]

15 Unary: internal feature functions
The internal feature functions represent the belief that a gene has a high probability of being equally regulated.
g_i is equally regulated:
– X_ij ∈ {1, 2, 3, 4}, i.e. Z_i = 1 (DE in both groups)
– X_ij ∈ {13, 14, 15, 16}, i.e. Z_i = 4 (EE in both groups)
g_j is equally regulated:
– X_ij ∈ {1, 5, 9, 13}, i.e. Z_j = 1 (DE in both groups)
– X_ij ∈ {4, 8, 12, 16}, i.e. Z_j = 4 (EE in both groups)
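The value sets listed on this slide can be checked with a small indicator-function sketch. It assumes the encoding X_ij = 4(Z_i − 1) + Z_j suggested by the notation table, and it assumes that "left" refers to g_i and "right" to g_j; both the function names and that mapping are illustrative rather than the authors' definitions.

```python
def decode(x_ij):
    """Recover (Z_i, Z_j) from X_ij under the assumed encoding 4*(Z_i - 1) + Z_j."""
    z_i, z_j = divmod(x_ij - 1, 4)
    return z_i + 1, z_j + 1


def left_internal_equality(x_ij):
    """1 if g_i is equally regulated (Z_i is 1 or 4), else 0."""
    z_i, _ = decode(x_ij)
    return 1 if z_i in (1, 4) else 0


def right_internal_equality(x_ij):
    """1 if g_j is equally regulated (Z_j is 1 or 4), else 0."""
    _, z_j = decode(x_ij)
    return 1 if z_j in (1, 4) else 0


# Values of X_ij for which each indicator fires.
left_values = [x for x in range(1, 17) if left_internal_equality(x)]
right_values = [x for x in range(1, 17) if right_internal_equality(x)]
```

The two lists reproduce the sets on the slide: {1, 2, 3, 4} ∪ {13, 14, 15, 16} for g_i and {1, 5, 9, 13} ∪ {4, 8, 12, 16} for g_j.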

16 Objective function optimization
Steps shown on the slide:
– Obtain an initial estimate of the state variables.
– Estimate parameters for the likelihood density.
– Estimate parameters that maximize the prior density.
– Estimate parameters that maximize the pseudo-likelihood density.
– Rank the DE genes based on the likelihood with respect to the metagene.
Techniques named on the slide: ICM, differential evolution, Student's t.
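The slide names ICM (iterated conditional modes) without further detail. As a rough illustration of the general technique only, the sketch below runs ICM on a toy MRF: each node greedily takes the label that minimizes its local energy given its neighbors' current labels, until no label changes. The energy (unary cost plus a penalty `beta` per disagreeing neighbor), the toy graph, and all names are hypothetical and do not reproduce the authors' objective function.

```python
def icm(initial_labels, unary_cost, neighbors, label_set, beta=1.0, max_sweeps=20):
    """Iterated conditional modes on a toy MRF (illustrative energy, see lead-in).

    initial_labels: dict node -> starting label
    unary_cost:     dict node -> dict label -> cost
    neighbors:      dict node -> list of neighboring nodes
    """
    labels = dict(initial_labels)
    for _ in range(max_sweeps):
        changed = False
        for node in labels:
            def local_energy(label):
                # Unary cost plus a penalty for each neighbor that disagrees.
                disagreements = sum(1 for nb in neighbors[node] if labels[nb] != label)
                return unary_cost[node][label] + beta * disagreements

            best = min(label_set, key=local_energy)
            if local_energy(best) < local_energy(labels[node]):
                labels[node] = best
                changed = True
        if not changed:
            break  # converged to a local minimum
    return labels


# Toy chain a - b - c with two labels; b mildly prefers 2 but its neighbors prefer 1.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
unary_cost = {"a": {1: 0.0, 2: 5.0}, "b": {1: 0.5, 2: 0.0}, "c": {1: 0.0, 2: 5.0}}
result = icm({"a": 1, "b": 2, "c": 1}, unary_cost, neighbors, label_set=[1, 2])
```

ICM only reaches a local optimum, so the result depends on the initial labels, which is one reason an initial estimate of the state variables matters.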

17 Dataset and experimental setup
Dataset
– Real: adapted from Smirnov et al., generated using 10 Gy ionizing radiation over immortalized B cells obtained from 155 donors.
– Real/synthetic: we created synthetic data to simulate the perturbation experiment based on the real dataset. The simulation model is taken from "Modeling of Multiple Valued Gene Regulatory Networks" by Garg et al.
– Gene regulatory network: 24,663 genetic interactions over 2,335 genes, collected from the KEGG database.
Experimental setup
– Implemented our method in MATLAB and Java.
– Ran our code on a quad-core AMD Opteron 2 GHz workstation with 32 GB of memory.

18 Comparison with other methods
We compared our method with three other methods:
– SMRF: our earlier method, developed to analyze the effect of an external perturbation on a single data group.
– SSEM: a method to differentiate between the primary and secondary effects of a perturbation on a gene expression dataset.
– Two-sample t-test (Student's t-test).
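For reference, the two-sample Student's t statistic used as a baseline can be sketched as follows. This is the textbook pooled-variance form; whether the authors used the pooled or the unequal-variance variant is not stated, and the sample values below are made up for illustration.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(control, treated):
    """Two-sample Student's t statistic with pooled (equal) variance."""
    n_c, n_t = len(control), len(treated)
    pooled_var = ((n_c - 1) * variance(control) + (n_t - 1) * variance(treated)) / (n_c + n_t - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_c + 1 / n_t))
    return (mean(treated) - mean(control)) / standard_error


# Hypothetical expression values of one gene before and after the perturbation.
t_stat = pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A large absolute value of the statistic suggests the gene is differentially expressed, but the test treats each gene independently and gives no way to separate primary from secondary effects.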

19 Comparison with other methods

20 Conclusions
Our method found primarily affected genes with high accuracy.
It achieved significantly better accuracy than SMRF, SSEM, and Student's t-test.
Our method produces a probability distribution rather than a fixed binary decision.

21 Acknowledgement This work was supported partially by NSF under grants CCF and IIS

22 Thank you!