Microarrays to Functional Genomics: Generation of Transcriptional Networks from Microarray experiments Joshua Stender December 3, 2002 Department of Biochemistry.

Presentation transcript:

Microarrays to Functional Genomics: Generation of Transcriptional Networks from Microarray experiments Joshua Stender December 3, 2002 Department of Biochemistry

What is a genetic network? Gene networks are usually represented as directed graphs in which the nodes are genes and the edges represent regulation. A network summarizes the relationships among a limited subset of genes, including both positive and negative feedback loops. Jenssen et al. 2001
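The directed-graph view can be sketched in a few lines of Python. This is only an illustration: the three gene names, the edge signs, and the helper functions `regulators_of` and `loop_sign` are hypothetical and do not come from the slides.

```python
# Hypothetical three-gene network (names and signs are made up for illustration).
# Each key is a directed edge (regulator, target); +1 = activation, -1 = repression.
edges = {
    ("geneA", "geneB"): +1,
    ("geneB", "geneC"): -1,
    ("geneC", "geneA"): -1,
}

def regulators_of(target, edges):
    """Return {regulator: sign} for every edge that points at `target`."""
    return {src: sign for (src, dst), sign in edges.items() if dst == target}

def loop_sign(path, edges):
    """Multiply the edge signs around a closed path (last node links back to the first).

    A product of +1 indicates a positive feedback loop, -1 a negative one.
    """
    sign = 1
    for src, dst in zip(path, path[1:] + path[:1]):
        sign *= edges[(src, dst)]
    return sign
```

In this toy network the loop geneA → geneB → geneC → geneA contains two repressions, so the sign product is +1 and the loop acts as positive feedback.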

Why are we interested in genetic networks? Drug therapies for complex diseases. Gain insight into stimulus-response interactions. Identify novel pathways. Understand cell physiology. Understand multifactor gene-gene or gene-protein relationships in normal and disease states.

Modeling Network Framework: Need to define a map from sequence space to functional space. Stage of regulation (RNA, protein). Temporal regulation. Spatial regulation (nucleus, cytoplasm, etc.).

Brazhnik et al. Gene networks: how to put the function in genomics. Trends in Biotechnology 20:

Methods for Developing Gene Networks: Two types of experiments are used for network design: time series and steady-state gene knockout. Co-expression clustering. Cis-acting elements in promoters (Amy Creekmore). Reverse engineering: use of algorithms to generate new networks.

Time-Series Approach: The expression level of a gene at a given time point can be modeled as some function of the expression levels at previous time points. A dimensionality problem arises because there are more genes than time points, so better results require more time points. Solutions in the literature: basic linear model, singular value decomposition, and Bayesian networks.
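A minimal sketch of the basic linear model and of the dimensionality problem, assuming the simplest form in which each gene's next value is a weighted sum of all current values. The weight matrix `W` and the function names are illustrative assumptions, not taken from the cited literature.

```python
def predict_next(W, x):
    """One step of an assumed linear model: x_i(t+1) = sum_j W[i][j] * x_j(t)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def is_underdetermined(n_genes, n_timepoints):
    """Check whether the weights of one gene can be pinned down by the data.

    Each gene has n_genes unknown weights, but a series of n_timepoints
    measurements yields only n_timepoints - 1 transitions (equations).
    """
    return n_genes > n_timepoints - 1

# Hypothetical two-gene weight matrix: gene 0 decays, gene 1 is activated by
# gene 0 and represses itself.
W = [[0.5, 0.0],
     [1.0, -0.5]]
next_state = predict_next(W, [2.0, 4.0])
```

With thousands of genes and only a handful of time points, `is_underdetermined` is true for every gene, which is why the methods listed above add constraints or reduce dimensionality first.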

Steady-State Approach: Takes advantage of gene deletions or overexpression. If gene A goes up after gene B is deleted, perhaps gene B is a negative modulator of A, and so on. Microarrays offer the opportunity to identify the consequences of a gene deletion across the entire genome.
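The deletion logic can be sketched as a comparison of wild-type and knockout profiles. The expression values, the log2 scale, and the change threshold of 1.0 are all assumptions made for this example; note also that a change after deletion may reflect an indirect effect rather than direct regulation.

```python
def infer_edges_from_knockout(wild_type, knockout, deleted_gene, threshold=1.0):
    """Guess signed edges from the deleted gene to every gene that responds.

    Profiles are dicts of gene -> expression level (assumed log2 scale).
    A gene that rises after the deletion is taken to be repressed (-1) by the
    deleted gene; a gene that falls is taken to be activated (+1).
    """
    edges = {}
    for gene, wt_level in wild_type.items():
        if gene == deleted_gene:
            continue
        change = knockout[gene] - wt_level
        if change >= threshold:
            edges[(deleted_gene, gene)] = -1
        elif change <= -threshold:
            edges[(deleted_gene, gene)] = +1
    return edges

# Hypothetical measurements before and after deleting gene B.
wild_type = {"A": 5.0, "B": 7.0, "C": 6.0, "D": 4.0}
knockout_b = {"A": 7.5, "B": 0.0, "C": 3.0, "D": 4.2}
inferred = infer_edges_from_knockout(wild_type, knockout_b, "B")
```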

Genetic Network Generation Schematic. de Jong. Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol 2002;9(1):67-103

Algorithmic Approach to Network Design: Boolean: binary state, along with co-expression clustering. Continuous steady-state (non-linear): assumes genes can have intermediate states. Singular value decomposition.

Methods for Generating Gene Networks: D'haeseleer et al. Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16(8): de la Fuente et al. Linking the genes: inferring quantitative gene networks from microarray data. Trends in Genetics 18(8): Toh et al. Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling. Bioinformatics 18(2):

Types of Clustering: Non-hierarchical: clusters N objects into K groups, iterating until a preset threshold is reached. Examples include K-means, SOM, and expectation-maximization. Hierarchical: returns a hierarchy of nested clusters (agglomerative vs. divisive).

Why use clustering? The wealth of data from microarrays is overwhelming. Cluster to limit the gene list to genes that change significantly. Inference of functional annotation. Extraction of regulatory motifs. Molecular signatures for distinguishing cell or tissue types. Use of learning machines to characterize unknown genes.

Determining Distances Between Genes: The majority of clustering algorithms use a matrix of pairwise distances between genes. Distances can be calculated based on: 1. Similarity according to positive correlations. 2. Similarity based on positive and negative correlations. 3. Similarity based on mutual information.
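The three options can be sketched as follows. The specific formulas (1 - r for positive correlation, 1 - |r| for positive and negative correlation, and mutual information over pre-discretized profiles) are common choices assumed here for illustration; the slides do not say which variants were meant.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def positive_corr_distance(x, y):
    """Option 1: only positively correlated genes are close (range 0 to 2)."""
    return 1 - pearson(x, y)

def abs_corr_distance(x, y):
    """Option 2: strongly anti-correlated genes are also close (range 0 to 1)."""
    return 1 - abs(pearson(x, y))

def mutual_information(x, y):
    """Option 3: mutual information in bits between two discretized profiles
    (for example lists of 'up'/'down' labels); higher means more similar."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )
```

Two perfectly anti-correlated profiles are maximally distant under option 1 but at distance zero under option 2, which is the practical difference between the first two choices.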

Guilt-by-Association (GBA): A gene is selected at random and its nearest neighbor is determined. Genes are clustered based on arbitrary cutoff distances in expression space. Assumes that genes regulated in the same pattern participate in similar processes.
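A rough sketch of the idea, assuming Euclidean distance in expression space and a user-chosen cutoff. The gene names, profiles, and cutoff value are hypothetical.

```python
import math

def nearest_neighbor(profiles, seed):
    """Return the gene whose profile is closest (Euclidean) to the seed gene."""
    return min(
        (g for g in profiles if g != seed),
        key=lambda g: math.dist(profiles[g], profiles[seed]),
    )

def gba_cluster(profiles, seed, cutoff):
    """Return all genes within `cutoff` of the seed gene, sorted by name."""
    return sorted(
        g for g in profiles
        if g != seed and math.dist(profiles[g], profiles[seed]) <= cutoff
    )

# Hypothetical expression profiles over three conditions.
profiles = {
    "seed": [1.0, 2.0, 3.0],
    "g1": [1.1, 2.0, 3.0],
    "g2": [1.0, 2.5, 3.0],
    "g3": [5.0, 0.0, 1.0],
}
```

Genes returned by `gba_cluster` would then be hypothesized to share the seed gene's function, which is the guilt-by-association assumption stated above.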

K-means Clustering: Partitions N genes into K groups. A centroid is the weighted center of a cluster. Each gene is assigned to a cluster and the centroid is calculated. Centroids are repeatedly recalculated and genes reassigned.
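The assign-and-recalculate loop can be sketched in plain Python. This version assumes Euclidean distance, unweighted means as centroids, and caller-supplied starting centroids; the function name and the iteration cap are illustrative.

```python
import math

def kmeans(profiles, centroids, n_iter=20):
    """Basic K-means: alternate gene assignment and centroid recalculation.

    profiles:  list of expression vectors (lists of floats)
    centroids: list of K starting vectors (lists of floats)
    Returns (centroids, clusters), where clusters[k] holds the profiles
    assigned to centroid k.
    """
    for _ in range(n_iter):
        # Assignment step: each gene joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in profiles:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members
            else centroids[k]
            for k, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters
```

The result depends on K and on the starting centroids, so in practice the procedure is usually run from several starting points.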

Self-Organizing Maps (SOM): Very similar to K-means, except that cluster centers are placed on a grid. At each iteration, a gene pattern is chosen at random, and the nearest cluster center and its grid neighbors are updated. Requires the user to define the number of clusters and the grid size.
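A single SOM update might look like the sketch below. The Manhattan grid distance, the learning rate, and the choice to move neighbors at half the winner's rate are simplifying assumptions; real implementations typically shrink both the rate and the neighborhood over time.

```python
import math

def som_update(centers, pattern, learning_rate=0.5, radius=1):
    """Apply one SOM update in place and return the winning grid node.

    centers: dict mapping grid coordinates (row, col) -> cluster-center vector
    pattern: one gene expression vector (in a full run, chosen at random)
    """
    # The winner is the center closest to the pattern in expression space.
    winner = min(centers, key=lambda node: math.dist(centers[node], pattern))
    for node, vec in centers.items():
        grid_dist = abs(node[0] - winner[0]) + abs(node[1] - winner[1])
        if grid_dist <= radius:
            # Assumption: grid neighbors move at half the winner's rate.
            rate = learning_rate if node == winner else learning_rate / 2
            centers[node] = [v + rate * (p - v) for v, p in zip(vec, pattern)]
    return winner

# Hypothetical 1x3 grid of two-dimensional cluster centers.
centers = {(0, 0): [0.0, 0.0], (0, 1): [2.0, 2.0], (0, 2): [10.0, 10.0]}
winner = som_update(centers, [0.0, 1.0])
```

Because neighbors on the grid are pulled along with the winner, adjacent grid nodes end up representing similar expression patterns, which is what distinguishes SOM from K-means.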

Expectation-Maximization Clustering: Similar to K-means, except that genes can be assigned to multiple categories. Membership in a cluster is based on a Gaussian distribution of probabilities. Membership is continuously updated, along with the following three parameters for each cluster: centroid, covariance, and mixture weight.
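One EM iteration can be sketched for the simplest case, a one-dimensional Gaussian mixture, where the covariance reduces to a single variance per cluster. The function names and the small variance floor are assumptions made for this example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(data, means, variances, weights):
    """Run one EM iteration; return (memberships, means, variances, weights)."""
    # E-step: soft membership of every data point in every cluster.
    resp = []
    for x in data:
        dens = [w * gaussian_pdf(x, m, v)
                for m, v, w in zip(means, variances, weights)]
        total = sum(dens)
        resp.append([d / total for d in dens])
    # M-step: re-estimate centroid, variance, and mixture weight per cluster.
    new_means, new_vars, new_weights = [], [], []
    for k in range(len(means)):
        nk = sum(r[k] for r in resp)
        mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk
        new_means.append(mean)
        new_vars.append(max(var, 1e-6))  # floor avoids a collapsed cluster
        new_weights.append(nk / len(data))
    return resp, new_means, new_vars, new_weights
```

Each row of the returned memberships sums to one, so a gene can belong partly to several clusters, in contrast to the hard assignments of K-means.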

Determining which clustering analysis to use: Each combination of distance measure and clustering algorithm will emphasize different types of regularities in the data. It is best to analyze the data with more than one clustering method, because the algorithms differ and each gene can have multiple functions.

Brazhnik et al. Construction of a Simple Network Clustering

Boolean Networks: Simplification: each gene is represented in a binary ON/OFF state. Each gene is regulated by other genes through Boolean functions. In reality, most genes are in an intermediate state, so their expression is continuous.
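A toy Boolean network, assuming synchronous updates in which every gene is recomputed from the previous state at once. The three genes and their rules are hypothetical.

```python
def step(state, rules):
    """Synchronously update every gene from the current state."""
    return {gene: rule(state) for gene, rule in rules.items()}

def simulate(state, rules, n_steps):
    """Return the list of states visited, starting with the initial state."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state, rules)
        trajectory.append(state)
    return trajectory

# Hypothetical rules: C represses A, A activates B, A AND B activate C.
rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}
trajectory = simulate({"A": True, "B": False, "C": False}, rules, 5)
```

With these rules the network never settles: it cycles through five states and returns to its starting state, the kind of attractor behavior Boolean models are used to study.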

Example of a Boolean Network. de Jong. Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol 2002;9(1):67-103

Limitations of Boolean Networks: Fail to reveal causality. Non-quantitative. Do not take into account multiple gene states. In the future, protein-protein interaction maps need to be included.

Graphical Gaussian Model Toh et al. Inference of a Genetic Network by a Combined Approach of Cluster Analysis and Graphical Gaussian Modeling. Bioinformatics 18(2): Goal: To establish a method to combine Clustering and GGM for genetic network predictions.

Graphical Gaussian Modeling: GGM is a multivariate analysis used to infer or test a statistical model of the relationships among multiple variables, using partial correlation as the measure of association. Data: 2467 Saccharomyces cerevisiae genes under 79 different conditions.

Graphical Gaussian Method: Genes were grouped into 34 distinct clusters. To reduce dimensionality, the expression profiles within each cluster were averaged for each condition.
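The averaging step can be sketched as below, assuming cluster labels have already been assigned. The gene names and values are hypothetical; in the study described above the result would be 34 averaged profiles of 79 conditions each.

```python
def cluster_average_profiles(expression, cluster_of):
    """Average the expression profiles of all genes in each cluster.

    expression: dict of gene -> list of values (one per condition)
    cluster_of: dict of gene -> cluster label
    Returns a dict of cluster label -> averaged profile.
    """
    sums, counts = {}, {}
    for gene, profile in expression.items():
        label = cluster_of[gene]
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

# Hypothetical data: three genes, two conditions, two clusters.
expression = {"g1": [1.0, 3.0], "g2": [3.0, 5.0], "g3": [10.0, 0.0]}
cluster_of = {"g1": 0, "g2": 0, "g3": 1}
averaged = cluster_average_profiles(expression, cluster_of)
```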

Step 0: Generate a complete graph with M nodes, every node connected to every other node. Step 1: Calculate the partial correlation matrix $P(\tau)$ from the correlation coefficient matrix $C(\tau)$, where $\tau$ indicates the iteration. Step 2: Find the element with the smallest absolute value in $P(\tau)$ and replace it with 0. Step 3: Reconstruct $C(\tau+1)$ from $P(\tau)$. Step 4: Termination depends on the deviances $\mathrm{dev}_1 = N \log\left(|C(\tau+1)| / |C(0)|\right)$ and $\mathrm{dev}_2 = N \log\left(|C(\tau+1)| / |C(\tau)|\right)$. Calculate dev1 and dev2. If the probability for either deviance is below 0.05, the iteration stops; otherwise return to Step 1. Stepwise iterative algorithm developed by Wermuth and Scheidt (1977).
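Steps 1 and 2 can be illustrated with a short standard-library sketch. It uses the standard identity that a partial correlation is the negated, rescaled off-diagonal element of the inverse correlation matrix. The model refitting of Step 3 and the deviance test of Step 4 are not implemented, and the helper names and the 3x3 example matrix are assumptions.

```python
import math

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(C):
    """Step 1: p_ij = -inv[i][j] / sqrt(inv[i][i] * inv[j][j]), inv = C^-1."""
    inv = invert(C)
    n = len(C)
    return [[1.0 if i == j else -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
             for j in range(n)] for i in range(n)]

def weakest_edge(P, removed=()):
    """Step 2: the pair (i, j), i < j, with the smallest |p_ij| not yet removed."""
    pairs = [(i, j) for i in range(len(P)) for j in range(i + 1, len(P))
             if (i, j) not in removed]
    return min(pairs, key=lambda ij: abs(P[ij[0]][ij[1]]))

# Hypothetical chain 0 - 1 - 2: variables 0 and 2 correlate only through 1,
# because r02 = r01 * r12 = 0.8 * 0.5.
C = [[1.0, 0.8, 0.4],
     [0.8, 1.0, 0.5],
     [0.4, 0.5, 1.0]]
P = partial_correlations(C)
```

In this example the ordinary correlation between variables 0 and 2 is 0.4, yet their partial correlation is zero, so the edge between them is the first to be removed. This is how the method separates direct from indirect associations.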

Graphical Gaussian Method Sub graph of the independence graph corresponding to partial correlation coefficient matrix

Graphical Gaussian Method: Results and conclusions. The algorithm stopped after 189 iterations. SUC2 (sucrose-hydrolyzing enzyme) was used as a model to evaluate the accuracy of the method: among 40 known correlations with other genes, the method identified 3 as belonging to the same cluster, 8 as having a correlation of 0, and 29 as interacting. Concluded that the method is about 75% accurate. Could be a highly effective method for gene network generation if combined with previous knowledge.

Linear Additive Method: de la Fuente et al. “Linking the genes: inferring quantitative gene networks from microarray data.” Trends in Genetics 18(8): Goal: To establish a method for inferring gene networks and the corresponding gene interaction strengths. Represents gene networks in a way that treats expression levels as continuous variables.

Linear Additive Method Co-control coefficient FR=Fluorescence Intensities

Linear Additive Method: Conclusions. The in silico approach is useful for testing inferred networks. Can be used with experiments that disrupt one gene at a time. Produced a method for developing gene networks that include quantitative interaction strengths.

New and Improved Network Designs: Continuous-value network inference: uses differential equations and allows genes to be continuous variables. Gene duplication: network nodes are randomly duplicated to help network connections evolve. Many computer simulations are being developed to mimic real data and aid in the design of new algorithms.

Conclusion and Outlook: Integration of large amounts of biological data with computational power is increasing our knowledge of complex systems. There is an increasing need to standardize microarray experiments and create databases. Gradual improvement of clustering and gene network inference algorithms. Addition of differential proteomics and incorporation of multiple regulation steps.