MSc GBE Course: Genes: from sequence to function Brief Introduction to Systems Biology Sven Bergmann Department of Medical Genetics University of Lausanne.

Slides:



Advertisements
Similar presentations
Basic Gene Expression Data Analysis--Clustering
Advertisements

Gene Shaving – Applying PCA Identify groups of genes a set of genes using PCA which serve as the informative genes to classify samples. The “gene shaving”
Cluster analysis for microarray data Anja von Heydebreck.
MSc GBE Course: Genes: from sequence to function Brief Introduction to Systems Biology Sven Bergmann Department of Medical Genetics University of Lausanne.
D ISCOVERING REGULATORY AND SIGNALLING CIRCUITS IN MOLECULAR INTERACTION NETWORK Ideker Bioinformatics 2002 Presented by: Omrit Zemach April Seminar.
Gene selection using Random Voronoi Ensembles Stefano Rovetta Department of Computer and Information Sciences, University of Genoa, Italy Francesco masulli.
University at BuffaloThe State University of New York Interactive Exploration of Coherent Patterns in Time-series Gene Expression Data Daxin Jiang Jian.
Gene function analysis Stem Cell Network Microarray Course, Unit 5 May 2007.
Microarray technology and analysis of gene expression data Hillevi Lindroos.
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
Bi-correlation clustering algorithm for determining a set of co- regulated genes BIOINFORMATICS vol. 25 no Anindya Bhattacharya and Rajat K. De.
Exhaustive Signature Algorithm
Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol
Genome-wide prediction and characterization of interactions between transcription factors in S. cerevisiae Speaker: Chunhui Cai.
Introduction to DNA Microarrays Todd Lowe BME 88a March 11, 2003.
Gene expression analysis summary Where are we now?
Mutual Information Mathematical Biology Seminar
MSc GBE Course: Genes: from sequence to function Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue.
Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.
Clustering (Part II) 10/07/09. Outline Affinity propagation Quality evaluation.
Gene Set Analysis 09/24/07. From individual gene to gene sets Finding a list of differentially expressed genes is only the starting point. Suppose we.
Systems Biology Biological Sequence Analysis
Indiana University Bloomington, IN Junguk Hur Computational Omics Lab School of Informatics Differential location analysis A novel approach to detecting.
Introduction to Bioinformatics - Tutorial no. 12
 Goal A: Find groups of genes that have correlated expression profiles. These genes are believed to belong to the same biological process and/or are co-regulated.
Fuzzy K means.
Microarray analysis 2 Golan Yona. 2) Analysis of co-expression Search for similarly expressed genes experiment1 experiment2 experiment3 ……….. Gene i:
Modeling Functional Genomics Datasets CVM Lesson 1 13 June 2007Bindu Nanduri.
Promoter Analysis using Bioinformatics, Putting the Predictions to the Test Amy Creekmore Ansci 490M November 19, 2002.
Cristina Manfredotti D.I.S.Co. Università di Milano - Bicocca An Introduction to the Use of Bayesian Network to Analyze Gene Expression Data Cristina Manfredotti.
Comparative Expression Moran Yassour +=. Goal Build a multi-species gene-coexpression network Find functions of unknown genes Discover how the genes.
Introduction to Bioinformatics Algorithms Clustering and Microarray Analysis.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Gene Expression Clustering. The Main Goal Gain insight into the gene’s function. Using: Sequence Transcription levels.
Microarray Gene Expression Data Analysis A.Venkatesh CBBL Functional Genomics Chapter: 07.
MATISSE - Modular Analysis for Topology of Interactions and Similarity SEts Igor Ulitsky and Ron Shamir Identification.
Gene Set Enrichment Analysis (GSEA)
DNA microarray technology allows an individual to rapidly and quantitatively measure the expression levels of thousands of genes in a biological sample.
Analyzing transcription modules in the pathogenic yeast Candida albicans Elik Chapnik Yoav Amiram Supervisor: Dr. Naama Barkai.
Master’s Degrees in Bioinformatics in Switzerland: Past, present and near future Patricia M. Palagi Swiss Institute of Bioinformatics.
Microarrays to Functional Genomics: Generation of Transcriptional Networks from Microarray experiments Joshua Stender December 3, 2002 Department of Biochemistry.
Agent-based methods for translational cancer multilevel modelling Sylvia Nagl PhD Cancer Systems Science & Biomedical Informatics UCL Cancer Institute.
Unraveling condition specific gene transcriptional regulatory networks in Saccharomyces cerevisiae Speaker: Chunhui Cai.
Gene expression analysis
Analysis of the yeast transcriptional regulatory network.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
Systems Biology ___ Toward System-level Understanding of Biological Systems Hou-Haifeng.
Overview of Bioinformatics 1 Module Denis Manley..
Bioinformatics MEDC601 Lecture by Brad Windle Ph# Office: Massey Cancer Center, Goodwin Labs Room 319 Web site for lecture:
Understanding Network Concepts in Modules Dong J, Horvath S (2007) BMC Systems Biology 2007, 1:24.
While gene expression data is widely available describing mRNA levels in different cancer cells lines, the molecular regulatory mechanisms responsible.
Cluster validation Integration ICES Bioinformatics.
Comp. Genomics Recitation 10 4/7/09 Differential expression detection.
Analyzing Expression Data: Clustering and Stats Chapter 16.
Molecular Classification of Cancer Class Discovery and Class Prediction by Gene Expression Monitoring.
Evaluation of gene-expression clustering via mutual information distance measure Ido Priness, Oded Maimon and Irad Ben-Gal BMC Bioinformatics, 2007.
GO enrichment and GOrilla
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
Clustering Approaches Ka-Lok Ng Department of Bioinformatics Asia University.
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
Algorithms and Computational Biology Lab, Department of Computer Science and & Information Engineering, National Taiwan University, Taiwan Network Biology.
Clustering Manpreet S. Katari.
Statistical Data Analysis
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
CISC 841 Bioinformatics (Spring 2006) Inference of Biological Networks
1 Department of Engineering, 2 Department of Mathematics,
Statistical Data Analysis
Building Valid, Credible, and Appropriately Detailed Simulation Models
Predicting Gene Expression from Sequence
Presentation transcript:

MSc GBE Course: Genes: from sequence to function Brief Introduction to Systems Biology Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27 - DGM 328  CH-1005 Lausanne  Switzerland   work: ++41-21-692-5452 cell: ++41-78-663-4980 http://serverdgm.unil.ch/bergmann

Course Overview Basics: What is Systems Biology? Standard analysis tools for large datasets Advanced analysis tools Systems approach to “small” networks

What is Systems Biology? To understand biology at the system level, we must examine the structure and dynamics of cellular and organismal function, rather than the characteristics of isolated parts of a cell or organism. Properties of systems, such as robustness, emerge as central issues, and understanding these properties may have an impact on the future of medicine. Hiroaki Kitano

What is Systems Biology? To me, systems biology seeks to explain biological phenomenon not on a gene by gene basis, but through the interaction of all the cellular and biochemical components in a cell or an organism. Since, biologists have always sought to understand the mechanisms sustaining living systems, solutions arising from systems biology have always been the goal in biology. Previously, however, we did not have the knowledge or the tools. Edison T Liu Genome Institute of Singapore

What is Systems Biology? addresses the analysis of entire biological systems interdisciplinary approach to the investigation of all the components and networks contributing to a biological system [involves] new dynamic computer modeling programs which ultimately might allow us to simulate entire organisms based on their individual cellular components Strategy of Systems Biology is dependent on interactive cycles of predictions and experimentation. Allow[s Biology] to move from the ranks of a descriptive science to an exact science. (Quotes from SystemsX.ch website)

What is Systems Biology?

What is Systems Biology? identify elements (genes, molecules, cells, …) ascertain their relationships (co-expressed, interacting, …) integrate information to obtain view of system as a whole Large (genomic) systems many uncharacterized elements relationships unknown computational analysis should: improve annotation reveal relations reduce complexity Small systems elements well-known many relationships established quantitative modeling of systems properties like: Dynamics Robustness Logics A systems approach to a biological system might be likened to what is required to understand our medical healthcare system: As a first step one has to identify the elements and their groups: patients, physicians, nurses, hospitals, insurance companies, government insurance, etc. To understand the whole healthcare system, the relationships of the elements (e.g. individual patients, physicians, nurses, etc.) within a group must be ascertained with respect to one another and the elements in other groups. This information must be integrated together to obtain a view of how the system works. So it is with systems biology—the types of biological information (DNA, RNA, protein, protein interactions, biomodules, cells, tissues, etc.) also have their individual elements (e.g. specific genes or proteins) and the relationships of these with respect to one another and the elements of other types of biological information must be determined and, once again, all of this information integrated to obtain a view (model) of the system as a whole.

Part 1: Basics Motivation: What is a “systems biology approach”? Why to take such an approach? How can one study systems properties? Practical Part: First look at a set of genomic expression data How to have a global look at such datasets? Distributions, mean-values, standard deviations, z-scores T-tests and other statistical tests Correlations and similarity measures Simple Clustering

First look at a set of genomic expression data

DNA microarray experiments monitor expression levels of thousands of genes simultaneously: test control allows for studying the genome-wide transcriptional response of a cell to interior and exterior changes provide us with a first step towards understanding gene function and regulation on a global scale

Microarrays generate massive data

Log-ratios of expression values + Log ratios indicate differential expression!

Consolidate data from multiple chips into one table and use color-coding 4 1000 3 2 Knock Out (KO) 2000 1 log-ratio r genes 3000 -1 4000 -2 -3 5000 -4 6000 Many KOs (conditions) 1 2 3 4 5 conditions

Rosetta data: The real world … genes conditions Most genes exhibit little differential expression!

Histogram shows distribution # ade1 deletion mutant exhibits small differential expression in most genes!

Rosetta data: Zooming in … genes conditions Only few genes exhibit large differential expression!

Histogram shows distribution # ssn6 deletion mutant exhibits large differential expression in many genes!

Quantification of distribution Mean: μ= =x Std: = # Outliers Mean and Standard Deviation (Std) characterize distribution

? Comparing distributions μ = -5.5 · 10-5  = 0.1434 μ = 0.2366  = 1.9854 ade1 ssn1 # # Are the expression values of ade1 different from those of ssn6?

Quantifying Significance

Student’s T-test t-statistic: difference between means in units of average error Significance can be translated into p-value (probability) assuming normal distributions http://www.physics.csbsju.edu/stats/t-test.html

History: W. S. Gossett [1876-1937] The t-test was developed by W. S. Gossett, a statistician employed at the Guinness brewery. However, because the brewery did not allow employees to publish their research, Gossett's work on the t-test appears under the name "Student" (and the t-test is sometimes referred to as "Student's t-test.") Gossett was a chemist and was responsible for developing procedures for ensuring the similarity of batches of Guinness. The t-test was developed as a way of measuring how closely the yeast content of a particular batch of beer corresponded to the brewery's standard. http://ccnmtl.columbia.edu/projects/qmss/t_about.html

Comparing profiles X and Y (not distributions!): Pearson correlations (Graphic) Each dot is a pair (xi,yi) Y = (yi) X = (xi) Comparing profiles X and Y (not distributions!): What is the tendency that high/low values in X match high/low values in Y? http://davidmlane.com/hyperstat/A34739.html

Pearson correlations: Formulae (complicated version) (simple version using z-scores)

Similarity according to all conditions (“Democratic vote”) Pearson correlations: Intuition Similarity according to all conditions (“Democratic vote”) conditions 1 2 3 4 5 r12 ~ 1 gene Clustering-coefficient 1 2 3 4 5 r12 ~ 0 gene 1 2 3 4 5 r12 ~ -1 gene

Pearson correlations: Caution! High correlation does not necessarily mean co-linearity!

(Hierarchical Agglomerative) Clustering Join most correlated samples and replace correlations to remaining samples by average, then iterate … http://gepas.bioinfo.cipf.es/cgi-bin/tutoX?c=clustering/clustering.config

Clustering of the real expression data

Further Reading

K-means Clustering “guess” k=3 (# of clusters) 1. Start with random positions of centroids ( ) 2. Assign each data point to closest centroid http://en.wikipedia.org/wiki/K-means_algorithm

Hierachical Clustering Plus: Shows (re-orderd) data Gives hierarchy Minus: Does not work well for many genes (usually apply cut-off on fold-change) Similarity over all genes/conditions Clusters do not overlap

Overview of “modular” analysis tools Cheng Y and Church GM. Biclustering of expression data. (Proc Int Conf Intell Syst Mol Biol. 2000;8:93-103) Getz G, Levine E, Domany E. Coupled two-way clustering analysis of gene microarray data. (Proc Natl Acad Sci U S A. 2000 Oct 24;97(22):12079-84) Tanay A, Sharan R, Kupiec M, Shamir R. Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. (Proc Natl Acad Sci U S A. 2004 Mar 2;101(9):2981-6) Sheng Q, Moreau Y, De Moor B. Biclustering microarray data by Gibbs sampling. (Bioinformatics. 2003 Oct;19 Suppl 2:ii196-205) Gasch AP and Eisen MB. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. (Genome Biol. 2002 Oct 10;3(11):RESEARCH0059) Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan WC, Botstein D, Brown P. 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. (Genome Biol. 2000;1(2):RESEARCH0003.) … and many more! http://serverdgm.unil.ch/bergmann/Publications/review.pdf

How to “hear” the relevant genes? Song A Song B

Coupled two-way Clustering

Inside CTWC: Iterations Depth Genes Samples Init G1 S1 1 G1(S1) G2,G3,…G5 S1(G1) S2,S3 2 G1(S2) G1(S3) G6,G7,….G13 G14,…G21 S1(G2) … S1(G5) S4,S5,S6 S10,S11 None 3 G2(S1)…G2(S3) G5(S1)…G5(S3) G22… …G97 S2(G1)…S2(G5) S3(G1)…S3(G5) S12,… …S51 4 G1(S4) G1(S11) G98,..G105 G151,..G160 S1(G6) S1(G21) S52,... S67 5 G2(S4)...G2(S11) G5(S4)...G5(S11) G161… …G216 S2(G6)...S2(G21) S3(G6)…S3(G21) S68… …S113 Two-way clustering

The (Iterative) Signature Algorithm: One example in more detail: The (Iterative) Signature Algorithm: No need for correlations! decomposes data into “transcription modules” integrates external information allows for interspecies comparative analysis J Ihmels, G Friedlander, SB, O Sarig, Y Ziv & N Barkai Nature Genetics (2002)

Trip to the “Amazon”:

customers with similar choice How to find related items? customers with similar choice items 10 re- commended items 20 30 40 50 60 your choice 70 80 90 100 5 10 15 20 25 30 35 40 45 50 customers

similarly expressed genes How to find related genes? relevant conditions genes 10 similarly expressed genes 20 30 40 50 60 your guess 70 80 90 100 5 10 15 20 25 30 35 40 45 50 conditions J Ihmels, G Friedlander, SB, O Sarig, Y Ziv & N Barkai Nature Genetics (2002)

Signature Algorithm: Score definitions The signature algorithm uses a set of genes as INPUT. These genes could be a putative group of related genes, Or just a random selection of genes. The algorithm uses the expression data to generate its OUTPUT. Genes that are not correlated with the bulk of the input-genes are rejected. And other co-regulated genes that were not part of the INPUT are added.

How to find related genes? Scores and thresholds! condition scores thresholding: initial guesses (genes)

How to find related genes? Scores and thresholds! condition scores gene scores thresholding:

How to find related genes? Scores and thresholds! condition scores thresholding: gene scores

Iterative Signature Algorithm INPUT OUTPUT = INPUT “Transcription Module” An extension of this algorithm is the “Iterative Signature Algorithm”. Here the OUPUT is re-used as INPUT, such that further iterations can bring in more co-regulated genes. We repeat this procedure until the OUTPUT equals to the INPUT. We refer to the final OUTPUT as a “transcription module”. As before it contains both the set of co-regulated genes and the conditions that induce their co-regulation. By definition, all genes outside the module are less co-regulated than the module genes under these conditions. OUTPUT SB, J Ihmels & N Barkai Physical Review E (2003)

Transcription modules Identification of transcription modules using many random “seeds” random “seeds” Transcription modules Independent identification: Modules may overlap! In order to identify the transcription modules encode in the expression data we proceed as follows: We assemble a large number of random INPUT seeds. Applying the iterative signature algorithm to each seed converges into a specific transcription module. Since each transcription module is identified independently, it may overlap with other modules.

New Tools: Module Visualization http://serverdgm.unil.ch/bergmann/Fibroblasts/visualiser.html

Gene enrichment analysis The hypergeometric distribution f(M,A,K,T) gives the probability that K out of A genes with a particular annotation match with a module having M genes if there are T genes in total. http://en.wikipedia.org/wiki/Hypergeometric_distribution

Decomposing expression data into annotated transcriptional modules identified >100 transcriptional modules in yeast: high functional consistency! many functional links “waiting” to be verified experimentally J Ihmels, SB & N Barkai Bioinformatics 2005

Higher-order structure correlated C Let me go one step further and compare higher order correlations between organisms. Since each module comes with an associated set of conditions, we investigate correlations between module according to these conditions. For example, in yeast the regulation pattern of the ribosomal genes is very similar to that of the genes involved in ribosomal RNA processing. A positive correlation is indicated in red and these modules are placed close to each other. In contrast the heat-shock module is anti-correlated with these two modules, which is indicated here in blue. anti- correlated