Gene Ontology as a tool for the systematic analysis of large-scale gene-expression data. Stefan Bentink. Joint group meeting Klipp/Spang, 11-20-2002.


Overview
- Microarrays and the Gene Ontology (GO) database
- Scoring differential gene expression in GO groups
- Checking scores against different null hypotheses
- Sample data (two types of breast cancer) and results


Microarrays: sample scheme
[Figure: of the genes A, B, C and D, only B and C are transcribed into mRNA (differential gene expression). After RNA isolation, cDNA is synthesized with labeled nucleotides (reverse transcription), and the labeled cDNA of B and C is hybridised to the array spots A, B, C and D.]
Fluorescence indicates that gene B and gene C are transcribed.

Microarrays: comparative analysis
sample    tissue I (1, 2, ...)    tissue II (1, 2, ...)
gene 1    mean                    mean                     => t-value
gene 2    mean                    mean                     => t-value
gene 3    mean                    mean                     => t-value
...
=> ranking?
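The comparison scheme above can be sketched in a few lines. This is a minimal illustration, not the analysis used in the talk: the expression values and gene names are made up, and the pooled-variance (Student) form of the two-sample t-statistic is an assumption, since the slide only says "t-value".

```python
from math import sqrt
from statistics import mean, variance

def t_value(group1, group2):
    """Two-sample t-statistic with pooled variance (assumed variant)."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled * (1 / n1 + 1 / n2))

# Hypothetical expression values: gene -> (tissue I samples, tissue II samples)
expression = {
    "gene 1": ([5.1, 4.9, 5.3, 5.0], [5.0, 5.2, 4.8, 5.1]),
    "gene 2": ([2.0, 2.2, 1.9, 2.1], [3.9, 4.1, 4.0, 4.2]),
    "gene 3": ([7.5, 7.4, 7.7, 7.6], [6.9, 7.0, 6.8, 7.1]),
}

# One t-value per gene, then rank genes by the absolute t-value.
t_values = {gene: t_value(a, b) for gene, (a, b) in expression.items()}
ranking = sorted(t_values, key=lambda gene: abs(t_values[gene]), reverse=True)
print(ranking)
```

Ranking by the absolute t-value is one possible reading of the "ranking?" on the slide; the sign still tells in which tissue the gene is more strongly expressed.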

How to interpret the data?
- Long list of significant genes
- Which genes are of interest?
- Solution: pooling of genes into functional classes
  - provides a general overview
- The Gene Ontology database provides such a functional classification

The Gene Ontology database

- GO is a database of terms for genes
- Known genes are annotated to the terms
- Terms are connected as a directed acyclic graph
- Levels represent the specificity of the terms

The Gene Ontology database
[Figure: excerpt of the GO graph around the term "Apoptotic protease activator", with the nodes Gene Ontology, Molecular function, Apoptosis regulator, Enzyme activator, Apoptosis activator and Protease activator.]

The Gene Ontology database
- Every child term is a member of its parent term
- GO contains three different sub-ontologies:
  - Molecular function
  - Biological process
  - Cellular component
- Unique identifier for every term:
  - GO: (root = Gene Ontology)
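Because every child term is a member of its parent term, a gene annotated to one term also belongs to the GO-groups of all ancestor terms. The sketch below shows one way to propagate annotations up a directed acyclic graph. The term names are taken from the example figure, but the parent links, the gene names and the annotations are assumptions made for illustration only.

```python
# Hypothetical excerpt of the GO graph: term -> list of parent terms (assumed edges).
parents = {
    "molecular function": ["Gene Ontology"],
    "apoptosis regulator": ["molecular function"],
    "enzyme activator": ["molecular function"],
    "apoptosis activator": ["apoptosis regulator"],
    "protease activator": ["enzyme activator"],
    "apoptotic protease activator": ["apoptosis activator", "protease activator"],
}

def ancestors(term):
    """Return all terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Hypothetical direct annotations: gene -> terms it is annotated to.
direct = {
    "gene X": ["apoptotic protease activator"],
    "gene Y": ["enzyme activator"],
}

# A GO-group holds every gene annotated to the term or to one of its descendants.
go_groups = {}
for gene, terms in direct.items():
    for term in terms:
        for member_of in {term} | ancestors(term):
            go_groups.setdefault(member_of, set()).add(gene)

print(sorted(go_groups["enzyme activator"]))
```

A term with two parents, as assumed here for "apoptotic protease activator", is the reason the structure is a graph rather than a tree.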

Gene Ontology and microarrays
- Hypothesis: Functionally related, differentially expressed genes should accumulate in the corresponding GO-group.
- Problem: Find a method that scores the accumulation of differential gene expression in a node of the Gene Ontology.

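Gene Ontology and microarrays
[Figure: expression matrix with genes as rows and samples as columns; the samples are split into tissue types 1 and 2, the genes are grouped into the GO-groups GO:1 to GO:4.]
- P-value for every gene by a two-sample t-test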


GO: Scoring methods (all derived from the p-values)
- Number of significant genes in a GO-group (count: 1, 2, 3, ...)
- Sum of the negative logarithms of all p-values (Σ -log P)
- sup|P(n) - F(n)| according to Kolmogorov-Smirnov

The p-value
- cdf: cumulative distribution function of the t-value
- t < 0 => p = cdf(t)
- t > 0 => p = 1 - cdf(t)
- => p in (0, 0.5], m in (0, 1] with m = 2*p
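The mapping from t-value to p-value can be tried out with the standard library alone. This is a sketch under assumptions: the slide only says "cdf", so using the cdf of Student's t-distribution is assumed, the function names are made up, and the degrees of freedom in the example (47 = 25 + 24 - 2) are a guess based on the sample sizes reported later. The cdf is obtained by numerically integrating the density with Simpson's rule.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(t, df, steps=2000):
    """cdf via Simpson's rule on [0, |t|]; steps must be even."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area

def slide_p_value(t, df):
    """p = cdf(t) for t < 0 and p = 1 - cdf(t) for t > 0, so p lies in (0, 0.5]."""
    c = t_cdf(t, df)
    return c if t < 0 else 1 - c

p = slide_p_value(-3.45, 47)  # df = 47 is an assumption
m = 2 * p                     # rescaled value in (0, 1]
print(p, m)
```

With this definition a t-value of 0 gives p = 0.5 and m = 1, and the sign of t no longer matters after the mapping.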

Sum of log-score
- Pavlidis, Lewis, Noble 2001; Zien, Küffner, Zimmer, Lengauer
- 2*p -> 1 => -log(2*p) -> 0
- Small p-values, high score
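A minimal sketch of the sum-of-log score, assuming the score of a GO-group is the sum of -log(2*p) over its genes, with p in (0, 0.5] as defined on the previous slide. The function name and the example p-values are hypothetical.

```python
from math import log

def sum_log_score(p_values):
    """Sum of -log(2*p) over the genes of one GO-group; p must lie in (0, 0.5]."""
    return sum(-log(2 * p) for p in p_values)

# Hypothetical groups: one with small p-values, one with p-values near 0.5.
enriched_group = [0.001, 0.004, 0.02, 0.3]
flat_group = [0.45, 0.5, 0.38, 0.49]

print(sum_log_score(enriched_group), sum_log_score(flat_group))
```

A gene with p = 0.5 contributes nothing, so the score is driven by the genes with small p-values. It also grows with group size, which is one reason to compare it against a background distribution rather than read it directly.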

Kolmogorov-Smirnov-Score
- Hypothesis: the calculated p-values (multiplied by 2) are uniformly distributed between 0 and 1.
- [Figure: two panels, "empirical" and "theoretical", each marking n values on the interval from 0 to 1.]
- S = sup|P(n) - F(n)|
  - P(n): p-values for genes that fall into a GO-group.
  - F(n): uniformly distributed values between 0 and 1.
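The score can be sketched as the standard one-sample Kolmogorov-Smirnov statistic of the rescaled values 2*p against the uniform distribution on (0, 1). Whether the talk compared against the uniform cdf or against n evenly spaced values is not clear from the slide, so the choice below is an assumption, as are the function name and the example values.

```python
def ks_score(p_values):
    """Largest gap between the empirical cdf of 2*p and the uniform cdf on (0, 1)."""
    m = sorted(2 * p for p in p_values)
    n = len(m)
    # At each sorted value x, check the gap just after and just before the step.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(m))

evenly_spread = [0.05, 0.15, 0.25, 0.35, 0.45]  # 2*p spreads evenly over (0, 1)
piled_up = [0.001, 0.002, 0.003]                # 2*p piles up near 0

print(ks_score(evenly_spread), ks_score(piled_up))
```

Unlike the sum-of-log score, this statistic reacts to any deviation from uniformity, including a shift of many genes towards moderately small p-values.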


Null hypotheses
- The significant genes (according to Bonferroni: α = 0.05/n) are distributed over the GO-groups by chance
- The existing differential gene expression is distributed over the GO-groups by chance
- There is no differential gene expression in a GO-group
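The first null hypothesis refers to the counting score. A minimal sketch, assuming n is the total number of tested genes and that the p-values are compared directly with 0.05/n. The gene names and p-values are made up.

```python
def count_significant(group, p_by_gene, alpha=0.05):
    """Number of genes in a GO-group whose p-value passes the Bonferroni threshold."""
    threshold = alpha / len(p_by_gene)  # alpha / n, n = number of tested genes (assumed)
    return sum(1 for gene in group if p_by_gene[gene] < threshold)

# Hypothetical p-values for five genes; the GO-group contains three of them.
p_by_gene = {"g1": 1e-6, "g2": 3e-4, "g3": 0.2, "g4": 1e-8, "g5": 0.4}
group = ["g1", "g2", "g3"]

print(count_significant(group, p_by_gene))
```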

Checking H0 by permutation (data matrix: genes as rows, samples as columns)
- Permutation of rows
  - Mapping of p-values into GO-groups is randomized.
  - H0: Distribution of differential gene expression
- Permutation of columns
  - Level of p-values is randomized.
  - H0: No differential gene expression in a GO-group
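The row permutation can be sketched as follows: the p-values are shuffled across genes while the GO-group membership stays fixed, so each permutation scores a random gene set of the same size. The data, the function names and the use of the sum-of-log score are illustrative assumptions.

```python
import random
from math import log

def sum_log_score(p_values):
    """Sum of -log(2*p) over the genes of one GO-group."""
    return sum(-log(2 * p) for p in p_values)

def row_permutation_background(p_by_gene, group, n_perm=1000, seed=0):
    """Scores of `group` after randomly reassigning p-values to genes."""
    rng = random.Random(seed)
    genes = list(p_by_gene)
    values = [p_by_gene[g] for g in genes]
    background = []
    for _ in range(n_perm):
        rng.shuffle(values)                 # randomize the gene -> p-value mapping
        shuffled = dict(zip(genes, values))
        background.append(sum_log_score(shuffled[g] for g in group))
    return background

# Hypothetical data: 50 genes, the 5 genes of the GO-group have small p-values.
p_by_gene = {f"gene{i}": 0.4 for i in range(50)}
group = [f"gene{i}" for i in range(5)]
for gene in group:
    p_by_gene[gene] = 0.001

observed = sum_log_score(p_by_gene[g] for g in group)
background = row_permutation_background(p_by_gene, group)
fraction = sum(b > observed for b in background) / len(background)
print(observed, fraction)
```

The set of p-values is the same in every permutation, so this check asks whether the existing differential expression concentrates in the group, not whether there is any differential expression at all.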

Checking H0 by permutation
- 1000 random permutations => background distributions
  - H0: Distribution of significant genes
    - Randomizing GO-groups (rows)
  - H0: Distribution of all p-values
    - Randomizing GO-groups (rows)
  - H0: Level of p-values
    - Permutation of columns
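The column permutation shuffles the sample labels, so every t-value is recomputed and the level of the statistics itself is randomized. In the sketch below the group score is the mean absolute t-value, a stand-in chosen to keep the example short. It is not one of the three scores from the talk, and the data and function names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance

def t_value(a, b):
    """Two-sample t-statistic with pooled variance (assumed variant)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))

def group_score(rows, labels):
    """Stand-in score: mean |t| over the genes (rows) of one GO-group."""
    total = 0.0
    for row in rows:
        a = [x for x, lab in zip(row, labels) if lab == 1]
        b = [x for x, lab in zip(row, labels) if lab == 2]
        total += abs(t_value(a, b))
    return total / len(rows)

def column_permutation_background(rows, labels, n_perm=1000, seed=0):
    """Scores of the group after randomly reassigning the sample labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    background = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)               # randomize which sample is which tissue
        background.append(group_score(rows, shuffled))
    return background

# Hypothetical GO-group with two genes measured in 5 + 5 samples.
labels = [1] * 5 + [2] * 5
rows = [
    [1.0, 1.2, 0.9, 1.1, 1.0, 2.0, 2.1, 1.9, 2.2, 2.0],
    [3.1, 3.0, 3.3, 2.9, 3.2, 4.0, 4.2, 3.9, 4.1, 4.3],
]

observed = group_score(rows, labels)
background = column_permutation_background(rows, labels)
fraction = sum(b > observed for b in background) / len(background)
print(observed, fraction)
```

Because the labels carry no information after shuffling, the background scores tend to be much lower than under the row permutation.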

Methods (summary)
- Data => p-values
- Scores: number of significant genes, sum of -log P, sup|P(n) - F(n)|
- Check against 1000 permutations of rows (GO-groups)
- Check against 1000 permutations of columns (samples => level of p-values)


Results: Data (Breast Cancer)
- Two major subclasses
  - Estrogen receptor positive (ER+)
  - Estrogen receptor negative (ER-)
- Estrogen receptor positive
  - Susceptible to Tamoxifen
  - Slightly better survival rate
- Great molecular differences between the two types

Results: Data (Breast Cancer)
- Data: 25 ER+, 24 ER-
- Array: Affymetrix HuGeneFL
  - ~7000 genes
  - ~4000 annotated to GO-terms
- Data were normalized by variance stabilization (Heydebreck et al. 2001)

Results: Pre-conditions
- A GO-group is considered significant if fewer than 5% of the random permutations exceed its score
- Only GO-groups with more than 5 and fewer than 1000 genes were taken into account
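The two pre-conditions translate into a short decision rule. A minimal sketch with a hypothetical function name and made-up background scores:

```python
def is_significant(observed, background, group_size,
                   level=0.05, min_size=5, max_size=1000):
    """Apply the size filter, then the 5% rule on the permutation background."""
    if not (min_size < group_size < max_size):
        return False
    exceeding = sum(1 for score in background if score > observed)
    return exceeding / len(background) < level

# Hypothetical background: 10 of 1000 permutation scores exceed the observed score.
background = [1.0] * 990 + [11.0] * 10

print(is_significant(10.0, background, group_size=37))
print(is_significant(10.0, background, group_size=5))
```

The first call passes both conditions (1% of the permutations exceed the score), the second fails the size filter because the group does not have more than 5 genes.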

Results: Number of significant genes According to the pre-conditions 16 GO-groups were found

Results: Permutation of rows (distribution hypothesis)
[Figure: two panels, "Sum of -log P" and "Kolmogorov-Smirnov".]

Results: Permutation of columns (differential gene-expression hypothesis)
[Figure: two panels, "Sum of -log P" and "Kolmogorov-Smirnov".]

Results
- The column permutation leads to a very low background distribution
  - Many "significant" GO-groups
  - May help to find functional groups without differential gene expression
- Different scoring methods seem to be complementary, as indicated by the results of the row permutation

Results: Permutation of the rows
- Sum of log: 44 GO-groups were found (5% cond.,...)
- KS-score: 77 GO-groups were found (5% cond.,...)
- GO: M-Phase of mitotic cell-cycle (37 genes)

Results: Comparing the scoring methods (from the row permutation)
- A: counting of significant genes in GO-groups
- B: Kolmogorov-Smirnov
- C: sum of logarithms
[Figure: Venn diagram of A, B and C.]
- A: 16, B: 77, C: 43
- A and B: 3
- A and C: 13
- C and B: 13
- A, B and C: 3
- C without A: 30
- B without A: 74

Browsing the results

Results: Interesting GO-term (M-Phase)
- Contains a couple of interesting proliferative genes (p-value ~5*10^-4 => "not significant")
- E.g.: polo-like kinase
  - t-value: -3.45; p-value: 5.59*10^-4
  - would not have been found by a single-gene approach
  - correlation with the estrogen receptor could be found in the literature (Wolf et al., 2000)
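A quick check shows why this gene counts as "not significant" on its own. The sketch assumes that n in the Bonferroni threshold α = 0.05/n is the approximate number of genes on the array (~7000); the talk does not state which n was used.

```python
alpha = 0.05
n_genes = 7000            # assumption: n = number of genes on the array (~7000)
threshold = alpha / n_genes

p_polo_like_kinase = 5.59e-4   # p-value reported on the slide

print(threshold)
print(p_polo_like_kinase < threshold)
```

The reported p-value is well above this threshold (about 7*10^-6), so a single-gene test with Bonferroni correction would miss the gene, while the scores of its GO-group can still stand out.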

Summary/outlook
- GO provides a general view on large-scale gene-expression data
- Less deregulated but very interesting genes could be found
- Third null hypothesis => differential gene expression over a wide range of genes (outlook: which GO-groups contain no differential gene expression)
- No bias of scores by top-level genes (outlook: leaving out top-level genes for scoring)
- Possible modification of scoring methods: up- and downregulation