From genes to functional blocks in the study of biological systems Fátima Al-Shahrour, Joaquín Dopazo National Institute of Bioinformatics, Functional.


From genes to functional blocks in the study of biological systems
Fátima Al-Shahrour, Joaquín Dopazo
Department of Bioinformatics, Centro de Investigación Príncipe Felipe, and Functional Genomics node, National Institute of Bioinformatics (INB), Valencia, Spain

Two-step functional interpretation
1. Genes are selected based on their experimental values (e.g. a t-test between conditions A and B).
2. Enrichment in functional terms is tested (FatiGO, GoMiner, etc.).
Figure labels: statistic (- to +), conditions A and B, functional terms (Metabolism, Transport, ..., Reproduction) tested for each selected set of genes.

The two-step approach reproduces pre-genomics paradigms
Figure labels: experiments, test (pass / no pass), interpretation.
Context and cooperation between genes are ignored.

Cooperative activity of genes can be detected and related to a macroscopic observation
Figure labels: statistic (- to +), conditions A and B, rows GO 1, GO 2 and GO 3.
Ranking: a list of genes is ranked by their differential expression between two experimental conditions, A and B (using fold change, a t-test, etc.).
Distribution of GO: rows GO 1, GO 2 and GO 3 represent the positions of the genes belonging to three different GO terms across the ranking. The first GO term is completely uncorrelated with the arrangement, while GO 2 and GO 3 are clearly associated with high expression in experimental conditions B and A, respectively. Note that genes can be multi-functional.

A previous step of gene selection causes loss of information and makes the test insensitive
Figure labels: statistic (- to +), conditions A and B, rows GO 1 and GO 2 (classes expressed as blocks in A and B); two-tailed t-test, p<0.05, marking the genes significantly over-expressed in B and those significantly over-expressed in A.
If a threshold based on the experimental values is applied, and the resulting selection of genes is then tested for over-abundance of a functional term, the over-abundance might not be found: too few genes are selected to reach a significant conclusion on GO 1 and GO 2.

A previous step of gene selection causes loss of information and makes the test insensitive
Figure labels: statistic (- to +), conditions A and B, rows GO 1 and GO 2; two-tailed t-test, p<0.05, marking the genes significantly over-expressed in B and those significantly over-expressed in A; 2x2 table of up/down against GO/no GO.
The main problem is that the two-step approach cannot distinguish between these two different cases. We put both sides of the partition into two bags and destroy the structure of the data: GO 1 and GO 2 produce the same contingency table.
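The loss of structure described on this slide can be illustrated with a small sketch. The gene names, the two term memberships and the cut-offs (`n_up`, `n_down`) below are hypothetical, chosen only so that one term forms a block near the top of the list and the other is scattered; they are not taken from the presentation.

```python
def two_step_table(ranked_genes, term_genes, n_up, n_down):
    """2x2 table (up/down x GO/no GO) after thresholding a ranked list.

    n_up and n_down stand in for the numbers of genes passing the
    two-tailed test at each end of the list (hypothetical cut-offs).
    """
    up = ranked_genes[:n_up]
    down = ranked_genes[-n_down:]
    up_go = sum(g in term_genes for g in up)
    down_go = sum(g in term_genes for g in down)
    return ((up_go, len(up) - up_go), (down_go, len(down) - down_go))


def mean_rank(ranked_genes, term_genes):
    """Average 1-based position of a term's genes in the ranked list."""
    ranks = [i + 1 for i, g in enumerate(ranked_genes) if g in term_genes]
    return sum(ranks) / len(ranks)


# Hypothetical ranked list of 20 genes, most over-expressed first.
ranked = [f"g{i:02d}" for i in range(1, 21)]

# One term clustered near the top, one spread along the list.
term_block = {"g01", "g05", "g06", "g07", "g08"}
term_scattered = {"g04", "g09", "g11", "g13", "g15"}

table_block = two_step_table(ranked, term_block, n_up=4, n_down=4)
table_scattered = two_step_table(ranked, term_scattered, n_up=4, n_down=4)

print(table_block, table_scattered)
print(mean_rank(ranked, term_block), mean_rank(ranked, term_scattered))
```

Both terms yield the same table after selection, although their positions along the ranking differ clearly, which is the information the threshold discards.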

FatiScan, a segmentation test, provides an easy way to test functional terms directly
Figure labels: statistic (- to +), conditions A and B, partitions p1, p2, p3, rows GO 1, GO 2 and GO 3; e.g., for term 2 at partition p1, a 2x2 table of up/down against GO/no GO.
GO terms can be tested directly by a segmentation test. A series of partitions of the list is performed (p1, p2, p3, ...) and, for each partition, the genes annotated with each GO term in the upper part are compared with those in the lower part by a Fisher test. Asymmetrical distributions of terms towards the extremes of the list produce significant values of the test. Finally, p-values are adjusted by FDR (Al-Shahrour et al., 2005, Bioinformatics).
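A minimal sketch of a segmentation test of this kind, not the FatiScan implementation itself. It assumes evenly spaced partitions and a two-sided Fisher exact test, neither of which is specified on the slide; the function names and the toy gene list are hypothetical, and the FDR adjustment is left out.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x term genes in the upper part.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as observed.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def segmentation_test(ranked_genes, term_genes, n_partitions):
    """Return (cut position, p-value) for evenly spaced partitions."""
    n = len(ranked_genes)
    results = []
    for i in range(1, n_partitions + 1):
        cut = round(i * n / (n_partitions + 1))
        upper, lower = ranked_genes[:cut], ranked_genes[cut:]
        a = sum(g in term_genes for g in upper)
        c = sum(g in term_genes for g in lower)
        p = fisher_two_sided(a, len(upper) - a, c, len(lower) - c)
        results.append((cut, p))
    return results


# Hypothetical ranked list of 40 genes and two terms of 8 genes each.
ranked = [f"g{i:02d}" for i in range(40)]
term_top = set(ranked[:8])       # concentrated at one extreme of the list
term_spread = set(ranked[::5])   # spread evenly along the list

block_results = segmentation_test(ranked, term_top, n_partitions=3)
spread_results = segmentation_test(ranked, term_spread, n_partitions=3)
print(block_results)
print(spread_results)
```

The term concentrated at one extreme gives a very small p-value at the first partition, while the evenly spread term does not at any partition.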

Obtaining significant results
Figure labels: statistic (- to +), conditions A and B, partitions p1, p2, p3; legend: term, background.
For each GO term (T), different partitions (P) are tested, so T x P p-values have to be adjusted for multiple testing. Empirical results suggest that 20 to 50 partitions are optimal for finding significant asymmetrical distributions of terms (Al-Shahrour et al., 2005, Bioinformatics).
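The adjustment of the T x P p-values can be sketched as follows. The slide only says that p-values are adjusted for multiple testing, so the Benjamini-Hochberg procedure used here is an assumption, and the term names and p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for T = 2 terms and P = 2 partitions.
raw = {
    ("term_1", "p1"): 0.01,
    ("term_1", "p2"): 0.04,
    ("term_2", "p1"): 0.03,
    ("term_2", "p2"): 0.005,
}
keys = list(raw)
adjusted = dict(zip(keys, benjamini_hochberg([raw[k] for k in keys])))
for key in keys:
    print(key, raw[key], adjusted[key])
```

All T x P tests are pooled into a single family before adjustment, so adding partitions increases the correction each term has to survive.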

Nested inclusive analysis
GO levels from 9 up to 3 are tested. The deepest significant level is reported.

Figure: percentage of genes with the specific GO annotation in the upper (U) and lower (L) part of each partition, along the list from - to +.

Case study: functional differences in a class comparison experiment
Class B: 17 with normal tolerance to glucose (NTG). Class A: 8 with impaired tolerance (IGT) plus 18 with type 2 diabetes mellitus (DM2).
No single gene shows significant differential expression when a t-test is applied. Nevertheless, many pathways and functional blocks are significantly activated or deactivated (Mootha et al., 2003).

Beyond discrete variables: survival data
Microarrays: 34 samples from tumours of hypopharyngeal cancer (GEO GDS1070).
A Cox proportional-hazards model is used to study how the expression of each gene across patients is related to their survival (GEPAS T-Rex tool). The result is a list of genes ordered by risk:
Gene   Risk
Gen1   5.8
Gen2   5.6
Gen3   5.4
Gen4   5.2
Gen5   5.2
Gen6   5.0
...    ...
Since FatiScan depends only on a list of ordered genes, and not on the original experimental values, it can be applied to different experimental designs.

Functional analysis of a time series in P. falciparum
- Genes at each time point are ranked from highest (red) to lowest (green) expression relative to time 1.
- For each list of ranked genes generated at a time point, the GO terms significantly over-represented in the tail corresponding to the highest expression values are recorded.
- The partitions at which a given term is found to be significantly over-represented in the upper tail of the list with respect to the lower part are used for the graphical representation.

Beyond arrays: evolutionary systems biology
20,469 known human protein-coding genes from Ensembl v.30.35h were used.
The relative rates of synonymous (Ks) and non-synonymous (Ka) substitutions are compared. The ratio of these values, ω (= Ka/Ks), is a widely accepted measure of selective pressure.
Mutations occur in single genes, but natural selection acts on phenotypes by operating on whole sub-cellular systems (represented by GO). We are interested in the human lineage.
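As a small illustration of how such a ranked list can be built, the sketch below computes the ratio Ka/Ks per gene and orders the genes by it. The gene names and rate values are invented for the example, and skipping genes with Ks = 0 is an assumption made only to avoid division by zero.

```python
def rank_by_omega(rates):
    """Rank genes by Ka/Ks, highest first.

    rates maps gene -> (ka, ks). Genes with ks == 0 are skipped
    (an assumption of this sketch, to avoid division by zero).
    """
    omega = {gene: ka / ks for gene, (ka, ks) in rates.items() if ks > 0}
    ranking = sorted(omega, key=omega.get, reverse=True)
    return ranking, omega


# Hypothetical (Ka, Ks) values, for illustration only.
rates = {
    "geneA": (0.02, 0.10),
    "geneB": (0.15, 0.10),
    "geneC": (0.05, 0.10),
    "geneD": (0.01, 0.0),
}
ranking, omega = rank_by_omega(rates)
print(ranking)
```

A ranked list of this form is all that a rank-based functional test needs as input; no expression values are involved.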

GO terms positively selected in humans (Fig 6)
GO term, p-value:
sensory perception of smell (GO: ), 1.3 x
sensory perception of chemical stimulus (GO: )
G-protein coupled receptor protein signalling pathway (GO: )
FatiScan is applied to the list of human genes ordered according to their ω values (figure axis: log ω). If positively selected genes are first detected and then analysed for significant enrichment of GO terms (two-step approach), no results are found.

Comparison of methods for testing GO (or other terms) directly, at a glance
Terms from distinct repositories reported by different methods in the diabetes dataset (Mootha et al., 2003). Methods compared: GSEA (2003), FatiScan (2005), PAGE (2005), Tian (2005).

The Babelomics suite for functional annotation of experiments
Biological information from: GO, InterPro motifs, KEGG pathways, Swiss-Prot keywords, tissues, text-mining, chromosomal location.
For human, mouse, rat, chicken, fly, worm, yeast, A. thaliana and bacteria.
Tests for lists of genes or blocks of functionally related genes.

GEPAS and Babelomics (figure labels)
Raw data: Affymetrix arrays, two-colour arrays, arrays-CGH.
Normalization: DNMAD, Expresso, Preprocessor.
Clustering: Hierarchical, SOM, SOTA, K-means; CAAT.
Differential expression: T-Rex (two classes, multi classes, correlation, survival).
Class prediction: Prophet (KNN, DLDA, SVM, Random forest).
Arrays-CGH: ISACGH (RIDGE analysis).
Functional annotation (Babelomics): FatiGO+, FatiGO, Marmite, TMT (two sets of genes); FatiScan, GSEA (blocks of genes).
References: Herrero et al., 2003, 2004; Vaquerizas et al., 2005 NAR; Montaner et al., 2006 NAR; Al-Shahrour et al., 2005, 2006 NAR; 2005 Bioinformatics.

Some numbers
More than 150,000 experiments analysed during the last year. More than 500 experiments per day.
Figure: 24h usage map as of June 8, 2006.

Summary
Methods that directly address functional hypotheses are much more sensitive for the functional interpretation of any type of large-scale experiment.
Methods that do not require the original data (such as FatiScan) can be applied to a wider range of experimental designs in microarrays (class comparison, survival, etc.), and to any large-scale experiment or theoretical study in which a value can be assigned to each gene so that a ranked list of genes can be generated.
Despite the differences in the tests, distinct functional interpretation methods seem to produce comparable results (although more detailed benchmarking is necessary).

The Bioinformatics Department at Centro de Investigación Príncipe Felipe (Valencia, Spain)...
Joaquín Dopazo, Eva Alloza, Leonardo Arbiza, Fátima Al-Shahrour, Jordi Burguet, Lucía Conde, Hernán Dopazo, Toni Gabaldon, Jaime Huerta, Marc Martí, Ignacio Medina, Pablo Minguez, David Montaner, Joaquín Tárraga, Juan Manuel Vaquerizas
...and the INB, Instituto Nacional de Bioinformática (Functional Genomics Node)