Biological Interpretation of Microarray Data
Helen Lockstone
DTC Bioinformatics Course, 9th February 2010

Overview
- Interpreting microarray results
  - Gene lists to biological knowledge
- The Gene Ontology Consortium
  - Defined terms to describe gene function
- Functional analysis tools
  - Methods
  - DAVID/GSEA

Microarray Pipeline
1. Design and perform experiment
2. Process and normalise data
3. Statistical analysis
4. Differentially expressed genes
5. Biological interpretation

Biological Interpretation
- An obvious way to gain biological insight is to assess the differentially expressed genes in terms of their known function(s)
- This requires an automated and objective (statistical) approach
- Known as functional profiling or pathway analysis

Early functional analyses
- Manually annotate the list of differentially expressed (DE) genes
  - Extremely time-consuming, not systematic, user-dependent
- Group together genes with similar function
- Conclude that the functional categories with the most DE genes are important in the disease/condition under study
  - BUT this may not be the right conclusion

GO and functional analysis
- The immune response category contains 40% of all significant genes, by far the largest category
- Is it reasonable to conclude that immune response may be important in the condition being studied?

However …
- What if 40% of the genes on the array were involved in immune response?
- Then we have only detected as many significant immune response genes as expected by chance
- We need to consider not only the number of significant genes in each category, but also the total number of genes in that category on the array

Same example, relative to array
- Expected number of significant genes for category X = (number of significant genes ÷ total genes on array) × (number of genes in category X on array)
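
The expected-count formula above can be sketched in Python. All category sizes and counts here are hypothetical (a 10,000-gene array with 500 significant genes, and an immune response category covering 40% of the array), chosen only to mirror the scenario on these slides; the table from the original slide is not reproduced.

```python
def expected_significant(n_sig, n_array, n_category):
    """Expected number of significant genes in a category if significance
    were unrelated to category membership."""
    return (n_sig / n_array) * n_category


# Hypothetical array: 10,000 genes, 500 of them significant.
N_ARRAY, N_SIG = 10_000, 500

# Hypothetical categories: name -> (genes in category on array, significant genes observed)
categories = {
    "immune response": (4_000, 200),
    "transcription": (300, 60),
    "neurotransmission": (150, 40),
}

for name, (n_cat, observed) in categories.items():
    expected = expected_significant(N_SIG, N_ARRAY, n_cat)
    print(f"{name}: observed={observed}, expected={expected:.1f}, "
          f"ratio={observed / expected:.2f}")
```

With these made-up numbers the immune response category holds 40% of the significant genes, yet its observed/expected ratio is 1, while the two small categories exceed expectation several-fold.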

Same example, relative to array
- Now the transcription and neurotransmission categories appear more interesting, as many more significant genes were observed in them than expected by chance
- The largest categories are not necessarily the most interesting!

Major bioinformatic developments
- Requires annotating the entire set of genes
  - The Gene Ontology Consortium
- Automated, statistical approaches for annotating gene lists and performing functional profiling

The Gene Ontology Consortium

GO Consortium
- Developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner
- Has become a major resource for microarray data interpretation

The Gene Ontology
- Molecular Function: basic activity or task
  - e.g. catalytic activity, calcium ion binding
- Biological Process: broad objective or goal
  - e.g. signal transduction, immune response
- Cellular Component: location or complex
  - e.g. nucleus, mitochondrion

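GO Structure
- Hierarchical structure: strictly a directed acyclic graph rather than a tree, so a term can have more than one parent
- Genes are annotated with the most specific applicable term, which implies every term on the path(s) to the top of the hierarchy
- Genes are annotated with all relevant terms
- Annotations are based on published studies and also on electronic inferences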
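
A minimal sketch of how annotation to a specific term implies annotation to all of its ancestors. The term names and parent links below are a simplified, hypothetical fragment chosen for illustration, not an exact copy of the GO graph; real tools read the full ontology released by the GO Consortium.

```python
# Simplified, hypothetical fragment of a GO-like hierarchy: term -> parent terms.
# Not the exact GO graph; note a term may have several parents (DAG, not a tree).
PARENTS = {
    "synaptic transmission": ["cell-cell signaling", "nervous system process"],
    "cell-cell signaling": ["cell communication"],
    "nervous system process": ["biological_process"],
    "cell communication": ["biological_process"],
    "biological_process": [],
}


def ancestors(term, parents=PARENTS):
    """Return the term itself plus every term on any path up to the root."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def propagate(direct, parents=PARENTS):
    """Expand each gene's most specific terms to include all ancestor terms."""
    return {gene: set().union(*(ancestors(t, parents) for t in terms))
            for gene, terms in direct.items()}


# Hypothetical gene annotated only with its most specific term.
print(sorted(propagate({"GENE_A": {"synaptic transmission"}})["GENE_A"]))
```

Propagating annotations this way is what lets a gene annotated only to a specific term also count towards its broader parent categories in an enrichment test.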

GO Terms
- GO ID: GO:
- GO term: synaptic transmission
- Ontology: biological process
- Definition: The process of communication from a neuron to a target (neuron, muscle, or secretory cell) across a synapse

Graphical view

Functional Profiling Tools

Functional profiling tools
- Identify GO categories with significantly more DE genes than expected by chance (i.e. over-represented among DE genes relative to their representation on the array as a whole)
- Correct for testing multiple GO categories
- Statistical test: hypergeometric distribution or Fisher's exact test
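
The over-representation test can be sketched with the Python standard library alone: the one-sided hypergeometric upper tail (equivalent to a one-sided Fisher's exact test), followed by a Benjamini-Hochberg adjustment as one common way to correct for testing many categories. The choice of Benjamini-Hochberg is an assumption here, since individual tools apply a variety of corrections, and the counts in the usage lines are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_sig, n_cat, n_array):
    """P(X >= k): probability of seeing at least k category members among
    n_sig significant genes drawn without replacement from n_array genes,
    n_cat of which belong to the category. Equals the one-sided Fisher's
    exact test p-value for over-representation."""
    total = comb(n_array, n_sig)
    favourable = sum(
        comb(n_cat, i) * comb(n_array - n_cat, n_sig - i)
        for i in range(k, min(n_sig, n_cat) + 1)
    )
    return favourable / total  # exact integers, divided once at the end


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical array of 10,000 genes with 500 significant genes.
pvals = [
    hypergeom_upper_tail(200, 500, 4_000, 10_000),  # large category, 200 hits
    hypergeom_upper_tail(60, 500, 300, 10_000),     # small category, 60 hits
]
print(benjamini_hochberg(pvals))
```

Working with exact integer binomial coefficients and dividing once avoids accumulating rounding error across many small terms.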

Functional profiling tools
- Khatri and Draghici. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics (2005) 21(18):

- Freely available stand-alone/web-based tools
  - User-friendly graphical interface and simple to use
  - Extensive documentation, plus tutorials/technical support
- Reduce a large number of DE genes to a smaller number of significantly enriched GO categories
  - More easily interpreted in biological context
- Considering sets of genes increases power
  - Individual genes could be false positives, but a set of functionally related genes all showing significant changes is more robust

DAVID Results

Advantages
- Increasingly support data (probe IDs) from different microarray platforms
- Accept various probe/gene identifiers
- Web-based tools automatically retrieve the most up-to-date GO annotations
- Most automatically map probe IDs to gene IDs; otherwise, multiple significant probes for one gene could skew the results
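
To see why probe-to-gene collapsing matters, here is a hypothetical sketch: the probe IDs, gene IDs and mapping table are all invented for illustration, whereas real tools take the mapping from the platform's annotation.

```python
# Hypothetical probe-to-gene mapping (invented IDs, for illustration only).
PROBE_TO_GENE = {
    "probe_001": "GENE_A",
    "probe_002": "GENE_A",
    "probe_003": "GENE_A",
    "probe_004": "GENE_B",
    "probe_005": None,  # probe with no gene annotation
}


def probes_to_genes(probe_ids, mapping=PROBE_TO_GENE):
    """Map probe IDs to unique gene IDs, dropping unmapped probes and
    keeping the order in which each gene is first seen."""
    genes, seen = [], set()
    for probe in probe_ids:
        gene = mapping.get(probe)
        if gene is not None and gene not in seen:
            seen.add(gene)
            genes.append(gene)
    return genes


significant_probes = ["probe_001", "probe_002", "probe_003", "probe_004", "probe_005"]
print(probes_to_genes(significant_probes))
```

Without this step GENE_A would contribute three "significant genes" to every category it belongs to, inflating those categories' counts.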

Further considerations
- The reference list must be appropriate for accurate statistical analysis
- Up- and down-regulated genes can be submitted separately or as a combined list
- Limitations
  - Unannotated genes cannot be used in the analysis
  - The Gene Ontology is still evolving
  - Well-studied systems are over-represented
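
One way to keep the reference list consistent with the gene list is to build both from the same set of annotated genes that were measured on the array, and to split up- and down-regulated genes by the sign of the fold change. The helper below and its input format are hypothetical; it sketches the bookkeeping and is not any particular tool's behaviour.

```python
def build_inputs(fold_changes, significant, annotations):
    """Restrict the analysis to annotated genes that were actually measured.

    fold_changes: gene -> log fold change, for every gene on the array
    significant:  set of genes called differentially expressed
    annotations:  gene -> set of GO terms (missing or empty = unannotated)

    Returns (reference, up, down, combined) as sets of gene IDs.
    """
    reference = {g for g in fold_changes if annotations.get(g)}
    combined = set(significant) & reference  # unannotated genes drop out
    up = {g for g in combined if fold_changes[g] > 0}
    down = {g for g in combined if fold_changes[g] < 0}
    return reference, up, down, combined


# Hypothetical example: gene D is on the array but has no GO annotation.
fold_changes = {"A": 1.5, "B": -2.0, "C": 0.1, "D": 3.0}
annotations = {"A": {"GO term 1"}, "B": {"GO term 2"}, "C": {"GO term 1"}}
print(build_inputs(fold_changes, {"A", "B", "D"}, annotations))
```

Either the up and down sets or the combined set can then be tested against the same reference.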

Gene set enrichment analysis
- The majority of tools are based on identifying GO categories significantly enriched in a list of differentially expressed genes
  - Requires some threshold to define genes as ‘significant’
- A recent tool called GSEA takes a different approach by considering all assayed genes

GSEA: Key Features
- Ranks all genes on the array based on their differential expression
- Identifies gene sets whose member genes are clustered towards either the top or the bottom of the ranked list (i.e. up- or down-regulated)
- Enrichment score calculated for each category
- Permutation test to identify significantly enriched categories
- Extensive gene sets provided via MSigDB (Molecular Signatures Database): GO, chromosome location, KEGG pathways, transcription factor or microRNA target genes

GSEA
- Each gene category is tested by traversing the ranked list
- The enrichment score starts at 0, with a weighted increment when a member gene is encountered and a weighted decrement otherwise
- The enrichment score is the point where the running sum is most different from zero
[Figure: genes ranked from the most significantly up-regulated, through unchanged genes, to the most significantly down-regulated; samples grouped as Disease and Control]
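
The running-sum calculation can be sketched as follows, following the standard GSEA running-sum definition: each member gene (a "hit") adds an increment proportional to the absolute value of its ranking statistic raised to a power p (p = 1 is assumed here), and each non-member (a "miss") subtracts the same fixed amount, so the walk ends at zero. Gene names and scores in the usage lines are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from most up- to most down-regulated
    scores:       ranking statistic for each gene, in the same order
    gene_set:     member genes of the category being tested
    p:            weight exponent for hits (p = 1 assumed)

    Returns (es, path): es is the running-sum value furthest from zero,
    path is the running sum after each gene.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_hits == 0 or n_miss == 0 or hit_weight == 0:
        raise ValueError("gene set must contain some but not all ranked genes, "
                         "with a nonzero total score")
    miss_step = 1.0 / n_miss  # every non-member subtracts the same amount
    running, path = 0.0, []
    for s, h in zip(scores, hits):
        running += abs(s) ** p / hit_weight if h else -miss_step
        path.append(running)
    es = max(path, key=abs)
    return es, path


genes = ["A", "B", "C", "D", "E", "F"]       # hypothetical ranked list
scores = [3.0, 2.0, 0.5, -0.5, -2.0, -3.0]   # hypothetical ranking statistic
es, path = enrichment_score(genes, scores, {"A", "C"})
print(es)
```

A positive ES means the set's members cluster near the top (up-regulated) end of the list; a negative ES means they cluster near the bottom.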

GSEA algorithm

GSEA: Permutation Test
- Randomise the data (group labels), rank the genes again and repeat the test 1000 times
- This gives a null distribution of 1000 ES values for each gene set
- An FDR q-value is computed, corrected for gene set size and for testing multiple gene sets
[Figure: null distribution of enrichment scores, with the actual ES marked]
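
A compact, self-contained sketch of the phenotype permutation test for a single gene set. Two simplifications are assumptions made for brevity: genes are ranked by the difference in group means (GSEA's own default ranking metric is different), and only a nominal p-value is returned. The FDR q-value described on the slide, which normalises for gene set size and accounts for testing many sets, is not implemented. The expression data are hypothetical.

```python
import random


def enrichment_score(ranked_scores, hits):
    """Running-sum ES (weight exponent p = 1) for genes already sorted by score."""
    n_hits = sum(hits)
    hit_weight = sum(abs(s) for s, h in zip(ranked_scores, hits) if h)
    miss_step = 1.0 / (len(hits) - n_hits)
    running, es = 0.0, 0.0
    for s, h in zip(ranked_scores, hits):
        running += abs(s) / hit_weight if h else -miss_step
        if abs(running) > abs(es):
            es = running  # keep the value furthest from zero
    return es


def gene_set_es(expr, labels, gene_set):
    """Rank genes by mean(group 1) - mean(group 0), then score one gene set.

    expr:   gene -> expression values, one per sample (hypothetical format)
    labels: 0/1 group label per sample, e.g. Control = 0, Disease = 1
    """
    def mean_diff(values):
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        return sum(g1) / len(g1) - sum(g0) / len(g0)

    ranked = sorted(((mean_diff(v), g) for g, v in expr.items()), reverse=True)
    scores = [s for s, _ in ranked]
    hits = [g in gene_set for _, g in ranked]
    return enrichment_score(scores, hits)


def permutation_test(expr, labels, gene_set, n_perm=1000, seed=0):
    """Observed ES and a nominal p-value from shuffling the group labels."""
    rng = random.Random(seed)
    observed = gene_set_es(expr, labels, gene_set)
    null = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # randomise group membership, keep group sizes
        null.append(gene_set_es(expr, shuffled, gene_set))
    # Compare against the part of the null with the same sign as the observed ES.
    same_sign = [e for e in null if (e >= 0) == (observed >= 0)]
    as_extreme = sum(abs(e) >= abs(observed) for e in same_sign)
    return observed, as_extreme / max(len(same_sign), 1)


# Hypothetical data: 3 Control (0) and 3 Disease (1) samples.
expr = {
    "G1": [0, 0, 0, 5, 5, 5],
    "G2": [0, 0, 0, 4, 4, 4],
    "G3": [1, 1, 1, 1, 1, 1],
    "G4": [2, 1, 2, 1, 2, 1],
    "G5": [0, 1, 0, 1, 1, 0],
}
labels = [0, 0, 0, 1, 1, 1]
print(permutation_test(expr, labels, {"G1", "G2"}))
```

With only six samples there are just 20 distinct label assignments, so this null distribution is very coarse; the permutation test becomes informative only with more samples per group.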

Biological Interpretation
- Due to the GO hierarchy, several related categories may share the subset of genes that drives a significant enrichment score, so they will all be significant
- Interpretation still requires substantial work
  - Search the literature and public databases
  - Consider the likely functional consequences of the changes
  - Are the genes identified as significant within each GO category up- or down-regulated?
  - Genes within a category can have opposite effects, e.g. apoptosis would include genes that induce or repress apoptosis
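
Two small helpers, using hypothetical input formats, for the follow-up checks listed above: counting up- versus down-regulated significant genes per category, and flagging pairs of significant categories whose significant genes largely overlap (Jaccard index), which often indicates the same genes driving related GO terms. The 0.5 overlap threshold is an arbitrary illustrative choice.

```python
from itertools import combinations


def direction_summary(category_sig_genes, fold_changes):
    """Count up- and down-regulated significant genes in each category.

    category_sig_genes: category -> set of significant genes in it
    fold_changes:       gene -> log fold change (sign gives direction)
    """
    return {
        name: {
            "up": sum(1 for g in genes if fold_changes[g] > 0),
            "down": sum(1 for g in genes if fold_changes[g] < 0),
        }
        for name, genes in category_sig_genes.items()
    }


def redundant_pairs(category_sig_genes, min_jaccard=0.5):
    """Pairs of categories whose significant genes largely overlap."""
    pairs = []
    for (a, ga), (b, gb) in combinations(category_sig_genes.items(), 2):
        union = ga | gb
        if union:
            jaccard = len(ga & gb) / len(union)
            if jaccard >= min_jaccard:
                pairs.append((a, b, jaccard))
    return pairs


# Hypothetical significant genes per category and their fold changes.
cats = {
    "apoptosis": {"A", "B", "C"},
    "regulation of apoptosis": {"A", "B"},
    "transport": {"D"},
}
fc = {"A": 1.0, "B": -1.0, "C": 2.0, "D": -0.5}
print(direction_summary(cats, fc))
print(redundant_pairs(cats))
```

A mixed up/down count within one category is a cue to look at individual genes, since the category may contain both inducers and repressors of the same process.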

Biological Interpretation
- Too many categories found significant
  - Size filter
  - More stringent significance threshold
  - Related categories (redundancy)
- No significant categories
  - Relax the significance level slightly
  - e.g. as recommended by GSEA for exploratory analysis
- No significant genes
  - GSEA is most suitable
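
The first remedy, a size filter combined with a significance threshold, can be sketched as a simple post-processing step. The result format and the default limits (15 to 500 genes, q ≤ 0.05) are illustrative assumptions rather than any tool's required settings.

```python
def filter_results(results, min_size=15, max_size=500, q_cutoff=0.05):
    """Keep gene sets within a size range and at or below a q-value cutoff,
    sorted from most to least significant.

    results: list of dicts with keys 'name', 'size', 'q' (hypothetical format)
    """
    kept = [r for r in results
            if min_size <= r["size"] <= max_size and r["q"] <= q_cutoff]
    return sorted(kept, key=lambda r: r["q"])


# Hypothetical enrichment results.
results = [
    {"name": "set_a", "size": 10, "q": 0.01},   # too small
    {"name": "set_b", "size": 100, "q": 0.04},
    {"name": "set_c", "size": 200, "q": 0.20},  # not significant at 0.05
    {"name": "set_d", "size": 50, "q": 0.001},
]
print([r["name"] for r in filter_results(results)])
```

When nothing passes, raising q_cutoff is the corresponding relaxation; the GSEA documentation suggests an FDR cut-off of 25% for exploratory analysis.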

Commercial Tool Suites
- Ingenuity Pathway Analysis (Ingenuity Systems, CA)
  - Developed its own extensive ontology over the past 10 years
  - Includes gene interactions, disease/drug information
  - PhD-level curators mining the literature
  - Used by many pharmaceutical companies

For more information
- Gene Ontology:
- Affymetrix:
- DAVID:
- GSEA:
- Ingenuity:
- NCBI: