Gene set analyses of genomic datasets
Andreas Schlicker, Jelle ten Hoeve, Lodewyk Wessels

Scenario
You have a gene expression dataset containing data from normal colon and adenoma samples.
- Which pathways are differentially regulated between normal and CRC samples?
- Do the products of significantly differentially expressed genes have specific functions (Gene Ontology)?
- Is there a significant overlap with published expression signatures (mutations, response to treatment, ...)?

Overview
- Mapping probe sets to functional annotation
- Hypergeometric test (Fisher's exact test)
- Gene Set Enrichment Analysis
- Global test

Mapping probe sets to functional annotation

Examples of functional annotation
- Pathway databases (e.g. KEGG, Pathway Interaction Database, ConsensusPathDB)
- Functional categories (e.g. Gene Ontology, FunCat)
- Enzyme Commission numbers, disease associations, protein domains, …
- Published gene signatures

Example KEGG pathway

Gene Ontology
- Collection of three separate ontologies: biological process, molecular function, cellular component
- Organized in a graph structure, i.e. each term (concept, category) can have several parents

Gene Ontology (II)

Gene Ontology (III)
- Annotations with GO terms are assigned an evidence code, e.g.: G protein alpha subunit; GO: activation of phospholipase C …; ISS
- Different categories of evidence codes: experimental, computational, author/curator statement, fully automatic (IEA)
- Details at

The true path rule
- If a gene product is annotated with term A, all annotations with ancestors of A must also be valid.
- Gene product annotated with this term → it can also be annotated with the term's ancestors
- Different gene products are usually not annotated on the same level of the hierarchy
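A minimal sketch of the true path rule as annotation propagation. The term names (`A`, `B`, `C`, `root`), the gene names and the helper functions are hypothetical and chosen only for illustration; a real analysis would load the ontology graph and annotations from GO files.

```python
# Toy ontology fragment: term -> set of direct parents (hypothetical terms).
# Term "A" has two parents, as allowed by the graph structure of GO.
PARENTS = {
    "A": {"B", "C"},
    "B": {"root"},
    "C": {"root"},
    "root": set(),
}


def ancestors(term, parents):
    """Return all ancestors of `term` by following parent links upwards."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def propagate(direct_annotations, parents):
    """Apply the true path rule: add every ancestor of each annotated term."""
    full = {}
    for gene, terms in direct_annotations.items():
        closed = set(terms)
        for term in terms:
            closed |= ancestors(term, parents)
        full[gene] = closed
    return full


# Two gene products annotated at different levels of the hierarchy.
direct = {"geneA": {"A"}, "geneB": {"B"}}
full = propagate(direct, PARENTS)
for gene in sorted(full):
    print(gene, sorted(full[gene]))
```

Propagating annotations before an enrichment test makes gene products comparable even when they were annotated at different depths of the hierarchy.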

Hands on Time

The hypergeometric test / Fisher’s exact test

Basics
Enrichment test
Analysis steps:
1. Single gene test (e.g. t-test for finding differentially expressed genes)
2. Do the list from step 1 and the gene sets overlap significantly?

                   diff. expressed   not diff. expressed
in gene set
not in gene set
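The overlap question in step 2 comes down to counting genes in the four cells of the 2x2 table. The sketch below shows one way to do this with sets. The gene identifiers and the function name `contingency_table` are made up for illustration.

```python
def contingency_table(universe, diff_expressed, gene_set):
    """Count genes in the four cells of the 2x2 enrichment table.

    Rows: in gene set / not in gene set.
    Columns: diff. expressed / not diff. expressed.
    Genes outside the measured universe are ignored.
    """
    universe = set(universe)
    de = set(diff_expressed) & universe
    gs = set(gene_set) & universe
    in_set_de = len(gs & de)
    in_set_not_de = len(gs - de)
    not_in_set_de = len(de - gs)
    not_in_set_not_de = len(universe - de - gs)
    return [[in_set_de, in_set_not_de],
            [not_in_set_de, not_in_set_not_de]]


# Hypothetical toy data: 10 measured genes, 3 differentially expressed,
# and a gene set with 4 members.
universe = [f"g{i}" for i in range(1, 11)]
diff_expressed = ["g1", "g2", "g3"]
gene_set = ["g2", "g3", "g4", "g5"]

table = contingency_table(universe, diff_expressed, gene_set)
print(table)
```

Restricting both lists to the measured universe matters: genes that are not on the array cannot be differentially expressed and would otherwise distort the counts.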

Example Microarray: 20000, MAPK: 100, diff. expressed: 200  Fisher‘s exact test p = 0.26 diff. Expressed not diff. expressed total MAPK not MAPK total

Example Microarray: 20000, MAPK: 100, diff. expressed: 200  Fisher‘s exact test p = diff. Expressed not diff. expressed total MAPK not MAPK total

Another Example
Consider having data on treatment response and gene mutation for the samples in a dataset.
! Choose a threshold for resistance/sensitivity

          Resistant   Sensitive   total
Mutated
WT
total

Problem with this approach
- Null hypothesis: genes in the gene set are randomly drawn
  → a significant result means that genes in the gene set are more alike than random genes
- Problem: the gene set has been selected such that the genes have something in common
  → false positives

Hands on Time

PAGE: Parametric Analysis of Gene Set Enrichment

Basics
- For each gene set and each sample: how different is the mean expression of all genes in the gene set from the overall mean expression?
- Applied to the full expression matrix: no need to select interesting genes (based on e.g. a t-test)
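The comparison of a gene set mean with the overall mean can be sketched as a z-score per sample and gene set. The formula below, z = (set mean - overall mean) * sqrt(m) / overall standard deviation, follows the usual description of PAGE; the use of the population standard deviation, the toy values and the function name `page_z` are assumptions made for this illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def page_z(sample_values, gene_set):
    """PAGE-style z-score of one gene set in one sample.

    sample_values: dict mapping gene -> expression value (or fold change).
    gene_set: iterable of gene names; genes not measured are skipped.
    """
    all_values = list(sample_values.values())
    mu = mean(all_values)
    sigma = pstdev(all_values)
    members = [sample_values[g] for g in gene_set if g in sample_values]
    if not members or sigma == 0:
        raise ValueError("need at least one measured gene and non-constant data")
    return (mean(members) - mu) * sqrt(len(members)) / sigma


# Hypothetical sample: two genes of the set are elevated, six others are not.
sample = {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 0.0,
          "g5": 0.0, "g6": 0.0, "g7": 0.0, "g8": 0.0}
z = page_z(sample, ["g1", "g2"])
print(f"z = {z:.3f}")
```

Because the score uses all measured genes as background, no preselection of differentially expressed genes is required.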

Basics

Problem with this approach
What happens if one part of the pathway is up-regulated and another part is down-regulated?
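A small numeric illustration of this problem, using made-up values: when half of a pathway goes up and the other half goes down by the same amount, the pathway mean equals the overall mean and a mean-based score sees nothing.

```python
from statistics import mean

# Hypothetical sample: six unchanged background genes plus a pathway with
# two up-regulated and two down-regulated members.
background = {f"bg{i}": 0.0 for i in range(6)}
up = {"u1": 2.0, "u2": 2.0}
down = {"d1": -2.0, "d2": -2.0}
sample = {**background, **up, **down}

overall = mean(list(sample.values()))


def shift(genes):
    """Difference between the mean of `genes` and the overall mean."""
    return mean([sample[g] for g in genes]) - overall


shift_whole = shift(list(up) + list(down))  # opposite changes cancel out
shift_up = shift(list(up))                  # up-regulated part alone
shift_down = shift(list(down))              # down-regulated part alone

print(shift_whole, shift_up, shift_down)
```

Splitting a pathway into its up- and down-regulated parts, or using a statistic based on squared or absolute changes, are possible ways around the cancellation.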

Hands on Time

The global test

Basics
- Group test: can the genes in the gene set predict the response?
- What is needed?
  - Clinical variable, e.g. normal vs. CRC
  - Gene expression, e.g. GSE8671
  - Gene sets, e.g. KEGG pathways
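The idea of a group test can be sketched with a simplified statistic: for each gene in the set, take the squared covariance-like sum between expression and the centred clinical variable, then average over genes. Significance is assessed here by permuting the sample labels. This is an illustrative simplification under stated assumptions and not the implementation of any particular package: the scaling constant of the test statistic is omitted because it does not change a permutation p-value, and all data and function names are hypothetical.

```python
import random


def q_statistic(expr, y):
    """Average over genes of the squared association with the centred response.

    expr: list of genes, each a list of expression values (one per sample).
    y: list of 0/1 labels (e.g. normal vs. CRC), one per sample.
    """
    y_bar = sum(y) / len(y)
    resid = [label - y_bar for label in y]
    per_gene = [sum(x * r for x, r in zip(gene, resid)) ** 2 for gene in expr]
    return sum(per_gene) / len(expr)


def global_test_perm(expr, y, n_perm=999, seed=0):
    """Permutation p-value: how often do shuffled labels give a Q at least as large?"""
    rng = random.Random(seed)
    observed = q_statistic(expr, y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if q_statistic(expr, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: 10 normal (0) and 10 tumour (1) samples.
y = [0] * 10 + [1] * 10

# Gene set whose three genes are shifted upwards in the tumour samples.
associated = [[5.0 + 2.0 * y[j] + 0.1 * ((j * (i + 1)) % 3) for j in range(20)]
              for i in range(3)]
# Gene set whose single gene alternates regardless of the label.
unrelated = [[float(j % 2) for j in range(20)]]

q_assoc, p_assoc = global_test_perm(associated, y)
q_null, p_null = global_test_perm(unrelated, y)
print(f"associated set: Q = {q_assoc:.2f}, p = {p_assoc:.3f}")
print(f"unrelated set:  Q = {q_null:.2f}, p = {p_null:.3f}")
```

Because each gene contributes a squared term, up- and down-regulated genes in the same set both increase the statistic instead of cancelling out.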

Interpretation
Interpretation of significant test result (w.r.t. genes):
- Gene set is associated with the clinical variable
- "On average" the genes in the set are associated with the clinical variable
- Not every gene needs to be associated

Interpretation

Interpretation of significant test result (w.r.t. samples):
- Expression profile in the gene set differs for different values of the clinical variable
- Samples with a similar value of the clinical variable have relatively similar expression profiles

Interpretation