Gene set analyses of genomic datasets Andreas Schlicker Jelle ten Hoeve Lodewyk Wessels.


1 Gene set analyses of genomic datasets Andreas Schlicker Jelle ten Hoeve Lodewyk Wessels

2 Scenario
You have a gene expression dataset containing data from normal colon and adenoma samples.
- Which pathways are differentially regulated between normal and CRC samples?
- Do products of significantly differentially expressed genes have specific functions (Gene Ontology)?
- Is there a significant overlap with published expression signatures (mutations, response to treatment, ...)?

3 Overview
- Mapping probe sets to functional annotation
- Hypergeometric test (Fisher’s exact test)
- Gene Set Enrichment Analysis
- Global test

4 Mapping probe sets to functional annotation

5 Examples of functional annotation
- Pathway databases (e.g. KEGG, Pathway Interaction Database, ConsensusPathDB, www.pathguide.org/)
- Functional categories (e.g. Gene Ontology, FunCat)
- Enzyme Commission numbers, disease associations, protein domains, …
- Published gene signatures

6 Example KEGG pathway http://www.genome.jp/kegg/kegg2.html

7 Gene Ontology
- Collection of three separate ontologies: biological process, molecular function, cellular component
- Organized in a graph structure, i.e. each term (concept, category) can have several parents

8 Gene Ontology (II)

9 Gene Ontology (III)
- Annotations with GO terms are assigned an evidence code: G protein alpha subunit; GO:0060158 activation of phospholipase C …; ISS
- Different categories of evidence codes: experimental, computational, author/curator statement, fully automatic (IEA)
- Details at http://www.geneontology.org/GO.evidence.shtml

10 The true path rule
- If a gene product is annotated with term A, all annotations with ancestors of A must also be valid.
- A gene product annotated with a term can therefore also be annotated with that term's ancestors.
- Different gene products are usually not annotated on the same level of the hierarchy.
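The true path rule can be illustrated with a small Python sketch. The term hierarchy and gene names below are hypothetical placeholders, not real GO identifiers; the only assumption is that the ontology is available as a mapping from each term to its direct parents.

```python
# Hypothetical toy hierarchy: child term -> list of direct parent terms.
# A term may have several parents, so this is a graph (DAG), not a tree.
PARENTS = {
    "term_D": ["term_B", "term_C"],
    "term_B": ["term_A"],
    "term_C": ["term_A"],
    "term_A": [],  # root term
}


def ancestors(term, parents):
    """Return all ancestors of `term`, excluding the term itself."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(annotations, parents):
    """Apply the true path rule: add every ancestor of each annotated term."""
    full = {}
    for gene, terms in annotations.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        full[gene] = expanded
    return full


# Direct annotations sit on different levels of the hierarchy.
direct = {"gene1": {"term_D"}, "gene2": {"term_B"}}
propagated = propagate(direct, PARENTS)
print(propagated)
```

Propagating annotations this way before an enrichment analysis makes gene products comparable even when their direct annotations sit at different depths of the ontology.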

11 Hands on Time

12 The hypergeometric test / Fisher’s exact test

13 Basics
Enrichment test. Analysis steps:
1. Single gene test (e.g. t-test for finding differentially expressed genes)
2. Do the list from step 1 and the gene sets overlap significantly?

                   diff. expressed   not diff. expressed
in gene set
not in gene set
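Filling the four cells of this table is a matter of set arithmetic. The sketch below assumes that the measured genes, the gene set, and the list of differentially expressed genes are available as collections of identifiers; all gene names are made up for illustration.

```python
def contingency_table(universe, gene_set, diff_expressed):
    """Return the 2x2 cell counts (a, b, c, d).

    a: in gene set and differentially expressed
    b: in gene set, not differentially expressed
    c: not in gene set, differentially expressed
    d: neither
    Genes that were not measured (not in `universe`) are ignored.
    """
    universe = set(universe)
    in_set = set(gene_set) & universe
    de = set(diff_expressed) & universe
    a = len(in_set & de)
    b = len(in_set - de)
    c = len(de - in_set)
    d = len(universe) - a - b - c
    return a, b, c, d


# Hypothetical example: ten measured genes g1..g10.
universe = [f"g{i}" for i in range(1, 11)]
gene_set = ["g1", "g2", "g3", "g99"]   # g99 is not on the array
diff_expressed = ["g2", "g3", "g7"]

table = contingency_table(universe, gene_set, diff_expressed)
print(table)
```

Restricting both lists to the measured genes matters in practice: genes in a pathway that have no probe set on the array should not count towards the gene set size.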

14 Example
Microarray: 20000, MAPK: 100, diff. expressed: 200 → Fisher’s exact test p = 0.26

            diff. expressed   not diff. expressed   total
MAPK                      2                    98     100
not MAPK                198                 19702   19900
total                   200                 19800   20000

15 Example
Microarray: 20000, MAPK: 100, diff. expressed: 200 → Fisher’s exact test p = 0.0005

            diff. expressed   not diff. expressed   total
MAPK                      6                    94     100
not MAPK                194                 19706   19900
total                   200                 19800   20000
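The two examples can be checked with a short Python sketch of the hypergeometric upper tail, which corresponds to a one-sided Fisher's exact test for enrichment. That the slides use the one-sided version is an assumption; the function and variable names are chosen here for illustration.

```python
from math import comb


def enrichment_p_value(n_total, n_set, n_de, n_overlap):
    """One-sided p-value P(overlap >= n_overlap) under the hypergeometric null.

    n_total:   genes on the array
    n_set:     genes in the gene set
    n_de:      differentially expressed genes
    n_overlap: differentially expressed genes that are in the gene set
    """
    upper = min(n_set, n_de)
    # Exact integer arithmetic; the division happens only once at the end.
    tail = sum(
        comb(n_set, k) * comb(n_total - n_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_de)


# Counts from the two MAPK examples: 2 and 6 overlapping genes.
p_two = enrichment_p_value(20000, 100, 200, 2)
p_six = enrichment_p_value(20000, 100, 200, 6)
print(f"overlap 2: p = {p_two:.2g}")
print(f"overlap 6: p = {p_six:.2g}")
```

With 100 pathway genes and 1% of all genes differentially expressed, one overlapping gene is expected by chance, which is why an overlap of 2 is unremarkable while an overlap of 6 is not.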

16 Another Example
Consider having data on treatment response and gene mutation for samples in a dataset.
! Choose threshold for resistance/sensitivity

          Resistant   Sensitive   total
Mutated
WT
total

17 Problem with this approach
- Null hypothesis: Genes in the gene set are randomly drawn → Significant result means that genes in the gene set are more alike than random genes
- Problem: Gene set has been selected such that the genes have something in common → False positives

18 Hands on Time

19 PAGE: Parametric Analysis of Gene Set Enrichment

20 Basics
For each gene set and each sample:
– How different is the mean expression of all genes in a gene set from the overall mean expression?
Applied to full expression matrix:
– No need for selecting interesting genes (based on e.g. t-test)
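As a rough illustration, the comparison of gene set mean and overall mean can be written as a z-score per sample. The slide does not give a formula; the sketch below assumes the usual PAGE form Z = (mean of set - overall mean) * sqrt(m) / overall standard deviation, where m is the gene set size. The expression values are invented.

```python
from math import sqrt
from statistics import fmean, pstdev


def page_z_score(sample_values, gene_set):
    """Z-score of one gene set in one sample.

    sample_values: mapping gene -> expression value (e.g. log fold change)
    gene_set:      iterable of gene identifiers
    Assumed form: Z = (set mean - overall mean) * sqrt(m) / overall sd.
    """
    all_values = list(sample_values.values())
    mu = fmean(all_values)
    sigma = pstdev(all_values)
    members = [sample_values[g] for g in gene_set if g in sample_values]
    m = len(members)
    return (fmean(members) - mu) * sqrt(m) / sigma


# Hypothetical log fold changes for a single sample.
sample = {
    "g1": 2.0, "g2": 1.5, "g3": 2.5, "g4": 0.0,
    "g5": -0.5, "g6": 0.5, "g7": 0.0, "g8": 0.0,
}

z_up = page_z_score(sample, ["g1", "g2", "g3"])   # consistently high genes
z_all = page_z_score(sample, sample.keys())       # whole array: no shift
print(f"z (up-regulated set) = {z_up:.2f}")
print(f"z (all genes)        = {z_all:.2f}")
```

Because the score uses every gene's value, no threshold for "interesting" genes is needed; under normality assumptions the z-score can be converted to a p-value.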

21 Basics

22 Problem with this approach
What happens if one part of the pathway is up-regulated and another part is down-regulated?

23 Hands on Time

24 The global test

25 Basics
Group test: Can the genes in the gene set predict the response?
What is needed?
– Clinical variable, e.g. normal vs. CRC
– Gene expression, e.g. GSE8671
– Gene sets, e.g. KEGG pathways
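The idea of a group test can be sketched with a permutation approach. This is not the global test itself (for real analyses a dedicated implementation such as the Bioconductor package globaltest is the appropriate choice); it is a simplified stand-in that asks the same question. The statistic, the data, and all names below are assumptions made for illustration.

```python
import random


def association_statistic(expression, labels):
    """Average over genes of the squared score sum_i x_gi * (y_i - mean(y)).

    expression: list of genes, each a list of values (one per sample)
    labels:     clinical variable per sample, coded 0/1
    Squaring makes up- and down-regulated genes both contribute.
    """
    mean_label = sum(labels) / len(labels)
    centered = [y - mean_label for y in labels]
    total = 0.0
    for gene_values in expression:
        score = sum(x * r for x, r in zip(gene_values, centered))
        total += score * score
    return total / len(expression)


def permutation_p_value(expression, labels, n_perm=2000, seed=0):
    """Share of label permutations with a statistic at least as large."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = association_statistic(expression, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if association_statistic(expression, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical gene set: one gene up, one down, one unrelated to the label.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # e.g. 0 = normal, 1 = tumour
gene_set_expression = [
    [0.1, -0.2, 0.0, 0.3, -0.1, 2.1, 1.8, 2.2, 1.9, 2.0],
    [5.0, 5.2, 4.9, 5.1, 5.0, 3.1, 2.9, 3.0, 3.2, 2.8],
    [1.0, 1.2, 0.9, 1.1, 1.0, 1.1, 0.9, 1.0, 1.2, 1.0],
]

p_value = permutation_p_value(gene_set_expression, labels)
print(f"permutation p-value: {p_value:.4f}")
```

Because each gene's score is squared before averaging, genes regulated in opposite directions do not cancel each other, and a set can be significant even when some of its genes carry no signal.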

26 Interpretation
Interpretation of significant test result (w.r.t. genes):
– Gene set is associated with clinical variable
– "On average" the genes in the set are associated with the clinical variable
– Not every gene needs to be associated

27 Interpretation

28 Interpretation
Interpretation of significant test result (w.r.t. samples):
– Expression profile in the gene set differs for different values of the clinical variable
– Samples with similar value (clinical variable) have relatively similar expression profiles

29 Interpretation

