www.bioinformatics.ca: NCRI Cancer Conference, November 1, 2015

Module 2: Pathway and network analysis
Irina Kalatskaya, PhD, Ontario Institute for Cancer Research, Canada

NCRI Workshop 2015 bioinformatics.ca

Content
- Introduction to pathway and network analysis in cancer genomics.
- Sources of pathway and network information: GO biological process, network databases, pathway databases.
- Overview of enrichment analysis to find over-represented pathways.
- Pathway analysis of large-scale cancer genomics data sets.

Why Pathway Analysis?
- Dramatic data size reduction: thousands of genes => dozens of pathways.
- Increased statistical power, because fewer hypotheses are tested.
- Genes seldom operate on their own.
- Find meaning in the “long tail” of rare cancer mutations.
- Generate biologically meaningful hypotheses and help identify the underlying mechanism.

What do we need for pathway analysis?
- A list of altered genes, proteins, RNAs, etc.
- A source of pathways or networks (publicly or commercially available).
- A biological question/hypothesis!

1. Biological Question/Hypothesis
What do you want to accomplish with your list? (Hopefully this is part of the experiment design!)
- Summarize biological processes or other aspects of gene function.
- Perform differential analysis: which pathways differ between samples, e.g. naïve vs. treated cell lines?
- Find a controller for a process (TF, miRNA).
- Find new pathways or new pathway members.
- Discover new gene functions.
- Correlate with a disease or with clinical data attributes.

2. Where Do Gene Lists Come From?
- High-throughput studies: gene expression profiling, DNA/RNA sequencing, genome-wide association studies (GWAS), ChIP-Seq studies, etc.
- Public data portals such as ICGC, TCGA, COSMIC, etc., based on the user’s search queries.
- Manual and/or automated (PubTator) literature review (gene lists describing a disease or condition).
Other examples?

Content
- Introduction to pathway and network analysis in cancer genomics.
- Sources of pathway and network information: a) GO biological process, b) pathway databases and c) network databases.
- Overview of enrichment analysis to find over-represented pathways.
- Pathway analysis of large-scale cancer genomics data sets.

What is the Gene Ontology (GO)?
- A dictionary of term definitions: a set of biological phrases (terms), such as protein kinase, apoptosis or membrane, that are applied to genes.
- GO is not static: terms and annotations are continually revised!
- All major eukaryotic model organism species are covered.
- The GO ontology files are freely available from the GO website.
What is an ‘ontology’? A data model that represents knowledge as a set of concepts within a domain and the relationships between these concepts.

What Does GO Cover?
GO terms are divided into three aspects:
- cellular component
- molecular function (e.g. glucose-6-phosphate isomerase activity)
- biological process (e.g. cell division)

GO Structure
- Terms are related within a hierarchy through is-a and part-of relationships.
- The hierarchy describes multiple levels of detail of gene function.
- Terms can have more than one parent or child, so the hierarchy is a directed acyclic graph rather than a strict tree.
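To make the multiple-parent structure concrete, here is a minimal Python sketch on a hand-made toy fragment; the term names and edges are illustrative and may not match the current GO release. Collecting every ancestor of a term means following all of its parent edges, is-a and part-of alike, and a term reachable along two paths (here "cellular process") is counted only once.

```python
from collections import deque

# Toy GO-like fragment: child term -> list of (relation, parent term).
# Illustrative only; real GO terms carry GO:nnnnnnn identifiers.
TOY_PARENTS = {
    "mitotic cell cycle": [("is_a", "cell cycle process")],
    "cell cycle process": [("is_a", "cellular process"), ("part_of", "cell cycle")],
    "cell cycle": [("is_a", "cellular process")],
    "cellular process": [("is_a", "biological_process")],
    "biological_process": [],
}


def ancestors(term, parents=TOY_PARENTS):
    """Return every term reachable from `term` through is_a or part_of edges."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:  # a term with several paths is visited once
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("mitotic cell cycle")))
```

Many enrichment tools propagate a gene's annotation up the hierarchy in a similar way, so a gene annotated to a specific term also counts toward its more general ancestors.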

Pathway Databases
Advantages:
- Usually curated.
- Biochemical view of biological processes.
- Cause and effect captured.
- Human-interpretable visualizations.
Disadvantages:
- Sparse coverage of the genome.
- Different databases disagree on the boundaries of pathways.

PATHWAY DATABASE EXAMPLE: Reactome
- Hand-curated pathways in human.
- Rigorous curation standards: every reaction is traceable to the primary literature.
- As of October 2015 (version 54), there are 1887 human pathways and 8609 human proteins.
- Open access.

G1/S DNA damage checkpoint

Pathways vs. Networks
Pathways:
- Detailed, high-confidence consensus
- Biochemical reactions
- Small-scale, fewer genes
- Distilled from decades of literature
Networks:
- Simplified cellular logic, noisy
- Abstractions: directed, undirected
- Large-scale, genome-wide
- Constructed from omics data integration

Network Databases
- Can be built automatically or via curation.
- More extensive coverage of biological systems.
- Relationships and underlying evidence are more tentative.
Popular sources of curated networks:
- BioGRID: curated interactions from the literature; 529,000 genes, 167,000 interactions.
- IntAct: curated interactions from the literature; 60,000 genes, 203,000 interactions.
- MINT: curated interactions from the literature; 31,000 genes, 83,000 interactions.
- Reactome FI network: curated + machine learning; ~11,000 human genes, 180,000 interactions.

Figure: Reactome Functional Interaction (FI) network (~5% of the network is shown).

Takeaway message: There are GO-, pathway- and network-based ways to analyze your gene list. Do all three!


Enrichment Test (ICGC portal)

Enrichment Test (introduction)
Inputs:
- Gene list from microarray, RNA-seq, CNV or WES experiments.
- Gene-set (pathway) databases, e.g. Reactome, KEGG.
- Background list (all genes tested).
Each gene set goes through the enrichment test, producing an enrichment table of pathways (e.g. cell cycle, apoptosis) with their p-values.
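A minimal Python sketch of this diagram, assuming a one-sided hypergeometric test per gene set and tiny made-up gene sets named after the example rows. `enrichment_table` and `hypergeom_sf` are hypothetical helper names, not the API of g:Profiler, Reactome or the ICGC portal. Every set is first restricted to the background list, so the counts for each test are taken from the same universe of genes.

```python
from math import comb


def hypergeom_sf(k, N, m, n):
    """P(X >= k) for X ~ Hypergeometric(N background genes, m in set, n drawn)."""
    upper = min(n, m)
    return sum(comb(m, i) * comb(N - m, n - i) for i in range(k, upper + 1)) / comb(N, n)


def enrichment_table(gene_list, gene_sets, background):
    """Return (set name, overlap k, set size m, p-value) rows, most significant first."""
    universe = set(background)
    hits = set(gene_list) & universe          # list genes that could be detected
    N, n = len(universe), len(hits)
    rows = []
    for name, members in gene_sets.items():
        pathway = set(members) & universe     # restrict the set to the background
        m, k = len(pathway), len(hits & pathway)
        if k == 0:
            continue                          # no overlap, nothing to report
        rows.append((name, k, m, hypergeom_sf(k, N, m, n)))
    return sorted(rows, key=lambda row: row[3])


# Hypothetical toy data: 100 background genes and two made-up gene sets.
background = [f"G{i}" for i in range(1, 101)]
gene_sets = {
    "Cell cycle (toy)": [f"G{i}" for i in range(1, 11)],   # G1..G10
    "Apoptosis (toy)": [f"G{i}" for i in range(11, 31)],   # G11..G30
}
gene_list = ["G1", "G2", "G3", "G4", "G5", "G11"]

table = enrichment_table(gene_list, gene_sets, background)
for name, k, m, p in table:
    print(f"{name}\t{k}/{m}\tp = {p:.2g}")
```

These are raw p-values; with many gene sets they still need a multiple-testing correction before any pathway is called significant.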

Hypergeometric test
Background list: N = 1000 genes, of which m = 100 belong to EGFR signaling.
My gene list: n = 5 genes, of which k = 3 belong to EGFR signaling.
Null hypothesis: the list is a random sample from the population.
Alternative hypothesis: the list contains more “pathway” genes than expected.
p-value = $$P(X \ge k) = \sum_{i=k}^{\min(n,m)} \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}$$
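The one-sided hypergeometric p-value P(X ≥ k) can be evaluated exactly with Python's standard library (`math.comb`, Python 3.8+). This minimal sketch uses the slide's numbers; `hypergeom_pvalue` is a hypothetical helper name. For comparison, the same function gives about 9.1e-6 when all five genes in the list belong to the pathway (k = 5).

```python
from math import comb


def hypergeom_pvalue(N, m, n, k):
    """One-sided p-value P(X >= k): the chance that n genes drawn at random
    from N background genes include at least k of the m pathway genes."""
    return sum(
        comb(m, i) * comb(N - m, n - i) for i in range(k, min(n, m) + 1)
    ) / comb(N, n)


# Slide example: 1000 background genes, 100 in EGFR signaling,
# a 5-gene list containing 3 EGFR-signaling genes.
print(f"k = 3: p-value = {hypergeom_pvalue(N=1000, m=100, n=5, k=3):.2g}")  # k = 3: p-value = 0.0084
# All five listed genes in the pathway.
print(f"k = 5: p-value = {hypergeom_pvalue(N=1000, m=100, n=5, k=5):.2g}")  # k = 5: p-value = 9.1e-06
```

If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, m, n)` should give the same numbers; note SciPy's argument order (population size, number of pathway genes, list size) and the `k - 1`, since `sf` returns P(X > x).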

Hypergeometric test (online)
Online calculators are available (search for “hypergeometric test calculator”).

Multiple test corrections
Background list: 1000 genes, of which 100 belong to EGFR signaling.
With p-value = 9.1e-6, a purely random draw showing the observed enrichment is expected about once every 1/p-value draws, i.e. after roughly 109,890 random draws.
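A quick simulation illustrates the 1/p-value idea: repeat purely random draws from the background and count how often they look at least as enriched as the observed list. This is a sketch under the slide's setup (1000 background genes, 100 in EGFR signaling); `simulate_draws` is a hypothetical name, and to keep the run short it uses the case of 3 pathway genes in a 5-gene list, whose exact p-value is about 0.0084, so roughly one random draw in 120 should look this enriched. Reproducing the 9.1e-6 case the same way would take millions of draws.

```python
import random


def simulate_draws(n_draws, N=1000, m=100, n=5, k=3, seed=0):
    """Fraction of random n-gene lists (drawn from N genes, the first m of
    which are 'pathway' genes) that contain at least k pathway genes."""
    rng = random.Random(seed)      # fixed seed for a reproducible run
    background = range(N)          # genes 0..m-1 play the pathway genes
    extreme = 0
    for _ in range(n_draws):
        sample = rng.sample(background, n)
        if sum(1 for gene in sample if gene < m) >= k:
            extreme += 1
    return extreme / n_draws


frac = simulate_draws(200_000)
print(f"fraction of random draws with >= 3 pathway genes: {frac:.4f}")
print(f"about one in {1 / frac:.0f} random draws")
```

The same arithmetic explains why many tests need correction: testing around 110,000 gene sets at p = 9.1e-6 would be expected to produce about one such hit by chance alone.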

FDR vs. Bonferroni correction
- FDR is the expected proportion of the observed enrichments that are due to random chance.
- Compare this to the Bonferroni correction, which bounds the probability that any one of the observed enrichments is due to random chance.
- The Bonferroni correction is very stringent and can “wash away” real enrichments, leading to false negatives.
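A minimal sketch of both corrections in plain Python; the helper names are hypothetical, and the FDR side uses the Benjamini-Hochberg procedure, a common way enrichment tools compute FDR-adjusted p-values. With five illustrative p-values, Bonferroni keeps two at a 0.05 cutoff while Benjamini-Hochberg keeps four.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (FDR) adjusted p-values, in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


pvalues = [0.001, 0.008, 0.02, 0.03, 0.5]  # illustrative values only
print(sum(p < 0.05 for p in bonferroni(pvalues)))          # 2
print(sum(q < 0.05 for q in benjamini_hochberg(pvalues)))  # 4
```

Both adjustments keep the input order, so the results can be written straight back into an enrichment table next to the raw p-values.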

Takeaway message 2:
- The hypergeometric test is a powerful statistical tool: use it (not only for pathway analysis).
- Don’t forget multiple test correction: the FDR or q-value, not the raw p-value, should drive your decision.
- Keep in mind N, the number of genes/proteins in your total population (background): it can influence your final output.
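To see why N matters, this minimal sketch (hypothetical helper name) keeps the pathway size (m = 100) and the gene list (3 of 5 genes in the pathway) fixed and only changes the size of the background. The one-sided hypergeometric p-value drops as the background grows, so an inflated background (for example, the whole genome when only a subset of genes was measured) can make an enrichment look stronger than it is.

```python
from math import comb


def hypergeom_pvalue(N, m, n, k):
    """One-sided hypergeometric p-value P(X >= k)."""
    return sum(
        comb(m, i) * comb(N - m, n - i) for i in range(k, min(n, m) + 1)
    ) / comb(N, n)


# Hypothetical comparison: same pathway and gene list, different background size N.
pvalues = {N: hypergeom_pvalue(N, m=100, n=5, k=3) for N in (1000, 5000, 20000)}
for N, p in pvalues.items():
    print(f"N = {N:>6}: p-value = {p:.2g}")
```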


Christina Yung

Pathway/network analysis workflow overview
Pathway-based branch:
- Run enrichment analysis using g:Profiler, ICGC, Reactome, gtools, etc.
- Browse significant pathways in Reactome.
Network-based branch (Reactome FI network Cytoscape plugin: ReactomeFIViz):
- Build a protein interaction subnetwork.
- Run a clustering algorithm.
- Run enrichment analysis of each module individually.
Then:
- Drill down to understand the molecular mechanism.
- Validate your model (in the wet lab).
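The first two network steps can be sketched in a few lines of Python, assuming the interaction network is available as a list of gene pairs (for example, exported from a functional interaction network); the toy edges and gene list below are illustrative only, and the function names are hypothetical. Connected components stand in for the clustering step purely for brevity; they are not the algorithm ReactomeFIViz applies, and real modules usually come from dedicated network clustering methods.

```python
from collections import defaultdict


def build_subnetwork(edges, genes):
    """Keep only interactions whose two partners are both in `genes`."""
    genes = set(genes)
    graph = defaultdict(set)
    for a, b in edges:
        if a in genes and b in genes:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_modules(graph):
    """Split the subnetwork into connected components, largest first.

    A crude stand-in for a real network clustering algorithm.
    """
    seen, modules = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, module = [start], set()
        while stack:  # depth-first walk over one component
            node = stack.pop()
            module.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        modules.append(module)
    return sorted(modules, key=len, reverse=True)


# Toy interaction pairs and gene list (illustrative only).
edges = [("TP53", "MDM2"), ("TP53", "CDKN1A"), ("KRAS", "BRAF"),
         ("BRAF", "MAP2K1"), ("EGFR", "GRB2")]
genes = {"TP53", "MDM2", "CDKN1A", "KRAS", "BRAF", "MAP2K1", "SMAD4"}
modules = connected_modules(build_subnetwork(edges, genes))
for index, module in enumerate(modules):
    print(f"Module {index}: {sorted(module)}")
```

Genes with no interaction partner inside the list (SMAD4 here) drop out of every module. Each remaining module would then go through its own enrichment test.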

Figure: pancreatic cancer-specific subnetwork, with modules labeled:
- Module 0: ERBB, FGFR, EGFR signaling, axon guidance
- Module 1: Hedgehog, TGFβ signaling
- Module 2: p53 signaling
- Module 3: Wnt & cadherin signaling
- Module 4: Translation
- Module 5: Axon guidance
- Module 6: Ca2+ signaling
- Module 7: ECM, focal adhesion, integrin signaling
- Module 8: MHC class II antigen presentation
- Module 9: Rho GTPase signaling
- Module 10: Spliceosome

Takeaway message 3:
- Try different tools: gtool, g:Profiler, GeneMANIA, Reactome FI network, etc.
- Watch out for enriched pathways that are not relevant to your question!
- If no significant pathways were detected (and all possible mistakes have been ruled out), please don’t be disappointed: maybe your pathway hasn’t been curated yet.
All lectures on pathway- and network-based analysis are available here (free access): network-analysis-omic-data-2015