Exploring Pan-Cancer Network Relationships Between Somatic Changes and Expression Profiles with PACMEN. Presented by Ms Shila Ghazanfar, School of Mathematics and Statistics.

Presentation transcript:

Exploring Pan-Cancer Network Relationships Between Somatic Changes and Expression Profiles with PACMEN Presented by Ms Shila Ghazanfar School of Mathematics and Statistics @shazanfar

Complex biomedical data and network analysis of cancers
Outline:
Complex biomedical data and network analysis of cancers
Mutation-expression networks
PACMEN Shiny application for exploration
The focus of this talk is on finding meaningful biological relationships between multiple datasets. In particular, I will describe two specific aspects of my thesis work, which looks at the relationships between gene expression data and mutation data, and a software tool that I have developed. But before I do that, I will offer some motivation and context to the research area.

What does biomedical data look like?
Each data type is a matrix measured over the same samples (n):
DNA, somatic change: genes (p1) by samples (n); binary, continuous or integer
RNA, gene expression: genes (p2) by samples (n); continuous measurements
Protein: proteins (p3) by samples (n); continuous, with missing values
Clinical: variables (p4) by samples (n); continuous and factors
All these data boil down to matrices, each with its own characteristics. How can we find meaningful biological relationships between these multiple datasets? What other experimental evidence can we use to achieve this?

Other experimental evidence: protein-protein interaction networks
6/05/2018
Sources: BioGRID, Human Interactome (HI-II-14), Human Protein Reference Database (HPRD), iRefWeb, MetaCore
Union PPI with 2237 nodes and 68832 edges
Human Protein Reference Database: Keshava Prasad et al. 2009
iRefWeb: Turner et al. 2010
BioGRID: Chatr-aryamontri et al. 2013
MetaCore: from GeneGo Inc.
[Figure: two nodes, A and B, joined by an edge]
…and independently curated information about human protein-protein interactions from sources such as HPRD, iRefWeb, BioGRID, and MetaCore.
Additional information: As I mentioned before, CSB is about integrating different levels of data. In my study, I integrated the gene expression data with large-scale protein-protein interaction networks, the fundamental tenet being that by doing so, the expression information is organised and contextualised in a global, biologically meaningful way. So, where does one find a comprehensive list of all the known protein-protein interactions that occur in a one-step, direct-binding manner in humans? Getting some traction on this part of the project took several months, and here’s why. Because these large-scale molecular interaction networks typically include hundreds or thousands of molecules and tens of thousands of interactions, it can be useful, for example to save computational time, to focus on specific aspects of the network. In my case, I was interested in the most highly connected proteins in the network: so-called ‘hubs’.
Prognostic biomarkers in melanoma
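As a rough illustration of what forming a union PPI and picking out hubs could look like, here is a minimal base-R sketch. The two edge lists and the protein names P1 to P5 are hypothetical stand-ins, not data from HPRD, iRefWeb, BioGRID or MetaCore, and the talk does not say how the union was actually built.

```r
# Toy edge lists standing in for two PPI sources (hypothetical interactions)
src_a <- data.frame(a = c("P1", "P2", "P3"), b = c("P2", "P3", "P1"),
                    stringsAsFactors = FALSE)
src_b <- data.frame(a = c("P2", "P4", "P4"), b = c("P1", "P1", "P5"),
                    stringsAsFactors = FALSE)

all_edges <- rbind(src_a, src_b)

# Treat interactions as undirected: order each pair alphabetically,
# then drop duplicates so that P1-P2 and P2-P1 count once
first  <- pmin(all_edges$a, all_edges$b)
second <- pmax(all_edges$a, all_edges$b)
union_edges <- unique(data.frame(a = first, b = second,
                                 stringsAsFactors = FALSE))

# Degree of each protein in the union network, highest first;
# the most connected proteins are the candidate hubs
deg <- sort(table(c(union_edges$a, union_edges$b)), decreasing = TRUE)
print(deg)
```

Ordering each pair before calling unique() is what makes the union undirected; without it, the same interaction reported in opposite orientations by two sources would be counted twice.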

Detailed clinical and pathological data
More genetic information on the same patients.
Available for multiple cancers.
Downloaded data for 4443 samples over 19 cancers.
Samples eligible for this study (n = 79) were obtained from lymph node specimens in which macroscopic tumor was observed, from patients believed to be without distant metastases at the time of tumor banking, based on clinical examination and computerized axial tomographic scanning of the brain, chest, abdomen, and pelvis. Specimens were macrodissected at the time of banking and subsequently reviewed to meet minimum criteria for tumor cell content (>80%) and amount of necrosis (<30%).

Existing network analysis
Single platform / multiple platform
Gene expression: Ideker et al. (2002), Taylor et al. (2009)
Has focused only on gene expression, with networks. Finding significantly disturbed
Mutation: Ciriello et al. (2009), Bashashati et al. (2012)

Mutation-expression networks

Our idea
To incorporate information from gene expression and mutation in a more direct manner.
Look for potential differences due to specific mutated genes, via repeated partitioning.
Outcome: directed networks of mutated genes and differentially expressed genes.

Mutation-expression network construction
[Figure: gene expression matrix (genes by samples) and filtered mutation matrix (genes by samples); mutated gene i is linked to its UPPI partner genes]
This is some of my latest work, which I will be presenting at an international bioinformatics conference in January. The idea is to construct a directed network linking mutated genes to differentially expressed genes, potentially leading to interesting connections. Among all these mutations, how do we deconvolute them to identify the most useful ones, in particular those that appear to change some aspect of the biology in the cell? In our study we considered gene expression as a measure of this, and aimed to identify a network made of genes that are mutated and genes whose expression differs between mutated and non-mutated samples.

Mutation-expression network construction (continued)
[Figure: for mutated gene i, the samples are split into those without gene i mutated and those with gene i mutated; each UPPI partner gene of gene i is then tested for differential expression between the two groups]

Mutation-expression network construction
For each mutated gene j and UPPI partner k, the linear model
$$Y_{ik} = \alpha_k + \beta_{jk} M_{ij} + \varepsilon_i$$
is formed for sample i, where $M_{ij}$ is 1 if gene j is mutated in sample i and 0 otherwise, $Y_{ik}$ is the expression of gene k in sample i, with mean $\alpha_k$ for samples with no mutation in gene j and $\alpha_k + \beta_{jk}$ for samples with a mutation in gene j, and $\varepsilon_i$ is random noise. We are testing the hypothesis
$$H_0: \beta_{jk} = 0.$$
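To make the per-pair test concrete, here is a minimal sketch in base R on simulated data. It fits the linear model described above by ordinary least squares with lm and adjusts the p-values with Benjamini-Hochberg. The sample size, effect sizes, number of partners and the 0.05 FDR cutoff are all illustrative assumptions, and the original analysis may well have used a different testing procedure.

```r
# Simulated data; every value here is an illustrative assumption
set.seed(1)
n <- 100                                  # number of samples
M_j <- rbinom(n, size = 1, prob = 0.2)    # M_ij: 1 if gene j is mutated in sample i
beta <- c(2, 0, 0, -2, 0)                 # simulated effects beta_jk for 5 partners
# Y[i, k] = alpha_k + beta_jk * M_ij + noise, with alpha_k = 5 for every k
Y <- sapply(beta, function(b) 5 + b * M_j + rnorm(n))

# Fit the model for one partner gene; return the estimate and p-value for beta_jk
test_partner <- function(y, m) {
  coefs <- summary(lm(y ~ m))$coefficients
  c(estimate = unname(coefs["m", "Estimate"]),
    p_value  = unname(coefs["m", "Pr(>|t|)"]))
}

res <- t(apply(Y, 2, test_partner, m = M_j))
fdr <- p.adjust(res[, "p_value"], method = "BH")    # Benjamini-Hochberg FDR
direction <- ifelse(res[, "estimate"] > 0, "up", "down")

# Keep partners passing the (assumed) FDR cutoff: these become edges from gene j
edges <- data.frame(partner = seq_along(beta), direction = direction,
                    fdr = fdr)[fdr < 0.05, ]
print(edges)
```

Each retained row corresponds to one directed edge from the mutated gene to a differentially expressed partner, with the sign of the estimated coefficient giving the direction of the expression change.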

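Example: PTPRC in melanoma data
PTPRC is mutated in 32/325 SKCM samples, in the top 5% of mutated genes.
Previously linked to T-cell acute lymphoblastic leukemia (COSMIC).
[Figure: PTPRC mutation-expression network; edge legend: down, up]
In the SKCM dataset, we focus on just one mutated gene (mutated in about 10% of samples) and look at its gene expression effects.
R code in the slide notes:
    library(igraph)
    # Directed edges run from the mutated gene to each partner gene
    dat = matrix(c("PTPRC","JAK2","PTPRC","MAPK3","PTPRC","IFNAR1","PTPRC","FCGR3A"), ncol = 2, byrow = TRUE)
    net = graph_from_edgelist(dat)
    # Edge colours distinguish the two directions of expression change (legend: down, up)
    E(net)$color = c("red", "blue", "red", "red")
    # First vertex (PTPRC) in grey, the four partner genes in light pink
    V(net)$color = c("grey", rep("lightpink", 4))
    V(net)$size = 40
    plot(net, layout = layout_with_dh)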

Mutation-expression network considerations
Parameter 1: What top percentile of mutated genes should be interrogated?
Parameter 2: Do we include highly connected (but lowly mutated) genes to be interrogated?
Parameter 3: Significance threshold (FDR). How much evidence do we observe of differential gene expression?
As we know, with any statistical procedure or model, there are a number of parameters that can affect the results, and we need to understand how these can change the results.

Mutation-expression networks are enriched for cancer-related genes compared to randomly permuted networks
[Figure: heatmaps of enrichment p-values, one row per cancer, ordered by overall mutation load]
The heatmap displays comparisons of the non-empty networks generated for each cancer against networks generated through randomly permuting the sample and gene labels. The top heatmap is for the HIPPIE network and the bottom is for UPPI. Cancers are ordered by increasing overall mutation rate. Enrichment of genes among the NCG 4.0 [38], CGC [39] and DrugBank [40] databases, as well as the driver nodes identified through the DriverNet algorithm, is displayed using all nodes in the directed network, the tail nodes and the head nodes respectively. Cell colors correspond to empirical p-values, with blue closer to 1 and white closer to 0; empirical p-values less than 0.1 are enumerated.
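The empirical p-values in the heatmap can be illustrated with a small permutation sketch in base R. This is a simplification under stated assumptions: the gene universe, reference list and network nodes are made up, and the null here only draws random gene sets of the same size, whereas the analysis described above regenerates networks after permuting both sample and gene labels. The "+1" correction in the p-value is also an assumption.

```r
# Hypothetical inputs; none of these are real gene lists
set.seed(42)
all_genes     <- paste0("g", 1:1000)                      # gene universe
cancer_genes  <- all_genes[1:100]                         # stand-in for a list such as CGC
network_genes <- c(all_genes[1:20], all_genes[501:530])   # nodes of one network

# Enrichment statistic: number of network nodes found in the reference list
overlap  <- function(nodes) sum(nodes %in% cancer_genes)
observed <- overlap(network_genes)

# Null distribution: random gene sets of the same size as the network
B <- 1000
null_overlap <- replicate(B, overlap(sample(all_genes, length(network_genes))))

# Empirical p-value: share of permutations at least as extreme as the observed value
p_emp <- (1 + sum(null_overlap >= observed)) / (B + 1)
print(c(observed = observed, p_emp = p_emp))
```

Adding one to both numerator and denominator keeps the empirical p-value strictly above zero, so its smallest possible value is set by the number of permutations B.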

Results: pan-cancer observations
More mutated genes are shared among cancers than DE genes.
[Figure: cancers ordered from lowly mutated to highly mutated]

PACMEN Shiny application for exploration

shiny.maths.usyd.edu.au/PACMEN/

Acknowledgements
Melanoma program at MIA/WMI/RPA
Graham Mann, Sarah-Jane Schramm
School of Mathematics and Statistics, Statistical Bioinformatics Group
Jean Yang, John Ormerod, Samuel Mueller, Dario Strbenac, Emi Tanaka, Kevin Wang, Sarah Romanes, Mark Greenaway, Weichang Yu
This work wouldn’t be possible without a large team of interdisciplinary researchers. Prof RS and his team provide the samples and associated clinico-pathological data. Many people are involved with the generation of multiple sets of omics data. Many colleagues and students were involved with the analysis of the individual omics platforms, which comes before integration. This highlights the power of interdisciplinary collaboration and how statistics and statistical bioinformatics contribute to biomedical research.
Reference: Ghazanfar, S., & Yang, J. Y. H. (2016). Characterizing mutation-expression network relationships in multiple cancers. Computational Biology and Chemistry.

University of Sydney Statistical Bioinformatics Group
The Statistical Bioinformatics Group consists of senior and junior statisticians and bioinformaticians who share an interest in developing statistical and computational methodologies to study biology and complex diseases at the systems level. The group's joint strengths and expertise, ranging from genomics, epigenomics and proteomics and beyond, uniquely position the individuals and the group as a whole to tackle the most significant challenges posed by modern biology and medicine.
Meet our senior and junior research leaders: Jean Yang, John Ormerod, Samuel Müller, Lamiae Azizi, Pengyi Yang, Uri Keich
Our group also includes: Senior research associates: 2; PhD candidates: 8; Honours and TSP students: 5
Contact us: bioinformatics@maths.usyd.edu.au