Exploring Pan-Cancer Network Relationships Between Somatic Changes and Expression Profiles with PACMEN Presented by Ms Shila Ghazanfar School of Mathematics.


1 Exploring Pan-Cancer Network Relationships Between Somatic Changes and Expression Profiles with PACMEN Presented by Ms Shila Ghazanfar School of Mathematics and Statistics @shazanfar

2 Complex biomedical data and network analysis of cancers
Exploring Pan-Cancer Network Relationships Between Somatic Changes and Expression Profiles
Complex biomedical data and network analysis of cancers
Mutation-expression networks
PACMEN Shiny application for exploration
The focus of this talk is on finding meaningful biological relationships between multiple datasets. In particular, I will describe two specific aspects of my thesis work, which looks at the relationships between gene expression data and mutation data, and a software tool that I have developed. But before I do that, I will offer some motivation and context to the research area.

3 What does biomedical data look like?
DNA: Somatic change. Binary, continuous or integer. Genes (p1) by samples (n).
RNA: Gene expression. Continuous measurements. Genes (p2) by samples (n).
Protein: Continuous, with missing values. Proteins (p3) by samples (n).
Clinical: Continuous and factors. Variables (p4) by samples (n).
All these data boil down to matrices, each with its own characteristics. How can we find meaningful biological relationships between these multiple datasets? What other experimental evidence can we use to achieve this?

4 Other experimental evidence: protein-protein interaction networks
6/05/2018
Sources: BioGRID (Chatr-aryamontri et al. 2013), Human Interactome (HI-II-14), Human Protein Reference Database (HPRD; Keshava Prasad et al. 2009), iRefWeb (Turner et al. 2010), MetaCore (from GeneGo Inc.)
Union PPI with 2237 nodes and edges
[Diagram: two nodes, A and B, joined by an edge]
…and independently curated information about human protein-protein interactions from sources such as HPRD, iRefWeb, BioGRID, and MetaCore.
Additional information: As I mentioned before, CSB is about integrating different levels of data. In my study, I integrated the gene expression data with large-scale protein-protein interaction networks, the fundamental tenet being that doing so organises and contextualises the expression information in a global, biologically meaningful way. So, where does one find a comprehensive list of all the known protein-protein interactions that occur in a one-step, direct-binding manner in humans? Getting some traction on this part of the project took several months, and here's why. These large-scale molecular interaction networks typically include hundreds or thousands of molecules and tens of thousands of interactions, so it can be useful, for example to save computational time, to focus on specific aspects of the network. In my case, I was interested in the most highly connected proteins in the network: so-called 'hubs'.
Prognostic biomarkers in melanoma
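As a rough illustration of the union-network and hub ideas above, here is a minimal base R sketch. The two edge lists, the protein names and the top-25% degree cutoff are all hypothetical placeholders, not the actual databases or the threshold used in the study.

```r
# Toy edge lists standing in for two PPI sources (hypothetical interactions).
src_a <- data.frame(from = c("A", "A", "B"), to = c("B", "C", "C"),
                    stringsAsFactors = FALSE)
src_b <- data.frame(from = c("C", "B", "A"), to = c("A", "A", "D"),
                    stringsAsFactors = FALSE)

# Treat interactions as undirected: order each pair alphabetically so that
# A-B and B-A are recognised as the same edge.
canonical <- function(edges) {
  data.frame(from = pmin(edges$from, edges$to),
             to   = pmax(edges$from, edges$to),
             stringsAsFactors = FALSE)
}

# Union network: stack the sources and drop duplicated edges.
union_ppi <- unique(rbind(canonical(src_a), canonical(src_b)))

# Degree = number of distinct interaction partners of each protein.
tab <- table(c(union_ppi$from, union_ppi$to))
degree <- setNames(as.integer(tab), names(tab))

# Call a protein a hub if its degree is in the top 25% (arbitrary cutoff).
hubs <- names(degree)[degree >= quantile(degree, 0.75)]
print(hubs)
```

Canonicalising each pair before taking the union matters because different sources may list the same interaction in opposite orders.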

5 Detailed clinical and pathological data
More genetic information on the same patients.
Available for multiple cancers
Downloaded data for 4443 samples over 19 cancers
Samples eligible for this study (n = 79) were obtained from lymph node specimens in which macroscopic tumor was observed, from patients believed to be without distant metastases at the time of tumor banking, based on clinical examination and computerized axial tomographic scanning of the brain, chest, abdomen, and pelvis. Specimens were macrodissected at the time of banking and subsequently reviewed to meet minimum criteria for tumor cell content (>80%) and amount of necrosis (<30%).

6 Existing network analysis
Columns: Single Platform, Multiple Platform
Gene Expression: Ideker et al (2002), Taylor et al (2009)
Mutation: Ciriello et al (2009), Bashashati et al (2012)
Has focused only on gene expression, with networks. Finding significantly disturbed

7 Complex biomedical data and network analysis of cancers
Complex biomedical data and network analysis of cancers
Mutation-expression networks
PACMEN Shiny application for exploration

8 Our idea
To incorporate information from gene expression and mutation in a more direct manner
Look for potential differences due to specific mutated genes, via repeated partitioning
Outcome: directed networks of mutated genes and differentially expressed genes

9 Mutation-expression network construction:
[Figure: gene expression matrix (genes by samples) and filtered mutation matrix (genes by samples), highlighting mutated gene i and its UPPI partners]
Now this is some of my latest work, which I will be presenting at an international bioinformatics conference in January. The idea is to construct a directed network linking mutated genes to differentially expressed genes, potentially leading to interesting connections. Among all these mutations, how do we deconvolute to identify the most useful ones, in particular those that appear to change some aspect of the biology in the cell? In our study we took gene expression as a measure of this, and aimed to identify a network made of genes that are mutated and genes whose expression differs between mutated and non-mutated samples.

10 Mutation-expression network construction:
[Figure: as on the previous slide, with the samples now split into those without gene i mutated and those with gene i mutated, followed by a test for differential expression of the UPPI partner genes]

11 Mutation-expression network construction:
For each mutated gene j and UPPI partner k, the linear model
$$Y_{ik} = \alpha_k + \beta_{jk} M_{ij} + \varepsilon_i$$
is formed for sample i, where $M_{ij}$ is 1 if gene j is mutated in sample i and 0 otherwise, $Y_{ik}$ is the gene expression of gene k in sample i, with mean $\alpha_k$ for samples with no mutation in gene j and $\alpha_k + \beta_{jk}$ for samples with a mutation in gene j, and $\varepsilon_i$ is random noise. We are testing the hypothesis $H_0: \beta_{jk} = 0$.
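To make the model on this slide concrete, here is a minimal sketch for a single mutated gene and a single partner gene, using simulated data. The sample sizes, effect size and noise level are invented for illustration, and fitting by ordinary least squares with `lm()` is an assumption: the slides do not say which estimator or test statistic was actually used.

```r
set.seed(1)
n <- 100

# M: mutation indicator for one hypothetical gene j (1 = mutated in that sample).
M <- rep(c(0, 1), times = c(90, 10))

# Y: simulated expression of one hypothetical partner gene k, with
# baseline alpha_k = 5 and mutation effect beta_jk = 2 (made-up values).
Y <- 5 + 2 * M + rnorm(n, sd = 1)

# Fit Y = alpha + beta * M + noise and extract the test of beta = 0.
fit <- lm(Y ~ M)
coefs <- summary(fit)$coefficients
beta_hat <- coefs["M", "Estimate"]
p_value <- coefs["M", "Pr(>|t|)"]

# With a single binary covariate this is the same test as a two-sample
# t-test with pooled variance (mutated versus non-mutated samples).
tt <- t.test(Y[M == 1], Y[M == 0], var.equal = TRUE)

print(c(beta_hat = beta_hat, p_lm = p_value, p_ttest = tt$p.value))
```

The estimated coefficient is simply the difference in mean expression between the mutated and non-mutated groups, which is why the partitioning picture and the linear model describe the same comparison.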

12 Example: PTPRC in melanoma data
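[Figure: network of PTPRC and its differentially expressed partners; edge colour indicates down- or up-regulation]
PTPRC mutated in 32/325 SKCM samples
In top 5% mutated
Previously linked to T-cell acute lymphoblastic leukemia (COSMIC)
In the SKCM dataset, we focus on just one mutated gene (mutated in about 10% of samples) and look at its gene expression effects.
    library(igraph)
    # Directed edges from the mutated gene (PTPRC) to its differentially expressed partners
    dat = matrix(c("PTPRC","JAK2",
                   "PTPRC","MAPK3",
                   "PTPRC","IFNAR1",
                   "PTPRC","FCGR3A"), ncol=2, byrow=TRUE)
    net = graph_from_edgelist(dat)                  # replaces the deprecated graph.edgelist
    E(net)$color = c("red","blue","red","red")      # edge colour encodes direction of expression change
    V(net)$color = c("grey",rep("lightpink",4))     # mutated gene in grey, partners in pink
    V(net)$size = 40
    plot(net,layout=layout_with_dh)                 # replaces the deprecated layout.davidson.harel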

13 Mutation-expression network considerations
Parameter 1: What top percentile of mutated genes should be interrogated?
Parameter 2: Do we include highly connected (but lowly mutated) genes to be interrogated?
Parameter 3: Significance threshold (FDR). How much evidence do we observe of differential gene expression?
As we know, with any statistical procedure or model, there are a number of parameters that can affect the results, and we need to understand how changing them changes the results.
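The three parameters can be sketched in a few lines of base R. Everything here is hypothetical: the gene names, mutation counts, hub list, p-values and the chosen values of `top_pct` and `fdr_threshold` are placeholders, and Benjamini-Hochberg is assumed as the FDR procedure since the slide does not name one.

```r
# Hypothetical number of mutated samples per gene.
mut_counts <- c(g1 = 120, g2 = 95, g3 = 60, g4 = 32, g5 = 5,
                g6 = 4, g7 = 3, g8 = 2, g9 = 1, g10 = 1)

# Parameter 1: interrogate genes in the top `top_pct` fraction by mutation count.
top_pct <- 0.2
cutoff <- quantile(mut_counts, 1 - top_pct)
candidates <- names(mut_counts)[mut_counts >= cutoff]

# Parameter 2: optionally add highly connected but lowly mutated genes
# (hypothetical hub list).
include_hubs <- TRUE
hub_genes <- c("g9")
if (include_hubs) candidates <- union(candidates, hub_genes)

# Parameter 3: FDR threshold applied to the differential expression p-values
# (made-up p-values for five partner genes of one mutated gene).
p_values <- c(0.0001, 0.004, 0.02, 0.2, 0.6)
fdr <- p.adjust(p_values, method = "BH")
fdr_threshold <- 0.05
significant <- fdr <= fdr_threshold

print(candidates)
print(significant)
```

Rerunning a sketch like this over a grid of `top_pct` and `fdr_threshold` values is one simple way to see how sensitive the resulting network is to each choice.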

14 Mutation-expression networks are enriched for cancer-related genes compared to randomly permuted networks
Highlight one row
The heatmap displays comparisons of the non-empty networks generated for each cancer against networks generated through randomly permuting the sample and gene labels. The top heatmap is for the HIPPIE network and the bottom is for UPPI. Cancers are ordered by increasing overall mutation rate. Enrichment of genes among the NCG 4.0 [38], CGC [39] and DrugBank [40] databases, as well as the driver nodes identified through the DriverNet algorithm, is displayed using all nodes in the directed network, the tail nodes and the head nodes respectively. Cell colors correspond to empirical p-values, blue closer to 1 and white closer to 0, with empirical p-values less than 0.1 enumerated.
[Figure axis: Overall mutation load]
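An empirical p-value of the kind shown in the heatmap can be sketched as follows. This is a simplification under stated assumptions: instead of permuting sample and gene labels and regenerating whole networks, it draws random node sets of the same size from a hypothetical gene universe, and the gene names, list sizes and number of permutations `B` are all invented.

```r
set.seed(42)

# Hypothetical gene universe and a stand-in for a curated cancer gene list.
universe <- paste0("g", 1:200)
cancer_genes <- universe[1:20]

# Hypothetical nodes of an observed mutation-expression network.
network_nodes <- c(universe[1:8], universe[101:112])

# Enrichment statistic: number of network nodes found in the cancer gene list.
overlap <- function(nodes) sum(nodes %in% cancer_genes)
observed <- overlap(network_nodes)

# Null distribution: the same statistic for random node sets of equal size.
B <- 999
permuted <- replicate(B, overlap(sample(universe, length(network_nodes))))

# Empirical p-value; the +1 counts the observed network itself,
# so the p-value can never be exactly zero.
emp_p <- (1 + sum(permuted >= observed)) / (B + 1)
print(emp_p)
```

With `B` permutations the smallest attainable value is 1/(B + 1), so the number of permutations bounds how close to 0 a cell in such a heatmap can get.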

15 Results: pan-cancer observations
More mutated genes are shared among cancers than DE genes
[Figure axis: lowly mutated to highly mutated]

16 Complex biomedical data and network analysis of cancers
Complex biomedical data and network analysis of cancers
Mutation-expression networks
PACMEN Shiny application for exploration

17 shiny.maths.usyd.edu.au/PACMEN/

18 Acknowledgements
Melanoma program at MIA/WMI/RPA: Graham Mann, Sarah-Jane Schramm
School of Mathematics and Statistics, Statistical Bioinformatics Group: Jean Yang, John Ormerod, Samuel Mueller, Dario Strbenac, Emi Tanaka, Kevin Wang, Sarah Romanes, Mark Greenaway, Weichang Yu
This work wouldn't be possible without a large team of interdisciplinary researchers. Prof RS and his team provide the samples and associated clinico-pathological data. Many people are involved in generating the multiple sets of omics data. Many colleagues and students were involved in the analysis of the individual omics platforms, which comes before integration. This highlights the power of interdisciplinary collaboration and how statistics and statistical bioinformatics contribute to biomedical research.
Reference: Ghazanfar, S., & Yang, J. Y. H. (2016). Characterizing mutation-expression network relationships in multiple cancers. Computational Biology and Chemistry.

19 University of Sydney Statistical Bioinformatics Group
The Statistical Bioinformatics Group consists of senior and junior statisticians and bioinformaticians who share an interest in developing statistical and computational methodologies to study biology and complex diseases at the systems level. The group's joint strengths and expertise, ranging from genomics, epigenomics, proteomics and beyond, uniquely position the individuals and the group as a whole to tackle the most significant challenges posed by modern biology and medicine.
Meet our senior and junior research leaders.
Our group also includes: Senior research associates: 2; PhD candidates: 8; Honours and TSP students: 5
Contact us: Jean Yang, John Ormerod, Samuel Müller, Lamiae Azizi, Pengyi Yang, Uri Keich
