GraDe-SVM: Graph-Diffused Classification for the Analysis of Somatic Mutations in Cancer
Morteza H. Chalabi, Fabio Vandin
mchalabi@imada.sdu.dk

Presentation transcript:

GraDe-SVM: Graph-Diffused Classification for the Analysis of Somatic Mutations in Cancer
Morteza H. Chalabi, Fabio Vandin
mchalabi@imada.sdu.dk
Notes:
Hello everyone. My name is Morteza Chalabi, from the University of Southern Denmark. I am going to talk about cancer type classification using a new form of support vector machine, which we named GraDe-SVM.

Motivation, Challenge, Problem
Motivation: Recent advances in next-generation sequencing have allowed the collection of somatic mutations from a large number of patients across several cancer types.
Challenge: Make these data applicable to clinical and therapeutic purposes. If we knew the mutations in a cancer genome, what would the cancer type be? Answering this is promising for circulating tumor cells/DNA in liquid biopsies.
Problem: Given a feature vector (= gene mutations), how do we predict the cancer type?
Notes:
<Motivation>
- recent advances
- TCGA & ICGC
<Challenge>
- make the data applicable to clinical and therapeutic purposes
- the natural question is then: if the mutations in a cancer genome were known, what would the cancer type be?
- answering this question would be useful for CTCs in liquid biopsies
- I suggest reading this MIT review article ...

Related Work
Similar projects have mainly used gene expression data and/or covered a limited number of cancer types:
Asgharzadeh S. et al., J. Natl. Cancer Inst., 2006; Herschkowitz J. et al., Genome Biol., 2007; Hwang T. et al., Proc. 8th IEEE Int. Conf. Data Mining, 2008; Lee E.S. et al., Cancer Res., 2008; Paik S., J. Clin. Oncol., 2006; Pawitan Y. et al., Breast Cancer Res., 2005, etc.
Lavi, O. et al., 2012, Journal of Computational Biology: "Network-Induced Classification Kernels for Gene Expression Profile Analysis"
Notes:
- say "and many more"
- Lavi et al. use heuristic ways to incorporate the network into the SVM
- this work is worth mentioning
- say why network interactions improve classification; concept of network, SVM

Our Contribution
New method integrating local network topology into classification (SVM)
  local network topology: captured by a diffusion process
We tested GraDe-SVM on somatic mutation sequence data
  copy number variation (CNV) & single nucleotide variation (SNV)
  from 3450 samples
  11 cancer types from The Cancer Genome Atlas

GraDe-SVM
Taking network topology into account
  idea: genes with similar function should have similar weights in the SVM
  similar function (= interactions on a network) is captured by a diffusion process (used in HotNet2 [1]), not only by direct interactions
Strategy I: transforming the input feature/attribute vectors using the diffusion process (random walk)
Strategy II: regularizing the SVM optimization problem using the diffusion process
  GraDe-SVM: capturing interactions (random walk matrix)
  NICK [2]: a similar approach capturing immediate interactions (adjacency matrix)
A typical feature vector (FV):
  g_0  g_1  g_2  g_3  …  g_{n-1}  g_n
  0    0    1    1    …  0        1
FV after "Map on Network" and "Diffusion Process":
  g_0  g_1  g_2   g_3  …  g_n
  0    1.4  0.04  0.5  …  1
Notes:
- strategy I: say the feature vector is mapped onto the network and gets diffused over it
- the blinking red rectangle is the regularization part
1: Leiserson, M. D. M., Vandin, F., et al., 2015, Nature Genetics
2: Lavi, O., et al., 2012, Journal of Computational Biology
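Strategy I (mapping a mutation feature vector onto the network and letting it diffuse) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the diffusion is a random walk with restart iterated to convergence, and the restart probability of 0.4, the four-gene toy network, and the function names `column_normalize` and `diffuse` are all hypothetical choices made for illustration.

```python
def column_normalize(adj):
    """Turn a symmetric 0/1 adjacency matrix into a column-stochastic walk matrix.

    Entry [i][j] is the probability of stepping from gene j to gene i.
    Columns of isolated genes stay all-zero (assumption: no self-loops added).
    """
    n = len(adj)
    deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / deg[j] if deg[j] else 0.0 for j in range(n)]
            for i in range(n)]


def diffuse(feature_vector, adj, restart=0.4, tol=1e-10, max_iter=10_000):
    """Random walk with restart: x <- (1 - restart) * W x + restart * x0.

    `restart` is an assumed value, not one taken from the slides.
    """
    walk = column_normalize(adj)
    n = len(feature_vector)
    x0 = [float(v) for v in feature_vector]
    x = x0[:]
    for _ in range(max_iter):
        nxt = [(1.0 - restart) * sum(walk[i][j] * x[j] for j in range(n))
               + restart * x0[i]
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Toy path network g0 - g1 - g2 - g3 with a single mutation in g1.
toy_adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
diffused = diffuse([0, 1, 0, 0], toy_adj)
print([round(v, 3) for v in diffused])
```

In this toy case the mutated gene keeps the largest score, while its neighbours, and to a lesser degree genes further away, receive non-zero scores. The diffused vectors would then replace the binary vectors as SVM input.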

Results
GraDe-SVM was evaluated and tested on a cohort of 3424 cancer samples from 11 cancer types from The Cancer Genome Atlas (TCGA), using both single nucleotide variants (SNVs) and copy number variants (CNVs)
  9786 genes in the network (HINT+HI2012 [1])
  10-fold cross validation
Improved classification of cancer types vs. no network, or network but no diffusion process
Finds a number of known driver genes & genes with mutations distinguishing cancer types
Future Work & Conclusion
Future: there are many directions, especially
  considering non-coding region variations: intron, IGR
  how to find a small set of genes returning acceptable performance
Conclusion:
  we introduced GraDe-SVM to capture local network topology
  tested on real data, we achieved higher accuracy
Notes:
<Results>
- talk about accuracy reduction using the 269 and 18985 gene sets
<Future>
- currently it may not be possible to measure all mutations in the genome, so it is important to find a small set of genes returning good performance
1: Leiserson, M. D. M., Vandin, F., et al., 2015, Nature Genetics
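The 10-fold cross validation mentioned in the results can be sketched as follows. The slides do not say how the folds were built, so this is only one plausible setup: it assumes folds are stratified by cancer type, and the helper names `stratified_folds` and `train_test_splits`, the seed, and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, spreading each class evenly (assumed scheme)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[(offset + pos) % k].append(idx)
        offset += len(members)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices), holding out each fold exactly once."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f, fold in enumerate(folds) if f != held_out
                       for i in fold)
        yield train, test


# Toy labels: 25 samples of one cancer type and 15 of another (illustrative only).
toy_labels = ["BRCA"] * 25 + ["GBM"] * 15
for train, test in train_test_splits(toy_labels, k=10):
    print(len(train), len(test))
```

Each sample is used for testing exactly once; a classifier would be trained on `train` and scored on `test` in every round, and the ten scores averaged.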