An Artificial Intelligence Approach to Precision Oncology

Presentation transcript:

An Artificial Intelligence Approach to Precision Oncology. Alexandru Floares, MD-PhD, President of SAIA Institute & Artificial Intelligence Expert.

Motivation: Precision Oncology Goals. We want highly accurate omics tests for diagnosis, prognosis, and prediction of response to treatment, based on various omics data (microarray, NGS). Increasing the precision of measurements is a necessary but not sufficient condition for Precision Oncology (Medicine). We also need to include ML and AI (Deep Learning) tools in our bioinformatics workflows.

Bioinformatics Workflow Steps. The usual steps are quality control, preprocessing, batch-effect removal, clustering, and identification of differentially expressed genes (DEG). For most studies, the DEG list is the end result. Why? It is easy to obtain, but it has low clinical impact. For high translational impact we need Accurate, Robust (generalizing well), and Transparent predictive models. Machine Learning and Artificial Intelligence can be used to develop predictive models satisfying these ART criteria.

ML Predictive Models. The highest performance requires advanced ML methods: parameter tuning, i.e., searching for the best hyperparameters of the ML algorithms, and ensemble methods, which combine multiple models that vote for the final prediction. Bad news: this normally requires programming and ML knowledge. Good news: we automated the whole workflow. With a few clicks, one can obtain the highest performance (usually > 95% accuracy); programming and ML knowledge are not needed.
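As an illustration of these two ideas, here is a minimal, hedged sketch using scikit-learn on a synthetic expression-like matrix; the data, parameter grid, and model choices are assumptions for the example, not the tools used by the automated workflow.

# Sketch: hyperparameter tuning followed by a voting ensemble (scikit-learn).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic stand-in for a samples x genes matrix with binary labels.
X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                           random_state=0)

# Parameter tuning: search over SVM hyperparameters by cross-validation.
svm_search = GridSearchCV(SVC(probability=True),
                          param_grid={"C": [0.1, 1, 10], "gamma": ["scale", "auto"]},
                          cv=5)
svm_search.fit(X, y)

# Ensemble: combine the tuned SVM with other learners that vote on the label.
ensemble = VotingClassifier(
    estimators=[("svm", svm_search.best_estimator_),
                ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
                ("lr", LogisticRegression(max_iter=2000))],
    voting="soft")
print(cross_val_score(ensemble, X, y, cv=5).mean())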

AI vs. ML Predictive Models I. In our ML workflow, data preprocessing is needed, but it is automated and combines multiple methods. The preprocessing methods depend on the technology (e.g., microarray, NGS) and the platform (e.g., Illumina, Agilent). However, our AI workflow (OmicsBrain™) does NOT need explicit preprocessing: the AI proved capable of learning the omics data preprocessing itself. This may sound surprising, but in Computer Vision it is well known that AI can learn to preprocess images, which are much more sophisticated inputs, and AI recently outperformed humans in image classification.

AI vs. ML Predictive Models II. Both the ML and the AI approaches reached 95-100% accuracy. However, the AI approach, OmicsBrain™, learned to integrate microarray and NGS data from different platforms. Thus, it is independent of both technology and platform. This allows us to mix various datasets into one large database for training. OmicsBrain™ can then use this knowledge to learn accurate and robust models from just a few (≤ 10) new, unseen NGS or microarray cases. Usually it is hard, if not impossible, to obtain the best model from such small studies.

OmicsBrain Learned Data Integration & Preprocessing. [Slide figure: PCA of the raw data vs. PCA after OmicsBrain preprocessing, for circulating miRNA (microarray) and tissue miRNA (NGS).]

OmicsBrain learned to integrate circulating miRNA measured by microarray with tissue miRNA measured by NGS (TCGA data). In the raw data, the diagnostic classes are initially mixed and not linearly separable; OmicsBrain learned to make them linearly separable. By using both tissue and circulating miRNA, not from the same patients, we 'forced' the AI to better mirror the cellular situation from the liquid biopsy. It can use what it learned to model or predict both liquid biopsy data and cellular (tissue) data.
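For readers who want to reproduce the kind of PCA views shown on the slide above, here is a small illustrative sketch; integrate_and_preprocess is a hypothetical placeholder for the learned integration step, and X_raw / y are assumed to be loaded elsewhere.

# Sketch: 2-D PCA projections before and after a (placeholder) preprocessing step.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_2d(X):
    """Project a samples x features matrix onto its first two principal components."""
    return PCA(n_components=2).fit_transform(X)

def integrate_and_preprocess(X):
    # Placeholder: here we only z-score the features; the real system learns a
    # mapping that makes circulating and tissue miRNA profiles comparable.
    return StandardScaler().fit_transform(X)

# raw_view = pca_2d(X_raw)                               # classes overlap
# clean_view = pca_2d(integrate_and_preprocess(X_raw))   # classes separate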

OmicsBrain™ Transfer Learning. OmicsBrain™ can transfer the knowledge learned on one problem to a similar but different one. For example, it learned to discriminate between various cancers and normal tissue; then it used this knowledge to easily learn to discriminate other, unseen cancers from normal. Moreover, it only needs a few cases (e.g., ≤ 10) to reach 99-100% accuracy. It generalizes well because the main knowledge was extracted from a much larger dataset (> 5000 samples). No other ML approach can do this.
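A hedged sketch of the transfer-learning idea, written with tf.keras, is given below; it is not the OmicsBrain implementation, only one standard way to reuse a network pre-trained on a large pan-cancer dataset and fine-tune it on a handful of new cases.

# Sketch: freeze a pre-trained feature extractor, retrain only a new head.
import tensorflow as tf

def fine_tune(pretrained, X_few, y_few):
    """Fine-tune a pre-trained network on a handful of new cases."""
    pretrained.trainable = False                          # keep the learned omics features
    model = tf.keras.Sequential([
        pretrained,
        tf.keras.layers.Dense(1, activation="sigmoid")    # new cancer-vs-normal head
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    # With <= 10 cases, many epochs over one tiny batch are typically used.
    model.fit(X_few, y_few, epochs=200, batch_size=len(X_few), verbose=0)
    return model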

OmicsBrain™ Multi-Omics I. A key aspect of our AI approach is the encoding of the omics data. Ordinary ML accepts only vectorial data as input, where each sample is a vector of genes, either a row or a column. We generalized this to a tensorial encoding: a vector is a 1-D tensor, a matrix is a 2-D tensor, and so on. This allows us to include not only multi-omics data but also various types of domain knowledge: expression, methylation, and mutations can be taken as inputs, together with GO terms, pathways, etc.
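A minimal NumPy sketch of this tensorial encoding follows; the array shapes and the three layers (expression, methylation, mutation) are assumptions chosen for illustration, not the actual OmicsBrain format.

# Sketch: from per-sample gene vectors to a samples x genes x layers tensor.
import numpy as np

n_samples, n_genes = 100, 2000
expression  = np.random.rand(n_samples, n_genes)               # e.g., log2 expression
methylation = np.random.rand(n_samples, n_genes)               # e.g., beta values
mutation    = np.random.randint(0, 2, (n_samples, n_genes))    # 0/1 mutated

# Classic ML input: one vector per sample (a 1-D tensor per sample).
X_vector = expression                                          # shape (100, 2000)

# Tensorial encoding: stack the omics layers along a new axis, so each sample
# becomes a (genes, layers) matrix and the whole dataset a 3-D tensor.
X_tensor = np.stack([expression, methylation, mutation], axis=-1)
print(X_tensor.shape)   # (100, 2000, 3)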

OmicsBrain™ Multi-Omics II. Properly designed AI systems can take such tensorial data structures as learning input. While the multiple omics layers are thus mixed, we can still ask questions like: for a highly relevant gene, which aspects matter most, expression, methylation, or mutation? And/or its pathway, its GO terms, etc.?
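One simple way to ask such questions, sketched below under the assumption that a scikit-learn-style classifier was trained on the flattened tensor, is to permute one omics layer at a time and measure the drop in accuracy; the function and variable names are illustrative, not OmicsBrain's.

# Sketch: importance of each omics layer via per-layer permutation.
import numpy as np

def layer_importance(model, X_tensor, y, layer_names):
    """Accuracy drop when each omics layer is shuffled across samples."""
    base = model.score(X_tensor.reshape(len(y), -1), y)
    rng = np.random.default_rng(0)
    drops = {}
    for k, name in enumerate(layer_names):
        X_perm = X_tensor.copy()
        X_perm[:, :, k] = X_perm[rng.permutation(len(y)), :, k]   # break one layer
        drops[name] = base - model.score(X_perm.reshape(len(y), -1), y)
    return drops

# layer_importance(fitted_clf, X_tensor, y, ["expression", "methylation", "mutation"])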

Transparency & Feature Selection. We introduced and follow the ART criteria: Accuracy, Robustness, and Transparency. Two drawbacks of deep learning neural networks are: no feature selection (e.g., all genes are used for prediction) and black-box, non-transparent, non-interpretable models. OmicsBrain™ combines AI with ML for both feature selection and the development of interpretable models. The final models are developed by neural networks but interpreted either as rules or as decision trees.
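A common way to realize this last step is a surrogate model: fit a shallow decision tree to the network's predictions on the selected genes and read the tree out as rules. The sketch below illustrates the idea with scikit-learn; the network object and its predict() behavior are assumptions, and this is not the OmicsBrain code.

# Sketch: approximate a neural network with an interpretable decision tree.
from sklearn.tree import DecisionTreeClassifier, export_text

def explain_with_tree(network, X_selected, feature_names, max_depth=3):
    """Approximate the network's decision function with a shallow tree and print rules."""
    nn_labels = network.predict(X_selected)        # assumed to return class labels
    surrogate = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    surrogate.fit(X_selected, nn_labels)           # mimic the network's decisions
    return export_text(surrogate, feature_names=list(feature_names))

# print(explain_with_tree(trained_nn, X[:, selected_genes], gene_names))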

Functional Redundancy. We believe that functional redundancy is a fundamental property of living systems, related to their robustness. Thus, instead of trying to find the minimal subset of relevant genes from the beginning, we first find all relevant genes, accepting that some of them are more or less equivalent. This opens the door to choosing multiple subsets from the set of all relevant genes in a biomedically meaningful way: e.g., actionable genes, genes from a certain pathway, etc.
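As a hedged illustration of "all relevant genes first, meaningful subsets later", one could rank every gene by a simple relevance score, keep all genes above a threshold, and only then restrict to a biomedically defined subset; the score, threshold, and actionable-gene list below are assumptions for the example, not OmicsBrain's method.

# Sketch: keep all relevant genes, then pick a biomedically meaningful subset.
from sklearn.feature_selection import mutual_info_classif

def all_relevant_genes(X, y, gene_names, threshold=0.05):
    """Return every gene whose mutual information with the label passes a cutoff."""
    scores = mutual_info_classif(X, y, random_state=0)
    return {g: s for g, s in zip(gene_names, scores) if s > threshold}

# relevant = all_relevant_genes(X, y, gene_names)
# actionable_subset = {g: s for g, s in relevant.items() if g in actionable_genes}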

Interpretability and Individualization. By combining decision trees with AI, we obtain general, population-level rules. Using other techniques, we can obtain individualized rules, allowing us to ask questions like: Why was this patient diagnosed as cancer or as normal? Why will this patient progress or not? Why is this patient responding or not responding to a certain drug?
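One possible technique for such individualized explanations, shown below as a sketch, is to read off the conditions along the decision path a fitted tree takes for a single patient; the variable names are illustrative, and this is not necessarily how OmicsBrain derives its rules.

# Sketch: turn the decision path for one patient into a readable rule.
from sklearn.tree import DecisionTreeClassifier

def patient_rule(tree: DecisionTreeClassifier, x, feature_names):
    """List the conditions along the path the fitted tree takes for one patient x."""
    node_ids = tree.decision_path(x.reshape(1, -1)).indices
    t = tree.tree_
    conditions = []
    for node in node_ids:
        if t.children_left[node] == t.children_right[node]:   # skip leaf nodes
            continue
        feat, thr = feature_names[t.feature[node]], t.threshold[node]
        op = "<=" if x[t.feature[node]] <= thr else ">"
        conditions.append(f"{feat} {op} {thr:.3f}")
    return " AND ".join(conditions)

# print(patient_rule(fitted_tree, X[patient_idx], gene_names))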

Conclusion. OmicsBrain™ is an end-to-end solution capable of: automatically preprocessing your microarray or NGS data from any platform (Illumina, Agilent, etc.); developing highly accurate, robust, and interpretable molecular tests at the population and individual level; and doing so from just a few cases, without requiring programming or AI knowledge. OmicsBrain™ will be available soon at www.aie.com.

Thank You!