Machine Learning 101
Intro to AI, ML, Deep Learning
Intro to Machine learning
Intro to Deep learning
Neural networks
Convolutional neural networks
Computational biology/genomics
My try at ML and deep learning

I've put together a few slides to present my understanding of various machine learning and deep learning algorithms and how they can be useful to us. I also gave 7 different algorithms a go and applied them to two different datasets. Richard gave a talk last week: Machine learning applications in genetics and genomics (2015); Nicola gave one on 'How the machine 'thinks''; I gave one last year on DeepSEA, which utilises convolutional neural networks to make predictions on the functional effects of a variant. I've read some books on this, and I hope to show that we're actually already in the ML world, maybe without realising.

Computerphile: Deep Learning
URL: https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/ NVIDIA Blog
AI: anything that mimics human intelligence, e.g. computations, calculations
ML: give the computer an algorithm/paradigm/model (i.e. the distinguishing features) and it will try to predict/classify; this is what we're doing with our regression models
DL: give it (clean) data and it will extract the features and do the classification itself
Source URL: towardsdatascience.com
Machine Learning 101
"Computers' ability to learn without being explicitly programmed"
Make predictions using (features extracted from) data, e.g. email filters: spam or not?
Supervised v unsupervised methods
Consideration: categorical v continuous data

We're surrounded by ML. Just to give a few examples of where ML is used: email filters, smart watches, 'you may be interested in' suggestions, anything with prediction.

Data
Labelled data (supervised):
1- Predict category/cluster: Classification methods (e.g. SVM, CART)
2- Quantify: Regression methods (e.g. Linear Regression, LASSO)

Hidden structure (unsupervised):
1- Group data: Cluster analysis methods (e.g. K-means, Hierarchical)
2- Assign value to each data point: Dimensionality reduction methods (e.g. PCA, ICA)
3- Predict outcome probability from categorical data: Bayesian methods (e.g. HMMs)

PCA: Principal component analysis. Converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables
CART: Classification And Regression Trees
SVM: Support vector machine
LASSO: Least Absolute Shrinkage and Selection Operator
ICA: Independent component analysis
HMM: Hidden Markov model
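To make the "hidden structure" branch concrete, here is a minimal sketch of K-means on one-dimensional data. The values and the starting centroids are made up for illustration; the point is only that the algorithm is never shown any labels and still recovers the two groups.

```python
def kmeans_1d(values, centroids, n_iter=10):
    """Plain K-means on 1-D data: assign each value to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    centroids = list(centroids)
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[k]
            for k, c in enumerate(clusters)
        ]
    labels = [
        min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
        for v in values
    ]
    return centroids, labels


# Hypothetical measurements forming two obvious groups (no labels supplied).
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centroids, labels = kmeans_1d(values, centroids=[0.0, 6.0])
print(labels)      # cluster index per value: [0, 0, 0, 1, 1, 1]
print(centroids)   # roughly [1.0, 5.0]
```

A supervised method would instead be handed the group membership up front and asked to predict it for new data.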
Many ML algorithms
Notes: HMMs; mention SVM procedure
Deep learning 101
Inspired by the neurons (and their axons) of a biological brain
Application of artificial neural networks that contain more than one hidden layer to learning tasks
Performance keeps improving with the amount of data* (performance ∝ data)
Unlike traditional methods (e.g. logistic regression, SVM), where performance plateaus beyond a certain amount of data
This is due to larger neural networks being formed
Today: ML is becoming synonymous with deep learning (e.g. Google, Kaggle), because DL is outperforming traditional approaches in prediction
*labelled, clean/precise and relevant data
Neural networks 101
A simple example: univariate linear regression => one neuron
ReLU: Rectified linear unit
DeepSEA utilises what is called a CNN, and to understand CNNs you need to know something about neural networks
A. Ng
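The "one neuron" idea can be sketched in a few lines: a single weight and bias applied to one input (house size), passed through a ReLU so the predicted price never goes below zero. The weight and bias below are hypothetical, hand-picked numbers, not fitted values.

```python
def relu(z):
    """Rectified linear unit: pass positive values through, clip negatives to 0."""
    return max(0.0, z)


def neuron(size, weight, bias):
    """One neuron = univariate linear regression followed by a ReLU."""
    return relu(weight * size + bias)


# Hypothetical parameters: price (in thousands) as a function of size (m^2).
WEIGHT = 0.5
BIAS = -20.0

for size in (30, 100, 200):
    print(size, neuron(size, WEIGHT, BIAS))
# 30 0.0
# 100 30.0
# 200 80.0
```

In practice the weight and bias are learned from data; stacking many such neurons in layers gives a neural network.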
More complicated housing price predictor: multivariate linear regression
Convolutional Neural Networks
CNNs are widely used in image recognition
Convolution can be thought of as a sliding window function applied to a matrix
Convolution is the process of adding each element of the image to its local neighbours, weighted by the kernel
In the example: window size 3x3 and stride 1
*Yellow box: kernel (convolution matrix)
Source: www.wildml.com
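The sliding window can be sketched in plain Python as below. The 5x5 binary image and the 3x3 kernel are illustrative values chosen for this sketch, and, as is usual in CNNs, the kernel is applied without flipping (strictly speaking a cross-correlation).

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and, at each position,
    sum the element-wise products of the window and the kernel."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for top in range(0, len(image) - k_rows + 1, stride):
        row = []
        for left in range(0, len(image[0]) - k_cols + 1, stride):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Illustrative 5x5 image and 3x3 kernel (window size 3x3, stride 1).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

A 5x5 input with a 3x3 window and stride 1 gives a 3x3 output, because the window fits in three positions along each axis.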
Averaging each pixel with its neighbouring values blurs an image
Taking the difference between a pixel and its neighbours detects edges
Image: 256 x 256 pixels
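Both effects come from the choice of kernel. The sketch below uses a tiny made-up image with a sharp vertical edge (dark on the left, bright on the right), a 3x3 averaging kernel, and a kernel that takes four times the centre pixel minus its four direct neighbours; these particular kernels are common textbook choices, assumed here for illustration.

```python
def convolve2d(image, kernel):
    """Sliding-window sum of products, stride 1, no padding."""
    k = len(kernel)
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(len(image[0]) - k + 1)
        ]
        for r in range(len(image) - k + 1)
    ]


# Averaging kernel: each output pixel is the mean of a 3x3 neighbourhood.
BLUR = [[1 / 9] * 3 for _ in range(3)]

# Difference kernel: 4 x centre pixel minus its up/down/left/right neighbours.
EDGE = [
    [0, -1, 0],
    [-1, 4, -1],
    [0, -1, 0],
]

# Hypothetical 5x5 image: dark (0) on the left, bright (9) on the right.
image = [[0, 0, 9, 9, 9] for _ in range(5)]

blurred = convolve2d(image, BLUR)
edges = convolve2d(image, EDGE)

print(blurred[0])  # roughly [3, 6, 9]: the sharp step is smoothed into a ramp
print(edges[0])    # [-9, 9, 0]: non-zero at the edge, zero in the flat region
```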
Continued…
[Figure labels: Kernel(s), Kernel(s); Feature extraction; Classification]
Feature extraction output: feature vector -> (hopefully!) a summary of all the data
This is then attached to a neural network
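The two stages can be sketched end to end: a kernel turns the image into a feature map, the map is flattened into a feature vector, and that vector is fed to a classifier (here a single sigmoid neuron standing in for the neural network). The kernel, weights and bias are hand-picked for illustration; in a real CNN they are all learned during training.

```python
import math


def convolve2d(image, kernel):
    """Sliding-window sum of products, stride 1, no padding."""
    k = len(kernel)
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(len(image[0]) - k + 1)
        ]
        for r in range(len(image) - k + 1)
    ]


def extract_features(image, kernel):
    """Feature extraction: convolve, apply ReLU, flatten into a vector."""
    feature_map = convolve2d(image, kernel)
    return [max(0, value) for row in feature_map for value in row]


def classify(features, weights, bias):
    """Classification: one sigmoid neuron on top of the feature vector."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, hand-picked parameters (normally learned from data).
VERTICAL_EDGE = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
WEIGHTS = [0.05] * 4
BIAS = -2.0

edge_image = [[0, 0, 9, 9] for _ in range(4)]   # contains a vertical edge
blank_image = [[0, 0, 0, 0] for _ in range(4)]  # contains nothing

edge_features = extract_features(edge_image, VERTICAL_EDGE)
p_edge = classify(edge_features, WEIGHTS, BIAS)
p_blank = classify(extract_features(blank_image, VERTICAL_EDGE), WEIGHTS, BIAS)

print(edge_features)    # [27, 27, 27, 27]
print(p_edge, p_blank)  # high probability for the edge image, low for the blank one
```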
The classical machine learning workflow: (i) data pre-processing, (ii) feature extraction, (iii) model learning and (iv) model evaluation.
Supervised machine learning methods relate input features x to an output label y, whereas unsupervised methods learn factors about x without observed labels.
Raw input data are often high-dimensional and related to the corresponding label in a complicated way, which is challenging for many classical machine learning algorithms (left plot). Alternatively, higher-level features extracted using a deep model may be able to better discriminate between classes (right plot).
Deep networks use a hierarchical structure to learn increasingly abstract feature representations from the raw data.
Computational biology: variant effect predictors (e.g. DeepSEA), genome annotations, PC: population stratification, prediction
Angermueller et al., 2016
Working examples
Is someone athletic or not?
COPD or not?
7 ML algorithms:
Logistic regression
LDA
CART
Naïve Bayes
K-nearest neighbour
SVM
RBM feature extraction with logistic regression classifier
COPD input
Age, sex, ever smoked, air pollution, genetic risk score -> COPD or not
Could have added pack years and, of course, lung function measures
Logistic regression
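As a rough sketch of what the logistic regression classifier does, the code below fits one by gradient descent on a tiny made-up dataset with just two of the inputs (age and ever smoked). The data, the learning rate and the number of iterations are assumptions for illustration only and are not the dataset or settings used for the results in this talk.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Predicted probability of COPD for one person."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def mean_log_loss(weights, bias, X, y):
    total = 0.0
    for features, label in zip(X, y):
        # Clip to avoid log(0) if a prediction saturates.
        p = min(max(predict_proba(weights, bias, features), 1e-12), 1 - 1e-12)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(X)


def train(X, y, lr=0.5, n_iter=2000):
    """Batch gradient descent on the mean log loss."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(n_iter):
        errors = [predict_proba(weights, bias, f) - label for f, label in zip(X, y)]
        weights = [
            w - lr * sum(e * f[j] for e, f in zip(errors, X)) / n
            for j, w in enumerate(weights)
        ]
        bias -= lr * sum(errors) / n
    return weights, bias


# Made-up toy data: age in decades, ever smoked (0/1), COPD (0/1).
ages = [4.0, 5.0, 6.0, 7.0, 4.5, 5.5, 6.5, 7.5]
ever_smoked = [0, 0, 0, 0, 1, 1, 1, 1]
copd = [0, 0, 0, 1, 0, 1, 1, 1]

mean_age = sum(ages) / len(ages)
X = [(age - mean_age, smoked) for age, smoked in zip(ages, ever_smoked)]  # centre age

initial_loss = mean_log_loss([0.0, 0.0], 0.0, X, copd)
weights, bias = train(X, copd)
final_loss = mean_log_loss(weights, bias, X, copd)

p_high = predict_proba(weights, bias, (7.5 - mean_age, 1))  # 75-year-old smoker
p_low = predict_proba(weights, bias, (4.0 - mean_age, 0))   # 40-year-old non-smoker
print(initial_loss, final_loss)
print(p_high, p_low)
```

The output is a probability of COPD per person; thresholding it (for example at 0.5) gives the "COPD or not" call.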
K-Nearest neighbours classifier
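K-nearest neighbours needs no training step: a new person is classified by a majority vote among the k most similar people already labelled. The sketch below uses made-up data for the "athletic or not" example, with two hypothetical features (hours of exercise per week, resting heart rate); in real use the features should be put on comparable scales first.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up data: (hours of exercise per week, resting heart rate) -> athletic (1) or not (0).
train_X = [(1, 80), (2, 78), (0, 85), (8, 55), (10, 52), (7, 60)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (9, 54)))  # 1
print(knn_predict(train_X, train_y, (1, 82)))  # 0
```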
Linear discriminant analysis
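For two classes, LDA boils down to finding the direction w that best separates the class means relative to the spread within each class, then thresholding the projection onto w. The sketch below does this for two features using the closed-form inverse of a 2x2 matrix; it assumes equal class priors, and the data are the same made-up "athletic or not" style values used purely for illustration.

```python
def mean_vector(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, mean):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around `mean`."""
    sxx = sum((p[0] - mean[0]) ** 2 for p in points)
    sxy = sum((p[0] - mean[0]) * (p[1] - mean[1]) for p in points)
    syy = sum((p[1] - mean[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fit_lda(class0, class1):
    """Two-class LDA: w = Sw^-1 (m1 - m0), threshold at the projected midpoint."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    a0, b0, d0 = scatter(class0, m0)
    a1, b1, d1 = scatter(class1, m1)
    a, b, d = a0 + a1, b0 + b1, d0 + d1  # pooled within-class scatter [[a, b], [b, d]]
    det = a * d - b * b
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [(d * diff[0] - b * diff[1]) / det, (-b * diff[0] + a * diff[1]) / det]
    midpoint = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold


def predict(w, threshold, point):
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 0


# Made-up data: (hours of exercise per week, resting heart rate).
not_athletic = [(1, 80), (2, 78), (0, 85)]
athletic = [(8, 55), (10, 52), (7, 60)]

w, threshold = fit_lda(not_athletic, athletic)
print(w)                               # roughly [-1.0, -0.75]
print(predict(w, threshold, (9, 54)))  # 1
print(predict(w, threshold, (1, 82)))  # 0
```

Note that the sign of each weight reflects the correlation between the two features as well as their individual effects, so individual weights should be interpreted with care.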
Appendix
Slide by: D. Evans. California Pacific Medical Center