New Trends in Machine Learning and Data Science. Ricardo Vilalta, Dept. of Computer Science, University of Houston. September 2015.
New Trends in Machine Learning and Data Science Introduction to Machine Learning Machine Learning in Geology Transfer Learning Active Learning Deep Learning Summary
Machine Learning
Classification or Supervised Learning. Training set x = {x1, x2, …, xN}. Class or target vector y = {y1, y2, …, yk}. Find a function f(x) that takes a vector x and outputs a class y. Diagram: labeled examples {(x, y)} go into the learning algorithm, which outputs f(x).
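A minimal sketch of the idea of learning f(x) from labeled pairs {(x, y)}. The nearest-centroid rule, the toy data, and the names `fit_centroids` and `predict` are assumptions chosen for illustration; the slides do not name a specific classifier here.

```python
from collections import defaultdict

def fit_centroids(X, y):
    """Learn one mean vector (centroid) per class from labeled examples."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, x):
    """f(x): return the class whose centroid is closest to x."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda c: sq_dist(centroids[c], x))

# Toy training set {(x, y)}: two features per example, two classes (made up).
X_train = [[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]]
y_train = ["left", "left", "right", "right"]

f = fit_centroids(X_train, y_train)
print(predict(f, [0.9, 1.1]))   # left
print(predict(f, [3.1, 3.0]))   # right
```

Any other classifier could stand in for the centroid rule; the point is only that training consumes (x, y) pairs and prediction needs x alone.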
Clustering or Unsupervised Learning. Training set x = {x1, x2, …, xN}. No class or target vector is available. Find natural groups or clusters in the data. Input: unlabeled examples {x}.
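As an illustration of finding natural groups without labels, here is a small k-means sketch. k-means is swapped in for simplicity (the Mars work described later uses EM); the data, the starting centers, and the function name `kmeans` are assumptions.

```python
def kmeans(points, centers, n_iter=10):
    """Plain k-means: alternate between assigning points and moving centers."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centers[k])))
            for pt in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[k])  # keep the center of an empty cluster
        centers = new_centers
    return labels, centers

# Unlabeled toy data {x}: two visually obvious groups (made up).
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)   # [0, 0, 0, 1, 1, 1]
```

The starting centers are passed in explicitly so the run is deterministic; in practice they are usually chosen at random.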
An application of supervised learning: autonomous driving. Train a computer-controlled vehicle to steer correctly when driving on a variety of road types. The computer (learning algorithm) chooses among three classes: class 1, steer to the left; class 2, steer to the right; class 3, continue straight.
DARPA Challenge: a competition for driverless vehicles run by DARPA (Defense Advanced Research Projects Agency). First prize: $2 million, October 2005.
Other applications of supervised learning. Bio-technology: protein folding prediction, micro-array gene expression. Computer systems: performance prediction. Banking applications: credit applications, fraud detection. Character recognition (US Postal Service). Web applications: document classification, learning user preferences.
Application on the Surface of Mars: Automated Creation of Geomorphic Maps. A geomorphic map shows landforms chosen and defined by a domain expert. Figure: a Martian landscape shown as a digital elevation map, next to a manually drawn geomorphic map of the same landscape.
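Attribute Representation. Represent the surface of Mars as a quantized rectangular space composed of pixels, laid out as a grid (P1,1, P1,2, …, P1,n in the first row, down to Pn,1, … in the last row). Each pixel is described by features F1, …, Fn. Pij represents a pixel; Fi represents a feature.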
Initial Work: Unsupervised Learning. Each pixel has 6 features. Pixels are clustered using EM (expectation-maximization). The number of clusters is chosen by cross-validation. Landform categories are identified with clusters. Stepinski & Vilalta, “Digital Topography Models for Martian Surfaces”, IEEE Geoscience and Remote Sensing Letters, 2(3), p. 260, 2005.
Initial Work: Results. 12 resultant clusters. Each cluster is given a meaning a posteriori by a domain expert. Once meanings are assigned, the 12 clusters are grouped into 4 super-clusters based on meaning.
Our Approach: pixel-based topographic data (DEMs) → segmentation → object-based topographic data → supervised learning → geomorphic map(s).
Segmentation
Segmentation: Results 2631 segments homogeneous in slope, curvature and flood. Displayed on an elevation background.
Landforms of Interest (Classes): Crater Floor; Crater Wall (convex or concave); Flat Plain; Ridge (convex or concave).
Classification: Labeling. A representative subset of objects is labeled as one of the following six classes: Plain, Crater Floor, Convex Crater Walls, Concave Crater Walls, Convex Ridges, Concave Ridges. In total, 517 segments are labeled.
Classification: Results. Map legend: Plain, Crater Floor, Convex Crater Walls, Concave Crater Walls, Convex Ridges, Concave Ridges.
Perspective View
Test Site: EvrovallisW
Classification: Results. Map legend: Plain, Crater Floor, Convex Crater Walls, Concave Crater Walls, Convex Ridges, Concave Ridges.
Application on Seismic Data: Construction and Evaluation of Relevant Attributes. Attributes are selected based on their capacity to separate one class from another (e.g., salt deposit from background). Methodology: sample from inside the salt deposit and from outside it, build a training dataset from both samples, and score each attribute with statistical and information-theoretic metrics.
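The attribute-scoring step can be sketched as follows. The Fisher score used here is one possible statistical metric, chosen as an assumption; the slides do not say which metrics were used, and the attribute names and sample values are invented for illustration.

```python
from statistics import mean, pvariance

def fisher_score(inside, outside):
    """Squared difference of class means divided by the sum of class variances."""
    spread = pvariance(inside) + pvariance(outside)
    if spread == 0:
        return float("inf")
    return (mean(inside) - mean(outside)) ** 2 / spread

# Hypothetical attribute values sampled inside and outside a salt deposit.
samples = {
    "amplitude": {"inside": [0.9, 1.1, 1.0, 1.2], "outside": [0.1, 0.2, 0.0, 0.3]},
    "frequency": {"inside": [0.5, 0.7, 0.4, 0.6], "outside": [0.5, 0.6, 0.3, 0.8]},
}

# Rank attributes from most to least class-separating.
ranking = sorted(
    samples,
    key=lambda a: fisher_score(samples[a]["inside"], samples[a]["outside"]),
    reverse=True,
)
print(ranking)   # ['amplitude', 'frequency']
```

An attribute whose inside and outside samples overlap heavily scores near zero and would be a candidate for removal.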
Unsupervised Learning of Geological Bodies. Methodology: cube of seismic data → new processed training dataset (using data filters) → unsupervised learning algorithm → clustering.
Supervised Learning of Geological Bodies. Methodology: cube of seismic data → new processed training dataset (using data filters), combined with expert labels → learning algorithm. Algorithms listed: Support Vector Machines, AdaBoost, Random
Supervised Learning of Geological Bodies. Challenges: The sheer size of the 3D data cube precludes training predictive models with more than 1% of the available training data; 0.5% of the data already corresponds to 2 million voxels. Our experiments were performed on a computer with 64 GB of memory and 12 cores, and the entire data processing took days to complete. High Bayes error in classification.
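The scale implied by these figures can be worked out directly. This is simple arithmetic on the two numbers quoted above; the variable names are illustrative.

```python
sample_voxels = 2_000_000          # voxels in a 0.5% sample, as quoted above

# 0.5% is 1/200 of the cube, so the full cube is 200 times larger.
total_voxels = sample_voxels * 200
one_percent = total_voxels // 100  # size of the largest practical training set

print(total_voxels)   # 400000000
print(one_percent)    # 4000000
```

So the full cube holds about 400 million voxels, and the 1% ceiling corresponds to roughly 4 million training examples.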
Supervised Learning of Geological Bodies Challenges: Single attributes bear incomplete information about the class.
Transfer Learning The goal is to transfer knowledge gathered from previous experience. Also called Inductive Transfer or Learning to Learn. Example: Invariant transformations across tasks.
Motivation for Transfer Learning. Once a predictive model is built, there are reasons to believe the model will cease to be valid at some point in time, for example because the data distribution changes. The difference is that now the source and target domains can be completely different.
Traditional Approach to Classification. Each database (DB1, DB2, …, DBn) is handled by its own learning system, trained independently of the others.
Transfer Learning. Learning systems trained on the source domain (DB1, DB2) pass knowledge to the learning system for the target domain (DB new).
Knowledge of Parameters. Assume a prior distribution over the parameters. Source domain: learn the parameters and adjust the prior distribution. Target domain: learn the parameters using the source prior distribution.
Knowledge of Parameters. Find coefficients ws using SVMs on the source domain. Find coefficients wT using SVMs on the target domain, initializing the search with ws.
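A rough sketch of this warm start, assuming a linear SVM trained by subgradient descent on the regularized hinge loss. The solver, the hyperparameters, the toy data, and the name `train_linear_svm` are all assumptions; the slides only state that the target search starts from ws.

```python
def train_linear_svm(X, y, w_init=None, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss; labels are +1 or -1."""
    w = list(w_init) if w_init is not None else [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            for j in range(len(w)):
                # The hinge term contributes only when the margin is violated.
                grad = lam * w[j] - (label * x[j] if margin < 1 else 0.0)
                w[j] -= lr * grad
    return w

# Hypothetical source task: plenty of labeled data (last feature is a bias input).
X_src = [[2.0, 1.0, 1.0], [1.5, 2.0, 1.0], [-1.0, -1.5, 1.0], [-2.0, -1.0, 1.0]]
y_src = [1, 1, -1, -1]
w_s = train_linear_svm(X_src, y_src)

# Hypothetical target task: few labels, so start from w_s and train briefly.
X_tgt = [[1.8, 0.5, 1.0], [-1.2, -0.8, 1.0]]
y_tgt = [1, -1]
w_T = train_linear_svm(X_tgt, y_tgt, w_init=w_s, epochs=5)

print(w_s, w_T)
```

Starting from w_s rather than from zeros lets the target model reuse the source decision boundary and adjust it with only a few examples.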
Feature Transfer: the source domain and the target domain share a representation across tasks. Minimize Loss-Function(y, f(x)). The minimization is done over multiple tasks (multiple regions on Mars).
Feature Transfer. Identify features common to all tasks.
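One common way to write a multi-task objective of this kind, as a sketch under assumptions: there are M tasks (for example, M regions on Mars), task m has N_m labeled examples, φ is a hypothetical shared feature map, and f_m is a task-specific predictor. Only y, f, x, and the loss function come from the slides; the other symbols are introduced here for illustration.

```latex
\min_{\phi,\; f_1, \dots, f_M} \;\; \sum_{m=1}^{M} \frac{1}{N_m} \sum_{i=1}^{N_m}
  \mathrm{Loss}\!\left( y_i^{(m)},\; f_m\!\left( \phi\!\left( x_i^{(m)} \right) \right) \right)
```

Because φ appears in every task's term, it is pushed toward features that are useful to all tasks at once.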
Classification: Labeling. A representative subset of objects is labeled as one of the following six classes: Plain, Crater Floor, Convex Crater Walls, Concave Crater Walls, Convex Ridges, Concave Ridges. In total, 517 segments are labeled.
Active Learning: Pool-Based Sampling. Assume a small set of labeled examples and a large set of unlabeled examples. The learning algorithm evaluates and ranks the whole set of unlabeled examples; we then choose one or more of them to be labeled.
Sampling Based on Uncertainty. Figure panels: 70% accuracy; 90% accuracy. Figure taken from “Active Learning” by Burr Settles, Morgan & Claypool, 2012.
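Pool-based sampling with an uncertainty criterion can be sketched as below. The logistic model, its weights, the pool, and the function names are assumptions for illustration; any model that outputs class probabilities would serve.

```python
import math

def prob_positive(w, b, x):
    """Logistic model: estimated probability that x belongs to the positive class."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def rank_by_uncertainty(w, b, pool):
    """Sort unlabeled examples so the least certain (p closest to 0.5) come first."""
    return sorted(pool, key=lambda x: abs(prob_positive(w, b, x) - 0.5))

# Hypothetical current model and pool of unlabeled examples.
w, b = [1.0, -1.0], 0.0
pool = [[3.0, 0.0], [0.2, 0.1], [-2.0, 1.0], [1.0, 0.5]]

# Pick the two examples the model is least sure about and send them to the expert.
queries = rank_by_uncertainty(w, b, pool)[:2]
print(queries)   # [[0.2, 0.1], [1.0, 0.5]]
```

After the expert labels the queried examples, they move to the labeled set, the model is retrained, and the ranking is repeated.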
Deep Learning. The idea is to disentangle factors of variation and to attain high-level representations. Hierarchy in the figure, from highest to lowest level: Commercial Planes, Military Planes; Engine, Main Fuselage; Small Object Parts; Edges and Contours; Pixel Information.
Deep Learning. We want to capture compact, high-level representations in an efficient and iterative manner. Learning takes place at several levels of representation. Think about a hierarchy of concepts of increasing complexity. Low-level concepts are the foundation for high-level concepts.
Deep Learning. Deep learning methods are important for handling the credit-assignment problem in deep neural networks: when the network makes an error, who is to blame?
Deep Learning. Deep Learning has gained popularity in recent years. Application areas: military, automotive, surveillance, financial, medical.
Deep Learning. There are three basic types of deep networks: (1) Deep networks for unsupervised or generative learning, which capture high-order correlations of the data (no class labels). (2) Deep networks for supervised learning (discriminative deep networks), which model the posterior distribution of the target variable for classification purposes. (3) Hybrid deep networks, which combine the methods above.
Deep Learning Deep Networks for Unsupervised Learning There are no class labels during the learning process. There are many types of generative or unsupervised deep networks. Energy-based deep networks are the most popular. Example: Deep Auto Encoder.
Deep Learning Auto Encoder
Deep Learning: Auto Encoder. The number of output features equals the number of input features. Intermediate nodes encode the original data.
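A very small auto encoder can make this concrete. The sketch below assumes a purely linear network with two inputs, one intermediate node, and two outputs, trained by stochastic gradient descent on the squared reconstruction error; the data, learning rate, and function names are illustrative choices, not taken from the slides.

```python
def reconstruct(x, w_enc, w_dec):
    """Encode x into one hidden value, then decode back to the input size."""
    h = sum(w * xi for w, xi in zip(w_enc, x))   # intermediate (code) node
    return h, [v * h for v in w_dec]             # as many outputs as inputs

def mean_squared_error(data, w_enc, w_dec):
    total = 0.0
    for x in data:
        _, x_hat = reconstruct(x, w_enc, w_dec)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)

def train_autoencoder(data, epochs=1000, lr=0.005):
    """Stochastic gradient descent on the squared reconstruction error."""
    n = len(data[0])
    w_enc, w_dec = [0.1] * n, [0.1] * n
    for _ in range(epochs):
        for x in data:
            h, x_hat = reconstruct(x, w_enc, w_dec)
            err = [a - b for a, b in zip(x_hat, x)]
            # Gradient w.r.t. the hidden value, using the decoder before its update.
            grad_h = 2 * sum(e * v for e, v in zip(err, w_dec))
            w_dec = [v - lr * 2 * e * h for v, e in zip(w_dec, err)]
            w_enc = [w - lr * grad_h * xi for w, xi in zip(w_enc, x)]
    return w_enc, w_dec

# Toy data lying on a line, so a single intermediate node can encode it.
data = [[1.0, 2.0], [2.0, 4.0], [-1.0, -2.0], [0.5, 1.0], [-2.0, -4.0]]
before = mean_squared_error(data, [0.1, 0.1], [0.1, 0.1])
w_enc, w_dec = train_autoencoder(data)
after = mean_squared_error(data, w_enc, w_dec)
print(before, after)
```

The target of training is the input itself, which is why no class labels are needed; practical auto encoders add nonlinear activations and more nodes.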
Deep Learning “Deep” Auto Encoder Key idea: Pre-train each layer as an auto-encoder.
An Example in Deep Learning Learn a “concept” (sedimentary rocks) from many images until a high-level representation is achieved.
An Example in Deep Learning. Learn a hierarchy of abstract concepts using deep learning, going from local properties at the lowest level to global properties at the highest level.
Deep Learning. There are three basic types of deep networks: (1) Deep networks for unsupervised or generative learning, which capture high-order correlations of the data (no class labels). (2) Deep networks for supervised learning (discriminative deep networks), which model the posterior distribution of the target variable for classification purposes. (3) Hybrid deep networks, which combine the methods above.
Deep Learning: Convolutional Neural Networks. Local weight update, which implies a sparse representation.
Deep Learning. The idea is still to find a minimum of the error function E(W) in the space of weights. Figure axes: w1, w2, E(W).
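Searching for a minimum of E(W) over two weights can be illustrated with plain gradient descent. The quadratic error surface below is a made-up stand-in for a network's error function, and the step size and iteration count are arbitrary assumptions.

```python
def error(w):
    """Hypothetical error surface E(W) with its minimum at w1 = 1, w2 = -2."""
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 2.0) ** 2

def gradient(w):
    """Partial derivatives of E with respect to w1 and w2."""
    return [2.0 * (w[0] - 1.0), 4.0 * (w[1] + 2.0)]

def gradient_descent(w, lr=0.1, steps=100):
    """Repeatedly step against the gradient to move downhill on E(W)."""
    for _ in range(steps):
        g = gradient(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

w_min = gradient_descent([4.0, 3.0])
print(w_min, error(w_min))
```

In a real network the gradient is obtained by backpropagation and the surface has many local minima, so the result depends on the starting weights.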
Deep Learning. Network layers in the figure: output nodes, internal nodes, input nodes.
Deep Learning on Seismic Data. Methodology: cube of seismic data → new training dataset, combined with expert labels → learning algorithm (deep learning).
Supervised Learning of Geological Bodies Challenges: Single attributes bear incomplete information about the class.
Supervised Learning of Geological Bodies Challenges: Deep learning can capture “global” features that detect entire geological bodies as the result of the non-linear combination of many local models.
Deep Learning on Seismic Data. Decompose the seismic cube into small cubes to create a large number of examples.
Deep Learning on Seismic Data Each cube is an example that we can feed into a deep learning architecture.
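The decomposition into small cubes might look like the sketch below. It assumes non-overlapping blocks and a volume stored as nested Python lists; the cube size, the toy volume, and the name `split_into_cubes` are illustrative, and a real pipeline could equally use overlapping windows.

```python
def split_into_cubes(volume, size):
    """Cut a 3D nested list into non-overlapping size x size x size blocks."""
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    cubes = []
    for i in range(0, nx - size + 1, size):
        for j in range(0, ny - size + 1, size):
            for k in range(0, nz - size + 1, size):
                cube = [[volume[a][b][k:k + size]
                         for b in range(j, j + size)]
                        for a in range(i, i + size)]
                cubes.append(cube)
    return cubes

# Hypothetical 4 x 4 x 4 volume whose voxel value encodes its position.
volume = [[[100 * a + 10 * b + c for c in range(4)]
           for b in range(4)]
          for a in range(4)]

cubes = split_into_cubes(volume, size=2)
print(len(cubes))   # 8
print(cubes[0])     # [[[0, 1], [10, 11]], [[100, 101], [110, 111]]]
```

Each element of `cubes` is one training example; pairing it with an expert label for the same location gives a supervised dataset.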
Summary. When we have similar classification tasks but there is indication that the distributions have changed: Transfer Learning. When we have few training examples and labeling is expensive: Active Learning. When we need more abstract features: Deep Learning.
Conclusions. Deep Learning can provide new high-level global features. Entire global geological structures can be identified by combining low-level feature representations of seismic data.
THANK YOU