Machine Learning 102 Jeff Heaton

Jeff Heaton: Data Scientist, RGA; PhD Student, Computer Science; Author. jheaton@rgare.com

What is Data Science? Drew Conway's Venn diagram: the intersection of hacking skills, statistics, and real-world knowledge.

My Books: Artificial Intelligence for Humans (AIFH)

Where to Get the Code? All links are at my blog: http://www.jeffheaton.com All code is at my GitHub site: https://github.com/jeffheaton/aifh See AIFH volumes 1 & 3.

What is Machine Learning? Making sense of potentially huge amounts of data. Models learn from existing data to make predictions with new data.
Clustering: Group records together that have similar field values. Often used for recommendation systems (e.g. group customers with similar buying habits).
Regression: Learn to predict a numeric outcome field, based on all of the other fields present in each record (e.g. predict a student's graduating GPA).
Classification: Learn to predict a non-numeric outcome field (e.g. predict the field of a student's first job after graduation).

Evolution of ML: From Simple Models to State of the Art

Supervised Training: Learning From Data

Simple Linear Relationship (Conversion)

import java.util.Scanner;

class CelsiusToFahrenheit {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.println("Enter temperature in Celsius: ");
        double temperature = in.nextDouble();
        temperature = (temperature * 1.8) + 32;
        System.out.println("Temperature in Fahrenheit = " + temperature);
        in.close();
    }
}

Simple Linear Relationship (Regression)

import java.util.Scanner;

class CelsiusRegression {
    public static double regression(double x) {
        return (x * 1.8) + 32;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.println("Enter temperature in Celsius: ");
        double temperature = in.nextDouble();
        System.out.println("Temperature in Fahrenheit = " + regression(temperature));
        in.close();
    }
}

Linear Regression: A Simple Linear Relationship. Examples: shoe size predicted by height; Fahrenheit from Celsius. We must fit a line, which has two coefficients (or parameters). There are many ways to estimate the parameters.

Multiple Regression (Multiple Inputs)

import java.util.Scanner;

class MultipleRegression {
    public static double regression(double[] x, double[] param) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * param[i + 1];
        }
        sum += param[0];
        return sum;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        double[] x = new double[1];
        x[0] = in.nextDouble();
        double[] param = { 32, 1.8 };
        System.out.println(regression(x, param));
        in.close();
    }
}

Multi-Linear Regression: Higher-Dimension Regression. What if you want to predict shoe size based on height and age? x1 = height, x2 = age; determine the betas. Three parameters.
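The multi-input regression form can be sketched for the shoe-size example like this; the three coefficients below are made-up values for illustration only, not fitted to any real data.

```java
public class MultiRegressionDemo {
    // param[0] is the intercept; param[i + 1] multiplies input x[i].
    public static double regression(double[] x, double[] param) {
        double sum = param[0];
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * param[i + 1];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical betas: shoeSize = 2.0 + 0.15*height(cm) + 0.05*age
        double[] param = {2.0, 0.15, 0.05};
        double[] x = {180.0, 30.0}; // height = 180 cm, age = 30
        System.out.println(regression(x, param)); // 2.0 + 27.0 + 1.5 = 30.5
    }
}
```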

Generalized Linear Regression (GLM)

public static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-1 * x));
}

public static double regression(double[] x, double[] param) {
    double sum = 0;
    for (int i = 0; i < x.length; i++) {
        sum += x[i] * param[i + 1];
    }
    sum += param[0];
    return sigmoid(sum);
}

Sigmoid Function S-Shaped Curve

Generalized Linear Model (GLM): Linear regression passed through a link function. Essentially a single-layer neural network. The link function might be the sigmoid or another function.

Artificial Neural Network (ANN): Multiple inputs (x); the weighted inputs are summed; the summation plus a bias is fed to an activation function (as in a GLM). Bias = intercept; activation function = link function.

Neural Network with Several Layers (Multi-Layer ANN): Multiple layers can be formed; neurons receive their input from other neurons, not just from the inputs. Multiple outputs are possible.
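The layered computation described above can be sketched as a feed-forward pass: each layer applies the same sum-bias-sigmoid rule, and one layer's outputs become the next layer's inputs. The weights here are arbitrary placeholders, not trained values.

```java
public class FeedForwardSketch {
    public static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // One layer: neuron j computes sigmoid(bias + weighted sum of inputs).
    // weights[j][0] is neuron j's bias; weights[j][i + 1] multiplies input i.
    public static double[] layer(double[] in, double[][] weights) {
        double[] out = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double sum = weights[j][0];
            for (int i = 0; i < in.length; i++) {
                sum += in[i] * weights[j][i + 1];
            }
            out[j] = sigmoid(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] input = {0.5, -0.2};
        // Arbitrary weights: 2 hidden neurons, then 1 output neuron.
        double[][] hidden = {{0.1, 0.4, -0.6}, {-0.3, 0.8, 0.2}};
        double[][] output = {{0.05, 1.0, -1.0}};
        double[] h = layer(input, hidden); // hidden layer feeds the output layer
        double[] y = layer(h, output);
        System.out.println(y[0]);
    }
}
```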

Training/Fitting: How do we find the weight/coefficient/beta values? It depends on whether the loss function is differentiable or non-differentiable. Options: gradient descent, genetic algorithms, simulated annealing, Nelder-Mead.

Gradient Descent: Finding Optimal Weights. The loss function must be differentiable. (Gradient boosting, which combines the best of ensemble tree learning and gradient descent, is one of the most effective machine learning models used on Kaggle.)
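As a sketch of the idea, gradient descent can recover the two Celsius-to-Fahrenheit coefficients (1.8 and 32) from sample data by repeatedly stepping each parameter against the gradient of the squared error. The learning rate and iteration count below are arbitrary choices, not tuned values.

```java
public class GradientDescentDemo {
    // Fit y = slope*x + intercept by gradient descent on squared error.
    public static double[] fit(double[] x, double[] y, double lr, int iters) {
        double slope = 0, intercept = 0;
        for (int iter = 0; iter < iters; iter++) {
            double gSlope = 0, gIntercept = 0;
            for (int i = 0; i < x.length; i++) {
                double err = (slope * x[i] + intercept) - y[i];
                gSlope += err * x[i]; // partial derivative w.r.t. slope (up to a constant)
                gIntercept += err;    // partial derivative w.r.t. intercept
            }
            slope -= lr * gSlope;
            intercept -= lr * gIntercept;
        }
        return new double[]{slope, intercept};
    }

    public static void main(String[] args) {
        // Training data: Celsius inputs and their Fahrenheit targets.
        double[] x = {-40, 0, 20, 37, 100};
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) y[i] = x[i] * 1.8 + 32;

        double[] p = fit(x, y, 0.0001, 200000);
        System.out.println("slope=" + p[0] + " intercept=" + p[1]);
        // Converges to slope close to 1.8 and intercept close to 32.
    }
}
```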

Deep Learning: A Neural Network Trying to Be Deep

Deep Learning: Finding Optimal Weights

Deep Learning Overview: Deep learning layers can be trained individually. Highly parallel. Data can be both supervised (labeled) and unsupervised. The feature vector must be binary. Very often used for audio and video recognition.

Case Study: Titanic. A Kaggle tutorial competition. Predict the outcome (survived or perished) from passenger features: gender, name, passenger class, age, family members present, port of embarkation, cabin, ticket.

Titanic Passenger Data Can you predict the survival (outcome) of a Titanic passenger, given these attributes (features) of each passenger?

Insights into Data: Is the name field useful? Can it help us extrapolate ages for passengers with no age listed? Examples: Moran, Mr. James; Williams, Mr. Charles Eugene; Emir, Mr. Farred Chehab; O'Dwyer, Miss. Ellen "Nellie"; Todoroff, Mr. Lalio; Spencer, Mrs. William Augustus (Marie Eugenie); Glynn, Miss. Mary Agatha; Moubarek, Master. Gerios.

Title Insights: Beyond age, what can titles tell us about these passengers? Other passengers of the Titanic: Carter, Rev. Ernest Courtenay; Weir, Col. John; Minahan, Dr. William Edward; Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards); Crosby, Capt. Edward Gifford; Peuchen, Major. Arthur Godfrey; Sagesser, Mlle. Emma.

Baseline Titanic Stats: Passengers in the Kaggle train set: 891. Passengers that survived: 38%. Male survival: 19%. Female survival: 74%. These stats form baselines for us to compare with other potentially significant features.

Titles Affect Survival

Title     |   # | Survived                      | Mean Age
Master    |  76 | 58%                           |
Mr.       | 915 | 16%                           |
Miss.     | 332 | 71%                           | 21.8
Mrs.      | 235 | 79%                           | 36.9
Military  |  10 | 40%                           |
Clergy    |  12 | 0%                            | 41.3
Nobility  |     | 60% (33% male, 100% female)   | 41.2
Doctor    |  13 | 46% (36% male)                | 43.6

The titles of passengers seemed to affect survival. Baseline: 38% overall; male 19%, female 74%.

Departure & Survival

Port        |   # | Survived | Male | Female
Queenstown  |  77 | 39%      |  7%  | 75%
Southampton | 664 | 33%      | 17%  | 68%
Cherbourg   | 168 | 55%      | 30%  | 88%

The departure port seemed to affect survival. Baseline: 38% overall; male 19%, female 74%.

Outliers: Lifeboat #1. The 4th lifeboat launched from the RMS Titanic, at 1:05 am. It had a capacity of 40 but was launched with only 12 aboard: 10 men, 2 women. Lifeboat #1 caused a great deal of controversy; it refused to return to pick up survivors in the water. Its passengers are outliers and would not be easy to predict. We should not attempt to predict outliers; perfect scores are usually bad.

Titanic Model Strategy: Use both the test & train sets when interpolating missing values. Use a feature vector that includes titles. Use 5-fold cross-validation for model selection & training. Model choice: RBF neural network. Training strategy: particle swarm optimization (PSO). Submit the best model from the 5 folds to Kaggle. This is the design that I used to submit an entry to Kaggle.

Cross-Validation: Cross-validation uses a portion of the available data to validate our model, holding out a different portion in each cycle.
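A minimal sketch of how the 5-fold split could be arranged: assign each row to a fold, then in each cycle hold one fold out for validation and train on the rest. Only the index bookkeeping is shown; the model itself is omitted.

```java
import java.util.ArrayList;
import java.util.List;

public class FoldSplitSketch {
    // Assign each of n row indices to one of k folds, round-robin.
    public static int[] assignFolds(int n, int k) {
        int[] fold = new int[n];
        for (int i = 0; i < n; i++) fold[i] = i % k;
        return fold;
    }

    public static void main(String[] args) {
        int n = 891, k = 5; // 891 rows in the Kaggle Titanic train set
        int[] fold = assignFolds(n, k);
        for (int held = 0; held < k; held++) {
            List<Integer> train = new ArrayList<>();
            List<Integer> valid = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                // The held-out fold validates; everything else trains.
                if (fold[i] == held) valid.add(i); else train.add(i);
            }
            System.out.println("Fold " + held + ": train=" + train.size()
                + " validate=" + valid.size());
        }
    }
}
```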

My Feature Vector
These are the 13 features I used to encode for Kaggle:
- Age: the interpolated age, normalized to -1 to 1.
- Sex-male: the gender, normalized to -1 for female, 1 for male.
- Pclass: the passenger class [1-3], normalized to -1 to 1.
- Sibsp: value from the original data set, normalized to -1 to 1.
- Parch: value from the original data set, normalized to -1 to 1.
- Fare: the interpolated fare, normalized to -1 to 1.
- Embarked-c: 1 if the passenger embarked from Cherbourg, -1 otherwise.
- Embarked-q: 1 if the passenger embarked from Queenstown, -1 otherwise.
- Embarked-s: 1 if the passenger embarked from Southampton, -1 otherwise.
- Name-mil: 1 if the passenger had a military prefix, -1 otherwise.
- Name-nobility: 1 if the passenger had a noble prefix, -1 otherwise.
- Name-Dr.: 1 if the passenger had a doctor prefix, -1 otherwise.
- Name-clergy: 1 if the passenger had a clergy prefix, -1 otherwise.
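A sketch of the kind of normalization the feature list describes, assuming simple min-max scaling into the range -1 to 1; the exact encoding used for the actual submission may differ in details such as the assumed value ranges.

```java
public class NormalizeSketch {
    // Min-max scale a value from [min, max] into [-1, 1].
    public static double normalize(double value, double min, double max) {
        return ((value - min) / (max - min)) * 2.0 - 1.0;
    }

    public static void main(String[] args) {
        // Pclass is 1..3, so class 2 lands exactly in the middle of [-1, 1].
        System.out.println(normalize(2, 1, 3));   // 0.0
        // Age range 0..80 is an assumption about the data set.
        System.out.println(normalize(40, 0, 80)); // 0.0
        System.out.println(normalize(80, 0, 80)); // 1.0
    }
}
```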

Submitting to Kaggle: This is the design that I used to submit an entry to Kaggle.

Other Resources
Here are some web resources I've found useful:
- Microsoft Azure Machine Learning: http://azure.microsoft.com/en-us/services/machine-learning/
- Johns Hopkins COURSERA Data Science: https://www.coursera.org/specialization/jhudatascience/1
- KDNuggets: http://www.kdnuggets.com/
- R Studio: http://www.rstudio.com/
- CARET: http://cran.r-project.org/web/packages/caret/index.html
- scikit-learn: http://scikit-learn.org/stable/

Thank you. Any questions? www.jeffheaton.com