Machine Learning.

Presentation transcript:

Machine Learning

Teaser – Zalando’s Next Top Analyst. Find him/her from our current info; this is what we could extract from independent sources: The candidate is likely to be close to the river Spree; the probability at any point is given by a Gaussian function of its shortest distance to the river, peaking at zero and with 95% of its total integral within ±2730 m. A probability distribution centered around the Brandenburg Gate also informs us of the candidate’s location; its radial profile is log-normal with a mean of 4700 m and a mode of 3877 m in every direction. A satellite offers further information: with 95% probability the candidate is located within 2400 m of the satellite’s path (assuming a normal probability distribution).

Practical Stuff. Lots of students, and it seems lots of teachers. Hand-ins are mandatory – not part of the grade, but part of the curriculum. Exam: 30 min, no preparation. Introductory course: the focus is on understanding, not on going through as much as possible. Complain and ask questions now, not (only) after the course.

Machine Learning Jungle. Notation, paradigms, techniques, models, assumptions… every book does it differently.

Movie Ratings. [Figure: a user × movie rating matrix for movies such as Flying Sharks and Tom Cruise …, with ratings up to 10 and many missing entries.]

Netflix Competition. A $1,000,000 prize for a 10% improvement in rating prediction. [Figure: the Netflix matrix – users (User 1, User 2, Johan, …) against movies (Movie 1, Movie 2, Sharknado, …); entries are dated ratings, e.g. 1 on 1/1/2012 or 5 on 1/2/2013, and a ? marks a rating to predict.] Netflix are lazy, and believers in big data, data mining, and machine learning.

When to Learn. There is a pattern; you do not know the pattern; you have data. Not random functions, not arbitrary functions. If I already know the pattern, I do not need to learn it, and I need something to learn from.

Supervised Learning – Learning From Examples. [Figure: handwritten digit images with their labels, e.g. 5 0 4 1 9 2 1 3 1 4.] Data is a list of (example, target) pairs. Examples: predict the stock market tomorrow from the stock market today; label mails as spam based on millions of other spam and non-spam messages; predict the price of a house from historical records of house sales, based on size, area, etc.; grade students in machine learning based on their grades in other courses. Learn a mapping from examples to targets that generalizes to new, unseen data.

Unsupervised Learning – Learning About Data. Data is a list of examples (no targets); extract information about the data, e.g. structure, patterns, anomalies. Uses: preparing data for supervised learning (structure or a probability distribution); dimensionality reduction (e.g. 28×28 images down to 2D, quite impressive); patterns such as people who buy beer being more likely to also buy pretzels; finding anomalies, e.g. intruder detection, or spotting new trends.

Reinforcement Learning – Learning By Doing. An agent interacts with an environment; data is (state, action, reward) triples, and the goal is to optimize the rewards. Examples: a kid getting burned; people who study – negative payoff now, reward later; a baby screaming to get its parents’ attention so it can eat.

Approximate Course Plan (weeks 1–14): Linear Models; Convex Optimization; Learning Theory; VC Dimension; Bias-Variance; Regularization/Validation; Neural Nets, SVMs, Kernels, RBF; Hidden Markov Models (Storm, Mailund); Boosting/Aggregation; Clustering, Outlier Detection (Assent); Markov Decision Processes, Backgammon… The order follows the three paradigms: supervised learning, then unsupervised learning, then reinforcement learning.

Today: Linear Classification (the Perceptron), Linear Regression, Nonlinear Transforms.

Supervised Learning Setup. A target y, features/predictors (inputs) x, an unknown pattern relating them, and data consisting of (input, target) pairs. Learn from the data a hypothesis that mimics the target on new, unknown data. Generalization is king. This seems impossible, and in the general case it is.
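In symbols (a standard formalization; the slide’s own formulas did not survive transcription):

```latex
\text{Data: } D = \{(x_1, y_1), \ldots, (x_n, y_n)\}, \qquad
\text{unknown target } f \text{ with } y_i \approx f(x_i), \qquad
\text{goal: find } h \text{ with } h(x) \approx f(x) \text{ on new } x.
```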

Overview. [Diagram: an unknown target f and a data set feed into a learning algorithm, which picks from a hypothesis set a final hypothesis h with h(x) ≈ f(x).] It is a leap of faith that this is actually sensible; we formalize it with an error measure, and with what we need to assume to make learning possible. Next week we will elaborate on this a lot. The hypothesis set is not a restriction, but the error measure must be formalized. This week: the hypothesis sets.

Example: Wine Classification. Chemical analysis of wines from different producers; a sample measurement: Alcohol 14.23, Ash 2.43, Malic Acid 1.71, Alcalinity of Ash 15.6, Magnesium 127, Total Phenols 2.8, Flavanoids 3.06, Nonflavanoid Phenols 0.28, Proanthocyanins 2.29, Color Intensity 5.64, Hue 1.04, OD280/OD315 of Diluted Wines 3.92, Proline 1065. You already saw the digits example; now for something completely different. Data is a list of (measurements, producer) pairs; learn to distinguish producers from the measurements. Your job is to come up with a hypothesis set and a learning algorithm.

Linear Classification (2 Classes). Inputs x = (x0, x1, x2, …, xd) with x0 = 1 (bias coordinate); labels y ∈ {+1, −1}. Hypothesis set: sign(wTx), i.e. if wTx > 0 return +1, else return −1. (A leap of faith; a 2D example follows.) Learn a good hypothesis, w, from the data; given a new point x, classify it as sign(wTx).
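Written out, the hypothesis set is (standard notation for the formula the slide abbreviates as sign(wTx)):

```latex
H \;=\; \bigl\{\, h_w \;:\; h_w(x) = \operatorname{sign}(w^\top x),\; w \in \mathbb{R}^{d+1} \,\bigr\},
\qquad x = (1, x_1, \ldots, x_d).
```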

Simplified Example – 2D. [Figure: a hyperplane in the plane with normal vector w, splitting it into the halfspace where wTx > 0 and the halfspace where wTx < 0.] The data is linearly separable!!!

Perceptron Algorithm. Assume the data is linearly separable!!! (Demo in MATLAB to show how it works on a generated data set; read the argument for convergence; consider adding the pocket-algorithm update. The running time is a function of the margin and the magnitude of the input points.) Theorem: if the data is linearly separable, the perceptron algorithm finds a separating hyperplane.
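A minimal Python/numpy sketch of the algorithm (the lecture’s MATLAB demo is not reproduced here; the update rule w ← w + y·x is the standard perceptron update):

```python
import numpy as np

def perceptron(X, y, max_iter=10_000):
    """Perceptron learning algorithm (PLA).

    X: n x (d+1) array of inputs, first column all ones (bias x0 = 1).
    y: length-n array of labels in {-1, +1}.
    Returns w with sign(X @ w) == y, assuming the data is linearly
    separable and max_iter is large enough.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        misclassified = np.where(np.sign(X @ w) != y)[0]
        if misclassified.size == 0:
            return w              # separating hyperplane found
        i = misclassified[0]      # pick any misclassified point
        w = w + y[i] * X[i]       # nudge w toward classifying x_i correctly
    raise RuntimeError("did not converge; data may not be linearly separable")
```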

Perceptron Summary. If the data is linearly separable, the perceptron converges; if not, it never stops. It does not converge toward the hypothesis with the fewest errors – optimizing that is NP-hard. The running time is a function of the data (exercise 1.3 in the book). For non-separable data, the pocket algorithm mentioned earlier is a common remedy; see the sketch below.
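A sketch of the pocket variant, as an illustration assuming the same data format as the perceptron sketch above: it keeps (“pockets”) the best weights seen so far, so it returns something sensible even on non-separable data.

```python
import numpy as np

def pocket_perceptron(X, y, max_iter=10_000):
    """PLA updates, but remember the weight vector with the fewest
    training errors seen so far and return that one."""
    w = np.zeros(X.shape[1])
    best_w, best_errors = w.copy(), np.inf
    for _ in range(max_iter):
        errors = np.where(np.sign(X @ w) != y)[0]
        if errors.size < best_errors:            # new best: pocket it
            best_w, best_errors = w.copy(), errors.size
        if errors.size == 0:                     # perfectly separated
            break
        i = errors[0]
        w = w + y[i] * X[i]                      # ordinary PLA update
    return best_w
```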

Pricing Houses. Historical data of the real estate market, e.g.: Size 145, Property 666, Rooms 42, Levels 10, House Age 7, Bathrooms 1, … Turn it into a prediction of house prices – so I have a regression problem. Data is a list of (house info, sales price) pairs; learn to price houses from the property info.

Linear Regression. Inputs x = (x0, x1, x2, …, xd) with the bias variable x0 = 1 already included, and a real-valued target y (the slide’s example row reads 1, 6, 42, 3, 33, 2, 4, 5). Learn a good hypothesis, w, from the data; given a new point x, output the estimate wTx.
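The hypothesis is the same linear form as in classification, just without the sign:

```latex
h(x) \;=\; w^\top x \;=\; \sum_{i=0}^{d} w_i x_i, \qquad x_0 = 1.
```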

2D Example (Predict y From x)

Contour Plot of 2D Error Surface. The line is determined by 2 parameters, so I can plot the error as a function of both; color encodes the error value. Looks easy enough…

Finding the Minimum – A Reminder. Set the gradient to zero. In our case a local minimum is a global minimum, since E is convex (more on convexity next time).

Data Representation. X: n × (d+1) matrix, each row an input point. y: n × 1 column vector, each row an input’s target. w: (d+1) × 1 column vector of weights.
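In code, the representation looks like this (a numpy sketch with made-up numbers; the symbols X, y, w follow the slides):

```python
import numpy as np

# Hypothetical example: n = 3 points in d = 2 dimensions (numbers made up).
raw = np.array([[6.0, 42.0],
                [3.0, 33.0],
                [2.0,  4.0]])          # n x d raw inputs
y = np.array([5.0, 4.0, 1.0])          # n targets, one per row of raw

n, d = raw.shape
X = np.hstack([np.ones((n, 1)), raw])  # n x (d+1): bias column x0 = 1 prepended
w = np.zeros(d + 1)                    # (d+1) x 1 weight vector
```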

Error Measure Manipulation. [Slide: the least-squares error measure manipulated into matrix form; see the reconstruction below.]
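The picture did not survive transcription; the standard least-squares manipulation it depicts, consistent with the wT(XTX)w term quoted on the next slide, is:

```latex
E(w) \;=\; \frac{1}{n}\sum_{i=1}^{n}\bigl(w^\top x_i - y_i\bigr)^2
      \;=\; \frac{1}{n}\,\lVert Xw - y\rVert^2
      \;=\; \frac{1}{n}\bigl(w^\top X^\top X\, w \;-\; 2\,w^\top X^\top y \;+\; y^\top y\bigr).
```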

Gradient For Linear Regression (L2). Think of wT(XTX)w as taking the derivative of a square. Take it slow; the math is tedious. Set the gradient to zero and solve (see below).
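Carrying out the differentiation (standard matrix calculus, reconstructing the slide’s missing formulas):

```latex
\nabla E(w) \;=\; \frac{2}{n}\bigl(X^\top X\, w - X^\top y\bigr) \;=\; 0
\quad\Longrightarrow\quad X^\top X\, w \;=\; X^\top y.
```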

Result. If XTX is invertible, the unique solution is w = (XTX)^(-1) XTy – the normal equations. In practice one solves the linear system (or uses the pseudoinverse) rather than inverting explicitly.

Summary Linear Regression
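As a compact sketch, the whole linear-regression procedure in numpy (np.linalg.lstsq solves the least-squares problem directly, and also covers a singular XTX via the pseudoinverse):

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares fit: find w minimizing ||Xw - y||^2.

    lstsq effectively solves the normal equations X^T X w = X^T y
    without forming the inverse explicitly, which is numerically
    safer and also handles the non-invertible case.
    """
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict(X, w):
    """The linear hypothesis h(x) = w^T x, applied row-wise."""
    return X @ w
```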

Nonlinear Transformations. What if a line cannot fit or separate the data? The model wTx is linear in w; replacing x by a nonlinear feature transform Φ(x), the model wTΦ(x) is still linear in w, so the perceptron and linear regression apply unchanged.
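A sketch of the simplest such transform – polynomial features for 1-d inputs (my example; the slide’s own transform is not recoverable):

```python
import numpy as np

def poly_features(x, degree=2):
    """Map 1-d inputs x to the features (1, x, x^2, ..., x^degree).

    Linear regression on these features fits a polynomial in x,
    yet the model remains linear in the weights w.
    """
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

# Example: fit a quadratic with the least-squares solver from above.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([4.1, 0.9, 0.1, 1.2, 3.8])       # roughly y = x^2 (made up)
Z = poly_features(x, degree=2)                # columns: 1, x, x^2
w, *_ = np.linalg.lstsq(Z, y, rcond=None)     # w approx. (0, 0, 1)
```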

Making Data Linear Separable
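An illustration of the idea, assuming the classic circular-boundary example (the slide’s actual figure is not recoverable): a circle-shaped class boundary is not linearly separable in the raw coordinates, but becomes separable after squaring the features.

```python
import numpy as np

# Class +1 inside the unit circle, -1 outside: not linearly separable
# in (x1, x2), but in z = (x1^2, x2^2) the boundary becomes the line
# z1 + z2 = 1, so a linear classifier (e.g. the perceptron above) works.
rng = np.random.default_rng(0)
X_raw = rng.uniform(-2.0, 2.0, size=(200, 2))
y = np.where(X_raw[:, 0]**2 + X_raw[:, 1]**2 < 1.0, 1, -1)

# Transformed design matrix with bias column, ready for the perceptron.
Z = np.column_stack([np.ones(len(X_raw)), X_raw[:, 0]**2, X_raw[:, 1]**2])
```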

Overfitting