What is machine learning


Introduction: What is machine learning?

Machine Learning definition. Arthur Samuel (1959): Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed. Tom Mitchell (1998), well-posed learning problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Machine learning algorithms:
Supervised learning
Unsupervised learning
Others: reinforcement learning, recommender systems
Also talk about: practical advice for applying learning algorithms.

Introduction: Supervised Learning

Housing price prediction (plot: price in $1000's vs. size in feet²). Supervised learning: “right answers” given. Regression: predict continuous-valued output (the price).

Breast cancer (malignant = harmful, benign = not harmful). Classification: discrete-valued output (0 or 1). Plot: Malignant? (1 = yes, 0 = no) against tumor size [feature 1]; another way of representing the same data is to mark the examples along a single tumor-size axis. We are given a training set (past medical records) that tells us whether each tumor size led to breast cancer or not, and we want to know which category a given new tumor size falls into.

With more features, e.g. age [feature 2] plotted against tumor size [feature 1]; other possible features include clump thickness, uniformity of cell size, uniformity of cell shape, …

You’re running a company, and you want to develop learning algorithms to address each of two problems.
Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.
Should you treat these as classification or as regression problems?
Treat both as classification problems.
Treat problem 1 as a classification problem, problem 2 as a regression problem.
Treat problem 1 as a regression problem, problem 2 as a classification problem.
Treat both as regression problems.

Introduction: Unsupervised Learning

Supervised learning (plot: labeled examples of two classes in the x1–x2 plane).

Unsupervised learning (plot: the same kind of data in the x1–x2 plane, but with no labels; the algorithm has to find structure, e.g. clusters, on its own).

Examples of unsupervised learning (clustering) applications: organizing computing clusters, social network analysis, market segmentation, astronomical data analysis. (The astronomy example is illustrated with a NASA Spitzer Space Telescope image of the RCW 79 bubble in the constellation Centaurus; image credit: NASA/JPL-Caltech/E. Churchwell, Univ. of Wisconsin, Madison.)

Cocktail party problem (diagram: Speaker #1 and Speaker #2 talk at the same time; Microphone #1 and Microphone #2 each record a different mixture of the two voices).

Audio demo: the recordings from Microphone #1 and Microphone #2 (overlapping mixtures), and the separated Output #1 and Output #2. [Audio clips courtesy of Te-Won Lee.]

Cocktail party problem algorithm, in one line of code:
[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
[Source: Sam Roweis, Yair Weiss & Eero Simoncelli]

Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.)
Given email labeled as spam/not spam, learn a spam filter.
Given a set of news articles found on the web, group them into sets of articles about the same story.
Given a database of customer data, automatically discover market segments and group customers into different market segments.
Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.

Linear regression with one variable: Model representation

Housing Prices (Portland, OR) (plot: price in 1000s of dollars vs. size in feet²). Supervised learning: given the “right answer” for each example in the data. Regression problem: predict real-valued output.

Training set of housing prices (Portland, OR):
Size in feet² (x)   Price ($) in 1000's (y)
2104                460
1416                232
1534                315
852                 178
…
Notation:
m = number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
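As a rough illustration (not part of the original slides), this small training set can be typed straight into Octave, the language used for the one-liner earlier in these notes; the variable names x, y, and m are chosen to match the slide's notation.

% Training set from the table above (Portland, OR housing prices)
x = [2104; 1416; 1534; 852];   % "input" variable: size in square feet
y = [460; 232; 315; 178];      % "output" variable: price in $1000's
m = length(y);                 % m = number of training examples (here 4)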

Training set → learning algorithm → hypothesis h; h takes the size of a house as input and outputs an estimated price. How do we represent h? As hθ(x) = θ0 + θ1·x. This model is linear regression with one variable, also called univariate linear regression.
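A minimal sketch of that hypothesis in Octave, assuming the x and y defined in the previous snippet; the parameter values theta0 and theta1 below are placeholders picked only for illustration, not fitted values.

theta0 = 50; theta1 = 0.2;        % illustrative, un-fitted parameter values
h = @(s) theta0 + theta1 * s;     % hypothesis h_theta(s) = theta0 + theta1*s for a size s
estimated_price = h(1250)         % estimated price (in $1000's) for a 1250 ft^2 house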

Linear regression with one variable: Cost function

Training set:
Size in feet² (x)   Price ($) in 1000's (y)
2104                460
1416                232
1534                315
852                 178
…
Hypothesis: hθ(x) = θ0 + θ1·x
θ0, θ1: parameters
How do we choose the θ's?

Idea: choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y).

Linear regression with one variable: Cost function intuition I

Simplified (set θ0 = 0):
Hypothesis: hθ(x) = θ1·x
Parameter: θ1
Cost function: J(θ1) = (1/2m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))²
Goal: minimize J(θ1) over θ1

(Plots for several choices of θ1: on the left, hθ(x) as a function of x for the fixed θ1, drawn through the training points; on the right, the cost J(θ1) as a function of the parameter θ1, with one point added per choice of θ1.)

Linear regression with one variable: Cost function intuition II

Hypothesis: hθ(x) = θ0 + θ1·x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/2m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))²
Goal: minimize J(θ0, θ1) over θ0, θ1
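The cost function above translates almost line for line into Octave. The sketch below reuses the x and y vectors from the earlier training-set snippet; the function name computeCost is introduced here for illustration and is not defined anywhere in the slides.

function J = computeCost(x, y, theta0, theta1)
  % J(theta0, theta1) = (1/(2m)) * sum_i (h_theta(x^(i)) - y^(i))^2
  m = length(y);
  predictions = theta0 + theta1 * x;     % h_theta(x^(i)) for every training example
  sqErrors = (predictions - y) .^ 2;     % squared errors
  J = sum(sqErrors) / (2 * m);
end

% e.g. computeCost(x, y, 0, 0) gives the cost of predicting 0 for every house.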

(Plots: on the left, hθ(x) as a function of x for fixed θ0, θ1, with price in $1000's against size in feet²; on the right, the cost J(θ0, θ1) as a function of the parameters, shown as a 3-D surface and as contour plots, with one marked point per choice of θ0, θ1.)

Linear regression with one variable: Gradient descent

Have some function J(θ0, θ1). Want: min over θ0, θ1 of J(θ0, θ1).
Outline:
Start with some θ0, θ1.
Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum.

(Two 3-D surface plots of J(θ0, θ1) over the θ0–θ1 plane: starting gradient descent from different initial points can lead it downhill to different local minima.)

Gradient descent algorithm:
repeat until convergence {
  θj := θj − α · ∂/∂θj J(θ0, θ1)   (for j = 0 and j = 1)
}
Correct: simultaneous update
temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)
θ0 := temp0
θ1 := temp1
Incorrect:
temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
θ0 := temp0
temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)   (this already uses the updated θ0)
θ1 := temp1
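In code, the difference between the correct and incorrect versions is only the order of the assignments. A sketch in Octave, where alpha, dJ_dtheta0, and dJ_dtheta1 are placeholders (a chosen learning rate and two helper functions returning the partial derivatives of J at the current parameters), none of which are defined in the slides:

% Correct: simultaneous update -- both derivative terms are evaluated
% at the *old* (theta0, theta1) before either parameter is overwritten.
temp0  = theta0 - alpha * dJ_dtheta0(theta0, theta1);
temp1  = theta1 - alpha * dJ_dtheta1(theta0, theta1);
theta0 = temp0;
theta1 = temp1;

% Incorrect: non-simultaneous update -- theta0 is overwritten first,
% so the derivative for theta1 is evaluated at a mixed, partly-updated point.
temp0  = theta0 - alpha * dJ_dtheta0(theta0, theta1);
theta0 = temp0;
temp1  = theta1 - alpha * dJ_dtheta1(theta0, theta1);
theta1 = temp1;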

Linear regression with one variable: Gradient descent intuition

Gradient descent algorithm:
repeat until convergence {
  θj := θj − α · ∂/∂θj J(θ0, θ1)   (simultaneously for j = 0 and j = 1)
}
Here α is the learning rate and ∂/∂θj J(θ0, θ1) is the derivative term.

If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

At a local optimum the derivative term is zero, so the update θ1 := θ1 − α · (d/dθ1) J(θ1) leaves the current value of θ1 unchanged.

Gradient descent can converge to a local minimum, even with the learning rate α fixed. As we approach a local minimum, gradient descent will automatically take smaller steps. So, no need to decrease α over time.

Linear regression with one variable: Gradient descent for linear regression

Gradient descent algorithm:
repeat until convergence {
  θj := θj − α · ∂/∂θj J(θ0, θ1)   (for j = 0 and j = 1)
}
Linear regression model:
hθ(x) = θ0 + θ1·x
J(θ0, θ1) = (1/2m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))²

Gradient descent algorithm (with the derivatives of J worked out for linear regression):
repeat until convergence {
  θ0 := θ0 − α · (1/m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))
  θ1 := θ1 − α · (1/m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · x^(i)
}
Update θ0 and θ1 simultaneously.
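Putting the two worked-out update rules into a loop gives a minimal Octave sketch of batch gradient descent for univariate linear regression; the function name, the zero initialization, and the example hyperparameters are illustrative choices rather than values taken from the slides.

function [theta0, theta1] = gradientDescent(x, y, alpha, num_iters)
  % Batch gradient descent for h_theta(x) = theta0 + theta1*x.
  m = length(y);
  theta0 = 0;  theta1 = 0;                              % start from (0, 0)
  for iter = 1:num_iters
    predictions = theta0 + theta1 * x;                  % h_theta(x^(i)) for all i
    errors = predictions - y;                           % h_theta(x^(i)) - y^(i)
    temp0 = theta0 - alpha * (1/m) * sum(errors);       % update using dJ/dtheta0
    temp1 = theta1 - alpha * (1/m) * sum(errors .* x);  % update using dJ/dtheta1
    theta0 = temp0;                                     % simultaneous update
    theta1 = temp1;
  end
end

% Example call (unscaled square-foot sizes need a very small learning rate):
% [t0, t1] = gradientDescent(x, y, 1e-7, 100);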

(3-D surface plot of J(θ0, θ1) for linear regression: a convex, bowl-shaped function with a single global minimum.)

(Animation: on the left, hθ(x) plotted against x over the housing data for the current fixed θ0, θ1; on the right, a contour plot of J(θ0, θ1) as a function of the parameters, with the parameter values tracing a path toward the minimum as gradient descent runs.)

“Batch” Gradient Descent. “Batch”: each step of gradient descent uses all the training examples.

Linear Algebra review (optional): Matrices and vectors

Matrix: a rectangular array of numbers. Dimension of a matrix: number of rows × number of columns.

Matrix elements (entries of a matrix): A_ij is the “i, j entry”, i.e. the entry in the i-th row and j-th column.

Vector: an n × 1 matrix, i.e. an n-dimensional vector. y_i denotes the i-th element; vectors can be 1-indexed (elements y_1, …, y_n) or 0-indexed (elements y_0, …, y_{n−1}).
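The definitions in this review map directly onto Octave. The matrix and vector below are illustrative values only, since the numbers on the original slide are not reproduced in this transcript.

A = [1402 191; 1371 821; 949 1437; 147 1448];  % a 4 x 2 matrix: 4 rows, 2 columns
size(A)                     % returns [4 2], i.e. rows x columns
A(3, 2)                     % the "3,2 entry": row 3, column 2 -> 1437
v = [460; 232; 315; 178];   % a 4 x 1 matrix, i.e. a 4-dimensional vector
v(1)                        % Octave vectors are 1-indexed: first element -> 460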

Linear Algebra review (optional): Addition and scalar multiplication

Matrix addition: matrices of the same dimension are added element by element.

Scalar multiplication: every element of the matrix is multiplied by the scalar.

Combination of operands: expressions that mix matrix addition and scalar multiplication/division, evaluated element-wise.
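A short Octave sketch of these three operations; the matrices A and B below are small placeholder examples, not the ones drawn on the original slides.

A = [1 0; 2 5; 3 1];
B = [4 0.5; 2 5; 0 1];
A + B                 % element-wise addition (both matrices must be 3 x 2)
3 * A                 % scalar multiplication: every entry multiplied by 3
A / 4                 % dividing a matrix by a scalar, also element-wise
3 * A + B - A / 4     % combination of operands, evaluated element-wise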

Linear Algebra review (optional): Matrix-vector multiplication

Example

Details: A (an m × n matrix, m rows and n columns) times x (an n × 1 matrix, an n-dimensional vector) gives y, an m-dimensional vector. To get y_i, multiply A's i-th row element-wise with the elements of vector x, and add them up.

Example

House sizes: given a list of house sizes and a hypothesis hθ(x) = θ0 + θ1·x, the predicted price of every house can be computed with a single matrix-vector multiplication: build a matrix whose first column is all 1's and whose second column is the house sizes, and multiply it by the parameter vector [θ0; θ1].
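A sketch of that trick in Octave, with θ0 = −40 and θ1 = 0.25 as made-up parameter values and the house sizes taken from the earlier table:

sizes = [2104; 1416; 1534; 852];   % house sizes in square feet
X = [ones(4, 1), sizes];           % prepend a column of 1's -> 4 x 2 matrix
theta = [-40; 0.25];               % [theta0; theta1], illustrative values only
predictions = X * theta            % h_theta(x) for every house, in $1000's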

Linear Algebra review (optional): Matrix-matrix multiplication

Example

Details: A (an m × n matrix, m rows and n columns) times B (an n × o matrix, n rows and o columns) gives C, an m × o matrix. The i-th column of C is obtained by multiplying A with the i-th column of B (for i = 1, 2, …, o).

Example

House sizes: suppose we have 3 competing hypotheses, each of the form hθ(x) = θ0 + θ1·x with its own parameters (1., 2., 3.). Put the house sizes (with a leading column of 1's) into one matrix and the three parameter vectors into the columns of a second matrix; a single matrix-matrix multiplication then gives the predictions of all three hypotheses for all of the houses.
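The same idea in Octave, with three made-up parameter pairs standing in for the competing hypotheses:

sizes = [2104; 1416; 1534; 852];
X = [ones(4, 1), sizes];                 % 4 x 2 data matrix
Thetas = [-40  200  -150;                % first row: the three theta0 values (illustrative)
          0.25 0.1  0.4];                % second row: the three theta1 values (illustrative)
predictions = X * Thetas                 % 4 x 3 matrix: one column of predictions per hypothesis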

Linear Algebra review (optional): Matrix multiplication properties

Let A and B be matrices. Then in general, A × B ≠ B × A (matrix multiplication is not commutative).

Matrix multiplication is, however, associative: for matrices A, B, C, let D = B × C and compute A × D; then let E = A × B and compute E × C. The two results are equal, i.e. A × (B × C) = (A × B) × C.

Identity matrix: denoted I (or I_{n×n}), with 1's on the diagonal and 0's everywhere else. Examples of identity matrices: 2 × 2, 3 × 3, 4 × 4. For any matrix A, A · I = I · A = A.
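Both properties are easy to check numerically in Octave; A and B here are small illustrative matrices chosen for this sketch.

A = [1 1; 0 0];
B = [0 0; 2 0];
A * B            % gives [2 0; 0 0]
B * A            % gives [0 0; 2 2] -- different, so multiplication is not commutative
I = eye(2)       % the 2 x 2 identity matrix
A * I            % equals A
I * A            % also equals A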

Linear Algebra review (optional): Inverse and transpose

Not all numbers have an inverse (e.g., 0 does not). Matrix inverse: if A is an m × m (square) matrix and it has an inverse, then A · A⁻¹ = A⁻¹ · A = I. Matrices that don’t have an inverse are called “singular” or “degenerate”.

Matrix transpose: let A be an m × n matrix, and let B = Aᵀ. Then B is an n × m matrix, and B_ij = A_ji; the rows of A become the columns of Aᵀ.
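In Octave the inverse and the transpose are one call each; the matrix below is an illustrative non-singular example.

A = [3 4; 2 16];
Ainv = inv(A);      % matrix inverse; pinv(A) is a more numerically robust alternative
A * Ainv            % approximately the 2 x 2 identity matrix
A'                  % transpose: rows of A become columns, so (A')(i,j) = A(j,i)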