Machine Learning: Large scale machine learning. Learning with large datasets.

Learning with large datasets

Andrew Ng
Machine learning and data
Classify between confusable words, e.g. {to, two, too}, {then, than}: "For breakfast I ate _____ eggs."
"It's not who has the best algorithm that wins. It's who has the most data."
[Figure from Banko and Brill, 2001: accuracy vs. training set size (millions); several different algorithms all improve steadily as the amount of training data grows.]

Learning with large datasets
[Figure: two learning curves of error vs. training set size. In the high-variance case there is a gap between training and cross-validation error, and more data helps; in the high-bias case both curves flatten out, and more data alone does not help.]

Stochastic gradient descent

Linear regression with gradient descent

h_θ(x) = Σ_{j=0}^{n} θ_j x_j

J_train(θ) = 1/(2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Repeat {
  θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i)
  (for every j = 0, …, n)
}

Batch gradient descent

Repeat {
  θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i)
  (for every j = 0, …, n)
}

Stochastic gradient descent

cost(θ, (x^(i), y^(i))) = (1/2) (h_θ(x^(i)) − y^(i))²

J_train(θ) = (1/m) Σ_{i=1}^{m} cost(θ, (x^(i), y^(i)))

Stochastic gradient descent

1. Randomly shuffle (reorder) training examples.
2. Repeat {
     for i := 1, …, m {
       θ_j := θ_j − α (h_θ(x^(i)) − y^(i)) x_j^(i)
       (for every j = 0, …, n)
     }
   }
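The two numbered steps above can be sketched in NumPy; this is a minimal illustration, not the course's reference code, and all function and variable names are mine. A linear hypothesis h_θ(x) = θᵀx is assumed.

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.1, epochs=50):
    """SGD for linear regression: after shuffling, apply
    theta_j := theta_j - alpha * (h_theta(x^(i)) - y^(i)) * x_j^(i)
    once per training example per pass."""
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(m):        # step 1: randomly shuffle
            error = X[i] @ theta - y[i]     # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]   # step 2: update on one example
    return theta

# Toy usage: recover theta = [1, 2] from noiseless data y = 1 + 2x
X = np.column_stack([np.ones(100), np.linspace(0.0, 1.0, 100)])
y = X @ np.array([1.0, 2.0])
theta = stochastic_gradient_descent(X, y)
```

Note that, unlike batch gradient descent, each update touches only one example, so the algorithm makes progress before it has even seen the whole training set once.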

Mini-batch gradient descent

Mini-batch gradient descent

Batch gradient descent: use all m examples in each iteration.
Stochastic gradient descent: use 1 example in each iteration.
Mini-batch gradient descent: use b examples in each iteration.

Mini-batch gradient descent

Say b = 10, m = 1000.

Repeat {
  for i := 1, 11, 21, 31, …, 991 {
    θ_j := θ_j − α (1/10) Σ_{k=i}^{i+9} (h_θ(x^(k)) − y^(k)) x_j^(k)
    (for every j = 0, …, n)
  }
}
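The same loop with b-example batches can be sketched as follows; names are illustrative, and the batch size matches the slide's b = 10. Vectorizing over the batch is exactly why mini-batches can beat single-example SGD in practice.

```python
import numpy as np

def minibatch_gradient_descent(X, y, b=10, alpha=0.2, epochs=100):
    """Each update averages the gradient over a mini-batch of b examples,
    mirroring the slide's inner loop over i = 1, 1+b, 1+2b, ..."""
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(m)
        for start in range(0, m, b):
            idx = order[start:start + b]               # the next b examples
            errors = X[idx] @ theta - y[idx]
            theta -= alpha * (X[idx].T @ errors) / len(idx)
    return theta

# Toy usage: recover theta = [1, 2] from noiseless data y = 1 + 2x
X = np.column_stack([np.ones(100), np.linspace(0.0, 1.0, 100)])
y = X @ np.array([1.0, 2.0])
theta = minibatch_gradient_descent(X, y)
```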

Stochastic gradient descent convergence

Checking for convergence

Batch gradient descent: plot J_train(θ) = 1/(2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² as a function of the number of iterations of gradient descent.

Stochastic gradient descent: during learning, compute cost(θ, (x^(i), y^(i))) before updating θ using (x^(i), y^(i)). Every 1000 iterations (say), plot cost(θ, (x^(i), y^(i))) averaged over the last 1000 examples processed by the algorithm.

Checking for convergence
[Figure: cost(θ, (x^(i), y^(i))), averaged over the last 1000 (say) examples, plotted against the number of iterations; a noisy but gradually decreasing curve indicates the algorithm is converging.]
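One way to implement this monitoring is sketched below; the window of 1000, the function name, and the toy data are all illustrative. The key detail from the slide is that the cost of each example is recorded *before* the update that uses it.

```python
import numpy as np

def sgd_with_cost_trace(X, y, alpha=0.1, epochs=20, window=1000):
    """Run SGD for linear regression, recording cost(theta, (x^(i), y^(i)))
    before each update; every `window` iterations, store the mean cost over
    the last `window` examples (the quantity the slide says to plot)."""
    m, n = X.shape
    theta = np.zeros(n)
    costs, averages = [], []
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(m):
            error = X[i] @ theta - y[i]
            costs.append(0.5 * error ** 2)     # cost before updating theta
            theta -= alpha * error * X[i]
            if len(costs) % window == 0:       # every 1000 iterations, say
                averages.append(float(np.mean(costs[-window:])))
    return theta, averages

X = np.column_stack([np.ones(100), np.linspace(0.0, 1.0, 100)])
y = X @ np.array([1.0, 2.0])
theta, averages = sgd_with_cost_trace(X, y)
```

Plotting `averages` gives the curve described in the figure above: it should trend downward even though individual costs bounce around.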

Stochastic gradient descent

1. Randomly shuffle dataset.
2. Repeat {
     for i := 1, …, m {
       θ_j := θ_j − α (h_θ(x^(i)) − y^(i)) x_j^(i)
       (for j = 0, …, n)
     }
   }

Learning rate α is typically held constant. We can slowly decrease α over time if we want θ to converge, e.g. α = const1 / (iterationNumber + const2).
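The decaying schedule mentioned above can be written directly; const1 and const2 are the two extra parameters that would have to be tuned, and the defaults here are purely illustrative.

```python
def decaying_alpha(iteration, const1=1.0, const2=100.0):
    """The slide's schedule: alpha = const1 / (iterationNumber + const2).
    Larger const2 makes the decay start more gently; const1 sets the scale."""
    return const1 / (iteration + const2)

# alpha shrinks as the iteration number grows, so theta can settle down
early, late = decaying_alpha(0), decaying_alpha(900)
```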

Online learning

Online learning

Shipping service website where a user comes, specifies origin and destination; you offer to ship their package for some asking price, and users sometimes choose to use your shipping service (y = 1), sometimes not (y = 0).

Features x capture properties of the user, of the origin/destination, and the asking price. We want to learn p(y = 1 | x; θ) to optimize the price.

Repeat forever {
  Get (x, y) corresponding to the current user.
  Update θ using (x, y):
    θ_j := θ_j − α (h_θ(x) − y) x_j   (for j = 0, …, n)
}
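This loop can be sketched as online logistic regression; the class name, feature encoding, and simulated user stream below are invented for illustration, but the update is the one shown above. Each (x, y) pair is used once and then discarded, which is what lets online learning adapt to changing user preferences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineShippingModel:
    """Online logistic regression for p(y=1 | x; theta) = sigmoid(theta^T x)."""
    def __init__(self, n_features, alpha=0.1):
        self.theta = np.zeros(n_features)
        self.alpha = alpha

    def predict(self, x):
        return sigmoid(self.theta @ x)

    def update(self, x, y):
        """One SGD step on a single (x, y) pair, then the pair is discarded."""
        error = self.predict(x) - y              # h_theta(x) - y
        self.theta -= self.alpha * error * x

# Simulated stream: users accept the offer (y=1) when the asking price is low
rng = np.random.default_rng(0)
model = OnlineShippingModel(n_features=2)
for _ in range(5000):
    price = rng.uniform(0.0, 1.0)
    x = np.array([1.0, price])                   # [intercept, asking price]
    y = 1.0 if price < 0.5 else 0.0
    model.update(x, y)
```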

Other online learning example: product search (learning to search)

User searches for "Android phone 1080p camera". Have 100 phones in store; will return 10 results.

x = features of the phone: how many words in the user query match the name of the phone, how many words in the query match the description of the phone, etc.
y = 1 if the user clicks on the link, y = 0 otherwise.

Learn p(y = 1 | x; θ). Use it to show the user the 10 phones they're most likely to click on.

Other examples: choosing special offers to show the user; customized selection of news articles; product recommendation; …

Map-reduce and data parallelism

Map-reduce

Batch gradient descent (say m = 400):
  θ_j := θ_j − α (1/400) Σ_{i=1}^{400} (h_θ(x^(i)) − y^(i)) x_j^(i)

Machine 1: Use (x^(1), y^(1)), …, (x^(100), y^(100)).
  temp_j^(1) = Σ_{i=1}^{100} (h_θ(x^(i)) − y^(i)) x_j^(i)
Machine 2: Use (x^(101), y^(101)), …, (x^(200), y^(200)).
  temp_j^(2) = Σ_{i=101}^{200} (h_θ(x^(i)) − y^(i)) x_j^(i)
Machine 3: Use (x^(201), y^(201)), …, (x^(300), y^(300)).
  temp_j^(3) = Σ_{i=201}^{300} (h_θ(x^(i)) − y^(i)) x_j^(i)
Machine 4: Use (x^(301), y^(301)), …, (x^(400), y^(400)).
  temp_j^(4) = Σ_{i=301}^{400} (h_θ(x^(i)) − y^(i)) x_j^(i)

[Map-reduce: Jeffrey Dean and Sanjay Ghemawat]

Map-reduce

[Diagram: the training set is split into four parts; Computers 1-4 each compute a partial sum temp_j^(k) over their part, and a central server combines the results:
  θ_j := θ_j − α (1/400) (temp_j^(1) + temp_j^(2) + temp_j^(3) + temp_j^(4)) ]
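The map and reduce steps can be sketched as below. The "machines" here are plain function calls run sequentially, standing in for real workers, and all names are illustrative; in a real deployment each split would be shipped to a different machine and only the temp vectors would come back over the network.

```python
import numpy as np

def partial_gradient(X_split, y_split, theta):
    """Map step: one machine computes, for every j,
    temp_j = sum over its examples of (h_theta(x^(i)) - y^(i)) * x_j^(i)."""
    errors = X_split @ theta - y_split
    return X_split.T @ errors

def map_reduce_gd_step(X, y, theta, alpha, n_machines=4):
    """Reduce step: a central server adds the temp vectors and applies
    theta_j := theta_j - alpha * (1/m) * (temp^(1) + ... + temp^(4))."""
    m = len(y)
    splits = np.array_split(np.arange(m), n_machines)
    temps = [partial_gradient(X[s], y[s], theta) for s in splits]
    return theta - alpha * sum(temps) / m

# Toy usage: one map-reduce step equals one ordinary batch step on m = 400
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = rng.normal(size=400)
theta0 = np.zeros(3)
theta1 = map_reduce_gd_step(X, y, theta0, alpha=0.01)
```

Because summation is associative, splitting the sum changes nothing mathematically: the combined step is identical (up to floating-point rounding) to the single-machine batch update.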

Map-reduce and summation over the training set

Many learning algorithms can be expressed as computing sums of functions over the training set. E.g. for advanced optimization with logistic regression, we need:

J_train(θ) = −(1/m) Σ_{i=1}^{m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]

∂/∂θ_j J_train(θ) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i)

Each of these sums can be split across machines exactly as above.

Multi-core machines

[Diagram: the same map-reduce idea on a single machine; the training set is split across Cores 1-4 of one CPU, each core computes a partial sum, and the results are combined, with no network latency between "machines".]