Introduction to Data Science Lecture 7 Machine Learning Overview

Introduction to Data Science Lecture 7 Machine Learning Overview CS 194 Spring 2014 Michael Franklin Dan Bruckner, Evan Sparks, Shivaram Venkataraman

What is it? “Machine learning systems automatically learn programs from data” P. Domingos, CACM 10/12

Some Examples
Classification: Learned attribute is categorical (spam vs. ham). Input: vectors of "feature values" (discrete or continuous). Output: a single discrete value (a.k.a. the "class").
Regression: Learned attribute is numeric. Fit a curve to the data, then use that curve to predict outcomes.
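
The contrast between the two settings can be sketched in a few lines of plain Python. The data, the 1-nearest-neighbor rule, and the least-squares line fit below are illustrative assumptions, not anything from the lecture:

```python
# Classification: predict a discrete class with a 1-nearest-neighbor rule.
# Regression: predict a number by fitting a line with least squares.
# All data here are invented toy examples.

def nn_classify(train, query):
    """Return the label of the training point closest to `query`."""
    return min(train, key=lambda xy: abs(xy[0] - query))[1]

def fit_line(points):
    """Ordinary least squares for y = a*x + b on (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Classification: feature value -> discrete class
labeled = [(1, "ham"), (2, "ham"), (8, "spam"), (9, "spam")]
print(nn_classify(labeled, 7))   # -> spam

# Regression: fit a curve (here a line), then predict a new outcome
a, b = fit_line([(1, 2), (2, 4), (3, 6)])
print(a * 10 + b)                # -> 20.0
```

The same input format (a feature vector) feeds both; only the type of the output differs.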

Classification Example Learn a function that predicts, given the weather, whether someone will play golf. From Bill Howe's Coursera class: Introduction to Data Science
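
A learned classifier for this example would be a function of the weather attributes. The hand-coded rule below is a hypothetical stand-in for what a learner might produce; the attribute names and thresholds are assumptions based on the classic "play golf" dataset, not the slide's actual model:

```python
# Hand-coded decision rule approximating what a learned classifier for the
# "play golf" weather data might look like. Rules are illustrative only.

def will_play_golf(outlook, humidity, windy):
    if outlook == "overcast":
        return True                    # always play under overcast skies
    if outlook == "sunny":
        return humidity <= 75          # play only when it is not too humid
    if outlook == "rainy":
        return not windy               # play only when it is calm
    raise ValueError("unknown outlook: " + outlook)

print(will_play_golf("sunny", humidity=70, windy=False))  # -> True
print(will_play_golf("rainy", humidity=80, windy=True))   # -> False
```

A learning algorithm's job is to discover rules like these automatically from labeled examples instead of having a person write them.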

Three Components of ML Algorithms
Representation: the language in which the classifier is expressed; what the model looks like.
Evaluation (scoring function): how to tell good models from bad ones.
Optimization: how to search among the possible models to find the highest-scoring one.
P. Domingos, "A Few Useful Things to Know About Machine Learning", CACM Oct 2012.
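
The three components can be made concrete on a deliberately tiny problem. In this sketch (invented data, not from the lecture), the representation is a single threshold, the evaluation is accuracy, and the optimization is exhaustive search:

```python
# Three components on a toy 1-D problem:
#   representation: threshold classifiers of the form "class 1 if x >= t"
#   evaluation:     accuracy on labeled examples
#   optimization:   exhaustive search over candidate thresholds

data = [(0.1, 0), (0.4, 0), (0.35, 0), (0.8, 1), (0.9, 1)]

def predict(t, x):                 # representation: one number, t
    return 1 if x >= t else 0

def accuracy(t):                   # evaluation: fraction classified correctly
    return sum(predict(t, x) == y for x, y in data) / len(data)

# optimization: try every observed value as a threshold, keep the best
best = max((x for x, _ in data), key=accuracy)
print(best, accuracy(best))        # -> 0.8 1.0
```

Real algorithms differ mainly in how rich the representation is and how cleverly the search is done, but all three pieces are always present.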

Some Examples of These Components P. Domingos, “A Few Useful Things to Know About Machine Learning”, CACM Oct 2012.

Terminology
Supervised learning: given examples of inputs and outputs (i.e., labeled data), learn the relationship between them.
Unsupervised learning: inputs but no outputs (unlabeled data); learn the latent labels, e.g., clustering, dimensionality reduction.
You get to do both in HW 2 (see the rest of today's reading chapter for K-means)
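
Since K-means comes up in the reading for HW 2, here is a from-scratch sketch of the unsupervised side. It is restricted to 1-D data and uses a naive initialization to stay short; the data are made up:

```python
# A minimal 1-D k-means: no labels are given, the algorithm discovers the
# cluster structure on its own. Illustrative sketch, invented data.

def kmeans_1d(xs, k, iters=20):
    centers = sorted(xs)[:k]                 # naive init: k smallest points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:                         # assignment step
            i = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[i].append(x)
        for j, c in enumerate(clusters):     # update step: recompute means
            if c:
                centers[j] = sum(c) / len(c)
    return centers

data = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
print(sorted(kmeans_1d(data, k=2)))          # -> [1.0, 10.0]
```

Note that nothing labeled the points as belonging to either group; the two centers emerge purely from the geometry of the inputs, which is exactly what "learning the latent labels" means.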

Supervised Learning Cat Dog ???

Generalization is the Goal
Pick a subset of your data as the training set and train your model on it. Then test it on the held-back data (i.e., the test set).
Most important rule: don't test on your training data (it is easy to predict the examples you have already seen!)
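
The split itself is a few lines of plain Python. This is a hedged sketch (the 25% test fraction and fixed seed are arbitrary choices, not from the lecture):

```python
# Random train/test split so the model is evaluated only on examples it
# has never seen. Fraction and seed are illustrative choices.

import random

def train_test_split(examples, test_fraction=0.25, seed=42):
    shuffled = examples[:]                       # don't mutate the caller's list
    random.Random(seed).shuffle(shuffled)        # fixed seed: reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

examples = list(range(20))
train, test = train_test_split(examples)
print(len(train), len(test))                     # -> 15 5
assert not set(train) & set(test)                # never test on training data
```

Shuffling before splitting matters: if the data file is sorted by class or by date, a straight slice would give train and test sets with different distributions.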

Overfitting
Low error on training data, but high error on test data.
Example: your classifier is 100% accurate on training data but only 50% accurate on test data, when it could have been 75% accurate on each.
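
The extreme case is a model that simply memorizes its training data. The sketch below (invented data) makes the training/test accuracy gap concrete:

```python
# A "classifier" that memorizes its training set in a lookup table:
# perfect on training data, at chance on anything unseen. Toy data.

def train_memorizer(pairs):
    table = dict(pairs)
    return lambda x: table.get(x, 0)   # unseen inputs get an arbitrary default

train = [(1, 1), (2, 0), (3, 1), (4, 0)]
test  = [(5, 1), (6, 0), (7, 1), (8, 1)]

model = train_memorizer(train)
train_acc = sum(model(x) == y for x, y in train) / len(train)
test_acc  = sum(model(x) == y for x, y in test) / len(test)
print(train_acc, test_acc)             # -> 1.0 0.25
```

This is why testing on the training data is forbidden: it would report the 100% figure and hide the failure to generalize.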

Cross-Validation
Holding back data reduces the amount of data available for training. Alternative: randomly divide the training data into multiple subsets, hold out each one while training on the rest, and average the results for evaluation.
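
The hold-out-each-fold loop can be sketched directly. Here `fit` and `score` are deliberately trivial stand-ins for a real learner and a real metric (everything below is an illustrative assumption):

```python
# k-fold cross-validation: each subset is held out once while a model is
# fit on the remaining folds; the k scores are averaged.

def k_fold_scores(examples, k, fit, score):
    folds = [examples[i::k] for i in range(k)]      # k disjoint subsets
    scores = []
    for i in range(k):
        held_out = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        model = fit(train)
        scores.append(score(model, held_out))
    return sum(scores) / k

# Stand-in learner: predict the mean of the training targets;
# stand-in metric: negative mean squared error (higher is better).
fit = lambda pairs: sum(y for _, y in pairs) / len(pairs)
score = lambda m, pairs: -sum((y - m) ** 2 for _, y in pairs) / len(pairs)

data = [(x, 2 * x) for x in range(12)]
print(k_fold_scores(data, k=3, fit=fit, score=score))
```

Every example gets used for both training and evaluation, just never in the same round, so no data is wasted on a permanent hold-out set.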

Feature Engineering
Constructing features of the raw data on which to learn. After gathering, integrating, and cleaning, this is the next step ("when do we get to run our learning algorithm?"). Often domain-specific; requires trial and error.
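
For the spam-vs.-ham example earlier, feature engineering means turning raw text into a numeric vector. The particular features below are hypothetical choices a practitioner might try, reflecting domain knowledge about spam:

```python
# Hand-constructed features for a spam filter: raw message text in,
# numeric feature vector out. Feature choices are illustrative assumptions.

def featurize(message):
    words = message.split()
    return [
        len(words),                                           # message length
        sum(w.isupper() for w in words),                      # SHOUTING words
        message.count("!"),                                   # exclamation marks
        sum(w.lower() in {"free", "winner"} for w in words),  # spammy terms
    ]

print(featurize("You are a WINNER! Claim your free prize now!"))
# -> [9, 1, 2, 1]
```

The learning algorithm only ever sees these vectors, which is why feature choices are often domain-specific and settled by trial and error: a bad encoding can hide a pattern no algorithm will recover.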

Unsupervised Learning “Deep Learning” from Google’s Brain project

Plan for Rest of This Evening
We'll focus on supervised learning. In particular, Shivaram will cover linear regression in some detail, followed by an R-based lab. Finally, some HW2 programming tips based on what we saw in HW1 (Dan).
Announcement: Midterm: Thursday April 17, 6pm; Kroeber rm 160