Introduction to Data Science Lecture 7 Machine Learning Overview

Introduction to Data Science Lecture 7 Machine Learning Overview CS 194 Spring 2014 Michael Franklin Dan Bruckner, Evan Sparks, Shivaram Venkataraman

What is it? “Machine learning systems automatically learn programs from data” P. Domingos, CACM 10/12

Some Examples
Classification: Learned attribute is categorical (spam vs. ham). Input: vectors of "feature values" (discrete or continuous). Output: a single discrete value (a.k.a. the "class").
Regression: Learned attribute is numeric. Fit a curve to the data, then use that curve to predict outcomes.
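
The contrast between the two settings can be sketched in a few lines of plain Python. The data, the 1-nearest-neighbor rule, and the least-squares line fit below are illustrative assumptions, not anything from the lecture:

```python
# Classification: predict a discrete class with a 1-nearest-neighbor rule.
# Regression: predict a number by fitting a line with least squares.
# All data here are invented toy examples.

def nn_classify(train, query):
    """Return the label of the training point closest to `query`."""
    return min(train, key=lambda xy: abs(xy[0] - query))[1]

def fit_line(points):
    """Ordinary least squares for y = a*x + b on (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Classification: feature value -> discrete class
labeled = [(1, "ham"), (2, "ham"), (8, "spam"), (9, "spam")]
print(nn_classify(labeled, 7))   # -> spam

# Regression: fit a curve (here a line), then predict a new outcome
a, b = fit_line([(1, 2), (2, 4), (3, 6)])
print(a * 10 + b)                # -> 20.0
```

The same input format (a feature vector) feeds both; only the type of the output differs.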

Classification Example Learn a function that predicts, given the weather, whether someone will play golf. From Bill Howe's Coursera class: Introduction to Data Science
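
A learned classifier for this example would be a function of the weather attributes. The hand-coded rule below is a hypothetical stand-in for what a learner might produce; the attribute names and thresholds are assumptions based on the classic "play golf" dataset, not the slide's actual model:

```python
# Hand-coded decision rule approximating what a learned classifier for the
# "play golf" weather data might look like. Rules are illustrative only.

def will_play_golf(outlook, humidity, windy):
    if outlook == "overcast":
        return True                    # always play under overcast skies
    if outlook == "sunny":
        return humidity <= 75          # play only when it is not too humid
    if outlook == "rainy":
        return not windy               # play only when it is calm
    raise ValueError("unknown outlook: " + outlook)

print(will_play_golf("sunny", humidity=70, windy=False))  # -> True
print(will_play_golf("rainy", humidity=80, windy=True))   # -> False
```

A learning algorithm's job is to discover rules like these automatically from labeled examples instead of having a person write them.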

Three Components of ML Algorithms
Representation: the language in which the classifier is expressed; what the model looks like.
Evaluation (scoring function): how to tell good models from bad ones.
Optimization: how to search among the possible models to find the highest-scoring one.
P. Domingos, "A Few Useful Things to Know About Machine Learning", CACM Oct 2012.
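
The three components can be made concrete on a deliberately tiny problem. In this sketch (invented data, not from the lecture), the representation is a single threshold, the evaluation is accuracy, and the optimization is exhaustive search:

```python
# Three components on a toy 1-D problem:
#   representation: threshold classifiers of the form "class 1 if x >= t"
#   evaluation:     accuracy on labeled examples
#   optimization:   exhaustive search over candidate thresholds

data = [(0.1, 0), (0.4, 0), (0.35, 0), (0.8, 1), (0.9, 1)]

def predict(t, x):                 # representation: one number, t
    return 1 if x >= t else 0

def accuracy(t):                   # evaluation: fraction classified correctly
    return sum(predict(t, x) == y for x, y in data) / len(data)

# optimization: try every observed value as a threshold, keep the best
best = max((x for x, _ in data), key=accuracy)
print(best, accuracy(best))        # -> 0.8 1.0
```

Real algorithms differ mainly in how rich the representation is and how cleverly the search is done, but all three pieces are always present.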

Some Examples of These Components P. Domingos, “A Few Useful Things to Know About Machine Learning”, CACM Oct 2012.

Terminology
Supervised learning: given examples of inputs and outputs (i.e., labeled data), learn the relationship between them.
Unsupervised learning: inputs but no outputs (unlabeled data); learn the latent labels, e.g., clustering, dimensionality reduction.
You get to do both in HW 2 (see the rest of today's reading chapter for K-means)
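
Since K-means comes up in the reading for HW 2, here is a from-scratch sketch of the unsupervised side. It is restricted to 1-D data and uses a naive initialization to stay short; the data are made up:

```python
# A minimal 1-D k-means: no labels are given, the algorithm discovers the
# cluster structure on its own. Illustrative sketch, invented data.

def kmeans_1d(xs, k, iters=20):
    centers = sorted(xs)[:k]                 # naive init: k smallest points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:                         # assignment step
            i = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[i].append(x)
        for j, c in enumerate(clusters):     # update step: recompute means
            if c:
                centers[j] = sum(c) / len(c)
    return centers

data = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
print(sorted(kmeans_1d(data, k=2)))          # -> [1.0, 10.0]
```

Note that nothing labeled the points as belonging to either group; the two centers emerge purely from the geometry of the inputs, which is exactly what "learning the latent labels" means.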

Supervised Learning Cat Dog ???

Generalization is the Goal
Pick a subset of your data as the training set and train your model on it. Then test it on the held-back data (i.e., the test set).
Most important rule: don't test on your training data (it is easy to predict the examples you have already seen!)
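
The split itself is a few lines of plain Python. This is a hedged sketch (the 25% test fraction and fixed seed are arbitrary choices, not from the lecture):

```python
# Random train/test split so the model is evaluated only on examples it
# has never seen. Fraction and seed are illustrative choices.

import random

def train_test_split(examples, test_fraction=0.25, seed=42):
    shuffled = examples[:]                       # don't mutate the caller's list
    random.Random(seed).shuffle(shuffled)        # fixed seed: reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

examples = list(range(20))
train, test = train_test_split(examples)
print(len(train), len(test))                     # -> 15 5
assert not set(train) & set(test)                # never test on training data
```

Shuffling before splitting matters: if the data file is sorted by class or by date, a straight slice would give train and test sets with different distributions.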

Overfitting
Low error on training data, but high error on test data.
Example: your classifier is 100% accurate on training data but only 50% accurate on test data, when it could have been 75% accurate on each.
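
The extreme case is a model that simply memorizes its training data. The sketch below (invented data) makes the training/test accuracy gap concrete:

```python
# A "classifier" that memorizes its training set in a lookup table:
# perfect on training data, at chance on anything unseen. Toy data.

def train_memorizer(pairs):
    table = dict(pairs)
    return lambda x: table.get(x, 0)   # unseen inputs get an arbitrary default

train = [(1, 1), (2, 0), (3, 1), (4, 0)]
test  = [(5, 1), (6, 0), (7, 1), (8, 1)]

model = train_memorizer(train)
train_acc = sum(model(x) == y for x, y in train) / len(train)
test_acc  = sum(model(x) == y for x, y in test) / len(test)
print(train_acc, test_acc)             # -> 1.0 0.25
```

This is why testing on the training data is forbidden: it would report the 100% figure and hide the failure to generalize.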

Cross-Validation
Holding back data reduces the amount of data available for training. Alternative: randomly divide the training data into multiple subsets, hold out each one while training on the rest, and average the results for evaluation.
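
The hold-out-each-fold loop can be sketched directly. Here `fit` and `score` are deliberately trivial stand-ins for a real learner and a real metric (everything below is an illustrative assumption):

```python
# k-fold cross-validation: each subset is held out once while a model is
# fit on the remaining folds; the k scores are averaged.

def k_fold_scores(examples, k, fit, score):
    folds = [examples[i::k] for i in range(k)]      # k disjoint subsets
    scores = []
    for i in range(k):
        held_out = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        model = fit(train)
        scores.append(score(model, held_out))
    return sum(scores) / k

# Stand-in learner: predict the mean of the training targets;
# stand-in metric: negative mean squared error (higher is better).
fit = lambda pairs: sum(y for _, y in pairs) / len(pairs)
score = lambda m, pairs: -sum((y - m) ** 2 for _, y in pairs) / len(pairs)

data = [(x, 2 * x) for x in range(12)]
print(k_fold_scores(data, k=3, fit=fit, score=score))
```

Every example gets used for both training and evaluation, just never in the same round, so no data is wasted on a permanent hold-out set.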

Feature Engineering
Constructing features of the raw data on which to learn. After gathering, integrating, and cleaning, this is the next step ("when do we get to run our learning algorithm?"). Often domain-specific; requires trial and error.
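
For the spam-vs.-ham example earlier, feature engineering means turning raw text into a numeric vector. The particular features below are hypothetical choices a practitioner might try, reflecting domain knowledge about spam:

```python
# Hand-constructed features for a spam filter: raw message text in,
# numeric feature vector out. Feature choices are illustrative assumptions.

def featurize(message):
    words = message.split()
    return [
        len(words),                                           # message length
        sum(w.isupper() for w in words),                      # SHOUTING words
        message.count("!"),                                   # exclamation marks
        sum(w.lower() in {"free", "winner"} for w in words),  # spammy terms
    ]

print(featurize("You are a WINNER! Claim your free prize now!"))
# -> [9, 1, 2, 1]
```

The learning algorithm only ever sees these vectors, which is why feature choices are often domain-specific and settled by trial and error: a bad encoding can hide a pattern no algorithm will recover.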

Unsupervised Learning “Deep Learning” from Google’s Brain project

Plan for Rest of This Evening
We'll focus on supervised learning. In particular, Shivaram will cover linear regression in some detail, followed by an R-based lab. Finally, some HW2 programming tips based on what we saw in HW1 (Dan).
Announcement: Midterm: Thursday April 17, 6pm; Kroeber rm 160