Supervised Learning. Seminar: Social Media Mining, UC3M, May 2017. Lecturer: Carlos Castillo, http://chato.cl/. Sources: CS583 slides by Bing Liu (2017); Supervised Learning course by Mark Herbster (2014); based on: T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2002.
What is “learning” in this context? Computing a functional relationship between an input (x) and an output (y). E.g.: xi = e-mail, yi = { spam, not spam }; xi = tweet, yi = { health-related, not health-related }; xi = handwritten digit, yi = { 0, 1, 2, …, 9 }; xi = news item, yi = [ 0: meaningless, …, 1: important ]. The vector x is usually high-dimensional.
Example: photo of people or not?
Example (cont.) Many problems: doing this efficiently, generalizing well, ...
Formally. Goal: given training data (x1, y1), …, (xn, yn), infer a function f such that f(xi) ≈ yi, and apply it to future data: y = f(x). Binary classification: y ∈ {−1, +1}. Regression: y ∈ ℝ.
Supervised learning algorithms use a training data set Example supervised learning algorithms: Linear regression / logistic regression Decision trees / decision forests Neural networks
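As a hedged illustration (not part of the original slides), here is a minimal sketch of the train-then-predict pattern these algorithms share, using scikit-learn's logistic regression; the tiny feature vectors and labels are invented for the example:

```python
# Minimal sketch of supervised learning with scikit-learn (illustrative only).
# The tiny data set below is invented: each row is a feature vector x_i,
# and y_train holds the corresponding label y_i.
from sklearn.linear_model import LogisticRegression

X_train = [[1.0, 0.0], [0.9, 0.2], [0.1, 1.0], [0.0, 0.8]]  # inputs x_i
y_train = ["spam", "spam", "not spam", "not spam"]          # outputs y_i

model = LogisticRegression()        # could also be a decision tree, etc.
model.fit(X_train, y_train)         # learn f from the (x_i, y_i) pairs

X_new = [[0.2, 0.9]]                # future (unseen) data
print(model.predict(X_new))         # apply f to the new input
```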
What do we want? Collect appropriate training data; this requires some assumptions (e.g., that it is a uniform random sample). Represent inputs appropriately: good feature construction and selection. Learn efficiently, e.g., in time linear in the number of training elements. “Test” efficiently, i.e., operate efficiently at prediction time.
Key goal: generalize. Borges’ “Funes el Memorioso” (1942): “Not only was it difficult for him to see that the generic symbol 'dog' took in all the dissimilar individuals of all shapes and sizes, it irritated him that the 'dog' of three-fourteen in the afternoon, seen in profile, should be indicated by the same noun as the dog at three-fifteen, seen frontally.” To generalize is to forget differences and focus on what is important. Simple models (using fewer features) are preferable in general.
Overfitting and Underfitting. Underfitted models perform poorly on both the training set and the testing set. Overfitted models perform very well on the training set but poorly on the testing set. Source: http://pingax.com/regularization-implementation-r/
Finding a sweet spot: prototypical error curves Inference error = “training error” Estimation error = “testing error”
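A small illustrative sketch (not from the slides) of these curves: fit polynomials of increasing degree to synthetic data and compare training vs. testing error; the data-generating function and noise level are assumptions made for the example:

```python
# Illustrative: train vs. test error as model complexity grows, using
# polynomial regression on synthetic data. A low degree gives high error on
# both halves (underfitting); a very high degree keeps lowering the training
# error while the test error rises (overfitting).
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)  # noisy target
x_tr, y_tr, x_te, y_te = x[::2], y[::2], x[1::2], y[1::2]       # simple split

for degree in (1, 3, 9, 15):
    coeffs = P.polyfit(x_tr, y_tr, degree)              # fit on training half
    err_tr = np.mean((P.polyval(x_tr, coeffs) - y_tr) ** 2)
    err_te = np.mean((P.polyval(x_te, coeffs) - y_te) ** 2)
    print(f"degree={degree:2d}  train MSE={err_tr:.3f}  test MSE={err_te:.3f}")
```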
Example: k-NN classifier. Suppose we’re classifying social media postings as “health-related” (green) vs. “not health-related” (red). All messages in the training set can be “pivots”. For a new, unlabeled, unseen message, pick the k pivots that are most similar to it, then do majority voting: green wins => the message is about health; red wins => the message is not about health.
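A minimal sketch of this voting scheme (illustrative, not the lecturer's code); the bag-of-words pivot vectors, labels, and the choice of cosine similarity are assumptions made for the example:

```python
# Minimal k-NN sketch: training messages act as "pivots"; a new message is
# labeled by majority vote among its k most similar pivots.
import numpy as np
from collections import Counter

pivots = np.array([[3, 0, 1], [2, 1, 0], [0, 3, 1], [1, 2, 2]], dtype=float)
labels = ["health", "health", "not-health", "not-health"]

def knn_predict(x, k=3):
    # Cosine similarity between the new message x and every pivot.
    sims = pivots @ x / (np.linalg.norm(pivots, axis=1) * np.linalg.norm(x))
    top_k = np.argsort(sims)[::-1][:k]          # indices of the k most similar
    votes = Counter(labels[i] for i in top_k)   # majority voting
    return votes.most_common(1)[0][0]

print(knn_predict(np.array([2.0, 0.5, 1.0]), k=3))  # expected: 'health'
```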
How large should k be? How to decide?
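One common answer (an assumption here, not stated on the slide) is to try several values of k and keep the one with the best cross-validated accuracy; a minimal scikit-learn sketch on synthetic data:

```python
# Sketch: pick k by 5-fold cross-validation on a synthetic data set.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

for k in (1, 3, 5, 11, 21, 51):
    clf = KNeighborsClassifier(n_neighbors=k)
    acc = cross_val_score(clf, X, y, cv=5).mean()   # 5-fold CV accuracy
    print(f"k={k:2d}  cross-validated accuracy={acc:.3f}")
# Keep the k with the best score; very small k tends to overfit,
# very large k tends to underfit.
```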
Overfitting with k-NN
Decision trees Discriminative model based on per-feature decisions; each internal node is a decision Source: http://blog.akanoo.com/tag/decision-tree-example/
Example (loan application) Class: yes = credit, no = no-credit
Example (loan application). Class: yes = credit, no = no-credit. BEFORE READING THE NEXT SLIDES... Manually build a decision tree for this table. Try to use few internal nodes. You can start with any column (not necessarily the first one).
Simplest decision tree: majority class. A single-node tree that always predicts the majority class (“Yes”). What is its accuracy? (Accuracy = correct / total)
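A tiny sketch of this baseline (illustrative); the label column below is hypothetical, not the slide's actual loan table:

```python
# Majority-class baseline: a "tree" with a single node that always answers
# the most frequent class, and its accuracy = correct / total.
from collections import Counter

labels = ["yes", "yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no"]

majority = Counter(labels).most_common(1)[0][0]      # class to always predict
correct = sum(1 for y in labels if y == majority)
print(majority, correct / len(labels))               # baseline accuracy
```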
Example tree
Is this decision tree unique? No: here is a simpler tree. We want a tree that is both small and accurate: a small tree is easier to understand, and this one also performs better. Finding the optimal tree is NP-hard, so we need to use heuristics.
Basic algorithm (greedy divide-and-conquer). Assume attributes are categorical for now (continuous attributes can be handled too). The tree is constructed in a top-down recursive manner: at the start, all the training examples are at the root; examples are then partitioned recursively based on selected attributes; attributes are selected on the basis of an impurity function (e.g., information gain). Example conditions for stopping partitioning: all examples at a given node belong to the same class; the largest leaf node has min_leaf_size elements or fewer.
Example: information gain Source (this and following slides): http://www.saedsayad.com/decision_tree.htm
Entropy. The entropy of a set S with respect to the class labels is E(S) = − Σ_c p_c log2 p_c, where p_c is the proportion of examples in S that belong to class c.
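A minimal sketch of this formula in Python (illustrative, not from the slides):

```python
# Entropy of a labeled set: E(S) = -sum_c p_c log2 p_c.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["yes"] * 9 + ["no"] * 5))   # ≈ 0.940 for a 9/5 class split
print(entropy(["yes"] * 7))                # ≈ 0.0: a pure set has zero entropy
```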
Expected entropy of splitting on attribute X: E(S, X) = Σ_v P(X = v) · E(S_v), i.e., the entropy of each subset S_v (the examples with X = v) weighted by the fraction of examples that fall into it.
Information gain: Gain(S, X) = E(S) − E(S, X), the reduction in entropy obtained by splitting S on attribute X; the attribute with the highest gain is chosen for the split. See also: http://www.math.unipd.it/~aiolli/corsi/0708/IR/Lez12.pdf
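A minimal sketch combining the last two formulas (illustrative); the toy attribute values and labels below are invented, not the slide's loan table:

```python
# Expected entropy of a split and information gain for a categorical attribute.
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    # E(S, X): entropy of each subset S_v, weighted by its relative size.
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    expected = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - expected        # Gain(S, X) = E(S) - E(S, X)

labels = ["yes", "yes", "no", "no", "yes", "no"]
attr   = ["old", "old", "young", "young", "old", "young"]
print(information_gain(attr, labels))        # 1.0 here: a perfect split
```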
Building decision tree recursively on every sub-dataset
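Putting the pieces together, a hedged ID3-style sketch of the recursive construction, under the assumptions of categorical attributes and an information-gain criterion; the rows, attribute names, and min_leaf_size stopping rule are hypothetical, mirroring the conditions listed earlier:

```python
# Greedy top-down tree construction on dict-valued rows (illustrative).
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr):
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    rest = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - rest

def build_tree(rows, labels, attrs, min_leaf_size=1):
    # Stop: pure node, no attributes left, or node too small -> majority leaf.
    if len(set(labels)) == 1 or not attrs or len(labels) <= min_leaf_size:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, labels, a))   # greedy choice
    tree = {"attribute": best, "branches": {}}
    partitions = defaultdict(lambda: ([], []))
    for row, y in zip(rows, labels):
        partitions[row[best]][0].append(row)
        partitions[row[best]][1].append(y)
    for value, (sub_rows, sub_labels) in partitions.items():
        tree["branches"][value] = build_tree(
            sub_rows, sub_labels, [a for a in attrs if a != best], min_leaf_size)
    return tree

rows = [{"age": "old", "has_job": "yes"}, {"age": "old", "has_job": "no"},
        {"age": "young", "has_job": "yes"}, {"age": "young", "has_job": "no"}]
labels = ["yes", "no", "yes", "no"]
print(build_tree(rows, labels, ["age", "has_job"]))
```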
There is a lot more in supervised learning! All of these methods require labeled input data (“supervision”). Main practical difficulties: good labeled data can be expensive to get, and efficiency requires careful algorithmic design. Typical problems: sensitivity to incorrectly labeled instances; slow convergence and no guarantee of global optimality; we may want to update a model (online learning); we may want to know what to label next (active learning); overfitting.
Some state-of-the-art methods Text classification: Random forests Image classification: Neural networks
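As a hedged illustration of the text-classification case, a minimal random-forest pipeline in scikit-learn; the tiny corpus, labels, and feature choices (TF-IDF, 100 trees) are assumptions made for the example, not the lecturer's setup:

```python
# Random-forest text classifier sketch: TF-IDF features + 100 trees.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

texts = ["flu shots available at the clinic", "new smartphone released today",
         "tips to lower your blood pressure", "stock markets close higher"]
labels = ["health", "not-health", "health", "not-health"]

clf = make_pipeline(TfidfVectorizer(), RandomForestClassifier(n_estimators=100))
clf.fit(texts, labels)
print(clf.predict(["clinic offers free blood pressure checks"]))
```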
Important element: explainability Example: Husky vs Wolf classifier Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?: Explaining the Predictions of Any Classifier." In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135-1144. ACM, 2016.