Supervised Learning
Seminar on Social Media Mining, UC3M
Date: May 2017. Lecturer: Carlos Castillo.
Sources: CS583 slides by Bing Liu (2017); supervised learning course by Mark Herbster (2014); based on: T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2002.
What is “learning” in this context?
Computing a functional relationship between an input (x) and an output (y). E.g.:
x_i = email message, y_i ∈ { spam, not spam }
x_i = tweet, y_i ∈ { health-related, not health-related }
x_i = handwritten digit, y_i ∈ { 0, 1, 2, …, 9 }
x_i = news item, y_i ∈ [ 0: meaningless, …, 1: important ]
The vector x is usually high-dimensional.
Example: photo of people or not?
Example (cont.): many problems remain, e.g., doing this efficiently, generalizing well, ...
Formally
Goal: given training data $(x_1, y_1), \ldots, (x_n, y_n)$, infer a function $f$ such that $f(x_i) \approx y_i$ ...
... and apply it to future data: $\hat{y} = f(x)$.
Binary classification: $y \in \{-1, +1\}$. Regression: $y \in \mathbb{R}$.
Supervised learning algorithms use a training data set
Example supervised learning algorithms:
Linear regression / logistic regression
Decision trees / decision forests
Neural networks
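For concreteness, a minimal sketch of the train/predict workflow, assuming scikit-learn is available; the tiny dataset below is purely illustrative:

```python
from sklearn.linear_model import LogisticRegression

# Training data: feature vectors x_i with binary labels y_i
# (e.g., 1 = spam, 0 = not spam). Made-up numbers for illustration.
X_train = [[0.0, 1.0], [1.0, 3.0], [4.0, 0.5], [5.0, 1.5]]
y_train = [0, 0, 1, 1]

model = LogisticRegression()
model.fit(X_train, y_train)   # infer f from the training data

X_new = [[4.5, 1.0]]
print(model.predict(X_new))   # apply f to future data
```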
What do we want?
Collect appropriate training data: this requires some assumptions (e.g., a uniform random sample).
Represent inputs appropriately: good feature construction and selection.
Learn efficiently: e.g., in time linear in the number of training examples.
"Test" efficiently: i.e., operate efficiently at run time.
Key goal: generalize. Borges' "Funes el Memorioso" (1942):
"Not only was it difficult for him to see that the generic symbol 'dog' took in all the dissimilar individuals of all shapes and sizes; it irritated him that the 'dog' of three-fourteen in the afternoon, seen in profile, should be indicated by the same noun as the dog at three-fifteen, seen frontally."
To generalize is to forget differences and focus on what is important. Simple models (using fewer features) are preferable in general.
Overfitting and Underfitting
Underfitted models perform poorly on both the training set and the test set. Overfitted models perform very well on the training set but poorly on the test set.
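A minimal sketch of this trade-off, assuming NumPy and illustrative synthetic data: fitting polynomials of increasing degree to noisy samples of a sine curve shows low degrees underfitting (high error everywhere) and high degrees overfitting (low training error, high test error):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 100)

for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```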
Finding a sweet spot: prototypical error curves
Inference error = “training error” Estimation error = “testing error”
Example: k-NN classifier
Suppose we are classifying social media postings as "health-related" (green) vs. "not health-related" (red).
All messages in the training set can serve as "pivots."
For a new, unlabeled, unseen message, pick the k pivots that are most similar to it and do majority voting:
green wins => the message is about health; red wins => the message is not about health.
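A minimal sketch of this procedure in plain Python; the Euclidean distance, the two-dimensional feature vectors, and the toy training points are assumptions for illustration:

```python
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, new_x, k=3):
    """train is a list of (feature_vector, label) pairs; the pivots."""
    neighbors = sorted(train, key=lambda xy: euclidean(xy[0], new_x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]   # majority vote

train = [((0.1, 0.9), "health"), ((0.2, 0.8), "health"),
         ((0.9, 0.1), "not-health"), ((0.8, 0.3), "not-health")]
print(knn_predict(train, (0.15, 0.85), k=3))  # -> "health"
```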
How large should k be? How to decide?
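One standard approach (not stated on this slide, so an assumption here) is to choose k by cross-validated accuracy on the training data. A minimal sketch, assuming scikit-learn and using the Iris dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)   # stand-in dataset for illustration
for k in (1, 3, 5, 11, 21):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k:2d}  mean CV accuracy={scores.mean():.3f}")
```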
Overfitting with k-NN
Decision trees: a discriminative model based on per-feature decisions; each internal node is a decision.
Example (loan application). Class: yes = credit, no = no-credit.
BEFORE READING THE NEXT SLIDES... build a decision tree for this table manually. Try to use few internal nodes. You can start with any column (not necessarily the first one).
Simplest decision tree: majority class
A single-node tree (START => "Yes") that always predicts the majority class. What is its accuracy? (Accuracy = correct / total)
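A sketch of this baseline computation; the 9 "yes" / 6 "no" label counts below are hypothetical stand-ins for the table shown in the slide image:

```python
# Majority-class baseline: always predict the most frequent label.
labels = ["yes"] * 9 + ["no"] * 6          # hypothetical class column
majority = max(set(labels), key=labels.count)
accuracy = labels.count(majority) / len(labels)
print(majority, accuracy)                  # with these counts: yes 0.6
```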
Example tree
Is this decision tree unique?
No. Here is a simpler tree. We want a tree that is both small and accurate: it is easier to understand and often performs better. Finding the optimal tree is NP-hard, so we need to use heuristics.
Basic algorithm (greedy divide-and-conquer)
Assume attributes are categorical for now (continuous attributes can be handled too).
The tree is constructed in a top-down, recursive manner.
At the start, all the training examples are at the root.
Examples are partitioned recursively based on selected attributes.
Attributes are selected on the basis of an impurity function (e.g., information gain).
Example conditions for stopping the partitioning:
All examples at a given node belong to the same class.
The largest leaf node has min_leaf_size elements or fewer.
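A sketch of this greedy procedure for categorical attributes, ID3-style with information gain as the impurity function. The dict-based row representation and the tiny loan-like example are assumptions, and pruning / min_leaf_size are omitted:

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # Maximizing information gain == minimizing expected entropy after split.
    def expected_entropy(attr):
        total = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            total += len(subset) / len(labels) * entropy(subset)
        return total
    return min(attributes, key=expected_entropy)

def build_tree(rows, labels, attributes):
    if len(set(labels)) == 1 or not attributes:      # stopping conditions
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    tree = {}
    for value in set(row[attr] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        tree[(attr, value)] = build_tree([rows[i] for i in idx],
                                         [labels[i] for i in idx], remaining)
    return tree

# Hypothetical loan-like data (not the table from the slides):
rows = [{"has_job": "yes", "owns_house": "no"},
        {"has_job": "no",  "owns_house": "yes"},
        {"has_job": "no",  "owns_house": "no"}]
labels = ["yes", "yes", "no"]
print(build_tree(rows, labels, ["has_job", "owns_house"]))
```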
Example: information gain
Entropy: $H(D) = -\sum_{j} \Pr(c_j) \log_2 \Pr(c_j)$, where $\Pr(c_j)$ is the proportion of examples in $D$ that belong to class $c_j$.
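A worked example (the 9-"yes" / 6-"no" class split is illustrative, not read from the slide):

```python
import math

# Entropy of a dataset with 9 "yes" and 6 "no" examples.
p_yes, p_no = 9 / 15, 6 / 15
H = -(p_yes * math.log2(p_yes) + p_no * math.log2(p_no))
print(round(H, 3))   # ≈ 0.971 bits
```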
Expected entropy of splitting on attribute $X$: $H(D \mid X) = \sum_{v \in \mathrm{values}(X)} \frac{|D_v|}{|D|} H(D_v)$, where $D_v$ is the subset of $D$ for which $X = v$.
Information gain: $\mathrm{gain}(D, X) = H(D) - H(D \mid X)$. At each node, the attribute with the highest information gain is selected for the split.
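A worked sketch of the computation; the split counts below are hypothetical:

```python
import math

def entropy(pos, neg):
    H = 0.0
    for c in (pos, neg):
        if c:
            p = c / (pos + neg)
            H -= p * math.log2(p)
    return H

# Hypothetical split of a 15-example dataset (9 yes / 6 no):
# value v1 covers 5 examples (4 yes / 1 no), v2 covers 10 (5 yes / 5 no).
H_D = entropy(9, 6)
H_D_given_X = (5 / 15) * entropy(4, 1) + (10 / 15) * entropy(5, 5)
print(round(H_D - H_D_given_X, 3))   # information gain of splitting on X
```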
Building the decision tree recursively on every sub-dataset
There is a lot more in supervised learning!
All of these methods require labeled input data ("supervision").
Main practical difficulties:
Good labeled data can be expensive to get.
Efficiency requires careful algorithmic design.
Typical problems:
Sensitivity to incorrectly labeled instances.
Slow convergence and no guarantee of global optimality.
We may want to update a model (online learning).
We may want to know what to label (active learning).
Overfitting.
Some state-of-the-art methods
Text classification: random forests.
Image classification: neural networks.
Important element: explainability. Example: the husky vs. wolf classifier.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "'Why Should I Trust You?': Explaining the Predictions of Any Classifier." In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
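A hedged sketch of producing such an explanation with the authors' LIME library (pip install lime); the placeholder classifier and the example text are assumptions:

```python
import numpy as np
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    # Placeholder model: "probability of health-related" is high whenever
    # the word "flu" appears. A real trained classifier would go here.
    return np.array([[0.2, 0.8] if "flu" in t else [0.9, 0.1] for t in texts])

explainer = LimeTextExplainer(class_names=["not-health", "health"])
exp = explainer.explain_instance("I think I caught the flu yesterday",
                                 predict_proba, num_features=4)
print(exp.as_list())   # word weights behind this prediction
```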