Using Classification Trees to Decide News Popularity

Slides:

Advertisements

Similar presentations

DECISION TREES. Decision trees  One possible representation for hypotheses.

Advertisements

Random Forest Predrag Radenković 3237/10

CHAPTER 9: Decision Trees

RIPPER Fast Effective Rule Induction

A Statistician’s Games * : Bootstrap, Bagging and Boosting * Please refer to “Game theory, on-line prediction and boosting” by Y. Freund and R. Schapire,

Ensemble Methods An ensemble method constructs a set of base classifiers from the training data Ensemble or Classifier Combination Predict class label.

Indian Statistical Institute Kolkata

A Quick Overview By Munir Winkel. What do you know about: 1) decision trees 2) random forests? How could they be used?

Decision Tree Rong Jin. Determine Milage Per Gallon.

Sparse vs. Ensemble Approaches to Supervised Learning

1 Lecture 5: Automatic cluster detection Lecture 6: Artificial neural networks Lecture 7: Evaluation of discovered knowledge Brief introduction to lectures.

Announcements  Project proposal is due on 03/11  Three seminars this Friday (EB 3105) Dealing with Indefinite Representations in Pattern Recognition.

A Brief Introduction to Adaboost

Ensemble Learning: An Introduction

Three kinds of learning

ICS 273A Intro Machine Learning

Machine Learning: Ensemble Methods

Sparse vs. Ensemble Approaches to Supervised Learning

Ensemble Learning (2), Tree and Forest

3 ème Journée Doctorale G&E, Bordeaux, Mars 2015 Wei FENG Geo-Resources and Environment Lab, Bordeaux INP (Bordeaux Institute of Technology), France Supervisor:

M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 COMP527: Data Mining Classification: Evaluation February 23,

Machine Learning CS 165B Spring 2012

Fall 2004 TDIDT Learning CS478 - Machine Learning.

by B. Zadrozny and C. Elkan

LOGO Ensemble Learning Lecturer: Dr. Bo Yuan

Comparing Univariate and Multivariate Decision Trees Olcay Taner Yıldız Ethem Alpaydın Department of Computer Engineering Bogazici University

Categorical data. Decision Tree Classification Which feature to split on? Try to classify as many as possible with each split (This is a good split)

Data Mining - Volinsky Columbia University 1 Topic 10 - Ensemble Methods.

Decision Tree Learning Debapriyo Majumdar Data Mining – Fall 2014 Indian Statistical Institute Kolkata August 25, 2014.

Scaling up Decision Trees. Decision tree learning.

Combining multiple learners Usman Roshan. Bagging Randomly sample training data Determine classifier C i on sampled data Goto step 1 and repeat m times.

CLASSIFICATION: Ensemble Methods

Multivariate Dyadic Regression Trees for Sparse Learning Problems Xi Chen Machine Learning Department Carnegie Mellon University (joint work with Han Liu)

Ensemble with Neighbor Rules Voting Itt Romneeyangkurn, Sukree Sinthupinyo Faculty of Computer Science Thammasat University.

1 Universidad de Buenos Aires Maestría en Data Mining y Knowledge Discovery Aprendizaje Automático 5-Inducción de árboles de decisión (2/2) Eduardo Poggi.

XINYUE LIU Can We Determine Whether an is SPAM ?

Ensemble Methods in Machine Learning

CSCI 347, Data Mining Evaluation: Cross Validation, Holdout, Leave-One-Out Cross Validation and Bootstrapping, Sections 5.3 & 5.4, pages

Konstantina Christakopoulou Liang Zeng Group G21

Random Forests Ujjwol Subedi. Introduction What is Random Tree? ◦ Is a tree constructed randomly from a set of possible trees having K random features.

Classification Ensemble Methods 1

Classification and Prediction: Ensemble Methods Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.

Finding τ → μ−μ−μ+ Decays at LHCb with Data Mining Algorithms

Machine Learning: Decision Trees Homework 4 assigned courtesy: Geoffrey Hinton, Yann LeCun, Tan, Steinbach, Kumar.

Xiangnan Kong,Philip S. Yu An Ensemble-based Approach to Fast Classification of Multi-label Data Streams Dept. of Computer Science University of Illinois.

ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Decision/Classification Trees Readings: Murphy ; Hastie 9.2.

Machine Learning Recitation 8 Oct 21, 2009 Oznur Tastan.

Ensemble Methods for Machine Learning. COMBINING CLASSIFIERS: ENSEMBLE APPROACHES.

Ensemble Learning, Boosting, and Bagging: Scaling up Decision Trees (with thanks to William Cohen of CMU, Michael Malohlava of 0xdata, and Manish Amde.

Worcester Polytechnic Institute CS548 Spring 2016 Decision Trees Showcase By Yi Jiang and Brandon Boos ---- Showcase work by Zhun Yu, Fariborz Haghighat,

Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.

BY International School of Engineering {We Are Applied Engineering} Disclaimer: Some of the Images and content have been taken from multiple online sources.

Data Mining CH6 Implementation: Real machine learning schemes(2) Reporter: H.C. Tsai.

1 Machine Learning: Ensemble Methods. 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training data or different.

Decision tree and random forest

Ensemble Classifiers.

Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.

Machine Learning: Ensemble Methods

Week 2 Presentation: Project 3

Introduction to Machine Learning and Tree Based Methods

Lecture 17. Boosting¶ CS 109A/AC 209A/STAT 121A Data Science: Harvard University Fall 2016 Instructors: P. Protopapas, K. Rader, W. Pan.

Supervised Learning Seminar Social Media Mining University UC3M

Issues in Decision-Tree Learning Avoiding overfitting through pruning

© 2013 ExcelR Solutions. All Rights Reserved Data Mining - Supervised Decision Tree & Random Forest.

CS548 Fall 2017 Decision Trees / Random Forest Showcase by Yimin Lin, Youqiao Ma, Ran Lin, Shaoju Wu, Bhon Bunnag Showcasing work by Cano,

Decision Trees By Cole Daily CSCI 446.

Ensemble learning Reminder - Bagging of Trees Random Forest

Classification with CART

… 1 2 n A B V W C X 1 2 … n A … V … W … C … A X feature 1 feature 2

Advisor: Dr.vahidipour Zahra salimian Shaghayegh jalali Dec 2017

Presentation transcript:

Using Classification Trees to Decide News Popularity Xinyue Liu

The News Popularity Data Set Goal: compare the performance of the two algorithms on the same dataset. Source and Origin Goal Instances and Attributes Examples

Data Description Significant Variables Selection

Decision Tree Iteratively split variables into groups Evaluate “homogeneity” within each group Split again if necessary Weekdays Weekends

Decision Tree Pros: Cons: - Easy to Interpret - Better performance in nonlinear settings Cons: - Without pruning can lead to overfitting - Uncertainty

Random Forest Bootstrap: random sampling with replacement Bootstrap samples and variables Grow multiple trees and average

Random Forest Pros: Pros: Cons: - Accuracy - Speed - Hard to interpret - Overfitting Pros:

Solution Accuracy: 0.5056 Sensitivity : 0.4552 Specificity : 0.5273 Decision Tree Random Forest Accuracy: 0.5056 Sensitivity : 0.4552 Specificity : 0.5273 Accuracy: 0.5068 Sensitivity : 0.4628 Specificity : 0.5339

Conclusion Both algorithms do not perform well on this dataset. The day the news published, the topic, and the number of images decide the popularity of the online news. Resort to other machine learning algorithms. Weekdays Weekends

References Trevor Hastie, Rob Tibshirani. “Statistical Learning.” Statistical Learning. Stanford University Online CourseWare, 21 August 2015. Lecture. http://online.stanford.edu/course/statistical-learning-winter-2014 Andrew Liu. “Practical Machine Learning.” Practical Machine Learning. Coursera, 21 August 2015. Lecture.

Thank you!