Using Classification Trees to Decide News Popularity
Xinyue Liu

The News Popularity Data Set
Goal: compare the performance of the two algorithms (a single classification tree vs. a random forest) on the same dataset.
- Source and origin
- Instances and attributes
- Examples
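A minimal data-preparation sketch, assuming the data set is the UCI "Online News Popularity" CSV and that popularity is binarized at the median share count (the filename, the `shares` column, and the median-split rule are assumptions; the slides do not spell them out):

```python
# Hedged sketch: load the (assumed) UCI Online News Popularity CSV and
# turn the raw share count into a binary popular/unpopular label.
import pandas as pd

df = pd.read_csv("OnlineNewsPopularity.csv")
df.columns = df.columns.str.strip()  # this CSV ships with stray spaces in headers

# An article is "popular" if its shares exceed the median, else "unpopular".
df["popular"] = (df["shares"] > df["shares"].median()).astype(int)

# Drop the non-predictive identifier columns and the raw target.
X = df.drop(columns=["url", "timedelta", "shares", "popular"])
y = df["popular"]
print(X.shape, y.mean())
```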

Data Description
- Selection of significant variables
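The slides do not say how significant variables were chosen; one common screening approach is to rank features by a univariate test against the label, as in this sketch (synthetic data stands in for the news features):

```python
# Hedged sketch of univariate variable screening: rank features by
# ANOVA F-score against the binary popularity label and keep the top k.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
print("kept feature indices:", selector.get_support(indices=True))
```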

Decision Tree
- Iteratively split the data into groups on one variable at a time
- Evaluate the "homogeneity" (purity) within each group
- Split again if necessary
(Example split from the slide: weekdays vs. weekends.)
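A short sketch of this splitting procedure using scikit-learn's `DecisionTreeClassifier`, with Gini impurity as the homogeneity measure (the library and the synthetic data are illustrative choices, not from the slides):

```python
# Fit a shallow classification tree and print the learned splits.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_tr, y_tr)

print(export_text(tree))                  # one line per split/leaf
print("test accuracy:", tree.score(X_te, y_te))
```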

Decision Tree
Pros:
- Easy to interpret
- Better performance in nonlinear settings
Cons:
- Can overfit without pruning
- Uncertainty: the fitted tree is unstable under small changes in the data
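The slides do not name a pruning method; cost-complexity pruning is one standard way to address the overfitting noted above, sketched here (larger `ccp_alpha` prunes more aggressively):

```python
# Hedged sketch: enumerate candidate pruning strengths and watch the
# tree shrink while test accuracy is tracked.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
for alpha in path.ccp_alphas[::5]:
    pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  "
          f"test acc={pruned.score(X_te, y_te):.3f}")
```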

Random Forest
- Bootstrap: random sampling with replacement
- Draw bootstrap samples of the rows and random subsets of the variables
- Grow multiple trees and average (vote over) their predictions
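A sketch of this recipe: the bootstrap step is shown by hand for one sample, then `RandomForestClassifier` carries out the full many-trees-plus-voting procedure (library and data are illustrative assumptions):

```python
# Bootstrap + random forest sketch on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bootstrap: sample n rows *with replacement* from the training set.
rng = np.random.default_rng(0)
idx = rng.integers(0, len(X_tr), size=len(X_tr))
X_boot, y_boot = X_tr[idx], y_tr[idx]     # one bootstrap sample

# Forest: each tree sees its own bootstrap sample and a random subset
# of features at every split; class predictions are majority-voted.
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            random_state=0).fit(X_tr, y_tr)
print("test accuracy:", rf.score(X_te, y_te))
```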

Random Forest
Pros:
- Accuracy
- Speed
Cons:
- Hard to interpret
- Overfitting

Results

Metric        Decision Tree   Random Forest
Accuracy      0.5056          0.5068
Sensitivity   0.4552          0.4628
Specificity   0.5273          0.5339
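For reference, the three reported metrics are derived from a confusion matrix as below (the predictions here are made up for illustration, not the presenter's outputs):

```python
# Accuracy, sensitivity, and specificity from a binary confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)              # true-positive rate (recall)
specificity = tn / (tn + fp)              # true-negative rate
print(accuracy, sensitivity, specificity)
```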

Conclusion
- Neither algorithm performs well on this dataset.
- The day the news is published, the topic, and the number of images drive the popularity of online news.
- Future work: resort to other machine learning algorithms.

References
Trevor Hastie and Rob Tibshirani. "Statistical Learning." Stanford University Online CourseWare, 21 August 2015. Lecture. http://online.stanford.edu/course/statistical-learning-winter-2014
Andrew Liu. "Practical Machine Learning." Coursera, 21 August 2015. Lecture.

Thank you!