
CS548 Spring 2015 Showcase by Yang Liu, Viseth Sean, and Azharuddin Priyotomo. Showcasing work by Le, Abrahart, and Mount on "M5 Model Tree applied to modelling town centre area activities for the city of Nottingham". Image: bbc.co.uk

References
[WFH 2011] Ian H. Witten, Eibe Frank, Mark A. Hall (2011). Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition. Burlington, MA: Morgan Kaufmann.
[LAM 2007] T. K. T. Le, R. J. Abrahart, N. J. Mount (2007). M5 Model Tree applied to modelling town centre area activities for the city of Nottingham. Proceedings of the 9th International Conference on GeoComputation, National Centre for Geocomputation, National University of Ireland, Maynooth, September 2007.
[WW 1997] Y. Wang, I. H. Witten (1997). Induction of model trees for predicting continuous classes. In Proc. European Conference on Machine Learning Poster Papers, Prague, Czech Republic.
[Quin 1992] Ross J. Quinlan (1992). Learning with Continuous Classes. In Proc. 5th Australian Joint Conference on Artificial Intelligence, Singapore.

Content
❖ Regression Tree & Model Tree
❖ Model Tree Induction Algorithm
❖ Real Application

Regression Tree & Model Tree
❖ Regression tree: predicts a constant value at each leaf
❖ Model tree: fits a linear regression model at each leaf
Taken from [WFH 2011]
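To make the distinction concrete, here is a minimal sketch (not code from [WFH 2011] or [LAM 2007]) that fits an ordinary regression tree and then emulates a model tree by fitting a linear regression inside each of the tree's leaves. The synthetic data and all names are illustrative, and scikit-learn's DecisionTreeRegressor stands in for a real M5 implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

# Synthetic piecewise-linear target with noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.where(X[:, 0] < 5, 2.0 * X[:, 0], 30.0 - X[:, 0]) + rng.normal(0, 0.5, 200)

# Regression tree: a constant prediction at each leaf.
reg_tree = DecisionTreeRegressor(max_depth=2).fit(X, y)

# "Model tree" emulation: reuse the tree's partition of the input space,
# but fit a linear regression inside each leaf instead of a constant.
leaf_ids = reg_tree.apply(X)
leaf_models = {leaf: LinearRegression().fit(X[leaf_ids == leaf], y[leaf_ids == leaf])
               for leaf in np.unique(leaf_ids)}

def model_tree_predict(X_new):
    """Route each instance to its leaf, then apply that leaf's linear model."""
    leaves = reg_tree.apply(X_new)
    return np.array([leaf_models[leaf].predict(row.reshape(1, -1))[0]
                     for leaf, row in zip(leaves, X_new)])

print("constant-leaf MSE:", np.mean((reg_tree.predict(X) - y) ** 2))
print("linear-leaf MSE:  ", np.mean((model_tree_predict(X) - y) ** 2))
```

With constant leaves the tree needs many more splits to track a sloped response; with a linear model per leaf, a much shallower tree fits it well, which is exactly the appeal of model trees for continuous targets.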

Model Tree Induction (M5)
❖ Follows the ordinary decision tree induction algorithm to build an initial tree.
❖ Splitting criterion: standard deviation reduction instead of entropy, but based on the same rationale: the standard deviation of the class values reaching a node measures its impurity, and the split that most reduces it is chosen; the lower the resulting standard deviation, the shallower the subtree and the shorter the tree/rule.
❖ The pruning algorithm stays the same, except that a subtree is replaced by a regression plane instead of a constant.
❖ Smoothing: removes any sharp discontinuities between the predictions of neighbouring leaves of the pruned tree.
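As a hedged illustration of the splitting step, the sketch below computes the standard deviation reduction (SDR) for candidate thresholds on a single numeric attribute. The function and variable names are made up; the real M5/M5' implementation additionally handles multiple attributes, missing values, and stopping criteria.

```python
import numpy as np

def sdr(y, y_left, y_right):
    """Standard deviation reduction for one candidate binary split (M5-style)."""
    n = len(y)
    return np.std(y) - (len(y_left) / n) * np.std(y_left) \
                     - (len(y_right) / n) * np.std(y_right)

def best_split(x, y):
    """Scan thresholds on a single numeric attribute and return the split with highest SDR."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_t, best_gain = None, -np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no usable threshold between identical values
        t = (x_sorted[i] + x_sorted[i - 1]) / 2.0
        gain = sdr(y_sorted, y_sorted[:i], y_sorted[i:])
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

For smoothing, [WFH 2011] combine the value p predicted by the model below a node with the value q predicted by the linear model at that node as p' = (n p + k q) / (n + k), where n is the number of training instances reaching the lower node and k is a smoothing constant.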

Real Application
❖ Analyzing patterns of city activities using spatial data
❖ Spatial data is usually stored as coordinates and topology, and is data that can be mapped. It is often accessed, manipulated, and analyzed through Geographic Information Systems (GIS).
❖ Main attributes: TCPIs

TCPI
❖ Town Centre Performance Indicator
❖ Indicators used to define vital activities in a town centre
❖ A publicly agreed set of 8 TCPIs is used for this application

Town Centre Performance Indicators
GIS input layers: the 8 TCPIs considered for Nottingham's town centre. Taken from [LAM 2007]

The Main Problems
❖ Differing perceptions of the significance of each TCPI and of their relative importance
❖ How to choose a representative sample
❖ How many linear models to keep in the tree

Spatial Data Collection (Cool Stuff)
Screenshot of the web-based survey tool: the user sprays on the map to mark areas, and can change the spray size, wipe the map, add a new area, write in comments, and click the "send" button when done.

Model Tree Creation
❖ Data instances: 4250 instances as the training set, generated by random sampling
❖ Attributes: 8 TCPIs (leisure, car park, commerce, public, pedestrian, industry, population, education)
❖ Splitting the input space of the training set (town centre area activities) into sub-spaces (sub-areas)
❖ Building a linear regression model, at the leaves, for each sub-space
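A hedged sketch of how such a training set could be organised: the data here is synthetic, the column names simply echo the eight TCPIs listed above, and an ordinary regression tree with large leaves stands in for the M5 model tree actually used in [LAM 2007].

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical layout: one row per sampled location, the 8 TCPI layer
# values as attributes, and the surveyed town-centre activity as target.
tcpi_cols = ["leisure", "car_park", "commerce", "public_building",
             "pedestrian", "industry", "population", "education"]

rng = np.random.default_rng(1)
train = pd.DataFrame(rng.uniform(0, 1, size=(4250, 8)), columns=tcpi_cols)
train["town_centre_activity"] = (0.5 * train["commerce"]
                                 + 0.3 * train["pedestrian"]
                                 + rng.normal(0, 0.05, 4250))  # synthetic target

# Stand-in learner: a regression tree partitions the TCPI space into
# sub-areas; M5 would additionally fit a linear model in each leaf.
tree = DecisionTreeRegressor(min_samples_leaf=200)
tree.fit(train[tcpi_cols], train["town_centre_activity"])
print("number of leaves (sub-areas):", tree.get_n_leaves())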

An Example of the M5 Algorithm
Splitting the input space [X1, X2] of the training set using the M5 algorithm. Each leaf model is a linear regression model Y = a0 + a1*X1 + a2*X2. Taken from [LAM 2007]
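Generalising the two-attribute example on the slide to the eight TCPIs of the Nottingham study, each leaf m of the pruned tree carries its own coefficient vector (the symbols below are illustrative, not coefficients reported in [LAM 2007]):

```latex
\hat{Y}(\mathbf{x}) = a^{(m)}_0 + \sum_{j=1}^{8} a^{(m)}_j X_j,
\qquad m = \text{the leaf reached by following the splits on } \mathbf{x}
```

The splits select the sub-area (leaf), and that leaf's linear model turns the TCPI values X1, ..., X8 at a location into a predicted level of town centre activity.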

The Model Tree
Tree model resulting from 4250 instances over the eight TCPIs. Taken from [LAM 2007]
Associated indicators: ❖ Commerce ❖ Pedestrian ❖ Leisure ❖ Car_park ❖ Public_building
Less associated indicators: ❖ Population ❖ Industry ❖ Education

Why choose 14 linear models? Taken from [LAM 2007]

Nottingham Mental Map
A single overall public mental town centre map (web-based GIS survey). Taken from [LAM 2007]
● Target output of the model
● The darker the red color, the more confidently those areas are considered part of the town centre activity area.

Result In Maps
14th, 13th, and 12th linear models: high density of commerce and pedestrian flow; high density of commerce; less dense in commerce with high density of leisure and pedestrian flow. Taken from [LAM 2007]

Result In Maps
3rd and 2nd linear models: less dense in commerce with high residential use; high density of industry. Taken from [LAM 2007]

Pros of this Model Tree
❖ Shows how significant each indicator (attribute) is for prediction
❖ Shows to what degree each indicator explains the output (town centre area activities)
❖ Is particularly useful for the natural, temporal, and complex characteristics of an urban city

Thank You!!!