The application of rough sets analysis in activity-based modelling. Opportunities and constraints Speaker: Yanan Yean.

OUTLINE
1.Introduction
2.Activity-based modelling
3.Rough sets
4.Data
5.Application of rough sets in the SAMBA-project
–Case study 1
–Case study 2
6.Conclusions and further challenges

1.Introduction Ⅰ Knowledge on travel behavior increases continuously, as researchers constantly make improvements to obtain more accurate and realistic models. If databases grow too large, human inspection and interpretation are no longer feasible, resulting in a gap between data generation and data understanding. A vast multitude of methods exists that ‘learn’ from examples and can be used to extract patterns from data for classification, e.g. rough sets. The rough sets technique is a mathematical tool to search large, complex databases for meaningful decision rules.

1.Introduction Ⅱ The aim is to explore how travel data from a Belgian travel behavior survey can be analyzed using rough sets analysis. We first present the basic concepts of the rough sets technique. We then explore the possibilities of using rough sets by conducting two case studies in which we assess the performance of the approach in pattern generation, classification and choice prediction.

2.Activity-based modelling Ⅰ The most popular and advanced approach in passenger transport modelling is the activity-based modelling approach. It aims at forecasting which activities are carried out, where, at what time, with whom, for how long and with which transport mode. The application of such models is characterized by many problems. A modelling approach that avoids these problems is qualitative modelling (IF, THEN…ELSE).

2.Activity-based modelling Ⅱ Another widely used technique within AI is rough sets. Rough sets have already been applied successfully in a wide variety of research fields (medicine, tourism travel demand, geography). Unlike many other data mining (DM) techniques, the results are expressed in a more or less natural language, which makes them easier to interpret.

3.Rough sets-some basic concepts Ⅰ
Indiscernibility
–Indiscernibility is related to similarity: objects with identical values on the attributes considered cannot be told apart.
–Sets of objects usually cannot be determined unambiguously; hence, they have to be described roughly through a pair of sets, i.e. a lower and an upper approximation.
–An important advantage of the rough set approach is that it can deal with a set of inconsistent examples, i.e. objects indiscernible by condition attributes but discernible by decision attributes.
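
The lower and upper approximation can be illustrated on a very small decision table. The sketch below is only an illustration: the table, the attribute choice and the function names are hypothetical and are not taken from the SAMBA data. Objects 0 and 1 form an inconsistent pair (same condition values, different mode), so they fall in the upper but not in the lower approximation of the set of car users.

```python
from collections import defaultdict

# Hypothetical toy decision table: (gender, age class, mode). Not SAMBA data.
rows = [
    ("female", "35-45", "car"),
    ("female", "35-45", "bike"),
    ("male", "35-45", "car"),
    ("male", "18-34", "bike"),
    ("female", "18-34", "car"),
]


def indiscernibility_classes(rows, attr_idx):
    """Group object indices that share the same values on the given attributes."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attr_idx)].add(i)
    return list(classes.values())


def approximations(rows, attr_idx, target):
    """Return the (lower, upper) approximation of the object set `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(rows, attr_idx):
        if cls <= target:
            lower |= cls  # class lies entirely inside the target set
        if cls & target:
            upper |= cls  # class overlaps the target set
    return lower, upper


car_users = {i for i, r in enumerate(rows) if r[2] == "car"}
lower, upper = approximations(rows, [0, 1], car_users)
print(sorted(lower), sorted(upper))
```

The difference between the two sets (here objects 0 and 1) is the boundary region, i.e. the objects that cannot be classified with certainty from the condition attributes alone.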

3.Rough sets-some basic concepts Ⅱ
Reduct and core
–In large data sets some attributes may be redundant, and can thus be eliminated without losing essential classificatory information.
–A reduct is a minimal subset of attributes that still provides the same object classification as the full set of attributes.
–The intersection of all reducts is called the core.
–The core is the set of all indispensable attributes.
Decision rule
–As a DM technique, one of the most important reasons for applying rough sets is the generation of decision rules.
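
A brute-force search makes the definitions of reduct and core concrete. This is a minimal sketch on a hypothetical four-object table with attributes named a, b and c; exhaustive search is only workable for a handful of attributes and is not the method used in the case studies.

```python
from itertools import combinations

ATTRS = ("a", "b", "c")
# Hypothetical toy table: condition values (a, b, c) and a decision.
TABLE = [
    ((1, 0, 0), "yes"),
    ((0, 0, 0), "no"),
    ((0, 1, 1), "yes"),
    ((1, 1, 1), "yes"),
]


def generalized_decision(subset):
    """For each object, the set of decisions found in its indiscernibility class."""
    by_key = {}
    for cond, dec in TABLE:
        key = tuple(cond[i] for i in subset)
        by_key.setdefault(key, set()).add(dec)
    return [frozenset(by_key[tuple(cond[i] for i in subset)]) for cond, _ in TABLE]


def find_reducts():
    """Enumerate minimal attribute subsets that classify like the full set."""
    full = generalized_decision(range(len(ATTRS)))
    reducts = []
    for size in range(1, len(ATTRS) + 1):
        for subset in combinations(range(len(ATTRS)), size):
            if any(set(r) <= set(subset) for r in reducts):
                continue  # contains a smaller reduct, so it is not minimal
            if generalized_decision(subset) == full:
                reducts.append(subset)
    return [tuple(ATTRS[i] for i in r) for r in reducts]


reducts = find_reducts()
core = set(ATTRS).intersection(*map(set, reducts))
print(reducts, core)
```

In this toy table both {a, b} and {a, c} are reducts, and attribute a, which appears in every reduct, forms the core.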

3.Rough sets-modelling process Ⅰ
Usually, the rough sets modelling process can be divided into five main stages.
–Data selection
–Pre-processing and transformation: the selected data set can be split into a training set and a test set, so that the decision rules in the output can be assessed in the final step.
–Creation of reducts
–Rule generation, e.g. If gender (female) and age (35-45) and purpose (shopping) then mode (car) or mode (bike)
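
The rule generation stage can be sketched as reading one rule off each indiscernibility class of the training set. The training objects and helper names below are hypothetical; the sketch only shows how an inconsistent class leads to a rule with more than one outcome, as in the example rule above.

```python
from collections import Counter, defaultdict

# Hypothetical training objects, not SAMBA records.
train = [
    {"gender": "female", "age": "35-45", "purpose": "shopping", "mode": "car"},
    {"gender": "female", "age": "35-45", "purpose": "shopping", "mode": "bike"},
    {"gender": "female", "age": "35-45", "purpose": "shopping", "mode": "car"},
    {"gender": "male", "age": "35-45", "purpose": "work", "mode": "car"},
]


def generate_rules(objects, conditions, decision):
    """One rule per indiscernibility class: condition values -> decision counts."""
    rules = defaultdict(Counter)
    for obj in objects:
        key = tuple((c, obj[c]) for c in conditions)
        rules[key][obj[decision]] += 1
    return dict(rules)


def format_rule(key, outcomes, decision):
    """Render a rule in the If ... then ... notation used on the slide."""
    lhs = " and ".join(f"{a} ({v})" for a, v in key)
    rhs = " or ".join(f"{decision} ({v})" for v in outcomes)
    return f"If {lhs} then {rhs}"


rules = generate_rules(train, ["gender", "age", "purpose"], "mode")
for key, outcomes in rules.items():
    print(format_rule(key, outcomes, "mode"), dict(outcomes))
```

The counts attached to each outcome are the support of that outcome, i.e. the number of training objects that match the rule.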

3.Rough sets-modelling process Ⅱ –Evaluation The overall performance can be evaluated by testing how well the generated decision rules classify the objects. In this paper we will only make use of the standard voting and the Naïve Bayes procedures.
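
The evaluation step can be sketched with a simplified, support-weighted voting scheme: every rule that matches a test object casts as many votes for its decision as it has supporting objects. This is an assumption about the general idea of voting classifiers, not a reproduction of the exact procedure in any particular software; the rules and test objects are hypothetical.

```python
from collections import Counter

# Hypothetical rules: (conditions, decision, support).
RULES = [
    ({"purpose": "shopping"}, "car", 5),
    ({"age": "35-45"}, "bike", 3),
    ({"gender": "female", "purpose": "shopping"}, "bike", 4),
    ({"purpose": "work"}, "train", 6),
]


def vote(obj, rules):
    """Support-weighted voting over all rules whose conditions match obj."""
    votes = Counter()
    for cond, decision, support in rules:
        if all(obj.get(a) == v for a, v in cond.items()):
            votes[decision] += support
    if not votes:
        return None, votes  # no rule fires, the object stays unclassified
    return votes.most_common(1)[0][0], votes


def accuracy(test_set, rules, decision_attr):
    """Share of test objects whose predicted decision equals the observed one."""
    hits = sum(vote(o, rules)[0] == o[decision_attr] for o in test_set)
    return hits / len(test_set)


test_set = [
    {"gender": "female", "age": "35-45", "purpose": "shopping", "mode": "bike"},
    {"gender": "male", "age": "18-34", "purpose": "shopping", "mode": "car"},
    {"gender": "male", "age": "18-34", "purpose": "leisure", "mode": "bike"},
]
print(accuracy(test_set, RULES, "mode"))
```

The third test object matches no rule and therefore counts as a misclassification in this sketch; other conventions (for example a fallback to the most frequent class) are equally possible.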

4.Data The applied data are part of a broader research project called Spatial Analysis and Modelling Based on Activities (SAMBA), funded by the Belgian Federal Government. The final aim is to build an origin-destination matrix, from which travel demand in the Belgian spatial context can be deduced.

5.Application of rough sets in the SAMBA-project Ⅰ The aim is to find out how the rough sets techniques perform with SAMBA data as input. In the first case study we will try to find patterns in spatial preferences. In the second case study the aim is to retrieve patterns in transport mode choice.

5.Application of rough sets in the SAMBA-project Ⅱ Condition variables; decision variables

5.Application of rough sets in the SAMBA-project Ⅲ Case study 1 The reducts were calculated with a genetic algorithm (GA) and with Johnson's algorithm. The GA is a heuristic for function optimization that promotes ‘survival of the fittest’; it may find more than one reduct. Johnson's algorithm has a natural bias towards finding a single prime implicant of minimal length. Based on the reduct of nine variables, over 4000 rules were generated.
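
The bias of Johnson's algorithm towards a single, short attribute set can be seen in a greedy sketch: collect, for every pair of objects with different decisions, the attributes that tell them apart, and repeatedly pick the attribute that occurs in the most remaining pairs. The table, the attribute names and the tie-breaking rule (first attribute in `ATTRS` order) are assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations

ATTRS = ("a", "b", "c")
# Hypothetical toy table: condition values (a, b, c) and a decision.
TABLE = [
    ((1, 0, 0), "yes"),
    ((0, 0, 0), "no"),
    ((0, 1, 1), "yes"),
    ((1, 1, 1), "yes"),
]


def discernibility_clauses(table):
    """Attribute sets that distinguish each pair of objects with different decisions."""
    clauses = []
    for (c1, d1), (c2, d2) in combinations(table, 2):
        if d1 != d2:
            diff = frozenset(ATTRS[i] for i in range(len(ATTRS)) if c1[i] != c2[i])
            if diff:
                clauses.append(diff)
    return clauses


def johnson_reduct(clauses):
    """Greedy selection: take the most frequent attribute until all clauses are hit."""
    remaining = list(clauses)
    chosen = []
    while remaining:
        counts = Counter(a for clause in remaining for a in clause)
        best = max(ATTRS, key=lambda a: counts[a])  # ties: first in ATTRS order
        chosen.append(best)
        remaining = [c for c in remaining if best not in c]
    return chosen


clauses = discernibility_clauses(TABLE)
print(johnson_reduct(clauses))
```

On this table the greedy search returns only one attribute set, although {a, c} would serve equally well. A search that keeps a population of candidate sets, such as a GA, can return several alternatives.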

5.Application of rough sets in the SAMBA-project Ⅳ This is a rather large number since we started with only 8500 objects. This means that most rules are supported by just one or two objects, e.g. x1(3) → x5(4)

5.Application of rough sets in the SAMBA-project Ⅴ However, the number of rules is still too high for direct human interpretation. Some additional treatment will be necessary in order to understand the relation between destination choice and the condition variables.

5.Application of rough sets in the SAMBA-project Ⅰ Case study 2 In the second case study we will try to find rules describing the transportation mode choice.

5.Application of rough sets in the SAMBA-project Ⅱ People with the same background could choose very different options, and even the same person facing the same choice options will not always choose the same transportation mode. One way to surmount this problem is to reduce the value set of the attributes. But then the classification performance could be lower, because less detailed information in the condition attributes could lead to more approximate rules and indecisiveness.
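
The trade-off of reducing the value set can be shown on a single attribute. In the sketch below the fine-grained frequency values, the recoding map and the records are all hypothetical. Merging values yields fewer condition classes, but a merged class can contain objects with different decisions, which is where approximate rules come from.

```python
from collections import defaultdict

# Hypothetical recoding of a fine-grained frequency attribute into three classes.
COARSE = {
    "every day": "daily",
    "5-6 days a week": "daily",
    "1-4 days a week": "sometimes",
    "a few times a month": "sometimes",
    "never": "never",
}

# Hypothetical records: (frequency of car use, chosen mode).
RECORDS = [
    ("every day", "car"),
    ("5-6 days a week", "bike"),
    ("1-4 days a week", "bike"),
    ("a few times a month", "bike"),
    ("never", "walk"),
]


def decision_classes(records, recode=None):
    """Map each (optionally recoded) condition value to the decisions observed for it."""
    groups = defaultdict(set)
    for value, decision in records:
        key = recode[value] if recode else value
        groups[key].add(decision)
    return dict(groups)


fine = decision_classes(RECORDS)
coarse = decision_classes(RECORDS, COARSE)
ambiguous = [k for k, d in coarse.items() if len(d) > 1]
print(len(fine), len(coarse), ambiguous)
```

Five consistent classes shrink to three, and the merged class "daily" now covers both car and bike users, so any rule built on it has two possible outcomes.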

5.Application of rough sets in the SAMBA-project Ⅲ [Table: reduced attribute value classes, e.g. daily; sometimes; never; =2km]

5.Application of rough sets in the SAMBA-project Ⅲ Although the number of variable classes was reduced, the results are quite similar: the reduct is now equal to 16 for both the uncompleted and the completed table. The number of rules is a little lower: 7669 (8077) without completion and 3535 (3740) after conducting the completion task. The classification performance is equal to the first results. So, based on less detailed information, the generated rules reached the same classification performance.

5.Application of rough sets in the SAMBA-project Ⅳ

Based on this new table, the reduct is still equal to six, but the number of rules is clearly reduced: 1408 (7669) rules without the completion task and 1212 (3535) rules with the completion task, while the classification performance is still around [value missing in source]. However, a set of over 1000 rules is not yet interpretable.

5.Application of rough sets in the SAMBA-project Ⅴ With the ROSETTA software, it is possible to filter the rules based on several criteria. Table 5 displays the outcome of the filtering procedure based on minimal support. Although the filtering improves human interpretation of the rule set, it biases the result by filtering away less supported, but not necessarily less important, categories.
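
The bias introduced by support-based filtering can be made visible with a few lines of code. This is a generic sketch, not the ROSETTA filtering interface: the rule list, the threshold of 10 and the function names are hypothetical.

```python
# Hypothetical rules: (conditions as text, decision, support).
RULES = [
    ("purpose (work) and distance (>2km)", "car", 120),
    ("purpose (shopping)", "car", 45),
    ("purpose (school)", "bike", 12),
    ("purpose (leisure) and age (18-34)", "train", 2),
    ("purpose (work) and distance (<2km)", "walk", 1),
]


def filter_by_support(rules, min_support):
    """Keep only the rules supported by at least min_support objects."""
    return [r for r in rules if r[2] >= min_support]


def decisions_covered(rules):
    """Decision categories that still appear on the right-hand side of some rule."""
    return {decision for _, decision, _ in rules}


kept = filter_by_support(RULES, min_support=10)
lost = decisions_covered(RULES) - decisions_covered(kept)
print(len(kept), sorted(lost))
```

With this threshold three rules survive, but the categories train and walk disappear from the rule set altogether, so no object can be classified into them any more.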

6.Conclusions and further challenges We found a way to temper the number of rules by filtering the rule set based on rule support. –It also means a loss in predictive ability. A major challenge is finding a balance between producing clear, comprehensible rule sets and maintaining maximum detail for better pattern prediction.