Examining Activity Patterns Using Fuzzy Clustering by D De Silva, University of Calgary JD Hunt, University of Calgary PROCESSUS Second International Colloquium.

Presentation transcript:

Examining Activity Patterns Using Fuzzy Clustering by D De Silva, University of Calgary JD Hunt, University of Calgary PROCESSUS Second International Colloquium Toronto ON, Canada June 2005

Overview: Introduction, Data, Method, Preliminary Results, Conclusions

Introduction. Context: activity-based transport models are increasingly common; they need people grouped into segments; at present this grouping seems largely based on received wisdom. Motivations: an opportunity in Calgary (a large Household Activity Diary Survey, interest in activity-based model development, and willingness to explore the issue of grouping); and a wish to increase understanding of the activity patterns that result from behavioral processes.

Introduction: Previous work. A fair amount of work, drawing in essence on three basic elements: data interpretation; similarity or dissimilarity measures; pattern recognition algorithms.

Introduction: Previous work (contd.). Data interpretation: some used time slices of 5 to 15 minutes (Recker et al; Wilson); others disagreed and used the number of stops made (Pas). Similarity or dissimilarity measures: similarity matrix (Pas; Wilson; Ma); Sequential Alignment Method (Wilson; Jun Ma); Walsh-Hadamard transformation, a Fourier-type analysis (Recker et al). Pattern recognition algorithms: all have used crisp clustering methods.

Introduction: Previous work (contd.). Groups with similar activities: Pas, 12 groups based on the number of non-home stops; Recker, 7 groups based on socio-economic data; Wilson, 8 groups similar to Recker's. Applications: modeling inter-shopping duration (Bhat); microsimulation of activity patterns (Kitamura et al; Kulkarni et al). Extension (the work described here): time slices, Sequential Alignment Method, fuzzy clustering.

Data: Household Activity Survey (HAS). 24-hour diary, fall of 2001. Sample size: 8,400 households overall, 5,900 on weekdays. Activity and location recorded in 15-minute intervals. Activities in 19 categories. Locations: X,Y coordinates; Home, Work, Travel, Other. All household members.

Activities Covered in HAS: Travel (A); Pick Up Someone (B); Drop Off Someone (C); Work (D); School / Homework (E); Shopping (F); Daycare (G); Social (H); Eating (J); Entertainment / Leisure (K); Medical / Financial (L); Exercise (M); Religious / Civic (N); Sleeping (O); Household Chores (P); Park / Un-park Vehicle (X); Work-Travel, e.g. Taxi Driver (Y); Out-of-Town (Z)

Example Sequence: an activity sequence of 30 min Sleep, 15 min Eat, 30 min Travel, 1 hr Work is coded in 15-minute slices as O O J A A D D D D
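The coding step in this example can be sketched in Python. This is only an illustration: the function name `encode_diary` and the episode format are assumptions, and only the four activity codes needed for the example are included.

```python
SLOT_MINUTES = 15  # the HAS diary uses 15-minute intervals

# Subset of the HAS activity codes listed on the previous slide.
ACTIVITY_CODES = {"Travel": "A", "Work": "D", "Eating": "J", "Sleeping": "O"}


def encode_diary(episodes):
    """Turn (activity, minutes) episodes into a string of one code per time slice.

    Hypothetical helper: the episode format is assumed for illustration.
    """
    codes = []
    for activity, minutes in episodes:
        if minutes % SLOT_MINUTES != 0:
            raise ValueError("durations must be whole 15-minute slices")
        # Repeat the activity letter once per 15-minute slice.
        codes.extend(ACTIVITY_CODES[activity] * (minutes // SLOT_MINUTES))
    return " ".join(codes)


example = [("Sleeping", 30), ("Eating", 15), ("Travel", 30), ("Work", 60)]
print(encode_diary(example))
```

A full 24-hour diary coded this way would give 96 characters per person.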

Initial Sample for Testing (covered in this presentation): 75 persons in 50 households; activity type and weekdays only (not location or weekends). Later consider: the full sample; weekends and weekdays; location types as a further dimension.

Method (flow diagram): Data Set (Time Slices) → Sequential Alignment Method (ClustalG software) → Dissimilarity Matrix → Fuzzy Clustering (S-Plus software) → Fuzzy Cluster Memberships → Cluster Center Interpretation (fuzzy weighted frequency distributions of socio-economic variables) → Groups of Similar Activity Patterns

Sequential Alignment Method (SAM). Alignment methods were first used in molecular biology for DNA matching. Activity-travel patterns are intrinsically sequential. SAM evaluates sequences of characters: global alignment (whole sequence) or local alignment (a short sequence within the entire sequence). The simplest case is pairwise alignment.

Sequential Alignment Method: Pairwise Alignment. Two character sequences: ID 1: O O J A A D D D D; ID 2: O O O J A D D D O. Elementary operations are applied until the sequences are equal: insertions and deletions (indels), which introduce gaps; gap insertion and extension carry penalties. Global alignment uses the Needleman-Wunsch algorithm, minimizing the distance or maximizing the similarity. Aligned: ID 1: - O O J A A D D D D -; ID 2: O O O J A - D D D - O. Similarity score = 70. Fewer operations means a more similar pair.
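A minimal sketch of Needleman-Wunsch global alignment for the two sequences on this slide. The scoring values (match +1, mismatch -1, a single linear gap penalty of -1) are assumptions chosen for simplicity; they are not ClustalG's scoring scheme, so the score will not reproduce the similarity score of 70 quoted above.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two characters
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, row1, row2 = needleman_wunsch("OOJAADDDD", "OOOJADDDO")
print(total)
print(row1)
print(row2)
```

Several alignments can share the best score, so the traceback may return a different gap placement from the one shown on the slide.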

Sequential Alignment Method: Gap Opening and Extension Penalties. Role of the gap penalty. High value: the alignment is compressed, literally to matches, avoiding gapping; aligned pairs resemble the main activities at their relative times; recommended values 8 and 3 (Wilson). Low value: identifies similar activities displaced during the day; better pairwise comparison; little similarity to the actual activity pattern; recommended values 1 and 0.1 (Wilson). Tested and accepted the low-value recommendation for transportation research (Wilson).
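To see why the two settings behave so differently, the cost of a single gap can be compared under each. The sketch below assumes one common affine convention (the opening penalty covers the first gap position and each further position adds the extension penalty); ClustalG's exact formula may differ, and `gap_cost` is a hypothetical helper.

```python
def gap_cost(length, opening, extension):
    """Cost of one gap of `length` positions under an assumed affine convention."""
    if length <= 0:
        return 0.0
    return opening + extension * (length - 1)


# A one-hour displacement is 4 slices of 15 minutes.
high = gap_cost(4, opening=8, extension=3)    # high setting from the slide
low = gap_cost(4, opening=1, extension=0.1)   # low setting from the slide
print(high, low)
```

Under this convention the high setting makes a four-slice gap more than ten times as costly as the low setting, which is consistent with the slide's point that low penalties let similar activities match even when they are displaced during the day.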

Sequential Alignment Method: Multiple Alignment. Extension of pairwise alignment to N dimensions. The computation required becomes enormous beyond about 10 sequences of reasonable length. An approximation method based on the pairwise alignment data is used instead. Use of ClustalG software (Wilson).

Output is a Dissimilarity Matrix
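The shape of this output can be illustrated with a small stand-in. The sketch below does not reproduce ClustalG: it swaps in a plain edit distance (insertions, deletions and substitutions each costing 1), normalized by the longer sequence length, purely to show a symmetric matrix with a zero diagonal. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def dissimilarity_matrix(sequences):
    """Symmetric matrix of normalized edit distances in [0, 1]."""
    n = len(sequences)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            longest = max(len(sequences[i]), len(sequences[j]), 1)
            d[i][j] = d[j][i] = edit_distance(sequences[i], sequences[j]) / longest
    return d


for row in dissimilarity_matrix(["OOJAADDDD", "OOOJADDDO", "OOOOOOOOO"]):
    print([round(value, 2) for value in row])
```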

Fuzzy Clustering. A partition clustering method. The number of clusters, k, is specified in advance. The objects (activity patterns) are not assigned to one particular cluster; each is assigned a membership between 0 and 1 in every cluster. Uses S-Plus software (Kaufman procedure). The dissimilarity matrix is the input.

Fuzzy Clustering Minimize Objective Function (Kaufman)
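The formula on this slide did not survive in the transcript. For reference, the fuzzy clustering objective in Kaufman and Rousseeuw's procedure (FANNY), which the S-Plus routine is understood to follow, is usually written as below. The symbols are assumptions about notation: u_{iv} is the membership of object i in cluster v, d(i,j) is the dissimilarity between objects i and j, n is the number of objects and k the number of clusters.

```latex
\min_{u}\; C \;=\; \sum_{v=1}^{k}
  \frac{\displaystyle\sum_{i=1}^{n}\sum_{j=1}^{n} u_{iv}^{2}\, u_{jv}^{2}\, d(i,j)}
       {\displaystyle 2\sum_{j=1}^{n} u_{jv}^{2}}
\qquad \text{subject to} \qquad
u_{iv} \ge 0, \quad \sum_{v=1}^{k} u_{iv} = 1 \;\; \text{for every } i .
```

Only dissimilarities enter the objective, which is why the dissimilarity matrix from the alignment step can be used directly as input.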

Fuzzy Clustering: Number of clusters? An open question, to be determined as part of the research. Two quality indices from S-Plus: Dunn's coefficient; Average Silhouette Value with Shadow plot.

Fuzzy Clustering: Dunn's Coefficient F_k, where F_k always lies in the range [1/k, 1]. F_k = 1/k: entirely fuzzy clustering. F_k = 1: crisp clustering.
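Dunn's partition coefficient is commonly defined as the sum of squared memberships divided by the number of objects, F_k = (1/n) Σ_i Σ_v u_iv². A short Python sketch under that definition, with a hypothetical function name and made-up membership rows, shows the two extremes noted on the slide.

```python
def dunn_partition_coefficient(memberships):
    """Sum of squared memberships divided by the number of objects.

    memberships: one row per object, one column per cluster, rows summing to 1.
    """
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n


crisp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # every object fully in one cluster
fuzzy = [[1 / 3, 1 / 3, 1 / 3]] * 2          # equal membership in all three clusters
print(dunn_partition_coefficient(crisp))     # upper bound, 1
print(dunn_partition_coefficient(fuzzy))     # lower bound, 1/k
```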

Fuzzy Clustering: Average Silhouette Value (ASV) with Shadow plot. Measures the strength of classification to the nearest crisp cluster compared with the next-best cluster. Width of bar: 1, well classified; 0, between two clusters; below 0, badly classified (lies nearer the next-best cluster). The average value gives an approximation to the best number of clusters. ASV must be higher than 0.25.
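The silhouette width can be computed directly from a dissimilarity matrix once each object is assigned to its nearest crisp cluster (for fuzzy output, the cluster with the highest membership). The sketch below uses the standard definition s(i) = (b - a) / max(a, b), where a is the mean dissimilarity to the object's own cluster and b the smallest mean dissimilarity to any other cluster. The function name and the toy matrix are assumptions for illustration.

```python
def silhouette_widths(d, labels):
    """Silhouette width of each object from dissimilarity matrix d and crisp labels."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouettes need at least two clusters")
    n = len(labels)
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # usual convention for a single-object cluster
            continue
        a = sum(d[i][j] for j in own) / len(own)
        b = min(
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return widths


# Toy example: objects 0-1 are close, objects 2-3 are close, the pairs are far apart.
d = [[0.0, 0.1, 1.0, 1.0],
     [0.1, 0.0, 1.0, 1.0],
     [1.0, 1.0, 0.0, 0.1],
     [1.0, 1.0, 0.1, 0.0]]
widths = silhouette_widths(d, [0, 0, 1, 1])
print(sum(widths) / len(widths))  # average silhouette value
```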

Cluster Center Interpretation. Distributions of socio-economic variables form the basis for grouping in subsequent modeling. Person characteristics: age; gender; person type category from the survey; employment status. Household characteristics, attributed to persons: only income so far; household structure later. Fuzzy weighted frequency distributions. Need for an eventual crisp assignment: potentially use logit to assign cluster membership values; calibrate 'utility functions' for clusters with person characteristics; use Monte Carlo to select a specific cluster in each case.
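The proposed logit-plus-Monte-Carlo step could look roughly like the sketch below. It assumes a multinomial logit form; the utility values are made up for illustration, and the function names are hypothetical. Calibrating the utility functions is not shown.

```python
import math
import random


def logit_probabilities(utilities):
    """Multinomial logit: probability of each cluster given its utility."""
    top = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def draw_cluster(probabilities, rng):
    """Monte Carlo selection of one cluster index from the probabilities."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index
    return len(probabilities) - 1  # guard against rounding at the upper end


# Hypothetical utilities of three clusters for one person.
probs = logit_probabilities([0.2, 1.1, -0.4])
print(probs)
print(draw_cluster(probs, random.Random(2005)))
```

The logit probabilities play the role of membership values, and the random draw supplies the crisp cluster for each simulated person.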

Cluster Center Interpretation: Fuzzy Weighted Frequency Distributions. The bar for a category in a cluster's histogram is the percentage sum of people in that category across the entire sample, factored by cluster membership.
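One reading of this definition: each person contributes to a category's bar in proportion to their membership in the cluster, and the bars of one cluster are scaled to sum to 100 percent. The sketch below follows that reading; the normalization is an assumption, as are the function name and the toy data.

```python
def fuzzy_weighted_distribution(categories, memberships, cluster):
    """Percentage per category for one cluster, weighting people by membership.

    categories: category of each person (e.g. gender).
    memberships: one row of cluster memberships per person.
    """
    totals = {}
    for category, row in zip(categories, memberships):
        totals[category] = totals.get(category, 0.0) + row[cluster]
    weight_sum = sum(totals.values())
    if weight_sum == 0:
        return {category: 0.0 for category in totals}
    return {category: 100.0 * w / weight_sum for category, w in totals.items()}


# Toy data: three people, two clusters.
gender = ["M", "F", "F"]
u = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]
print(fuzzy_weighted_distribution(gender, u, cluster=0))
print(fuzzy_weighted_distribution(gender, u, cluster=1))
```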

Results: Sequential Alignment, Low vs High Gap Penalty. Cluster plots for 3 clusters: low gap and high gap.

Results: use the low gap penalty, consistent with the recommendation (1 and 0.1). Shadow plots: low gap and high gap. Coefficients (low gap, high gap): Dunn's coefficient (values not shown); Average Silhouette Value 0.4, 0.3.

Results: Number of Clusters. The Clustal plot helps to show the potential range for the number of clusters.

Results Number of Clusters Potential range 2 to 5

Results: Number of Clusters (k). k = 2: F_k = 0.60, ASV = 0.42

Results: Number of Clusters (k). k = 3: F_k = 0.43, ASV = 0.40

Results: Number of Clusters (k). k = 4: F_k = 0.34, ASV = 0.32

Results: Number of Clusters (k). k = 5: F_k = 0.28, ASV = 0.20

Results: Number of Clusters (k)? Use 3 clusters for testing; expect a different number for the total sample. Summary (F_k, ASV): 2 clusters: 0.60, 0.42; 3 clusters: 0.43, 0.40; 4 clusters: 0.34, 0.32; 5 clusters: 0.28, 0.20.

Fuzzy Cluster Memberships: output of S-Plus software. HH2701 has almost equal memberships in all three clusters.

Results Fuzzy weighted frequency Distribution

Results Cluster Interpretation Crisp presentation

Results: Cluster Interpretation (each cluster tends to contain more of the following). Cluster 1: students; ages 5 to 15; mainly KEJS and youths. Cluster 2: females; seniors and other adults in age range; retired, homemakers and volunteers. Cluster 3: males; 100% adult workers; ages in the 40s; majority adult workers not needing a car to work. Expect different results for the total sample.

Conclusions. The method seems to work well in identifying the clusters as intended, with no hurdles. Fuzzy clustering better indicates strength of membership. It is best to have multiple measures of clustering "quality" when choosing the number of clusters. Still a work in progress: results are not complete and are shown just as an example, but the essential elements of the analysis process are set.

Conclusions: Future Work. Proceed to the full sample of 8,400 households, including weekends. Expand to the location dimension. Calibrate a logit model for allocation to clusters. Consider household structure.

Thank You ?