STATISTICAL ANALYSIS FOR ORIGIN-DESTINATION MATRICES OF TRANSPORT NETWORKS
Baibing Li
Business School, Loughborough University, Loughborough, LE11 3TU

Overview
- Background
- Statement of the problem
- Existing methods
- Bayesian analysis via the EM algorithm
- A numerical example
- Conclusions

Background

Example: a study area in Northwest Washington, DC
- Bounded by Loughboro Road in the north; Canal Road and MacArthur Boulevard in the west; and Foxhall Road in the east
- Canal Road is a principal arterial, two lanes wide, generally running northwest-southeast
- Foxhall Road is a two-way, two-lane minor arterial running north-south through the study area
- Loughboro Road is a two-way east-west road

Background

What is a transport network?
- A transport network consists of nodes and directed links
- An origin (destination) is a node from (to) which traffic flows start (travel)
- A path is defined to be a sequence of nodes connected in one direction by links

Background

Origin-destination (O-D) matrices
- An O-D matrix consists of traffic counts from all origins to all destinations
- It describes the basic pattern of demand across a network
- It provides fundamental information for transport management

Background

Methods of obtaining O-D data
- Roadside interviews and roadside mailback questionnaires: disrupt traffic flow; unpopular with drivers and highway authorities
- Registration plate matching: very susceptible to error (e.g. a vehicle passing two observation points has its plate incorrectly recorded at one of them)
- Vantage-point observers or video: for small study areas (e.g. to determine the pattern of flows through a complex intersection)
- Traffic counts: much cheaper than surveys; much smaller observation errors

Statement of the problem
- Aim: inference about O-D matrices
- Available data: traffic counts. A relatively inexpensive approach is to collect a single observation of traffic counts on a specific set of network links over a given period.

Notation
- $y = [y_1, \ldots, y_c]^T$ is the vector of traffic counts on all feasible paths (ordered in some arbitrary fashion)
- $x = [x_1, \ldots, x_m]^T$ is the vector of observed traffic counts on the monitored links
- $z = [z_1, \ldots, z_n]^T$ is the vector of O-D traffic counts
- $A$ is the $m \times c$ path-link incidence matrix for the monitored links only, whose $(i, j)$th element is 1 if link $i$ forms part of path $j$, and 0 otherwise
- $B$ is the $n \times c$ matrix whose $(i, j)$th element is 1 if path $j$ connects O-D pair $i$, and 0 otherwise

Statement of the problem

Statistical model (I)
$$x = Ay, \qquad z = By$$
Assume that $y_1, \ldots, y_c$ are unobserved independent Poisson random variables with means $\theta_1, \ldots, \theta_c$ respectively, i.e. $y_i \sim \text{Poisson}(\theta_i)$; denote $\theta = [\theta_1, \ldots, \theta_c]^T$. The vector $x$ then has a multivariate Poisson distribution with mean $A\theta$.

Statement of the problem

[Figure: a small network with one monitored link carrying traffic count $x$ and path flows $y_{123}$, $y_{43}$, $y_{423}$]
$$x = y_{123} + y_{423}, \qquad z_{43} = y_{43} + y_{423}$$
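To make Model (I) concrete, here is a minimal simulation sketch of the small example above, assuming the path ordering (123, 43, 423), a single monitored link, and hypothetical mean path flows; none of the numbers come from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Paths ordered as (123, 43, 423); one monitored link, used by paths 123 and 423.
A = np.array([[1, 0, 1]])            # m x c path-link incidence (monitored links)
B = np.array([[1, 0, 0],             # O-D pair 1-3 is served by path 123
              [0, 1, 1]])            # O-D pair 4-3 is served by paths 43 and 423
theta = np.array([10.0, 5.0, 3.0])   # hypothetical mean path flows

y = rng.poisson(theta)               # unobserved path counts, y_i ~ Poisson(theta_i)
x = A @ y                            # observed link count:  x = y123 + y423
z = B @ y                            # O-D counts: z13 = y123, z43 = y43 + y423
```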

Statement of the problem

Statistical model (II)
$$x = Pz$$
$P^* = [p_{ij}]$ is a proportional assignment matrix, where $p_{ij}$ is the proportion of the traffic for O-D pair $j$ that uses link $i$ (assumed to be available); $P$ is the sub-matrix of $P^*$ formed by selecting the rows associated with $x$. A common assumption is that the O-D counts $z_j$ are independent Poisson variates, so that $x$ is a vector of linear combinations of Poisson variates with mean $P\lambda$, where $\lambda$ is the mean of $z$.

Statement of the problem

[Figure: the same network; the monitored link carries the path flows $y_{123}$ and $y_{423}$]
If $y_{423} = 0.3\,z_{43}$, and noting that $y_{123} = z_{13}$, then $x = 1.0\,z_{13} + 0.3\,z_{43}$.
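A corresponding sketch for Model (II), again with hypothetical numbers: the assignment proportions are taken from the example above (all of $z_{13}$ and 30% of $z_{43}$ use the monitored link).

```python
import numpy as np

rng = np.random.default_rng(1)

lam = np.array([50.0, 40.0])     # hypothetical O-D means for (z13, z43)
P = np.array([[1.0, 0.3]])       # proportions of z13, z43 assigned to the link

z = rng.poisson(lam)             # independent Poisson O-D counts
x = P @ z                        # link count; E[x] = P @ lam = 50 + 0.3 * 40
```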

Statement of the problem

Relationship between Model (I) and Model (II)
Assumptions: the O-D traffic counts $z_j$ are independent Poisson random variables with means $\lambda_j$. If $y_j = [y_{jk}]$ is the vector of route flows and $p_j = [p_{jk}]$ the vector of route probabilities for O-D pair $j$, then, conditional on the total number of O-D trips, $y_j \sim \text{multinomial}(z_j, p_j)$.
Conclusion: the distributions of the $y_{jk}$ are Poisson with parameters $\theta_{jk} = \lambda_j p_{jk}$.
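A quick Monte Carlo check of this Poisson-multinomial ("thinning") relationship, with a hypothetical $\lambda_j = 100$ and route probabilities $(0.7, 0.3)$: the thinned route flow should again look Poisson, with mean and variance both equal to $\lambda_j p_{jk}$.

```python
import numpy as np

rng = np.random.default_rng(2)
lam_j = 100.0
p_j = np.array([0.7, 0.3])               # hypothetical route probabilities

z_j = rng.poisson(lam_j, size=100_000)   # O-D counts for pair j
y_j1 = rng.binomial(z_j, p_j[0])         # route 1 flow: y_j | z_j ~ multinomial

# A Poisson variable has equal mean and variance; both should be near 70.
print(y_j1.mean(), y_j1.var())
```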

Statement of the problem

Major research challenges
- A highly underspecified problem: inference about an O-D matrix from a single observation
- An analytically intractable likelihood

Statement of the problem

Example of multivariate Poisson distributions
Let $Y_1$, $Y_2$ and $Y_3$ be three independent Poisson variates, $Y_i \sim \text{Poisson}(\lambda_i)$. Define $X_1 = Y_1 + Y_3$ and $X_2 = Y_2 + Y_3$. The joint distribution of $X_1$ and $X_2$ is a multivariate Poisson distribution:
$$\Pr(X_1 = x_1, X_2 = x_2) = e^{-(\lambda_1 + \lambda_2 + \lambda_3)} \sum_{k=0}^{\min(x_1, x_2)} \frac{\lambda_1^{x_1 - k}\,\lambda_2^{x_2 - k}\,\lambda_3^{k}}{(x_1 - k)!\,(x_2 - k)!\,k!}$$
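A Monte Carlo sanity check of this construction (the rates below are illustrative, not from the talk): the shared component $Y_3$ makes $X_1$ and $X_2$ dependent, with $\text{Cov}(X_1, X_2) = \lambda_3$.

```python
import numpy as np

rng = np.random.default_rng(3)
lam = np.array([2.0, 3.0, 1.5])          # illustrative rates for Y1, Y2, Y3

Y = rng.poisson(lam, size=(200_000, 3))
X1, X2 = Y[:, 0] + Y[:, 2], Y[:, 1] + Y[:, 2]

print(np.cov(X1, X2)[0, 1])              # approx lam[2] = 1.5
```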

Previous research

Maximum entropy method (Van Zuylen and Willumsen, 1980) --- dealing with the issue of under-specification
- Maximise entropy, subject to the observation equations
- Adds as little information as possible to the knowledge contained in the observation equations

Previous research

Using normal approximations (Hazelton, 2001) --- dealing with the intractability of multivariate Poisson distributions
To circumvent the problem, Hazelton (2001) considered the following multivariate normal approximation for the distribution of $y$:
$$y \;\dot\sim\; \text{MVN}(\theta, \Sigma), \qquad \Sigma = \text{diag}(\theta)$$
Since $x = Ay$, we obtain
$$x \;\dot\sim\; \text{MVN}(A\theta, A\Sigma A^T)$$
Note that the covariance matrix $\Sigma$ depends on $\theta$.
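In code, this approximation amounts to propagating the mean and the $\theta$-dependent covariance through the linear map $A$; a minimal sketch, with hypothetical values:

```python
import numpy as np

A = np.array([[1, 0, 1],
              [0, 1, 1]])
theta = np.array([10.0, 5.0, 3.0])   # hypothetical mean path flows

# y ~ N(theta, diag(theta)) approximately (a Poisson's variance equals its mean),
mean_x = A @ theta                   # so x = A y has approximate mean A theta
cov_x = A @ np.diag(theta) @ A.T     # and approximate covariance A diag(theta) A^T
```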

Bayesian analysis + EM algorithm

Basic idea --- dealing with the issue of intractability
Instead of an analysis on the basis of the observed traffic counts $x$, the inference is drawn based on the unobserved $y$.
- Incomplete data: the observed network link traffic counts $x$ (observable)
  - Follow a multivariate Poisson distribution --- analytically intractable
- Complete data: the traffic counts on all feasible paths, $y$ (unobservable)
  - Follow univariate Poisson distributions --- analytically tractable

Bayesian analysis + EM algorithm

Basic idea --- dealing with the issue of under-specification
Bayesian analysis combines two sources of information:
- Prior knowledge, e.g. an obsolete O-D matrix, or a non-informative prior in the case of no prior information
- The current observation of traffic flows

Bayesian analysis

Complete-data Bayesian inference
- Complete-data likelihood $P(y \mid \theta)$: the joint distribution of $y$ is $\prod_j \text{Poisson}(y_j \mid \theta_j)$
- Incorporate a natural conjugate prior $\pi(\theta)$: $\theta_j \sim \text{Gamma}(\alpha_j, \beta_j)$
- The result is the posterior density $P(\theta \mid y)$: $\theta_j \sim \text{Gamma}(a_j, b_j)$ with $a_j = \alpha_j + y_j$ and $b_j = \beta_j + 1$
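The conjugate update is one line per parameter; a minimal sketch, with hypothetical prior parameters and complete data:

```python
import numpy as np

def complete_data_posterior(alpha, beta, y):
    """Gamma(alpha_j, beta_j) prior with a Poisson(theta_j) likelihood for y_j
    gives a Gamma(alpha_j + y_j, beta_j + 1) posterior for each theta_j."""
    return alpha + y, beta + 1.0

alpha = np.array([8.0, 6.0, 4.0])   # hypothetical prior shapes
beta = np.ones(3)                   # prior rates, beta_j = 1
y = np.array([10, 5, 3])            # hypothetical complete data (path counts)

a, b = complete_data_posterior(alpha, beta, y)
map_theta = (a - 1) / b             # posterior mode, reused later in the M-step
```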

The EM algorithm

Posterior density
- Prior density $\pi(\theta)$
- Complete-data likelihood $P(y \mid \theta) = P(x \mid \theta)\,P(y \mid x, \theta)$
- Complete-data posterior density $P(\theta \mid y) \propto P(y \mid \theta)\,\pi(\theta)$

E-step: average over the conditional distribution of $y$ given $(x, \theta^{(t)})$:
$$E\{\log P(\theta \mid y) \mid x, \theta^{(t)}\} = l(\theta \mid x) + E\{\log P(y \mid x, \theta) \mid x, \theta^{(t)}\} + \log \pi(\theta) + c$$
M-step: choose the next iterate $\theta^{(t+1)}$ to maximise $E\{\log P(\theta \mid y) \mid x, \theta^{(t)}\}$.
Each iteration increases the observed-data log posterior, so $\{\theta^{(t)}\}$ converges.

The EM algorithm

Bayesian inference via the EM algorithm
- M-step: the a posteriori most probable estimate of $\theta_j$ is $(\alpha_j + y_j - 1)/(\beta_j + 1)$
- E-step: replace the unobservable data $y_j$ by its conditional expectation at the $t$-th iteration, giving $\theta_j^{(t+1)} = (\alpha_j + E\{y_j \mid x, \theta^{(t)}\} - 1)/(\beta_j + 1)$

Conditional expectation

Calculation of the conditional expectation
Theorem. Suppose that $\{y_j\}$ are independent Poisson random variables with means $\{\theta_j\}$ $(j = 1, \ldots, c)$, and $A = [A_1, \ldots, A_c]$ is an $m \times c$ matrix with $A_j$ the $j$th column of $A$. Then for a given $m \times 1$ vector $x$, we have
$$E\{y_j \mid x, \theta^{(t)}\} = \theta_j^{(t)}\,\frac{\Pr(Ay = x - A_j)}{\Pr(Ay = x)}$$
Major advantage: this guarantees positivity.
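Putting the pieces together, here is a brute-force sketch of the EM iteration for a toy network. It evaluates $\Pr(Ay = x)$ by enumerating all feasible $y$, which is only viable when the network is tiny and every path crosses at least one monitored link; the matrices and priors are hypothetical, and the enumeration stands in for the more efficient computation developed in Li (2005).

```python
import numpy as np
from itertools import product
from scipy.stats import poisson

def prob_Ay_eq(A, x, theta):
    """Pr(Ay = x) for independent y_j ~ Poisson(theta_j), by enumeration.
    Assumes a 0/1 matrix A in which every column has at least one 1, so each
    y_j is bounded by the smallest observed count on the links path j uses."""
    if np.any(x < 0):
        return 0.0
    bounds = [int(min(x[A[:, j] == 1])) for j in range(A.shape[1])]
    total = 0.0
    for y in product(*(range(b + 1) for b in bounds)):
        if np.array_equal(A @ np.array(y), x):
            total += poisson.pmf(np.array(y), theta).prod()
    return total

def e_step(A, x, theta):
    """E{y_j | x, theta} = theta_j * Pr(Ay = x - A_j) / Pr(Ay = x)."""
    denom = prob_Ay_eq(A, x, theta)
    return np.array([theta[j] * prob_Ay_eq(A, x - A[:, j], theta) / denom
                     for j in range(A.shape[1])])

def em_map(A, x, alpha, beta, n_iter=50):
    """MAP estimate of theta via EM (assumes alpha_j + E{y_j | x} >= 1)."""
    theta = alpha / beta                        # start from the prior mean
    for _ in range(n_iter):
        ey = e_step(A, x, theta)                # E-step
        theta = (alpha + ey - 1) / (beta + 1)   # M-step: posterior mode
    return theta

# Toy usage: the bivariate example, with hypothetical priors and counts.
A = np.array([[1, 0, 1],
              [0, 1, 1]])
print(em_map(A, x=np.array([12, 8]), alpha=np.array([8.0, 6.0, 4.0]),
             beta=np.ones(3)))
```

On realistic networks this enumeration is exponential in the number of paths; the theorem's practical value here is that the ratio form keeps every iterate strictly positive.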

Estimation, prediction and reconstruction

Hazelton (2001) investigated some fundamental issues and clarified some confusion in the inference for O-D matrices. He clearly defines the following concepts:
- Estimation: the aim is to estimate the expected number of O-D trips
- Prediction: the aim is to estimate future O-D traffic flows
- Reconstruction: the aim is to estimate the actual number of trips between each O-D pair that occurred during the observational period

Prediction

For future traffic counts $\tilde{y}$, the complete-data posterior predictive distribution is
$$p(\tilde{y}_j \mid y) = \int \text{Poisson}(\tilde{y}_j \mid \theta_j)\,\text{Gamma}(\theta_j \mid a_j, b_j)\,d\theta_j$$
The complete-data marginal posterior predictive distributions are negative binomial, with parameters $a_j$ and $b_j/(b_j + 1)$. The mode of the marginal posterior predictive distribution is at $\lfloor (a_j - 1)/b_j \rfloor$. Given the incomplete data $x$, the prediction is obtained by replacing $y_j$ in $a_j = \alpha_j + y_j$ with its conditional expectation $E\{y_j \mid x, \theta^{(t)}\}$.
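Since the Gamma-Poisson mixture is negative binomial, the complete-data predictive can be handled with standard library routines; a sketch with hypothetical posterior parameters $a_j = 12$, $b_j = 2$:

```python
from scipy.stats import nbinom

a, b = 12.0, 2.0                 # hypothetical posterior shape and rate
pred = nbinom(a, b / (b + 1.0))  # Gamma(a, b)-Poisson mixture = negative binomial

print(pred.mean())               # a / b = 6.0
print(int((a - 1) // b))         # predictive mode, floor((a - 1) / b) = 5
```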

Reconstruction

The marginal distributions of the $y_j$ are negative binomial with parameters $\alpha_j$ and $\beta_j/(\beta_j + 1)$; denote the corresponding probability mass functions by $f_j(\cdot)$. For a given observation $x$, the reconstructed traffic counts can be calculated as the a posteriori most probable vector $y$, i.e. the solution to the maximization problem
$$\max_y \prod_j f_j(y_j) \quad \text{subject to} \quad Ay = x$$
Solving this problem yields the reconstructed traffic counts.
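A brute-force version of the reconstruction step for a toy network, reusing the enumeration idea from the EM sketch above; the prior parameters are hypothetical, and the marginal pmfs $f_j$ are the negative binomials just stated.

```python
import numpy as np
from itertools import product
from scipy.stats import nbinom

def reconstruct(A, x, alpha, beta):
    """A posteriori most probable integer path flows y subject to Ay = x.
    Marginally y_j ~ NB(alpha_j, beta_j/(beta_j + 1)); the enumeration assumes
    a 0/1 matrix A in which every column has at least one 1."""
    p = beta / (beta + 1.0)
    bounds = [int(min(x[A[:, j] == 1])) for j in range(A.shape[1])]
    best, best_logp = None, -np.inf
    for y in product(*(range(b + 1) for b in bounds)):
        y = np.array(y)
        if not np.array_equal(A @ y, x):
            continue
        logp = nbinom.logpmf(y, alpha, p).sum()
        if logp > best_logp:
            best, best_logp = y, logp
    return best

A = np.array([[1, 0, 1],
              [0, 1, 1]])
print(reconstruct(A, x=np.array([12, 8]),
                  alpha=np.array([8.0, 6.0, 4.0]), beta=np.ones(3)))
```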

A numerical example

A numerical example

[Table A1. Prior estimates of origin-destination counts (rows: origins; columns: destinations)]

A numerical example

[Table A2. True values of origin-destination counts (rows: origins; columns: destinations)]

A numerical example

- Prior distributions: Gamma distributions with the parameters $\alpha_j$ taken to be the prior estimates in Table A1 and $\beta_j = 1$
- Simulated data: the unobservable vector of traffic counts $y$ was simulated as outcomes of independent Poisson variables with the means displayed in Table A2
- Monitored links: traffic counts are assumed to be available on $m = 8$ of the links, i.e. links 1, 2, 5, 6, 7, 8, 11 and 12. A single observation $x = Ay$ was simulated:
$$x = [884, 548, 111, 133, 191, 144, 214, 640]^T$$

A numerical example

Repeated experiments
- The simulation experiment was repeated 500 times
- The quality of the prior information was varied by adjusting the parameters of the prior distributions $\Gamma(\alpha_j; \beta_j)$ over the values 1, 2, 5, 10, 20 and 50, where $\lambda_j^*$ denotes the 'true' parameter values in Table A2 and $\lambda_{j0}$ the prior values in Table A1

Conclusions
- Bayesian analysis
  - Challenge: a highly underspecified problem for inference about an O-D matrix from a single observation
  - Solution: Bayesian analysis combining the prior information with the current observation
- The EM algorithm
  - Challenge: an analytically intractable likelihood for the observed data
  - Solution: the EM algorithm works with the unobservable complete data, whose likelihood is analytically tractable

References

Hazelton, M. L. (2001). Inference for origin-destination matrices: estimation, prediction and reconstruction. Transportation Research Part B, 35.

Li, B. (2005). Bayesian inference for origin-destination matrices of transport networks using the EM algorithm. Technometrics, 47.

Van Zuylen, H. J. and Willumsen, L. G. (1980). The most likely trip matrix estimated from traffic counts. Transportation Research Part B, 14.