Spring 2013

Crash frequency is used as a fundamental indicator of “safety” in the evaluation and estimation methods presented in the HSM. Where the term “safety” is used in the HSM, it refers to the crash frequency or crash severity, or both, and collision type for a specific time period, a given location, and a given set of geometric and operational conditions. CRASHES AS THE BASIS OF SAFETY ANALYSIS

The HSM focuses on how to estimate and evaluate the crash frequency and crash severity for a particular roadway network, facility, or site, in a given period, and hence the focus is on “objective” safety. Objective safety refers to the use of a quantitative measure that is independent of the observer. In contrast, “subjective” safety concerns the perception of how safe a person feels on the transportation system. Assessment of subjective safety for the same site will vary between observers. Objective and Subjective Safety

In the HSM, a crash is defined as a set of events that result in injury or property damage due to the collision of at least one motorized vehicle and may involve collision with another motorized vehicle, a bicyclist, a pedestrian, or an object. The term crash as used in the HSM does not include collisions between cyclists and pedestrians, or crashes involving vehicles on rails. Definition of a Crash

In the HSM, “crash frequency” is defined as the number of crashes occurring at a particular site, facility, or network in a one-year period. Crash frequency is calculated according to Equation 3-1 and is measured in number of crashes per year. Definition of Crash Frequency
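Written out, the relationship described here takes the simple form below (a rendering of Equation 3-1 implied by the definition; the notation is ours rather than the HSM's):

```latex
\[
\text{Crash frequency} \;=\; \frac{\text{Total number of crashes observed}}{\text{Period in years}}
\qquad \left[\frac{\text{crashes}}{\text{year}}\right]
\]
```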

The term “predictive method” refers to the methodology in Part C of the HSM that is used to estimate the “expected average crash frequency” of a site, facility, or roadway under given geometric design and traffic volumes for a specific period of time. Definition of Predictive Method

The term “expected average crash frequency” is used in the HSM to describe the estimate of the long-term average crash frequency of a site, facility, or network under a given set of geometric design and traffic volumes in a given time period (in years). Definition of Expected Average Crash Frequency

Crashes vary in the level of injury or property damage. The American National Standard ANSI D16.1 defines injury as “bodily harm to a person” (7). The level of injury or property damage due to a crash is referred to in the HSM as “crash severity.” While a crash may cause a number of injuries of varying severity, the term crash severity refers to the most severe injury caused by a crash. Definition of Crash Severity

Crash severity is often divided into categories according to the KABCO scale, which provides five levels of injury severity. Even if the KABCO scale is used, the definition of an injury may vary between jurisdictions. Definition of Crash Severity

The five KABCO crash severity levels are:
K—Fatal injury: an injury that results in death;
A—Incapacitating injury: any injury, other than a fatal injury, that prevents the injured person from walking, driving, or normally continuing the activities the person was capable of performing before the injury occurred;
B—Non-incapacitating evident injury: any injury, other than a fatal injury or an incapacitating injury, that is evident to observers at the scene of the crash in which the injury occurred;
C—Possible injury: any injury reported or claimed that is not a fatal injury, incapacitating injury, or non-incapacitating evident injury and includes claims of injuries not evident;
O—No Injury/Property Damage Only (PDO).
Definition of Crash Severity

Crashes are rare and random events. By rare, it is implied that crashes represent only a very small proportion of the total number of events that occur on the transportation system. Random means that crashes occur as a function of a set of events influenced by several factors, which are partly deterministic (they can be controlled) and partly stochastic (random and unpredictable). An event refers to the movement of one or more vehicles and/or pedestrians and cyclists on the transportation network. Crashes Are Rare and Random Events

A crash is one possible outcome of a continuum of events on the transportation network during which the probability of a crash occurring may change from low risk to high risk. Crashes represent a very small proportion of the total events that occur on the transportation network. For example, for a crash to occur, two vehicles must arrive at the same point in space at the same time. However, arrival at the same time does not necessarily mean that a crash will occur. The drivers and vehicles have different properties (reaction times, braking efficiencies, visual capabilities, attentiveness, speed choice) that will determine whether or not a crash occurs. Crashes Are Rare and Random Events

Because crashes are random events, crash frequencies naturally fluctuate over time at any given site. The randomness of crash occurrence indicates that short-term crash frequencies alone are not a reliable estimator of long-term crash frequency. If a three-year period of crashes were used as the sample to estimate crash frequency, it would be difficult to know if this three-year period represents a typically high, average, or low crash frequency at the site. This year-to-year variability in crash frequencies adversely affects crash estimation based on crash data collected over short periods. The short-term average crash frequency may vary significantly from the long-term average crash frequency. This effect is magnified at study locations with low crash frequencies where changes due to variability in crash frequencies represent an even larger fluctuation relative to the expected average crash frequency. Natural Variability in Crash Frequency

Each time a vehicle enters an intersection, a highway segment, or any other type of entity (a trial) on a given transportation network, it will either crash or not crash. For purposes of consistency, a crash is termed a “success” while failure to crash is a “failure.” For the Bernoulli trial, a random variable X can be defined with the following probability model: if the outcome ω is a crash, then X(ω) = 1, whereas if the outcome is not a crash, then X(ω) = 0. Thus, the probability model becomes P(X = 1) = p and P(X = 0) = q, where p is the probability of success (a crash) and q = (1 - p) is the probability of failure (no crash). Theoretical Process of Motor Vehicle Crashes

It can be shown that if there are N independent trials (vehicles passing through an intersection, road segment, etc.), the count of successes over the N trials arises from a sequence of Bernoulli trials. We define Z as the number of successes over the N trials. Under the assumption that all trials are characterized by the same failure process (this assumption is revisited later), the appropriate probability model for such a series of Bernoulli trials is the binomial distribution, given as

P(Z = n) = [N! / (n! (N - n)!)] p^n (1 - p)^(N - n)   (Equation 1)

where n = 0, 1, 2, …, N (the number of successes or crashes). Binomial distribution

Poisson Approximation For typical motor vehicle crashes, where the event has a very low probability of occurrence and a large number of trials exists (e.g., million entering vehicles, vehicle-miles traveled, etc.), it can be shown that the binomial distribution is approximated by a Poisson distribution. Under the binomial distribution with parameters N and p, let p = λ/N, so that a large sample size N is offset by the diminution of p to produce a constant mean number of events λ. Then, as N → ∞, it can be shown that

P(Z = n) → λ^n e^(-λ) / n!   (Equation 2)

where n = 0, 1, 2, …, N (the number of successes or crashes) and λ = the mean of the Poisson distribution.
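As an illustrative check of this limiting result (not part of the original slides; the values of λ and N below are arbitrary), the sketch compares the binomial pmf with p = λ/N against the Poisson pmf as N grows:

```python
# Sketch: binomial(N, p = lambda/N) approaches Poisson(lambda) as N grows.
from scipy.stats import binom, poisson

lam = 2.0  # assumed mean number of crashes per period (illustrative value)

for N in (10, 100, 10_000):            # number of trials, e.g., vehicles entering a site
    p = lam / N                        # per-trial crash probability
    max_diff = max(abs(binom.pmf(n, N, p) - poisson.pmf(n, lam))
                   for n in range(20))
    print(f"N = {N:>6}: max |binomial - Poisson| pmf difference = {max_diff:.2e}")
```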

Poisson Approximation The approximation illustrated in Equation (2) works well when the mean λ and p are assumed to be constant. In practice, however, it is not reasonable to assume that crash probabilities are constant across drivers and across road segments (intersections, etc.). Specifically, each driver-vehicle combination is likely to have a probability that is a function of driving experience, attentiveness, mental workload, risk aversion, vision, sobriety, reaction times, vehicle characteristics, etc. Furthermore, crash probabilities are likely to vary as a function of the complexity and traffic conditions of the transportation network (road segment, intersection, etc.). All of these factors, and others, affect the individual risk of a crash to varying degrees, and they create inconsistencies with the approximation illustrated in Equation (2). Outcome probabilities that vary from trial to trial are known as Poisson trials (note: Poisson trials are not the summation of independent Poisson distributions; the term designates Bernoulli trials with unequal probabilities of events).

Poisson Approximation The bound below can be used to determine how well counts generated by independent trials with unequal probabilities can be approximated by a Poisson process:

d_TV(L(Z), Po(λ)) ≤ [(1 - e^(-λ)) / λ] Σ p_i²   (Equation 3)

where d_TV = the total variation distance between the two distributions L(Z) and Po(λ); L(Z) = the distribution of the count data generated by trials with unequal probabilities of events; and Po(λ) = the Poisson distribution with λ = E(Z). See Barbour et al. (1992), Poisson Approximation, Clarendon Press, Oxford, for additional information.

Poisson Approximation A related result establishes when unequal probabilities of independent events lead to over-dispersion, that is, VAR(Z) > E(Z); the sufficient conditions involve moments of Z of order r > 2.
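The simulation sketch below (illustrative only; the gamma distribution chosen for the varying probabilities is an arbitrary assumption) contrasts the near-Poisson behaviour obtained with a constant crash probability against the over-dispersion that appears when the probability differs across periods or sites:

```python
# Sketch: counts from trials with a varying success probability are over-dispersed.
import numpy as np

rng = np.random.default_rng(0)
n_periods, n_trials = 5_000, 10_000    # observation periods x trials per period

# Case 1: constant probability (binomial, approximately Poisson) -> variance ~ mean
p_const = 2.0 / n_trials
z_const = rng.binomial(n_trials, p_const, size=n_periods)

# Case 2: probability varies across periods (unobserved heterogeneity) -> variance > mean
p_var = rng.gamma(shape=0.5, scale=(2.0 / n_trials) / 0.5, size=n_periods)
p_var = np.minimum(p_var, 1.0)         # guard against the (vanishingly unlikely) p > 1
z_var = rng.binomial(n_trials, p_var)

for name, z in (("constant p", z_const), ("varying p", z_var)):
    print(f"{name}: mean = {z.mean():.2f}, variance = {z.var(ddof=1):.2f}")
```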

Crash Data as Poisson Process Given the characteristics described in the previous overheads, it is often assumed that crash data for a given site (or entity) follow a Poisson distribution. In other words, if one were to count crashes over time for one site, the counts are assumed to be Poisson distributed. Example: crash counts observed over successive time periods t = 1, 2, 3, 4, …, i. Poisson assumption:

P(Y = y) = λ^y e^(-λ) / y!

where λ = the mean of the Poisson distribution and y = the crash count (0, 1, 2, …).

Crash Data as Poisson Process If we have counts = 3, 7, 0, and 3 on an entity, what is λ̂, the estimate of λ?
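Because the maximum-likelihood estimator of a Poisson mean is the sample mean (a standard result, not derived on this slide), the answer can be computed directly:

```latex
\[
\hat{\lambda} \;=\; \frac{1}{n}\sum_{i=1}^{n} y_i \;=\; \frac{3 + 7 + 0 + 3}{4} \;=\; 3.25
\]
```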

Crash Data as Poisson Process We can plot P(3, 7, 0, and 3) as a function of λ (the likelihood function)

Crash Data as Poisson Process Plotting P(3, 7, 0, and 3) as a function of λ (the likelihood function) shows that the likelihood is maximum at λ = 3.25, the sample mean of the observed counts.

Crash Data as Poisson Process [Plot of the likelihood p(y | λ) against λ: the likelihood is maximum at λ* = 3.25.]
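A numerical sketch of the same likelihood calculation (illustrative code; the grid bounds are arbitrary):

```python
# Sketch: the Poisson likelihood of the observed counts peaks at the sample mean.
import numpy as np
from scipy.stats import poisson

counts = np.array([3, 7, 0, 3])
lam_grid = np.linspace(0.1, 10.0, 1_000)

# Log-likelihood of all counts for each candidate value of lambda
loglik = np.array([poisson.logpmf(counts, lam).sum() for lam in lam_grid])

lam_star = lam_grid[np.argmax(loglik)]
print(f"lambda maximizing the likelihood: {lam_star:.2f} (sample mean = {counts.mean():.2f})")
```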

Crash Data as Poisson Process Accuracy of estimation (λ̂): [Table comparing the observed counts y_i for each period i with the values predicted by the fitted Poisson model.]

Crash Data as Poisson Process Method to calculate the mean and variance observed in crash data. Finding the mean: λ̂ = (1/n) Σ y_i, where y_i = the observed crash count in period i and n = the number of periods. Finding the variance: s² = [1/(n - 1)] Σ (y_i - λ̂)².
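Applying these two formulas to the example counts 3, 7, 0, and 3 (a quick illustration, not from the original slides):

```python
# Sketch: sample mean and sample variance of the example crash counts.
import numpy as np

counts = np.array([3, 7, 0, 3])
lam_hat = counts.mean()      # estimate of the Poisson mean: 3.25
s2 = counts.var(ddof=1)      # sample variance with n - 1 denominator: 8.25

print(f"mean = {lam_hat:.2f}, variance = {s2:.2f}")
# The variance exceeds the mean, hinting at the over-dispersion discussed next.
```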

Overdispersion (aka heterogeneity) Crash data rarely follow a pure Poisson distribution. Usually, the data display a variance that is greater than the mean, VAR(Y) > E(Y); this is known as over-dispersion. Sometimes the data can show under-dispersion, but this is very rare. The principal cause of over-dispersion was explained in the previous overheads (a Bernoulli process with unequal probabilities of events). Over-dispersion can also be caused by numerous other factors. For other types of processes (not based on a Bernoulli trial), over-dispersion can be explained by the clustering of data (neighborhoods, regions, wiring boards, etc.), unaccounted-for temporal correlation, and model mis-specification. These factors also influence the heterogeneity found in crash data.

Overdispersion (aka heterogeneity) In order to account for the over-dispersion commonly found in crash data, it has been hypothesized that the mean (λ) found in a population of sites follows a gamma probability density function. In other words, if we have a population of entities (say 100 intersections), their means λ (everything else remaining constant) would follow a gamma distribution. The gamma probability density function is defined by

f(λ; φ, β) = β^φ λ^(φ - 1) e^(-βλ) / Γ(φ)   for λ > 0

where λ = the mean of the selected site; φ, β = the parameters of the gamma distribution [gamma(φ, β)]; and Γ(φ) = the gamma function (∫ e^(-u) u^(φ - 1) du).

Overdispersion (aka heterogeneity) As discussed in the previous slide, 100 intersections with the exact same characteristics (traffic flow, geometric design, etc.), and even the same number of crashes per year, will (or are expected to) have different Poisson mean values λ. The distribution of these means across sites is assumed to be gamma distributed. [Plot of f(λ) against λ: the distribution of the Poisson means follows the gamma distribution.]
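A small simulation sketch of this idea (the gamma parameter values are arbitrary choices for illustration): draw a Poisson mean for each of 100 nominally identical sites from a gamma distribution, generate yearly counts, and compare the mean and variance of the pooled counts:

```python
# Sketch: Poisson means drawn from a gamma distribution across "identical" sites.
import numpy as np

rng = np.random.default_rng(1)
n_sites = 100
mu, phi = 3.0, 2.0                     # gamma mean and shape (illustrative values)

site_means = rng.gamma(shape=phi, scale=mu / phi, size=n_sites)  # lambda_i ~ gamma
yearly_counts = rng.poisson(site_means)                          # y_i ~ Poisson(lambda_i)

print(f"counts: mean = {yearly_counts.mean():.2f}, "
      f"variance = {yearly_counts.var(ddof=1):.2f}")
# The pooled counts are expected to show variance > mean (over-dispersion).
```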

Overdispersion (aka heterogeneity) There are three reasons why the gamma probability function has been a popular assumption:
1. The mean λ is allowed to take only positive values;
2. The gamma PDF is very flexible: it can be shifted and stretched to fit a variety of shapes; and
3. It keeps the algebra simple and often yields “closed-form” results.
Note: nobody has proved so far that the mean varies according to a gamma probability function; people use it because it is easy to manipulate. Some researchers have used a lognormal function to characterize the distribution of the mean, which is a little more complex. More complicated distributions can also be used (e.g., the Conway-Maxwell-Poisson).

Overdispersion (aka heterogeneity) The mean and variance of the gamma probability density function are E(λ) = φ/β and VAR(λ) = φ/β². To estimate φ and β from data, you can use the method-of-moments relationships φ̂ = λ̂² / (s² - λ̂) and β̂ = λ̂ / (s² - λ̂), where λ̂ and s² are estimated using the equations shown above for the Poisson.
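A numerical sketch of these moment relationships, under the gamma(φ, β) parameterization assumed above (the formulas and parameter names follow that reconstruction, not necessarily the original slide):

```python
# Sketch: method-of-moments recovery of the gamma parameters from simulated counts.
import numpy as np

rng = np.random.default_rng(2)
phi_true, beta_true = 2.0, 0.5              # assumed gamma shape and rate
lam = rng.gamma(shape=phi_true, scale=1.0 / beta_true, size=50_000)  # site means
y = rng.poisson(lam)                        # observed counts (Poisson-gamma)

lam_hat, s2 = y.mean(), y.var(ddof=1)
phi_hat = lam_hat**2 / (s2 - lam_hat)       # shape estimate
beta_hat = lam_hat / (s2 - lam_hat)         # rate estimate

print(f"phi:  true {phi_true}, estimated {phi_hat:.2f}")
print(f"beta: true {beta_true}, estimated {beta_hat:.2f}")
```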

Negative Binomial (or Poisson-gamma) It can be shown that if the mean of a Poisson distribution is gamma distributed, the joint mixed (marginal) distribution gives rise to the Negative Binomial distribution. The derivation is as follows, with the integrals taken over λ from 0 to ∞:

P(Y = y) = ∫ [λ^y e^(-λ) / y!] [β^φ λ^(φ - 1) e^(-βλ) / Γ(φ)] dλ
         = [β^φ / (y! Γ(φ))] ∫ λ^(y + φ - 1) e^(-(1 + β)λ) dλ
         = [Γ(y + φ) / (y! Γ(φ))] [β / (1 + β)]^φ [1 / (1 + β)]^y
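As a sanity check on this derivation (illustrative code; scipy's nbinom uses the convention n = φ and p = β/(1 + β)), the Poisson-gamma mixture can be integrated numerically and compared with the negative binomial pmf:

```python
# Sketch: the Poisson-gamma mixture integrates to the negative binomial pmf.
import numpy as np
from scipy import integrate, stats

phi, beta = 2.0, 0.5                        # gamma shape and rate (illustrative)

def mixture_pmf(y):
    """Numerically integrate Poisson(y | lam) * gamma(lam; phi, beta) over lam."""
    integrand = lambda lam: stats.poisson.pmf(y, lam) * stats.gamma.pdf(lam, a=phi, scale=1.0 / beta)
    value, _ = integrate.quad(integrand, 0, np.inf)
    return value

for y in range(5):
    nb = stats.nbinom.pmf(y, phi, beta / (1.0 + beta))   # nbinom(n=phi, p=beta/(1+beta))
    print(f"y = {y}: mixture = {mixture_pmf(y):.6f}, negative binomial = {nb:.6f}")
```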

Overdispersion (aka heterogeneity) Note: the Poisson-gamma distribution is often characterized by the setting β = φ/μ, so that E(λ) = μ and VAR(λ) = μ²/φ. This is known as the one-parameter gamma distribution. The relationships shown above will become useful for describing the mean and variance of the negative binomial regression model.

Negative Binomial (or Poisson-gamma) The mean and variance of the Negative Binomial distribution (one-parameter form) are E(Y) = μ and VAR(Y) = μ + μ²/φ. Note: if φ → ∞, the second part of the variance function tends towards 0; the Negative Binomial then becomes a Poisson distribution, since the mean and variance are equal. Note: for modeling purposes, the term φ is usually estimated directly from the data. This will be addressed later in the course.

Negative Binomial (or Poisson-gamma) An alternative form of the PDF, written in terms of the mean μ and the parameter φ (with β = φ/μ), is

P(Y = y) = [Γ(y + φ) / (Γ(φ) y!)] [φ / (φ + μ)]^φ [μ / (φ + μ)]^y,   y = 0, 1, 2, …

An analysis method known as “in-depth” analysis or clinical study. This method does not rely on the statistical nature of the crash process; it seeks to determine the deterministic mechanisms that lead to an accident or a crash. The method is very common for extremely rare events, such as aircraft or space shuttle crashes. The data needed to carry out such investigations and reconstructions are very extensive and are often not available in typical computerized crash records.

[Causal chain diagram: Speed Limit Policy → Site Characteristics → Speed Distribution → Outcome]

An example: an analysis of road de-icing and of traffic clubs for children using the causal chain approach. The study aims to estimate the reduction in risk associated with each link of the chain. See Elvik (2003) in Accident Analysis & Prevention, Vol. 35.

De-icing Roads

Traffic Club