1
Experience with the Gravity Model
2
Introduction
There is demand for air travel between every city-pair in the world (though it can be very small)
We have imperfect data on the actual travel (for many of the larger demands)
The “Gravity Model” is the long-standing traditional formulation
  Demand is bigger, the bigger the origin city
  Demand is bigger, the bigger the destination
  Demand is smaller, the longer the distance
  Other things may also matter
3
Prologue
We have some experience and prejudices:
  Doubling the origin city size should double the demand
  Doubling the destination size should double the demand
  Cost may be a better metric than distance
  The “zero” demands should not be left out
  Air demand under 200 miles competes with ground
  Common language helps demand
  Common “alphabet” helps demand
  Different incomes hurt demand
  Gravity works better at distributing total outbound demand than at estimating the size of total travel
  Leisure destinations are origin-specific and arbitrary
4
Act 1: We Go Exploring
Guidelines (“Pirates’ Code”):
  Take the easiest first
  Use places you know about
  Examine the results in detail
US domestic data
  Best reporting quality
  One country, one language, one income
  Lots of points
Use Seattle (SEA), Boston (BOS), Chicago (CHI)
  Disparate types of cities
  I’ve lived there
5
SEA, BOS, & CHI
Passenger data from US ticket sample
  Origin-Destination reporting
  Some “breakage” of interline trips
Domestic points only: similar fares, taxes, hassle
Origin gravity weight:
  Not population or income or .....
  Use outbound departing passengers
  Focus results on distributing to destinations
Destination gravity weight:
  Arriving passengers to destination
O-D data as used is not directional
  Original source has home city for trips
  All further data (outside US) will not
  So we will use US data in non-directional form
6
Starter Formulation
Calibrate gravity model
  Pax = W_O · W_D / Dist^α
  Where
    Pax = Origin-Destination demand
    W_O = Origin weight (size)
    W_D = Destination weight (size)
    Dist = Intercity distance
    α = Distance exponent (calibrated variable)
Use log formulation, linear least squares fit
Examine forecast to actual Passengers/Demand
Allow origin size W_O to be a calibrated variable
  One each for SEA, BOS, CHI
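As a minimal sketch of the log-form calibration for a single origin (not the presenter's actual code), taking logs of the gravity formula gives log(Pax / W_D) = log(W_O) + slope · log(Dist), which is a straight-line fit. The function name `fit_origin`, the record layout, and the sample numbers are all hypothetical. The fitted slope is the signed exponent on distance, so a value such as -0.6 means demand falls with distance.

```python
import math

def fit_origin(records):
    """Fit one origin's weight and distance slope by least squares in logs.

    records: list of (pax, dest_weight, distance) tuples for ONE origin.
    Model (assumed): log(pax / dest_weight) = log(W_O) + slope * log(distance).
    Returns (W_O, slope), where slope is the signed distance exponent.
    """
    xs = [math.log(dist) for _, _, dist in records]
    ys = [math.log(pax / w_d) for pax, w_d, _ in records]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope

# Synthetic, noise-free data (invented for illustration only).
true_w_o, true_slope = 0.002, -0.6
dests = [(5000.0, 300.0), (20000.0, 800.0), (1500.0, 1200.0), (9000.0, 2400.0)]
records = [(true_w_o * w_d * dist ** true_slope, w_d, dist) for w_d, dist in dests]

w_o, slope = fit_origin(records)
```

Running the same fit separately on SEA, BOS, and CHI records would give one calibrated origin weight per city, which is how the "one each" bullet is read here.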
7
Early Observations: Some Wild Outliers
Here “Draw” is the ratio of actual to forecast demand, indexed. Distance exponent here is -0.6.
8
“Fitzing” with data
Most Dist < 200 had low actuals
  Demand diverted to surface modes not in data
SEA high actuals were:
  Points in Alaska
  Points with trips interlining in SEA, with “broken” data
  Dedicated Seattle points, like college towns
BOS high actuals were:
  Leisure destinations, for Boston
Characteristics of high actuals
  Destinations had small number of origin cities
  Destinations had one large demand, to one origin
  Some were secondary airports in a city
End of Act 1
9
Act 2: Our First Regressions
We eliminate all points < 200 miles
  Due to ground competition
We eliminate all points with < 12 origins
  Tend to be captive-to-single-origin points
  We did a big side-study on share-of-largest origin
We generate “zeros” by destination (16%)
  When 1 or 2 of SEA, BOS, CHI lack demand
  Due to log form, zeros don’t work
  We try .3, .1, .01, and .001 for zeros
  We get rising α with smaller zeros, significantly
  We include only 5% zeros, but get same reactions
Note: the smallest value in the data is 1 passenger per day (rounded).
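The sensitivity to the zero substitute can be illustrated with a small sketch. The data below are invented, and placing the zero-demand points at the longest distances is an assumption made for the illustration, not something the slides state. Under that assumption, each smaller substitute pulls the far end of the log-log line further down, so the fitted distance slope gets steeper.

```python
import math

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical (distance_miles, passengers_per_day) points; two are zeros.
points = [(250, 26), (400, 19), (700, 13), (1100, 10), (1600, 8),
          (2000, 0), (2500, 0)]

def slope_with_zero_substitute(points, eps):
    """Replace zero demands by eps so the log exists, then fit the slope."""
    xs = [math.log(dist) for dist, _ in points]
    ys = [math.log(pax if pax > 0 else eps) for _, pax in points]
    return ols_slope(xs, ys)

# The four substitute values tried in the slides.
slopes = {eps: slope_with_zero_substitute(points, eps)
          for eps in (0.3, 0.1, 0.01, 0.001)}
```

Because the substitute is arbitrary and the slope depends on it, the fitted exponent is partly an artifact of the choice.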
10
Small Demands are a Real Problem
Regression results driven by zero points
Least squares in log form gives equal weight to each demand point
Log form emphasizes percentage error
Actual needs are different:
  Forecast big demands with smaller % errors
  Forecast the small demands merely as “small”
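A short worked example may make the weighting point concrete. The two markets below are hypothetical. A forecast that is twice the actual contributes the same squared log error whether the market has 5 passengers or 5000, even though the absolute misses differ by a factor of 1000.

```python
import math

def log_sq_error(actual, forecast):
    """Squared error in log form, as minimized by a log least-squares fit."""
    return (math.log(forecast) - math.log(actual)) ** 2

def abs_error(actual, forecast):
    """Absolute miss in passengers."""
    return abs(forecast - actual)

small_market = (5, 10)        # (actual, forecast), hypothetical
large_market = (5000, 10000)  # same 2x ratio, 1000x the size

small_log = log_sq_error(*small_market)
large_log = log_sq_error(*large_market)
small_abs = abs_error(*small_market)
large_abs = abs_error(*large_market)
```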
11
Compromise: Ignore small demands and zeros
Require Pax > 10 for all three cities
  Or drop the destination
Merge multiple airports in a city into a single city destination
  (We had been using airports => cities)
We now get same answers, with or without remaining outliers (errors below ½ or above 2)
Errors on large demands more reasonable
Most small markets forecast as small
  Exceptions are large for one origin
  Could be large for other two, but no online services
Note: outliers were 24% of data. Before these requirements, answers changed as outliers were removed iteratively.
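The two screening rules can be sketched as follows. This assumes the outlier "error" is the ratio of actual to forecast demand (the "draw" used elsewhere in the deck). The function names, destination labels, and passenger counts are hypothetical.

```python
def keep_destination(pax_by_origin, min_pax=10):
    """Keep a destination only if every origin has demand above min_pax."""
    return all(pax > min_pax for pax in pax_by_origin.values())

def draw(actual, forecast):
    """Ratio of actual to forecast demand."""
    return actual / forecast

def is_outlier(actual, forecast, low=0.5, high=2.0):
    """Flag a market whose draw is below 1/2 or above 2."""
    ratio = draw(actual, forecast)
    return ratio < low or ratio > high

# Hypothetical destinations with daily passengers from each origin.
destinations = {
    "DEST_A": {"SEA": 120, "BOS": 45, "CHI": 300},
    "DEST_B": {"SEA": 15, "BOS": 4, "CHI": 60},  # BOS too small: dropped
}
kept = [name for name, pax in destinations.items() if keep_destination(pax)]
```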
12
Early Observations
Define “draw” as ratio = actual / forecast
13
Lessons Learned So Far
Distance exponent α = -0.66
  NOT the same as domestic fares (Fare ~ Dist^0.2)
Do not include zero demand points
Destinations with few origins tend to be “captive”
  Do not use them in generic calibrations
To improve errors in forecasts of large demands, use only points with large demands
  Result will forecast small demands small, mostly
Use cities, not airports
14
More Lessons
City W_O fairly consistent with city size
  More about this on next slide
Ran against Pax data adjusted to standard fares
  Many “under-forecasts” were in discount markets
Ran international destinations
  True “O & D” not from US ticketing source
  Distance exponent α of -1.5 (much different)
  Demands 1/5th of forecasts from domestic
  Suggests language, or other barriers count
  Goods research found borders act like mi.
Notes: Passengers were adjusted to a “standard” national fare trend formula using a price elasticity of -1.2. The goods research was on trade between the US and Canada. I read it, but I did not note the reference, as I was not thinking about this research at the time.
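The slides do not give the fare adjustment formula, so the following is only a sketch under a constant-elasticity assumption: demand scales with (fare ratio) raised to the elasticity. The function name and the fares are hypothetical; only the elasticity of -1.2 comes from the notes.

```python
def adjust_to_standard_fare(pax, actual_fare, standard_fare, elasticity=-1.2):
    """Estimate demand at the standard fare, assuming constant elasticity.

    A discount market (actual_fare < standard_fare) has its observed
    passengers scaled DOWN, since part of its demand is fare-stimulated.
    """
    return pax * (standard_fare / actual_fare) ** elasticity

# Hypothetical discount market: fare is 80 where the trend says 100.
observed_pax = 100.0
adjusted_pax = adjust_to_standard_fare(observed_pax, actual_fare=80.0,
                                       standard_fare=100.0)
```

This direction of adjustment fits the observation that many "under-forecasts" were in discount markets: their observed demand is higher than a standard-fare forecast would expect.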
15
Play within the Play
Observed different ratios to total outbound travel for SEA, BOS, CHI (W_O)
  But not very different
Ran all US domestic pairs (Pax > 10)
  Using just a single variable (W_O · W_D), with exponent β
Results:
  Distance exponent α = (had been -0.66)
  City-size exponent β = 0.85
  Suggests larger cities have smaller demands
    Maybe because a higher % of demands are > 1 and therefore are captured by the database (bigger W = ∑ demands)
    Also small cities show more short-haul, which was removed
    Otherwise, large cities have more direct services and lower fares!
  Interpretation allows β = 1.0 to be “reasonable”
Note: the drive for β = 1.00 is from the “intuition” or “logic” that doubling the city should double the travel (ceteris paribus), and doubling the destination size, ditto.
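A two-exponent calibration of this kind can be sketched as a least-squares fit of log(Pax) on log(W_O · W_D) and log(Dist). This is an illustrative implementation, not the presenter's. The data are synthetic, and the distance slope of -0.7 used to generate them is an arbitrary placeholder, since the re-estimated value is not given in the slide.

```python
import math

def fit_two_exponents(markets):
    """Fit log(pax) = c + beta*log(w_product) + slope*log(dist) by OLS.

    markets: list of (pax, w_product, dist), where w_product = W_O * W_D.
    Returns (scale, beta, slope); slope is the signed distance exponent.
    """
    x1 = [math.log(w) for _, w, _ in markets]
    x2 = [math.log(d) for _, _, d in markets]
    y = [math.log(p) for p, _, _ in markets]
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products (normal equations).
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    beta = (s1y * s22 - s2y * s12) / det
    slope = (s2y * s11 - s1y * s12) / det
    scale = math.exp(my - beta * m1 - slope * m2)
    return scale, beta, slope

# Synthetic, noise-free markets (all values invented for illustration).
true_scale, true_beta, true_slope = 0.0005, 0.85, -0.7
pairs = [(4.0e6, 300.0), (9.0e6, 900.0), (2.5e7, 500.0),
         (6.0e7, 1500.0), (1.2e8, 2200.0)]
markets = [(true_scale * w ** true_beta * d ** true_slope, w, d)
           for w, d in pairs]

scale, beta, slope = fit_two_exponents(markets)
```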
16
Act 3: European Regional
New set of data points
  London, Copenhagen, Istanbul
  200 mi < Distance < 2800 mi
  All 3 (LON, CPH, IST) have Pax > 10
  219 points
Regression results
  Distance exponent α near -0.80
  Origin-specific adjustments not significant
  Removing outliers has small effect on answers
  Some really big errors in really big markets
Tends to confirm US data experiences
17
Europe: All Points
Distance > 200 mi, Pax > 10
Least squares regression
  Distance exponent α goes to -1.2
  Weights (W_O · W_D) exponent β = 0.4
  Gives almost all demands near 40
Results not satisfactory
  Distance exponent seems “wrong” (beyond -1)
  City size (weights) exponent β too far from 1.0
  Unsatisfactory forecasts by inspection
    Most big markets forecast too small
    Most smaller markets forecast too big
18
Go Back to Detailed Look
All markets with Pax > 200
Drop 12 high-side outliers
Redefine Error:
  Not percentage-error-squared (log least squares)
  Not Diff = (Passengers − Forecast)
  Compromise: Diff^0.75
  Compromise is halfway between size and %
Iterate
Note: the difference (Pax − Forecast) is taken in absolute value, with no minus sign.
19
Iterative Procedure
Start with Distance and Weight Exponents = 1
Adjust scaling so median Forecast/Pax = 1.00
Adjust Weight exponent β to reduce Error
  Readjust scaling on each try
Adjust Distance exponent α to reduce Error
Iterate to find min ∑ Diff (min Error)
An “ugly,” unofficial, but practical process
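The procedure above can be sketched as a simple coordinate search. This is one possible reading, not the presenter's actual process. It assumes the error is the sum of |Pax − Forecast|^0.75, that the starting distance exponent of 1 means a signed slope of -1, and that each exponent is nudged by a fixed step. All names, the step size, and the market data are hypothetical.

```python
import statistics

def scaled_forecasts(markets, beta, slope):
    """Gravity forecasts rescaled so the median forecast/pax ratio is 1."""
    raw = [(w ** beta) * (d ** slope) for _, w, d in markets]
    ratios = [r / pax for r, (pax, _, _) in zip(raw, markets)]
    scale = 1.0 / statistics.median(ratios)
    return [scale * r for r in raw]

def total_error(markets, beta, slope, power=0.75):
    """Sum of |pax - forecast| ** power over all markets."""
    forecasts = scaled_forecasts(markets, beta, slope)
    return sum(abs(pax - f) ** power
               for f, (pax, _, _) in zip(forecasts, markets))

def calibrate(markets, beta=1.0, slope=-1.0, step=0.05, rounds=50):
    """Nudge the weight exponent, then the distance slope, keeping gains."""
    best = total_error(markets, beta, slope)
    for _ in range(rounds):
        improved = False
        for d_beta, d_slope in ((step, 0.0), (-step, 0.0),
                                (0.0, step), (0.0, -step)):
            trial = total_error(markets, beta + d_beta, slope + d_slope)
            if trial < best:
                beta, slope, best = beta + d_beta, slope + d_slope, trial
                improved = True
        if not improved:
            break
    return beta, slope, best

# Hypothetical markets: (pax, w_product, distance_miles).
markets = [(900.0, 4.0e6, 300.0), (400.0, 2.5e6, 600.0),
           (1500.0, 9.0e6, 450.0), (250.0, 3.0e6, 1200.0),
           (700.0, 8.0e6, 1500.0), (320.0, 1.0e6, 250.0)]

start_error = total_error(markets, 1.0, -1.0)
beta, slope, best_error = calibrate(markets)
```

The scaling is recomputed inside every error evaluation, which corresponds to readjusting the scaling on each try. A search like this can stop at a local minimum, which is consistent with the process being described as "ugly" but practical.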
20
Results from “Procedure”
Distance exponent α likes to be -1.05
  Could be “cultural distance”
City weights exponent β likes to be 1.25
  Why???
Two effects are independent
Many “too big” forecasts for small demands
21
Poor Fit of Forecast to Data
Forecast is the least-squares regression, resulting in a distance exponent of -1.15. Procedure is outlined in the slides above. Results of the procedure with exponents of -1.0 for distance and 1.0 for city weights were very similar, and still lousy.
22
One Last Regression
All Europe, classic Gravity formula
  Pax > 10, Dist > 200
  Distance exponent fixed at 1.00
  City weight exponent fixed at 1.00
Allowed “factor” for “same country”
  Was about 5x, as for US vs International
Nice scatter
  Fewer unreasonable forecasts
  Huge errors everywhere
Notes: The distance exponent was fixed because > 1.00 is too big. The weight exponent was fixed because it “makes sense”: doubling the city doubles the demand.
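With both exponents fixed at 1, a same-country factor can be calibrated in log form as the gap between the mean log ratio of same-country and cross-border markets, which is what a least-squares fit on a 0/1 dummy variable gives. The sketch below assumes the form Pax = K · W_O · W_D / Dist · F, with F applied only to same-country pairs. The function name and the data are hypothetical.

```python
import math

def same_country_factor(markets):
    """Estimate K and the same-country factor F with unit exponents.

    markets: list of (pax, w_o, w_d, dist, same_country) tuples.
    Uses log(pax * dist / (w_o * w_d)) = log(K) + log(F) * same_country.
    """
    def log_ratio(pax, w_o, w_d, dist):
        return math.log(pax * dist / (w_o * w_d))

    same = [log_ratio(p, a, b, d) for p, a, b, d, s in markets if s]
    cross = [log_ratio(p, a, b, d) for p, a, b, d, s in markets if not s]
    log_k = sum(cross) / len(cross)          # baseline from cross-border
    log_f = sum(same) / len(same) - log_k    # uplift for same country
    return math.exp(log_k), math.exp(log_f)

# Synthetic, noise-free markets (invented): (w_o, w_d, dist, same_country).
true_k, true_f = 0.001, 5.0
specs = [(2000.0, 900.0, 400.0, True), (2000.0, 1500.0, 700.0, False),
         (800.0, 900.0, 1200.0, False), (800.0, 300.0, 250.0, True),
         (1200.0, 1500.0, 1600.0, False)]
markets = [((true_k * a * b / d) * (true_f if s else 1.0), a, b, d, s)
           for a, b, d, s in specs]

k, factor = same_country_factor(markets)
```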
23
Gravity Forecast is Very Poor
24
Obituary on Gravity Model
Forecasts are really bad
Outliers have large effect on answer
  Need to be removed
Zeros have large effect on answer
  Forecasts more sensible when not included
Results will be misleading
  Small markets will be forecast as medium
25
Overall Conclusion
Air travel between cities is
  Strongly influenced by city-pair specific factors
  Not amenable to gravity model approach
If you have to have a forecast
  Calibrate from existing larger, culturally similar cities to same destinations
  Recognize the “same country” effect is large (maybe 5x)
26
More Gravity: Long Haul
All world markets
  Distance > 3100 mi (5000 km)
  Passengers > 20
  No existing nonstop service
Least squares regressions
Four equations (log calibrations):
  1. Traditional: calibrate ratio to gravity term
  2. Distance exponent α only (-1.37)
  3. Whole gravity term exponent only (0.19)
  4. Separate city size β and distance α exponents (β = 0.18 and α = -0.03)
27
“Best Fit” was not useful
Measured by either % or value errors
Models 3 & 4 fit best
  Fit achieved by low variance
    No forecasts at large values
    No forecasts at small values
    Most forecasts near 40
  This is a pretty worthless “forecast”
Model 2 had much worse % misses than Model 1
Traditional Gravity form had least harmful answers
28
Traditional Gravity was “Best”
But not “Good”
This is traditional gravity: Pax = K * (Origin Size * Destin Size) / Distance
Has been rescaled (forecast x 1.5) so averages are about right for Pax divisions (next slide)
29
Median Forecasts are Weakly Correlated with Actuals