1
MINING HISTORICAL DELAY DATA IN RAILWAYS
Fabrizio Cerreto, PhD student, Transport modelling, DTU Management Engineering
2
About the PhD Research Project
MSc in Transport Engineering: Sapienza University of Rome + TU Delft.
Operation/punctuality analyst: NTV SpA.
PhD student at DTU in IPTOP: understanding delays in railways.
Approaches: analytical model (delay propagation), empirical data analysis, micro-simulation.
Mining historical delay data in railways, 9 November 2018
3
About the PhD Research Project
MSc in Transport Engineering: Sapienza University of Rome + TU Delft.
Operation/punctuality analyst: NTV SpA.
PhD student at DTU in IPTOP: understanding delays in railways.
Analytical model (delay propagation).
Empirical data analysis: realized running times and real timetable supplements, earliness of trains, correlation heatmaps, principal components, delay profile clustering.
Micro-simulation.
4
Background – research motivation
Timetable allowance: running time supplements, headway buffers.
Practical design: good practices for magnitude (e.g. capacity consumption, UIC 406); rules of thumb for distribution (national rules: uniform, concentrated).
Understanding delays: causes, recurrent patterns.
Robust design: timetable supplements, headway buffers, primary delay prevention.
5
Data
Train timestamps at every station: train characteristics, station and operation, schedule, delay.
Information from dispatchers.
Combined data from DSB: rolling stock plan and operation.
~150M records.
Record fields: Scheduled time, Train ID, Station ID, Delay, Record Type, Input Source, Product, Operator, Cause Group Code, Cause Type Code, Delay Report Code, Delay Report Cause.
Sample record: 06NOV13 09:12:00, 2720, KA, -5.17, I, DWH, RV, DSB, PASSAGER, 611, NULL.
Further sample values: 09:12:30, -6, U; 09:16:00, NA, -3.67.
6
Vestbane: Copenhagen - Roskilde
Time frame: Q3 2014.
Line: København H – Roskilde, ~30 km.
Traffic: semi-periodic timetable; express trains from/to Copenhagen; heterogeneous (freight + regional + national + international).
Reasons: most important section; high interest from authorities.
7
Previous results: Copenhagen – Roskilde
Realized running times; actual running time supplements.
[Figures: realized running times and actual running time supplements, each annotated "2nd percentile"]
8
Previous results: Copenhagen – Roskilde
Frequent delay patterns.
[Figure: delay patterns 1, 2 and 3 between Roskilde and Copenhagen, with Høje Tåstrup and Valby marked; annotations: loses time, gains time, early at Roskilde, late at Copenhagen]
9
Previous results: Copenhagen – Roskilde
Realized running times: bias in timestamping.
[Figure labels: platform, timetable points, detection points, track circuits, departure bias, arrival bias]
10
Kystbane: Copenhagen - Helsingør
Time frame: timetable year /12/2013 – 14/12/2014.
Line: København H – Helsingør, ~50 km.
Reasons: cyclic timetable; well isolated; high interest from authorities.
11
Northbound Timetable: Copenhagen - Helsingør
3 stopping patterns; 6 to 9 trains/h; standardized rolling stock.
Analyze separately: period of day, changes in operation.
Rush hour reinforcement; skip-stop from Copenhagen; stop-train from Sweden.
12
Data transpose / column split
New variables: delay change, scheduled running time, realized running time, and more.
Observations/rows: train-date.
Fields/columns: station records, plus 20 to 25 more variables.
Example columns: Date, Train ID, Data, KH U, KH I, KN U, KN I, KK U, KK I, …
Example values for train 1314 on 22-apr-14:
Delay: 0.4, 0.22, -0.17, -0.93
Delay_change: 0.18, 0.39, 0.76, 0.47
Sch_run_time: 3, 3.5, 1, 2.5, 0.5
Real_run_time: 3.18, 3.89, 1.76, 1.57, 2.97
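The transpose step can be sketched as follows. This is a minimal illustration in Python, not the tooling used in the project: the tuple layout, the function names `to_wide`, `delay_changes` and `realized_running_times`, and the sample delays are all assumptions. It relies on the relation visible in the example values, where realized running time equals scheduled running time plus delay change (3 + 0.18 = 3.18).

```python
from collections import defaultdict

# Hypothetical long-format records: (date, train_id, station, record_type, delay_min).
# The values are illustrative only, not taken from the DSB data.
RECORDS = [
    ("2014-04-22", 1314, "KH", "U", 0.40),
    ("2014-04-22", 1314, "KN", "I", 0.58),
    ("2014-04-22", 1314, "KN", "U", 0.97),
    ("2014-04-23", 1314, "KH", "U", -0.20),
    ("2014-04-23", 1314, "KN", "I", 0.10),
    ("2014-04-23", 1314, "KN", "U", 0.10),
]


def to_wide(records):
    """Pivot to one row per (date, train) and one column per station record."""
    wide = defaultdict(dict)
    for date, train, station, record_type, delay in records:
        wide[(date, train)][f"{station} {record_type}"] = delay
    return dict(wide)


def delay_changes(delays):
    """Difference in delay between consecutive station records."""
    return [round(later - earlier, 2) for earlier, later in zip(delays, delays[1:])]


def realized_running_times(scheduled, changes):
    """Realized running time = scheduled running time + delay change."""
    return [round(s + c, 2) for s, c in zip(scheduled, changes)]


wide = to_wide(RECORDS)
row = wide[("2014-04-22", 1314)]
changes = delay_changes(list(row.values()))
print(changes)
print(realized_running_times([3, 3.5], changes))
```

Each train-date becomes one observation, so the station records of a run can be compared column by column across days.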
13
First glance: scatterplots and distributions of delays
Delays: highly correlated, highly non-normal.
14
First glance: scatterplots and distributions of delay changes
Delay changes: non-correlated, highly non-normal.
15
Delay and Delay change profiles
[Figure: delay and delay change profiles of train 1309 on 22/4/2014]
16
Pooled data
17
Issues with non-normality: tests for changes in operation
Nørreport closed for renovation: trains skipped the stop, leaving a hidden timetable supplement.
22/4/2014: Nørreport opens again to main line trains.
Test: before vs. after the Nørreport re-opening. Parametric multivariate tests require normality, so univariate t-tests were run at stations.
Result: significantly different operation. Dataset shrunk.
[Map labels: to Helsingør, to Roskilde/Odense, to Sweden]
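A station-level before/after comparison of this kind could look like the sketch below. The slides only say "univariate t-test"; the unequal-variance (Welch) form used here is an assumption, as are the function name `welch_t` and the sample delays. The sketch returns the test statistic and the approximate degrees of freedom, and leaves the p-value lookup to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(before, after):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    n1, n2 = len(before), len(after)
    # Squared standard errors of the two sample means.
    v1, v2 = variance(before) / n1, variance(after) / n2
    t = (mean(before) - mean(after)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical delays (minutes) at one station, before and after 22/4/2014.
before = [1.0, 2.0, 3.0, 4.0, 5.0]
after = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(before, after)
print(round(t, 3), round(df, 2))
```

Running the same function once per station gives one statistic per station record, which is the univariate fallback when a multivariate test is ruled out.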
18
Correlation heatmaps
Significantly different patterns by direction and by stopping pattern.
Smooth fades vs. sharp changes.
[Figure: correlation heatmaps, northbound and southbound, for ØK and ØP]
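The matrix behind such a heatmap can be sketched in plain Python. The helper names `pearson` and `correlation_matrix`, the choice of Pearson correlation, and the sample delays are assumptions for illustration; the slides do not state which correlation coefficient was used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long samples (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Square matrix of pairwise correlations, in the order of the dict keys."""
    names = list(columns)
    return [[pearson(columns[a], columns[b]) for b in names] for a in names]


# Hypothetical delays (minutes) per station record, one entry per train-date.
delays = {
    "KH U": [0.0, 1.0, 2.0, 3.0],
    "KN I": [1.0, 3.0, 5.0, 7.0],  # exactly 2 * "KH U" + 1
    "KK I": [3.0, 2.0, 1.0, 0.0],  # exactly 3 - "KH U"
}
corr = correlation_matrix(delays)
for name, row in zip(delays, corr):
    print(name, [round(value, 2) for value in row])
```

Plotting this matrix with station records ordered along the line is what produces the fading or sharply changing blocks described on the slide.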
19
Principal components analysis
Capture intrinsic variability in the data.
Resampling: noise reduction.
Dimension reduction: data handling.
20
Principal components analysis: example
95% of the variability is explained by only 2 principal components.
Drawback: strongly affected by non-normality.
Eigenvalues of the correlation matrix (principal component: proportion, cumulative):
1: 90.13%, 90.1%
2: 5.14%, 95.3%
3: 1.91%, 97.2%
4: 1.13%, 98.3%
5: 0.61%, 98.9%
6: 0.33%, 99.2%
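The "proportion" column is each eigenvalue of the correlation matrix divided by the sum of all eigenvalues. The sketch below shows that calculation on a hypothetical 2 x 2 correlation matrix. To stay self-contained it finds eigenvalues by power iteration with deflation, which is a stand-in chosen for this example and not the method behind the slide's table; the function names and the matrix are assumptions.

```python
import math
import random


def mat_vec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def top_eigenpair(a, iterations=500, seed=0):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in a]
    for _ in range(iterations):
        w = mat_vec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    eigenvalue = sum(vi * wi for vi, wi in zip(v, mat_vec(a, v)))
    return eigenvalue, v


def explained_variance(corr, n_components):
    """Proportion of total variance carried by the leading components."""
    size = len(corr)
    a = [row[:] for row in corr]
    total = sum(corr[i][i] for i in range(size))  # trace = sum of eigenvalues
    proportions = []
    for _ in range(n_components):
        lam, v = top_eigenpair(a)
        proportions.append(lam / total)
        # Deflation: remove the component just found before searching again.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(size)] for i in range(size)]
    return proportions


# Hypothetical correlation matrix of two strongly correlated delay variables.
CORR = [[1.0, 0.8], [0.8, 1.0]]
proportions = explained_variance(CORR, 2)
print([round(p, 3) for p in proportions])
```

Summing the proportions from the first component onward gives the cumulative column, which is how a cut such as "2 components for 95%" is read off.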
21
Clustering: K-means
Simple; fast; converges almost always.
Drawbacks: k must be chosen (metrics); clusters are not fixed, no reference.
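For readers unfamiliar with the procedure, K-means on delay profiles can be sketched as below. This is a generic Lloyd-style implementation, not the software used for the results in the following slides. The deterministic farthest-first seeding, the function names and the sample profiles are assumptions made so the example is reproducible.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def k_means(points, k, max_iterations=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical delay profiles (minutes at three station records per train-date).
PROFILES = [
    [0.0, 0.5, 1.0], [0.2, 0.4, 0.9], [0.1, 0.6, 1.1],
    [5.0, 6.0, 7.0], [5.5, 6.2, 6.8], [4.8, 5.9, 7.2],
]
labels, centroids = k_means(PROFILES, k=2)
print(labels)
print([[round(x, 2) for x in c] for c in centroids])
```

The value of k is an input here, which is the drawback noted on the slide: it has to be picked by comparing cluster-quality metrics across several candidate values.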
22
Clustering on Delay Northbound trains
Clusters: layered, fuzzy.
23
Clustering on Delay change Northbound trains
24
Clustering on Delay change Northbound trains
25
Clustering on Delay Southbound trains
26
Clustering on Delay change Southbound trains
Not clustered; fuzzy.
27
Conclusions
Real running time supplement vs. scheduled: calibrate with the measured offset.
Non-normality is an issue for multivariate statistical tests and PCA.
Clustering depends on direction (delay vs. delay change):
Towards bottlenecks: delay changes are distributed; clustering on delay.
From bottlenecks: delay changes are concentrated at the bottleneck; clustering on delay change.
The correlation heatmap explains clustering on delay changes.
28
Data mining: next steps
Understanding factors that influence clustering.
Causes of delays: identify primary delays from historical data.
Regression/classification into clusters: period of the day, period of the year, weekday, composition, composition changes.
Dynamics in delay propagation.
Observations: days of operation. Define variables; include changes in the plan.
Cluster days to forecast operation in the short term.
29
Thanks for your attention
Fabrizio Cerreto, PhD student, Transport modelling, DTU Management Engineering