MINING HISTORICAL DELAY DATA IN RAILWAYS


MINING HISTORICAL DELAY DATA IN RAILWAYS
Fabrizio Cerreto, PhD student, Transport modelling, DTU Management Engineering


About the PhD Research Project
- MSc in Transport Engineering: Sapienza University of Rome + TU Delft
- Operation/punctuality analyst at NTV SpA
- PhD student at DTU in IPTOP: Understanding delays in railways
  - Analytical model (delay propagation)
  - Empirical data analysis
    - Realized running times → real timetable supplements
    - Earliness of trains
    - Correlation heatmaps
    - Principal components
    - Delay profile clustering
  - Micro-simulation
Mining historical delay data in railways, 9 November 2018

Background – research motivation
Timetable allowance
- Running time supplements
- Headway buffers
Practical design
- Good practices for magnitude (e.g. capacity consumption, UIC 406)
- Rules of thumb for distribution (national rules: uniform, concentrated)
Understanding delays
- Causes
- Recurrent patterns
Robust design
- Timetable supplements
- Headway buffers
- Primary delay prevention

Data
- Train timestamps at every station
- Train characteristics
- Station and operation
- Schedule
- Delay
- Information from dispatchers
- Combined data from DSB: rolling stock plan and operation
- ~150M records, 2010-2016
Record fields: Scheduled time | Train ID | Station ID | Delay | Record Type | Input Source | Product | Operator | Cause Group Code | Cause Type Code | Delay Report Code | Delay Report Cause
Sample record: 06NOV13 09:12:00 | 2720 | KA | -5.17 | I | DWH | RV | DSB | PASSAGER | 611 | NULL
Further sample rows (partial): 09:12:30, -6, U; 09:16:00, NA, -3.67

Vestbane: Copenhagen - Roskilde
- Time frame: Q3 2014
- Line: København H – Roskilde, ~30 km
- Traffic: semi-periodic timetable; express trains from/to Copenhagen
- Reasons: heterogeneous; most important section; freight + regional + national + international; high interest from authorities

Previous results: Copenhagen – Roskilde
Realized running times and actual running time supplements
[Figure: two panels, each marked "2nd percentile"]
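One possible reading of "actual running time supplement" is the gap between the scheduled running time and a low percentile of the realized running times, with that percentile standing in for the minimum running time. The sketch below assumes that reading. The nearest-rank percentile rule, the function names, and the sample values are illustrative assumptions; only the use of the 2nd percentile comes from the slide.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def actual_supplement(scheduled_min, realized_min, pct=2):
    """Scheduled running time minus a low percentile of realized running times (assumed definition)."""
    return scheduled_min - percentile_nearest_rank(realized_min, pct)


# Hypothetical realized running times on one section, in minutes.
realized = [3.1, 3.4, 2.9, 3.0, 3.6, 3.2, 2.8, 3.3, 3.5, 3.0]
supplement = actual_supplement(3.5, realized)
print(f"actual supplement: {supplement:.2f} min")
```

A low percentile rather than the strict minimum makes the estimate less sensitive to a few erroneous timestamps.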

Previous results: Copenhagen – Roskilde
Frequent delay patterns
[Figure: three numbered delay patterns (1, 2, 3); station labels: Copenhagen, Valby, Høje Tåstrup, Roskilde; annotations: loses time, gains time, early at Roskilde, late at Copenhagen]

Previous results: Copenhagen – Roskilde
Realized running times: bias in timestamping
[Figure: platform diagram with labels: detection points, track circuits, timetable points, departure bias, arrival bias]

Kystbane: Copenhagen - Helsingør
- Time frame: timetable year 2014 (15/12/2013 – 14/12/2014)
- Line: København H – Helsingør, ~50 km
- Reasons: cyclic timetable; well isolated; high interest from authorities

Northbound timetable: Copenhagen - Helsingør
- 3 stopping patterns: rush hour reinforcement; skip-stop from Copenhagen; stop-train from Sweden
- 6 to 9 trains/h
- Standardized rolling stock
- Analyze separately: period of day; changes in operation

Data transpose / column split
- New variables: delay change, scheduled running time, realized running time, more…
- Observations/rows: train-date
- Fields/columns: station records, plus 20 to 25 more variables
Table columns: Date | Train ID | Data | KH U | KH I | KN U | KN I | KK U | KK I | …
Example row group (22-apr-14, train 1314), values in extracted order:
  Delay:         0.4, 0.22, -0.17, -0.93
  Delay_change:  0.18, 0.39, 0.76, 0.47
  Sch_run_time:  3, 3.5, 1, 2.5, 0.5
  Real_run_time: 3.18, 3.89, 1.76, 1.57, 2.97
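The transpose step can be sketched in plain Python: long-format records become one row per train-date with one column per station record, and the derived variables are computed across consecutive columns. Everything below is an illustrative assumption rather than the author's code: the record layout, the function names, the sample delays, and the sign convention (delay change = later delay minus earlier delay, so realized time = scheduled time + delay change).

```python
from collections import defaultdict


def to_wide(records):
    """Pivot long records into one row per (date, train) with one column per station record."""
    wide = defaultdict(dict)
    for date, train_id, station, record_type, delay in records:
        wide[(date, train_id)][f"{station} {record_type}"] = delay
    return dict(wide)


def delay_changes(row, columns):
    """Delay change between consecutive station records (later minus earlier, assumed convention)."""
    return [round(row[b] - row[a], 2) for a, b in zip(columns, columns[1:])]


def realized_times(scheduled, changes):
    """Realized segment time = scheduled segment time + delay change on that segment."""
    return [round(s + c, 2) for s, c in zip(scheduled, changes)]


# Hypothetical records: (date, train ID, station, record type, delay in minutes).
records = [
    ("22-apr-14", 1314, "KH", "U", 0.40),
    ("22-apr-14", 1314, "KN", "I", 0.90),
    ("22-apr-14", 1314, "KN", "U", 1.00),
    ("22-apr-14", 1314, "KK", "I", 0.75),
]
columns = ["KH U", "KN I", "KN U", "KK I"]  # station records in running order
scheduled = [3.0, 0.5, 2.5]                 # hypothetical scheduled run/dwell times

wide = to_wide(records)
row = wide[("22-apr-14", 1314)]
changes = delay_changes(row, columns)
realized = realized_times(scheduled, changes)
print(changes, realized)
```

Keeping one row per train-date is what lets the later steps (correlations, PCA, clustering) treat each station record as a variable.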

First glance: scatterplots and distributions of delays
[Figure: scatterplot matrix of delays; panel label "SYM"]
- Highly correlated
- Highly non-normal

First glance: scatterplots and distributions of delay changes
[Figure: scatterplot matrix of delay changes; panel label "SYM"]
- Non-correlated
- Highly non-normal

Delay and delay change profiles
Train 1309, 22/4/2014

Pooled data

Issues with non-normality
- Tests for changes in operation to Helsingør
  - Nørreport closed for renovation: trains skipped the stop; hidden timetable supplement
  - 22/4/2014: Nørreport opens again to main line trains
- Test: before vs. after Nørreport re-opening
  - Parametric multivariate tests require normality
  - Univariate t-test at stations
- Result: significantly different operation. Dataset shrunk to Roskilde/Odense to Sweden
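A station-by-station comparison of delays before and after the re-opening could look like the sketch below. The slides do not say which t-test variant was used; Welch's unequal-variance form is an assumption here, as are the function name and the sample delays. The code returns only the statistic and the degrees of freedom, since a p-value needs a t-distribution that the standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(before, after):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    n1, n2 = len(before), len(after)
    v1, v2 = variance(before), variance(after)  # sample variances (n - 1 denominator)
    se2 = v1 / n1 + v2 / n2                     # squared standard error of the mean difference
    t = (mean(before) - mean(after)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical delays (minutes) at one station, before and after 22/4/2014.
before = [1.0, 2.0, 3.0, 4.0, 5.0]
after = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(before, after)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Running one test per station raises the usual multiple-comparison concern, so a correction across stations may be worth considering.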

Correlation heatmaps
- Significantly different patterns by direction and by stopping pattern
- Smooth fades vs. sharp changes
[Figure: heatmaps for northbound and southbound trains, ØK and ØP]
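The matrix behind such a heatmap holds the pairwise correlation between delays at different station records. A minimal sketch follows, assuming Pearson correlation (the slides do not name the coefficient) and hypothetical column names and values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations; `columns` maps a column name to its values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical delays (minutes) of four trains at three station records.
delays = {
    "KH U": [0.0, 1.0, 2.0, 3.0],
    "KN I": [0.0, 2.0, 4.0, 6.0],
    "KK I": [3.0, 2.0, 1.0, 0.0],
}
corr = correlation_matrix(delays)
for name, row in corr.items():
    print(name, [round(v, 2) for v in row.values()])
```

Ordering the columns along the line is what makes "smooth fades" and "sharp changes" visible as gradients or edges in the heatmap.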

Principal components analysis
- Captures intrinsic variability in the data
- Resampling: noise reduction
- Dimension reduction: data handling

Principal components analysis: example
- 95% of variability explained with only 2 PCs
- Drawback: strongly affected by non-normality

Eigenvalues of the correlation matrix
Principal component   Eigenvalue   Difference   Proportion   Cumulative
1                     17.12432     16.14742     90.13%       90.1%
2                     0.976895     0.613815     5.14%        95.3%
3                     0.36308      0.148796     1.91%        97.2%
4                     0.214285     0.099198     1.13%        98.3%
5                     0.115087     0.053289     0.61%        98.9%
6                     0.061797     0.020083     0.33%        99.2%
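The Proportion and Cumulative columns follow from the eigenvalues alone: the eigenvalues of a correlation matrix sum to the number of variables, so each proportion is the eigenvalue divided by that count. The sketch below reproduces the figures under the assumption that there are 19 variables. That number is not stated on the slide; it is inferred from 17.12432 / 0.9013 ≈ 19.

```python
# Eigenvalues as listed on the slide (first six components only).
eigenvalues = [17.12432, 0.976895, 0.36308, 0.214285, 0.115087, 0.061797]
n_variables = 19  # assumption: inferred from the first proportion, not stated in the source


def explained_variance(values, total):
    """Return (proportion, cumulative proportion) for each eigenvalue."""
    result = []
    running = 0.0
    for value in values:
        share = value / total
        running += share
        result.append((share, running))
    return result


table = explained_variance(eigenvalues, n_variables)
for i, (share, cumulative) in enumerate(table, start=1):
    print(f"PC{i}: {share:.2%} (cumulative {cumulative:.1%})")
```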

Clustering: K-means
- Simple
- Fast
- Converges almost always
- k must be chosen (metrics)
- Clusters not fixed, no reference
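For readers unfamiliar with the method, here is a minimal sketch of k-means (Lloyd's algorithm) applied to delay profiles, where each profile is the vector of one train's delays along the line. The slides do not describe the implementation actually used. The squared Euclidean distance, the naive initialization from the first k profiles, and the sample data are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm with a naive initialization (the first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical delay profiles (minutes at three stations) for five trains.
profiles = [
    [0.0, 0.1, 0.2],
    [5.0, 5.5, 6.0],
    [0.2, 0.0, 0.1],
    [5.2, 5.4, 6.1],
    [0.1, 0.2, 0.0],
]
labels, centroids = k_means(profiles, k=2)
print(labels)
```

The result depends on the initialization and on k, which is the "clusters not fixed, no reference" caveat on the slide: cluster numbering has no meaning across runs.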

Clustering on delay: northbound trains
- Layered
- Fuzzy

Clustering on delay change: northbound trains


Clustering on delay: southbound trains

Clustering on delay change: southbound trains
- Not clustered
- Fuzzy

Conclusions
- Real running time supplement vs. scheduled
  - Calibrate with measured offset
- Non-normality is an issue
  - Multivariate statistical tests
  - PCA
- Clustering depends on direction (delay vs. delay change)
  - Towards bottlenecks: delay changes are distributed; clustering on delay
  - From bottlenecks: delay changes are concentrated at the bottleneck; clustering on delay change
- Correlation heatmap explains clustering on delay changes

Data mining: next steps
- Understanding factors that influence clustering
  - Causes of delays: identify primary delays from historical data
  - Regression/classification into clusters
    - Period of the day
    - Period of the year
    - Weekday
    - Composition
    - Composition changes
- Dynamics in delay propagation
  - Observations: days of operation
  - Define variables
  - Include changes in the plan
  - Cluster days to forecast operation (short term)

Thanks for your attention
Fabrizio Cerreto, PhD student, Transport modelling, DTU Management Engineering