
Predictive Linear Risk Terrain Model


1 Predictive Linear Risk Terrain Model
For Impaired Crashes. In partnership with MHSO, whose grant supports our impaired-driving analysis and traffic safety improvements. Alicia Shipley - GIS Analyst, Washington College GIS Program, Chestertown, Maryland

2 Goal of linear risk terrain model
Predict impaired driving crashes along one-mile road segments. Assist in Maryland's mission Towards Zero Deaths. Ultimately, we are trying to save lives!
The goal of the linear risk terrain model is to predict the one-mile road segments where impaired crashes are occurring within a specific county. Maryland's traffic safety community has a unified mission to move the state towards zero deaths; how can we assist in that goal? Creating analysis reports on previous trend data can only help so much. Noah's Law, the SHSPs, and the SPIDRE team are all efforts helping us move towards zero deaths. It is now necessary to take the next step: predicting where crashes are going to occur based on that historical data, so that officers and traffic safety professionals can evaluate those areas and make the most informed decisions on how to prevent the crashes. My job is to tell them, with the highest accuracy rate possible, where crashes will occur next year. With the technology and tools within ArcGIS, we are now at the point where a predictive analysis is possible.

3 One-Mile Road Segments vs. Hotspots
Why Linear? One-Mile Road Segments vs. Hotspots. We chose to do a linear analysis because it is more precise. Hotspots can only show general areas, whereas a linear analysis can pinpoint exact sections of roadway. Since roadways are where the crashes occur, it is more accurate to run the analysis on the roadway itself rather than on general areas. Hotspots also tend to gravitate toward highly populated areas.

4 Prince George’s County, Maryland
Data Used: Prince George's County, Maryland
Data used for creating the model: Impaired Crashes 2012 to 2014; Impaired Electronic Citations (E-TIX) 2012 to 2014; Liquor License Locations; Impaired E-TIX Locations to Home Routes 2012 to 2014.
Data used for testing the model: Impaired Crashes 2015.
For the model I used impaired crashes, a dataset retrieved from Maryland's State Highway Administration; E-TIX, which are electronic citations retrieved from Maryland's District and Judiciary Courts; liquor license establishments in Prince George's County, which include every location with an active liquor license; and finally a dataset we developed of routes from impaired E-TIX stop locations to the offender's home address, predicting the quickest route the offender would take to get home after drinking. I used trend data from 2012 to 2014 to create the model, and impaired crashes from 2015 to test how accurate its prediction was.

5 Why I Chose This Data
Available data: Crashes; E-TIX; Liquor License Establishment Locations; E-TIX Stops to Home Routes.
The purpose of using this data: use RAVEN clips (briefly explain what RAVEN is) to show how we saw trends from the impaired layers and the liquor establishments. Experience with the data; reference the old poster. Is there a correlation between trend data and predictive data? Are crashes and E-TIX stops occurring in the same areas? Do liquor establishment locations correlate with where impaired crashes and/or E-TIX stops are occurring?

6 Many factors go into testing the model
Many factors go into testing the model. We have our target: we want the model to predict the 665 segments in the 2015 crash data. Can we replicate the exact segments on which the 2015 crashes occurred? That is what we were hoping to find out in the process of testing the model. I quickly ran into a challenge when testing this model: over-prediction. How many additional routes are being predicted that do not line up with the 2015 segments? We can't give law enforcement over a thousand routes to patrol; we want to narrow it down to as few routes as possible, because otherwise it would be a drain on resources. Once I understood the challenges of over- and under-predicting, the model started to make a lot more sense to me. I was then able to look at the numbers with an understanding of what they truly represented.

7 First Round of Testing the Model
2012 to 2014 crash segments: 638 out of 665 = 96%, over-predicting by 220%
Liquor license segments: 406 out of 665 = 61%
E-TIX segments: 602 out of 665 = 90%, over-predicting by 246%
Impaired E-TIX Stops to Home Routes: 663 out of 665 = 99%, over-predicting by 1,051%
I used the original, unweighted datasets and compared each to the 2015 testing dataset in order to determine which datasets were the most accurate. The first round of testing proved to be a lot more accurate than I thought it would be; using tools within ArcGIS, I reached 96% accuracy when comparing the 2012 to 2014 crashes to the 2015 crashes. The first round of testing used the raw datasets to see which one had the highest accuracy, so that in the second round of testing I could weight them properly. After seeing the percentage of roadways being predicted by each dataset overall, I took it a step deeper to see how accurate the predictions were in each category: low, medium, and high. Did the 2012 to 2014 crashes mostly predict road segments with a low priority, or might I get lucky and have them predict a high percentage of the higher-priority crash road segments? I also need to mention that the accuracy is high because most of these datasets produce a large number of segments, which is over-predicting.
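For reference, here is a minimal sketch of how accuracy and over-prediction figures like these could be computed from segment IDs. The function name, the data shapes, and in particular the exact definition of over-prediction (extra predicted segments relative to the 665 test segments) are assumptions for illustration; the actual figures came from tools within ArcGIS.

    # Minimal sketch: accuracy and over-prediction for one input layer.
    # Segment IDs, the helper name, and the definition of over-prediction
    # are assumptions for illustration, not the project's actual workflow.

    def evaluate_layer(predicted_segments, actual_2015_segments):
        """Return (accuracy %, over-prediction %) for one input layer."""
        predicted = set(predicted_segments)
        actual = set(actual_2015_segments)

        hits = predicted & actual                    # 2015 segments the layer found
        accuracy = 100.0 * len(hits) / len(actual)   # e.g. 638 of 665 -> ~96%

        extras = predicted - actual                  # predicted segments with no 2015 crash
        over_prediction = 100.0 * len(extras) / len(actual)

        return accuracy, over_prediction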

8 Predicting Priorities
Low: 90%, Medium: 81%, High: 43%
Low: 51%, Medium: 34%, High: 14%
Low: 34%, Medium: 24%, High: 7%
Low: 65%, Medium: 51%, High: 50%
(One set of percentages per input dataset.)
I then decided to break the segments down into low, medium, and high to see how the model was predicting priorities. Ultimately, we want the model to predict all of the high segments. I took each dataset and compared it separately, by priority, to the 2015 crash test data. After using a tested method within ArcGIS, I was able to gather the percentages for each category. The percentages were as I suspected: the low category had the highest percentage, since there are more roadways with a low number of crashes. I was disappointed to see that the high segments, the ones I am really trying to predict, came in at 50% or under. Ultimately the goal is to predict the high segments, the roadways where the majority of crashes are occurring; we aren't worried about predicting roadways where one crash happened in a three-year period. Liquor licenses proved not to be a highly accurate dataset, which suggests there is not a strong correlation between the locations of liquor establishments and impaired crashes. Seeing the percentages for each dataset propelled me into the next step of this analysis.
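As a rough sketch of how a per-priority breakdown like this could be computed, assuming each 2015 test segment carries a low, medium, or high label derived from its crash count; the function name and data shapes are illustrative assumptions, not the project's actual workflow.

    # Sketch: hit rate by priority class (low / medium / high).
    # Priority labels, names, and data shapes are illustrative assumptions.

    from collections import defaultdict

    def accuracy_by_priority(predicted_segments, actual_priority):
        """actual_priority: dict mapping 2015 segment ID -> 'low' | 'medium' | 'high'."""
        predicted = set(predicted_segments)
        totals = defaultdict(int)
        hits = defaultdict(int)

        for seg_id, priority in actual_priority.items():
            totals[priority] += 1
            if seg_id in predicted:
                hits[priority] += 1

        return {p: 100.0 * hits[p] / totals[p] for p in totals}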

9 Over Predicting
How many additional routes are being predicted that do not line up with our test routes? In the weighted analysis, only the high-priority segments are compared against all categories in the 2015 data. We don't want to over-predict or under-predict, and we don't want to give officers over 1,000 routes to patrol; we want to give them the most important areas. I was able to get fairly high percentages, making it appear as if I am predicting 100% of the routes, but the reality is that the outputs create so many segments that they are bound to line up with 2015 segments. The model is over-predicting by thousands of segments, and it is not practical for officers to patrol each of those roadways. We want to predict the fewest segments that line up best with the 2015 crashes. So, with over-prediction in mind, I had to narrow down the analysis to cut the number of segments the model produces.

10 Which of the weighted analyses would prove to be more accurate?
Weighted Analysis. Next, I wanted to take the analysis a little deeper and weight each dataset according to how accurate it was. I first tried weighting the datasets in this order: Crashes, E-TIX, E-TIX to Home Routes, Liquor License, figuring that crashes are the most important risk factor and liquor license locations the least important. This predicted 665 out of 665 segments, 100%, but that is not an accurate representation because it over-predicts by 1,113%. So I decided to use only the high-count segments in testing, to cut down on the number of segments and prevent such extreme over-prediction. Once I cut out the low segments, the first weighted analysis predicted 73% of the routes, with an over-prediction of only 23%. I was fairly satisfied with those results, but decided to try weighting the datasets a few different ways to see if I could get the percentage any higher. The next weighted sequence I tried was Crashes, E-TIX, Liquor License, and E-TIX to Home Routes. This weighting proved less successful than the first: it predicted only 66% of the segments, with only 8% over-prediction. Although the over-prediction percentage was low, the accuracy was not high enough. This showed two things: my first assumption that liquor license locations were least important was correct, and the E-TIX to Home Routes were more important than I thought. So, naturally, my next weighted sequence was Crashes, E-TIX to Home Routes, E-TIX, and Liquor License. This weighting predicted 84% of the road segments, higher than my first weighted analysis, but the catch was that it over-predicted by 58%, which is still a comparatively low over-prediction. The question then arose: which of the weighted analyses is more accurate, the one predicting 73% with 23% over-prediction, or the one predicting 84% with 58% over-prediction?
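To make the weighting step concrete, here is a small sketch of one way a weighted overlay like this could be scored. The weight values, the score cutoff, and all of the names are placeholder assumptions; the actual analysis used weighting tools within ArcGIS.

    # Sketch of a weighted overlay across the four input layers.
    # Weights, the cutoff, and all names are placeholder assumptions;
    # the actual weighting was done with tools inside ArcGIS.

    def score_segments(layers, weights):
        """layers: dict layer_name -> set of segment IDs flagged by that layer.
        weights: dict layer_name -> numeric weight (higher = more important)."""
        scores = {}
        for name, segment_ids in layers.items():
            for seg_id in segment_ids:
                scores[seg_id] = scores.get(seg_id, 0) + weights[name]
        return scores

    def high_count_segments(scores, min_score):
        """Keep only segments at or above a cutoff, to limit over-prediction."""
        return {seg_id for seg_id, s in scores.items() if s >= min_score}

    # First weighted ordering tried: crashes highest, liquor licenses lowest
    # (placeholder numbers, not the weights used in the actual model).
    example_weights = {"crashes": 4, "etix": 3, "etix_home_routes": 2, "liquor_license": 1}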

11 Results
[Maps: What We Predicted vs. What Actually Happened]
Which model should we choose out of the three? How will other inputs affect the model and make it more accurate? Overall, we don't yet have a final answer; in the initial testing I did, none of the results predicted quite what we were hoping for. There is a lot more to test in order to refine the model so that it ultimately produces the highest accuracy with no over-prediction. For example, one result gave 66% accuracy with 8% over-prediction; we can get that accuracy up. Our goal is to improve accuracy while reducing over-prediction, ideally down to 0%. Drilling into one result: visually it looks as if the prediction does not line up with what actually happened, but statistically it shows otherwise.

12 Future steps
What other factors can we add into our model, and what factors do we need to consider further as we develop it? What additional risk factors and other data could help the analysis? How can we make this predictive analysis more accurate? We are trying to combine all of these conditions:
How geography affects areas (Maryland has a lot of different terrain: mountains, cities, rural)
Challenges in different counties
Comparing rural vs. urban
Seasonal effects (beach areas)
Liquor license locations will differ between rural and urban areas (rural liquor licenses are clustered together, unlike in densely populated areas)
Data anomalies
Events
Liquor licenses: on- and off-site, closing times, violations
E-TIX: previous offenders
Crashes: home addresses (PII), fatalities

13 Future steps (continued)
Roadways: interstates vs. state vs. county routes
Month, time, day
Weather
Construction can force different routes
Colleges
Census data: income, education
Crime (DDACTS)
Other types of driving: aggressive, distracted

