Clustering NTSB Accidents Data. Lishuai Li, Rafael Palacios, R. John Hansman. JUP Quarterly Meeting, Jan. 2010.


1 Clustering NTSB Accidents Data
Lishuai Li, Rafael Palacios, R. John Hansman
JUP Quarterly Meeting, Jan. 2010

2 Introduction
• Aviation safety has improved significantly over the past 50 years.
• For current systems, it is difficult to improve safety further by fixing only the problems that surfaced in individual accidents.
• Each accident is often induced by several anomalies. Identifying patterns, correlations, and trends in large amounts of aviation accident data can help us understand problems and prevent future incidents.
Source: Boeing, Statistical Summary of Commercial Jet Airplane Accidents, July 2009
Data Source: National Transportation Safety Board

3 Methodology
• Research Method:
  - Use data-mining techniques to identify patterns in accident data
  - Identify accidents with similar characteristics
  - Combine findings with accident narratives to find causalities
• Data: subset of the NTSB accident database system (ADMS2000)
  - Event Type: accidents only, excluding incidents
  - FAR Part: Part 91 (General Aviation); Part 121 (Air Carriers)
  - Aircraft Type: airplanes only
  - Year: 2000 to 2005
  - Other databases will be considered in future work
• Data-mining tools:
  - Clustering (e.g. k-means): use a distance function to search for a partitioning of records such that the intra-cluster distance is minimal and the inter-cluster distance is maximal
  - Other data-mining techniques will be considered and used in future study
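As an illustration that is not taken from the slides, the "minimal intra-cluster distance" criterion is usually written as the within-cluster sum of squared Euclidean distances. The symbols below ($x_i$ for an accident record, $C_j$ for cluster $j$, $\mu_j$ for its centroid) are notation assumed here, and the slides do not say which distance function was actually used.

```latex
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

For a fixed data set the total sum of squares is constant, so minimizing this within-cluster term is equivalent to maximizing the between-cluster term. This is why one objective covers both halves of the criterion.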

4 Clustering Method
• K-means clustering is a partitioning method.
• Data are partitioned into k mutually exclusive clusters.
• K-means clustering finds a partition in which objects within each cluster are as close to each other as possible, and as far from objects in other clusters as possible.
Each data point represents an accident. The attributes of that accident determine where the data point lies. K-means clustering can be used to find accidents with similar attributes.
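The partitioning idea on this slide can be sketched with the standard Lloyd iteration for k-means. This is a minimal illustration, not the authors' implementation: the function names, the toy 2-D points, and the simplistic choice of the first k points as starting centroids are all assumptions made here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    # Simplistic deterministic start (an assumption of this sketch):
    # practical code would randomize or use k-means++ seeding.
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the partition is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical data: each tuple stands for one accident described by two attributes.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centroids, so practical runs are usually repeated from several starts and the partition with the smallest within-cluster distance is kept.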

5 Preliminary Results of Clustering NTSB Accidents Data
• For this preliminary study, we want to test whether k-means clustering can identify accidents that are similar in a specified set of attributes.
• Apply the k-means clustering method to the subset of NTSB data (Part 91 & Part 121 accidents from 2000 to 2005).
• Accident attributes used in clustering:
  - Flight Plan Type, Injury Level, Visibility, Phase of Flight
  - Location, Day of The Year
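Before a distance function can compare accidents, categorical attributes such as Flight Plan Type and Injury Level have to become numbers. The slides do not state how this was done, so the following is only a sketch under assumed conventions: the category codes, the phase list and its order, the field names, and the `visibility_scale` parameter are all hypothetical.

```python
# Hypothetical category codes; the slides do not state the encoding actually used.
FLIGHT_PLAN = {"VFR": 0.0, "IFR": 1.0}
INJURY = {"Non-Fatal": 0.0, "Fatal": 1.0}
# Illustrative phase list and order, not the NTSB coding scheme.
PHASES = ["Takeoff", "Climb", "Cruise", "Descent", "Approach", "Landing"]


def encode(record, visibility_scale=10.0):
    """Turn one accident record (a dict) into a 4-D numeric point.

    Visibility is divided by an assumed scale and is not capped, so values
    above the scale map above 1.0.
    """
    return (
        FLIGHT_PLAN[record["flight_plan"]],
        INJURY[record["injury"]],
        record["visibility"] / visibility_scale,
        PHASES.index(record["phase"]) / (len(PHASES) - 1),
    )


# Two made-up records, for illustration only.
records = [
    {"flight_plan": "VFR", "injury": "Non-Fatal", "visibility": 10.0, "phase": "Landing"},
    {"flight_plan": "IFR", "injury": "Fatal", "visibility": 0.5, "phase": "Approach"},
]
encoded = [encode(r) for r in records]
print(encoded)  # [(0.0, 0.0, 1.0, 1.0), (1.0, 1.0, 0.05, 0.8)]
```

Putting every axis on a comparable scale keeps one attribute from dominating the Euclidean distance. Coding phase of flight as evenly spaced numbers also imposes an ordering on it, which is a modelling choice rather than a fact about the data.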

6 Phase of Flight & Visibility Characteristics for Part 91 Accidents (2000-2005)
• General characteristics of accidents with respect to individual variables are commonly known:
  - Accidents are more likely to happen in very low visibility conditions
  - High rate of accidents during takeoffs and landings
Note: All events with visibility >10 are put into the same group as the ones with visibility =0

7 Phase of Flight & Visibility Characteristics by Flight Plan Type
[Figure panels: Phase of Flight Distribution of Part 91 Accidents (2000-2005), VFR vs. IFR; Visibility Distribution of Part 91 Accidents (2000-2005), VFR vs. IFR]

8 Phase of Flight & Visibility Characteristics by Injury Level
[Figure panels: Phase of Flight Distribution of Part 91 Accidents (2000-2005), Non-Fatal vs. Fatal; Visibility Distribution of Part 91 Accidents (2000-2005), Non-Fatal vs. Fatal]

9 Clustering by Flight Plan Type, Injury Level, Flight Phase, and Visibility
• Combine the information in all 4 dimensions to cluster similar accidents.
• Accidents are clearly separated into 4 categories by Flight Plan Type and Visibility.
• IFR accidents and Fatal accidents are more evenly spread over Phase of Flight and Visibility.
• VFR/Non-Fatal accidents are concentrated in 3 regions: low visibility, high visibility in initial phases, and high visibility in landings.

10 Accident Characteristics by Cluster
[Figure panels: Fatal VFR/Other; Non-Fatal VFR/Other; Non-Fatal IFR; Fatal IFR. Axes in each panel: Phase of Flight, Visibility]

11 Locations and Day of The Year of Part 91 Accidents (2000-2005)
Total number of accidents included: 6819
[Figure panels: Location Distribution; Time Distribution]

12 Clustering Part 91 Accidents by Location & Day of The Year
• Accidents are automatically classified by location and time of the year.
• The two variables, location and day of the year, are not enough to create clusters with potential safety implications.

13 Locations and Day of The Year of Part 121 Accidents (2000-2005)
Total number of accidents included: 157
[Figure panels: Location Distribution; Time Distribution]

14 Clustering Part 121 Accidents by Location & Day of the Year
• Accidents sharing similar location and time information are clustered together (12 clusters).

15 Accidents in Cluster 2
• Cluster 2 includes 5 Caribbean accidents
  - Accidents on 4/22/2002, 2/25/2003, 4/6/2003, and 4/24/2003 were caused by turbulence
  - Accident on 2/8/2003 was caused by a passenger stair handrail collapsing
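The "day of the year" attribute used for this clustering can be derived directly from the accident dates. The sketch below assumes the M/D/YYYY format shown on the slide; `day_of_year` is a hypothetical helper name, and only the Python standard library is used.

```python
from datetime import datetime


def day_of_year(date_text):
    """Convert an M/D/YYYY date string to its 1-based day of the year."""
    return datetime.strptime(date_text, "%m/%d/%Y").timetuple().tm_yday


# The five Cluster 2 accident dates listed on the slide.
cluster_2_dates = ["4/22/2002", "2/25/2003", "4/6/2003", "4/24/2003", "2/8/2003"]
days = [day_of_year(d) for d in cluster_2_dates]
print(days)  # [112, 56, 96, 114, 39]
```

All five dates fall between day 39 and day 114, so the accidents in this cluster share a time of year as well as a region.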

16 Summary & Future Work
• Data-mining methods can combine multi-dimensional information at the same time.
• Accidents can be partitioned by clustering methods on specified attributes.
• Future Work:
  - Develop a systematic approach to including important variables in the clustering method
  - Explore other data-mining techniques to review safety data in a new way
  - Investigate other possible safety data sources, e.g. accidents, ATC operation errors
  - Identify patterns in accidents, or in various anomalies, that can reveal subtle causalities hidden in the large amount of data

17 Thank You! Questions?

18 Backup Slides

