Transportation Mode Recognition using Smartphone Sensor Data


1 Transportation Mode Recognition using Smartphone Sensor Data
Arash Jahangiri & Hesham Rakha
Presented by Hesham Rakha, Samuel Reynolds Professor of Engineering, CEE; Director of the Center for Sustainable Mobility, VTTI
11/10/2018

2 Outline
Introduction
Literature Review
Data Collection
Methodology
Results
Conclusions
VTTI | Center for Sustainable Mobility 11/10/2018

3 Introduction
Objective: Develop classifiers to identify transportation modes
Modes: Car, Bus, Bike, Run, Walk
Methods: Supervised machine learning techniques
Data: Obtained from smartphone sensors
Developed a custom data acquisition system

4 Introduction
Applications
Transportation Planning: the traditional approach relies on questionnaires, travel diaries, and telephone interviews
Environmental Applications: carbon footprint / health monitoring
Safety Applications: incorporating mode information into crash prediction models

5 Literature Review
Methods Applied
Artificial Intelligence (AI) tools: Fuzzy Expert Systems, Decision Trees, Bayesian Networks, Support Vector Machine (SVM), etc.
Statistical Methods
Supporting Techniques
GIS maps
Discrete Hidden Markov Models, Bootstrap aggregating

6 Literature Review
Columns: Study, Classes, Accelerometer, GPS, GIS, Different motorized?, Positioning, Window size (s), Accuracy (%)
[4] 6 no yes No restrictions 30 93.5
[8] 4 1 93.6
[7] 8 10.24 82.1
[14] In pocket 5 93.9
[15] 3 96.9
[18] 8/6 >20 61.8/78.8

7 Literature Review
Factors that can affect model performance:
Number of classes (3 to 8)
Sensor data: Accelerometer / GPS
GIS maps
Different motorized modes
Device positioning
Window size (seconds)
VTTI | Driving Transportation with Technology 11/10/2018

8 Unique Contributions
Considered both motorized and non-motorized modes.
Did not depend on device positioning. (In other studies, travelers had to attach the smartphone to their bodies.)
Did not use GPS information, due to GPS sensor limitations.
Used data from gyroscope, accelerometer, and rotation vector sensors.
Had travelers collect car and bus data on different road types with different speed limits.

9 Unique Contributions
Had travelers collect data in situations similar to traffic jam conditions (at or near intersections, moving with the queue).
Applied all common machine learning procedures:
Complete model selection: consideration of tuning parameters (which depend on the algorithm)
Regularization: to deal with overfitting
Feature selection: to find and use the most relevant variables
Feature scaling: normalizing variables to a specified range; the range used here was (-1, 1)
Created and assessed a large number of features.
Created the features based on statistical measures of dispersion as well as derivatives, to incorporate time dependency.
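The feature scaling step mentioned on this slide can be illustrated with a short sketch. It assumes plain min-max scaling onto the stated range of -1 to 1; the slide does not say which scaling formula was used, and the function name `scale_to_range` is hypothetical.

```python
def scale_to_range(values, lo=-1.0, hi=1.0):
    """Linearly map values so that min(values) -> lo and max(values) -> hi.

    Assumption: min-max scaling; the slides only state the target range.
    """
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant feature carries no information; place it mid-range.
        return [(lo + hi) / 2.0 for _ in values]
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]


# Illustrative values only, not data from the study.
scaled = scale_to_range([2.0, 4.0, 6.0, 10.0])
print(scaled)
```

In practice the minimum and maximum would be taken from the training data only and then reused for the validation and test data, so that no information leaks from the held-out sets.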

10 Data Collection
Developed a smartphone app
Ten individuals / two different Android phones
Modes: car, bicycle, bus, walk, and run
About 25 hours of data
Sensors: Accelerometer, Gyroscope, and Rotation Vector
Preprocessing: data synchronization, interpolation, and resampling
Data obtained from the different sensors were not synchronized, so the data were first interpolated to obtain a continuous stream and then resampled at the desired rate.
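The interpolate-then-resample step can be sketched as follows. This is a minimal illustration, assuming linear interpolation and strictly increasing timestamps; the slides do not state the interpolation method or the sampling rate, and the helper names `uniform_grid` and `interpolate_onto` are hypothetical.

```python
def uniform_grid(t_start, t_end, rate_hz):
    """Evenly spaced timestamps from t_start up to t_end at rate_hz samples per second."""
    step = 1.0 / rate_hz
    n = int((t_end - t_start) * rate_hz + 1e-9) + 1
    return [t_start + k * step for k in range(n)]


def interpolate_onto(timestamps, values, grid):
    """Linearly interpolate an irregularly sampled signal onto the grid.

    Assumes timestamps are strictly increasing and the grid lies within them.
    """
    out = []
    j = 0
    for t in grid:
        # Advance to the segment [timestamps[j], timestamps[j + 1]] containing t.
        while j < len(timestamps) - 2 and timestamps[j + 1] < t:
            j += 1
        t_a, t_b = timestamps[j], timestamps[j + 1]
        frac = (t - t_a) / (t_b - t_a)
        out.append(values[j] + frac * (values[j + 1] - values[j]))
    return out


# Two sensors sampled at different, unsynchronized instants (illustrative values).
accel_t, accel_v = [0.00, 0.30, 1.00], [0.0, 3.0, 10.0]
gyro_t, gyro_v = [0.10, 0.60, 1.10], [1.0, 6.0, 11.0]

# Resample both onto one common grid covering the overlap of the two recordings.
grid = uniform_grid(max(accel_t[0], gyro_t[0]), min(accel_t[-1], gyro_t[-1]), rate_hz=5)
accel_sync = interpolate_onto(accel_t, accel_v, grid)
gyro_sync = interpolate_onto(gyro_t, gyro_v, grid)
```

After this step every sensor has one value per grid timestamp, so the samples can be stacked row by row before computing features.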

11 Methodology
Methods: Support Vector Machine (SVM), Tree-based Methods, K-Nearest Neighbor (K-NN)
Feature Selection: Minimum Redundancy Maximum Relevance (mRMR)
Model Selection: Five-fold Cross-Validation; Out-of-bag error for bagging and random forest methods
Cross-validation: the data are divided into 5 parts; one part is set aside for validation and the rest is used to train the model. This is repeated 5 times, each time with a different part used for validation.
Out-of-bag error: used for Bagging and Random Forest. It is similar to cross-validation: when each tree is created, part of the data is not used, and the error computed only on this unused part is called the out-of-bag error.
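The five-fold splitting described in the notes can be sketched in a few lines. This is only an illustration of the partitioning; the shuffling, the seed, and the function name `k_fold_indices` are assumptions, not details from the slides.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(23, k=5))
for training, validation in splits:
    print(len(training), len(validation))
```

Each observation lands in the validation part exactly once, so the five validation errors can be averaged into a single estimate for each candidate setting of the tuning parameters.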

12 Methodology – SVM
Large margin classifier
Single SVM model
Ensemble of SVM models: a series of SVM models is developed using subsets of the data, and their results are averaged (the idea is similar to Bagging or Random Forest)
$\min_{w,b,\xi} \; w^T w + C \sum_{n=1}^{N} \xi_n$   (Equation 1)
$y_n \left( w^T \phi(x_n) + b \right) \ge 1 - \xi_n, \quad n = 1, \dots, N$   (Equation 2)
$\xi_n \ge 0, \quad n = 1, \dots, N$   (Equation 3)
where
$w$: parameters that define the decision boundary between classes
$C$: regularization (or penalty) parameter
$\xi_n$: error parameter denoting margin violation
$b$: intercept associated with the decision boundaries
$\phi(x_n)$: function that transforms data from the X space into some Z space
Objective function: minimizing the first term is equivalent to maximizing the margin between classes; the second term is an error term multiplied by the regularization (penalty) parameter $C$.
Equation (2) ensures that a margin of at least 1 exists, while allowing some violations. The value of 1 results from normalizing $w$. Equation (3) requires the margin violations $\xi_n$ to be non-negative.
SVM applies the function $\phi(\cdot)$ to transform data from the current n-dimensional X space into a higher-dimensional Z space in which the decision boundaries between classes are easier to identify. Computing this transformation explicitly could be very expensive; however, to solve the problem the SVM only needs vector inner products in the Z space. Hence, SVM takes advantage of functions known as kernels, which return the inner product in the desired Z space directly. A Gaussian kernel was used.
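To make the objective and constraints concrete, the sketch below evaluates them for a fixed candidate boundary. It assumes the identity map for the transformation (a linear SVM) so that the weights can be written out explicitly, and it assumes the common parameterization exp(-gamma * ||x - z||^2) for the Gaussian kernel; neither choice is stated on the slide, and the function names are hypothetical.

```python
import math


def gaussian_kernel(x, z, gamma):
    """Gaussian (RBF) kernel; the gamma parameterization is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def soft_margin_objective(w, b, X, y, C):
    """Evaluate w^T w + C * sum(xi_n) for a given (w, b), taking phi as the identity.

    For fixed (w, b), the smallest xi_n satisfying Equations 2 and 3 is
    max(0, 1 - y_n * (w . x_n + b)).
    """
    slacks = []
    for x_n, y_n in zip(X, y):
        margin = y_n * (sum(w_i * x_i for w_i, x_i in zip(w, x_n)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    objective = sum(w_i * w_i for w_i in w) + C * sum(slacks)
    return objective, slacks


# Illustrative points: two satisfy the margin, one violates it by 0.5.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, 1]
objective, slacks = soft_margin_objective(w=[1.0, 0.0], b=0.0, X=X, y=y, C=10.0)
print(objective, slacks)
```

A larger C makes each margin violation more costly, which pushes the solution toward fitting the training data more closely; a smaller C tolerates more violations in exchange for a wider margin.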

13 Methodology – Tree-based Models
Single Tree
Bagging: Ensemble of trees / all variables used in each tree
Random Forest: Ensemble of trees / restricted number of variables used in each tree
Criterion used to choose the best split at each node: Cross-Entropy
$-\sum_{k=1}^{K} P_{km} \log P_{km}$
where $P_{km}$ is the proportion of class $k$ observations in node $m$.
With two classes and a base-2 logarithm, the cross-entropy ranges from 0 (purest) to 1 (least pure). When splitting data, the algorithm favors splits that result in purer nodes. For example, with 2 classes (a and b):
Split 1: results in a node with 70% of the data in class a and 30% in class b
Split 2: results in a node with 90% of the data in class a and 10% in class b
The cross-entropy for Split 2 is lower than that for Split 1, so Split 2 gives the purer node. Other criteria can also be used (such as the Gini index).
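The two-split example can be checked numerically. The sketch below assumes a base-2 logarithm, which is what gives a maximum of 1 for two classes; the function name `cross_entropy` is hypothetical.

```python
import math


def cross_entropy(proportions):
    """Node cross-entropy: -sum_k P_km * log2(P_km), skipping empty classes."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)


split_1 = cross_entropy([0.7, 0.3])  # approximately 0.88
split_2 = cross_entropy([0.9, 0.1])  # approximately 0.47
print(split_1, split_2)
```

The 90/10 node scores lower than the 70/30 node, and an evenly mixed 50/50 node gives the maximum value of 1.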

14 Methodology – KNN
Classifies test observations based on the classes of the K nearest neighbors
$y_j^{test} = \frac{1}{K} \sum_{X_j^{train} \in N_K} y_j^{train}$
where
$X_j^{train}, X_j^{test}$: observation vectors for the train and test sets
$y_j^{train}, y_j^{test}$: response (or target) values corresponding to the observations $X_j^{train}$ and $X_j^{test}$
$K$: number of neighbors
Note: the formulation shows averaging, but for classification problems (as here) the responses are not averaged; a majority vote is taken instead.
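The majority-vote variant described in the note can be sketched as below. Euclidean distance is an assumption (the slide does not name the distance measure), and the function name and sample points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, test_x, k):
    """Classify test_x by majority vote among its k nearest training points."""
    # Euclidean distance is an assumption; math.dist requires Python 3.8+.
    by_distance = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], test_x)
    )
    votes = Counter(label for _, label in by_distance[:k])
    # On a tied vote, the label seen first (nearest neighbor first) is returned.
    return votes.most_common(1)[0][0]


# Illustrative two-feature observations, not data from the study.
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["walk", "walk", "walk", "car", "car", "car"]
print(knn_predict(train_X, train_y, [0.5, 0.5], k=3))
print(knn_predict(train_X, train_y, [5.2, 5.1], k=3))
```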

15 Methodology – Feature Selection
Measures used to create features:
No.  Measure                  No.  Measure
1    mean(x_i^t)              11   mean(x_i^t)
2    max(x_i^t)               12   max(x_i^t)
3    min(x_i^t)               13   min(x_i^t)
4    var(x_i^t)               14   var(x_i^t)
5    std(x_i^t)               15   std(x_i^t)
6    range(x_i^t)             16   range(x_i^t)
7    iqr(x_i^t)               17   iqr(x_i^t)
8    signChange(x_i^t)        18   signChange(x_i^t)
9    energy(x_i^t)            19   spectralEntropy(x_i^t)

16 Methodology – Feature Selection
mRMR: used for feature selection
Maximize the relevance between the feature and the target class
Minimize the redundancy between that feature and the already selected features
$\max_{x_i \in M - F} MI(x_i, c)$ and $\min_{x_i \in M - F} \frac{1}{|F|} \sum_{x_j \in F} MI(x_i, x_j)$
where
$MI(x, y)$: mutual information of $x$ and $y$
$x_i$: the feature to be examined
$x_j$: a previously selected feature
$M$ / $F$: set of all features / set of the selected features
$c$: target class
MI is a measure of the variables' mutual dependence; it quantifies how far the joint distribution p(X,Y) is from the product of the marginal distributions p(X)p(Y).
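A greedy selection loop built on these two criteria might look like the sketch below. It makes two assumptions that the slide does not state: features are discrete (continuous sensor features would first need to be binned), and relevance and redundancy are combined by taking their difference. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi


def mrmr_score(name, features, target, selected):
    """Relevance minus mean redundancy (the difference form is an assumption)."""
    relevance = mutual_information(features[name], target)
    if not selected:
        return relevance
    redundancy = sum(
        mutual_information(features[name], features[s]) for s in selected
    ) / len(selected)
    return relevance - redundancy


def mrmr_select(features, target, n_select):
    """Greedily pick n_select feature names by the mRMR score."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: mrmr_score(f, features, target, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: f1_copy duplicates f1, so it is relevant but fully redundant.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 1, 1, 1, 1, 1],
    "f1_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "f2": [0, 1, 0, 0, 1, 1, 0, 1],
}
chosen = mrmr_select(features, target, 2)
print(chosen)
```

In this toy case the duplicate feature is passed over in favor of the weaker but less redundant `f2`, which is the behavior the redundancy term is meant to produce.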

17 Results – Model Selection
Tuning parameters for different models:
Tuning parameter      Model(s)
Number of neighbors   KNN
Regularization        SVM
Gaussian parameter    SVM
Number of features    RF
Number of trees       Bag, RF

18 Results - Comparison
Overall Accuracy: calculated by dividing the total number of correct detections by the total number of test observations.
F-Score: a combined measure of Recall and Precision
Youden’s index: a measure of the ability of a model to avoid failure
Discriminant power: shows how well a model discriminates between different classes
Recall and Precision are calculated from the counts of true positives, actual positives, and predicted positives:
Recall: the number of true positives divided by the number of actual positives.
Precision: the number of true positives divided by the number of predicted positives.
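These measures can be computed from confusion counts for one mode treated as the positive class. The sketch assumes a one-vs-rest treatment of each mode, the equally weighted F-score (F1), and the standard definition of Youden's index as sensitivity plus specificity minus 1; the slide does not spell out these choices, and the counts are made up for illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Metrics for one mode treated as positive against all other modes."""
    recall = tp / (tp + fn)        # true positives / actual positives
    precision = tp / (tp + fp)     # true positives / predicted positives
    specificity = tn / (tn + fp)   # true negatives / actual negatives
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "recall": recall,
        "precision": precision,
        # F1: harmonic mean of precision and recall (equal weighting assumed).
        "f_score": 2 * precision * recall / (precision + recall),
        # Youden's index: sensitivity (recall) + specificity - 1.
        "youden": recall + specificity - 1,
    }


# Illustrative counts only, not results from the study.
metrics = binary_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```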

19 Results - Comparison
Overall Accuracy
Model    Accuracy
KNN      91.2%
DT       87.27%
DT.P     86.3%
RF       95.1%
Bag      94.4%
SVM      94.62%
E.SVMs   94.41%
KNN: K-Nearest Neighbor
DT: Decision Tree
DT.P: pruned Decision Tree; the accuracy drops a bit, but a huge tree is pruned to a smaller one
RF: Random Forest, the best overall performance
Bag: Bagging
SVM: Support Vector Machine, the best for specific modes (Walk and Run)
E.SVMs: Ensemble of SVMs

20 Results – Feature Importance
Identified important features based on (1) Mean Decrease Accuracy and (2) Mean Decrease Gini
No.  Feature Name            No.  Feature Name
1    spectralEntropy(a_x)    11   mean(a_z)
2    range(a_y)              12   iqr(a_x)
3    max(a_y)                13   var(g_x)
4    max(g_y)                14   min(a_y)
5    min(g_y)                15   range(a_x)
6    range(g_x)              16   energy(a_x)
7    spectralEntropy(a_y)    17   range(g_x)
8    max(a_z)                18   mean(g_z)
9    mean(g_x)               19   std(a_y)
10   min(a_z)                20   spectralEntropy(g_x)
(1) Mean Decrease Accuracy shows how much the detection accuracy decreases if a feature is excluded, averaged over all trees and normalized by the standard deviation of the differences in accuracy.
(2) Mean Decrease Gini shows how much a single feature contributes to decreasing the Gini index (a measure similar to cross-entropy) over all the trees.
Spectral Entropy: treating the data as a distribution, this measures how peaked the distribution is.
Energy: treating the data as a signal, the energy is the sum of $x^2$ over the time window of interest.
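The two signal-based features can be sketched as follows. Energy follows the definition in the notes. For spectral entropy, the sketch assumes one common definition (Shannon entropy of the normalized power spectrum of the window); the slides do not give the exact formula used, and the function names are hypothetical.

```python
import cmath
import math


def energy(window):
    """Signal energy: sum of squared samples over the time window."""
    return sum(x * x for x in window)


def spectral_entropy(window):
    """Shannon entropy (bits) of the normalized power spectrum.

    Assumption: a common definition, not necessarily the one used in the study.
    A naive DFT keeps the sketch dependency-free; an FFT would be used in practice.
    """
    n = len(window)
    power = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(window)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0)


# A pure tone has a peaked spectrum (low entropy); an impulse has a flat one.
tone = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
impulse = [1.0] + [0.0] * 15
print(spectral_entropy(tone), spectral_entropy(impulse), energy([1, 2, 3]))
```

Under this definition a strongly periodic motion would give a low value and an irregular one a high value, which matches the "peaky spots" description in the notes.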

21 Conclusions
Developed a smartphone app to obtain sensor data
Used Accelerometer, Gyroscope, and Rotation Vector sensors
Transportation modes: Bike, Car, Walk, Run, and Bus
A time window of one second
Applied machine learning to develop detection models
Most difficult modes to distinguish: Car and Bus (motorized modes)
Best overall performance with Random Forest
SVM outperformed RF in certain modes (Walk and Run)
Selected 80 features using mRMR, of which the top 20 were identified

22 Future Recommendations
Adding more data
Applying approaches that examine the data as a sequence
Considering other transportation modes (e.g. metro)
Conducting error analysis and incorporating that knowledge into the models to enhance detection performance

23 Thank you!
Questions/Comments?

