Deep Predictive Model for Autonomous Driving

Wongun Choi

Scene Type
Image classification: where was the image taken? City

Static Scene Elements
Semantic segmentation: what class is each pixel? Road, Sidewalk

Dynamic Objects
Object detection: where are certain types of objects?

Dynamic Objects
Multiple target tracking: how has each object been moving?

Planning?

Future Prediction
Behavior prediction: how will each object move?

Challenges
Multi-modal inputs
Multi-modal future
Accurate time horizon
Large search space / Limited training data

Previous Works
Conditional Variational Autoencoder, Walker et al. 2016.
Adversarial Transformers, Vondrick et al. 2017.
Activity Forecasting, Kitani et al. 2012.
Guided Cost Learning, Finn et al. 2016.
No previous work addresses all the challenges critical for prediction in driving scenarios.

DESIRE: Deep Stochastic IOC RNN Encoder-decoder
N. Lee, W. Choi, P. Vernaza, C. Choy, P. Torr, and M. Chandraker, CVPR 2017
End-to-end trainable framework for behavior prediction.
Diverse hypothesis generation via cVAE.
Data-efficient learning via an IOC-based framework to rank the hypotheses.
Iterative refinement of the hypotheses.
Sample Generation. Scoring and Refinement.

Overall Model
Images / preprocessed BEV map

Sampling with cVAE
Encoding the past trajectory.
Reconstructing the future trajectory.
Latent variable z with KLD regularization.
Encoding the future trajectory (training only).

Sampling with cVAE
During training, the cVAE learns to reconstruct the target future trajectory given the past trajectory, while a KL-divergence (KLD) term pushes the distribution of z toward the prior.
During testing, z is drawn from the prior distribution.
The latent random variable z encourages the model to learn diverse predictions.
We condition the sampler solely on past dynamics information, which leads to better generalization.
Kingma and Welling 2013, Walker et al. 2016
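The train/test difference in how z is obtained can be illustrated with a minimal sketch. It assumes a diagonal Gaussian posterior and a standard normal prior N(0, I), as in Kingma and Welling 2013; the slide only says "prior distribution". The function names and the numbers for mu and log_var are hypothetical, standing in for the outputs of the trajectory encoders.

```python
import math
import random

def kld_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def sample_z_train(mu, log_var, rng):
    """Training: reparameterized sample z = mu + sigma * eps from the posterior."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def sample_z_test(dim, rng):
    """Testing: the future is unknown, so z is drawn from the prior N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

rng = random.Random(0)

# Hypothetical posterior parameters (assumption: produced by the encoders).
mu, log_var = [0.5, -0.2], [0.0, -1.0]

kld = kld_to_standard_normal(mu, log_var)   # regularizer added to the reconstruction loss
z_train = sample_z_train(mu, log_var, rng)  # one z per training instance

# At test time, drawing several z values yields several future hypotheses.
z_samples = [sample_z_test(len(mu), rng) for _ in range(5)]
```

Each z in z_samples would be passed to the decoder together with the encoded past trajectory, so different draws produce different predicted futures.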

Ranking with IOC
Encoding the past trajectory.
The RNN decoder provides a score for each state of each sample.
A global regression vector is learned from the last hidden vector.
The CNN learns the static spatial context (e.g., favored drivable locations, turn direction).

Ranking with IOC
Scene context via CNN features.
Interaction among dynamic agents.
Dynamics.
Need some work to improve!!!

Ranking with IOC
The CNN learns the static cost features.
The SCF (scene context fusion) module combines dynamics, scene context, and interactions to provide a time-varying cost function.
A regression vector is learned to further refine the “blind” samples.
The model is learned end-to-end within the max-entropy IOC framework.
Ziebart et al. 2008, Finn et al. 2016
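A rough sketch of how per-step costs could turn into a ranking over sampled trajectories is given here. It follows the generic max-entropy IOC view (Ziebart et al. 2008), in which a trajectory's probability is proportional to exp(-cost); it is not necessarily the exact loss used in DESIRE. The per-step cost values and function names are hypothetical, and in the model the costs would come from the RNN decoder and SCF module.

```python
import math

def trajectory_cost(step_costs):
    """Accumulate the time-varying per-step costs into one trajectory cost."""
    return sum(step_costs)

def maxent_distribution(costs):
    """Max-entropy distribution over samples: P(k) proportional to exp(-cost_k)."""
    lowest = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]

def rank(costs):
    """Indices of the samples, lowest cost (most plausible) first."""
    return sorted(range(len(costs)), key=lambda k: costs[k])

# Hypothetical per-step costs for three sampled trajectories (assumption).
step_costs = [
    [0.2, 0.3, 0.1],
    [1.0, 0.8, 1.2],
    [0.4, 0.4, 0.4],
]

costs = [trajectory_cost(s) for s in step_costs]
probs = maxent_distribution(costs)
order = rank(costs)
```

Here the first sample has the lowest accumulated cost, so it is ranked first and receives the largest probability.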

Experiments
Datasets
KITTI dataset: 24 video sequences, about 6,000 frames, 2,500 prediction instances. Preprocessed BEV maps using Velodyne points and semantic segmentation.
Stanford Drone Dataset: 16,000 prediction instances. Use the images directly.
Set-up
Predict 40 frames (4 sec) into the future given a 20-frame past trajectory.
4- / 5-fold cross-validation.
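The set-up above can be sketched as a sliding window that cuts each tracked trajectory into a 20-frame past and a 40-frame future. The slide does not say how prediction instances are extracted, so the windowing scheme, the stride parameter, and the toy track below are assumptions for illustration only.

```python
PAST_FRAMES = 20    # observed frames, from the slide
FUTURE_FRAMES = 40  # frames to predict (4 sec), from the slide

def make_instances(track, stride=1):
    """Split one track (a list of (x, y) positions) into (past, future) pairs.

    `stride` is a hypothetical parameter; the slide does not specify one.
    """
    window = PAST_FRAMES + FUTURE_FRAMES
    instances = []
    for start in range(0, len(track) - window + 1, stride):
        past = track[start:start + PAST_FRAMES]
        future = track[start + PAST_FRAMES:start + window]
        instances.append((past, future))
    return instances

# Toy track: an agent moving along x at a constant 0.5 units per frame.
track = [(0.5 * t, 0.0) for t in range(70)]
instances = make_instances(track, stride=5)
```

Tracks shorter than 60 frames yield no instances under this scheme.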

Experiments
Baselines
Linear prediction.
RNN ED: a deterministic RNN autoencoder without scene/interaction.
RNN ED-SI: a deterministic RNN autoencoder with scene/interaction.
CVAE.
DESIRE-S: the proposed method with scene context.
DESIRE-SI: the proposed method with scene context and interaction.
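To make the simplest baseline concrete, here is one plausible reading of "linear prediction": fit a least-squares line to each coordinate of the past trajectory over time and extrapolate it. The slide does not define the baseline, so this interpretation, the function names, and the toy data are assumptions.

```python
def fit_line(ts, vs):
    """Ordinary least-squares fit v = slope * t + intercept."""
    n = len(ts)
    t_mean = sum(ts) / n
    v_mean = sum(vs) / n
    var = sum((t - t_mean) ** 2 for t in ts)
    slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vs)) / var
    return slope, v_mean - slope * t_mean

def linear_predict(past, horizon):
    """Extrapolate `horizon` future (x, y) positions from at least two past positions."""
    ts = list(range(len(past)))
    sx, bx = fit_line(ts, [p[0] for p in past])
    sy, by = fit_line(ts, [p[1] for p in past])
    return [(sx * t + bx, sy * t + by)
            for t in range(len(past), len(past) + horizon)]

# Toy past trajectory with constant velocity (hypothetical data).
past = [(0.5 * t, 1.0 + 0.1 * t) for t in range(20)]
future = linear_predict(past, horizon=40)
```

Such a baseline produces a single deterministic future and cannot represent turns or multiple plausible outcomes, which is the gap the sampled hypotheses are meant to fill.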

Experiments

Iterative feed-back

Conclusion
We propose an end-to-end trainable model for behavior prediction.
Our model can produce multi-modal future predictions with an accurate temporal horizon.
The scene context fusion module naturally integrates multiple cues.
The IOC-based framework enables us to learn a predictive model.

Questions & career: wongun@nec-labs.com
