Deep Predictive Model for Autonomous Driving


1 Deep Predictive Model for Autonomous Driving
Wongun Choi


3 Scene Type Image classification: where was the image taken? City

4 Static Scene Elements Semantic segmentation: what is the pixel? Road

5 Dynamic Objects Object detection: where are certain types of objects?

6 Dynamic Objects Object detection: where are certain types of objects?

7 Dynamic Objects Multiple target tracking: how has each object been moving?

8 Planning? ?

9 Future Prediction Behavior prediction: how will each object move?

10 Challenges Multi-modal inputs

11 Challenges Multi-modal inputs Multi-modal future

12 Challenges Multi-modal inputs Multi-modal future Accurate time horizon

13 Challenges Multi-modal inputs Multi-modal future Accurate time horizon
Large search space / Limited training data

14 Previous Works Conditional Variational Autoencoder, Walker et al 2016.
Adversarial Transformers, Vondrick et al 2017. No previous work addresses all the challenges critical for prediction in driving scenarios.

15 Previous Works Conditional Variational Autoencoder, Walker et al 2016.
Adversarial Transformers, Vondrick et al 2017. Activity Forecasting, Kitani et al 2012. Guided Cost Learning, Finn et al 2016. No previous work addresses all the challenges critical for prediction in driving scenarios.

16 DESIRE: Deep Stochastic IOC RNN Encoder-decoder
N. Lee, W. Choi, P. Vernaza, C. Choy, P. Torr, and M. Chandraker, CVPR 2017 End-to-end trainable framework for behavior prediction. Diverse hypothesis generation via cVAE. Data-efficient learning via an IOC-based framework that ranks the hypotheses. Iterative refinement of the hypotheses. Two stages: Sample Generation, then Scoring and Refinement.
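The sample, score, and refine stages listed on this slide can be sketched as a simple loop. This is only a structural illustration: `predict_ranked` and the three `toy_*` functions are hypothetical stand-ins invented here, not the learned cVAE or IOC modules from the talk, and the order of refinement and scoring inside the loop is an assumption.

```python
import random


def predict_ranked(past, sample_fn, score_fn, refine_fn, num_samples=5, num_iters=2):
    """Draw several future hypotheses, refine them iteratively, then rank by score."""
    hypotheses = [sample_fn(past) for _ in range(num_samples)]
    for _ in range(num_iters):
        hypotheses = [refine_fn(past, h) for h in hypotheses]
    scored = [(score_fn(past, h), h) for h in hypotheses]
    # Sort on the score only, highest first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy stand-ins (assumptions for illustration, NOT the learned modules).
rng = random.Random(0)


def toy_sample(past):
    # Extrapolate from the last position with a random constant step, 3 frames ahead.
    x, y = past[-1]
    dx, dy = rng.uniform(-1, 1), rng.uniform(-1, 1)
    return [(x + dx * t, y + dy * t) for t in range(1, 4)]


def toy_score(past, future):
    # Arbitrary toy preference: futures that end farther along +x score higher.
    return future[-1][0]


def toy_refine(past, future):
    # Halve the lateral (y) deviation from the last observed position.
    y0 = past[-1][1]
    return [(x, y0 + 0.5 * (y - y0)) for x, y in future]


ranked = predict_ranked([(0.0, 0.0), (1.0, 0.0)], toy_sample, toy_score, toy_refine)
```

Passing the three stages in as functions keeps the loop independent of how each stage is actually implemented.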

17 Overall Model Images / preprocessed BEV map

18 Sampling with cVAE Encoding the past trajectory.
Encoding the future trajectory (training only). Latent variable z with KLD regularization. Reconstructing the future trajectory. Images / preprocessed BEV map

19 Sampling with cVAE Images / preprocessed BEV map During training, the cVAE learns to reconstruct the target future trajectory given the past trajectory, while a KL divergence (KLD) term pushes the distribution of z toward the prior. During testing, z is drawn from the prior distribution. The latent random variable z encourages the model to learn diverse predictions. We condition the sampler solely on the past dynamics information, which leads to better generalization. Kingma and Welling 2013, Walker et al 2016
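The KLD regularizer and the two ways of obtaining z (from the encoder during training, from the prior during testing) can be written out in a few lines. The sketch below assumes a diagonal Gaussian posterior and a standard normal prior, which is the common choice in Kingma and Welling 2013; the slides do not state the exact prior, and the function names are illustrative.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Training-time sample: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def sample_prior(dim, rng):
    """Test-time sample: z ~ N(0, I), no future trajectory required."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])  # posterior equals prior
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])  # mean shifted by 1 in one dim
z_test = sample_prior(4, rng)
```

Drawing several z values from the prior and decoding each one is what yields multiple distinct future hypotheses for the same past trajectory.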

20 Ranking with IOC The RNN decoder provides a score for each state of each sample.
Encoding the past trajectory. A global regression vector is learned from the last hidden vector. Images / preprocessed BEV map The CNN learns the static spatial context (e.g., favored drivable locations, turn direction).

21 Ranking with IOC Scene context via CNN features.
Interaction among dynamic agents. Dynamics. Images / preprocessed BEV map Need some work to improve!!!

22 Ranking with IOC Images / preprocessed BEV map
Need some work to improve!!!

23 Ranking with IOC Images / preprocessed BEV map
The CNN learns the static cost features. The SCF (scene context fusion) module combines dynamics, scene context, and interactions to provide a time-varying cost function. A regression vector is learned to further refine the “blind” samples. The model is trained end-to-end with a max-entropy IOC framework. Ziebart et al 2008, Finn et al 2016
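In a max-entropy formulation, a trajectory's probability is proportional to the exponential of its accumulated score (equivalently, its negative cost). A minimal sketch of ranking samples this way follows; the per-step scores are made-up numbers and the function names are assumptions, not the model's actual outputs or API.

```python
import math


def trajectory_scores(step_scores):
    """Sum the per-time-step scores into one score per sample."""
    return [sum(steps) for steps in step_scores]


def maxent_probabilities(scores):
    """Softmax over samples: p_k is proportional to exp(score_k)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Three hypothetical samples, three time steps each (illustrative values only).
step_scores = [[0.1, 0.2, 0.3], [0.5, 0.5, 0.5], [-0.2, 0.0, 0.1]]
scores = trajectory_scores(step_scores)
probs = maxent_probabilities(scores)
best = max(range(len(probs)), key=probs.__getitem__)
```

Because the score is a sum over time steps, a time-varying cost function changes the ranking without changing this final normalization step.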

24 Experiments
Datasets
KITTI dataset: 24 video sequences, about 6,000 frames, 2,500 prediction instances. Preprocessed BEV maps using Velodyne points and semantic segmentation.
Stanford Drone Dataset: 16,000 prediction instances. Use the images directly.
Set-up
Predict 40 frames (4 sec) into the future given a 20-frame past trajectory. 4 / 5 fold cross validation.
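The set-up implies a frame rate of 10 frames per second (40 frames over 4 sec), so the 20 past frames cover 2 sec. One plausible way to cut a tracked trajectory into prediction instances is a sliding window, sketched below; the slides do not say how instances were extracted, so the stride and the `make_instances` helper are assumptions.

```python
def make_instances(track, past_len=20, future_len=40, stride=1):
    """Split one trajectory into (past, future) pairs with a sliding window."""
    window = past_len + future_len
    instances = []
    # Tracks shorter than one full window yield no instances.
    for start in range(0, len(track) - window + 1, stride):
        past = track[start:start + past_len]
        future = track[start + past_len:start + window]
        instances.append((past, future))
    return instances


frames_per_second = 40 / 4  # 40 future frames span 4 sec
past_seconds = 20 / frames_per_second

# A hypothetical 100-frame track moving along the x axis.
track = [(float(t), 0.0) for t in range(100)]
instances = make_instances(track)
```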

25 Experiments Baselines
Linear prediction.
RNN ED: a deterministic RNN autoencoder without scene/interaction.
RNN ED-SI: a deterministic RNN autoencoder with scene/interaction.
CVAE.
DESIRE-S: the proposed method with scene context.
DESIRE-SI: the proposed method with scene context and interaction.
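For reference, the simplest baseline above can be sketched as a least-squares line fit to each coordinate of the past trajectory, extrapolated over the future frames. This is one plausible reading of "linear prediction"; the slides do not give the exact formulation, and the function names are illustrative.

```python
def fit_line(values):
    """Least-squares fit values[t] ~ slope * t + intercept for t = 0..n-1 (n >= 2)."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values)) / denom
    return slope, v_mean - slope * t_mean


def linear_predict(past, future_len):
    """Fit x(t) and y(t) independently on the past frames and extrapolate."""
    n = len(past)
    ax, bx = fit_line([p[0] for p in past])
    ay, by = fit_line([p[1] for p in past])
    return [(ax * t + bx, ay * t + by) for t in range(n, n + future_len)]


# A hypothetical agent moving in a straight line for 20 frames.
past = [(1.0 * t, 2.0 * t + 1.0) for t in range(20)]
future = linear_predict(past, 40)
```

Such a baseline produces a single deterministic future, so it cannot represent the multi-modal outcomes the talk lists among the challenges.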

26 Experiments

27 Experiments

28 Experiments

29 Experiments

30 Experiments

31 Experiments

32 Experiments

33 Experiments

34 Iterative feed-back

35 Iterative feed-back

36 Iterative feed-back

37 Conclusion We propose an end-to-end trainable model for behavior prediction. Our model can produce multi-modal future predictions with an accurate temporal horizon. The scene context fusion module naturally integrates multiple cues. The IOC-based framework enables us to learn a predictive model.

38 Questions & career:
