Pedestrian Behavior Prediction Models
Last Updated: 01/23/2025
Background and Objectives
- In most cases, automated vehicles (AVs) can drive smoothly on highways and freeways.
- However, AVs still face challenges in urban settings.
- One key challenge: the "unpredictable" and rapidly changing behaviors of pedestrians and other vulnerable road users.
- Existing driving strategies are overly conservative, aiming to avoid crashes based on short-term kinematic calculations.
- The main research objective is to better predict pedestrian behaviors with deep-learning algorithms, supporting driving decision-making during pedestrian encounters in complex road scenes.
Two-Tower Ego-Centric Pedestrian Trajectory Prediction
Key Features
- Multi-modal inputs;
- Two-tower model that decomposes egocentric pedestrian trajectories into ego-vehicle and pedestrian motion components;
- Inference of pedestrians' future moving directions.
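The decomposition idea behind the two towers can be illustrated in a few lines: an egocentric pedestrian track mixes apparent motion caused by the ego-vehicle with the pedestrian's own motion, and the model handles each source separately. The function name and the simple additive composition below are illustrative assumptions, not the model's actual architecture.

```python
def compose_egocentric(ego_offsets, ped_offsets, start):
    """Accumulate per-step (dx, dy) offsets from both motion sources into one
    egocentric track.

    ego_offsets: per-step shifts induced by the ego-vehicle's own movement
    ped_offsets: per-step shifts caused by the pedestrian's own movement
    start:       the pedestrian's current egocentric position (x, y)
    """
    x, y = start
    track = []
    for (ex, ey), (px, py) in zip(ego_offsets, ped_offsets):
        x, y = x + ex + px, y + ey + py
        track.append((x, y))
    return track

# Toy example: ego motion shifts the pedestrian left in the view while the
# pedestrian simultaneously walks downward.
future = compose_egocentric(
    ego_offsets=[(-2.0, 0.0)] * 3,
    ped_offsets=[(0.0, 1.5)] * 3,
    start=(100.0, 50.0),
)
print(future)  # [(98.0, 51.5), (96.0, 53.0), (94.0, 54.5)]
```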


Algorithm Results on JAAD Benchmark Dataset
| Method | Average Displacement Error | Final Displacement Error |
| --- | --- | --- |
| PIE | 22.83 | 49.44 |
| BiPed | 21.13 | 48.88 |
| Two-Tower Model | 17.92 | 41.33 |
PIE: Rasouli, A., Kotseruba, I., Kunic, T. and Tsotsos, J.K., 2019. PIE: A large-scale dataset and models for pedestrian intention estimation and trajectory prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6262-6271.
BiPed: Rasouli, A., Rohani, M. and Luo, J., 2021. Bifold and semantic reasoning for pedestrian behavior prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15600-15610.
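The benchmark tables report Average and Final Displacement Error between predicted and ground-truth trajectories. As a reference, here is how these standard metrics are typically computed; this is a pure-Python sketch of the textbook definitions, not code from the project.

```python
import math

def ade_fde(pred, gt):
    """ADE and FDE between equal-length trajectories of (x, y) points.

    ADE: mean Euclidean distance over all predicted time steps.
    FDE: Euclidean distance at the final time step only.
    """
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]

# Toy trajectories drifting apart over three steps.
pred = [(0, 0), (1, 0), (2, 0)]
gt = [(0, 0), (1, 1), (2, 2)]
ade, fde = ade_fde(pred, gt)
print(ade, fde)  # 1.0 2.0
```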
Algorithm Results on PSI Benchmark Dataset
| Method | Average Displacement Error | Final Displacement Error |
| --- | --- | --- |
| PIE | 35.39 | 61.50 |
| Two-Tower Model | 22.34 | 46.63 |
Dual-View Pedestrian Trajectory Prediction
Key Features
- Multi-modal inputs;
- Simultaneous prediction of bird's-eye-view trajectories, egocentric trajectories, and pedestrian actions;
- Multi-task learning to improve prediction accuracy.
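A common way to train such a multi-task model is to minimize a weighted sum of the per-task losses, so that the trajectory heads and the action head share and shape the same features. The weights and loss values below are illustrative hyperparameters, not values from the project.

```python
def multitask_loss(bev_err, ego_err, action_err,
                   w_bev=1.0, w_ego=1.0, w_act=0.5):
    """Weighted sum of per-task losses: bird's-eye-view trajectory error,
    egocentric trajectory error, and pedestrian-action classification error.
    The weights are illustrative hyperparameters."""
    return w_bev * bev_err + w_ego * ego_err + w_act * action_err

# Toy per-task loss values for one training batch.
total = multitask_loss(bev_err=0.8, ego_err=1.2, action_err=0.4)
print(total)  # 2.2
```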


Algorithm Results on nuScenes Benchmark Dataset
| Metric | BiTraP | SGNet | Ours |
| --- | --- | --- | --- |
| Average Displacement Error (bird's-eye view) | 49 | 46 | 28 |
| Final Displacement Error (bird's-eye view) | 57 | 55 | 41 |
| Average Displacement Error (egocentric view) | 92 | 89 | 61 |
| Final Displacement Error (egocentric view) | 112 | 102 | 86 |
BiTraP: Yao, Y., Atkins, E., Johnson-Roberson, M., Vasudevan, R. and Du, X., 2021. BiTraP: Bi-directional pedestrian trajectory prediction with multi-modal goal estimation. IEEE Robotics and Automation Letters, 6(2), pp. 1463-1470.
SGNet: Wang, C., Wang, Y., Xu, M. and Crandall, D.J., 2022. Stepwise goal-driven networks for trajectory prediction. IEEE Robotics and Automation Letters, 7(2), pp. 2716-2723.
Pedestrian Intention Prediction Models
Key Features of VR-GCN
- Scene graph with 32 objects;
- CNN + GCN + LSTM as the main structure;
- Pose information incorporated.
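The GCN stage propagates information between scene-graph objects before the temporal (LSTM) stage. A minimal sketch of one graph-convolution step is below; the tiny three-node graph and the scalar weight (standing in for a learned weight matrix) are simplifying assumptions, not VR-GCN's actual layer.

```python
def gcn_layer(adj, feats, weight):
    """One graph-convolution step: each node's new feature is the mean of its
    own and its neighbors' features, scaled by a weight. A scalar weight
    stands in for the usual learned weight matrix."""
    n = len(feats)
    out = []
    for i in range(n):
        neigh = [feats[j] for j in range(n) if adj[i][j] or i == j]
        out.append(weight * sum(neigh) / len(neigh))
    return out

# Toy scene graph: pedestrian(0) -- ego-vehicle(1) -- traffic light(2)
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
feats = [1.0, 2.0, 3.0]
print(gcn_layer(adj, feats, weight=1.0))  # [1.5, 2.0, 2.5]
```

Stacking such layers lets information flow between objects that are not directly connected, which is why even a small scene graph can capture pedestrian-vehicle-infrastructure interactions.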

Key Features of TrEP
- Transformer-based feature extraction and encoding;
- Evidential learning for robust performance and calibrated uncertainty.
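One common formulation of evidential learning for classification maps non-negative evidence values to Dirichlet parameters, from which both the predicted probability and an explicit uncertainty mass follow. The sketch below shows this standard parameterization for the two-class (cross / not-cross) case; whether TrEP uses exactly this form is an assumption here.

```python
def evidential_binary(e_cross, e_not_cross):
    """Map evidence to Dirichlet parameters alpha = evidence + 1 for a
    two-class problem. The total evidence S yields both the expected crossing
    probability and an uncertainty mass u = K / S (K = 2 classes)."""
    alpha_cross = e_cross + 1.0
    alpha_not = e_not_cross + 1.0
    s = alpha_cross + alpha_not
    p_cross = alpha_cross / s   # expected probability of crossing
    uncertainty = 2.0 / s       # shrinks as total evidence grows
    return p_cross, uncertainty

# No evidence -> maximal uncertainty; strong evidence -> confident prediction.
print(evidential_binary(0.0, 0.0))    # (0.5, 1.0)
print(evidential_binary(18.0, 2.0))   # high p_cross, low uncertainty
```

This explicit uncertainty mass is what allows downstream planners to distinguish "confidently predicted to cross" from "not enough evidence either way".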

Algorithm Results on nuScenes Benchmark Dataset
| Method | Accuracy | Balanced Accuracy | F1 |
| --- | --- | --- | --- |
| VR-GCN | 0.74 | 0.61 | 0.64 |
| TrEP | 0.85 | 0.77 | 0.90 |
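The intention-prediction table reports three classification metrics. Their standard definitions from binary confusion counts are sketched below; the toy counts are illustrative, not results from the benchmark. Balanced accuracy matters here because crossing / not-crossing labels are typically imbalanced.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, balanced accuracy, and F1 from binary confusion counts.
    Balanced accuracy averages recall over both classes, so a model cannot
    score well by favoring the majority class."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    recall_pos = tp / (tp + fn)
    recall_neg = tn / (tn + fp)
    bal_acc = (recall_pos + recall_neg) / 2
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall_pos / (precision + recall_pos)
    return acc, bal_acc, f1

# Toy counts: 80 correctly predicted crossings, 20 missed, 10 false alarms,
# 90 correctly predicted non-crossings.
acc, bal, f1 = classification_metrics(tp=80, fp=10, fn=20, tn=90)
print(round(acc, 3), round(bal, 3), round(f1, 3))  # 0.85 0.85 0.842
```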
