Pedestrian Behavior Prediction Models
Last Updated: 01/23/2025
Background and Objectives
- In most cases, automated vehicles (AVs) can drive smoothly on highways and freeways.
- However, AVs still face challenges in urban settings.
- One key challenge: the "unpredictable" and rapidly changing behaviors of pedestrians and other vulnerable road users.
- Existing driving strategies are overly conservative, aiming to avoid crashes based on short-term kinematic calculations.
- The main research objective is to better predict pedestrian behaviors with deep-learning algorithms, supporting driving decision-making during pedestrian encounters in complex road scenes.
Two-Tower Ego-Centric Pedestrian Trajectory Prediction
Key Features
- Multi-modal inputs;
- Two-tower model that decomposes egocentric pedestrian trajectories into ego-vehicle and pedestrian motion components;
- Inference of pedestrians' future moving directions.
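The decomposition idea behind the two towers can be illustrated in a few lines: an egocentric pedestrian track mixes apparent motion caused by the ego-vehicle with the pedestrian's own motion, and the model handles each source separately. The function name and the simple additive composition below are illustrative assumptions, not the model's actual architecture.

```python
def compose_egocentric(ego_offsets, ped_offsets, start):
    """Accumulate per-step (dx, dy) offsets from both motion sources into one
    egocentric track.

    ego_offsets: per-step shifts induced by the ego-vehicle's own movement
    ped_offsets: per-step shifts caused by the pedestrian's own movement
    start:       the pedestrian's current egocentric position (x, y)
    """
    x, y = start
    track = []
    for (ex, ey), (px, py) in zip(ego_offsets, ped_offsets):
        x, y = x + ex + px, y + ey + py
        track.append((x, y))
    return track

# Toy example: ego motion shifts the pedestrian left in the view while the
# pedestrian simultaneously walks downward.
future = compose_egocentric(
    ego_offsets=[(-2.0, 0.0)] * 3,
    ped_offsets=[(0.0, 1.5)] * 3,
    start=(100.0, 50.0),
)
print(future)  # [(98.0, 51.5), (96.0, 53.0), (94.0, 54.5)]
```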


Algorithm Results on JAAD Benchmark Dataset
| Method | Average Displacement Error | Final Displacement Error |
| --- | --- | --- |
| PIE | 22.83 | 49.44 |
| BiPed | 21.13 | 48.88 |
| Two-Tower Model | 17.92 | 41.33 |
PIE: Rasouli, A., Kotseruba, I., Kunic, T. and Tsotsos, J.K., 2019. PIE: A large-scale dataset and models for pedestrian intention estimation and trajectory prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6262-6271.
BiPed: Rasouli, A., Rohani, M. and Luo, J., 2021. Bifold and semantic reasoning for pedestrian behavior prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15600-15610.
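The benchmark tables report Average and Final Displacement Error between predicted and ground-truth trajectories. As a reference, here is how these standard metrics are typically computed; this is a pure-Python sketch of the textbook definitions, not code from the project.

```python
import math

def ade_fde(pred, gt):
    """ADE and FDE between equal-length trajectories of (x, y) points.

    ADE: mean Euclidean distance over all predicted time steps.
    FDE: Euclidean distance at the final time step only.
    """
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]

# Toy trajectories drifting apart over three steps.
pred = [(0, 0), (1, 0), (2, 0)]
gt = [(0, 0), (1, 1), (2, 2)]
ade, fde = ade_fde(pred, gt)
print(ade, fde)  # 1.0 2.0
```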
Algorithm Results on PSI Benchmark Dataset
| Method | Average Displacement Error | Final Displacement Error |
| --- | --- | --- |
| PIE | 35.39 | 61.50 |
| Two-Tower Model | 22.34 | 46.63 |
Dual-View Pedestrian Trajectory Prediction
Key Features
- Multi-modal inputs;
- Simultaneous prediction of bird's-eye-view trajectories, egocentric trajectories, and pedestrian actions;
- Multi-task learning to improve prediction accuracy.
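A common way to train such a multi-task model is to minimize a weighted sum of the per-task losses, so that the trajectory heads and the action head share and shape the same features. The weights and loss values below are illustrative hyperparameters, not values from the project.

```python
def multitask_loss(bev_err, ego_err, action_err,
                   w_bev=1.0, w_ego=1.0, w_act=0.5):
    """Weighted sum of per-task losses: bird's-eye-view trajectory error,
    egocentric trajectory error, and pedestrian-action classification error.
    The weights are illustrative hyperparameters."""
    return w_bev * bev_err + w_ego * ego_err + w_act * action_err

# Toy per-task loss values for one training batch.
total = multitask_loss(bev_err=0.8, ego_err=1.2, action_err=0.4)
print(total)  # 2.2
```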


Algorithm Results on nuScenes Benchmark Dataset
| Metric | BiTraP | SGNet | Ours |
| --- | --- | --- | --- |
| Average Displacement Error (bird's-eye view) | 49 | 46 | 28 |
| Final Displacement Error (bird's-eye view) | 57 | 55 | 41 |
| Average Displacement Error (egocentric view) | 92 | 89 | 61 |
| Final Displacement Error (egocentric view) | 112 | 102 | 86 |
BiTraP: Yao, Y., Atkins, E., Johnson-Roberson, M., Vasudevan, R. and Du, X., 2021. BiTraP: Bi-directional pedestrian trajectory prediction with multi-modal goal estimation. IEEE Robotics and Automation Letters, 6(2), pp. 1463-1470.
SGNet: Wang, C., Wang, Y., Xu, M. and Crandall, D.J., 2022. Stepwise goal-driven networks for trajectory prediction. IEEE Robotics and Automation Letters, 7(2), pp. 2716-2723.
Pedestrian Intention Prediction Models
Key Features of VR-GCN
- Scene graph with 32 objects;
- CNN + GCN + LSTM as the main structure;
- Pose information incorporated.
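The GCN stage propagates information between scene-graph objects before the temporal (LSTM) stage. A minimal sketch of one graph-convolution step is below; the tiny three-node graph and the scalar weight (standing in for a learned weight matrix) are simplifying assumptions, not VR-GCN's actual layer.

```python
def gcn_layer(adj, feats, weight):
    """One graph-convolution step: each node's new feature is the mean of its
    own and its neighbors' features, scaled by a weight. A scalar weight
    stands in for the usual learned weight matrix."""
    n = len(feats)
    out = []
    for i in range(n):
        neigh = [feats[j] for j in range(n) if adj[i][j] or i == j]
        out.append(weight * sum(neigh) / len(neigh))
    return out

# Toy scene graph: pedestrian(0) -- ego-vehicle(1) -- traffic light(2)
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
feats = [1.0, 2.0, 3.0]
print(gcn_layer(adj, feats, weight=1.0))  # [1.5, 2.0, 2.5]
```

Stacking such layers lets information flow between objects that are not directly connected, which is why even a small scene graph can capture pedestrian-vehicle-infrastructure interactions.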

Key Features of TrEP
- Transformer-based feature extraction and encoding;
- Evidential learning for robust performance and calibrated uncertainty.
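One common formulation of evidential learning for classification maps non-negative evidence values to Dirichlet parameters, from which both the predicted probability and an explicit uncertainty mass follow. The sketch below shows this standard parameterization for the two-class (cross / not-cross) case; whether TrEP uses exactly this form is an assumption here.

```python
def evidential_binary(e_cross, e_not_cross):
    """Map evidence to Dirichlet parameters alpha = evidence + 1 for a
    two-class problem. The total evidence S yields both the expected crossing
    probability and an uncertainty mass u = K / S (K = 2 classes)."""
    alpha_cross = e_cross + 1.0
    alpha_not = e_not_cross + 1.0
    s = alpha_cross + alpha_not
    p_cross = alpha_cross / s   # expected probability of crossing
    uncertainty = 2.0 / s       # shrinks as total evidence grows
    return p_cross, uncertainty

# No evidence -> maximal uncertainty; strong evidence -> confident prediction.
print(evidential_binary(0.0, 0.0))    # (0.5, 1.0)
print(evidential_binary(18.0, 2.0))   # high p_cross, low uncertainty
```

This explicit uncertainty mass is what allows downstream planners to distinguish "confidently predicted to cross" from "not enough evidence either way".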

Algorithm Results on nuScenes Benchmark Dataset
| Method | Accuracy | Balanced Accuracy | F1 |
| --- | --- | --- | --- |
| VR-GCN | 0.74 | 0.61 | 0.64 |
| TrEP | 0.85 | 0.77 | 0.90 |
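The intention-prediction table reports three classification metrics. Their standard definitions from binary confusion counts are sketched below; the toy counts are illustrative, not results from the benchmark. Balanced accuracy matters here because crossing / not-crossing labels are typically imbalanced.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, balanced accuracy, and F1 from binary confusion counts.
    Balanced accuracy averages recall over both classes, so a model cannot
    score well by favoring the majority class."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    recall_pos = tp / (tp + fn)
    recall_neg = tn / (tn + fp)
    bal_acc = (recall_pos + recall_neg) / 2
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall_pos / (precision + recall_pos)
    return acc, bal_acc, f1

# Toy counts: 80 correctly predicted crossings, 20 missed, 10 false alarms,
# 90 correctly predicted non-crossings.
acc, bal, f1 = classification_metrics(tp=80, fp=10, fn=20, tn=90)
print(round(acc, 3), round(bal, 3), round(f1, 3))  # 0.85 0.85 0.842
```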
