Human Cognition Models to Inspire AVs in Interaction Scenes
Last Updated: 01/22/2025
Zhang, Z., Elahi, M., Domeyer, J., and Tian, R., “Driver Temporal Segmentation of Pedestrian Crossing Intentions during Negotiations,” in IEEE Transactions on Intelligent Vehicles, (Under Review)
Issues of Pedestrian Behavior Prediction Models
- Limited generalizability under inherent behavioral uncertainty and contingency
- Lower accuracy in predicting sudden behavior changes
- Reduced performance for longer prediction horizons (2-6 seconds)
- AV-pedestrian negotiation imposes higher requirements than basic safety functionalities
Possible Solutions
- Generative models for pedestrian trajectories
- Generating multiple trajectories or trajectory heatmap
- Rethinking how human drivers negotiate with pedestrians
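The first two solutions can be sketched as follows: a minimal generative baseline (here, a constant-velocity prior with Gaussian acceleration noise, a placeholder for a learned model) samples multiple future pedestrian trajectories and rasterizes them into an occupancy heatmap. All function names and parameters are illustrative assumptions, not part of the cited work.

```python
import numpy as np

def sample_trajectories(pos, vel, horizon=12, k=50, noise=0.3, rng=None):
    """Sample k plausible future trajectories with a constant-velocity
    prior plus Gaussian acceleration noise (placeholder generative model)."""
    rng = rng or np.random.default_rng(0)
    trajs = np.zeros((k, horizon, 2))
    for i in range(k):
        p, v = np.array(pos, float), np.array(vel, float)
        for t in range(horizon):
            v = v + rng.normal(0.0, noise, 2)   # random acceleration
            p = p + v                            # integrate position
            trajs[i, t] = p
    return trajs

def trajectory_heatmap(trajs, grid=20, extent=40.0):
    """Rasterize sampled trajectories into a normalized occupancy heatmap."""
    heat = np.zeros((grid, grid))
    cells = np.clip(((trajs + extent / 2) / extent * grid).astype(int), 0, grid - 1)
    for x, y in cells.reshape(-1, 2):
        heat[y, x] += 1
    return heat / heat.sum()
```

A real system would replace the constant-velocity prior with a learned generative model, but the multi-sample-then-rasterize structure stays the same.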
Driver Scene Understanding Model
We propose the event-segmentation-based scene understanding model based on the Theory of Mind to explain driver cognition during pedestrian interactions.
Main Assumption: driver and pedestrian negotiate crossing intentions
- Intention is a commitment to certain actions within a time boundary
- Pedestrians have present-oriented (low-level) and future-oriented (high-level) intentions
- Pedestrian Situated Intent (PSI) is the pedestrian’s intention to cross the conflicting area before the ego-vehicle in dynamically changing situations involving the car, pedestrian, and contextual environment.

- Step 1: A driver automatically segments perceptual inputs at a coarse level (pedestrian intention).
- Step 2: Within each segment, drivers can predict fine-level events (i.e., pedestrian actions) more accurately by comparing working memory with long-term memory.
- Step 3: Coarse-level segmentation boundaries are identified when the prediction of fine-level events is no longer accurate, meaning estimated pedestrian intention changes.
- Step 4: Working memory is updated to rebuild the coarse-level segment (pedestrian intention) boundaries, and the process loops back to Step 1.
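The four steps above can be sketched as a prediction-error-driven segmentation loop: working memory predicts the next fine-level observation, and when prediction fails a coarse-level (intent) boundary is declared. The mean-predictor and the error threshold are simplifying assumptions for illustration, not the model's actual mechanism.

```python
import numpy as np

def segment_stream(observations, threshold=1.0):
    """Event-segmentation sketch of Steps 1-4: predict the next fine-level
    observation from the current segment's working memory; a large
    prediction error marks a coarse-level (intent) boundary."""
    boundaries = []
    working_memory = [observations[0]]
    for t in range(1, len(observations)):
        predicted = np.mean(working_memory, axis=0)  # Step 2: fine-level prediction
        error = np.linalg.norm(observations[t] - predicted)
        if error > threshold:                        # Step 3: prediction fails
            boundaries.append(t)                     # boundary = intent change
            working_memory = [observations[t]]       # Step 4: rebuild working memory
        else:
            working_memory.append(observations[t])   # within-segment update
    return boundaries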
Video Experiment Process
- Ask a group of representative human drivers to estimate changes in pedestrian situated intent while watching prerecorded pedestrian-encounter videos from the driver's view.
- From the first frame to the last frame during the pedestrian encounter
- Each human driver needs to estimate the pedestrian’s intent to cross in front of the car
- Provide descriptions about the reasoning process when the intent estimation changes
- Provide driving decisions
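The per-frame judgments collected above can be represented with a simple record type; intent-estimate changes across consecutive frames then give the segment boundaries. The label set and field names here are assumptions made for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass

INTENT_LABELS = {"cross", "not_cross", "not_sure"}  # assumed label set

@dataclass
class IntentAnnotation:
    """One driver's frame-level judgment during a pedestrian encounter."""
    frame: int
    estimation: str            # one of INTENT_LABELS
    reasoning: str = ""        # free-text explanation when the estimate changes
    driving_decision: str = "" # e.g. "maintain speed", "slow down"

    def __post_init__(self):
        if self.estimation not in INTENT_LABELS:
            raise ValueError(f"unknown intent label: {self.estimation}")

def change_points(annotations):
    """Frames where a driver's intent estimate changes (segment boundaries)."""
    return [b.frame for a, b in zip(annotations, annotations[1:])
            if a.estimation != b.estimation]
```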

Estimation: Not sure
The pedestrian is standing between the two lanes and obeying traffic. The car ahead is slowing down. It is a busy road with fast-moving traffic.

Estimation: Not cross
The pedestrian looks like a child. He is still standing between the two lanes and obeying traffic. He has been looking back toward the other side, with his body facing diagonally and his feet pointed in the same direction. The car ahead has already stopped, but cars are still passing in his path.

Estimation: Cross
The pedestrian is still standing between the two lanes and making back-and-forth movements. He may be looking for an opportunity to cross. Someone in the car ahead might be calling him as well. Cars have slowed down, so he may jump to the side.

Estimation: Not cross
The pedestrian has been looking to cross to the other side. Now that the opposite lanes are empty, he has started to run to the other side and will not pass in front of this car, even though it is closer to this side.
Experiment and Data Analysis Process

- Elahi, M.F., Luo, X. and Tian, R., 2020, July. A framework for modeling knowledge graphs via processing natural descriptions of vehicle-pedestrian interactions. In International Conference on Human-Computer Interaction (pp. 40-50). Cham: Springer International Publishing.
- Elahi, M.F., Sreeram, J.G., Luo, X. and Tian, R., 2021, September. A Novel Adaptation of Information Extraction Algorithm to Process Natural Text Descriptions of Pedestrian Encounters. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC) (pp. 1906-1912). IEEE.
- Sreeram, J.G., Luo, X. and Tian, R., 2021. Contextual and Behavior Factors Extraction from Pedestrian Encounter Scenes Using Deep Language Models. In Big Data Analytics and Knowledge Discovery: 23rd International Conference, DaWaK 2021, Virtual Event, September 27–30, 2021, Proceedings 23 (pp. 131-136). Springer International Publishing.
- Elahi, M., Tian, R., and Luo, X., 2022. Flexible and Scalable Annotation Tool to Develop Scene Understanding Datasets. Workshop on Human-in-the-Loop Data Analytics (HILDA 2022), ACM SIGMOD/PODS Conference, June 12-17, Philadelphia, PA.
- Elahi, M., Jing, T., Ding, Z., and Tian, R., MinDReaD: Mining Decision-Making Reasoning Data at Micro Level, International Journal of Human-Computer Interaction, (Under Revision).
Demo of Experiment Results
Benchmark Dataset
- Pedestrian Situated Intent (PSI) Benchmark Dataset (http://situated-intent.net/pedestrian_dataset/)
- 210 videos randomly sampled from a naturalistic driving dataset
- 75 subjects
- Age ranges from 19 to 77
- Personality and driving styles are recorded for all the subjects
- Each subject completed 1.5 hours of training and 15 hours of the video annotation experiment
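With 75 subjects annotating the same videos, per-frame labels can be aggregated into a consensus intent with an agreement ratio. This is a hedged sketch: the PSI dataset's actual file format and aggregation protocol are not specified here, so the input structure (a frame-to-labels mapping) is an assumption.

```python
from collections import Counter

def majority_intent(frame_labels):
    """Consensus intent per frame across annotators, with agreement ratio.
    `frame_labels` maps frame index -> list of labels from different subjects
    (an assumed structure, not the dataset's actual format)."""
    consensus = {}
    for frame, labels in frame_labels.items():
        label, count = Counter(labels).most_common(1)[0]  # modal label
        consensus[frame] = (label, count / len(labels))   # agreement in [0, 1]
    return consensus
```

Frames with low agreement ratios are natural candidates for the "Not sure" cases illustrated in the demo above.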
