Human Cognition Models to Inspire AVs in Interaction Scenes

Last Updated: 01/22/2025

Zhang, Z., Elahi, M., Domeyer, J., and Tian, R., “Driver Temporal Segmentation of Pedestrian Crossing Intentions during Negotiations,” in IEEE Transactions on Intelligent Vehicles, (Under Review)

Driver Scene Understanding Model

We propose an event-segmentation-based scene understanding model, grounded in the Theory of Mind, to explain driver cognition during pedestrian interactions.

Main assumption: the driver and pedestrian negotiate crossing intentions

  • Intention is a commitment to certain actions within a time boundary
  • Pedestrians have present-oriented (low-level) and future-oriented (high-level) intentions
  • Pedestrian Situated Intent (PSI) is the pedestrian’s intention to cross the conflicting area before the ego-vehicle in dynamically changing situations involving the car, pedestrian, and contextual environment.
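The notion of PSI above — a commitment to an action, at a present- or future-oriented level, within a time boundary — can be captured as a small data structure. This is an illustrative sketch only; the class and field names (`PedestrianSituatedIntent`, `will_cross_first`, `t_start`, `t_end`) are our own and not part of the published dataset schema.

```python
from dataclasses import dataclass
from enum import Enum


class IntentLevel(Enum):
    """The two intention levels named in the model."""
    PRESENT_ORIENTED = "low"   # present-oriented (low-level) intention
    FUTURE_ORIENTED = "high"   # future-oriented (high-level) intention


@dataclass
class PedestrianSituatedIntent:
    """PSI: a commitment to cross (or not cross) the conflict area
    before the ego-vehicle, bounded in time."""
    will_cross_first: bool   # cross the conflicting area before the ego-vehicle?
    level: IntentLevel       # present-oriented vs. future-oriented
    t_start: float           # start of the commitment's time boundary (s)
    t_end: float             # end of the commitment's time boundary (s)
```

A prediction module could then output one such record per pedestrian per time window, to be revised as the situation changes.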
Figure: Event-segmentation-based scene understanding model (a diagram of the four steps below).
  • Step 1: A driver automatically segments perceptual inputs at a coarse level (pedestrian intention).
  • Step 2: Within each segment, drivers can predict fine-level events (i.e., pedestrian actions) more accurately by comparing working memory with long-term memory.
  • Step 3: Coarse-level segmentation boundaries are identified when the prediction of fine-level events is no longer accurate, meaning estimated pedestrian intention changes.
  • Step 4: Working memory is updated to rebuild the coarse-level segment (pedestrian intention) boundaries, and the process loops back to Step 1.
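The four-step loop above can be sketched in code: fine-level predictions are checked against incoming observations, and a coarse segment boundary is opened wherever prediction error exceeds a threshold. This is a minimal illustrative sketch, not the authors' implementation; the function names, the scalar observation model, and the error threshold are all assumptions.

```python
def segmentation_loop(observations, predict, update_memory, error_threshold=0.5):
    """Place coarse-level segment boundaries where fine-level prediction fails.

    observations: sequence of perceptual inputs (here, simple scalars)
    predict: maps working memory -> predicted next observation (Step 2)
    update_memory: (working_memory, observation) -> new working memory
    Returns the indices where coarse segments (intention changes) begin.
    """
    working_memory = None
    boundaries = []
    for t, obs in enumerate(observations):
        if working_memory is None:
            # Step 1: open the first coarse segment from the initial input.
            working_memory = update_memory(None, obs)
            boundaries.append(t)
            continue
        # Step 2: predict the fine-level event from working memory.
        predicted = predict(working_memory)
        error = abs(predicted - obs)
        if error > error_threshold:
            # Step 3: prediction failure -> estimated intention has changed,
            # so mark a new coarse-level segment boundary here.
            boundaries.append(t)
            # Step 4: rebuild working memory for the new segment.
            working_memory = update_memory(None, obs)
        else:
            working_memory = update_memory(working_memory, obs)
    return boundaries
```

With a trivial "remember the last observation" memory, a step change in the observations produces a new boundary at the step, mirroring how an intention change is detected as a run of prediction failures.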

Experiment and Data Analysis Process

Figure: Flexible and scalable annotation tool for micro-level behaviors and reasonings, feeding an NLP-based human reasoning cue extraction algorithm.
  1. Elahi, M.F., Luo, X. and Tian, R., 2020, July. A framework for modeling knowledge graphs via processing natural descriptions of vehicle-pedestrian interactions. In International Conference on Human-Computer Interaction (pp. 40-50). Cham: Springer International Publishing.
  2. Elahi, M.F., Sreeram, J.G., Luo, X. and Tian, R., 2021, September. A Novel Adaptation of Information Extraction Algorithm to Process Natural Text Descriptions of Pedestrian Encounters. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC) (pp. 1906-1912). IEEE.
  3. Sreeram, J.G., Luo, X. and Tian, R., 2021. Contextual and Behavior Factors Extraction from Pedestrian Encounter Scenes Using Deep Language Models. In Big Data Analytics and Knowledge Discovery: 23rd International Conference, DaWaK 2021, Virtual Event, September 27–30, 2021, Proceedings 23 (pp. 131-136). Springer International Publishing.
  4. Elahi, M., Tian, R., and Luo, X., 2022. Flexible and Scalable Annotation Tool to Develop Scene Understanding Datasets. Workshop on Human-in-the-Loop Data Analytics (HILDA 2022), ACM SIGMOD/PODS Conference, June 12-17, Philadelphia, PA.
  5. Elahi, M., Jing, T., Ding, Z., and Tian, R., MinDReaD: Mining Decision-Making Reasoning Data at Micro Level, International Journal of Human-Computer Interaction, (Under Revision).

Benchmark Dataset

  • Pedestrian Situated Intent (PSI) Benchmark Dataset (http://situated-intent.net/pedestrian_dataset/)
  • 210 videos randomly sampled from a naturalistic driving dataset
  • 75 subjects
    • Ages range from 19 to 77
    • Personality and driving style were recorded for all subjects
    • Each subject completed 1.5 hours of training and 15 hours of the video annotation experiment