System and method for pedestrian road crossing intention detection

US12423950B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12423950-B2
Application number  US-202318311393-A
Country             US
Kind code           B2
Filing date         May 3, 2023
Priority date       May 3, 2023
Publication date    Sep 23, 2025
Grant date          Sep 23, 2025

Abstract

Official abstract text for this publication.

A system for classifying a road crossing intention of a pedestrian includes a processor including a pretrained image encoder generating an image embedding based upon an input image. The system further includes a remote server device receiving the image embedding. The remote server device further references a plurality of pretrained image and text embeddings each corresponding to either a positive road crossing intention or a negative road crossing intention. The remote server device further determines a plurality of proximity values evaluating whether the input image is closer to the positive road crossing intention or the negative road crossing intention, evaluating the image embedding against each of the pretrained embeddings. The remote server device further classifies a road crossing intention of the pedestrian based upon the plurality of proximity values. The system further generates a road crossing intention output based upon the road crossing intention of the pedestrian.
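
The proximity comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: cosine distance is assumed as the proximity measure (the abstract does not name one), the three-dimensional embeddings are invented toy values, and the names cosine_distance and proximity_values are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Assumed proximity measure: 0.0 for the same direction, larger when farther apart."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def proximity_values(image_embedding, reference_embeddings):
    """One proximity value per pretrained reference embedding (image or text)."""
    return [cosine_distance(image_embedding, ref) for ref in reference_embeddings]


# Toy data (assumption): in practice these would come from the pretrained encoders.
query = [0.9, 0.1, 0.0]                              # embedding of the input image
positive_refs = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]   # "positive intention" references
negative_refs = [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]   # "negative intention" references

pos = proximity_values(query, positive_refs)
neg = proximity_values(query, negative_refs)

# Simplest possible decision: the class whose references are closer on average.
label = "positive" if sum(pos) / len(pos) < sum(neg) / len(neg) else "negative"
print(label)
```

With these toy vectors the query points almost in the same direction as the positive references, so every positive distance is smaller than every negative one.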

First claim

Opening claim text (preview).

What is claimed is:

1. A computer vision system for classifying a road crossing intention of a pedestrian, the system comprising:
  a camera device providing an input image of the pedestrian;
  a processor including a pretrained image encoder configured for generating an image embedding based upon the input image;
  a remote server device configured for:
    receiving the image embedding from the processor;
    referencing a plurality of pretrained image embeddings corresponding to a positive road crossing intention;
    referencing a plurality of pretrained text caption embeddings corresponding to the positive road crossing intention;
    referencing a plurality of pretrained image embeddings corresponding to a negative road crossing intention;
    referencing a plurality of pretrained corresponding text embeddings corresponding to the negative road crossing intention;
    determining a plurality of proximity values evaluating whether the input image is closer to the positive road crossing intention or the negative road crossing intention, the plurality of proximity values including:
      a first portion of the plurality of proximity values including a proximity of the image embedding based upon the input image to each of the plurality of pretrained image embeddings corresponding to the positive road crossing intention;
      a second portion of the plurality of proximity values including a proximity of the image embedding based upon the input image to each of the plurality of pretrained text caption embeddings corresponding to the positive road crossing intention;
      a third portion of the plurality of proximity values including a proximity of the image embedding based upon the input image to each of the plurality of pretrained image embeddings corresponding to the negative road crossing intention; and
      a fourth portion of the plurality of proximity values including a proximity of the image embedding based upon the input image to each of the plurality of pretrained corresponding text embeddings corresponding to the negative road crossing intention; and
    classifying a road crossing intention of the pedestrian based upon the plurality of proximity values; and
  a computer vision controller configured for:
    receiving the road crossing intention of the pedestrian; and
    generating a road crossing intention output based upon the road crossing intention of the pedestrian.

2. The system of claim 1, wherein the remote server device is further configured for: evaluating a classification error of the classifying; and iteratively correcting the plurality of pretrained text caption embeddings corresponding to the positive road crossing intention or the plurality of pretrained text caption embeddings corresponding to the negative road crossing intention to minimize the classification error.

3. The system of claim 1, wherein the remote server device includes a neural network; and wherein the neural network is configured for classifying the road crossing intention.

4. The system of claim 3, wherein the neural network is trained to determine the plurality of proximity values.

5. The system of claim 3, wherein the neural network is configured for utilizing the plurality of proximity values as inputs; and wherein the neural network is further configured for classifying the road crossing intention as an output.

6. The system of claim 1, wherein classifying the road crossing intention of the pedestrian based upon the plurality of proximity values includes:
  determining an average proximity value of the image embedding corresponding to the positive road crossing intention as an average of the first portion and the second portion;
  determining an average proximity value of the image embedding corresponding to the negative road crossing intention as an average of the third portion and the fourth portion;
  determining a smaller of the average proximity value of the image embedding corresponding to the positive road crossing intention and the average proximity value of the image embedding corresponding to the negative road crossing intention to determine a minimum overall proximity measure; and
  classifying the road crossing intention based upon the minimum overall proximity measure.

7. The system of claim 1, wherein classifying the road crossing intention of the pedestrian based upon the plurality of proximity values includes:
  determining a minimum proximity value of the image embedding corresponding to the positive road crossing intention as a minimum value of the first portion and the second portion;
  determining a minimum proximity value of the image embedding corresponding to the negative road crossing intention as a minimum value of the third portion and the fourth portion;
  determining a smaller of the minimum proximity value of the image embedding corresponding to the positive road crossing intention and the minimum proximity value of the image embedding corresponding to the negative road crossing intention to determine a minimum overall proximity measure; and
  classifying the road crossing intention based upon the minimum overall proximity measure.

8. The system of claim 1, wherein classifying the road crossing intention of the pedestrian based upon the plurality of proximity values includes:
  determining a maximum proximity value of the image embedding corresponding to the positive road crossing intention as a maximum value of the first portion and the second portion;
  determining a maximum proximity value of the image embedding corresponding to the negative road crossing intention as a maximum value of the third portion and the fourth portion;
  determining a smaller of the maximum proximity value of the image embedding corresponding to the positive road crossing intention and the maximum proximity value of the image embedding corresponding to the negative road crossing intention to determine a minimum overall proximity measure; and
  classifying the road crossing intention based upon the minimum overall proximity measure.

9. The system of claim 1, wherein the pretrained text caption embeddings corresponding to the positive road crossing intention are trained with text captions including “intending to cross the road”, “intend to cross the road”, “about to cross the road”, “crossing the road”, “planning to cross the road”, “aiming to cross the road”, or “about to be on the road”.

10. The system of claim 1, wherein the pretrained image encoder is configured for generating the image embedding including data related to a sidewalk within the input image; and wherein the pretrained text caption embeddings corresponding to the negative road crossing intention are trained with text captions including “along the sidewalk”, “staying on the sidewalk”, “facing away from the road”, “away from the road”, or “remaining on the sidewalk”.

11. The system of claim 1, wherein the camera device is within a vehicle.

12. The system of claim 11, wherein the computer vision controller is within the vehicle; and wherein the vehicle generates an alert based upon the road crossing intention output.

13. The system of claim 1, wherein the computer vision controller is an infrastructure device within an operating environment of the pedestrian; and wherein the infrastructure device generates an alert based upon the road crossing intention output.

14. The system of claim 1, wherein the processor is within the remote server device.

15. The system of claim 1, wherein the processor is within the computer vision controller.

16. A computer vision system for classifying a road crossing intention of a pedestrian, the system comprising: a device inc
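
The three aggregation rules recited in claims 6 to 8 (average, minimum, maximum) differ only in how the per-class proximity values are reduced before the smaller of the two results is taken. The sketch below illustrates that structure under stated assumptions: a smaller proximity value is taken to mean "closer", the first and second portions (and the third and fourth) are pooled into one list before aggregating, ties go to the positive class, and the function name, labels, and numbers are hypothetical rather than taken from the patent.

```python
from statistics import mean

# Claim 6 -> "average", claim 7 -> "minimum", claim 8 -> "maximum".
AGGREGATORS = {"average": mean, "minimum": min, "maximum": max}


def classify_intention(pos_image, pos_text, neg_image, neg_text, rule="average"):
    """Reduce each class's proximity values with the chosen rule, then pick the smaller.

    Arguments are lists of proximity values (assumed to be distances):
    first, second, third, and fourth portions in the order of claim 1.
    Returns (label, minimum overall proximity measure).
    """
    aggregate = AGGREGATORS[rule]
    positive_score = aggregate(pos_image + pos_text)   # first + second portions
    negative_score = aggregate(neg_image + neg_text)   # third + fourth portions
    overall = min(positive_score, negative_score)      # minimum overall proximity measure
    label = "crossing" if positive_score <= negative_score else "not crossing"
    return label, overall


# Toy proximity values (assumption), chosen so the rules do not all agree.
portions = ([0.2, 0.4], [0.3], [0.1, 0.9], [0.8])

for rule in ("average", "minimum", "maximum"):
    label, overall = classify_intention(*portions, rule=rule)
    print(rule, label)
```

In this toy case one negative reference is unusually close to the input (0.1), so the minimum rule returns "not crossing" while the average and maximum rules return "crossing". The choice of rule therefore governs how sensitive the decision is to a single outlying reference.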

Assignees

GM Global Tech Operations LLC

Inventors

Classifications

  • using neural networks · CPC title

  • Static body considered as a whole, e.g. static pedestrian or occupant recognition · CPC title

  • G06V20/52 (Primary)

    Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69) · CPC title

  • G06V20/58 (Primary)

    Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title

  • where the origin of the information is a central station · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
GM Global Tech Operations LLC
What technology area does this patent fall under?
Primary CPC classification G06V20/52. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 23, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).