Detecting and identifying objects represented in sensor data generated by multiple sensor systems

US12597262B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12597262-B2
Application number  US-202217899734-A
Country             US
Kind code           B2
Filing date         Aug 31, 2022
Priority date       Jul 13, 2022
Publication date    Apr 7, 2026
Grant date          Apr 7, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system includes a first sensor system of a first modality and a second sensor system of a second modality. The system further includes a computing system that is configured to detect and identify objects represented in sensor signals output by the first and second sensor systems. The computing system employs a hierarchical arrangement of transformers to fuse features of first sensor data output by the first sensor system and second sensor data output by the second sensor system.
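
The hierarchical arrangement described in the abstract can be pictured as two per-modality stages feeding a third fusion stage. The sketch below only illustrates that data flow. The `Detection` type, the stub stages, and the nearest-neighbour fusion rule are all hypothetical stand-ins chosen for brevity; they are not the transformers the patent describes.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical record: an object identity plus a 2-D location."""
    label: str
    location: Tuple[float, float]


Stage = Callable[[object], List[Detection]]
Fusion = Callable[[List[Detection], List[Detection]], List[Detection]]


def hierarchical_detect(first_data, second_data,
                        first_stage: Stage, second_stage: Stage,
                        fusion_stage: Fusion) -> List[Detection]:
    """Run one stage per modality, then fuse the two outputs."""
    first_output = first_stage(first_data)
    second_output = second_stage(second_data)
    return fusion_stage(first_output, second_output)


# Stub stages (assumptions): each returns fixed detections instead of
# running a real model on the sensor data.
def camera_stub(_image) -> List[Detection]:
    return [Detection("car", (10.0, 2.0))]


def radar_stub(_points) -> List[Detection]:
    return [Detection("car", (10.5, 2.5)), Detection("unknown", (40.0, -3.0))]


def fusion_stub(first_output, second_output) -> List[Detection]:
    """Stand-in fusion: pair each first-stage detection with the nearest
    second-stage detection and average their locations if within 1.0 unit."""
    fused = []
    for a in first_output:
        b = min(second_output, key=lambda d: math.dist(a.location, d.location))
        if math.dist(a.location, b.location) <= 1.0:
            mid = tuple((x + y) / 2 for x, y in zip(a.location, b.location))
            fused.append(Detection(a.label, mid))
    return fused


fused = hierarchical_detect(None, None, camera_stub, radar_stub, fusion_stub)
print(fused)
```

In the patent, each of the three stages is a transformer and the fusion stage uses attention rather than a distance threshold; the stubs here exist only so the pipeline shape can be run end to end.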

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a first sensor system that generates first sensor data, the first sensor data corresponding to a first modality; a second sensor system that generates second sensor data, the second sensor data corresponding to a second modality; a computing system that is in communication with the first sensor system and the second sensor system, wherein the computing system comprises: a processor; and memory that stores computer-executable instructions that, when executed by the processor, cause the processor to perform acts comprising: generating, by a first transformer, a first output based upon the first sensor data, wherein the first output comprises identities of objects determined by the first transformer to be represented in the first sensor data and corresponding locations of the objects in the first sensor data; generating, by a second transformer, a second output based upon the second sensor data, wherein the second output comprises identities of objects determined by the second transformer to be represented in the second sensor data and corresponding locations of the objects in the second sensor data; and generating, by a third transformer, a third output based upon the first output and the second output, wherein the third transformer comprises an encoder and a decoder, wherein the encoder processes the second output and the decoder, using cross-attention and self-attention, processes the first output, wherein the cross-attention correlates objects between the first output and the second output, wherein the self-attention evaluates consistency of relationships between the correlated objects, and wherein the third output comprises identities of objects determined by the third transformer to be in an environment of the system and corresponding locations of the objects in the environment of the system.

2. The system of claim 1, wherein the first sensor system is a camera and the second sensor system is a radar sensor system.

3. The system of claim 1, wherein at least one of the first sensor system or the second sensor system is a lidar system.

4. The system of claim 1, the acts further comprising: extracting features from the first sensor data; and providing the features and positional encodings to the first transformer, wherein the first transformer generates the first output based upon the extracted features and the positional encodings.

5. The system of claim 4, the acts further comprising: extracting second features from the second sensor data; and providing the second features and second positional encodings to the second transformer, wherein the second transformer generates the second output based upon the extracted second features and the second positional encodings.

6. The system of claim 1, wherein the first sensor data is an image and the second sensor data is a point cloud.

7. The system of claim 1, wherein: the first transformer outputs the first output comprising first vectors, where each vector in the first vectors corresponds to a respective region in the first sensor data and each vector in the first vectors indicates a type of object predicted as being included in the region in the first sensor data; the second transformer outputs the second output comprising second vectors, wherein each vector in the second vectors corresponds to a respective region in the second sensor data and each vector in the second vectors indicates a type of object predicted as being included in the region in the second sensor data; and the third transformer receives the first vectors and the second vectors as input data and outputs the third output comprising third vectors, where each vector in the third vectors corresponds to a respective region in the environment of the system and each vector in the third vectors indicates a type of object predicted as being included in the region in the environment of the system.

8. A method performed by a computing system, the method comprising: generating, by a first transformer, a first output based upon first sensor data generated by a first sensor system, wherein the first output comprises identities of objects determined by the first transformer to be represented in the first sensor data and corresponding locations of the objects in the first sensor data, and further wherein the first sensor data is in a first modality; generating, by a second transformer, a second output based upon second sensor data generated by a second sensor system, wherein the second output comprises identities of objects determined by the second transformer to be represented in the second sensor data and corresponding locations of the objects in the second sensor data, and further wherein the second sensor data is in a second modality; and generating, by a third transformer, a third output based upon the first output and the second output, wherein the third transformer comprises an encoder and a decoder, wherein the encoder processes the second output and the decoder, using cross-attention and self-attention, processes the first output, wherein the cross-attention correlates objects between the first output and the second output, wherein the self-attention evaluates consistency of relationships between the correlated objects, and wherein the third output comprises identities of objects determined by the third transformer to be in an environment of the first sensor system and the second sensor system and corresponding locations of the objects in the environment of the first sensor system and the second sensor system.

9. The method of claim 8, wherein the first sensor system is a camera and the second sensor system is a radar sensor system.

10. The method of claim 8, wherein at least one of the first sensor system or the second sensor system is a lidar system.

11. The method of claim 8, further comprising: extracting features from the first sensor data; and providing the features and positional encodings to the first transformer, wherein the first transformer generates the first output based upon the extracted features and the positional encodings.

12. The method of claim 11, further comprising: extracting second features from the second sensor data; and providing the second features and second positional encodings to the second transformer, wherein the second transformer generates the second output based upon the extracted second features and the second positional encodings.

13. The method of claim 8, wherein the first sensor data is an image and the second sensor data is a point cloud.

14. The method of claim 8, wherein: the first transformer outputs the first output comprising first vectors, where each vector in the first vectors corresponds to a respective region in the first sensor data and each vector in the first vectors indicates a type of object predicted as being included in the region in the first sensor data; the second transformer outputs the second output comprising second vectors, wherein each vector in the second vectors corresponds to a respective region in the second sensor data and each vector in the second vectors indicates a type of object predicted as being included in the region in the second sensor data; and the third transformer receives the first vectors and the second vectors as input data and outputs the third output comprising third vectors, where each vector in the third vectors corresponds to a respective region in the environment of the first sensor system and the second sensor system and each vector in the third vectors indicates a type of object predicted as being included in the reg…
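
The independent claims have the third transformer's decoder apply cross-attention between the first output and the encoder-processed second output, then self-attention over the result. A minimal sketch of scaled dot-product attention may help make "correlates objects between the first output and the second output" concrete. Everything here is an assumption for illustration: a single head, no learned projections, and toy 2-D object embeddings. It is not the patented model.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector],
              values: List[Vector]) -> Tuple[List[Vector], List[List[float]]]:
    """Scaled dot-product attention; returns outputs and attention weights."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        # Each output is the weighted sum of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy object embeddings (hypothetical): two objects from the first
# modality, three from the second.
first_output = [[4.0, 0.0], [0.0, 4.0]]
second_output = [[0.0, 4.0], [4.0, 0.0], [1.0, 1.0]]

# Cross-attention: queries from the first output, keys and values from
# the second output.
attended, weights = attention(first_output, second_output, second_output)
best_matches = [row.index(max(row)) for row in weights]

# Self-attention: the same operation with queries, keys, and values all
# taken from the cross-attended vectors.
refined, _ = attention(attended, attended, attended)
print(best_matches)
```

Each row of `weights` sums to 1, and the largest entry in a row marks which second-modality object a given first-modality object attends to most strongly. A trained model would add learned query, key, and value projections, multiple heads, and feed-forward layers.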

Assignees

GM Cruise Holdings LLC

Inventors

Classifications

  • of land vehicles · CPC title

  • Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders · CPC title

  • of land vehicles · CPC title

  • of aircraft or spacecraft · CPC title

  • of marine craft · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12597262B2 cover?
A system includes a first sensor system of a first modality and a second sensor system of a second modality. The system further includes a computing system that is configured to detect and identify objects represented in sensor signals output by the first and second sensor systems. The computing system employs a hierarchical arrangement of transformers to fuse features of first sensor data outp…
Who is the assignee on this patent?
GM Cruise Holdings LLC
What technology area does this patent fall under?
Primary CPC classification G06V20/56. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 7, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).