Detecting and identifying objects represented in sensor data generated by multiple sensor systems

US2024020983A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2024020983-A1
Application number: US-202217899734-A
Country: US
Kind code: A1
Filing date: Aug 31, 2022
Priority date: Jul 13, 2022
Publication date: Jan 18, 2024
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system includes a first sensor system of a first modality and a second sensor system of a second modality. The system further includes a computing system that is configured to detect and identify objects represented in sensor signals output by the first and second sensor systems. The computing system employs a hierarchical arrangement of transformers to fuse features of first sensor data output by the first sensor system and second sensor data output by the second sensor system.
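
The hierarchical arrangement described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: each "transformer" is reduced to a single weight-free self-attention pass, and the names `attend`, `modality_stage`, `fusion_stage`, `camera_tokens`, and `radar_tokens` are hypothetical. It shows two per-modality stages whose outputs are then combined by a third stage.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention with no learned weights (an assumption).

    Each output row is a weighted average of the rows in keys_values.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * kv[i] for w, kv in zip(weights, keys_values))
            for i in range(dim)
        ])
    return outputs


def modality_stage(tokens):
    """Stand-in for the first or second transformer: self-attention
    over the feature tokens of a single modality."""
    return attend(tokens, tokens)


def fusion_stage(first_output, second_output):
    """Stand-in for the third transformer: self-attention over the
    concatenated outputs of the two lower-level stages."""
    joint = first_output + second_output
    return attend(joint, joint)


# Hypothetical 4-dimensional feature tokens for each modality.
camera_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
radar_tokens = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.5, 0.5, 0.0, 0.0]]

fused = fusion_stage(modality_stage(camera_tokens), modality_stage(radar_tokens))
```

In a real system each stage would carry learned projections, multiple heads, and output heads that predict object identities and locations. The sketch keeps only the data flow: per-modality processing first, cross-modality fusion second.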

First claim

Opening claim text (preview).

What is claimed is:
1. A system comprising: a first sensor system that generates first sensor data, the first sensor data corresponding to a first modality; a second sensor system that generates second sensor data, the second sensor data corresponding to a second modality; a computing system that is in communication with the first sensor system and the second sensor system, wherein the computing system comprises: a processor; and memory that stores computer-executable instructions that, when executed by the processor, cause the processor to perform acts comprising: generating, by a first transformer, first output based upon the first sensor data, wherein the first output comprises identities of objects determined by the first transformer to be represented in the first sensor data; generating, by a second transformer, second output based upon the second sensor data, wherein the second output comprises identities of objects determined by the second transformer to be represented in the second sensor data; and generating, by a third transformer, third output based upon the first output and the second output, wherein the third output comprises identities of objects determined by the third transformer to be in an environment of the system.
2. The system of claim 1, wherein the third output further comprises locations of the identified objects.
3. The system of claim 1, wherein the first sensor system is a camera and the second sensor system is radar sensor system.
4. The system of claim 1, wherein at least one of the first sensor system or the second sensor system is a lidar system.
5. The system of claim 1, the acts further comprising: extracting features from the first sensor data; and providing the features and positional encodings to the first transformer, wherein the first transformer generates the first output based upon the extracted features and the positional encodings.
6. The system of claim 5, the acts further comprising: extracting second features from the second sensor data; and providing the second features and second positional encodings to the second transformer, wherein the second transformer generates the second output based upon the extracted second features and the second positional encodings.
7. The system of claim 1, wherein the first sensor data is an image and the second sensor data is a point cloud.
8. A method performed by a computing system, the method comprising: generating, by a first transformer, first output based upon first sensor data generated by a first sensor system, wherein the first output comprises identities of objects determined by the first transformer to be represented in the first sensor data, and further wherein the first sensor data is in a first modality; generating, by a second transformer, second output based upon second sensor data generated by a second sensor system, wherein the second output comprises identities of objects determined by the second transformer to be represented in the second sensor data, and further wherein the second sensor data is in a second modality; and generating, by a third transformer, third output based upon the first output and the second output, wherein the third output comprises identities of objects determined by the third transformer to be in an environment of the computing system.
9. The method of claim 8, wherein the third output further comprises locations of the identified objects.
10. The method of claim 8, wherein the first sensor system is a camera and the second sensor system is radar sensor system.
11. The method of claim 8, wherein at least one of the first sensor system or the second sensor system is a lidar system.
12. The method of claim 8, further comprising: extracting features from the first sensor data; and providing the features and positional encodings to the first transformer, wherein the first transformer generates the first output based upon the extracted features and the positional encodings.
13. The method of claim 12, further comprising: extracting second features from the second sensor data; and providing the second features and second positional encodings to the second transformer, wherein the second transformer generates the second output based upon the extracted second features and the second positional encodings.
14. The method of claim 8, wherein the first sensor data is an image and the second sensor data is a point cloud.
15. A computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to perform acts comprising: generating, by a first transformer, first output based upon first sensor data generated by a first sensor system, wherein the first output comprises identities of objects determined by the first transformer to be represented in the first sensor data, and further wherein the first sensor data is in a first modality; generating, by a second transformer, second output based upon second sensor data generated by a second sensor system, wherein the second output comprises identities of objects determined by the second transformer to be represented in the second sensor data, and further wherein the second sensor data is in a second modality; and generating, by a third transformer, third output based upon the first output and the second output, wherein the third output comprises identities of objects determined by the third transformer to be in an environment of the first sensor system and the second sensor system.
16. The computer-readable storage medium of claim 15, wherein the third output further comprises locations of the identified objects.
17. The computer-readable storage medium of claim 15, wherein the first sensor system is a camera and the second sensor system is radar sensor system.
18. The computer-readable storage medium of claim 15, wherein at least one of the first sensor system or the second sensor system is a lidar system.
19. The computer-readable storage medium of claim 15, the acts further comprising: extracting features from the first sensor data; and providing the features and positional encodings to the first transformer, wherein the first transformer generates the first output based upon the extracted features and the positional encodings.
20. The computer-readable storage medium of claim 19, the acts further comprising: extracting second features from the second sensor data; and providing the second features and second positional encodings to the second transformer, wherein the second transformer generates the second output based upon the extracted second features and the second positional encodings.
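
Claims 5, 6, 12, 13, 19, and 20 recite providing extracted features together with positional encodings to a transformer. The claims do not say which encoding scheme is used or how it is combined with the features, so the sketch below makes two assumptions: it uses the common sinusoidal encoding, and it adds the encoding to each feature vector element-wise. The names `sinusoidal_encoding`, `add_positional_encodings`, and `features` are hypothetical.

```python
import math


def sinusoidal_encoding(position, dim):
    """Sinusoidal positional encoding for one token position.

    Even indices use sine and odd indices use cosine, with wavelengths
    that grow geometrically across the feature dimension.
    """
    encoding = []
    for i in range(dim):
        angle = position / (10000 ** ((2 * (i // 2)) / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def add_positional_encodings(features):
    """Add a positional encoding to every feature vector (an assumed
    way of combining the two; concatenation would be another option)."""
    dim = len(features[0])
    return [
        [f + p for f, p in zip(feature, sinusoidal_encoding(pos, dim))]
        for pos, feature in enumerate(features)
    ]


# Hypothetical extracted features: three tokens of dimension 4, all zeros,
# so the result exposes the encodings themselves.
features = [[0.0] * 4 for _ in range(3)]
encoded = add_positional_encodings(features)
```

The encoded tokens would then be passed to the per-modality transformer. For image data the position could index a patch, and for a point cloud it could be derived from spatial coordinates; either choice is outside what the claims specify.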

Assignees

Inventors

Classifications

  • G06V20/56 · Primary

    exterior to a vehicle by using sensors mounted on the vehicle · CPC title

  • of land vehicles · CPC title

  • Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders · CPC title

  • of land vehicles · CPC title

  • G01S13/867 · Primary

    Combination of radar systems with cameras · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024020983A1 cover?
A system includes a first sensor system of a first modality and a second sensor system of a second modality. The system further includes a computing system that is configured to detect and identify objects represented in sensor signals output by the first and second sensor systems. The computing system employs a hierarchical arrangement of transformers to fuse features of first sensor data outp…
Who is the assignee on this patent?
GM Cruise Holdings LLC
What technology area does this patent fall under?
Primary CPC classification G06V20/56. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 18, 2024 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).