Multi-modal encoder channel fusion with cross-modality awareness

US2025029355A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2025029355-A1
Application number	US-202318354074-A
Country	US
Kind code	A1
Filing date	Jul 18, 2023
Priority date	Jul 18, 2023
Publication date	Jan 23, 2025
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This disclosure provides systems, methods, and devices for vehicle driving assistance systems that support image processing. In a first aspect, a method includes receiving an image frame representing a scene; receiving point cloud data representing the scene; determining first sets of image frame features; determining second sets of point cloud data features based on a plurality of voxels representing the point cloud data; determining a third set of features of the image frame based on a first set of features of the plurality of first sets of features of the image frame and a second set of features of the plurality of second sets of features of the point cloud data; and outputting fused data that combines the third set of features of the image frame and a fourth set of features of the point cloud data. Other aspects and features are also claimed and described.
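The abstract's data flow (point cloud to voxels, per-modality features, fused output) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the patent's implementation: the function names `voxelize` and `fuse`, the use of per-voxel point counts as stand-in features, and plain concatenation as the fusion step are all hypothetical.

```python
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group (x, y, z) points into cubic voxels keyed by integer grid index."""
    voxels = defaultdict(list)
    for x, y, z in points:
        # Floor division maps each coordinate to its voxel index.
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)


def fuse(image_features, point_features):
    """Combine two feature vectors by concatenation (one simple fusion choice)."""
    return list(image_features) + list(point_features)


# Hypothetical inputs: three LiDAR-like points and a placeholder image feature vector.
points = [(0.2, 0.1, 0.0), (0.4, 0.3, 0.1), (1.6, 0.2, 0.0)]
voxels = voxelize(points, voxel_size=1.0)

# Per-voxel point counts stand in for learned point cloud features (assumption).
point_features = [len(voxels[key]) for key in sorted(voxels)]
image_features = [0.5, 0.25]

fused = fuse(image_features, point_features)
```

A real system would replace the point counts and the placeholder image vector with encoder outputs; the sketch only shows where voxelization and fusion sit in the flow.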

First claim

Opening claim text (preview).

What is claimed is:
1. A method for image processing for use in a vehicle assistance system, comprising: receiving an image frame representing a scene; receiving point cloud data representing the scene; determining a plurality of first sets of features of the image frame; determining a plurality of second sets of features of the point cloud data based on a plurality of voxels representing the point cloud data; determining a third set of features of the image frame based on a first set of features of the plurality of first sets of features of the image frame and a second set of features of the plurality of second sets of features of the point cloud data; and outputting fused data that combines the third set of features of the image frame and a fourth set of features of the point cloud data.
2. The method of claim 1, further comprising: determining the first set of features and the second set of features based on a first statistical indicator associated with the plurality of first sets of features of the image frame and a second statistical indicator associated with the plurality of second sets of features of the point cloud data.
3. The method of claim 1, wherein each of the plurality of first sets of features of the image frame corresponds to a respective stage of a plurality of first stages of a first encoder, and wherein each of the plurality of second sets of features of the point cloud data corresponds to a respective stage of a plurality of second stages of a second encoder.
4. The method of claim 3, wherein the respective stage corresponding to the first set of features of the image frame corresponds to the respective stage corresponding to the second set of features of the point cloud data.
5. The method of claim 1, wherein the second set of features includes a plurality of pairs of perspective view features of the point cloud data and BEV features of the point cloud data.
6. The method of claim 5, wherein the third set of features of the image frame is determined by an encoder, the method further comprising: adding perspective view features of a pair of the plurality of pairs as a first channel of the encoder; and adding BEV features of the pair as a second channel of the encoder.
7. The method of claim 5, wherein the plurality of perspective view features and the plurality of BEV features are each determined from the plurality of voxels using global max pooling.
8. The method of claim 1, wherein the point cloud data is received from a ranging sensor.
9. The method of claim 1, further comprising detecting an object represented in the image frame based on the fused data.
10. The method of claim 9, further comprising controlling a function of a vehicle based on the object detected.
11. An apparatus, comprising: a memory storing processor-readable code; and at least one processor coupled to the memory, the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations including: receiving an image frame representing a scene; receiving point cloud data representing the scene; determining a plurality of first sets of features of the image frame; determining a plurality of second sets of features of the point cloud data based on a plurality of voxels representing the point cloud data; determining a third set of features of the image frame based on a first set of features of the plurality of first sets of features of the image frame and a second set of features of the plurality of second sets of features of the point cloud data; and outputting fused data that combines the third set of features of the image frame and a fourth set of features of the point cloud data.
12. The apparatus of claim 11, the operations further comprising: determining the first set of features and the second set of features based on first statistical indicators associated with the plurality of first sets of features of the image frame and second statistical indicators associated with the plurality of second sets of features of the point cloud data.
13. The apparatus of claim 11, wherein each of the plurality of first sets of features of the image frame corresponds to a respective stage of a plurality of first stages of a first encoder, and wherein each of the plurality of second sets of features of the point cloud data corresponds to a respective stage of a plurality of second stages of a second encoder.
14. The apparatus of claim 13, wherein the respective stage corresponding to the first set of features of the image frame corresponds to the respective stage corresponding to the second set of features of the point cloud data.
15. The apparatus of claim 12, wherein the second set of features includes a plurality of pairs of perspective view features of the point cloud data and BEV features of the point cloud data.
16. The apparatus of claim 15, wherein the third set of features of the image frame is determined by an encoder, the operations further comprising: adding perspective view features of a pair of the plurality of pairs as a first channel of the encoder; and adding BEV features of the pair as a second channel of the encoder.
17. The apparatus of claim 15, wherein the plurality of perspective view features and the plurality of BEV features are each determined from the plurality of voxels using global max pooling.
18. The apparatus of claim 11, wherein the point cloud data is received from a LiDAR sensor or a radar sensor.
19. The apparatus of claim 11, wherein the operations further including detecting an object represented in the image frame based on the fused data.
20. The apparatus of claim 19, wherein the operations further include controlling a function of a vehicle based on the object detected.
21. A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform operations comprising: receiving an image frame representing a scene; receiving point cloud data representing the scene; determining a plurality of first sets of features of the image frame; determining a plurality of second sets of features of the point cloud data based on a plurality of voxels representing the point cloud data; determining a third set of features of the image frame based on a first set of features of the plurality of first sets of features of the image frame and a second set of features of the plurality of second sets of features of the point cloud data; and outputting fused data that combines the third set of features of the image frame and a fourth set of features of the point cloud data.
22. The non-transitory, computer-readable medium of claim 21, the operations further comprising: determining the first set of features and the second set of features based on first statistical indicators associated with the plurality of first sets of features of the image frame and second statistical indicators associated with the plurality of second sets of features of the point cloud data.
23. The non-transitory, computer-readable medium of claim 21, wherein each of the plurality of third sets of features of the image frame corresponds to a respective stage of a plurality of first stages of a first encoder, and wherein each of the plurality of fourth sets of features of the point cloud data corresponds to a respective stage of a plurality of second stag
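Claims 7 and 17 describe deriving perspective view and BEV (bird's-eye view) features from the voxels by max pooling. One way to picture this is a rough sketch, under the assumption that each view is formed by taking the maximum along a single axis of a dense voxel grid. The axis convention (x forward, y lateral, z up), the scalar feature per voxel, and the function name `max_pool_views` are hypothetical choices for illustration; the claims do not specify them.

```python
def max_pool_views(grid):
    """Collapse a dense voxel grid into BEV and perspective-view maps.

    grid[x][y][z] holds one scalar feature per voxel, with x = forward (depth),
    y = lateral, z = height. This axis convention is an assumption.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    # BEV: take the max over height (z), leaving a top-down x-by-y map.
    bev = [[max(grid[x][y]) for y in range(ny)] for x in range(nx)]
    # Perspective view: take the max over depth (x), leaving a y-by-z map.
    pv = [
        [max(grid[x][y][z] for x in range(nx)) for z in range(nz)]
        for y in range(ny)
    ]
    return bev, pv


# Hypothetical 2 x 2 x 2 voxel grid of scalar features.
grid = [
    [[1, 5], [0, 2]],
    [[3, 4], [7, 0]],
]
bev, pv = max_pool_views(grid)
```

In this toy grid the BEV map keeps the strongest response in each vertical column, while the perspective view keeps the strongest response along each line of sight. Claims 6 and 16 then describe feeding such a pair into an encoder as two separate channels.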

Assignees

Qualcomm Inc

Inventors

Classifications

G06V2201/07
Target detection · CPC title
G06V10/758
Involving statistics of pixels or of feature values, e.g. histogram matching · CPC title
G06V10/806
of extracted features · CPC title
G06V10/44 · Primary
Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components · CPC title
G06V20/56
exterior to a vehicle by using sensors mounted on the vehicle · CPC title

Patent family

Related publications grouped by family.

View patent family 91700035

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025029355A1 cover?: This disclosure provides systems, methods, and devices for vehicle driving assistance systems that support image processing. In a first aspect, a method includes receiving an image frame representing a scene; receiving point cloud data representing the scene; determining first sets of image frame features; determining second sets of point cloud data features based on a plurality of voxels repre…
Who is the assignee on this patent?: Qualcomm Inc
What technology area does this patent fall under?: Primary CPC classification G06V10/44. Mapped technology areas include Physics.
When was this patent published?: Publication date Jan 23, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Method, apparatus for superimposing laser point clouds and high-precision map and electronic device

Dual mode map for autonomous vehicle

Determining traffic control features based on telemetry patterns within digital image representations of vehicle telemetry data

Three-Dimensional Object Detection
