Who is the assignee on this patent?

Toyota Res Inst Inc, Toyota Motor Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06V20/56. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 8, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Network for multisweep 3D detection

US12354342B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12354342-B2
Application number	US-202217733160-A
Country	US
Kind code	B2
Filing date	Apr 29, 2022
Priority date	Apr 29, 2022
Publication date	Jul 8, 2025
Grant date	Jul 8, 2025
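
The publication number appears on this page in two spellings, `US12354342B2` and `US-12354342-B2`. As a minimal sketch, assuming only these two spellings need to be handled, the following splits either one into country, serial number, and kind code. The `PublicationNumber` type, the `parse_publication_number` function, and the regular expression are illustrative assumptions, not a general parser for every patent office format.

```python
import re
from typing import NamedTuple


class PublicationNumber(NamedTuple):
    """Hypothetical container for the three parts shown in the metadata table."""
    country: str
    number: str
    kind: str


# Assumed layout: two-letter country code, digits, then a kind code such as B2.
# Hyphens between the parts are optional so both spellings on this page match.
_PATTERN = re.compile(r"^([A-Z]{2})-?(\d+)-?([A-Z]\d?)$")


def parse_publication_number(text: str) -> PublicationNumber:
    """Split a publication number into country, serial number, and kind code."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized publication number: {text!r}")
    return PublicationNumber(*match.groups())


if __name__ == "__main__":
    print(parse_publication_number("US-12354342-B2"))
    print(parse_publication_number("US12354342B2"))
```

Both calls yield the same three fields, which match the Country and Kind code rows above.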

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and other embodiments described herein relate to a multi-task model that integrates recurrent models to improve handling of multi-sweep inputs. In one embodiment, a method includes acquiring sensor data from multiple modalities. The method includes separately encoding respective segments of the sensor data according to an associated one of the different modalities to form encoded features using separate encoders of a network. The method includes accumulating, in a detector, sparse features associated with sparse sensor inputs of the multiple modalities to densify the sparse features into dense features. The method includes providing observations according to the encoded features and the sparse features using the network.
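
To make the data flow in the abstract easier to follow, here is a toy sketch of its two central steps: applying a separate encoder to each available modality, and accumulating sparse depth maps from successive sweeps into a denser one. Everything in it is an assumption made for illustration: the encoder functions, the `keep` blending weight, and the hand-written recurrent update stand in for learned components and are not the network described in the patent.

```python
from typing import Callable, Dict, List, Optional

# A depth map as a grid of cells; None marks a cell with no sensor return.
Grid = List[List[Optional[float]]]


def encode_camera(pixels: List[float]) -> List[float]:
    """Toy camera encoder (hypothetical): scale 8-bit intensities to [0, 1]."""
    return [p / 255.0 for p in pixels]


def encode_radar(ranges: List[float]) -> List[float]:
    """Toy radar encoder (hypothetical): clip to an assumed 100 m and normalize."""
    return [min(r, 100.0) / 100.0 for r in ranges]


ENCODERS: Dict[str, Callable[[List[float]], List[float]]] = {
    "camera": encode_camera,
    "radar": encode_radar,
}


def encode_available(sensor_data: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Apply the matching encoder to each modality that is actually present."""
    return {name: ENCODERS[name](data)
            for name, data in sensor_data.items() if name in ENCODERS}


def accumulate_sweeps(sweeps: List[Grid], keep: float = 0.5) -> Grid:
    """Fold sparse depth maps into one denser map with a simple recurrent update.

    Where the running state and the new sweep both have a value, they are
    blended as keep * state + (1 - keep) * observation. A cell seen in only
    one of them keeps that value, so coverage can only grow over sweeps.
    """
    state = [row[:] for row in sweeps[0]]
    for sweep in sweeps[1:]:
        for i, row in enumerate(sweep):
            for j, obs in enumerate(row):
                if obs is None:
                    continue
                prev = state[i][j]
                state[i][j] = obs if prev is None else keep * prev + (1 - keep) * obs
    return state


def density(grid: Grid) -> float:
    """Fraction of cells that hold a depth value."""
    cells = [c for row in grid for c in row]
    return sum(c is not None for c in cells) / len(cells)


if __name__ == "__main__":
    sweep_t0: Grid = [[10.0, None], [None, None]]
    sweep_t1: Grid = [[12.0, None], [None, 8.0]]
    dense = accumulate_sweeps([sweep_t0, sweep_t1])
    print(encode_available({"camera": [0.0, 255.0], "radar": [50.0, 250.0]}))
    print(dense, density(sweep_t0), density(dense))
```

In this toy run the accumulated map covers more cells than either single sweep, which is the sense of "densify" used above. A real system would use learned encoders and recurrent layers in place of these fixed rules.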

First claim

Opening claim text (preview).

What is claimed is:
1. A perception system, comprising: one or more processors; a memory communicably coupled to the one or more processors and storing instructions that, when executed by the one or more processors, cause the one or more processors to: acquire sensor data from multiple modalities; separately encode respective segments of the sensor data according to an associated one of the multiple modalities to form encoded features using separate encoders of a network; accumulate, in a detector, sparse features associated with sparse sensor inputs of the multiple modalities to densify the sparse features into dense features by processing depth maps from two separate time steps using a detector of the network that includes two separate pipelines for the respective depth maps; and provide observations according to the encoded features and the sparse features using the network.
2. The perception system of claim 1, wherein the instructions to acquire the sensor data from the multiple modalities include instructions to receive the sensor data from at least one multi-sweep sensor.
3. The perception system of claim 1, wherein accumulate the sparse features include instructions to the two separate pipelines having recurrent neural networks (RNNs).
4. The perception system of claim 3, wherein the instructions to provide the observations include instructions to process the sparse features according to a detection head and a flow head of the detector to generate three-dimensional bounding boxes and scene flow for the sensor data.
5. The perception system of claim 1, wherein the instructions to separately encode the sensor data include instructions to selectively apply separate encoders to the respective segments of the sensor data according to available ones of the different modalities and fusing the encoded features together using separate fusion heads.
6. The perception system of claim 1, wherein the instructions to provide the observations include instructions to apply separate decoders to the encoded features that are associated with different functions of the network.
7. The perception system of claim 1, wherein the instructions to accumulate the sparse features include instructions to generate depth maps for respective segments of the sensor data associated with sparse inputs, including at least one of radar and LiDAR.
8. The perception system of claim 1, wherein the instructions to accumulate include instructions to accumulate the sparse features using a parallel pipeline that includes separate recurrent neural networks operating on successive inputs from the encoders.
9. A non-transitory computer-readable medium including instructions that, when executed by one or more processors, cause the one or more processors to: acquire sensor data from multiple modalities; separately encode respective segments of the sensor data according to an associated one of the multiple modalities to form encoded features using separate encoders of a network; accumulate, in a detector, sparse features associated with sparse sensor inputs of the multiple modalities to densify the sparse features into dense features by processing depth maps from two separate time steps using a detector of the network that includes two separate pipelines for the respective depth maps; and provide observations according to the encoded features and the sparse features using the network.
10. The non-transitory computer-readable medium of claim 9, wherein the instructions to acquire the sensor data from the multiple modalities include instructions to receive the sensor data from at least one multi-sweep sensor.
11. The non-transitory computer-readable medium of claim 9, wherein the two separate pipelines having recurrent neural networks (RNNs).
12. The non-transitory computer-readable medium of claim 11, wherein the instructions to provide the observations include instructions to process the sparse features according to a detection head and a flow head of the detector to generate three-dimensional bounding boxes and scene flow for the sensor data.
13. The non-transitory computer-readable medium of claim 9, wherein the instructions to separately encode the sensor data include instructions to selectively apply separate encoders to the respective segments of the sensor data according to available ones of the different modalities and fusing the encoded features together using separate fusion heads.
14. A method, comprising: acquiring sensor data from multiple modalities; separately encoding respective segments of the sensor data according to an associated one of the multiple modalities to form encoded features using separate encoders of a network; accumulating, in a detector, sparse features associated with sparse sensor inputs of the multiple modalities to densify the sparse features into dense features by processing depth maps from two separate time steps using a detector of the network that includes two separate pipelines for the respective depth maps; and providing observations according to the encoded features and the sparse features using the network.
15. The method of claim 14, wherein acquiring the sensor data from the multiple modalities includes receiving the sensor data from at least one multi-sweep sensor.
16. The method of claim 14, wherein the two separate pipelines having recurrent neural networks (RNNs).
17. The method of claim 16, wherein providing the observations includes processing the sparse features according to a detection head and a flow head of the detector to generate three-dimensional bounding boxes and scene flow for the sensor data.
18. The method of claim 14, wherein separately encoding the sensor data includes selectively applying separate encoders to the respective segments of the sensor data according to available ones of the multiple modalities and fusing the encoded features together using separate fusion heads.
19. The method of claim 14, wherein providing the observations includes applying separate decoders to the encoded features that are associated with different functions of the network.
20. The method of claim 14, wherein accumulating the sparse features includes generating depth maps for respective segments of the sensor data associated with sparse inputs, including at least one of radar and LiDAR.

Assignees

Toyota Res Inst Inc, Toyota Motor Co Ltd

Inventors

Classifications

G01S7/417
involving the use of neural networks · CPC title
G01S13/865
Combination of radar systems with lidar systems · CPC title
G06V20/56 · Primary
exterior to a vehicle by using sensors mounted on the vehicle · CPC title
G01S17/931
of land vehicles · CPC title
G06V10/62
relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking · CPC title

Patent family

Related publications grouped by family.

View patent family 88512417
