Network for multisweep 3D detection

US12354342B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12354342-B2
Application number: US-202217733160-A
Country: US
Kind code: B2
Filing date: Apr 29, 2022
Priority date: Apr 29, 2022
Publication date: Jul 8, 2025
Grant date: Jul 8, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and other embodiments described herein relate to a multi-task model that integrates recurrent models to improve handling of multi-sweep inputs. In one embodiment, a method includes acquiring sensor data from multiple modalities. The method includes separately encoding respective segments of the sensor data according to an associated one of the different modalities to form encoded features using separate encoders of a network. The method includes accumulating, in a detector, sparse features associated with sparse sensor inputs of the multiple modalities to densify the sparse features into dense features. The method includes providing observations according to the encoded features and the sparse features using the network.
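The "separately encoding ... using separate encoders" step in the abstract can be pictured as a dispatch over whichever modalities are present. The following is a minimal sketch under stated assumptions, not the patented network: the modality names, the `ENCODERS` table, and the toy encoder functions are all hypothetical stand-ins for learned encoders.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]

# Hypothetical toy encoders. A real system would use learned networks;
# these only show that each modality gets its own encoding function.
def encode_camera(pixels: Sequence[float]) -> Vector:
    return [sum(pixels) / len(pixels), max(pixels)]

def encode_lidar(ranges: Sequence[float]) -> Vector:
    return [min(ranges), float(len(ranges))]

def encode_radar(ranges: Sequence[float]) -> Vector:
    return [min(ranges), max(ranges)]

# Assumed modality names; the patent text does not fix an exact set.
ENCODERS: Dict[str, Callable[[Sequence[float]], Vector]] = {
    "camera": encode_camera,
    "lidar": encode_lidar,
    "radar": encode_radar,
}

def encode_by_modality(sensor_data: Dict[str, Sequence[float]]) -> Dict[str, Vector]:
    """Encode each segment with the encoder for its modality.

    Modalities without a registered encoder, or with no data in this
    frame, are skipped rather than treated as errors.
    """
    return {
        modality: ENCODERS[modality](segment)
        for modality, segment in sensor_data.items()
        if modality in ENCODERS and segment
    }

# Example frame in which radar is unavailable.
frame = {"camera": [0.0, 1.0], "lidar": [5.0, 2.0, 9.0]}
encoded = encode_by_modality(frame)
```

Keeping the encoders in a lookup table means a missing sensor simply produces no entry in the result, which is one plausible way to handle a varying set of available modalities.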

First claim

Opening claim text (preview).

What is claimed is:

1. A perception system, comprising: one or more processors; a memory communicably coupled to the one or more processors and storing instructions that, when executed by the one or more processors, cause the one or more processors to: acquire sensor data from multiple modalities; separately encode respective segments of the sensor data according to an associated one of the multiple modalities to form encoded features using separate encoders of a network; accumulate, in a detector, sparse features associated with sparse sensor inputs of the multiple modalities to densify the sparse features into dense features by processing depth maps from two separate time steps using a detector of the network that includes two separate pipelines for the respective depth maps; and provide observations according to the encoded features and the sparse features using the network.

2. The perception system of claim 1, wherein the instructions to acquire the sensor data from the multiple modalities include instructions to receive the sensor data from at least one multi-sweep sensor.

3. The perception system of claim 1, wherein accumulate the sparse features include instructions to the two separate pipelines having recurrent neural networks (RNNs).

4. The perception system of claim 3, wherein the instructions to provide the observations include instructions to process the sparse features according to a detection head and a flow head of the detector to generate three-dimensional bounding boxes and scene flow for the sensor data.

5. The perception system of claim 1, wherein the instructions to separately encode the sensor data include instructions to selectively apply separate encoders to the respective segments of the sensor data according to available ones of the different modalities and fusing the encoded features together using separate fusion heads.

6. The perception system of claim 1, wherein the instructions to provide the observations include instructions to apply separate decoders to the encoded features that are associated with different functions of the network.

7. The perception system of claim 1, wherein the instructions to accumulate the sparse features include instructions to generate depth maps for respective segments of the sensor data associated with sparse inputs, including at least one of radar and LiDAR.

8. The perception system of claim 1, wherein the instructions to accumulate include instructions to accumulate the sparse features using a parallel pipeline that includes separate recurrent neural networks operating on successive inputs from the encoders.

9. A non-transitory computer-readable medium including instructions that, when executed by one or more processors, cause the one or more processors to: acquire sensor data from multiple modalities; separately encode respective segments of the sensor data according to an associated one of the multiple modalities to form encoded features using separate encoders of a network; accumulate, in a detector, sparse features associated with sparse sensor inputs of the multiple modalities to densify the sparse features into dense features by processing depth maps from two separate time steps using a detector of the network that includes two separate pipelines for the respective depth maps; and provide observations according to the encoded features and the sparse features using the network.

10. The non-transitory computer-readable medium of claim 9, wherein the instructions to acquire the sensor data from the multiple modalities include instructions to receive the sensor data from at least one multi-sweep sensor.

11. The non-transitory computer-readable medium of claim 9, wherein the two separate pipelines having recurrent neural networks (RNNs).

12. The non-transitory computer-readable medium of claim 11, wherein the instructions to provide the observations include instructions to process the sparse features according to a detection head and a flow head of the detector to generate three-dimensional bounding boxes and scene flow for the sensor data.

13. The non-transitory computer-readable medium of claim 9, wherein the instructions to separately encode the sensor data include instructions to selectively apply separate encoders to the respective segments of the sensor data according to available ones of the different modalities and fusing the encoded features together using separate fusion heads.

14. A method, comprising: acquiring sensor data from multiple modalities; separately encoding respective segments of the sensor data according to an associated one of the multiple modalities to form encoded features using separate encoders of a network; accumulating, in a detector, sparse features associated with sparse sensor inputs of the multiple modalities to densify the sparse features into dense features by processing depth maps from two separate time steps using a detector of the network that includes two separate pipelines for the respective depth maps; and providing observations according to the encoded features and the sparse features using the network.

15. The method of claim 14, wherein acquiring the sensor data from the multiple modalities includes receiving the sensor data from at least one multi-sweep sensor.

16. The method of claim 14, wherein the two separate pipelines having recurrent neural networks (RNNs).

17. The method of claim 16, wherein providing the observations includes processing the sparse features according to a detection head and a flow head of the detector to generate three-dimensional bounding boxes and scene flow for the sensor data.

18. The method of claim 14, wherein separately encoding the sensor data includes selectively applying separate encoders to the respective segments of the sensor data according to available ones of the multiple modalities and fusing the encoded features together using separate fusion heads.

19. The method of claim 14, wherein providing the observations includes applying separate decoders to the encoded features that are associated with different functions of the network.

20. The method of claim 14, wherein accumulating the sparse features includes generating depth maps for respective segments of the sensor data associated with sparse inputs, including at least one of radar and LiDAR.
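The independent claims describe densifying sparse features by processing depth maps from two time steps in two separate pipelines. The sketch below illustrates only the data flow, under loud assumptions: the hand-written "keep the latest observed depth" recurrence stands in for the recurrent neural networks named in the dependent claims, and the names `recurrent_accumulate`, `merge`, and `density` are hypothetical.

```python
from typing import List, Optional

# A sparse depth map: None marks a pixel with no radar/LiDAR return.
DepthMap = List[List[Optional[float]]]

def recurrent_accumulate(sweeps: List[DepthMap]) -> DepthMap:
    """Fold successive sparse sweeps into one state map.

    Toy recurrence (an assumption, not a learned RNN): the state keeps
    the most recently observed depth for every pixel.
    """
    rows, cols = len(sweeps[0]), len(sweeps[0][0])
    state: DepthMap = [[None] * cols for _ in range(rows)]
    for sweep in sweeps:
        for r in range(rows):
            for c in range(cols):
                if sweep[r][c] is not None:
                    state[r][c] = sweep[r][c]
    return state

def merge(prev: DepthMap, curr: DepthMap) -> DepthMap:
    """Combine both pipelines, preferring the current time step."""
    return [
        [c if c is not None else p for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]

def density(depth_map: DepthMap) -> float:
    """Fraction of pixels that carry a depth value."""
    cells = [v for row in depth_map for v in row]
    return sum(v is not None for v in cells) / len(cells)

# Two pipelines, one per time step, each fed its own sweeps.
prev_sweeps = [[[1.0, None], [None, None]], [[None, 2.0], [None, None]]]
curr_sweeps = [[[None, None], [3.0, None]]]

prev_state = recurrent_accumulate(prev_sweeps)  # pipeline for time step t-1
curr_state = recurrent_accumulate(curr_sweeps)  # pipeline for time step t
dense = merge(prev_state, curr_state)
```

In this toy run each individual sweep fills one pixel out of four, while the merged map fills three, which is the sense in which accumulation "densifies" sparse inputs. The claimed detection and flow heads would operate on such accumulated features and are not modeled here.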

Assignees

Toyota Res Inst Inc, Toyota Motor Co Ltd

Inventors

Classifications

  • involving the use of neural networks · CPC title

  • Combination of radar systems with lidar systems · CPC title

  • G06V20/56 · Primary

    exterior to a vehicle by using sensors mounted on the vehicle · CPC title

  • of land vehicles · CPC title

  • relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12354342B2 cover?
Systems, methods, and other embodiments described herein relate to a multi-task model that integrates recurrent models to improve handling of multi-sweep inputs. In one embodiment, a method includes acquiring sensor data from multiple modalities. The method includes separately encoding respective segments of the sensor data according to an associated one of the different modalities to form enco…
Who is the assignee on this patent?
Toyota Res Inst Inc, Toyota Motor Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V20/56. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 8, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).