Who is the assignee on this patent?

Beijing Baidu Netcom Sci & Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06V10/774. Mapped technology areas include Physics.

When was this patent published?

Publication date: July 1, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Joint perception model training method, joint perception method, device, and storage medium

US12346405B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12346405-B2
Application number	US-202218055393-A
Country	US
Kind code	B2
Filing date	Nov 14, 2022
Priority date	Mar 8, 2022
Publication date	Jul 1, 2025
Grant date	Jul 1, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided are a joint perception model training method, a joint perception method, a device, and a storage medium. The joint perception model training method includes: acquiring sample images and perception tags of the sample images; acquiring a preset joint perception model, where the joint perception model includes a feature extraction network and a joint perception network; performing feature extraction on the sample images through the feature extraction network to obtain target sample features; performing joint perception through the joint perception network according to the target sample features to obtain perception prediction results; and training the preset joint perception model according to the perception prediction results and the perception tags, where the joint perception includes executing at least two perception tasks.
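As a rough illustration of the training flow the abstract describes, the sketch below trains one shared feature extractor together with two task heads against a single combined loss. The scalar model, the squared-error loss, the learning rate, and every name in it are assumptions made for illustration; the patent does not specify any of them.

```python
# Toy joint training: one shared feature extractor feeding two task heads.
# The scalar weights (w, a, b), the loss, and the hyperparameters are
# hypothetical stand-ins, not details taken from the patent.

def joint_loss(w, a, b, samples):
    """Sum of squared errors over both tasks, using the shared feature f = w * x."""
    total = 0.0
    for x, tag_a, tag_b in samples:
        f = w * x                      # shared feature extraction
        total += (a * f - tag_a) ** 2  # head for perception task A
        total += (b * f - tag_b) ** 2  # head for perception task B
    return total

def train(samples, steps=500, lr=0.005):
    """Plain gradient descent on the joint loss; returns parameters and loss history."""
    w, a, b = 0.5, 0.5, 0.5
    history = [joint_loss(w, a, b, samples)]
    for _ in range(steps):
        gw = ga = gb = 0.0
        for x, tag_a, tag_b in samples:
            f = w * x
            err_a = a * f - tag_a
            err_b = b * f - tag_b
            # Both task errors flow back into the shared weight w.
            gw += 2 * err_a * a * x + 2 * err_b * b * x
            ga += 2 * err_a * f
            gb += 2 * err_b * f
        w, a, b = w - lr * gw, a - lr * ga, b - lr * gb
        history.append(joint_loss(w, a, b, samples))
    return (w, a, b), history

# Each sample: (input, tag for task A, tag for task B).
samples = [(1.0, 2.0, -1.0), (2.0, 4.0, -2.0), (3.0, 6.0, -3.0)]
params, history = train(samples)
```

The point of the sketch is only that the shared parameter receives gradient from both tasks in the same update, which is what distinguishes joint training from training each task's model separately.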

First claim

Opening claim text (preview).

What is claimed is:

1. A joint perception model training method executed by an electronic device and comprising: acquiring sample images and perception tags of the sample images; acquiring a preset deep learning joint perception model, wherein the preset deep learning joint perception model comprises a feature extraction network and a joint perception network; performing feature extraction on the sample images through the feature extraction network to obtain target sample features; performing joint perception through the joint perception network according to the target sample features to obtain perception prediction results; and training the preset deep learning joint perception model according to the perception prediction results and the perception tags, wherein the joint perception comprises executing at least two perception tasks; wherein the feature extraction network comprises a base module and at least two first multi-path modules; wherein performing feature extraction on the sample images through the feature extraction network to obtain the target sample features comprises: performing downsampling on the sample images through the base module to obtain initial sample features of each scale of at least two scales, wherein the at least two first multi-path modules have a one-to-one correspondence with the at least two scales; and for the initial sample features of each scale of at least two scales, performing feature extraction of different perception tasks on the initial sample features of the scale through a first multi-path module corresponding to the scale among the at least two first multi-path modules to obtain target sample features under the scale; wherein each first multi-path module of the at least two first multi-path modules comprises a first split subnetwork, a first feature extraction subnetwork, and a first fusion subnetwork; and wherein performing feature extraction of different perception tasks on the initial sample features of the scale through the first multi-path module corresponding to the scale among the at least two first multi-path modules to obtain the target sample features under the scale comprises: splitting the initial sample features of the scale through the first split subnetwork in the first multi-path module corresponding to the scale according to a channel dimension to obtain first to-be-fused sample features and first to-be-processed sample features; performing feature extraction on the first to-be-processed sample features through the first feature extraction subnetwork in the first multi-path module corresponding to the scale to obtain first target intermediate sample features; and performing feature fusion on the first to-be-fused sample features and the first target intermediate sample features through the first fusion subnetwork in the first multi-path module corresponding to the scale to obtain the target sample features under the scale.

2. The method according to claim 1, wherein the first feature extraction subnetwork comprises a first global perception module, a first local perception module, and a first perception fusion module; and performing feature extraction on the first to-be-processed sample features through the first feature extraction subnetwork in the first multi-path module corresponding to the scale to obtain the first target intermediate sample features comprises: performing global feature extraction on the first to-be-processed sample features through the first global perception module in the first multi-path module corresponding to the scale to obtain a first global intermediate sample feature; performing local feature extraction on the first to-be-processed sample features through the first local perception module in the first multi-path module corresponding to the scale to obtain first local intermediate sample features; and performing feature fusion on the first global intermediate sample feature and the first local intermediate sample features through the first perception fusion module in the first multi-path module corresponding to the scale to obtain the first target intermediate sample features.

3. The method according to claim 2, wherein the first local perception module comprises at least two first local perception branches; and performing local feature extraction on the first to-be-processed sample features through the first local perception module in the first multi-path module corresponding to the scale to obtain the first local intermediate sample features comprises: performing local feature extraction on the first to-be-processed sample features through different first local perception branches in the first multi-path module corresponding to the scale under different receptive fields to obtain first local intermediate sample features under corresponding receptive fields.

4. The method according to claim 3, wherein the first local perception module further comprises a first bypass branch; and performing local feature extraction on the first to-be-processed sample features through the first local perception module in the first multi-path module corresponding to the scale to obtain the first local intermediate sample features further comprises: in a case where the first bypass branch is a direct connection structure, directly taking the first to-be-processed sample features as the first local intermediate sample features; or, in a case where the first bypass branch comprises a first batch module, performing normalization processing on the first to-be-processed sample features through the first batch module in the first multi-path module corresponding to the scale to obtain the first local intermediate sample features.

5. The method according to claim 1, wherein the joint perception network comprises a detection head module; and performing joint perception through the joint perception network according to the target sample features to obtain the perception prediction results comprises: performing joint perception on the target sample features through different task perception branches in the detection head module to obtain perception prediction results of the perception tasks.

6. The method according to claim 5, wherein the detection head module comprises a second multi-path module and at least two task perception branches; and performing joint perception on the target sample features through the different task perception branches in the detection head module to obtain the perception prediction results of the perception tasks comprises: performing feature extraction of a same category of targets under different perception tasks on the target sample features through the second multi-path module to obtain perception sample features; and determining the perception prediction results of the perception tasks through each of the at least two task perception branches according to the perception sample features.

7. The method according to claim 6, wherein the second multi-path module comprises a second split subnetwork, a second feature extraction subnetwork, and a second fusion subnetwork; and performing feature extraction of the same category of targets under different perception tasks on the target sample features through the second multi-path module to obtain the perception sample features comprises: splitting the target sample features through the second split subnetwork according to a channel dimension to obtain second to-be-fused sample features and second to-be-processed sample features; performing feature extraction on the second to-be-processed sample features through the second feature extraction subnetwork to obtain second target intermediate sample features; and performing feature fusion on the second to-be-fused sample features and the second target intermediate s…
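The split, extract, and fuse structure recited for the multi-path module in claims 1 to 4 can be pictured with a small sketch. Everything concrete here is an assumption for illustration: features are modeled as a list of 1-D channels, the global branch is a channel mean, the local branches are moving averages with window sizes 3 and 5 standing in for different receptive fields, the bypass is the direct-connection variant, branch outputs are fused by summation, and the final fusion is channel concatenation. The claims do not fix any of these choices.

```python
# Illustrative multi-path module on a list of 1-D channels (pure Python).
# All function names and operators are hypothetical stand-ins.

def split_channels(features, keep):
    """Split along the channel dimension into (to_be_fused, to_be_processed)."""
    return features[:keep], features[keep:]

def global_branch(channel):
    """Global perception stand-in: broadcast the channel mean to every position."""
    mean = sum(channel) / len(channel)
    return [mean] * len(channel)

def local_branch(channel, window):
    """Local perception stand-in: moving average; the window plays the receptive field."""
    half = window // 2
    out = []
    for i in range(len(channel)):
        lo, hi = max(0, i - half), min(len(channel), i + half + 1)
        out.append(sum(channel[lo:hi]) / (hi - lo))
    return out

def extract(channel):
    """Fuse one global branch, two local branches, and a direct bypass by summation."""
    branches = [
        global_branch(channel),
        local_branch(channel, 3),
        local_branch(channel, 5),
        list(channel),  # bypass branch as a direct connection
    ]
    return [sum(values) for values in zip(*branches)]

def multi_path_module(features, keep):
    """Split, process the to-be-processed channels, then concatenate with the rest."""
    to_be_fused, to_be_processed = split_channels(features, keep)
    intermediate = [extract(channel) for channel in to_be_processed]
    return to_be_fused + intermediate

features = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]]
output = multi_path_module(features, keep=1)
```

In this toy run the first channel passes through untouched while the second is replaced by the summed branch outputs, so only part of the channels pays the cost of the heavier extraction path.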

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

G06N3/045: Combinations of networks
G06V10/82: using neural networks
G06V10/806: of extracted features
G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
G06N20/20: Ensemble learning
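For readers unfamiliar with how these codes are built, a CPC symbol is composed of a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. The sketch below splits a symbol into those parts; the function name, the regular expression, and the deliberately partial section-title table are illustrative assumptions, not part of this page's data.

```python
import re

# Section letter, two-digit class, subclass letter, main group, "/", subgroup.
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")

# Partial lookup for illustration only; the full scheme has more sections.
SECTION_TITLES = {"G": "Physics", "H": "Electricity"}

def parse_cpc(symbol):
    """Split a CPC symbol such as 'G06V10/774' into its hierarchical parts."""
    match = CPC_PATTERN.match(symbol)
    if match is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": SECTION_TITLES.get(section, "unknown"),
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }

parsed = parse_cpc("G06V10/774")
```

Here the leading "G" is what places the symbol in the Physics section.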

Patent family

Related publications grouped by family.

View patent family 81033009

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12346405B2 cover?

Provided are a joint perception model training method, a joint perception method, a device, and a storage medium. The joint perception model training method includes: acquiring sample images and perception tags of the sample images; acquiring a preset joint perception model, where the joint perception model includes a feature extraction network and a joint perception network; performing feature…

Related patents

Image processing method, electronic device and readable storage medium

Defending multimodal fusion models against single-source adversaries

Object recognition method and apparatus

Image processing method and device, training method of neural network, and storage medium

Systems and Methods for Autonomous Vehicle Systems Simulation

Multi-task perception network with applications to scene understanding and advanced driver-assistance system
