Joint perception model training method, joint perception method, device, and storage medium

US12346405B2 · US · B2

Patent metadata
Publication number: US-12346405-B2
Application number: US-202218055393-A
Country: US
Kind code: B2
Filing date: Nov 14, 2022
Priority date: Mar 8, 2022
Publication date: Jul 1, 2025
Grant date: Jul 1, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided are a joint perception model training method, a joint perception method, a device, and a storage medium. The joint perception model training method includes: acquiring sample images and perception tags of the sample images; acquiring a preset joint perception model, where the joint perception model includes a feature extraction network and a joint perception network; performing feature extraction on the sample images through the feature extraction network to obtain target sample features; performing joint perception through the joint perception network according to the target sample features to obtain perception prediction results; and training the preset joint perception model according to the perception prediction results and the perception tags, where the joint perception includes executing at least two perception tasks.
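
The training flow in the abstract (shared feature extraction, joint perception over at least two tasks, then training against the perception tags) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the task names `detection` and `attribute`, the stand-in feature extractor, the squared-error loss, and the weighted sum used to combine the per-task losses.

```python
from typing import Callable, Dict, List, Optional

Features = List[float]


def extract_features(image: List[float]) -> Features:
    """Stand-in for the feature extraction network (assumption: mean and max)."""
    return [sum(image) / len(image), max(image)]


def joint_perception(
    features: Features, heads: Dict[str, Callable[[Features], float]]
) -> Dict[str, float]:
    """Run every task head on the same shared features."""
    return {task: head(features) for task, head in heads.items()}


def joint_loss(
    predictions: Dict[str, float],
    tags: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-task squared errors (the combination rule is assumed)."""
    total = 0.0
    for task, pred in predictions.items():
        w = 1.0 if weights is None else weights[task]
        total += w * (pred - tags[task]) ** 2
    return total


# Hypothetical task heads; a real model would use learned networks.
heads = {
    "detection": lambda f: 2.0 * f[0],
    "attribute": lambda f: f[1] - 1.0,
}

image = [0.0, 1.0, 2.0, 3.0]                 # one toy "sample image"
tags = {"detection": 3.0, "attribute": 1.0}  # its perception tags

features = extract_features(image)
predictions = joint_perception(features, heads)
loss = joint_loss(predictions, tags)
```

In a real training loop, a single scalar such as `loss` would be backpropagated through both the task heads and the shared feature extractor, which is what lets one backbone serve several perception tasks.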

First claim

Opening claim text (preview).

What is claimed is:

1. A joint perception model training method executed by an electronic device and comprising: acquiring sample images and perception tags of the sample images; acquiring a preset deep learning joint perception model, wherein the preset deep learning joint perception model comprises a feature extraction network and a joint perception network; performing feature extraction on the sample images through the feature extraction network to obtain target sample features; performing joint perception through the joint perception network according to the target sample features to obtain perception prediction results; and training the preset deep learning joint perception model according to the perception prediction results and the perception tags, wherein the joint perception comprises executing at least two perception tasks; wherein the feature extraction network comprises a base module and at least two first multi-path modules; wherein performing feature extraction on the sample images through the feature extraction network to obtain the target sample features comprises: performing downsampling on the sample images through the base module to obtain initial sample features of each scale of at least two scales, wherein the at least two first multi-path modules have a one-to-one correspondence with the at least two scales; and for the initial sample features of each scale of at least two scales, performing feature extraction of different perception tasks on the initial sample features of the scale through a first multi-path module corresponding to the scale among the at least two first multi-path modules to obtain target sample features under the scale; wherein each first multi-path module of the at least two first multi-path modules comprises a first split subnetwork, a first feature extraction subnetwork, and a first fusion subnetwork; and wherein performing feature extraction of different perception tasks on the initial sample features of the scale through the first multi-path module corresponding to the scale among the at least two first multi-path modules to obtain the target sample features under the scale comprises: splitting the initial sample features of the scale through the first split subnetwork in the first multi-path module corresponding to the scale according to a channel dimension to obtain first to-be-fused sample features and first to-be-processed sample features; performing feature extraction on the first to-be-processed sample features through the first feature extraction subnetwork in the first multi-path module corresponding to the scale to obtain first target intermediate sample features; and performing feature fusion on the first to-be-fused sample features and the first target intermediate sample features through the first fusion subnetwork in the first multi-path module corresponding to the scale to obtain the target sample features under the scale.

2. The method according to claim 1, wherein the first feature extraction subnetwork comprises a first global perception module, a first local perception module, and a first perception fusion module; and performing feature extraction on the first to-be-processed sample features through the first feature extraction subnetwork in the first multi-path module corresponding to the scale to obtain the first target intermediate sample features comprises: performing global feature extraction on the first to-be-processed sample features through the first global perception module in the first multi-path module corresponding to the scale to obtain a first global intermediate sample feature; performing local feature extraction on the first to-be-processed sample features through the first local perception module in the first multi-path module corresponding to the scale to obtain first local intermediate sample features; and performing feature fusion on the first global intermediate sample feature and the first local intermediate sample features through the first perception fusion module in the first multi-path module corresponding to the scale to obtain the first target intermediate sample features.

3. The method according to claim 2, wherein the first local perception module comprises at least two first local perception branches; and performing local feature extraction on the first to-be-processed sample features through the first local perception module in the first multi-path module corresponding to the scale to obtain the first local intermediate sample features comprises: performing local feature extraction on the first to-be-processed sample features through different first local perception branches in the first multi-path module corresponding to the scale under different receptive fields to obtain first local intermediate sample features under corresponding receptive fields.

4. The method according to claim 3, wherein the first local perception module further comprises a first bypass branch; and performing local feature extraction on the first to-be-processed sample features through the first local perception module in the first multi-path module corresponding to the scale to obtain the first local intermediate sample features further comprises: in a case where the first bypass branch is a direct connection structure, directly taking the first to-be-processed sample features as the first local intermediate sample features; or, in a case where the first bypass branch comprises a first batch module, performing normalization processing on the first to-be-processed sample features through the first batch module in the first multi-path module corresponding to the scale to obtain the first local intermediate sample features.

5. The method according to claim 1, wherein the joint perception network comprises a detection head module; and performing joint perception through the joint perception network according to the target sample features to obtain the perception prediction results comprises: performing joint perception on the target sample features through different task perception branches in the detection head module to obtain perception prediction results of the perception tasks.

6. The method according to claim 5, wherein the detection head module comprises a second multi-path module and at least two task perception branches; and performing joint perception on the target sample features through the different task perception branches in the detection head module to obtain the perception prediction results of the perception tasks comprises: performing feature extraction of a same category of targets under different perception tasks on the target sample features through the second multi-path module to obtain perception sample features; and determining the perception prediction results of the perception tasks through each of the at least two task perception branches according to the perception sample features.

7. The method according to claim 6, wherein the second multi-path module comprises a second split subnetwork, a second feature extraction subnetwork, and a second fusion subnetwork; and performing feature extraction of the same category of targets under different perception tasks on the target sample features through the second multi-path module to obtain the perception sample features comprises: splitting the target sample features through the second split subnetwork according to a channel dimension to obtain second to-be-fused sample features and second to-be-processed sample features; performing feature extraction on the second to-be-processed sample features through the second feature extraction subnetwork to obtain second target intermediate sample features; and performing feature fusion on the second to-be-fused sample features and the second target intermediate s…
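
The split, extract, and fuse structure of the multi-path module in claims 1 to 4 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: feature maps are lists of channels over a 1-D spatial axis, the global branch is a per-channel mean, the local branches are moving averages with window sizes 3 and 5 (standing in for different receptive fields), the bypass branch is the direct-connection variant, and both fusion steps use element-wise summation followed by channel concatenation. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Channel = List[float]
FeatureMap = List[Channel]  # channels x positions (1-D spatial axis for brevity)


def split_channels(x: FeatureMap, keep: int) -> Tuple[FeatureMap, FeatureMap]:
    """Split along the channel dimension into (to-be-fused, to-be-processed)."""
    return x[:keep], x[keep:]


def global_branch(x: FeatureMap) -> FeatureMap:
    """Global perception (assumed): broadcast each channel's mean to all positions."""
    return [[sum(c) / len(c)] * len(c) for c in x]


def local_branch(x: FeatureMap, window: int) -> FeatureMap:
    """Local perception (assumed): moving average; `window` is the receptive field."""
    half = window // 2
    out = []
    for c in x:
        row = []
        for i in range(len(c)):
            lo, hi = max(0, i - half), min(len(c), i + half + 1)
            row.append(sum(c[lo:hi]) / (hi - lo))
        out.append(row)
    return out


def add_maps(*maps: FeatureMap) -> FeatureMap:
    """Element-wise sum of equally shaped feature maps (assumed fusion rule)."""
    return [[sum(vals) for vals in zip(*chans)] for chans in zip(*maps)]


def multi_path_module(
    x: FeatureMap, keep: int, windows: Sequence[int] = (3, 5)
) -> FeatureMap:
    to_fuse, to_process = split_channels(x, keep)
    local_maps = [local_branch(to_process, w) for w in windows]
    bypass = to_process  # direct-connection bypass branch
    local = add_maps(bypass, *local_maps)
    intermediate = add_maps(global_branch(to_process), local)
    return to_fuse + intermediate  # concatenate along the channel dimension


x = [[1.0, 2.0, 3.0], [0.0, 3.0, 6.0]]  # 2 channels, 3 positions
out = multi_path_module(x, keep=1)
```

With `keep=1`, the first channel passes through untouched and only the second channel is processed, so the output keeps the input's channel count. Processing only part of the channels is one plausible reason for the split; the claims themselves do not state a motivation.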

Assignees

Inventors

Classifications

  • Combinations of networks

  • using neural networks

  • of extracted features

  • Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

  • Ensemble learning

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12346405B2 cover?
Provided are a joint perception model training method, a joint perception method, a device, and a storage medium. The joint perception model training method includes: acquiring sample images and perception tags of the sample images; acquiring a preset joint perception model, where the joint perception model includes a feature extraction network and a joint perception network; performing feature…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/774. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 1, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).