Image processing method, electronic device and readable storage medium
US-2023080693-A1 · Mar 16, 2023 · US
US12346405B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12346405-B2 |
| Application number | US-202218055393-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 14, 2022 |
| Priority date | Mar 8, 2022 |
| Publication date | Jul 1, 2025 |
| Grant date | Jul 1, 2025 |
Official abstract text for this publication.
Provided are a joint perception model training method, a joint perception method, a device, and a storage medium. The joint perception model training method includes: acquiring sample images and perception tags of the sample images; acquiring a preset joint perception model, where the joint perception model includes a feature extraction network and a joint perception network; performing feature extraction on the sample images through the feature extraction network to obtain target sample features; performing joint perception through the joint perception network according to the target sample features to obtain perception prediction results; and training the preset joint perception model according to the perception prediction results and the perception tags, where the joint perception includes executing at least two perception tasks.
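The training flow in the abstract (shared feature extraction, joint perception over at least two tasks, training against the perception tags) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the single linear layers, the two regression tasks, the mean-squared-error loss, plain gradient descent, and all names (`w_feat`, `w_a`, `w_b`, `train_step`, `joint_loss`).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumed): 64 flattened "sample images" of 8 values each and
# one "perception tag" per task. Two regression targets are used for brevity.
images = rng.normal(size=(64, 8))
tags_a = images[:, :4].sum(axis=1, keepdims=True)
tags_b = images[:, 4:].sum(axis=1, keepdims=True)

# Feature extraction network: a single shared linear layer in this sketch.
w_feat = rng.normal(scale=0.1, size=(8, 6))
# Joint perception network: one linear head per perception task.
w_a = rng.normal(scale=0.1, size=(6, 1))
w_b = rng.normal(scale=0.1, size=(6, 1))


def joint_loss(w_feat, w_a, w_b):
    """Sum of the per-task mean squared errors (assumed loss)."""
    feats = images @ w_feat
    loss_a = np.mean((feats @ w_a - tags_a) ** 2)
    loss_b = np.mean((feats @ w_b - tags_b) ** 2)
    return float(loss_a + loss_b)


def train_step(w_feat, w_a, w_b, lr=0.02):
    """One gradient-descent step on the joint loss."""
    n = len(images)
    feats = images @ w_feat          # shared target sample features
    err_a = feats @ w_a - tags_a     # prediction error for task A
    err_b = feats @ w_b - tags_b     # prediction error for task B
    grad_a = 2 * feats.T @ err_a / n
    grad_b = 2 * feats.T @ err_b / n
    # Both tasks send gradients back into the shared extractor.
    grad_feat = 2 * images.T @ (err_a @ w_a.T + err_b @ w_b.T) / n
    return w_feat - lr * grad_feat, w_a - lr * grad_a, w_b - lr * grad_b


initial_loss = joint_loss(w_feat, w_a, w_b)
for _ in range(300):
    w_feat, w_a, w_b = train_step(w_feat, w_a, w_b)
final_loss = joint_loss(w_feat, w_a, w_b)
```

The point of the sketch is only that a single set of extracted features feeds every task head and that the heads' errors jointly update the shared extractor; a real model would use deep networks and task-appropriate losses.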
Opening claim text (preview).
What is claimed is:

1. A joint perception model training method executed by an electronic device and comprising: acquiring sample images and perception tags of the sample images; acquiring a preset deep learning joint perception model, wherein the preset deep learning joint perception model comprises a feature extraction network and a joint perception network; performing feature extraction on the sample images through the feature extraction network to obtain target sample features; performing joint perception through the joint perception network according to the target sample features to obtain perception prediction results; and training the preset deep learning joint perception model according to the perception prediction results and the perception tags, wherein the joint perception comprises executing at least two perception tasks;
wherein the feature extraction network comprises a base module and at least two first multi-path modules;
wherein performing feature extraction on the sample images through the feature extraction network to obtain the target sample features comprises: performing downsampling on the sample images through the base module to obtain initial sample features of each scale of at least two scales, wherein the at least two first multi-path modules have a one-to-one correspondence with the at least two scales; and for the initial sample features of each scale of at least two scales, performing feature extraction of different perception tasks on the initial sample features of the scale through a first multi-path module corresponding to the scale among the at least two first multi-path modules to obtain target sample features under the scale;
wherein each first multi-path module of the at least two first multi-path modules comprises a first split subnetwork, a first feature extraction subnetwork, and a first fusion subnetwork; and
wherein performing feature extraction of different perception tasks on the initial sample features of the scale through the first multi-path module corresponding to the scale among the at least two first multi-path modules to obtain the target sample features under the scale comprises: splitting the initial sample features of the scale through the first split subnetwork in the first multi-path module corresponding to the scale according to a channel dimension to obtain first to-be-fused sample features and first to-be-processed sample features; performing feature extraction on the first to-be-processed sample features through the first feature extraction subnetwork in the first multi-path module corresponding to the scale to obtain first target intermediate sample features; and performing feature fusion on the first to-be-fused sample features and the first target intermediate sample features through the first fusion subnetwork in the first multi-path module corresponding to the scale to obtain the target sample features under the scale.

2. The method according to claim 1, wherein the first feature extraction subnetwork comprises a first global perception module, a first local perception module, and a first perception fusion module; and performing feature extraction on the first to-be-processed sample features through the first feature extraction subnetwork in the first multi-path module corresponding to the scale to obtain the first target intermediate sample features comprises: performing global feature extraction on the first to-be-processed sample features through the first global perception module in the first multi-path module corresponding to the scale to obtain a first global intermediate sample feature; performing local feature extraction on the first to-be-processed sample features through the first local perception module in the first multi-path module corresponding to the scale to obtain first local intermediate sample features; and performing feature fusion on the first global intermediate sample feature and the first local intermediate sample features through the first perception fusion module in the first multi-path module corresponding to the scale to obtain the first target intermediate sample features.

3. The method according to claim 2, wherein the first local perception module comprises at least two first local perception branches; and performing local feature extraction on the first to-be-processed sample features through the first local perception module in the first multi-path module corresponding to the scale to obtain the first local intermediate sample features comprises: performing local feature extraction on the first to-be-processed sample features through different first local perception branches in the first multi-path module corresponding to the scale under different receptive fields to obtain first local intermediate sample features under corresponding receptive fields.

4. The method according to claim 3, wherein the first local perception module further comprises a first bypass branch; and performing local feature extraction on the first to-be-processed sample features through the first local perception module in the first multi-path module corresponding to the scale to obtain the first local intermediate sample features further comprises: in a case where the first bypass branch is a direct connection structure, directly taking the first to-be-processed sample features as the first local intermediate sample features; or, in a case where the first bypass branch comprises a first batch module, performing normalization processing on the first to-be-processed sample features through the first batch module in the first multi-path module corresponding to the scale to obtain the first local intermediate sample features.

5. The method according to claim 1, wherein the joint perception network comprises a detection head module; and performing joint perception through the joint perception network according to the target sample features to obtain the perception prediction results comprises: performing joint perception on the target sample features through different task perception branches in the detection head module to obtain perception prediction results of the perception tasks.

6. The method according to claim 5, wherein the detection head module comprises a second multi-path module and at least two task perception branches; and performing joint perception on the target sample features through the different task perception branches in the detection head module to obtain the perception prediction results of the perception tasks comprises: performing feature extraction of a same category of targets under different perception tasks on the target sample features through the second multi-path module to obtain perception sample features; and determining the perception prediction results of the perception tasks through each of the at least two task perception branches according to the perception sample features.

7. The method according to claim 6, wherein the second multi-path module comprises a second split subnetwork, a second feature extraction subnetwork, and a second fusion subnetwork; and performing feature extraction of the same category of targets under different perception tasks on the target sample features through the second multi-path module to obtain the perception sample features comprises: splitting the target sample features through the second split subnetwork according to a channel dimension to obtain second to-be-fused sample features and second to-be-processed sample features; performing feature extraction on the second to-be-processed sample features through the second feature extraction subnetwork to obtain second target intermediate sample features; and performing feature fusion on the second to-be-fused sample features and the second target intermediate s
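
The split, extract, and fuse structure of the first multi-path module (claims 1 to 4) can be sketched roughly as follows. The claims fix only the roles of the subnetworks, so every concrete operator here is an assumption chosen for illustration: 1-D feature maps of shape (channels, length), box filters of width 3 and 5 as stand-ins for local branches with different receptive fields, a per-channel mean as the global branch, a direct-connection bypass, element-wise summation for perception fusion, and channel concatenation for the final fusion. The names `local_branch`, `multi_path_module`, and `split_at` are hypothetical.

```python
import numpy as np


def local_branch(x, k):
    """Box filter of odd width k along the last axis.

    A stand-in (assumed) for a local perception branch whose receptive
    field is k positions wide. Edge padding keeps the length unchanged.
    """
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad)), mode="edge")
    kernel = np.ones(k) / k
    return np.stack([np.convolve(row, kernel, mode="valid") for row in padded])


def multi_path_module(features, split_at):
    """Toy multi-path module over features of shape (channels, length)."""
    # Split subnetwork: divide along the channel dimension.
    to_be_fused = features[:split_at]
    to_be_processed = features[split_at:]

    # Global perception: per-channel mean, broadcast back over positions.
    global_feat = np.broadcast_to(
        to_be_processed.mean(axis=1, keepdims=True), to_be_processed.shape
    )

    # Local perception: branches with different receptive fields, plus a
    # bypass branch used here as a direct connection.
    local_feats = [local_branch(to_be_processed, k) for k in (3, 5)]
    bypass = to_be_processed

    # Perception fusion: element-wise sum of all branch outputs (assumed).
    intermediate = global_feat + sum(local_feats) + bypass

    # Fusion subnetwork: concatenate along the channel dimension (assumed).
    return np.concatenate([to_be_fused, intermediate], axis=0)


features = np.arange(24, dtype=float).reshape(4, 6)
fused = multi_path_module(features, split_at=2)
```

In this sketch the first `split_at` channels pass through untouched and only the remaining channels go through the global and local branches, which is the property the channel-wise split in the claims describes. The second multi-path module in the detection head (claims 6 and 7) follows the same split, extract, and fuse pattern.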
Combinations of networks · CPC title
using neural networks · CPC title
of extracted features · CPC title
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title
Ensemble learning · CPC title