Who is the assignee on this patent?

Beijing Baidu Netcom Sci & Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06T3/4046. Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 6, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Image processing method and apparatus, device, and storage medium

US11893708B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11893708-B2
Application number	US-202117505889-A
Country	US
Kind code	B2
Filing date	Oct 20, 2021
Priority date	Jan 20, 2021
Publication date	Feb 6, 2024
Grant date	Feb 6, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided are an image processing method and apparatus, a device, and a storage medium, relating to the technical field of image processing, in particular to the artificial intelligence fields such as computer vision and deep learning. The specific implementation scheme is as follows: inputting a to-be-processed image into an encoding network to obtain a basic image feature, wherein the encoding network includes at least two cascaded overlapping encoding sub-networks which perform encoding and fusion processing on input data at at least two resolutions; and inputting the basic image feature into a decoding network to obtain a target image feature for pixel point classification, wherein the decoding network includes at least one cascaded overlapping decoding sub-network to perform decoding and fusion processing on input data at at least two resolutions respectively.
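The cascade described in the abstract can be traced as a resolution schedule. The following is a minimal sketch, not the patented implementation: the function name `resolution_schedule`, the square input, and the fixed scale factor of 2 between stages are all assumptions made for illustration. The claims on this page state only that input resolutions decrease along the encoder and increase along the decoder.

```python
def resolution_schedule(image_size, num_encoders, num_decoders, factor=2):
    """Trace the input resolution seen by each cascaded sub-network.

    Assumptions (not taken from the patent text): square inputs and a
    fixed scale factor between consecutive sub-networks.
    """
    # The abstract requires at least two encoding sub-networks
    # and at least one decoding sub-network.
    if num_encoders < 2:
        raise ValueError("need at least two encoding sub-networks")
    if num_decoders < 1:
        raise ValueError("need at least one decoding sub-network")

    # The head encoding sub-network receives the to-be-processed image.
    schedule = [("encoder", 0, image_size)]
    size = image_size

    # Input resolutions of later encoding sub-networks decrease in sequence.
    for i in range(1, num_encoders):
        size //= factor
        schedule.append(("encoder", i, size))

    # The head decoding sub-network starts from the tail encoder's output;
    # input resolutions of later decoding sub-networks increase in sequence.
    for j in range(num_decoders):
        schedule.append(("decoder", j, size))
        size *= factor

    return schedule


schedule = resolution_schedule(image_size=256, num_encoders=3, num_decoders=2)
for kind, index, size in schedule:
    print(f"{kind} {index}: input {size}x{size}")
```

Each sub-network in this trace would additionally process its input at a second, coarser resolution and fuse the two branches; the sketch tracks only the primary input size per stage.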

First claim

Opening claim text (preview).

What is claimed is:

1. An image processing method, comprising: inputting a to-be-processed image into an encoding network to obtain a basic image feature, wherein the encoding network comprises at least two cascaded overlapping encoding sub-networks, wherein each of non-head overlapping encoding sub-networks of the at least two overlapping encoding sub-networks performs encoding and fusion processing on input data of the each of non-head overlapping encoding sub-networks at at least two resolutions; and inputting the basic image feature into a decoding network to obtain a target image feature for pixel point classification, wherein the decoding network comprises at least one cascaded overlapping decoding sub-network, wherein each overlapping decoding sub-network of the at least one cascaded overlapping decoding sub-network performs decoding and fusion processing on input data of the each overlapping decoding sub-network at at least two resolutions.

2. The method according to claim 1, wherein input data of a head overlapping encoding sub-network in the encoding network is the to-be-processed image; and input data of the each of non-head overlapping encoding sub-networks in the encoding network is determined according to output data of a previous overlapping encoding sub-network of the each of the non-head overlapping encoding sub-networks, and resolutions of pieces of input data of the non-head overlapping encoding sub-networks sequentially decrease.

3. The method according to claim 2, wherein the each of the non-head overlapping encoding sub-networks in the encoding network comprises at least two encoding convolutional layers, and resolutions of pieces of input data of the at least two encoding convolutional layers are different from each other.

4. The method according to claim 3, wherein for each of the at least two encoding convolutional layers of the each of the non-head overlapping encoding sub-networks in the encoding network, performing, by an encoding convolutional layer, feature extraction on input data of the encoding convolutional layer to obtain an encoding feature; adjusting, by the encoding convolutional layer, encoding features outputted by other encoding convolutional layers to obtain a first adjustment result, wherein the encoding convolutional layer and the other encoding convolutional layers belong to a same overlapping encoding sub-network; and performing, by the encoding convolutional layer, feature fusion on the first adjustment result and the encoding feature to obtain output data of the encoding convolutional layer.

5. The method according to claim 4, wherein the each of the non-head overlapping encoding sub-networks comprises a high-resolution encoding convolutional layer and a low-resolution encoding convolutional layer; wherein the high-resolution encoding convolutional layer comprises a high-resolution encoding feature extraction unit and a high-resolution encoding feature fusion unit; and the low-resolution encoding convolutional layer comprises a low-resolution encoding feature extraction unit and a low-resolution encoding feature fusion unit; the high-resolution encoding feature extraction unit performs feature extraction on input data of the high-resolution encoding feature extraction unit to obtain a high-resolution encoding feature; the low-resolution encoding feature extraction unit performs feature extraction on input data of the low-resolution encoding feature extraction unit to obtain a low-resolution encoding feature; the high-resolution encoding feature fusion unit performs feature fusion on an upsampling result of the low-resolution encoding feature and the high-resolution encoding feature to obtain output data of the high-resolution encoding feature fusion unit, wherein the upsampling result has a same resolution as the high-resolution encoding feature; and the low-resolution encoding feature fusion unit performs feature fusion on a downsampling result of the high-resolution encoding feature and the low-resolution encoding feature to obtain output data of the low-resolution encoding feature fusion unit, wherein the downsampling result has a same resolution as the low-resolution encoding feature.

6. The method according to claim 1, wherein input data of a head overlapping decoding sub-network in the decoding network is determined according to output data of a tail overlapping encoding sub-network in the encoding network; and input data of each of non-head overlapping decoding sub-networks in the decoding network is determined according to output data of a previous overlapping decoding sub-network of each of the non-head overlapping decoding sub-networks, and resolutions of pieces of input data of the non-head overlapping decoding sub-networks sequentially increase.

7. The method according to claim 6, wherein the each of non-tail overlapping decoding sub-networks in the decoding network comprises at least two decoding convolutional layers, and resolutions of pieces of input data of the at least two decoding convolutional layers are different from each other.

8. The method according to claim 7, wherein for each of the at least two decoding convolutional layers of the each of the non-tail overlapping decoding sub-networks in the decoding network, performing, by an decoding convolutional layer, feature reconstruction on input data of the decoding convolutional layer to obtain a decoding feature; and adjusting, by the decoding convolutional layer, decoding features outputted by other decoding convolutional layers to obtain a second adjustment result, wherein the decoding convolutional layer and the other decoding convolutional layers belong to a same overlapping decoding sub-network; and performing, by the decoding convolutional layer, feature fusion on the second adjustment result and the decoding feature to obtain output data of the decoding convolutional layer.

9. The method according to claim 8, wherein the each of the non-tail overlapping decoding sub-networks comprises a high-resolution decoding convolutional layer and a low-resolution decoding convolutional layer; wherein the high-resolution decoding convolutional layer comprises a high-resolution decoding feature reconstruction unit and a high-resolution decoding feature fusion unit; and the low-resolution decoding convolutional layer comprises a low-resolution decoding feature reconstruction unit and a low-resolution decoding feature fusion unit; the high-resolution decoding feature reconstruction unit performs feature reconstruction on input data of the high-resolution decoding feature reconstruction unit to obtain a high-resolution decoding feature; the low-resolution decoding feature reconstruction unit performs feature reconstruction on input data of the low-resolution decoding feature reconstruction unit to obtain a low-resolution decoding feature; the high-resolution decoding feature fusion unit performs feature fusion on an upsampling result of the low-resolution decoding feature and the high-resolution decoding feature to obtain output data of the high-resolution decoding feature fusion unit, wherein the upsampling result has a same resolution as the high-resolution decoding feature; and the low-resolution decoding feature fusion unit performs feature fusion on a downsampling result of the high-resolution decoding feature and the low-resolution decoding feature to obtain output data of the low-resolution decoding feature fusion unit, wherein the downsampling result has a same resolution as the low-resolution decoding feature.

10. A non-transitory computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions are configured to cause a computer to perfo
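The cross-resolution fusion recited in claims 5 and 9 can be illustrated with a small sketch. The claims require only that the upsampled (or downsampled) feature match the resolution of the branch it is fused into; they do not name a sampling operator or a fusion operation in the text shown here. Nearest-neighbour upsampling, 2x2 average pooling, element-wise addition, and all function names below are therefore assumptions, and the feature extraction step is omitted.

```python
def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 2D grid (assumed operator)."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]  # repeat each value twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def downsample2x(feat):
    """2x2 average pooling (assumed operator); needs even height and width."""
    h, w = len(feat), len(feat[0])
    return [
        [(feat[i][j] + feat[i][j + 1] + feat[i + 1][j] + feat[i + 1][j + 1]) / 4.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


def add(a, b):
    """Element-wise sum of two equally sized grids (assumed fusion op)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_two_resolutions(high, low):
    """Exchange information between a high- and a low-resolution branch."""
    # High-resolution branch: fuse with the upsampled low-resolution feature.
    high_out = add(high, upsample2x(low))
    # Low-resolution branch: fuse with the downsampled high-resolution feature.
    low_out = add(low, downsample2x(high))
    return high_out, low_out


high = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]
low = [[10.0, 20.0],
       [30.0, 40.0]]

high_out, low_out = fuse_two_resolutions(high, low)
print(len(high_out), len(high_out[0]))  # high branch keeps its 4x4 size
print(len(low_out), len(low_out[0]))    # low branch keeps its 2x2 size
```

Both outputs keep the resolution of their own branch, which is the property the claims state explicitly. In a trained network the two inputs would be learned convolutional features rather than fixed grids.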

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Classifications

G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06T3/4046 (Primary)
using neural networks · CPC title
G06F18/213
Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods · CPC title
G06F18/253
of extracted features · CPC title

Patent family

Related publications grouped by family.

View patent family 75757442
