Method and Apparatus for Detecting Salient Object in Image
US-2020143194-A1 · May 7, 2020 · US
US11430205B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11430205-B2 |
| Application number | US-201916723539-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 20, 2019 |
| Priority date | Jun 23, 2017 |
| Publication date | Aug 30, 2022 |
| Grant date | Aug 30, 2022 |
Official abstract text for this publication.
A method and an apparatus for detecting a salient object in an image includes separately performing convolution processing corresponding to at least two convolutional layers on a to-be-processed image to obtain at least two first feature maps of the to-be-processed image, performing superposition processing on at least two first feature maps included in a superposition set in at least two sets to obtain at least two second feature maps of the to-be-processed image, the at least two sets are in a one-to-one correspondence with the at least two second feature maps, and a resolution of a first feature map included in the superposition set is lower than or equal to a resolution of a second feature map corresponding to the superposition set, and splicing the at least two second feature maps to obtain a saliency map.
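The superposition step in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes square single-channel feature maps whose sizes are integer multiples of one another, uses nearest-neighbour upsampling, and treats "superposition" as an element-wise sum. The helper names `upsample_nearest` and `build_second_maps` are hypothetical.

```python
import numpy as np

def upsample_nearest(fmap, size):
    """Nearest-neighbour upsampling of a square 2-D map to size x size."""
    factor = size // fmap.shape[0]
    return np.kron(fmap, np.ones((factor, factor)))

def build_second_maps(first_maps):
    """For each target resolution, sum all first maps at or below that resolution."""
    second_maps = []
    for target in first_maps:
        size = target.shape[0]
        # Superposition set: first feature maps whose resolution is <= the target's.
        superposition_set = [m for m in first_maps if m.shape[0] <= size]
        if len(superposition_set) < 2:
            continue  # the abstract superposes at least two first feature maps
        second_maps.append(sum(upsample_nearest(m, size) for m in superposition_set))
    return second_maps

# Stand-ins for the outputs of three convolutional layers (assumed values).
first_maps = [np.full((2, 2), 1.0), np.full((4, 4), 2.0), np.full((8, 8), 3.0)]
second_maps = build_second_maps(first_maps)
```

With three first feature maps this yields two second feature maps, one per superposition set, each at the highest resolution found in its set.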
Opening claim text (preview).
What is claimed is:

1. A method for detecting a salient object in an image, comprising: separately performing a first convolution processing corresponding to at least two convolutional layers on the image to obtain at least two first feature maps of the image, wherein resolutions of the at least two first feature maps are lower than a resolution of the image, and wherein a resolution of each of the at least two first feature maps is different; processing the at least two first feature maps to obtain at least two second feature maps of the image, wherein the at least two second feature maps are obtained by performing a first superposition processing on a part or all of the at least two first feature maps, wherein a resolution of each of the at least two second feature maps is different, and wherein the resolution of each of the at least two second feature maps is higher than or equal to a maximum resolution in the part or all of the at least two first feature maps; and splicing the at least two second feature maps based on a third weight corresponding to each of the at least two second feature maps to obtain a saliency map of the image, wherein the third weight is based on a difference between a saliency map of a training image and a reference saliency map corresponding to the training image.

2. The method of claim 1, wherein performing the first superposition processing on the part or all of the at least two first feature maps comprises: upsampling a first feature map of the at least two first feature maps, in the part or all of the at least two first feature maps, to obtain a third feature map corresponding to the first feature map, wherein a resolution of the first feature map is lower than the resolution of the at least one second feature map, and wherein a resolution of the third feature map is equal to the resolution of the at least one second feature map; and performing a second superposition processing on the third feature map and a second first feature map, in the part or all of the at least two first feature maps and on which upsampling is not performed, to obtain the at least one second feature map.

3. The method of claim 2, wherein performing the second superposition processing on the third feature map and the second first feature map comprises: obtaining a first weight corresponding to the second first feature map and a second weight corresponding to the third feature map; and performing, based on the first weight or the second weight, a third superposition processing on the third feature map and the second first feature map to obtain the at least one second feature map.

4. The method of claim 3, wherein the first weight or the second weight is obtained by training based on a difference between a saliency map of a training image and a reference saliency map corresponding to the training image.

5. The method of claim 2, wherein performing the second superposition processing on the third feature map and the second first feature map comprises performing superposition, convolution, and pooling processing on the third feature map and the second first feature map to obtain the at least one second feature map.

6. The method of claim 1, wherein splicing the at least two second feature maps comprises: performing a second convolution processing on the at least two second feature maps to obtain features of the at least two second feature maps; and splicing the features to obtain the saliency map of the image.

7. The method of claim 1, further comprising performing a first guided filtering on the saliency map of the image based on the image to obtain a segmented image of the image.

8. The method of claim 7, wherein the saliency map is a first saliency map, wherein a resolution of the first saliency map is lower than the resolution of the image, and wherein performing the first guided filtering on the saliency map of the image comprises: upsampling the first saliency map to obtain a second saliency map, wherein a resolution of the second saliency map is the same as the resolution of the image; and performing a second guided filtering on the second saliency map based on the image to obtain the segmented image.

9. An apparatus for detecting a salient object in an image, comprising: a memory comprising instructions; and a processor coupled to the memory, wherein the instructions cause the processor to be configured to: separately perform a first convolution processing corresponding to at least two convolutional layers on the image to obtain at least two first feature maps of the image, wherein resolutions of the at least two first feature maps are lower than a resolution of the image, and wherein a resolution of each of the at least two first feature maps is different; process the at least two first feature maps to obtain at least two second feature maps of the image, wherein the at least two second feature maps are obtained by performing a first superposition processing on a part or all of the at least two first feature maps, wherein a resolution of each of the at least two second feature maps is different, and wherein the resolution of each of the at least two second feature maps is higher than or equal to a maximum resolution in the part or all of the at least two first feature maps; and splice the at least two second feature maps based on a third weight corresponding to each of the at least two second feature maps to obtain a saliency map of the image, wherein the third weight is based on a difference between a saliency map of a training image and a reference saliency map corresponding to the training image.

10. The apparatus of claim 9, wherein the instructions further cause the processor to be configured to: upsample a first feature map of the at least two first feature maps, in the part or all of the at least two first feature maps, to obtain a third feature map corresponding to the first feature map, wherein a resolution of the first feature map is lower than the resolution of the at least one second feature map, and wherein a resolution of the third feature map is equal to the resolution of the at least one second feature map; and perform a second superposition processing on the third feature map and a second first feature map, in the part or all of the at least two first feature maps and on which upsampling is not performed, to obtain the at least one second feature map.

11. The apparatus of claim 10, wherein the instructions further cause the processor to be configured to: obtain a first weight corresponding to the second first feature map and a second weight corresponding to the third feature map; and perform, based on the first weight or the second weight, a third superposition processing on the third feature map and the second first feature map to obtain the at least one second feature map.

12. The apparatus of claim 11, wherein the first weight or the second weight is obtained by training based on a difference between a saliency map of a training image and a reference saliency map corresponding to the training image.

13. The apparatus of claim 10, wherein the instructions further cause the processor to be configured to perform superposition, convolution, and pooling processing on the third feature map and the second first feature map to obtain the at least one second feature map.

14. The apparatus of claim 9, wherein the instructions further cause the processor to be configured to: perform a second convolution processing on the at least two second feature maps to obtain features of the at least two second feature maps; and splice the features to obtain the saliency map of the image.
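One possible reading of the weighted steps in claims 1 to 3 and the guided-filtering refinement in claims 7 and 8 is sketched below. Several choices here are assumptions rather than anything the claims specify: the weights are fixed numbers instead of trained values, "splicing" is modelled as a weighted sum at a common resolution followed by a sigmoid, upsampling is nearest-neighbour, the image is a random grayscale stand-in, and the guided filter is the standard box-filter formulation. All function names are hypothetical.

```python
import numpy as np

def upsample_nearest(fmap, size):
    """Nearest-neighbour upsampling of a square 2-D map to size x size."""
    factor = size // fmap.shape[0]
    return np.kron(fmap, np.ones((factor, factor)))

def weighted_superpose(low_res_map, same_res_map, w_same, w_up):
    """Claims 2-3: upsample the lower-resolution map, then add with weights."""
    third_map = upsample_nearest(low_res_map, same_res_map.shape[0])
    return w_same * same_res_map + w_up * third_map

def splice(second_maps, third_weights):
    """Claim 1 (assumed reading): weighted sum of second maps at a common size."""
    size = max(m.shape[0] for m in second_maps)
    fused = sum(w * upsample_nearest(m, size)
                for m, w in zip(second_maps, third_weights))
    return 1.0 / (1.0 + np.exp(-fused))  # sigmoid squashing is an assumption

def box_mean(x, r):
    """Mean over a (2r+1) x (2r+1) window, using edge padding at the borders."""
    h, w = x.shape
    padded = np.pad(x, r, mode="edge")
    total = np.zeros((h, w))
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            total += padded[dy:dy + h, dx:dx + w]
    return total / (2 * r + 1) ** 2

def guided_filter(guide, src, r=2, eps=1e-3):
    """Box-filter guided filter: smooth src while following edges in guide."""
    mean_g, mean_s = box_mean(guide, r), box_mean(src, r)
    var_g = box_mean(guide * guide, r) - mean_g ** 2
    cov_gs = box_mean(guide * src, r) - mean_g * mean_s
    a = cov_gs / (var_g + eps)   # local linear coefficient
    b = mean_s - a * mean_g      # local offset
    return box_mean(a, r) * guide + box_mean(b, r)

rng = np.random.default_rng(0)
image = rng.random((16, 16))  # grayscale stand-in for the input image
first_2, first_4, first_8 = (rng.random((n, n)) for n in (2, 4, 8))

# Two second feature maps with different resolutions (weights are placeholders).
second_4 = weighted_superpose(first_2, first_4, w_same=0.6, w_up=0.4)
second_8 = weighted_superpose(first_4, first_8, w_same=0.6, w_up=0.4)

saliency = splice([second_4, second_8], third_weights=[0.5, 0.5])  # 8 x 8
saliency_full = upsample_nearest(saliency, image.shape[0])         # claim 8 upsampling
segmented = guided_filter(image, saliency_full)                    # claim 8 filtering
```

In the claims, the first, second, and third weights come from training against a reference saliency map; the constants above only show where those learned values would enter the computation.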
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
References adjustable by an adaptive method, e.g. learning · CPC title
Classification techniques · CPC title
Extraction of image or video features · CPC title
based on sparsity criteria, e.g. with an overcomplete basis · CPC title