Method and device for detecting an object in an image

US11620815B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11620815-B2
Application number: US-202217938457-A
Country: US
Kind code: B2
Filing date: Oct 6, 2022
Priority date: Oct 15, 2021
Publication date: Apr 4, 2023
Grant date: Apr 4, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for detecting an object in an image includes: obtaining an image to be detected; generating a plurality of feature maps based on the image to be detected by a plurality of feature extracting networks in a neural network model trained for object detection, in which the plurality of feature extracting networks are connected sequentially, and input data of a latter feature extracting network in the plurality of feature extracting networks is based on output data and input data of a previous feature extracting network; and generating an object detection result based on the plurality of feature maps by an object detecting network in the neural network model.
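The data flow between the sequentially connected feature extracting networks can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract only says that the input of a latter network is "based on" the input and output of the previous one, so merging by concatenation is an assumption, and the names `combine`, `extract_feature_maps` and `toy_extractor` are hypothetical. Plain Python lists stand in for tensors.

```python
from typing import Callable, List, Sequence

Feature = List[float]                      # stand-in for a tensor
Extractor = Callable[[Feature], Feature]   # stand-in for a feature extracting network


def combine(parts: Sequence[Feature]) -> Feature:
    """Merge several features by concatenation (an assumption, see lead-in)."""
    merged: Feature = []
    for part in parts:
        merged.extend(part)
    return merged


def extract_feature_maps(image: Feature, extractors: Sequence[Extractor]) -> List[Feature]:
    """Run the extractors in sequence and collect one feature map per extractor."""
    feature_maps: List[Feature] = []
    current_input = image
    for extractor in extractors:
        output = extractor(current_input)
        feature_maps.append(output)
        # The next extractor sees both the input and the output of this one.
        current_input = combine([current_input, output])
    return feature_maps


def toy_extractor(x: Feature) -> Feature:
    """Hypothetical extractor that reduces its input to a single value."""
    return [sum(x)]


maps = extract_feature_maps([1.0, 2.0], [toy_extractor] * 3)
print(maps)  # [[3.0], [6.0], [12.0]]
```

Because each input accumulates everything seen so far, the third extractor in this sketch receives the image together with the outputs of the first and second extractors.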

First claim

Opening claim text (preview).

What is claimed is:

1. A method for detecting an object in an image, comprising: obtaining an image to be detected; generating a plurality of feature maps based on the image to be detected by a plurality of feature extracting networks in a neural network model trained for object detection, wherein the plurality of feature extracting networks are connected sequentially, and input data of a latter feature extracting network in the plurality of feature extracting networks is based on output data and input data of a previous feature extracting network; and generating an object detection result based on the plurality of feature maps by an object detecting network in the neural network model, wherein the object detecting network comprises a position box detecting branch and an object classifying branch, the position box detecting branch comprises a first depthwise separable convolutional layer, a global average pooling layer and a second depthwise separable convolutional layer, and the object classifying branch comprises a third depthwise separable convolutional layer, a fourth depthwise separable convolutional layer and a fifth depthwise separable convolutional layer, there is a skip connection between the fifth depthwise separable convolutional layer and the third depthwise separable convolutional layer, and input data of the fifth depthwise separable convolutional layer is based on input data and output data of the third depthwise separable convolutional layer.

2. The method of claim 1, wherein generating the plurality of feature maps comprises: generating a first feature map by a first feature extracting network based on the image to be detected; and generating a second feature map by a second feature extracting network based on the image to be detected and the first feature map.

3. The method of claim 1, wherein generating the plurality of feature maps comprises: generating a feature extraction result by a first feature extracting network based on the image to be detected; generating a first feature map by a second feature extracting network based on the image to be detected and the feature extraction result; and generating a second feature map by a third feature extracting network based on the image to be detected, the feature extraction result and the first feature map.

4. The method of claim 1, wherein generating the plurality of feature maps comprises: generating a residual convolution result by a residual convolutional network based on the image to be detected; and generating the plurality of feature maps by the plurality of feature extracting networks based on the residual convolution result.

5. The method of claim 4, wherein generating the residual convolution result comprises: generating a first branch convolution result, by a first branch comprising a 3*3 convolutional layer and a 1*1 convolutional layer, based on the image to be detected; generating a second branch convolution result, by a second branch comprising a 1*1 convolutional layer, a 3*3 convolutional layer and a 1*1 convolutional layer, based on the image to be detected; and generating the residual convolution result based on the first branch convolution result and the second branch convolution result.

6. The method of claim 1, wherein generating the object detection result comprises: generating a fused feature map by a feature pyramid network in the neural network model based on the plurality of feature maps; and generating the object detection result by the object detecting network based on the fused feature map.

7. The method of claim 6, wherein generating the object detection result comprises: generating a first convolution result by the first depthwise separable convolutional layer based on the fused feature map; generating a pooling result by the global average pooling layer based on the first convolution result; and generating position box information of a detected object in the image to be detected, by the second depthwise separable convolutional layer, based on the pooling result.

8. The method of claim 7, wherein a size of a convolution kernel of the first depthwise separable convolutional layer and a size of a convolution kernel of the second depthwise separable convolutional layer are configured to be 5*5.

9. The method of claim 6, wherein generating the object detection result comprises: generating a second convolution result by the third depthwise separable convolutional layer based on the fused feature map; generating a third convolution result by the fourth depthwise separable convolutional layer based on the second convolution result; and generating type information of a detected object in the image to be detected by the fifth depthwise separable convolutional layer based on the second convolution result and the third convolution result.

10. The method of claim 1, wherein the neural network model is generated through multiple rounds of training by an exponential moving average algorithm, and parameters of the exponential moving average algorithm are reset every preset number of training rounds.

11. The method of claim 1, wherein the neural network model is generated through multiple rounds of training, and cosine decay is performed on a learning rate of a round of training in the multiple rounds of training based on a learning rate of a previous round of training.

12. The method of claim 1, wherein the neural network model is generated by training with a gradient descent with momentum algorithm, and a regularization decay rate of the gradient descent with momentum algorithm is configured to be 4e-5.

13. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein, the memory stores instructions executable by the at least one processor, when the instructions are executed by the at least one processor, the at least one processor is configured to: obtain an image to be detected; generate a plurality of feature maps based on the image to be detected by a plurality of feature extracting networks in a neural network model trained for object detection, wherein the plurality of feature extracting networks are connected sequentially, and input data of a latter feature extracting network in the plurality of feature extracting networks is based on output data and input data of a previous feature extracting network; and generate an object detection result based on the plurality of feature maps by an object detecting network in the neural network model, wherein the object detecting network comprises a position box detecting branch and an object classifying branch, the position box detecting branch comprises a first depthwise separable convolutional layer, a global average pooling layer and a second depthwise separable convolutional layer, and the object classifying branch comprises a third depthwise separable convolutional layer, a fourth depthwise separable convolutional layer and a fifth depthwise separable convolutional layer, there is a skip connection between the fifth depthwise separable convolutional layer and the third depthwise separable convolutional layer, and input data of the fifth depthwise separable convolutional layer is based on input data and output data of the third depthwise separable convolutional layer.

14. The device of claim 13, wherein the at least one processor is further configured to: generate a first feature map by a first feature extracting network based on the image to be detected; and generate a second feature map by a second feature extracting network based on the image to be detected and the first feature map.
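The wiring of the object detecting network described in claims 1, 7 and 9 can be sketched as below. This is only an illustration of the data flow under stated assumptions: the five depthwise separable convolutional layers are replaced by arbitrary callables, the skip connection is modeled as element-wise addition (the claims only say "based on"), and the classifying branch follows the explicit order given in claim 9. The names `detection_head`, `add`, `double` and the `dw1` to `dw5` keys are hypothetical.

```python
from typing import Callable, Dict, List

Tensor = List[float]                 # stand-in for a feature map
Layer = Callable[[Tensor], Tensor]   # stand-in for a depthwise separable conv layer


def global_average_pool(x: Tensor) -> Tensor:
    """Reduce a feature to its mean value."""
    return [sum(x) / len(x)]


def add(a: Tensor, b: Tensor) -> Tensor:
    """Element-wise sum, used here to model the skip connection (an assumption)."""
    return [u + v for u, v in zip(a, b)]


def detection_head(fused: Tensor, layers: Dict[str, Layer]) -> Dict[str, Tensor]:
    # Position box detecting branch: first layer -> global average pooling -> second layer.
    first = layers["dw1"](fused)
    pooled = global_average_pool(first)
    boxes = layers["dw2"](pooled)

    # Object classifying branch: third layer -> fourth layer -> fifth layer,
    # where the fifth layer also receives the third layer's output via the skip connection.
    second = layers["dw3"](fused)
    third = layers["dw4"](second)
    classes = layers["dw5"](add(second, third))

    return {"boxes": boxes, "classes": classes}


def double(x: Tensor) -> Tensor:
    """Hypothetical layer that doubles every value."""
    return [2.0 * v for v in x]


layers = {name: double for name in ("dw1", "dw2", "dw3", "dw4", "dw5")}
result = detection_head([1.0, 3.0], layers)
print(result)  # {'boxes': [8.0], 'classes': [12.0, 36.0]}
```

In a real model the fused feature map would come from the feature pyramid network of claim 6, and each stand-in would be a trained convolutional layer.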

Assignees

Inventors

Classifications

  • Target detection · CPC title

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title

  • G06V10/82 · Primary

    using neural networks · CPC title

  • of extracted features · CPC title

  • G06F18/253 · Primary

    of extracted features · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11620815B2 cover?
A method for detecting an object in an image includes: obtaining an image to be detected; generating a plurality of feature maps based on the image to be detected by a plurality of feature extracting networks in a neural network model trained for object detection, in which the plurality of feature extracting networks are connected sequentially, and input data of a latter feature extracting netw…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/7715. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 4, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).