Computer vision system and method
US-2020234447-A1 · Jul 23, 2020 · US
US11620815B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11620815-B2 |
| Application number | US-202217938457-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 6, 2022 |
| Priority date | Oct 15, 2021 |
| Publication date | Apr 4, 2023 |
| Grant date | Apr 4, 2023 |
Official abstract text for this publication.
A method for detecting an object in an image includes: obtaining an image to be detected; generating a plurality of feature maps based on the image to be detected by a plurality of feature extracting networks in a neural network model trained for object detection, in which the plurality of feature extracting networks are connected sequentially, and input data of a latter feature extracting network in the plurality of feature extracting networks is based on output data and input data of a previous feature extracting network; and generating an object detection result based on the plurality of feature maps by an object detecting network in the neural network model.
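The sequential wiring described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. The abstract does not say how a stage's input and output are combined, so plain list concatenation is used here as an assumption. `run_stages` and `toy_stage` are hypothetical names, and `toy_stage` is only a stand-in for a real feature extracting network.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Stage = Callable[[Vector], Vector]


def run_stages(image: Vector, stages: Sequence[Stage]) -> List[Vector]:
    """Chain stages so each later stage sees the previous stage's input and output."""
    feature_maps: List[Vector] = []
    stage_input = image
    for stage in stages:
        stage_output = stage(stage_input)
        feature_maps.append(stage_output)
        # Assumption: "based on output data and input data" is modeled as
        # concatenation; the abstract does not specify the combining operation.
        stage_input = stage_input + stage_output
    return feature_maps


def toy_stage(x: Vector) -> Vector:
    """Hypothetical stand-in for a feature extracting network: returns [sum, max]."""
    return [sum(x), max(x)]


# A two-value "image" passed through three chained stages.
maps = run_stages([1.0, 2.0], [toy_stage, toy_stage, toy_stage])
print(maps)
```

Under this reading, the third stage receives the image together with the outputs of the first and second stages, because each stage's input is carried forward along with its output.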
Opening claim text (preview).
What is claimed is:
1. A method for detecting an object in an image, comprising: obtaining an image to be detected; generating a plurality of feature maps based on the image to be detected by a plurality of feature extracting networks in a neural network model trained for object detection, wherein the plurality of feature extracting networks are connected sequentially, and input data of a latter feature extracting network in the plurality of feature extracting networks is based on output data and input data of a previous feature extracting network; and generating an object detection result based on the plurality of feature maps by an object detecting network in the neural network model, wherein the object detecting network comprises a position box detecting branch and an object classifying branch, the position box detecting branch comprises a first depthwise separable convolutional layer, a global average pooling layer and a second depthwise separable convolutional layer, and the object classifying branch comprises a third depthwise separable convolutional layer, a fourth depthwise separable convolutional layer and a fifth depthwise separable convolutional layer, there is a skip connection between the fifth depthwise separable convolutional layer and the third depthwise separable convolutional layer, and input data of the fifth depthwise separable convolutional layer is based on input data and output data of the third depthwise separable convolutional layer.
2. The method of claim 1, wherein generating the plurality of feature maps comprises: generating a first feature map by a first feature extracting network based on the image to be detected; and generating a second feature map by a second feature extracting network based on the image to be detected and the first feature map.
3. The method of claim 1, wherein generating the plurality of feature maps comprises: generating a feature extraction result by a first feature extracting network based on the image to be detected; generating a first feature map by a second feature extracting network based on the image to be detected and the feature extraction result; and generating a second feature map by a third feature extracting network based on the image to be detected, the feature extraction result and the first feature map.
4. The method of claim 1, wherein generating the plurality of feature maps comprises: generating a residual convolution result by a residual convolutional network based on the image to be detected; and generating the plurality of feature maps by the plurality of feature extracting networks based on the residual convolution result.
5. The method of claim 4, wherein generating the residual convolution result comprises: generating a first branch convolution result, by a first branch comprising a 3*3 convolutional layer and a 1*1 convolutional layer, based on the image to be detected; generating a second branch convolution result, by a second branch comprising a 1*1 convolutional layer, a 3*3 convolutional layer and a 1*1 convolutional layer, based on the image to be detected; and generating the residual convolution result based on the first branch convolution result and the second branch convolution result.
6. The method of claim 1, wherein generating the object detection result comprises: generating a fused feature map by a feature pyramid network in the neural network model based on the plurality of feature maps; and generating the object detection result by the object detecting network based on the fused feature map.
7. The method of claim 6, wherein generating the object detection result comprises: generating a first convolution result by the first depthwise separable convolutional layer based on the fused feature map; generating a pooling result by the global average pooling layer based on the first convolution result; and generating position box information of a detected object in the image to be detected, by the second depthwise separable convolutional layer, based on the pooling result.
8. The method of claim 7, wherein a size of a convolution kernel of the first depthwise separable convolutional layer and a size of a convolution kernel of the second depthwise separable convolutional layer are configured to be 5*5.
9. The method of claim 6, wherein generating the object detection result comprises: generating a second convolution result by the third depthwise separable convolutional layer based on the fused feature map; generating a third convolution result by the fourth depthwise separable convolutional layer based on the second convolution result; and generating type information of a detected object in the image to be detected by the fifth depthwise separable convolutional layer based on the second convolution result and the third convolution result.
10. The method of claim 1, wherein the neural network model is generated through multiple rounds of training by an exponential moving average algorithm, and parameters of the exponential moving average algorithm are reset every preset number of training rounds.
11. The method of claim 1, wherein the neural network model is generated through multiple rounds of training, and cosine decay is performed on a learning rate of a round of training in the multiple rounds of training based on a learning rate of a previous round of training.
12. The method of claim 1, wherein the neural network model is generated by training with a gradient descent with momentum algorithm, and a regularization decay rate of the gradient descent with momentum algorithm is configured to be 4e-5.
13. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein, the memory stores instructions executable by the at least one processor, when the instructions are executed by the at least one processor, the at least one processor is configured to: obtain an image to be detected; generate a plurality of feature maps based on the image to be detected by a plurality of feature extracting networks in a neural network model trained for object detection, wherein the plurality of feature extracting networks are connected sequentially, and input data of a latter feature extracting network in the plurality of feature extracting networks is based on output data and input data of a previous feature extracting network; and generate an object detection result based on the plurality of feature maps by an object detecting network in the neural network model, wherein the object detecting network comprises a position box detecting branch and an object classifying branch, the position box detecting branch comprises a first depthwise separable convolutional layer, a global average pooling layer and a second depthwise separable convolutional layer, and the object classifying branch comprises a third depthwise separable convolutional layer, a fourth depthwise separable convolutional layer and a fifth depthwise separable convolutional layer, there is a skip connection between the fifth depthwise separable convolutional layer and the third depthwise separable convolutional layer, and input data of the fifth depthwise separable convolutional layer is based on input data and output data of the third depthwise separable convolutional layer.
14. The device of claim 13, wherein the at least one processor is further configured to: generate a first feature map by a first feature extracting network based on the image to be detected; and generate a second feature map by a second feature extracting network based on the image to be detected and the first feature map.
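The training-related limitations in claims 10 and 11 can be illustrated with a small sketch. It shows one possible reading only and is not taken from the patent's description. The claims give no formulas, so three things are assumed here: the cosine decay follows the common half-cosine curve, each round's rate is derived from the previous round's rate by a ratio of cosine factors, and "resetting" the moving average means restarting it from the current weights. `cosine_lr`, `next_cosine_lr` and `ResettingEMA` are hypothetical names.

```python
import math
from typing import List


def cosine_lr(base_lr: float, round_idx: int, total_rounds: int) -> float:
    """Closed-form half-cosine decay from base_lr at round 0 down to 0."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * round_idx / total_rounds))


def next_cosine_lr(prev_lr: float, round_idx: int, total_rounds: int) -> float:
    """Derive this round's rate from the previous round's rate (1 <= round_idx <= total_rounds)."""
    prev_factor = 1.0 + math.cos(math.pi * (round_idx - 1) / total_rounds)
    factor = 1.0 + math.cos(math.pi * round_idx / total_rounds)
    return prev_lr * factor / prev_factor


class ResettingEMA:
    """Exponential moving average of weights, restarted every `reset_every` rounds."""

    def __init__(self, weights: List[float], decay: float, reset_every: int) -> None:
        self.decay = decay
        self.reset_every = reset_every
        self.shadow = list(weights)
        self.rounds_seen = 0

    def update(self, weights: List[float]) -> List[float]:
        self.rounds_seen += 1
        if self.rounds_seen % self.reset_every == 0:
            # Assumed meaning of "reset": restart the average from the current weights.
            self.shadow = list(weights)
        else:
            self.shadow = [
                self.decay * s + (1.0 - self.decay) * w
                for s, w in zip(self.shadow, weights)
            ]
        return self.shadow


# Learning rates for 10 rounds, each computed from the previous round's rate.
lrs = [cosine_lr(0.1, 0, 10)]
for r in range(1, 11):
    lrs.append(next_cosine_lr(lrs[-1], r, 10))

# A one-weight model whose average is reset on every third round.
ema = ResettingEMA([0.0], decay=0.5, reset_every=3)
history = [ema.update([w])[0] for w in (1.0, 1.0, 4.0, 2.0)]
print(lrs[0], lrs[-1], history)
```

The recursive update reproduces the closed-form curve, which is why a schedule can be described as depending on the previous round's rate. The momentum optimizer and the 4e-5 regularization decay rate of claim 12 are not modeled here.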
Target detection · CPC title
Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title
using neural networks · CPC title
of extracted features · CPC title