Optimizations for dynamic object instance detection, segmentation, and structure mapping

US10565729B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10565729-B2
Application number	US-201815971997-A
Country	US
Kind code	B2
Filing date	May 4, 2018
Priority date	Dec 3, 2017
Publication date	Feb 18, 2020
Grant date	Feb 18, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a method includes a system accessing an image and generating a feature map using a first neural network. The system identifies a plurality of regions of interest in the feature map. A plurality of regional feature maps may be generated for the plurality of regions of interest, respectively. Using a second neural network, the system may detect at least one regional feature map in the plurality of regional feature maps that corresponds to a person depicted in the image, and generate a target region definition associated with a location of the person using the regional feature map. Based on the target region definition associated with the location of the person, a target regional feature map may be generated by sampling the feature map for the image. The system may process the target regional feature map to generate a keypoint mask and an instance segmentation mask.
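The data flow in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract only says that regional feature maps are produced "by sampling the feature map", so the bilinear sampling used here is an assumption, and `toy_detector`, `toy_keypoint_head`, and `toy_mask_head` are hypothetical stand-ins for the second network and the keypoint and segmentation networks. The point it shows is that the shared feature map is sampled twice: once per region of interest, and once more for the refined target region.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in feature-map coordinates
Grid = List[List[float]]                 # single-channel feature map, row-major


def bilinear(fmap: Grid, x: float, y: float) -> float:
    """Bilinearly interpolate fmap at a fractional (x, y), clamped to the map."""
    h, w = len(fmap), len(fmap[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def sample_region(fmap: Grid, box: Box, size: int = 2) -> Grid:
    """Sample a size x size regional feature map at the bin centers of box."""
    x0, y0, x1, y1 = box
    return [
        [
            bilinear(fmap,
                     x0 + (j + 0.5) * (x1 - x0) / size,
                     y0 + (i + 0.5) * (y1 - y0) / size)
            for j in range(size)
        ]
        for i in range(size)
    ]


def run_pipeline(fmap, rois, detect_and_refine, keypoint_head, mask_head):
    """Two-stage flow: sample per ROI, detect and refine, resample, run heads."""
    results = []
    for roi in rois:
        regional = sample_region(fmap, roi)              # first sampling
        is_person, target_box = detect_and_refine(regional, roi)
        if not is_person:
            continue
        target = sample_region(fmap, target_box)         # second sampling, same map
        results.append((target_box, keypoint_head(target), mask_head(target)))
    return results


# Hypothetical stand-ins for the trained networks (assumptions, illustration only).
def toy_detector(regional, roi):
    mean = sum(map(sum, regional)) / (len(regional) * len(regional[0]))
    x0, y0, x1, y1 = roi
    return mean > 15.0, (x0 + 0.5, y0 + 0.5, x1 + 0.5, y1 + 0.5)


def toy_keypoint_head(target):
    # Returns the (row, col) of the strongest response.
    cells = [(v, (i, j)) for i, row in enumerate(target) for j, v in enumerate(row)]
    return max(cells)[1]


def toy_mask_head(target):
    # Binary mask: 1 where the response exceeds the region's mean.
    mean = sum(map(sum, target)) / (len(target) * len(target[0]))
    return [[1 if v > mean else 0 for v in row] for row in target]


# A 4x4 toy feature map whose value at (x, y) is x + 10 * y.
feature_map = [[x + 10.0 * y for x in range(4)] for y in range(4)]
rois = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)]
results = run_pipeline(feature_map, rois, toy_detector,
                       toy_keypoint_head, toy_mask_head)
print(results)
```

With these toy stand-ins, only the second region of interest passes the detector, so the keypoint and mask heads run once, on the resampled target region rather than on the original region.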

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising, by a computing system: accessing an image; generating a feature map for the image using a first neural network; identifying a plurality of regions of interest in the feature map; generating a plurality of regional feature maps for the plurality of regions of interest, respectively, by sampling the feature map for the image; processing the plurality of regional feature maps using a second neural network to: detect at least one regional feature map in the plurality of regional feature maps that corresponds to a person depicted in the image; and generate a target region definition associated with a location of the person using the regional feature map; generating, based on the target region definition associated with the location of the person, a target regional feature map by sampling the feature map for the image; and generating: a keypoint mask associated with the person by processing the target regional feature map using a third neural network; or an instance segmentation mask associated with the person by processing the target regional feature map using a fourth neural network.
2. The method of claim 1, wherein the instance segmentation mask and keypoint mask are both generated and are being generated concurrently.
3. The method of claim 1, wherein the first neural network comprises four or fewer convolutional layers.
4. The method of claim 3, wherein each of the convolutional layers uses a kernel size of 3×3 or less.
5. The method of claim 1, wherein the first neural network comprises a total of one pooling layer.
6. The method of claim 1, wherein the first neural network comprises three or fewer inception modules.
7. The method of claim 6, wherein each of the inception modules performs convolutional operations with kernel sizes of 5×5 or less.
8. The method of claim 1, wherein each of the second neural network, third neural network, and fourth neural network is configured to process an input regional feature map using a total of one inception module.
9. A system comprising: one or more processors and one or more computer-readable non-transitory storage media coupled to one or more of the processors, the one or more computer-readable non-transitory storage media comprising instructions operable when executed by one or more of the processors to cause the system to perform operations comprising: accessing an image; generating a feature map for the image using a first neural network; identifying a plurality of regions of interest in the feature map; generating a plurality of regional feature maps for the plurality of regions of interest, respectively, by sampling the feature map for the image; processing the plurality of regional feature maps using a second neural network to: detect at least one regional feature map in the plurality of regional feature maps that corresponds to a person depicted in the image; and generate a target region definition associated with a location of the person using the regional feature map; generating, based on the target region definition associated with the location of the person, a target regional feature map by sampling the feature map for the image; and generating: a keypoint mask associated with the person by processing the target regional feature map using a third neural network; or an instance segmentation mask associated with the person by processing the target regional feature map using a fourth neural network.
10. The system of claim 9, wherein the instance segmentation mask and keypoint mask are both generated and are being generated concurrently.
11. The system of claim 9, wherein the first neural network comprises four or fewer convolutional layers.
12. The system of claim 11, wherein each of the convolutional layers uses a kernel size of 3×3 or less.
13. The system of claim 9, wherein the first neural network comprises a total of one pooling layer.
14. The system of claim 9, wherein the first neural network comprises three or fewer inception modules.
15. One or more computer-readable non-transitory storage media embodying software that is operable when executed to cause one or more processors to perform operations comprising: accessing an image; generating a feature map for the image using a first neural network; identifying a plurality of regions of interest in the feature map; generating a plurality of regional feature maps for the plurality of regions of interest, respectively, by sampling the feature map for the image; processing the plurality of regional feature maps using a second neural network to: detect at least one regional feature map in the plurality of regional feature maps that corresponds to a person depicted in the image; and generate a target region definition associated with a location of the person using the regional feature map; generating, based on the target region definition associated with the location of the person, a target regional feature map by sampling the feature map for the image; and generating: a keypoint mask associated with the person by processing the target regional feature map using a third neural network; or an instance segmentation mask associated with the person by processing the target regional feature map using a fourth neural network.
16. The media of claim 15, wherein the instance segmentation mask and keypoint mask are both generated and are being generated concurrently.
17. The media of claim 15, wherein the first neural network comprises four or fewer convolutional layers.
18. The media of claim 15, wherein each of the convolutional layers uses a kernel size of 3×3 or less.
19. The media of claim 15, wherein the first neural network comprises a total of one pooling layer.
20. The media of claim 15, wherein the first neural network comprises three or fewer inception modules.
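The numeric limits that the dependent method claims place on the first neural network (claims 3 to 7) can be written down as simple checks. The sketch below is only a reading aid and not a legal test of claim scope: `BackboneSpec` and its fields are hypothetical names introduced for illustration, and it assumes each limit can be reduced to a single count or kernel size. It also encodes the dependency structure, so claim 4 can only be met when claim 3 is, and claim 7 only when claim 6 is.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BackboneSpec:
    """Hypothetical summary of a first-network architecture (illustration only)."""
    conv_layers: int           # number of convolutional layers
    max_conv_kernel: int       # largest conv kernel side, e.g. 3 for 3x3
    pooling_layers: int        # number of pooling layers
    inception_modules: int     # number of inception modules
    max_inception_kernel: int  # largest kernel side inside any inception module


def dependent_limits_met(spec: BackboneSpec) -> Dict[int, bool]:
    """Map claim number -> whether the spec falls within that claim's stated limit."""
    claim3 = spec.conv_layers <= 4           # "four or fewer convolutional layers"
    claim6 = spec.inception_modules <= 3     # "three or fewer inception modules"
    return {
        3: claim3,
        4: claim3 and spec.max_conv_kernel <= 3,       # depends on claim 3
        5: spec.pooling_layers == 1,                   # "a total of one pooling layer"
        6: claim6,
        7: claim6 and spec.max_inception_kernel <= 5,  # depends on claim 6
    }


# Two made-up architectures: one small, one deep.
small = BackboneSpec(conv_layers=4, max_conv_kernel=3, pooling_layers=1,
                     inception_modules=3, max_inception_kernel=5)
deep = BackboneSpec(conv_layers=16, max_conv_kernel=7, pooling_layers=5,
                    inception_modules=9, max_inception_kernel=5)

print(dependent_limits_met(small))
print(dependent_limits_met(deep))
```

Note that `deep` fails claim 7 even though its inception kernels are 5×5 or smaller, because it already exceeds the module count that claim 6 requires.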

Assignees

Facebook Inc

Inventors

Classifications

CPC code · CPC title
G06V10/764 · using classification, e.g. of video objects
G06T7/75 (Primary) · involving models
G06T7/73 (Primary) · using feature-based methods
G06N5/022 · Knowledge engineering; Knowledge acquisition
G06N3/084 · Backpropagation, e.g. using gradient descent

Patent family

Related publications grouped by family.

View patent family 66658140

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10565729B2 cover?
In one embodiment, a method includes a system accessing an image and generating a feature map using a first neural network. The system identifies a plurality of regions of interest in the feature map. A plurality of regional feature maps may be generated for the plurality of regions of interest, respectively. Using a second neural network, the system may detect at least one regional feature map…

Who is the assignee on this patent?
Facebook Inc

What technology area does this patent fall under?
Primary CPC classification G06T7/75. Mapped technology areas include Physics.

When was this patent published?
Publication date: Feb 18, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device

Visual object recognition

Three-dimensional (3d) convolution with 3d batch normalization

Context-based priors for object detection in images

Memory efficiency for convolutional neural networks operating on graphics processing units

Using a probabilistic model for detecting an object in visual data
