Optimizations for dynamic object instance detection, segmentation, and structure mapping

US10565729B2 · US · B2

Patent metadata
Publication number: US-10565729-B2
Application number: US-201815971997-A
Country: US
Kind code: B2
Filing date: May 4, 2018
Priority date: Dec 3, 2017
Publication date: Feb 18, 2020
Grant date: Feb 18, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a method includes a system accessing an image and generating a feature map using a first neural network. The system identifies a plurality of regions of interest in the feature map. A plurality of regional feature maps may be generated for the plurality of regions of interest, respectively. Using a second neural network, the system may detect at least one regional feature map in the plurality of regional feature maps that corresponds to a person depicted in the image, and generate a target region definition associated with a location of the person using the regional feature map. Based on the target region definition associated with the location of the person, a target regional feature map may be generated by sampling the feature map for the image. The system may process the target regional feature map to generate a keypoint mask and an instance segmentation mask.
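
The data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names (`sample_region`, `run_pipeline`), the 7×7 output size, and the use of bilinear interpolation for "sampling" are all assumptions, and the four neural networks are passed in as plain callables rather than real models. The sketch produces both masks, as in the abstract.

```python
import numpy as np


def sample_region(feature_map, box, out_size=7):
    """Sample a fixed-size regional feature map from an (H, W, C) array.

    `box` is (x0, y0, x1, y1) in feature-map coordinates. Bilinear
    interpolation is an assumption; the abstract only says "sampling".
    """
    h, w, _ = feature_map.shape
    x0, y0, x1, y1 = box
    # Centres of an out_size x out_size grid of sample points inside the box.
    steps = (np.arange(out_size) + 0.5) / out_size
    xs = np.clip(x0 + steps * (x1 - x0), 0, w - 1)
    ys = np.clip(y0 + steps * (y1 - y0), 0, h - 1)
    xf, yf = np.floor(xs).astype(int), np.floor(ys).astype(int)
    xc, yc = np.minimum(xf + 1, w - 1), np.minimum(yf + 1, h - 1)
    wx = (xs - xf)[None, :, None]
    wy = (ys - yf)[:, None, None]
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feature_map[yf][:, xf] * (1 - wx) + feature_map[yf][:, xc] * wx
    bottom = feature_map[yc][:, xf] * (1 - wx) + feature_map[yc][:, xc] * wx
    return top * (1 - wy) + bottom * wy


def run_pipeline(image, backbone, propose, detect_head, keypoint_head, mask_head):
    """Hypothetical wiring of the stages named in the abstract."""
    feature_map = backbone(image)              # first neural network
    rois = propose(feature_map)                # regions of interest
    results = []
    for roi in rois:
        regional = sample_region(feature_map, roi)
        # Second neural network: person / not person, plus a refined box
        # (the "target region definition").
        is_person, target_box = detect_head(regional, roi)
        if not is_person:
            continue
        # Re-sample the shared feature map using the refined box.
        target = sample_region(feature_map, target_box)
        results.append({
            "box": target_box,
            "keypoints": keypoint_head(target),   # third neural network
            "mask": mask_head(target),            # fourth neural network
        })
    return results
```

Note that the feature map is computed once and sampled twice per detected person: first with the proposed region, then with the refined target region. Only regions classified as a person reach the keypoint and segmentation stages.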

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising, by a computing system:
    accessing an image;
    generating a feature map for the image using a first neural network;
    identifying a plurality of regions of interest in the feature map;
    generating a plurality of regional feature maps for the plurality of regions of interest, respectively, by sampling the feature map for the image;
    processing the plurality of regional feature maps using a second neural network to:
        detect at least one regional feature map in the plurality of regional feature maps that corresponds to a person depicted in the image; and
        generate a target region definition associated with a location of the person using the regional feature map;
    generating, based on the target region definition associated with the location of the person, a target regional feature map by sampling the feature map for the image; and
    generating:
        a keypoint mask associated with the person by processing the target regional feature map using a third neural network; or
        an instance segmentation mask associated with the person by processing the target regional feature map using a fourth neural network.

2. The method of claim 1, wherein the instance segmentation mask and keypoint mask are both generated and are being generated concurrently.

3. The method of claim 1, wherein the first neural network comprises four or fewer convolutional layers.

4. The method of claim 3, wherein each of the convolutional layers uses a kernel size of 3×3 or less.

5. The method of claim 1, wherein the first neural network comprises a total of one pooling layer.

6. The method of claim 1, wherein the first neural network comprises three or fewer inception modules.

7. The method of claim 6, wherein each of the inception modules performs convolutional operations with kernel sizes of 5×5 or less.

8. The method of claim 1, wherein each of the second neural network, third neural network, and fourth neural network is configured to process an input regional feature map using a total of one inception module.

9. A system comprising: one or more processors and one or more computer-readable non-transitory storage media coupled to one or more of the processors, the one or more computer-readable non-transitory storage media comprising instructions operable when executed by one or more of the processors to cause the system to perform operations comprising:
    accessing an image;
    generating a feature map for the image using a first neural network;
    identifying a plurality of regions of interest in the feature map;
    generating a plurality of regional feature maps for the plurality of regions of interest, respectively, by sampling the feature map for the image;
    processing the plurality of regional feature maps using a second neural network to:
        detect at least one regional feature map in the plurality of regional feature maps that corresponds to a person depicted in the image; and
        generate a target region definition associated with a location of the person using the regional feature map;
    generating, based on the target region definition associated with the location of the person, a target regional feature map by sampling the feature map for the image; and
    generating:
        a keypoint mask associated with the person by processing the target regional feature map using a third neural network; or
        an instance segmentation mask associated with the person by processing the target regional feature map using a fourth neural network.

10. The system of claim 9, wherein the instance segmentation mask and keypoint mask are both generated and are being generated concurrently.

11. The system of claim 9, wherein the first neural network comprises four or fewer convolutional layers.

12. The system of claim 11, wherein each of the convolutional layers uses a kernel size of 3×3 or less.

13. The system of claim 9, wherein the first neural network comprises a total of one pooling layer.

14. The system of claim 9, wherein the first neural network comprises three or fewer inception modules.

15. One or more computer-readable non-transitory storage media embodying software that is operable when executed to cause one or more processors to perform operations comprising:
    accessing an image;
    generating a feature map for the image using a first neural network;
    identifying a plurality of regions of interest in the feature map;
    generating a plurality of regional feature maps for the plurality of regions of interest, respectively, by sampling the feature map for the image;
    processing the plurality of regional feature maps using a second neural network to:
        detect at least one regional feature map in the plurality of regional feature maps that corresponds to a person depicted in the image; and
        generate a target region definition associated with a location of the person using the regional feature map;
    generating, based on the target region definition associated with the location of the person, a target regional feature map by sampling the feature map for the image; and
    generating:
        a keypoint mask associated with the person by processing the target regional feature map using a third neural network; or
        an instance segmentation mask associated with the person by processing the target regional feature map using a fourth neural network.

16. The media of claim 15, wherein the instance segmentation mask and keypoint mask are both generated and are being generated concurrently.

17. The media of claim 15, wherein the first neural network comprises four or fewer convolutional layers.

18. The media of claim 17, wherein each of the convolutional layers uses a kernel size of 3×3 or less.

19. The media of claim 15, wherein the first neural network comprises a total of one pooling layer.

20. The media of claim 15, wherein the first neural network comprises three or fewer inception modules.
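
The numeric limits that dependent claims 3 through 7 place on the first neural network can be expressed as a small checker. This is only an illustration of how those limits read, not a legal analysis: `BackboneSpec` and `check_dependent_claims` are hypothetical names, and interpreting "3×3 or less" as "each kernel dimension is at most 3" is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Kernel = Tuple[int, int]


@dataclass
class BackboneSpec:
    """Hypothetical description of the first neural network."""
    conv_kernels: List[Kernel]             # one kernel size per convolutional layer
    num_pooling_layers: int
    inception_kernels: List[List[Kernel]]  # kernel sizes used inside each inception module


def _fits(kernel: Kernel, limit: int) -> bool:
    # Assumption: "NxN or less" means both dimensions are at most N.
    return kernel[0] <= limit and kernel[1] <= limit


def check_dependent_claims(spec: BackboneSpec) -> Dict[str, bool]:
    """Report which numeric limits from claims 3-7 the spec satisfies."""
    claim3 = len(spec.conv_kernels) <= 4
    claim6 = len(spec.inception_kernels) <= 3
    return {
        "claim 3": claim3,
        # Claim 4 depends on claim 3, so both limits must hold.
        "claim 4": claim3 and all(_fits(k, 3) for k in spec.conv_kernels),
        "claim 5": spec.num_pooling_layers == 1,
        "claim 6": claim6,
        # Claim 7 depends on claim 6, so both limits must hold.
        "claim 7": claim6 and all(
            _fits(k, 5) for module in spec.inception_kernels for k in module
        ),
    }
```

For example, a spec with four 3×3 convolutional layers, one pooling layer, and three inception modules using 1×1, 3×3, and 5×5 kernels satisfies every limit checked here, while a single 7×7 convolutional layer satisfies the layer count of claim 3 but not the kernel limit of claim 4.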

Assignees

Facebook Inc

Inventors

Classifications

  • using classification, e.g. of video objects · CPC title

  • G06T7/75 · Primary

    involving models · CPC title

  • G06T7/73 · Primary

    using feature-based methods · CPC title

  • Knowledge engineering; Knowledge acquisition · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10565729B2 cover?
In one embodiment, a method includes a system accessing an image and generating a feature map using a first neural network. The system identifies a plurality of regions of interest in the feature map. A plurality of regional feature maps may be generated for the plurality of regions of interest, respectively. Using a second neural network, the system may detect at least one regional feature map…
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/75. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 18, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).