Automatically segmenting images based on natural language phrases

US10410351B2 · US · B2

Patent metadata
Publication number: US-10410351-B2
Application number: US-201816116609-A
Country: US
Kind code: B2
Filing date: Aug 29, 2018
Priority date: Mar 14, 2017
Publication date: Sep 10, 2019
Grant date: Sep 10, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The invention is directed towards segmenting images based on natural language phrases. An image and an n-gram, including a sequence of tokens, are received. An encoding of image features and a sequence of token vectors are generated. A fully convolutional neural network identifies and encodes the image features. A word embedding model generates the token vectors. A recurrent neural network (RNN) iteratively updates a segmentation map based on combinations of the image feature encoding and the token vectors. The segmentation map identifies which pixels are included in an image region referenced by the n-gram. A segmented image is generated based on the segmentation map. The RNN may be a convolutional multimodal RNN. A separate RNN, such as a long short-term memory network, may iteratively update an encoding of semantic features based on the order of tokens. The first RNN may update the segmentation map based on the semantic feature encoding.
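The data flow in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the image features and token vectors are random stand-ins for the outputs of a fully convolutional network and a word embedding model, the recurrent update is a plain per-pixel tanh recurrence rather than a trained convolutional multimodal RNN, and every name (`segment_by_phrase`, `apply_segmentation`, `params`, the toy vocabulary) is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def segment_by_phrase(image_features, token_vectors, params):
    """Iteratively update a segmentation map, one token at a time.

    image_features: (H, W, C) stand-in for the image feature encoding.
    token_vectors:  (T, D) stand-in for word-embedding outputs, in phrase order.
    """
    H, W, _ = image_features.shape
    hidden = np.zeros((H, W, params["W_h"].shape[0]))
    seg_map = np.zeros((H, W))
    history = []
    for token in token_vectors:
        # Combine the image encoding with the current token at every pixel.
        tiled = np.broadcast_to(token, (H, W, token.shape[0]))
        combined = np.concatenate([image_features, tiled], axis=-1)
        # Assumption: a per-pixel (1x1) recurrence stands in for the
        # convolutional multimodal RNN named in the abstract.
        hidden = np.tanh(combined @ params["W_x"].T
                         + hidden @ params["W_h"].T
                         + params["b"])
        seg_map = sigmoid(hidden @ params["w_out"])
        history.append(seg_map)
    return seg_map, history

def apply_segmentation(image, seg_map, threshold=0.5):
    """Zero out pixels the map does not assign to the referenced region."""
    mask = seg_map >= threshold
    return image * mask[..., None]

# Toy inputs; all values are random placeholders, not trained weights.
rng = np.random.default_rng(0)
H, W, C, D, K = 4, 5, 8, 6, 16
image = rng.random((H, W, 3))
features = rng.standard_normal((H, W, C))
vocab = {"the": 0, "dog": 1, "on": 2, "left": 3}
embeddings = rng.standard_normal((len(vocab), D))
phrase = "the dog on the left".split()
token_vectors = embeddings[[vocab[t] for t in phrase]]
params = {
    "W_x": rng.standard_normal((K, C + D)) * 0.1,
    "W_h": rng.standard_normal((K, K)) * 0.1,
    "b": np.zeros(K),
    "w_out": rng.standard_normal(K),
}

seg_map, history = segment_by_phrase(features, token_vectors, params)
segmented = apply_segmentation(image, seg_map)
```

With untrained weights the resulting mask is meaningless; the sketch only shows that the map is revised once per token and that the final map decides which pixels survive in the segmented image. For simplicity it also assumes the feature grid and the image share the same resolution.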

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable storage medium having instructions stored thereon for segmenting an image, which, when executed by a processor of a computing device cause the computing device to perform actions comprising: receiving a phrase that references a first region of the image, wherein the phrase includes a set of tokens; generating a plurality of token data elements based on the set of tokens, wherein each of the plurality of token data elements indicates a semantic feature of a corresponding token of the set of tokens; generating a plurality of iterative updates of a segmentation map of the image based on an order of the set of tokens, wherein each of a plurality of iterative updates of the segmentation map is based on the semantic feature indicated by the corresponding token data element; and segmenting the first region of the image based on the iteratively updated segmentation map.

2. The non-transitory computer-readable storage medium of claim 1, wherein the actions further comprise: generating an image map that represents a correspondence between each of a plurality of image features and corresponding portion of a plurality of pixels, wherein the image includes the plurality of pixels.

3. The non-transitory computer-readable storage medium of claim 1, wherein the actions further comprise: generating a segmented image based on the image and the segmentation map.

4. The non-transitory computer-readable storage medium of claim 1, wherein the segmentation map represents whether each of the plurality of pixels is included in the first region of the image.

5. The non-transitory computer-readable storage medium of claim 1, wherein the plurality of iterative updates of the segmentation map is further based on a previous version of the segmentation map and a combination of the image map and one of the corresponding token data elements that is based on the order of the set of tokens.

6. The non-transitory computer-readable storage medium of claim 1, wherein the actions further comprise: iteratively updating an n-gram data element that encodes semantic features of the order of the set of tokens, wherein each of a plurality of iterative updates of the n-gram data element is based on a previous version of the n-gram data element and one of the token data elements based on the order of the set of tokens; and iteratively updating the segmentation map, wherein each of the plurality of iterative updates of the segmentation map is further based on a combination of the image map and an updated n-gram data element corresponding to the order of the set of tokens.

7. The non-transitory computer-readable storage medium of claim 6, wherein each of the plurality of iterative updates of the n-gram data element is further based on a trained long short-term memory (LSTM) neural network that propagates each of the plurality of iterative updates of the n-gram element.

8. A method for segmenting an image, the method comprising: receiving an image that includes a plurality of pixels; receiving an n-gram that includes an ordered set of tokens based on a phrase that references an object depicted within a first region of the image; generating an image data structure that encodes a mapping between each of a plurality of image features corresponding to the image and a corresponding portion of the plurality of pixels; generating a set of token data structures; employing a first recurrent neural network (RNN) to iteratively generate a segmentation map; and segmenting the image based on the iteratively generated segmentation map.

9. The method of claim 8, further comprising: iteratively generating an n-gram data structure based on a second RNN and the set of token data structures, wherein the second RNN propagates the n-gram data structure during the iterative generation of the n-gram data structure; and iteratively generating the segmentation map further based on a plurality of iteratively generated combinations of the image data structure and the n-gram data structure.

10. The method of claim 8, further comprising: training a long short-term memory (LSTM) neural network based on a training data that includes a plurality of other n-grams; and employing the trained LSTM as the second RNN.

11. The method of claim 8, wherein the plurality of image features are identified within the image based on an image feature identification model.

12. The method of claim 8, wherein generating a set of token data structures is based on a natural language model and the set of data token structures encodes semantic features of a corresponding token of the set of tokens.

13. The method of claim 8, wherein iteratively generating the segmentation map is further based on a plurality of iteratively generated combinations of the image data structure and portions of the set of token data structures, wherein the first RNN propagates the segmentation map during the iterative generation of the segmentation data structure and the segmentation map identifies a subset of the plurality of pixels that are included in the first region of the image.

14. An image segmentation system for segmenting an image, the system comprising: a processor device; and a computer-readable non-transitory storage medium, coupled with the processor device, having instructions stored thereon, which, when executed by the processor device, perform actions comprising: receiving an image feature data structure that encodes images features corresponding to an image; employing a first recurrent neural network (RNN) to generate an n-gram feature data structure that encodes n-gram features corresponding to an ordered set of tokens included in a natural language phrase that references a portion of the image; employing a second RNN to iteratively update a current state of a segmentation map based on the image feature data structure and the n-gram feature data structure, wherein the second RNN propagates a current state of the segmentation map; generating a segmented image based on the iteratively updated current state of the segmentation map, wherein the segmented image indicates the portion of the image referenced by the natural language phrase.

15. The system of claim 14, the actions further comprising: generating, using a natural language model, a token vector for each token of the ordered set of tokens included in the n-gram; iteratively generating, based on a plurality of token vectors corresponding with the ordered set of tokens, an n-gram hidden feature vector; propagating, using the first recurrent neural model, a current state of the n-gram hidden feature vector; and iteratively updating, using the current state of the n-gram hidden feature vector, the state of the n-gram hidden feature vector in a subsequent iteration of the iterative generation of the n-gram hidden feature vector.

16. The system of claim 14, the actions further comprising: employing a convolutional neural network to generate the image feature data structure based on the image; providing the second RNN the image feature data structure; and proving the second RNN the n-gram feature data structure.

17. The system of claim 14, wherein the first RNN is a long short-term memory (LSTM) neural network.

18. The system of claim 14, wherein the second RNN is a convolutional multimodal recurrent neural network (mRNN).

19. The system of claim 14, wherein the first RNN iteratively encodes the n-gram feature data structure based on a sequence of the ordered set of tokens.
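Claims 6, 7, 9, 15, and 17 describe a recurrent network, optionally an LSTM, that carries an n-gram hidden feature vector from one token to the next. The following is a minimal sketch of that idea using the standard LSTM cell equations in NumPy. The weights are random placeholders rather than a trained model, the token vectors are random stand-ins for embedding outputs, and the function names (`lstm_step`, `encode_ngram`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(token_vec, h_prev, c_prev, W, U, b):
    """One standard LSTM update. Shapes: W (4K, D), U (4K, K), b (4K,)."""
    K = h_prev.shape[0]
    gates = W @ token_vec + U @ h_prev + b
    i = sigmoid(gates[0:K])          # input gate
    f = sigmoid(gates[K:2 * K])      # forget gate
    o = sigmoid(gates[2 * K:3 * K])  # output gate
    g = np.tanh(gates[3 * K:4 * K])  # candidate cell state
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

def encode_ngram(token_vectors, W, U, b):
    """Return the hidden feature vector after each token, in phrase order."""
    K = U.shape[1]
    h, c = np.zeros(K), np.zeros(K)
    states = []
    for tok in token_vectors:
        # Each update depends on the previous state and the current token.
        h, c = lstm_step(tok, h, c, W, U, b)
        states.append(h)
    return np.stack(states)

# Toy inputs; random placeholders, not trained weights or real embeddings.
rng = np.random.default_rng(1)
T, D, K = 3, 6, 8
token_vectors = rng.standard_normal((T, D))
W = rng.standard_normal((4 * K, D)) * 0.5
U = rng.standard_normal((4 * K, K)) * 0.5
b = np.zeros(4 * K)

states = encode_ngram(token_vectors, W, U, b)
reversed_states = encode_ngram(token_vectors[::-1], W, U, b)
```

Feeding the same tokens in reverse order yields a different final state, which is the sense in which the encoding reflects the order of the tokens. In the arrangement the claims describe, each intermediate state would be combined with the image feature data structure before the segmentation map is updated; that combination step is omitted here.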

Assignees

Inventors

Classifications

  • Labelling scene content, e.g. deriving syntactic or semantic representations

  • Classification techniques

  • using neural networks

  • Combinations of networks

  • based on distances to training or reference patterns

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10410351B2 cover?
The invention is directed towards segmenting images based on natural language phrases. An image and an n-gram, including a sequence of tokens, are received. An encoding of image features and a sequence of token vectors are generated. A fully convolutional neural network identifies and encodes the image features. A word embedding model generates the token vectors. A recurrent neural network (RNN…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/11. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 10, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).