Cluster-trained machine learning for image processing
US-9704054-B1 · Jul 11, 2017 · US
US10410351B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10410351-B2 |
| Application number | US-201816116609-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 29, 2018 |
| Priority date | Mar 14, 2017 |
| Publication date | Sep 10, 2019 |
| Grant date | Sep 10, 2019 |
Official abstract text for this publication.
The invention is directed towards segmenting images based on natural language phrases. An image and an n-gram, including a sequence of tokens, are received. An encoding of image features and a sequence of token vectors are generated. A fully convolutional neural network identifies and encodes the image features. A word embedding model generates the token vectors. A recurrent neural network (RNN) iteratively updates a segmentation map based on combinations of the image feature encoding and the token vectors. The segmentation map identifies which pixels are included in an image region referenced by the n-gram. A segmented image is generated based on the segmentation map. The RNN may be a convolutional multimodal RNN. A separate RNN, such as a long short-term memory network, may iteratively update an encoding of semantic features based on the order of tokens. The first RNN may update the segmentation map based on the semantic feature encoding.
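The data flow in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: `embed` is a hypothetical stand-in for a trained word embedding model, `image_features` stands in for the output of a fully convolutional network, all weights are random and untrained, and the per-pixel update uses a 1x1 kernel where a convolutional multimodal RNN would use spatial kernels. The names `iterate_segmentation`, `rand_matrix`, and `matvec` are invented for this sketch.

```python
import math
import random

TOKEN_DIM = 4  # assumed embedding size, chosen only for illustration


def embed(token, dim=TOKEN_DIM):
    """Hypothetical word-embedding stand-in: a fixed pseudo-random vector per token."""
    rng = random.Random(token)  # string seeds are deterministic across runs
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def rand_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def iterate_segmentation(image_features, phrase, hidden_dim=3, seed=0):
    """Return one segmentation map per token, each an H x W grid of probabilities.

    image_features is an H x W grid of per-pixel feature vectors.
    """
    tokens = phrase.lower().split()  # the ordered n-gram
    height, width = len(image_features), len(image_features[0])
    feat_dim = len(image_features[0][0])
    rng = random.Random(seed)
    w_in = rand_matrix(hidden_dim, feat_dim + TOKEN_DIM, rng)
    w_rec = rand_matrix(hidden_dim, hidden_dim, rng)
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
    # Recurrent state carried from token to token, one vector per pixel.
    hidden = [[[0.0] * hidden_dim for _ in range(width)] for _ in range(height)]
    maps = []
    for token in tokens:  # one update per token, in phrase order
        token_vec = embed(token)
        seg_map = []
        for y in range(height):
            row = []
            for x in range(width):
                # Combine image features with the token vector by concatenation.
                combined = image_features[y][x] + token_vec
                pre = [a + b for a, b in zip(matvec(w_in, combined),
                                             matvec(w_rec, hidden[y][x]))]
                hidden[y][x] = [math.tanh(p) for p in pre]
                logit = sum(w * h for w, h in zip(w_out, hidden[y][x]))
                row.append(1.0 / (1.0 + math.exp(-logit)))
            seg_map.append(row)
        maps.append(seg_map)
    return maps


# Toy 2 x 2 "image" with two features per pixel.
features = [[[0.1, 0.9], [0.8, 0.2]], [[0.4, 0.4], [0.7, 0.3]]]
maps = iterate_segmentation(features, "the dog on the left")
final_mask = [[p > 0.5 for p in row] for row in maps[-1]]
```

Because the weights are untrained, the resulting mask is meaningless; the sketch only shows that the map is refined once per token and that each update depends on the previous state, the image features, and the current token vector.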
Opening claim text (preview).
What is claimed is:
1. A non-transitory computer-readable storage medium having instructions stored thereon for segmenting an image, which, when executed by a processor of a computing device cause the computing device to perform actions comprising: receiving a phrase that references a first region of the image, wherein the phrase includes a set of tokens; generating a plurality of token data elements based on the set of tokens, wherein each of the plurality of token data elements indicates a semantic feature of a corresponding token of the set of tokens; generating a plurality of iterative updates of a segmentation map of the image based on an order of the set of tokens, wherein each of a plurality of iterative updates of the segmentation map is based on the semantic feature indicated by the corresponding token data element; and segmenting the first region of the image based on the iteratively updated segmentation map.
2. The non-transitory computer-readable storage medium of claim 1, wherein the actions further comprise: generating an image map that represents a correspondence between each of a plurality of image features and corresponding portion of a plurality of pixels, wherein the image includes the plurality of pixels.
3. The non-transitory computer-readable storage medium of claim 1, wherein the actions further comprise: generating a segmented image based on the image and the segmentation map.
4. The non-transitory computer-readable storage medium of claim 1, wherein the segmentation map represents whether each of the plurality of pixels is included in the first region of the image.
5. The non-transitory computer-readable storage medium of claim 1, wherein the plurality of iterative updates of the segmentation map is further based on a previous version of the segmentation map and a combination of the image map and one of the corresponding token data elements that is based on the order of the set of tokens.
6. The non-transitory computer-readable storage medium of claim 1, wherein the actions further comprise: iteratively updating an n-gram data element that encodes semantic features of the order of the set of tokens, wherein each of a plurality of iterative updates of the n-gram data element is based on a previous version of the n-gram data element and one of the token data elements based on the order of the set of tokens; and iteratively updating the segmentation map, wherein each of the plurality of iterative updates of the segmentation map is further based on a combination of the image map and an updated n-gram data element corresponding to the order of the set of tokens.
7. The non-transitory computer-readable storage medium of claim 6, wherein each of the plurality of iterative updates of the n-gram data element is further based on a trained long short-term memory (LSTM) neural network that propagates each of the plurality of iterative updates of the n-gram element.
8. A method for segmenting an image, the method comprising: receiving an image that includes a plurality of pixels; receiving an n-gram that includes an ordered set of tokens based on a phrase that references an object depicted within a first region of the image; generating an image data structure that encodes a mapping between each of a plurality of image features corresponding to the image and a corresponding portion of the plurality of pixels; generating a set of token data structures; employing a first recurrent neural network (RNN) to iteratively generate a segmentation map; and segmenting the image based on the iteratively generated segmentation map.
9. The method of claim 8, further comprising: iteratively generating an n-gram data structure based on a second RNN and the set of token data structures, wherein the second RNN propagates the n-gram data structure during the iterative generation of the n-gram data structure; and iteratively generating the segmentation map further based on a plurality of iteratively generated combinations of the image data structure and the n-gram data structure.
10. The method of claim 8, further comprising: training a long short-term memory (LSTM) neural network based on a training data that includes a plurality of other n-grams; and employing the trained LSTM as the second RNN.
11. The method of claim 8, wherein the plurality of image features are identified within the image based on an image feature identification model.
12. The method of claim 8, wherein generating a set of token data structures is based on a natural language model and the set of data token structures encodes semantic features of a corresponding token of the set of tokens.
13. The method of claim 8, wherein iteratively generating the segmentation map is further based on a plurality of iteratively generated combinations of the image data structure and portions of the set of token data structures, wherein the first RNN propagates the segmentation map during the iterative generation of the segmentation data structure and the segmentation map identifies a subset of the plurality of pixels that are included in the first region of the image.
14. An image segmentation system for segmenting an image, the system comprising: a processor device; and a computer-readable non-transitory storage medium, coupled with the processor device, having instructions stored thereon, which, when executed by the processor device, perform actions comprising: receiving an image feature data structure that encodes images features corresponding to an image; employing a first recurrent neural network (RNN) to generate an n-gram feature data structure that encodes n-gram features corresponding to an ordered set of tokens included in a natural language phrase that references a portion of the image; employing a second RNN to iteratively update a current state of a segmentation map based on the image feature data structure and the n-gram feature data structure, wherein the second RNN propagates a current state of the segmentation map; generating a segmented image based on the iteratively updated current state of the segmentation map, wherein the segmented image indicates the portion of the image referenced by the natural language phrase.
15. The system of claim 14, the actions further comprising: generating, using a natural language model, a token vector for each token of the ordered set of tokens included in the n-gram; iteratively generating, based on a plurality of token vectors corresponding with the ordered set of tokens, an n-gram hidden feature vector; propagating, using the first recurrent neural model, a current state of the n-gram hidden feature vector; and iteratively updating, using the current state of the n-gram hidden feature vector, the state of the n-gram hidden feature vector in a subsequent iteration of the iterative generation of the n-gram hidden feature vector.
16. The system of claim 14, the actions further comprising: employing a convolutional neural network to generate the image feature data structure based on the image; providing the second RNN the image feature data structure; and proving the second RNN the n-gram feature data structure.
17. The system of claim 14, wherein the first RNN is a long short-term memory (LSTM) neural network.
18. The system of claim 14, wherein the second RNN is a convolutional multimodal recurrent neural network (mRNN).
19. The system of claim 14, wherein the first RNN iteratively encodes the n-gram feature data structure based on a sequence of the ordered set of tokens.
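Two steps recited in the claims lend themselves to a short sketch: the order-sensitive n-gram encoding (claims 6, 9, and 15) and the generation of a segmented image from a segmentation map (claims 3 and 4). The code below is a hedged illustration only. It swaps in a plain tanh recurrent cell where the claims name an LSTM, uses random untrained weights, and relies on a hypothetical `embed` function in place of a natural language model. The names `encode_ngram` and `apply_segmentation`, the 0.5 threshold, and the zero background value are assumptions made for this example.

```python
import math
import random


def embed(token, dim=4):
    """Hypothetical token-vector stand-in: a fixed pseudo-random vector per token."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def encode_ngram(tokens, hidden_dim=5, tok_dim=4, seed=1):
    """Iteratively update an n-gram hidden vector, one token at a time.

    Uses a simple tanh recurrent cell (not an LSTM) with untrained weights.
    Returns the hidden state after each token.
    """
    rng = random.Random(seed)
    w_tok = [[rng.uniform(-0.5, 0.5) for _ in range(tok_dim)]
             for _ in range(hidden_dim)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
             for _ in range(hidden_dim)]
    state = [0.0] * hidden_dim
    states = []
    for token in tokens:
        vec = embed(token, tok_dim)
        # The new state depends on the previous state and the current token.
        state = [
            math.tanh(sum(a * v for a, v in zip(w_tok[i], vec))
                      + sum(b * s for b, s in zip(w_rec[i], state)))
            for i in range(hidden_dim)
        ]
        states.append(state)
    return states


def apply_segmentation(image, seg_map, threshold=0.5, background=0):
    """Keep pixels whose map value meets the threshold; blank out the rest."""
    return [
        [pix if prob >= threshold else background
         for pix, prob in zip(img_row, map_row)]
        for img_row, map_row in zip(image, seg_map)
    ]


forward = encode_ngram(["dog", "left", "of", "man"])
reverse = encode_ngram(["man", "left", "of", "dog"])
segmented = apply_segmentation([[10, 20], [30, 40]],
                               [[0.9, 0.1], [0.5, 0.49]])
```

The two phrases contain the same tokens in a different order, so their final hidden states differ; this is the property that lets the encoding distinguish "dog left of man" from "man left of dog". In the toy mask example, only the pixels with map values 0.9 and 0.5 survive.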
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
Classification techniques · CPC title
using neural networks · CPC title
Combinations of networks · CPC title
based on distances to training or reference patterns · CPC title