Processing and generating sets using recurrent neural networks

US2017200076A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2017200076-A1
Application number  US-201715406557-A
Country             US
Kind code           A1
Filing date         Jan 13, 2017
Priority date       Jan 13, 2016
Publication date    Jul 13, 2017
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one aspect, this specification describes a recurrent neural network system implemented by one or more computers that is configured to process input sets to generate neural network outputs for each input set. The input set can be a collection of multiple inputs for which the recurrent neural network should generate the same neural network output regardless of the order in which the inputs are arranged in the collection. The recurrent neural network system can include a read neural network, a process neural network, and a write neural network. In another aspect, this specification describes a system implemented as computer programs on one or more computers in one or more locations that is configured to train a recurrent neural network that receives a neural network input and sequentially emits outputs to generate an output sequence for the neural network input.
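The claims on this page describe the process neural network as a recurrent update followed by dot-product attention over the memory vectors. The following is a minimal sketch of that idea, not the patent's implementation: a plain `tanh` layer stands in for the LSTM, the weight matrix `W`, the step count, and all function names are hypothetical, and the read and write networks are omitted. It illustrates why the resulting embedding does not depend on the order of the inputs.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def process_block(memories, W, steps=3):
    """Return an embedding of the set of memory vectors.

    memories: (n, d) array, one memory vector per input (assumed to come
              from a read network that is not modelled here).
    W:        (2d, d) hypothetical recurrent weights; a tanh layer is used
              as a stand-in for the LSTM named in the claims.
    """
    n, d = memories.shape
    q_star = np.zeros(2 * d)                 # modified internal state
    for _ in range(steps):
        q = np.tanh(q_star @ W)              # initial updated internal state
        sims = memories @ q                  # dot-product similarity per memory
        weights = softmax(sims)              # attention weights
        read = weights @ memories            # read vector: weighted sum of memories
        q_star = np.concatenate([q, read])   # combine state and read vector
    return q_star                            # state after the last step

rng = np.random.default_rng(0)
memories = rng.normal(size=(5, 4))
W = rng.normal(size=(8, 4))

emb = process_block(memories, W)
perm = rng.permutation(5)
emb_perm = process_block(memories[perm], W)  # same set, different order
print(np.allclose(emb, emb_perm))
```

The attention weights and the weighted sum are both computed over the whole set, so shuffling the rows of `memories` reorders the terms of the sum without changing its value (up to floating-point rounding).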

First claim

Opening claim text (preview).

What is claimed is:

1. A neural network system implemented by one or more computers, the neural network system comprising: a read neural network configured to: receive an input set comprising a plurality of inputs, and process each input in the input set to generate a respective memory vector for each input; a process neural network configured to: process the respective memory vector for each of the inputs to generate an order-invariant numeric embedding for the input set, wherein the order-invariant numeric embedding is permutation invariant to the inputs in the input set; and a write neural network configured to: process the order-invariant numeric embedding to generate a neural network output for the input set.

2. The neural network system of claim 1, wherein the process neural network comprises: a long short-term memory (LSTM) neural network configured to, for each of a plurality of time steps, update a current modified internal state to generate an initial updated internal state; and a subsystem configured to, for each of the plurality of time steps: receive the initial updated internal state for the time step, and apply an attention mechanism over the memory vectors for the inputs to modify the initial updated internal state for the time step to generate a modified internal state for the time step.

3. The neural network system of claim 2, wherein the modified internal state for the last time step in the plurality of time steps is the order-invariant numeric embedding for the input set.

4. The neural network system of claim 2, wherein applying the attention mechanism comprises: determining a respective similarity value for each of the memory vectors, wherein the respective similarity value represents a similarity between the initial updated internal state and the memory vector; generating a respective attention weight for each of the memory vectors from the respective similarity values; generating a read vector by combining the memory vectors in accordance with the attention weights; and combining the initial updated internal state and the read vector to generate the modified internal state.

5. The neural network system of claim 4, wherein determining the respective similarity for each of the memory vectors comprises determining a dot product between the initial updated internal state and the memory vector.

6. The neural network system of claim 1, wherein the write neural network is a pointer recurrent neural network configured to process the order-invariant numeric embedding to generate a plurality of pointers to the inputs in the input set.

7. The neural network system of claim 1, wherein the write neural network is a recurrent neural network configured to process the order-invariant numeric embedding to generate a sequence of neural network outputs.

8. A method of training a recurrent neural network having a plurality of parameters that receives a neural network input and sequentially emits outputs to generate an output sequence for the neural network input, the method comprising: receiving first training data for training the recurrent neural network, the first training data comprising a plurality of training example pairs, each training example pair comprising a training input and a target output set for the training input, the training output set having a plurality of target outputs; and training the recurrent neural network on each of the training example pairs in the first training data, wherein training the recurrent neural network comprises, for each training example pair: selecting a particular order for the target outputs from the target output set in the training example pair; and training the recurrent neural network to generate an output sequence for the training input in the training example pair that matches a sequence having the target outputs from the target output set arranged according to the particular order.

9. The method of claim 8, further comprising: pre-training the recurrent neural network on second training data to determine pre-trained values of the parameters of the recurrent neural network from initial values of the parameters of the recurrent neural network, wherein training the recurrent neural network comprises determining trained values of the parameters of the recurrent neural network from the pre-trained values of the parameters.

10. The method of claim 9, wherein pre-training the recurrent neural network comprises, for each training example pair in the second training data: generating a plurality of candidate target sequences, each candidate target sequence having the target outputs from the target output set in the training example pair arranged according to a different order; and training the recurrent neural network to maximize an aggregate likelihood that one of the plurality of candidate target sequences is the correct target sequence for the training input in the training example pair as determined by the recurrent neural network.

11. The method of claim 10, wherein selecting the particular order comprises: generating a plurality of candidate target sequences, each candidate target sequence having the target outputs from the target output set arranged according to a different order; determining a respective likelihood for each of the candidate target sequences, the respective likelihood for each of the candidate target sequences being the likelihood that the candidate target sequence is the correct target sequence for the training input as determined by the recurrent neural network in accordance with current values of the parameters of the recurrent neural network; and selecting as the particular order the order according to which the target outputs in one of the candidate target sequences are arranged based on the respective likelihoods.

12. The method of claim 11, wherein selecting as the particular order the order according to which the target outputs in one of the candidate target sequences are arranged based on the respective likelihoods comprises: selecting the order according to which the target outputs in the candidate target sequence having the highest likelihood are arranged.

13. The method of claim 11, wherein selecting as the particular order the order according to which the target outputs in one of the candidate target sequences are arranged based on the respective likelihoods comprises: sampling a candidate target sequence from the candidate target sequences in accordance with the respective likelihoods; and selecting the order according to which the target outputs in the sampled candidate target sequence are arranged.

14. The method of claim 11, wherein the likelihood is a log likelihood.

15. The method of claim 11, wherein generating the plurality of candidate sequence comprises generating a respective candidate sequence for each possible ordering of the target outputs.

16. The method of claim 11, wherein generating the plurality of candidate sequence comprises performing an inexact search over possible orderings of the target outputs.

17. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving first training data for training the recurrent neural network, the first training data comprising a plurality of training example pairs, each training example pair comprising a training input and a target output set for the training input, t
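The order-selection step recited in claims 11 to 13 and 15 can be illustrated with a small sketch. This is an assumption-laden illustration, not the patent's implementation: `toy_log_likelihood` is a hypothetical stand-in for the log likelihood a trained recurrent neural network would assign to a candidate sequence, and `candidate_orders` and `select_order` are names invented here. Candidates are generated for every possible ordering, which is only practical for small target sets.

```python
import itertools
import math
import random

def toy_log_likelihood(sequence):
    # Hypothetical scorer standing in for the network's log likelihood:
    # it penalises each adjacent pair that is out of ascending order.
    return -float(sum(1 for a, b in zip(sequence, sequence[1:]) if a > b))

def candidate_orders(target_set):
    # One candidate target sequence per possible ordering of the target outputs.
    return [list(p) for p in itertools.permutations(sorted(target_set))]

def select_order(target_set, log_likelihood, sample=False, rng=None):
    candidates = candidate_orders(target_set)
    scores = [log_likelihood(c) for c in candidates]
    if not sample:
        # Pick the candidate with the highest likelihood.
        return candidates[scores.index(max(scores))]
    # Otherwise sample a candidate in proportion to its likelihood.
    rng = rng or random.Random()
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift for numerical stability
    return rng.choices(candidates, weights=weights, k=1)[0]

best = select_order({3, 1, 2}, toy_log_likelihood)
sampled = select_order({3, 1, 2}, toy_log_likelihood,
                       sample=True, rng=random.Random(0))
print(best, sampled)
```

In a real training loop the selected order would become the target sequence for that training example, and the scorer would be re-evaluated as the network's parameters change. An inexact search, as in claim 16, would replace the exhaustive enumeration in `candidate_orders`.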

Assignees

Inventors

Classifications

  • Combinations of networks

  • G06N3/044 (Primary): Recurrent networks, e.g. Hopfield networks

  • G06N3/0442 (Primary): Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Supervised learning

  • Feedforward networks


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017200076A1 cover?
In one aspect, this specification describes a recurrent neural network system implemented by one or more computers that is configured to process input sets to generate neural network outputs for each input set. The input set can be a collection of multiple inputs for which the recurrent neural network should generate the same neural network output regardless of the order in which the inputs are…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/044. Mapped technology areas include Physics.
When was this patent published?
Published Jul 13, 2017 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).