Differentiable set to increase the memory capacity of recurrent neural networks

US11636308B2 · US · B2

Patent metadata
Publication number: US-11636308-B2
Application number: US-201615339303-A
Country: US
Kind code: B2
Filing date: Oct 31, 2016
Priority date: Oct 31, 2016
Publication date: Apr 25, 2023
Grant date: Apr 25, 2023

Abstract

According to embodiments, a recurrent neural network (RNN) is equipped with a set data structure whose operations are differentiable, which data structure can be used to store information for a long period of time. This differentiable set data structure can “remember” an event in the sequence of sequential data that may impact another event much later in the sequence, thereby allowing the RNN to classify the sequence based on many kinds of long dependencies. An RNN that is equipped with the differentiable set data structure can be properly trained with backpropagation and gradient descent optimizations. According to embodiments, a differentiable set data structure can be used to store and retrieve information with a simple set-like interface. According to further embodiments, the RNN can be extended to support several add operations, which can make the differentiable set data structure behave like a Bloom filter.
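The abstract describes a set whose add and query operations stay continuous, plus a variant that performs several adds per value so the structure behaves like a Bloom filter. The Python sketch below is a minimal illustration under stated assumptions, not the patented implementation: the class name `SoftSet`, the hash family in `positions()`, the clamped-sum add, and the product-based query are all hypothetical choices made for illustration.

```python
from typing import List


class SoftSet:
    """Set over a fixed universe of `size` slots with continuous operations.

    probs[i] is read as the probability that slot i is occupied. add() and
    query() use only addition, multiplication, and min(), so the same logic
    could be written in an autodiff framework and trained by gradient descent.
    """

    def __init__(self, size: int, num_hashes: int = 1) -> None:
        self.size = size
        self.num_hashes = num_hashes  # >1 gives the Bloom-filter-like variant
        self.probs: List[float] = [0.0] * size

    def positions(self, value: int) -> List[int]:
        # Hypothetical hash family chosen for illustration only.
        return [(value * (2 * k + 1) + k) % self.size for k in range(self.num_hashes)]

    def add(self, value: int, strength: float = 1.0) -> None:
        # Soft add: raise each hashed slot by `strength` and clamp at 1.0
        # so every entry remains a valid probability.
        for i in self.positions(value):
            self.probs[i] = min(1.0, self.probs[i] + strength)

    def query(self, value: int) -> float:
        # Soft membership: product over the hashed slots, so the answer is
        # close to 1 only when every slot is close to 1 (the Bloom-filter rule).
        result = 1.0
        for i in self.positions(value):
            result *= self.probs[i]
        return result
```

With the default `num_hashes=1`, two adds of value 3 with strength 0.6 take slot 3 from 0.6 to a clamped 1.0. With `SoftSet(8, num_hashes=2)`, adding 3 occupies slots 3 and 2, and querying 11 then also returns 1.0 because 11 hashes to the same two slots; that false positive is the usual Bloom-filter trade-off for a fixed-size array.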

Claims

Opening claim text (preview). The preview breaks off partway through claim 18.

What is claimed is:

1. A computer-executed method comprising: training, based on a set of sequential training data, a recurrent neural network that is equipped with a differentiable set data structure; wherein training the recurrent neural network comprises: performing one or both of: adding an element to the differentiable set data structure based, at least in part, on a hidden state of the recurrent neural network, and performing a query over the differentiable set data structure based, at least in part, on the hidden state of the recurrent neural network; and after performing one or both of adding the element and performing the query, generating a prediction, based on output of the query, without using the hidden state of the recurrent neural network; wherein training the recurrent neural network produces a trained recurrent neural network; generating, by the recurrent neural network, a new query that contains a vector that represents a value in an unlabeled sequence that is syntactically valid, wherein the vector contains a plurality of weights that are less than 0.5 and one weight that is greater than 0.5; and based on the new query that contains the vector that represents the value, the trained recurrent neural network, and the differentiable set data structure, detecting that the unlabeled sequence is semantically invalid; wherein the method is performed by one or more computing devices.

2. The method of claim 1, wherein: adding the element to the differentiable set data structure is performed via a continuous operation; and performing the query over the differentiable set data structure is performed via a continuous operation.

3. The method of claim 1, wherein: the differentiable set data structure represents a logical set of values; and the differentiable set data structure stores a plurality of probabilities that indicate whether corresponding values, that correspond to the plurality of probabilities, are included in the logical set of values.

4. The method of claim 1, wherein: the differentiable set data structure represents a logical set of values; training the recurrent neural network that is equipped with the differentiable set data structure comprises: generating a control command based on a sigmoid activation function and the hidden state of the recurrent neural network; wherein the control command indicates a probability that a particular value will be added to the logical set of values.

5. The method of claim 4, wherein training the recurrent neural network that is equipped with the differentiable set data structure further comprises: generating a new probability that the particular value is included in the logical set of values by adding the control command to a previous probability that the particular value is included in the logical set of values.

6. The method of claim 5, wherein generating the new probability comprises: determining whether a value for the new probability is greater than 1; and in response to determining that the value for the new probability is greater than one, setting the value for the new probability to 1; wherein the value for the new probability is included in the differentiable set data structure.

7. The method of claim 1, wherein training the recurrent neural network that is equipped with the differentiable set data structure comprises: generating, based on a sigmoid activation function and the hidden state of the recurrent neural network, a location vector that indicates a location of a particular value within the differentiable set data structure.

8. The method of claim 1, wherein: the set of sequential training data comprises one or more sequences of words; the method further comprises at least one selected from the group consisting of: a) identifying one or more properties of the sequence of unlabeled data based, at least in part, on the trained recurrent neural network and the differentiable set data structure, and b) performing both of: determining whether the particular word is identified in the differentiable set data structure; and classifying a portion of the sequence of unlabeled data based, at least in part, on determining that the particular word is identified in the differentiable set data structure.

9. The method of claim 1, wherein backpropagation is used to train the recurrent neural network that is equipped with the differentiable set data structure.

10. The method of claim 1, wherein the differentiable set data structure is implemented with a Bloom filter.

11. The method of claim 1, wherein the recurrent neural network is a Long Short-Term Memory Recurrent Neural Network.

12. The method of claim 1, wherein training the recurrent neural network comprises performing a mixture of said adding the element to the differentiable set data structure and said performing the query over the differentiable set data structure.

13. The method of claim 1, wherein: adding the element to the differentiable set data structure is further based, at least in part, on a value generator hash; the performing the query over the differentiable set data structure is based on multiple positions within the differentiable set data structure; and said output of the query represents a probability that a query element is represented by the differentiable set data structure.

14. The method of claim 1, wherein the differentiable set data structure is represented as an array.

15. One or more non-transitory computer-readable media storing instructions that, when executed by one or more processors, cause: training, based on a set of sequential training data, a recurrent neural network that is equipped with a differentiable set data structure; wherein training the recurrent neural network comprises: performing one or both of: adding an element to the differentiable set data structure based, at least in part, on a hidden state of the recurrent neural network, performing a query over the differentiable set data structure based, at least in part, on the hidden state of the recurrent neural network; and after performing one or both of adding the element and performing the query, generating a prediction, based on output of the query, without using the hidden state of the recurrent neural network; wherein training the recurrent neural network produces a trained recurrent neural network; generating, by the recurrent neural network, a new query that contains a vector that represents a value in an unlabeled sequence that is syntactically valid, wherein the vector contains a plurality of weights that are less than 0.5 and one weight that is greater than 0.5; and based on the new query that contains the vector that represents the value, the trained recurrent neural network, and the differentiable set data structure, detecting that the unlabeled sequence is semantically invalid.

16. The one or more non-transitory computer-readable media of claim 15, wherein: adding the element to the differentiable set data structure is performed via a continuous operation; and performing the query over the differentiable set data structure is performed via continuous operation.

17. The one or more non-transitory computer-readable media of claim 15, wherein: the differentiable set data structure represents a logical set of values; and the differentiable set data structure stores a plurality of probabilities that indicate whether corresponding values, that correspond to the plurality of probabilities, are included in the logical set of values.

18. The one or more non-tra
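Claims 4 through 7 describe how the RNN's hidden state drives the set: one sigmoid yields a control command (the probability that a value is added), another yields a location vector, and the update adds the control command to the previous probability and caps the result at 1. The Python sketch below shows one way these pieces could fit together under stated assumptions. Weighting the update by the location vector and reading the set back with a location-weighted average are assumptions, as are all function and parameter names; the claims do not specify them. A trainable model would express the same arithmetic in an autodiff framework so that backpropagation (claim 9) can reach the weights.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights: Vector, hidden: Vector, bias: float) -> float:
    # w . h + b for a single output unit.
    return sum(w * h for w, h in zip(weights, hidden)) + bias


def control_command(hidden: Vector, w_ctrl: Vector, b_ctrl: float) -> float:
    # Claim 4 style: a sigmoid of the hidden state gives the probability
    # that a value is added to the logical set at this step.
    return sigmoid(affine(w_ctrl, hidden, b_ctrl))


def location_vector(hidden: Vector, w_loc: List[Vector], b_loc: Vector) -> Vector:
    # Claim 7 style: one sigmoid weight per slot indicates where the value lives.
    return [sigmoid(affine(row, hidden, b)) for row, b in zip(w_loc, b_loc)]


def add_step(probs: Vector, ctrl: float, loc: Vector) -> Vector:
    # Claims 5-6 style: add the control command to the previous probability
    # (weighted by the location vector, an assumption), then clamp at 1.
    return [min(1.0, p + ctrl * l) for p, l in zip(probs, loc)]


def query(probs: Vector, loc: Vector) -> float:
    # Assumed read-out: location-weighted average of slot probabilities,
    # which stays in [0, 1] even though sigmoid weights need not sum to 1.
    total = sum(loc)
    return sum(l * p for l, p in zip(loc, probs)) / total if total > 0 else 0.0
```

For example, with hidden state `[2.0, 0.0]` and location weights `[[-3.0, 0.0], [3.0, 0.0], [-3.0, 0.0]]` (zero biases), the location vector is roughly `[0.002, 0.998, 0.002]`, the same shape as the query vector in claim 1 (one weight above 0.5, the rest below). Two add steps with the control command `sigmoid(2.0)` saturate slot 1 at 1.0, and querying with the same vector returns a value above 0.99.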

Assignees

Oracle Int Corp

Classifications

  • G06N3/0442 (Primary): characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Supervised learning

  • G06N3/044 (Primary): Recurrent networks, e.g. Hopfield networks

  • G06N3/084 (Primary): Backpropagation, e.g. using gradient descent

Frequently asked questions

Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/0442. Mapped technology areas include Physics.
When was this patent published?
Published Apr 25, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.