Quasi-recurrent neural network based encoder-decoder model

US2018129931A1 · US · A1

Patent metadata
Publication number: US-2018129931-A1
Application number: US-201715420801-A
Country: US
Kind code: A1
Filing date: Jan 31, 2017
Priority date: Nov 4, 2016
Publication date: May 10, 2018
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology disclosed provides a quasi-recurrent neural network (QRNN) encoder-decoder model that alternates convolutional layers, which apply in parallel across timesteps, and minimalist recurrent pooling layers that apply in parallel across feature dimensions.
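The two layer types named in the abstract can be illustrated with a minimal pure-Python sketch. It is not taken from the patent text: the function names (`causal_conv`, `f_pooling`, `sigmoid`), the tanh and sigmoid activations, and the pooling rule `c_t = f_t * c_{t-1} + (1 - f_t) * z_t` are assumptions, chosen to match the commonly described "forget gate" form of QRNN pooling. The convolution computes each timestep independently of the others, so it could run in parallel across timesteps; the pooling step is sequential in time but treats every feature dimension independently.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def causal_conv(xs, weights, width):
    """Masked 1-D convolution over a time series (hypothetical helper).

    xs:      list of T input vectors, each of length d_in
    weights: list of `width` matrices, each d_out x d_in
    The output at step t depends only on inputs t-width+1 .. t,
    so no timestep needs the result of any other timestep.
    """
    zero = [0.0] * len(xs[0])
    out = []
    for t in range(len(xs)):
        acc = [0.0] * len(weights[0])
        for k in range(width):
            src = t - (width - 1) + k
            x = xs[src] if src >= 0 else zero  # zero-pad the past
            for i, row in enumerate(weights[k]):
                acc[i] += sum(w * v for w, v in zip(row, x))
        out.append(acc)
    return out

def f_pooling(z_pre, f_pre):
    """Assumed pooling rule: c_t = f_t * c_{t-1} + (1 - f_t) * z_t.

    z_pre, f_pre: pre-activation convolution outputs, one vector per step.
    The loop over time is sequential; the element-wise update has no
    interaction between feature positions.
    """
    c = [0.0] * len(z_pre[0])
    states = []
    for z_t, f_t in zip(z_pre, f_pre):
        z = [math.tanh(v) for v in z_t]
        f = [sigmoid(v) for v in f_t]
        c = [fi * ci + (1.0 - fi) * zi for fi, ci, zi in zip(f, c, z)]
        states.append(c)
    return states

# Usage: one input feature, filter width 2, all weights set to 1.0.
xs = [[1.0], [2.0], [3.0]]
w = [[[1.0]], [[1.0]]]
conv_out = causal_conv(xs, w, 2)
states = f_pooling(conv_out, conv_out)
```

In this sketch the same convolution output is reused for both the activation and the gate purely to keep the usage short; a real layer would use separate filter banks for each.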

First claim

Opening claim text (preview).

What is claimed is:

1. A quasi-recurrent neural network (QRNN) system that increases computational efficiency in neural network sequence-to-sequence modeling, the system comprising:
  a QRNN encoder that comprises one or more encoder convolutional layers and one or more one encoder pooling layers, at least one encoder convolutional layer receives a time series of encoder input vectors and concurrently outputs encoded convolutional vectors for time series windows, and at least one encoder pooling layer receives the encoded convolutional vectors for the time series windows, concurrently accumulates an ordered set of feature sums in an encoded state vector for a current time series window, and sequentially outputs an encoded state vector for each successive time series window among the time series windows;
  a QRNN decoder that comprises one or more decoder convolutional layers and one or more one decoder pooling layers, at least one decoder convolutional layer receives a time series of decoder input vectors and concurrently outputs decoded convolutional vectors for time series windows, and at least one decoder pooling layer receives the decoded convolutional vectors for the time series windows respectively concatenated with an encoded state vector outputted by an encoder pooling layer for a final time series window, concurrently accumulates an ordered set of feature sums in a decoded state vector for a current time series window, and sequentially outputs a decoded state vector for each successive time series window among the time series windows;
  a state comparator that calculates linguistic similarity between the encoded state vectors and the decoded state vectors to produce an affinity matrix with encoding-wise and decoding-wise axes;
  an exponential normalizer that normalizes the affinity matrix encoding-wise to produce respective encoding-to-decoding attention weights;
  an encoding mixer that respectively combines the encoded state vectors with the encoding-to-decoding attention weights to generate respective contextual summaries of the encoded state vectors; and
  an attention encoder that respectively combines the decoded state vectors with the respective contextual summaries of the encoded state vectors to produce an attention encoding for each of the time series windows.

2. The system of claim 1, wherein the attention encoder is a multilayer perceptron that projects a concatenation of the decoded state vectors and respective contextual summaries of the encoded state vectors into non-linear projections to produce an attention encoding for each of the time series windows.

3. The system of claim 1, wherein the encoded state vectors are respectively multiplied by output gate vectors of the encoded convolutional vectors to produce respective encoded hidden state vectors, wherein the state comparator calculates linguistic similarity between the encoded hidden state vectors and the decoded state vectors to produce an affinity matrix with encoding-wise and decoding-wise axes, wherein the encoding mixer respectively combines the encoded hidden state vectors with the encoding-to-decoding attention weights to generate respective contextual summaries of the encoded hidden state vectors, and wherein the attention encoder respectively combines the decoded state vectors with the respective contextual summaries of the encoded hidden state vectors, and further multiplies the combinations with respective output gate vectors of the decoded convolutional vectors to produce an attention encoding for each of the time series windows.

4. The system of claim 3, wherein the attention encoder is a multilayer perceptron that projects a concatenation of the decoded state vectors and respective contextual summaries of the encoded hidden state vectors into non-linear projections, and further multiplies the non-linear projections with respective output gate vectors of the decoded convolutional vectors to produce an attention encoding for each of the time series windows.

5. The system of claim 1, wherein each of the convolution vectors comprising feature values in an activation vector and in one or more gate vectors, and the feature values in the gate vectors are parameters that, respectively, apply element-wise by ordinal position to the feature values in the activation vector.

6. The system of claim 5, wherein each pooling layer operates in parallel over feature values of a convolutional vector to concurrently accumulate ordinal position-wise, in a state vector for a current time series window, an ordered set of feature sums in dependence upon a feature value at a given ordinal position in an activation vector outputted for the current time series window, one or more feature values at the given ordinal position in one or more gate vectors outputted for the current time series window, and a feature sum at the given ordinal position in a state vector accumulated for a prior time series window.

7. The system of claim 5, wherein the gate vector is a forget gate vector, and wherein each pooling layer uses a forget gate vector for a current time series window to control accumulation of information from a state vector accumulated for a prior time series window and information from an activation vector for the current time series window.

8. The system of claim 5, wherein the gate vector is an input gate vector, and wherein each pooling layer uses an input gate vector for a current time series window to control accumulation of information from an activation vector for the current time series window.

9. The system of claim 5, wherein the gate vector is an output gate vector, and wherein each pooling layer uses an output gate vector for a current time series window to control accumulation of information from a state vector for the current time series window.

10. A method of increasing computational efficiency in neural network sequence-to-sequence modeling, the method including:
  receiving a time series of encoder input vectors at an encoder convolutional layer of a QRNN encoder and concurrently outputting encoded convolutional vectors for time series windows;
  receiving the encoded convolutional vectors for the time series windows at an encoder pooling layer of the QRNN encoder, concurrently accumulating an ordered set of feature sums in an encoded state vector for a current time series window, and sequentially outputting an encoded state vector for each successive time series window among the time series windows;
  receiving a time series of decoder input vectors at a decoder convolutional layer of a QRNN decoder and concurrently outputting decoded convolutional vectors for time series windows;
  receiving the decoded convolutional vectors for the time series windows at a decoder pooling layer of the QRNN decoder respectively concatenated with an encoded state vector outputted by an encoder pooling layer for a final time series window, concurrently accumulating an ordered set of feature sums in an decoded state vector for a current time series window, and sequentially outputting an decoded state vector for each successive time series window among the time series windows;
  calculating linguistic similarity between the encoded state vectors and the decoded state vectors to produce an affinity matrix with encoding-wise and decoding-wise axes;
  exponentially normalizing the affinity matrix encoding-wise to produce respective encoding-to-decoding attention weights;
  combining the encoded state vectors with the encoding-to-decoding attention weights to generate respective contextual summaries of the encoded state vectors; and
  combining the decoded state vectors with the respective contextual summaries of th
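
The attention elements recited in claim 1 (state comparator, exponential normalizer, encoding mixer, attention encoder) can be illustrated with a small pure-Python sketch. Several choices here are assumptions rather than claim language: the dot product stands in for the "linguistic similarity" measure, softmax stands in for the exponential normalizer, and plain concatenation stands in for the final combining step (claim 2 instead recites a multilayer perceptron over the concatenation). All function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_summaries(encoded, decoded):
    """Return (weights, summaries) for every decoder step.

    encoded: list of S encoder state vectors
    decoded: list of T decoder state vectors (same dimensionality)
    weights[t][s] is the attention of decoder step t on encoder step s.
    """
    dim = len(encoded[0])
    weights, summaries = [], []
    for d in decoded:
        # One decoding-wise row of the affinity matrix (assumed dot product).
        affinity_row = [dot(d, e) for e in encoded]
        # Normalize along the encoding-wise axis.
        alpha = softmax(affinity_row)
        # Contextual summary: attention-weighted sum of encoder states.
        k = [sum(a * e[i] for a, e in zip(alpha, encoded)) for i in range(dim)]
        weights.append(alpha)
        summaries.append(k)
    return weights, summaries

def attention_encoding(decoded, summaries):
    # Assumed simplest combination: concatenate state and summary.
    return [d + k for d, k in zip(decoded, summaries)]

# Usage: two encoder states, one decoder state with equal affinity to both.
encoded = [[1.0, 0.0], [0.0, 1.0]]
decoded = [[0.0, 0.0]]
weights, summaries = attention_summaries(encoded, decoded)
combined = attention_encoding(decoded, summaries)
```

Because the decoder state in the usage example has zero affinity with both encoder states, the attention weights are uniform and the contextual summary is the average of the encoder states.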

Assignees

Salesforce Com Inc

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

  • G06F40/216 · Primary

    using statistical methods · CPC title

  • G06N3/04 · Primary

    Architecture, e.g. interconnection topology · CPC title

  • Semantic analysis · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018129931A1 cover?
The technology disclosed provides a quasi-recurrent neural network (QRNN) encoder-decoder model that alternates convolutional layers, which apply in parallel across timesteps, and minimalist recurrent pooling layers that apply in parallel across feature dimensions.
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/216. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 10, 2018 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).