US11151985B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11151985-B2 |
| Application number | US-201916713298-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 13, 2019 |
| Priority date | Feb 26, 2016 |
| Publication date | Oct 19, 2021 |
| Grant date | Oct 19, 2021 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps, processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence, processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.
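The last step of the abstract, turning per-position substring scores into a transcription, can be illustrated with a minimal sketch. It assumes the simplest selection rule (take the highest-scoring substring at each position, as in the dependent claims); the function name `greedy_transcription` and the score values are hypothetical, and the scores would in practice come from the attention-based RNN rather than being written by hand.

```python
def greedy_transcription(score_sets):
    """Pick the highest-scoring substring at each output position.

    score_sets: one dict per output position, mapping substring -> score.
    """
    chosen = []
    for scores in score_sets:
        # Select the substring whose score is largest at this position.
        best = max(scores, key=scores.get)
        chosen.append(best)
    return "".join(chosen)


# Hypothetical score sets for three output positions (illustrative values only).
score_sets = [
    {"h": 0.7, "j": 0.2, "e": 0.1},
    {"e": 0.6, "a": 0.3, "h": 0.1},
    {"y": 0.8, "i": 0.2},
]
print(greedy_transcription(score_sets))  # prints "hey"
```

Claim 10 instead names left to right beam search decoding, which keeps several candidate sequences per position rather than a single best one; the sketch above does not implement that.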
Opening claim text (preview).
What is claimed is:
1. A method comprising: obtaining, at a pyramid Bidirectional Long Short Term Memory (BLSTM) Recurrent Neural Network (RNN) executing on data processing hardware, an input sequence representing an utterance, the BLSTM RNN comprising: a bottom BLSTM layer configured to receive, as input, the input sequence representing the utterance; a first pyramid BLSTM layer configured to receive, as input, an output of the bottom BLSTM layer; and a second pyramid BLSTM layer configured to receive, as input, an output of the first pyramid BLSTM layer; at each of a first number of time steps, processing, using the bottom BLSTM layer, a respective feature representation of the input sequence to generate a respective bottom BLSTM layer output; processing, using the first pyramid BLSTM layer, the respective bottom BLSTM layer outputs generated for each of the first number of time steps to generate a sequence of first pyramid BLSTM layer outputs; at each of a second number of time steps: receiving, at the second pyramid BLSTM layer, a respective concatenation of consecutive first pyramid BLSTM layer outputs of the sequence of first pyramid BLSTM layer outputs generated using the first pyramid BLSTM layer; and processing, using the second pyramid BLSTM layer, the respective concatenation of consecutive first pyramid BLSTM layer outputs to generate a respective alternative feature representation for the corresponding time step of the second number of time steps; receiving, at an attention-based neural network executing on the data processing hardware, an alternative representation for the input sequence; and for each position in an output sequence, generating, using the attention-based neural network, a probability distribution over possible outputs by processing the alternative representation for the input sequence.
2. The method of claim 1, wherein the alternative representation for the input sequence comprises the respective alternative feature representation for each of the second number of time steps.
3. The method of claim 2, wherein the second number is smaller than the first number.
4. The method of claim 1, wherein processing the alternative representation for the input sequence using an attention-based neural network comprises, for an initial position in the output sequence order: processing a placeholder start of sequence token and a placeholder initial attention context vector using the attention-based neural network to update a hidden state of the attention-based neural network from an initial hidden state to a hidden state for the initial position in the output sequence order; generating an attention context vector for the initial position from the alternative representation and the hidden state for the initial position in the output sequence order; and generating the set of substring scores for the initial position using the attention context vector for the initial position and the hidden state for the initial position.
5. The method of claim 4, further comprising selecting, by the data processing hardware, the highest scoring possible output from the probability distribution of possible outputs at the initial position in the output sequence order.
6. The method of claim 1, wherein processing the alternative representation for the input sequence using the attention-based neural network comprises, for each position after an initial position in the output sequence order: processing a substring at the preceding position in the output sequence order and the attention context vector for the preceding position in the order using the attention-based network to update the hidden state of the attention-based neural network from the hidden state for the preceding position to a hidden state for the position; generating an attention context vector for the position from the alternative representation and the neural network hidden state for the position in the output sequence order; and generating the set of substring scores for the position using the attention context vector for the position and the neural network hidden state for the position.
7. The method of claim 6, further comprising selecting, by the data processing hardware, the highest scoring substring from the set of substring scores for the position as the substring at the position in the output sequence of substrings.
8. The method of claim 6, wherein generating an attention context vector for the position from the alternative representation and the neural network hidden state for the position in the output sequence order comprises: computing a scalar energy for the position using the alternative representation and the hidden state of the attention-based neural network for the position; converting the computed scalar energy into a probability distribution using a softmax function; and using the probability distribution to create a context vector by combining the alternative representation at different positions.
9. The method of claim 1, wherein the BLSTM RNN and the attention-based recurrent neural network are trained jointly.
10. The method of claim 1, wherein processing the alternative representation for the input sequence using the attention-based neural network comprises processing the alternative representation using the attention-based neural network using a left to right beam search decoding.
11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: obtaining, at a pyramid Bidirectional Long Short Term Memory (BLSTM) Recurrent Neural Network (RNN) executing on the data processing hardware, an input sequence representing an utterance, the BLSTM RNN comprising: a bottom BLSTM layer configured to receive, as input, the input sequence representing the utterance; a first pyramid BLSTM layer configured to receive, as input, an output of the bottom BLSTM layer; and a second pyramid BLSTM layer configured to receive, as input, an output of the first pyramid BLSTM layer; at each of a first number of time steps, processing, using the bottom BLSTM layer, a respective feature representation of the input sequence to generate a respective bottom BLSTM layer output; processing, using the first pyramid BLSTM layer, the respective bottom BLSTM layer outputs generated for each of the first number of time steps to generate a sequence of first pyramid BLSTM layer outputs; at each of a second number of time steps: receiving, at the second pyramid BLSTM layer, a respective concatenation of consecutive first pyramid BLSTM layer outputs of the sequence of first pyramid BLSTM layer outputs generated using the first pyramid BLSTM layer; and processing, using the second pyramid BLSTM layer, the respective concatenation of consecutive first pyramid BLSTM layer outputs to generate a respective alternative feature representation for the corresponding time step of the second number of time steps; receiving, at an attention-based neural network executing on the data processing hardware, an alternative representation for the input sequence; and for each position in an output sequence, generating, using the attention-based neural network, a probability distribution over possible outputs by processing the alternative representation for the input sequence.
12. The system of claim 11, wherein the alternative representation for the input sequence comprises the respective alternative feature representation for each of the second number of time steps.
13. The system of
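Two mechanical steps in the claims lend themselves to a small worked sketch: the concatenation of consecutive layer outputs that feeds a pyramid layer (claim 1), and the energy, softmax, and weighted-combination steps that produce an attention context vector (claim 8). The sketch below is illustrative only and rests on several assumptions not stated in the claims: pairs of outputs are concatenated (`factor=2`), a ragged tail is dropped, the scalar energy is a plain dot product, and the hidden state has the same width as the feature vectors. The BLSTM layers themselves are omitted, and the names `pyramid_concat` and `attention_context` are hypothetical.

```python
import math


def pyramid_concat(outputs, factor=2):
    """Concatenate each run of `factor` consecutive vectors into one vector.

    Assumption: a trailing run shorter than `factor` is dropped.
    """
    usable = len(outputs) - len(outputs) % factor
    return [
        [x for vec in outputs[i:i + factor] for x in vec]
        for i in range(0, usable, factor)
    ]


def attention_context(features, hidden):
    """Return (weights, context) for one output position.

    Assumption: the scalar energy is the dot product of each feature
    vector with the hidden state; the claims do not fix the energy function.
    """
    # 1. One scalar energy per time step of the alternative representation.
    energies = [sum(f * h for f, h in zip(feat, hidden)) for feat in features]
    # 2. Softmax over the energies (shifted by the max for numerical stability).
    peak = max(energies)
    exps = [math.exp(e - peak) for e in energies]
    total = sum(exps)
    weights = [e / total for e in exps]
    # 3. Context vector: weighted combination of the features across time steps.
    dim = len(features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


# Four time steps of width 2 become two time steps of width 4.
layer_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
features = pyramid_concat(layer_outputs)
hidden = [1.0, 0.0, 0.0, 0.0]  # hypothetical decoder hidden state
weights, context = attention_context(features, hidden)
print(len(layer_outputs), "->", len(features))
print(weights, context)
```

Halving the number of time steps in this way is consistent with claim 3, where the second number of time steps is smaller than the first.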
Recurrent networks, e.g. Hopfield networks · CPC title
Combinations of networks · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Supervised learning · CPC title