What technology area does this patent fall under?

Primary CPC classification G10L15/16. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Oct 24 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Speech recognition with attention-based recurrent neural networks

US9799327B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9799327-B1
Application number	US-201615055476-A
Country	US
Kind code	B1
Filing date	Feb 26, 2016
Priority date	Feb 26, 2016
Publication date	Oct 24, 2017
Grant date	Oct 24, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence; processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer implemented method comprising: obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence, wherein the alternative representation for the input acoustic sequence comprises a respective alternative acoustic feature representation for each of a second number of time steps, and wherein the second number is smaller than the first number; processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance. 2. The method of claim 1 , wherein a substring comprises one or more characters. 3. The method of claim 2 , wherein the set of substrings comprises a set of alphabetic letters which is used to write one or more natural languages. 4. The method of claim 3 , wherein the substrings in the set of substrings further comprise a space character, a comma character, a period character, an apostrophe character, and an unknown character. 5. The method of claim 1 , wherein the generated sequence of substrings begins with a start of sequence token <sos> and ends with an end of sequence token <eos>. 6. The method of claim 1 , wherein the first neural network is a pyramid Bidirectional Long Short Term Memory (BLSTM) RNN. 7. The method of claim 6 , wherein processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence comprises: processing the input acoustic sequence through a bottom BLSTM layer to generate a BLSTM layer output; and processing the BLSTM layer output through each a plurality of pyramid BLSTM layers, wherein consecutive outputs of each pyramid BLSTM layer are concatenated before being provided to the next pyramid BLSTM layer. 8. The method of claim 1 , wherein processing the alternative representation for the input acoustic sequence using an attention-based RNN comprises, for an initial position in the output sequence order: processing a placeholder start of sequence token and a placeholder initial attention context vector using the attention-based RNN to update a hidden state of the attention-based RNN from an initial hidden state to a hidden state for the initial position in the output sequence order; generating an attention context vector for the initial position from the alternative representation and the RNN hidden state for the initial position in the output sequence order; and generating the set of substring scores for the initial position using the attention context vector for the initial position and the RNN hidden state for the initial position. 9. The method of claim 8 , further comprising selecting a highest scoring substring from the set of substring scores as the substring at the initial position in the output sequence of substrings. 10. The method of claim 1 , wherein processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) comprises, for each position after an initial position in the output sequence order: processing a substring at the preceding position in the output sequence order and an attention context vector for the preceding position in the order using the attention-based RNN to update the hidden state of the attention-based RNN from the hidden state for the preceding position to a hidden state for the position; generating an attention context vector for the position from the alternative representation and the RNN hidden state for the position in the output sequence order; and generating the set of substring scores for the position using the attention context vector for the position and the RNN hidden state for the position. 11. The method of claim 10 , further comprising selecting a highest scoring substring from the set of substring scores for the position as the substring at the position in the output sequence of substrings. 12. The method of claim 10 , wherein generating an attention context vector for the position from the alternative representation and the RNN hidden state for the position in the output sequence order comprises: computing a scalar energy for the position using the alternative representation and the hidden state of the attention-based RNN for the position; converting the computed scalar energy into a probability distribution using a softmax function; and using the probability distribution to create a context vector by combining the alternative representation at different positions. 13. The method of claim 10 , wherein generating the set of substring scores for the position using the attention context vector for the position and the RNN hidden state for the position comprises: providing the hidden state of the attention-based RNN for the position and generated attention context vector for the position as input to a multi-layer perceptron (MLP) with a softmax output layer; and processing the hidden state of the attention-based RNN for the position and generated attention context vector for the position using the MLP to generate a set of substring scores for each substring in the set of substrings for the position. 14. The method of claim 1 , wherein the first neural network and the attention-based recurrent neural network are trained jointly. 15. The method of claim 1 , wherein processing the alternative representation for the input sequence using an attention-based Recurrent Neural Network (RNN) comprises processing the alternative representation using an attention-based RNN using a left to right beam search decoding. 16. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence, wherein the alternative representation for the input acoustic sequence comprises a respective alternative acoustic feature representation for each of a second number of time steps, and wherein the second number is smaller than the first number; processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance. 17. The system of claim 16 , wherein a substring comprises one or more characters. 18. The system of claim 17 , wherein the set of substrings comprises a set of alphabetic letters whic

Assignees

Google Inc

Inventors

Classifications

G06N3/045
Combinations of networks · CPC title
G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G10L25/30
using neural networks · CPC title
G06F40/197
Version control (for software G06F8/71) · CPC title
G06F40/12
Use of codes for handling textual entities · CPC title

Patent family

Related publications grouped by family.

View patent family 60082253

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9799327B1 cover?: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequen…
Who is the assignee on this patent?: Google Inc
What technology area does this patent fall under?: Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Oct 24 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).