What technology area does this patent fall under?

Primary CPC classification G10L15/16. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 8, 2019 (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Lattice encoding using recurrent neural networks

US10176802B1 · US · B1

Patent metadata
Field	Value
Publication number	US-10176802-B1
Application number	US-201615091722-A
Country	US
Kind code	B1
Filing date	Apr 6, 2016
Priority date	Mar 21, 2016
Publication date	Jan 8, 2019
Grant date	Jan 8, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An automatic speech recognition (ASR) system may convert an ASR output lattice into a matrix form, thus maintaining certain information included in the lattice that might otherwise be lost in an N-best list output. The matrix representation of the lattice may be encoded using a recurrent neural network (RNN) to create a vector representation of the lattice. The vector representation may then be used by the system to perform additional operations, such as ASR results confirmation.
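As a rough illustration of the lattice-to-matrix idea in the abstract, the sketch below stores each lattice arc as a cell in a node-by-node matrix. It is not the patent's actual encoding: the `Arc` tuple, the function names, and the example words are hypothetical, and the sketch assumes an acyclic lattice with at most one arc between any pair of nodes.

```python
from typing import List, Optional, Tuple

# Hypothetical arc format: (source node, destination node, word label).
Arc = Tuple[int, int, str]
Matrix = List[List[Optional[str]]]


def lattice_to_matrix(arcs: List[Arc], num_nodes: int) -> Matrix:
    """Place each arc's word at matrix[source][destination]."""
    matrix: Matrix = [[None] * num_nodes for _ in range(num_nodes)]
    for src, dst, word in arcs:
        matrix[src][dst] = word
    return matrix


def paths(matrix: Matrix, start: int, end: int) -> List[List[str]]:
    """Enumerate the word sequences of all paths from start to end.

    Assumes the lattice is acyclic, so the recursion terminates.
    """
    if start == end:
        return [[]]
    found: List[List[str]] = []
    for dst, word in enumerate(matrix[start]):
        if word is not None:
            for rest in paths(matrix, dst, end):
                found.append([word] + rest)
    return found


# Two competing paths between node 0 and node 2, mirroring the shape
# described in claim 1 (the words themselves are made up).
arcs = [(0, 1, "play"), (1, 2, "songs"), (0, 2, "placements")]
matrix = lattice_to_matrix(arcs, num_nodes=3)
all_paths = paths(matrix, start=0, end=2)
```

Unlike a flat N-best list, the matrix keeps the fact that both paths share the same start and end nodes, which is the kind of structural information the abstract says would otherwise be lost.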

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving audio data corresponding to an utterance; performing automatic speech recognition (ASR) processing on the audio data to obtain an ASR lattice portion; determining a matrix representing the ASR lattice portion, wherein the matrix comprises at least: a first representation of a first path of the ASR lattice portion, the first representation comprising: first data indicating that a first node is connected to a second node, the first data being associated with a first word, and second data indicating that the second node is connected to a third node, the second data being associated with a second word; and a second representation of a second path of the ASR lattice portion, the second representation comprising: third data indicating that the first node is connected to the third node, the third data being associated with a third word; encoding the matrix using a recurrent neural network (RNN) encoder to obtain an encoded vector corresponding to the ASR lattice portion, the encoding comprising: processing the first data using the RNN encoder to create a first encoded vector, wherein the first encoded vector represents a portion of the first path up to the second node, processing the second data and the first encoded vector using the RNN encoder to create a second encoded vector, wherein the second encoded vector represents the first path, processing the third data using the RNN encoder to create a third encoded vector, wherein the third encoded vector represents the second path, and combining the second encoded vector and the third encoded vector to create a fourth encoded vector, wherein the fourth encoded vector represents the ASR lattice portion; and determine at least one ASR hypothesis based at least in part on the fourth encoded vector.

2. The computer-implemented method of claim 1, wherein: processing the first data comprises inputting the first data to a first internal input of the RNN encoder and outputting the first encoded vector at a first internal output of the RNN encoder; processing the second data and the first encoded vector comprises: inputting the second data and the first encoded vector to a second internal input of the RNN encoder, and outputting the second encoded vector at a second internal output of the RNN encoder; processing the third data comprises inputting the third data to a third internal input of the RNN encoder and outputting the third encoded vector at a third internal output of the RNN encoder.

3. The computer-implemented method of claim 2, wherein combining the second encoded vector and the third encoded vector comprises: inputting the second encoded vector and the third encoded vector to a fourth internal input of the RNN encoder; and combining the second encoded vector and the third encoded vector to output the fourth encoded vector at an external output of the RNN encoder.

4. The computer-implemented method of claim 1, wherein: the matrix comprises a third representation of a third path, the third representation comprising fourth data indicating that the first node is connected to a fourth node, the fourth data being associated with a fourth word; encoding the matrix further comprises processing the fourth data using the RNN encoder to create a fifth encoded vector; and the method further comprises combining the fourth encoded vector and the fifth encoded vector to create a sixth encoded vector using a combination function that is selected based at least in part on assigned weights, wherein the sixth encoded vector comprises an encoded vector corresponding to an entirety of lattice portion.

5. A computing device comprising: at least one processor; and at least one memory including instructions operable to be executed by the at least one processor to configure the device to: receive first audio data corresponding to a first utterance; process at least a portion of the first audio data to determine a matrix representing at least a portion of an automatic speech recognition (ASR) lattice, the matrix comprising at least: first data representing a first arc of the ASR lattice, the first arc originating at a first node of the ASR lattice and terminating at a second node of the ASR lattice, and second data representing a second arc of the ASR lattice, the second arc originating at a third node of the ASR lattice and terminating at the second node of the ASR lattice; process the first data using a recurrent neural network (RNN) encoder to create a first encoded vector; process the second data using the RNN encoder to create a second encoded vector; combine the first encoded vector and the second encoded vector to create a third encoded vector; and determine at least one ASR hypothesis based at least in part on the third encoded vector.

6. The computing device of claim 5, wherein the at least one memory includes additional instructions operable to be executed by the at least one processor to further configure the computing device to combine the first encoded vector and the second encoded vector at least by: adding the first encoded vector and the second encoded vector to obtain a sum; and dividing the sum by two to obtain the third encoded vector.

7. The computing device of claim 5, wherein the at least one memory includes additional instructions operable to be executed by the at least one processor to further configure the computing device to: process the first data using the RNN encoder at least by inputting the first data to a first internal input of the RNN encoder and outputting the first encoded vector at a first internal output of the RNN encoder; and process the second data using the RNN encoder at least by inputting the second data to a second internal input of the RNN encoder and outputting the second encoded vector at a second internal output of the RNN encoder.

8. The computing device of claim 7, wherein the at least one memory includes additional instructions operable to be executed by the at least one processor to further configure the computing device to combine the first encoded vector and the second encoded vector at least by: inputting the first encoded vector and the second encoded vector to a third internal input of the RNN encoder; and combining the first encoded vector and the second encoded vector to create the third encoded vector at an external output of the RNN encoder.

9. The computing device of claim 5, wherein the matrix further comprises third data representing a third arc originating at the second node and terminating at a fourth node, wherein the fourth node is a first final node of the ASR lattice, and wherein the at least one memory includes additional instructions operable to be executed by the at least one processor to further configure the computing device to: process the third data and the third encoded vector using the RNN encoder to create a fourth encoded vector.

10. The computing device of claim 9, wherein the matrix further comprises fourth data representing a fourth arc originating at the second node and terminating at a fifth node, wherein the fifth node is a second final node of the ASR lattice, and wherein the at least one memory includes additional instructions operable to be executed by the at least one processor to further configure the computing device to: process the fourth data and the third encoded vector using the RNN encoder to create a fifth encoded vector; and combine the fourth encoded vector and the fifth encoded vector to create a sixth encoded vector, wherein the sixth encoded vector comprises an encoded vector corresponding to an entirety lattice.

11. The comput
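The encoding steps recited in the claims can be loosely sketched as follows: each arc is passed through a recurrent step together with the encoded state of its source node, and vectors arriving at the same node are combined by averaging (claim 6 describes adding two vectors and dividing by two). Everything concrete here is an assumption for illustration: `embed` is a toy deterministic embedding, `rnn_step` uses fixed toy weights rather than a trained RNN, the mean generalizes the two-vector average to any number of incoming arcs, and node IDs are assumed to be topologically ordered with every node reachable from the start node.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Arc = Tuple[int, int, str]  # hypothetical format: (source, destination, word)

DIM = 3  # toy vector size, chosen only for illustration


def embed(word: str) -> Vector:
    """Toy deterministic word embedding (stand-in for learned embeddings)."""
    total = sum(map(ord, word))
    return [((total * (i + 1)) % 7) / 7.0 for i in range(DIM)]


def rnn_step(x: Vector, h: Vector) -> Vector:
    """Elman-style step with fixed toy weights: tanh(0.5 * x + 0.5 * h)."""
    return [math.tanh(0.5 * xi + 0.5 * hi) for xi, hi in zip(x, h)]


def average(vectors: List[Vector]) -> Vector:
    """Element-wise mean; for two vectors this is (a + b) / 2."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def encode_lattice(arcs: List[Arc], start: int, final: int) -> Vector:
    """Encode a lattice into one vector.

    Assumes every arc goes from a lower node ID to a higher one, so sorting
    the arcs by source visits each node after all of its incoming arcs.
    """
    state: Dict[int, Vector] = {start: [0.0] * DIM}
    incoming: Dict[int, List[Vector]] = {}
    for src, dst, word in sorted(arcs):
        if src not in state:
            # Combine every encoded vector that arrived at this node.
            state[src] = average(incoming[src])
        incoming.setdefault(dst, []).append(rnn_step(embed(word), state[src]))
    return average(incoming[final])


# Same shape as claim 1: path 0 -> 1 -> 2 competes with the direct arc 0 -> 2.
arcs = [(0, 1, "play"), (1, 2, "songs"), (0, 2, "placements")]
encoded = encode_lattice(arcs, start=0, final=2)
```

In this sketch the returned vector plays the role of the "fourth encoded vector" of claim 1: the average of the vector for the two-arc path and the vector for the single-arc path. A real system would use a trained encoder and could use a weighted combination instead, as claim 4 suggests.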

Assignees

Amazon Tech Inc

Inventors

Classifications

G06N3/045
Combinations of networks · CPC title
G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title
G06N5/01
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G10L19/038
Vector quantisation, e.g. TwinVQ audio · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 64815587

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10176802B1 cover?: An automatic speech recognition (ASR) system may convert an ASR output lattice into a matrix form, thus maintaining certain information included in the lattice that might otherwise be lost in an N-best list output. The matrix representation of the lattice may be encoded using a recurrent neural network (RNN) to create a vector representation of the lattice. The vector representation may then be…
Who is the assignee on this patent?: Amazon Tech Inc