Lattice encoding using recurrent neural networks

US10176802B1 · US · B1

Patent metadata
Field               Value
Publication number  US-10176802-B1
Application number  US-201615091722-A
Country             US
Kind code           B1
Filing date         Apr 6, 2016
Priority date       Mar 21, 2016
Publication date    Jan 8, 2019
Grant date          Jan 8, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An automatic speech recognition (ASR) system may convert an ASR output lattice into a matrix form, thus maintaining certain information included in the lattice that might otherwise be lost in an N-best list output. The matrix representation of the lattice may be encoded using a recurrent neural network (RNN) to create a vector representation of the lattice. The vector representation may then be used by the system to perform additional operations, such as ASR results confirmation.
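
The abstract does not spell out the layout of the matrix form. As a minimal sketch, assuming one row per lattice arc that records the source node, the destination node, and the word on that arc, the conversion could look like the following. The `Arc` type, the `lattice_to_matrix` helper, and the example words are hypothetical and not taken from the patent.

```python
from typing import List, Tuple

# Hypothetical arc type: (source node, destination node, word on the arc).
Arc = Tuple[int, int, str]


def lattice_to_matrix(arcs: List[Arc]) -> Tuple[List[List[int]], List[str]]:
    """Return one row [src, dst, word_id] per arc, plus the vocabulary used.

    Assumption: this row layout is illustrative only; the patent text shown
    on this page does not fix a particular matrix format.
    """
    vocab = sorted({word for _, _, word in arcs})
    word_id = {word: i for i, word in enumerate(vocab)}
    rows = [[src, dst, word_id[word]] for src, dst, word in sorted(arcs)]
    return rows, vocab


# Two competing paths between node 1 and node 3:
#   1 -> 2 -> 3 ("recognize", "speech") and 1 -> 3 ("wreckage").
arcs = [(1, 2, "recognize"), (2, 3, "speech"), (1, 3, "wreckage")]
matrix, vocab = lattice_to_matrix(arcs)
for row in matrix:
    print(row)
```

Unlike an N-best list, which would list the two word sequences separately, each row keeps the node connectivity, so shared nodes between paths remain visible.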

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving audio data corresponding to an utterance; performing automatic speech recognition (ASR) processing on the audio data to obtain an ASR lattice portion; determining a matrix representing the ASR lattice portion, wherein the matrix comprises at least: a first representation of a first path of the ASR lattice portion, the first representation comprising: first data indicating that a first node is connected to a second node, the first data being associated with a first word, and second data indicating that the second node is connected to a third node, the second data being associated with a second word; and a second representation of a second path of the ASR lattice portion, the second representation comprising: third data indicating that the first node is connected to the third node, the third data being associated with a third word; encoding the matrix using a recurrent neural network (RNN) encoder to obtain an encoded vector corresponding to the ASR lattice portion, the encoding comprising: processing the first data using the RNN encoder to create a first encoded vector, wherein the first encoded vector represents a portion of the first path up to the second node, processing the second data and the first encoded vector using the RNN encoder to create a second encoded vector, wherein the second encoded vector represents the first path, processing the third data using the RNN encoder to create a third encoded vector, wherein the third encoded vector represents the second path, and combining the second encoded vector and the third encoded vector to create a fourth encoded vector, wherein the fourth encoded vector represents the ASR lattice portion; and determine at least one ASR hypothesis based at least in part on the fourth encoded vector.

2. The computer-implemented method of claim 1, wherein: processing the first data comprises inputting the first data to a first internal input of the RNN encoder and outputting the first encoded vector at a first internal output of the RNN encoder; processing the second data and the first encoded vector comprises: inputting the second data and the first encoded vector to a second internal input of the RNN encoder, and outputting the second encoded vector at a second internal output of the RNN encoder; processing the third data comprises inputting the third data to a third internal input of the RNN encoder and outputting the third encoded vector at a third internal output of the RNN encoder.

3. The computer-implemented method of claim 2, wherein combining the second encoded vector and the third encoded vector comprises: inputting the second encoded vector and the third encoded vector to a fourth internal input of the RNN encoder; and combining the second encoded vector and the third encoded vector to output the fourth encoded vector at an external output of the RNN encoder.

4. The computer-implemented method of claim 1, wherein: the matrix comprises a third representation of a third path, the third representation comprising fourth data indicating that the first node is connected to a fourth node, the fourth data being associated with a fourth word; encoding the matrix further comprises processing the fourth data using the RNN encoder to create a fifth encoded vector; and the method further comprises combining the fourth encoded vector and the fifth encoded vector to create a sixth encoded vector using a combination function that is selected based at least in part on assigned weights, wherein the sixth encoded vector comprises an encoded vector corresponding to an entirety of lattice portion.

5. A computing device comprising: at least one processor; and at least one memory including instructions operable to be executed by the at least one processor to configure the device to: receive first audio data corresponding to a first utterance; process at least a portion of the first audio data to determine a matrix representing at least a portion of an automatic speech recognition (ASR) lattice, the matrix comprising at least: first data representing a first arc of the ASR lattice, the first arc originating at a first node of the ASR lattice and terminating at a second node of the ASR lattice, and second data representing a second arc of the ASR lattice, the second arc originating at a third node of the ASR lattice and terminating at the second node of the ASR lattice; process the first data using a recurrent neural network (RNN) encoder to create a first encoded vector; process the second data using the RNN encoder to create a second encoded vector; combine the first encoded vector and the second encoded vector to create a third encoded vector; and determine at least one ASR hypothesis based at least in part on the third encoded vector.

6. The computing device of claim 5, wherein the at least one memory includes additional instructions operable to be executed by the at least one processor to further configure the computing device to combine the first encoded vector and the second encoded vector at least by: adding the first encoded vector and the second encoded vector to obtain a sum; and dividing the sum by two to obtain the third encoded vector.

7. The computing device of claim 5, wherein the at least one memory includes additional instructions operable to be executed by the at least one processor to further configure the computing device to: process the first data using the RNN encoder at least by inputting the first data to a first internal input of the RNN encoder and outputting the first encoded vector at a first internal output of the RNN encoder; and process the second data using the RNN encoder at least by inputting the second data to a second internal input of the RNN encoder and outputting the second encoded vector at a second internal output of the RNN encoder.

8. The computing device of claim 7, wherein the at least one memory includes additional instructions operable to be executed by the at least one processor to further configure the computing device to combine the first encoded vector and the second encoded vector at least by: inputting the first encoded vector and the second encoded vector to a third internal input of the RNN encoder; and combining the first encoded vector and the second encoded vector to create the third encoded vector at an external output of the RNN encoder.

9. The computing device of claim 5, wherein the matrix further comprises third data representing a third arc originating at the second node and terminating at a fourth node, wherein the fourth node is a first final node of the ASR lattice, and wherein the at least one memory includes additional instructions operable to be executed by the at least one processor to further configure the computing device to: process the third data and the third encoded vector using the RNN encoder to create a fourth encoded vector.

10. The computing device of claim 9, wherein the matrix further comprises fourth data representing a fourth arc originating at the second node and terminating at a fifth node, wherein the fifth node is a second final node of the ASR lattice, and wherein the at least one memory includes additional instructions operable to be executed by the at least one processor to further configure the computing device to: process the fourth data and the third encoded vector using the RNN encoder to create a fifth encoded vector; and combine the fourth encoded vector and the fifth encoded vector to create a sixth encoded vector, wherein the sixth encoded vector comprises an encoded vector corresponding to an entirety lattice.

11. The comput
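
The encoding order recited in claims 1, 5, and 6 can be illustrated with a toy example. This is a sketch under stated assumptions, not the patented implementation: the Elman-style `tanh` cell, the two-dimensional embeddings in `EMBED`, the weights `W_IN` and `W_REC`, and all function names are hypothetical. The only detail taken from the claims is that each arc is processed together with the encoded vector leading into its source node, and that vectors arriving at the same node are combined, here by averaging as in claim 6.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Arc = Tuple[int, int, str]  # (source node, destination node, word)

# Hypothetical toy parameters; a trained encoder would learn these.
EMBED: Dict[str, Vector] = {
    "recognize": [1.0, 0.0],
    "speech": [0.0, 1.0],
    "wreckage": [1.0, 1.0],
}
W_IN = [[0.5, -0.3], [0.2, 0.4]]   # input-to-hidden weights
W_REC = [[0.1, 0.6], [-0.4, 0.3]]  # hidden-to-hidden weights


def matvec(matrix: List[Vector], vec: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def rnn_step(x: Vector, h_prev: Vector) -> Vector:
    """One recurrent step: h = tanh(W_IN x + W_REC h_prev)."""
    a, b = matvec(W_IN, x), matvec(W_REC, h_prev)
    return [math.tanh(p + q) for p, q in zip(a, b)]


def average(vectors: List[Vector]) -> Vector:
    """Combine encoded vectors by summing and dividing by their count."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def encode_lattice(arcs: List[Arc], final: int, dim: int = 2) -> Vector:
    """Encode a lattice into one vector.

    Assumption: node numbers are topologically ordered (src < dst on every
    arc), so sorting arcs by source visits each node after all of its
    incoming arcs have been encoded.
    """
    incoming: Dict[int, List[Vector]] = defaultdict(list)
    for src, dst, word in sorted(arcs):
        arrived = incoming.get(src)
        # Nodes with no incoming arcs start from a zero state.
        h_prev = average(arrived) if arrived else [0.0] * dim
        incoming[dst].append(rnn_step(EMBED[word], h_prev))
    return average(incoming[final])


# Lattice portion of claim 1: path 1 -> 2 -> 3 and path 1 -> 3.
arcs = [(1, 2, "recognize"), (2, 3, "speech"), (1, 3, "wreckage")]
encoded = encode_lattice(arcs, final=3)
print(encoded)
```

With two paths reaching the final node, the result is the sum of the two path encodings divided by two. Claim 4 mentions a combination function selected based on assigned weights, so averaging is only one possible choice.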

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Combinations of networks

  • Probabilistic graphical models, e.g. probabilistic networks

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

  • Recurrent networks, e.g. Hopfield networks

  • Vector quantisation, e.g. TwinVQ audio

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10176802B1 cover?
An automatic speech recognition (ASR) system may convert an ASR output lattice into a matrix form, thus maintaining certain information included in the lattice that might otherwise be lost in an N-best list output. The matrix representation of the lattice may be encoded using a recurrent neural network (RNN) to create a vector representation of the lattice. The vector representation may then be…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 8, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).