Compressed finite state transducers for automatic speech recognition

US10381000B1 · US · B1

Patent metadata
Field: Value
Publication number: US-10381000-B1
Application number: US-201815864689-A
Country: US
Kind code: B1
Filing date: Jan 8, 2018
Priority date: Feb 29, 2016
Publication date: Aug 13, 2019
Grant date: Aug 13, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of each FST may be compacted by binning individual weights, thus reducing the number of bits needed for each weight. Further, certain fields such as a next state ID may be left out of a compact FST if an estimation technique can be used to reproduce the next state at runtime. During runtime portions of the FSTs may be decompressed for processing by an ASR engine.
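The two compaction ideas in the abstract (binning weights and omitting a next-state field that can be estimated) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration, not the patent's actual format: the `Arc` layout, the uniform-width bins in `build_codebook`, and the estimation rule "next state is the current state plus one" are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Arc:
    ilabel: int
    olabel: int
    weight: float
    next_state: int


# Compact form: (ilabel, olabel, bin index, next state or None if estimable).
CompactArc = Tuple[int, int, int, Optional[int]]


def build_codebook(weights: List[float], num_bins: int) -> List[float]:
    """Return the centre value of each of num_bins uniform-width bins (assumed scheme)."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [lo]
    step = (hi - lo) / num_bins
    return [lo + (i + 0.5) * step for i in range(num_bins)]


def quantize(weight: float, codebook: List[float]) -> int:
    """Return the index of the codebook entry nearest to weight."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - weight))


def compact_arc(state: int, arc: Arc, codebook: List[float]) -> CompactArc:
    """Store a small bin index instead of a float, and drop an estimable next state."""
    # Hypothetical estimation rule: the next state is guessed as state + 1.
    next_field = None if arc.next_state == state + 1 else arc.next_state
    return (arc.ilabel, arc.olabel, quantize(arc.weight, codebook), next_field)


def expand_arc(state: int, compact: CompactArc, codebook: List[float]) -> Arc:
    """Rebuild an arc at runtime; the weight comes back as its bin centre."""
    ilabel, olabel, index, next_field = compact
    next_state = state + 1 if next_field is None else next_field
    return Arc(ilabel, olabel, codebook[index], next_state)
```

With four bins, each weight needs a 2-bit index rather than a full floating-point value, at the cost of a rounding error of at most half a bin width. The round trip is lossy for weights but exact for labels and next states.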

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method comprising: receiving compressed language model data; detecting audio using a microphone, the audio corresponding to an utterance; determining audio data corresponding to the audio; processing at least a portion of the compressed language model data to determine uncompressed language model data; performing speech recognition using the audio data and the uncompressed language model data to determine text data; deleting the uncompressed language model data from the memory but maintaining a copy of the compressed language model data; and causing a command to be executed using at least the text data.
2. The computer-implemented method of claim 1, wherein the compressed language model data comprises a portion of a compressed language model.
3. The computer-implemented method of claim 1, wherein the compressed language model data comprises compressed data corresponding to a finite state transducer (FST).
4. The computer-implemented method of claim 3, wherein the FST is configured to be traversed using input words and to output words.
5. The computer-implemented method of claim 1, further comprising: detecting second audio corresponding to a second utterance; determining second audio data corresponding to the second audio; and sending the second audio data to at least one remote device for speech processing.
6. The computer-implemented method of claim 1, wherein processing the at least a portion of the compressed language model data to determine uncompressed language model data occurs prior to detecting the audio using the microphone.
7. The computer-implemented method of claim 1, further comprising: receiving an indication from a second device, wherein processing the at least a portion of the compressed language model data to determine uncompressed language model data occurs in response to receiving the indication.
8. The computer-implemented method of claim 7, wherein the indication corresponds to at least one of: a vehicle starting, a button being pressed, an alarm about to sound, or a delivery person approaching a location.
9. The computer-implemented method of claim 1, wherein the compressed language model data corresponds to a user profile associated with a device that includes the microphone.
10. The computer-implemented method of claim 1, further comprising, before processing the at least a portion of the compressed language model data to determine uncompressed language model data: determining that the utterance included a wakeword.
11. A device, comprising: at least one processor; at least one microphone; and memory including instructions operable to be executed by the at least one processor to configure the device to: receive compressed language model data; detect audio using the at least one microphone, the audio corresponding to an utterance; determine audio data corresponding to the audio; process at least a portion of the compressed language model data to determine uncompressed language model data; perform speech recognition using the audio data and the uncompressed language model data to determine text data; delete the uncompressed language model data from the memory but maintain a copy of the compressed language model data; and cause a command to be executed using at least the text data.
12. The device of claim 11, wherein the compressed language model data comprises a portion of a compressed language model.
13. The device of claim 11, wherein the compressed language model data comprises compressed data corresponding to a finite state transducer (FST).
14. The device of claim 13, wherein the FST is configured to be traversed using input words and to output words.
15. The device of claim 11, wherein the memory further includes instructions that, when executed by the at least one processor further configure the device to: detect second audio corresponding to a second utterance; determine second audio data corresponding to the second audio; and send the second audio data to at least one remote device for speech processing.
16. The device of claim 11, wherein the memory further includes instructions that, when executed by the at least one processor further configure the device to, before processing the at least a portion of the compressed language model data to determine uncompressed language model data: determine that the utterance included a wakeword.
17. The device of claim 11, wherein the instructions to process the at least a portion of the compressed language model data to determine uncompressed language model data are executed prior to the instructions to detect the audio using the microphone.
18. The device of claim 11, wherein the memory further includes instructions that, when executed by the at least one processor further configure the device to: receive an indication from a second device, wherein the instructions to process the at least a portion of the compressed language model data to determine uncompressed language model data are executed in response to receiving the indication.
19. The device of claim 11, wherein the compressed language model data corresponds to a user profile associated with a device that includes the microphone.
20. The device of claim 11, wherein the memory further includes instructions that, before processing the at least a portion of the compressed language model data to determine uncompressed language model data: determine that the utterance included a wakeword.
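The lifecycle recited in claim 1 (decompress, recognize, delete the uncompressed data while keeping the compressed copy, then act on the text) might be sketched as follows. This is only an illustration under loud assumptions: `zlib` and JSON stand in for whatever compact FST encoding is actually used, and `LanguageModelStore`, `recognize`, and `handle_utterance` are hypothetical names, with `recognize` being a lookup stub rather than a real speech recognizer.

```python
import json
import zlib
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, Optional


class LanguageModelStore:
    """Holds compressed language model data; expands it only while needed."""

    def __init__(self, compressed: bytes) -> None:
        self._compressed = compressed
        self.uncompressed: Optional[Dict[str, str]] = None

    @property
    def has_compressed_copy(self) -> bool:
        return self._compressed is not None

    @contextmanager
    def expanded(self) -> Iterator[Dict[str, str]]:
        # zlib + JSON are stand-ins (assumption) for the real compact format.
        self.uncompressed = json.loads(zlib.decompress(self._compressed))
        try:
            yield self.uncompressed
        finally:
            # Delete the uncompressed data; the compressed copy is untouched.
            self.uncompressed = None


def recognize(audio_data: str, model: Dict[str, str]) -> str:
    """Hypothetical stub: looks up text by a fake audio key instead of decoding."""
    return model.get(audio_data, "")


def handle_utterance(
    store: LanguageModelStore,
    audio_data: str,
    execute_command: Callable[[str], None],
) -> str:
    """Expand the model, recognize the utterance, free the model, run the command."""
    with store.expanded() as model:
        text = recognize(audio_data, model)
    execute_command(text)
    return text
```

A caller would build the store once from received compressed bytes and invoke `handle_utterance` per utterance. The dependent claims vary when expansion happens (before audio is detected, on an indication from a second device, or after a wakeword); in this sketch that would correspond to entering `store.expanded()` at a different point.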

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Training · CPC title

  • Parsing for meaning understanding · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • G10L15/193 · Primary

    Formal grammars, e.g. finite state automata, context free grammars or word networks · CPC title

  • updating or merging of old and new templates; Mean values; Weighting · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10381000B1 cover?
Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of eac…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/193. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 13, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).