Compressed finite state transducers for automatic speech recognition

US9865254B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-9865254-B1
Application numberUS-201615187102-A
CountryUS
Kind codeB1
Filing dateJun 20, 2016
Priority dateFeb 29, 2016
Publication dateJan 9, 2018
Grant dateJan 9, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of each FST may be compacted by binning individual weights, thus reducing the number of bits needed for each weight. Further, certain fields such as a next state ID may be left out of a compact FST if an estimation technique can be used to reproduce the next state at runtime. During runtime portions of the FSTs may be decompressed for processing by an ASR engine.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method for creating and using a compact finite state transducer for automatic speech recognition, the method comprising: during a training period: identifying a first representation of a finite state transducer (FST) for automatic speech recognition (ASR), wherein the FST corresponds to an ASR grammar and is configured to be traversed using a first sequence of words with corresponding scores and output a second sequence of words with corresponding scores, wherein the first representation occupies a first amount of memory and comprises: a first table representing states of the FST, the first table including a first record representing a first state of the FST, the first record comprising: a 32 bit representation of a weight of the first state, and a first reference to a second record in a second table, the second record representing at least one first arc, the first arc representing a connection from the first state to another state in the FST; and the second table representing arcs of the FST, the second table including the second record representing the first arc, the second record comprising: a 32 bit representation of a weight of the first arc, a second reference to a third record in the first table, the third record representing a second state of the FST, the second state being a destination state for the first arc, a first representation of an input label of the first arc, and a second representation of an output label of the first arc; and configuring a second representation of the FST, wherein the second representation occupies a second amount of memory less than the first amount of memory and comprises: a third table representing at least a portion of the states of the FST, the third table including a fourth record representing the first state, the fourth record comprising: a 12 bit representation of the weight of the first state, and a third reference to a fifth record in a fourth table, the fifth record representing the at least one first arc; the fourth table representing at least a portion of the arcs of the FST, the fourth table including the fifth record representing the first arc, the fifth record comprising: a 12 bit representation of the weight of the first arc, a third representation of an input/output label pair corresponding to the first arc, an identifier of a destination state of the first arc, the first representation of the input label of the first arc, and the second representation of the output label of the first arc; and during a runtime period: receiving audio data comprising audio feature vectors for ASR; and determining ASR output using the second representation of the FST and the audio data. 2. The computer-implemented method of claim 1 , further comprising, during the runtime period: identifying a third representation of a second FST, wherein the second FST is an HCL FST configured to be traversed using acoustic unit data and to output words; processing the audio data using an acoustic model to obtain acoustic unit data; loading the third representation of the second FST into a memory; loading the second representation of the FST into the memory; determining a score for the fifth record using the acoustic unit data; determining an output label using the third representation of the input/output label pair; and determining the ASR output using the output label. 3. The computer-implemented method of claim 1 , further comprising, during the training period: identifying, using the first representation, a third state of the FST; determining the third state is associated with a certain number of outgoing arcs; determining that the certain number is above a threshold; including in the third table, a sixth record corresponding to the third state, the sixth record including a bit indicating that information about the third state is available outside the third table; and including, in the second representation, the information about the third state, the information comprising: a 32 bit representation of a weight of the third state, a fourth reference to a seventh record in the fourth table, the seventh record representing a second arc, the second arc outgoing from the second state in the FST, first data representing a number of arcs associated with the third state that have a blank input label, and second data representing a number of arcs associated with the third state that have a blank output label. 4. The computer-implemented method of claim 1 , further comprising, during the runtime period: receiving an indication to capture audio for speech processing; loading the 12 bit representation of the weight of the first state into memory; converting the 12 bit representation of the weight of the first state into a second 32 bit representation of the weight of the first state; storing the second 32 bit representation of the weight of the first state; receiving the audio data; and performing speech processing using the audio data and the second 32 bit representation of the weight of the first state. 5. A computer-implemented method, comprising: receiving audio data; identifying a first representation of a first finite state transducer (FST) for automatic speech recognition (ASR), the FST configured to be traversed using input words and to output words, the first representation including: a first table representing states of the first FST, the first table including a first record representing a first state, the first record comprising: a first representation of a weight of the first state, and a reference to a second record in a second table, the second record representing a first arc; the second table representing arcs of the first FST, the second table including the second record, the second record comprising: a second representation of a weight of the first arc, a 12 bit representation of the weight of the first arc, a third representation of an input/output label pair, and an identifier of a destination state of the first arc; and identifying a second representation of a second FST, wherein the second FST is an FST configured to be traversed using acoustic unit data and to output words; loading the first representation of the first FST into a memory; loading the second representation of the second FST into the memory; performing ASR on a first portion of the audio data using the first record and the second record; determining an output label using the third representation of the input/output label pair; and determining ASR output using the output label. 6. The computer-implemented method of claim 5 , further comprising: receiving an indication to capture audio for speech processing; loading the 12 bit representation of the weight of the first state into memory; converting the 12 bit representation of the weight of the first state into a second 32 bit representation of the weight of the first state; storing the second 32 bit representation of the weight of the first state; and performing speech processing using the audio data and the second 32 bit representation of the weight of the first state. 7. The computer-implemented method of claim 5 , further comprising loading the 12 bit representation of the weight of the first state into memory after receiving the indication but prior to receiving the audio data. 8. The computer-implemented method of claim 5 , further comprising configuring the second table to include a third record corresponding to a second arc, the third record occupying a same amount of space in memory as the second record. 9. The computer-implemented method of claim 5 , further comprising configuring the third representation of the input/output la

Assignees

Inventors

Classifications

  • updating or merging of old and new templates; Mean values; Weighting · CPC title

  • Parsing for meaning understanding · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • Training · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9865254B1 cover?
Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of eac…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/193. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 09 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).