Acoustic processing unit interface for determining senone scores using a greater clock frequency than that corresponding to received audio

US9785613B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9785613-B2
Application number: US-201213490124-A
Country: US
Kind code: B2
Filing date: Jun 6, 2012
Priority date: Dec 19, 2011
Publication date: Oct 10, 2017
Grant date: Oct 10, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present invention include an apparatus, method, and system for acoustic modeling. In an embodiment, a speech recognition system is provided. The system includes a processing unit configured to divide a received audio signal into consecutive frames having respective frame vectors, an acoustic processing unit (APU), a data bus that couples the processing unit and the APU. The APU includes a local, non-volatile memory that stores a plurality of senones, a memory buffer coupled to the memory, the acoustic processing unit being configured to load at least one Gaussian probability distribution vector stored in the memory into the memory buffer, and a scoring unit configured to simultaneously compare a plurality of dimensions of a Gaussian probability distribution vector loaded into the memory buffer with respective dimensions of a frame vector received from the processing unit and to output a corresponding score to the processing unit. The APU is further configured to divide a clock frequency associated with the received audio signal to a frequency greater than the clock frequency associated with the received audio signal in order to help the score calculation operate faster than the clock frequency of the received audio signal.
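
The scoring step in the abstract compares each dimension of a Gaussian probability distribution vector with the matching dimension of a frame vector and reduces the result to one score. The abstract does not give the formula, so the following is only a minimal sketch under stated assumptions: a diagonal Gaussian described by per-dimension means and variances, and a variance-weighted squared distance. The function name `distance_score` and the sample values are hypothetical and do not come from the patent.

```python
from typing import Sequence


def distance_score(frame: Sequence[float],
                   mean: Sequence[float],
                   variance: Sequence[float]) -> float:
    """Variance-weighted squared distance between a frame vector and one Gaussian.

    Assumption: the Gaussian is diagonal, so every dimension contributes an
    independent term. Dedicated hardware could evaluate those terms at the
    same time; this sketch simply sums them.
    """
    if not (len(frame) == len(mean) == len(variance)):
        raise ValueError("frame, mean, and variance must have the same length")
    return sum((x - m) ** 2 / v for x, m, v in zip(frame, mean, variance))


# Hypothetical three-dimensional example values.
frame = [1.0, 2.0, 3.0]
mean = [1.0, 0.0, 1.0]
variance = [1.0, 4.0, 2.0]
score = distance_score(frame, mean, variance)
print(score)  # 0/1 + 4/4 + 4/2 = 3.0
```

A lower score means the frame lies closer to the Gaussian. A real acoustic model would typically hold many Gaussians per senone and combine their scores, which this sketch leaves out.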

First claim

Opening claim text (preview).

What is claimed is:

1. A speech recognition system, comprising: a processing unit configured to divide a received audio signal into consecutive frames; an acoustic processing unit (APU), comprising: a local, non-volatile memory that stores a plurality of senones; a memory buffer coupled to the memory, wherein the acoustic processing unit is configured to load at least one Gaussian probability distribution vector stored in the memory into the memory buffer; and a scoring unit configured to simultaneously compare a plurality of dimensions of a Gaussian probability distribution vector loaded into the memory buffer with respective dimensions of a frame vector received from the processing unit and to output a distance score; a senone scoring control unit to divide a clock frequency associated with the received audio signal and provide the divided clock frequency to the scoring unit, wherein the scoring unit operates at the divided clock frequency and the divided clock frequency is greater than the clock frequency associated with the received audio signal, wherein the acoustic processing unit is configured to perform a comparison using a first frame to generate the distance score while the processing unit performs a search to find a senone score match using another distance score that corresponds to a second frame, the second frame immediately preceding the first frame; and a data bus that couples the processing unit and the APU.

2. The speech recognition system of claim 1, wherein the processing unit is configured to concurrently run a search thread and a distance computation thread.

3. The speech recognition system of claim 2, wherein the processing unit comprises: an application programming interface (API) module configured to receive a command from the distance computation thread and generate one or more corresponding commands to be received by the APU.

4. The speech recognition system of claim 3, wherein the API module comprises: a Generic DCA configured to receive a command from the distance computation thread and output one or more functions in a library that implements the received command.

5. The speech recognition system of claim 4, the Generic DCA specifies at least: (i) a Create function that stores an acoustic model, a number of dimensions in a feature vector, and a number of senones in the acoustic model as state information; (ii) a Set Feature function that stores a feature vector corresponding to a received frameID; (iii) a Compute Scores function that specifies at least one senone to be scored for a frame; (iv) a Fill Scores function that stores senone scores in a buffer; (v) a Set Feature Matrix function that stores a feature vector transform matrix and adapts the comparison to a specific speaker.

6. The speech recognition system of claim 5, wherein the API module further comprises an APU library configured to receive parameters from the Generic DCA and output parameters compatible with the APU.

7. The speech recognition system of claim 6, the APU library specifies at least: (i) a Set Acoustic Model function that sets an acoustic model to be used for senone scoring; (ii) a Load Feature Vector function that loads a feature vector in to the APU; (iii) a Score Senone Chunk function that loads a senone list in to the APU; (iv) a Score Range function that specifies that all senones in a range are to be scored; (v) a Read Senone Scores function that reads senone scores and stores the senone scores in a destination buffer; (vi) a Check Score Ready Status function that determines if senone scores are ready to be read from the APU; (vii) a Read Score Length function that reads a first status register of the APU to determine a number of score entries that are available; (viii) a Read Status function that reads a second status register of the APU to determine a status of a read operation; (ix) a Read Configuration function that reads a configuration register of the APU; and (x) a Write Configuration function that writes to the configuration register.

8. The speech recognition system of claim 6, wherein the API module further comprises: a hardware abstraction layer (HAL) configured to interface between the APU library and the APU.

9. An acoustic processing method, comprising: dividing a received audio signal into a plurality of frames using a processing unit; comparing a feature vector associated with a first frame of the plurality of frames to a Gaussian probability distribution vector using an acoustic processing unit (APU) to generate a distance score; dividing a clock frequency associated with the received audio signal, using a senone scoring control unit, wherein the divided clock frequency is greater than the clock frequency associated with the received audio signal, wherein the APU uses the divided clock frequency to generate the distance score; and concurrently with the comparing, performing a search to find a senone score match using another distance score that corresponds to a feature vector associated with a second frame of the plurality of frames received from an acoustic processing unit (APU) using the processing unit, wherein the second frame immediately precedes the first frame and wherein the processing unit and the APU are coupled over a data bus.

10. The acoustic processing method of claim 9, wherein the distance computation thread controls the comparing via an application programming interface (API).

11. The acoustic processing method of claim 10, wherein the API comprises: a Generic DCA; an APU library; and a hardware abstraction layer (HAL).

12. The acoustic processing method of claim 11, the Generic DCA specifies at least: (i) a Create function that stores an acoustic model, a number of dimensions in a feature vector, and a number of senones in the acoustic model as state information; (ii) a Set Feature function that stores a feature vector corresponding to a received frameID; (iii) a Compute Scores function that specifies at least one senone to be scored for a frame; (iv) a Fill Scores function that stores senone scores in a buffer; and (v) a Set Feature Matrix function that stores a feature vector transform matrix and adapts the comparison to a specific speaker.

13. The acoustic processing method of claim 11, the APU library specifies at least: (i) a Set Acoustic Model function that sets an acoustic model to be used for senone scoring; (ii) a Load Feature Vector function that loads a feature vector in to the APU; (iii) a Score Senone Chunk function that loads a senone list in to the APU; (iv) a Score Range function that specifies that all senones in a range are to be scored; (v) a Read Senone Scores function that reads senone scores and stores the senone scores in a destination buffer; (vi) a Check Score Ready Status function that determines if senone scores are ready to be read from the APU; (vii) a Read Score Length function that reads a first status register of the APU to determine a number of score entries that are available; (viii) a Read Status function that reads a second status register of the APU to determine a status of a read operation; (ix) a Read Configuration function that reads a configuration register of the APU; and (x) a Write Configuration function that writes to the configuration register.

14. The acoustic processing method of claim 9, further comprising: creating a search thread and a distance computation thread on the processing unit.

15. A non-transitory computer readable medium having stored therein one or more sequences of one or more instructions for execution by one or more proc
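
The claims describe two cooperating threads: a distance computation thread that scores the current frame, and a search thread that works on the scores of the immediately preceding frame. The sketch below illustrates that overlap in software only, under several assumptions. The `GenericDCA` class loosely mirrors the Create, Set Feature, Compute Scores, and Fill Scores functions named in the claims, but its method signatures, the toy acoustic model, the plain squared-distance scoring, and the "pick the closest senone" search are all hypothetical placeholders, not the patented implementation.

```python
import queue
import threading


class GenericDCA:
    """Hypothetical stand-in for the Generic DCA functions named in the claims."""

    def __init__(self, means, num_dims):
        # "Create": keep the acoustic model and its dimensions as state.
        self.means = means          # senone id -> mean vector (toy model, assumed)
        self.num_dims = num_dims
        self.features = {}          # frame id -> feature vector
        self.scores = {}            # frame id -> {senone id: distance score}

    def set_feature(self, frame_id, feature):
        # "Set Feature": store the feature vector for a frame id.
        if len(feature) != self.num_dims:
            raise ValueError("feature vector has the wrong number of dimensions")
        self.features[frame_id] = list(feature)

    def compute_scores(self, frame_id, senone_ids):
        # "Compute Scores": score the requested senones for one frame.
        # Plain squared distance is an assumption, not the claimed hardware math.
        x = self.features[frame_id]
        self.scores[frame_id] = {
            s: sum((a - b) ** 2 for a, b in zip(x, self.means[s]))
            for s in senone_ids
        }

    def fill_scores(self, frame_id, buffer):
        # "Fill Scores": copy the senone scores into a caller-supplied buffer.
        buffer.update(self.scores[frame_id])


def run_pipeline(dca, frames):
    """Score frame n in one thread while another thread searches frame n-1."""
    handoff = queue.Queue(maxsize=1)   # holds at most one scored frame
    best = []                          # (frame id, closest senone) per frame

    def distance_thread():
        for frame_id, feature in enumerate(frames):
            dca.set_feature(frame_id, feature)
            dca.compute_scores(frame_id, list(dca.means))
            buffer = {}
            dca.fill_scores(frame_id, buffer)
            handoff.put((frame_id, buffer))
        handoff.put(None)              # sentinel: no more frames

    def search_thread():
        while True:
            item = handoff.get()
            if item is None:
                break
            frame_id, buffer = item
            # Placeholder search: choose the senone with the smallest distance.
            best.append((frame_id, min(buffer, key=buffer.get)))

    workers = [threading.Thread(target=distance_thread),
               threading.Thread(target=search_thread)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return best


# Hypothetical two-senone model and three two-dimensional frames.
dca = GenericDCA({"s0": [0.0, 0.0], "s1": [1.0, 1.0]}, num_dims=2)
result = run_pipeline(dca, [[0.1, 0.0], [0.9, 1.2], [0.2, 0.1]])
print(result)  # [(0, 's0'), (1, 's1'), (2, 's0')]
```

The single-slot queue is the design choice that produces the overlap: once the scores for one frame are handed off, the distance thread is free to start on the next frame while the search thread is still consuming the previous one. In the claimed system the scoring runs on the APU across a data bus rather than in a second software thread.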

Classifications

  • Speech classification or search · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title

  • G06F17/10 · Primary

    Complex mathematical operations {(function generation by table look-up G06F1/03; evaluation of elementary functions by calculation G06F7/544)} · CPC title

  • using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9785613B2 cover?
Embodiments of the present invention include an apparatus, method, and system for acoustic modeling. In an embodiment, a speech recognition system is provided. The system includes a processing unit configured to divide a received audio signal into consecutive frames having respective frame vectors, an acoustic processing unit (APU), a data bus that couples the processing unit and the APU. The A…
Who is the assignee on this patent?
Natarajan Venkataraman, Rosner Stephan, Cypress Semiconductor Corp
What technology area does this patent fall under?
Primary CPC classification G06F17/10. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 10, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).