Learning front-end speech recognition parameters within neural network training
US-2015161995-A1 · Jun 11, 2015 · US
US10573294B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10573294-B2 |
| Application number | US-201715858112-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 29, 2017 |
| Priority date | Jun 5, 2017 |
| Publication date | Feb 25, 2020 |
| Grant date | Feb 25, 2020 |
Abstract
Embodiments of the present disclosure provide a speech recognition method based on artificial intelligence, and a terminal. The method includes obtaining speech data to be recognized; performing a processing on the speech data to be recognized using a trained sub-band energy normalized acoustic model, to determine a normalized energy feature corresponding to each time-frequency unit in the speech data to be recognized; and determining text data corresponding to the speech data to be recognized according to the normalized energy feature corresponding to each time-frequency unit.
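The per-unit normalization referred to in the abstract is spelled out in claims 1, 5, and 6: a per-sub-band smoother M(i, j) followed by a PCEN-style feed-forward gain control and root compression. What follows is a minimal NumPy sketch of those two formulas, assuming the filter-bank energies arrive as an array of shape (frames, sub-bands). The function names `smooth_energy` and `pcen`, the initialization M(0, j) = E(0, j), and the default values of α, σ, γ, and ε (common choices in the PCEN literature) are illustrative assumptions, not details stated in the patent.

```python
import numpy as np


def smooth_energy(E, s):
    """Per-sub-band smoother (claim 5): M(i,j) = (1 - s_j) * M(i-1,j) + s_j * E(i,j).

    E: array of shape (n_frames, n_bands) holding filter-bank energies E(i, j).
    s: scalar or length-n_bands array of smoothing parameters s_j.
    Assumption (not in the claims): the smoother starts at M(0, j) = E(0, j).
    """
    E = np.asarray(E, dtype=np.float64)
    s = np.broadcast_to(np.asarray(s, dtype=np.float64), E.shape[1:])
    M = np.empty_like(E)
    M[0] = E[0]
    for i in range(1, E.shape[0]):
        # Exponential moving average over frames, one rate per sub-band.
        M[i] = (1.0 - s) * M[i - 1] + s * E[i]
    return M


def pcen(E, s, alpha=0.98, sigma=2.0, gamma=0.5, eps=1e-6):
    """Normalized energy feature (claim 6) for every time-frequency unit.

    alpha: strength of the feed-forward automatic gain control.
    sigma, gamma: root-compression parameters; eps: floor that avoids division by zero.
    The default values are illustrative assumptions, not taken from the patent.
    """
    E = np.asarray(E, dtype=np.float64)
    M = smooth_energy(E, s)
    agc = E / (eps + M) ** alpha  # feed-forward automatic gain control
    # Root compression, offset by sigma**gamma so that E = 0 maps to 0.
    return (agc + sigma) ** gamma - sigma ** gamma
```

Passing a length-n_bands array for `s` gives each sub-band its own time constant. That per-band vector is the quantity claims 2 and 3 describe as initialized by a preset rule and then determined during training; in this sketch it is simply an input.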
Claims (preview)
What is claimed is:

1. A speech recognition method based on artificial intelligence, comprising: obtaining speech data to be recognized; performing a processing on the speech data to be recognized using a trained sub-band energy normalized acoustic model, to determine a normalized energy feature corresponding to each time-frequency unit in the speech data to be recognized; and determining text data corresponding to the speech data to be recognized according to the normalized energy feature corresponding to each time-frequency unit; wherein before performing the processing on the speech data to be recognized using the trained sub-band energy normalized acoustic model, the method further comprises: performing a pre-processing on the speech data to be recognized, to determine an energy value of a filter bank corresponding to each time-frequency unit in the speech data to be recognized, wherein performing the processing on the speech data to be recognized using the trained sub-band energy normalized acoustic model comprises: performing the processing on the energy value of the filter bank corresponding to each time-frequency unit in the speech data to be recognized using the trained sub-band energy normalized acoustic model, wherein determining the normalized energy feature corresponding to each time-frequency unit in the speech data to be recognized comprises: determining an energy value E(i, j) of a filter bank corresponding to a jth time-frequency unit in an ith frame of speech data and a smoothing parameter s_j corresponding to each sub-band; obtaining a smoothed energy value M(i−1, j) of a filter bank corresponding to a jth time-frequency unit in an (i−1)th frame of speech data; determining a smoothed energy value M(i, j) of the filter bank corresponding to the jth time-frequency unit in the ith frame of speech data according to the energy value E(i, j), the smoothing parameter s_j and the smoothed energy value M(i−1, j); and determining the normalized energy feature corresponding to the jth time-frequency unit in the ith frame of speech data according to the energy value E(i, j) and the smoothed energy value M(i, j).

2. The method according to claim 1, before performing the processing on the speech data to be recognized using a preset sub-band energy normalized acoustic model, further comprising: obtaining training speech data; and training an initial sub-band energy normalized acoustic model using the training speech data, to determine the trained sub-band energy normalized acoustic model and a smoothing parameter corresponding to each sub-band.

3. The method according to claim 2, before training the initial sub-band energy normalized acoustic model using the training speech data, further comprising: determining an initial smoothing parameter corresponding to each sub-band according to a preset rule.

4. The method according to claim 1, wherein the pre-processing comprises at least one of a pre-emphasis processing, a framing, a Hann windowing, a fast Fourier transform processing, a quadratic energy processing, a Mel filtering, and a processing of taking the logarithm.

5. The method according to claim 1, wherein the smoothed energy value M(i, j) of the filter bank corresponding to the jth time-frequency unit in the ith frame of speech data is obtained according to a formula:

$$M(i,j) = (1 - s_j)\,M(i-1,j) + s_j\,E(i,j).$$

6. The method according to claim 1, wherein the normalized energy feature corresponding to the jth time-frequency unit in the ith frame of speech data is determined according to a formula:

$$\mathrm{PCEN}(i,j) = \left(\frac{E(i,j)}{\left(\epsilon + M(i,j)\right)^{\alpha}} + \sigma\right)^{\gamma} - \sigma^{\gamma},$$

where $\epsilon$ is a preset minimum value, $E(i,j)/(\epsilon + M(i,j))^{\alpha}$ represents a feed-forward automatic gain control whose strength is controlled by $\alpha$, and $\sigma$ and $\gamma$ are square root compression parameters.

7. The method according to claim 1, wherein determining text data corresponding to the speech data to be recognized comprises: inputting the normalized energy feature corresponding to each time-frequency unit into a neural network model; and determining the text data corresponding to the speech data to be recognized by the neural network model.

8. The method according to claim 7, further comprising: generating the neural network model.

9. The method according to claim 8, wherein generating the neural network model comprises: obtaining a large amount of training speech data and corresponding training text data; determining a normalized energy feature corresponding to each time-frequency unit in the large amount of speech data; perform
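Claim 4 lists the pre-processing steps that turn a waveform into the filter-bank energies E(i, j). A minimal NumPy sketch of that chain follows: pre-emphasis, framing, Hann windowing, FFT, squared magnitude, Mel filtering, and an optional logarithm. The function names, the 25 ms frame length and 10 ms shift at 16 kHz, the FFT size, the number of Mel bands, the pre-emphasis coefficient, and the HTK-style Mel formula are conventional choices assumed here; the patent does not specify them. The log step is off by default because the claim-6 normalization supplies its own root compression and is typically applied to linear filter-bank energies; that default is also an assumption.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (assumed; the patent does not name a Mel formula).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def filterbank_energies(signal, sample_rate=16000, frame_len=400, hop=160,
                        n_fft=512, n_mels=40, pre_emphasis=0.97, take_log=False):
    """Return filter-bank energies E(i, j) with shape (n_frames, n_mels)."""
    x = np.asarray(signal, dtype=np.float64)
    # Pre-emphasis: first-order high-pass filter y[n] = x[n] - a * x[n-1].
    x = np.append(x[0], x[1:] - pre_emphasis * x[:-1])
    # Framing: pad short inputs to one frame, then slice overlapping frames.
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(frame_len)  # Hann windowing
    spectrum = np.fft.rfft(frames, n=n_fft)  # FFT of each frame (zero-padded to n_fft)
    power = np.abs(spectrum) ** 2  # squared magnitude ("quadratic energy")
    energies = power @ mel_filterbank(n_mels, n_fft, sample_rate).T  # Mel filtering
    if take_log:
        energies = np.log(energies + 1e-10)  # optional log, floored to stay finite
    return energies
```

With the assumed defaults, one second of 16 kHz audio yields 98 frames of 40 sub-band energies, which is the (frames, sub-bands) layout the smoother in claim 5 iterates over.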
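Unrolling the recursion of claim 5 shows that M(i, j) is an exponential moving average of past filter-bank energies in sub-band j, which is why s_j behaves as a per-band time constant. Assuming the recursion starts from some initial value M(0, j) (the claims do not say how it is initialized):

$$M(i,j) = (1 - s_j)^{i}\,M(0,j) + s_j \sum_{k=0}^{i-1} (1 - s_j)^{k}\,E(i-k,j).$$

The weight on a frame k steps in the past shrinks by a factor of (1 − s_j) per frame, so it falls to 1/e after about −1/ln(1 − s_j) frames, roughly 1/s_j when s_j is small. As an illustration only, s_j = 0.025 gives about 39.5 frames, or roughly 0.4 s with an assumed 10 ms frame shift.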
Speech to text systems (G10L15/08 takes precedence) · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title
Processing in the time domain · CPC title
Training · CPC title
using subband decomposition · CPC title