Who is the assignee on this patent?

Baidu Online Network Technology (Beijing) Co., Ltd.

What technology area does this patent fall under?

Primary CPC classification G10L15/02. Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 25, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Speech recognition method based on artificial intelligence and terminal

Patent metadata
Field	Value
Publication number	US-10573294-B2
Application number	US-201715858112-A
Country	US
Kind code	B2
Filing date	Dec 29, 2017
Priority date	Jun 5, 2017
Publication date	Feb 25, 2020
Grant date	Feb 25, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure provide a speech recognition method based on artificial intelligence, and a terminal. The method includes obtaining speech data to be recognized; performing a processing on the speech data to be recognized using a trained sub-band energy normalized acoustic model, to determine a normalized energy feature corresponding to each time-frequency unit in the speech data to be recognized; and determining text data corresponding to the speech data to be recognized according to the normalized energy feature corresponding to each time-frequency unit.

First claim

Opening claim text (preview).

What is claimed is:

1. A speech recognition method based on artificial intelligence, comprising: obtaining speech data to be recognized; performing a processing on the speech data to be recognized using a trained sub-band energy normalized acoustic model, to determine a normalized energy feature corresponding to each time-frequency unit in the speech data to be recognized; and determining text data corresponding to the speech data to be recognized according to the normalized energy feature corresponding to each time-frequency unit;
wherein before performing the processing on the speech data to be recognized using the trained sub-band energy normalized acoustic model, the method further comprises: performing a pre-processing on the speech data to be recognized, to determine an energy value of a filter bank corresponding to each time-frequency unit in the speech data to be recognized,
wherein performing the processing on the speech data to be recognized using the trained sub-band energy normalized acoustic model comprises: performing the processing on the energy value of the filter bank corresponding to each time-frequency unit in the speech data to be recognized using the trained sub-band energy normalized acoustic model,
wherein determining the normalized energy feature corresponding to each time-frequency unit in the speech data to be recognized comprises: determining an energy value $E(i, j)$ of a filter bank corresponding to a jth time-frequency unit in an ith frame of speech data and a smoothing parameter $s_j$ corresponding to each sub-band; obtaining a smoothed energy value $M(i-1, j)$ of a filter bank corresponding to a jth time-frequency unit in an (i−1)th frame of speech data; determining a smoothed energy value $M(i, j)$ of the filter bank corresponding to the jth time-frequency unit in the ith frame of speech data according to the energy value $E(i, j)$, the smoothing parameter $s_j$ and the smoothed energy value $M(i-1, j)$; and determining the normalized energy feature corresponding to the jth time-frequency unit in the ith frame of speech data according to the energy value $E(i, j)$ and the smoothed energy value $M(i, j)$.

2. The method according to claim 1, before performing the processing on the speech data to be recognized using a preset sub-band energy normalized acoustic model, further comprising: obtaining training speech data; training an initial sub-band energy normalized acoustic model using the training speech data, to determine the trained sub-band energy normalized acoustic model and a smoothing parameter corresponding to each sub-band.

3. The method according to claim 2, before training the initial sub-band energy normalized acoustic model using the training speech data, further comprising: determining an initial smoothing parameter corresponding to each sub-band according to a preset rule.

4. The method according to claim 1, wherein the pre-processing comprises at least one of a pre-emphasis processing, a framing, a Han windowing, a quick Fourier transform processing, a quadratic energy processing, a Mel filtering, a processing of taking the logarithm.

5. The method according to claim 1, wherein obtaining the smoothed energy value $M(i, j)$ of the filter bank corresponding to the jth time-frequency unit in the ith frame of speech data according to a formula:

$$M(i, j) = (1 - s_j)\,M(i-1, j) + s_j\,E(i, j).$$

6. The method according to claim 1, wherein determining the normalized energy feature corresponding to the jth time-frequency unit in the ith frame of speech data according to a formula:

$$\mathrm{PCEN}(i, j) = \left( \frac{E(i, j)}{(\epsilon + M(i, j))^{\alpha}} + \sigma \right)^{\gamma} - \sigma^{\gamma},$$

where $\epsilon$ is a preset minimum value, $\frac{E(i, j)}{(\epsilon + M(i, j))^{\alpha}}$ represents a feed forward automatic gain control, of which a strength is controlled by $\alpha$, and $\sigma$ and $\gamma$ are square root compression parameters.

7. The method according to claim 1, wherein determining text data corresponding to the speech data to be recognized comprises: inputting the normalized energy feature corresponding to each time-frequency unit into a neural network model; and determining the text data corresponding to the speech data to be recognized by the neural network model.

8. The method according to claim 7, further comprising: generating the neural network model.

9. The method according to claim 8, wherein generating the neural network model comprises: obtaining a large amount of training speech data and corresponding training text data; determining normalized energy feature corresponding to each time-frequency unit in the large amount of speech data; perform
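For readers who find the formulas in claims 5 and 6 easier to follow as code, here is a minimal sketch of the smoothing and normalization steps in plain Python. It is an illustration only, not the patentee's implementation. The function name `pcen`, the default values for `alpha`, `sigma`, `gamma` and `eps`, and the choice to seed the first frame with M(0, j) = E(0, j) are assumptions; the claim text shown here does not specify them. The sketch also assumes the filter bank energies are non-negative.

```python
from typing import List, Sequence


def pcen(
    energies: Sequence[Sequence[float]],
    smoothing: Sequence[float],
    alpha: float = 0.98,   # assumed default; strength of the gain control
    sigma: float = 2.0,    # assumed default; compression offset
    gamma: float = 0.5,    # assumed default; compression exponent
    eps: float = 1e-6,     # assumed default; the "preset minimum value"
) -> List[List[float]]:
    """Normalize filter bank energies frame by frame.

    energies[i][j] plays the role of E(i, j) and smoothing[j] the role
    of the per-sub-band smoothing parameter s_j.
    """
    normalized: List[List[float]] = []
    prev_smoothed: List[float] = []
    for i, frame in enumerate(energies):
        if len(frame) != len(smoothing):
            raise ValueError("each frame needs one energy value per sub-band")
        if i == 0:
            # Assumption: M(-1, j) is not defined in the claims shown,
            # so the first frame is seeded with its own energy.
            smoothed = list(frame)
        else:
            # Claim 5: M(i, j) = (1 - s_j) * M(i-1, j) + s_j * E(i, j)
            smoothed = [
                (1.0 - s) * m_prev + s * e
                for s, m_prev, e in zip(smoothing, prev_smoothed, frame)
            ]
        # Claim 6: (E / (eps + M)^alpha + sigma)^gamma - sigma^gamma
        normalized.append([
            (e / (eps + m) ** alpha + sigma) ** gamma - sigma ** gamma
            for e, m in zip(frame, smoothed)
        ])
        prev_smoothed = smoothed
    return normalized


if __name__ == "__main__":
    # Two frames, two sub-bands, with hypothetical energies and s_j values.
    features = pcen([[1.0, 4.0], [2.0, 2.0]], smoothing=[0.1, 0.5])
    for row in features:
        print(row)
```

Per claim 2, the smoothing parameters are learned together with the acoustic model rather than fixed by hand, so the constant `smoothing` list above stands in for values that training would supply.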

Classifications

G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title
G10L15/02 (Primary)
Feature extraction for speech recognition; Selection of recognition unit · CPC title
G10L21/0224
Processing in the time domain · CPC title
G10L15/063
Training · CPC title
G10L19/0204
using subband decomposition · CPC title

Patent family

Related publications grouped by family.

View patent family 60254470
