Who is the assignee on this patent?

Baidu Online Network Technology (Beijing) Co., Ltd.

What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 23, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and device for processing speech based on artificial intelligence

US10360899B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10360899-B2
Application number	US-201715714820-A
Country	US
Kind code	B2
Filing date	Sep 25, 2017
Priority date	Mar 24, 2017
Publication date	Jul 23, 2019
Grant date	Jul 23, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a method and a device for processing a speech based on artificial intelligence. The method includes: receiving a speech processing request, in which the speech processing request includes a first digital speech signal and a first sample frequency corresponding to the first digital speech signal; selecting a target speech processing model from a pre-trained speech processing model base according to the first sample frequency; performing up-sampling processing on the first digital speech signal using the target speech processing model to generate a second digital speech signal having a second sample frequency, in which the second sample frequency is larger than the first sample frequency.
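
The flow in the abstract (receive a signal with its sample frequency, select a model by that frequency, up-sample) can be sketched roughly as follows. This is only an illustration under stated assumptions: `MODEL_BASE`, `process_request`, and `linear_doubler` are hypothetical names, the rate table is invented, and plain linear interpolation stands in for the pre-trained model, which the patent describes as a deep neural network.

```python
from typing import Callable, Dict, List, Tuple

Signal = List[float]
Model = Callable[[Signal], Signal]


def linear_doubler(signal: Signal) -> Signal:
    """Stand-in for a trained model: doubles the rate by linear interpolation."""
    out: Signal = []
    for i, x in enumerate(signal):
        nxt = signal[i + 1] if i + 1 < len(signal) else x  # hold the last point
        out.extend((x, (x + nxt) / 2.0))
    return out


# Hypothetical model base: first sample frequency (Hz) -> (second frequency, model).
MODEL_BASE: Dict[int, Tuple[int, Model]] = {
    8000: (16000, linear_doubler),
    16000: (32000, linear_doubler),
}


def process_request(signal: Signal, first_rate: int) -> Tuple[Signal, int]:
    """Select a model by the first sample frequency, then up-sample the signal."""
    if first_rate not in MODEL_BASE:
        raise KeyError(f"no model registered for {first_rate} Hz")
    second_rate, model = MODEL_BASE[first_rate]
    return model(signal), second_rate
```

Calling `process_request([0.0, 1.0, 0.0], 8000)` in this sketch returns six samples labeled 16000 Hz. Keying the table on the input rate mirrors the selection step in claim 1; claim 2's variant would key on the pair of first and second frequencies instead.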

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing a speech based on artificial intelligence, comprising: receiving a speech processing request, wherein the speech processing request comprises a first digital speech signal and a first sample frequency corresponding to the first digital speech signal; selecting a target speech processing model from a pre-trained speech processing model base according to the first sample frequency; and performing up-sampling processing on the first digital speech signal using the target speech processing model to generate a second digital speech signal having a second sample frequency, wherein the second sample frequency is larger than the first sample frequency; before selecting a target speech processing model, further comprising: obtaining a training data sequence, wherein the training data sequence comprises a plurality of digital speech data sample pairs, each of the plurality of digital speech data sample pairs comprises a sample having the first sample frequency and a sample having the second sample frequency corresponding to a same speech; and training a preset deep neural network model using the training data sequence to generate the target speech processing model.

2. The method according to claim 1, wherein the speech processing request further comprises the second sample frequency, and selecting a target speech processing model comprises: selecting the target speech processing model according to the first sample frequency and the second sample frequency.

3. The method according to claim 1, wherein training a preset deep neural network model using the training data sequence to generate the target speech processing model comprises: selecting a first digital speech data sample pair from the training data sequence according to a preset rule; performing pre-processing on the first digital speech data sample pair, and obtaining information of sampling points having the first sample frequency and information of sampling points having the second sample frequency; inputting the information of the sampling points having the first sample frequency to the preset deep neural network model to generate prediction information of the sampling points having the second sample frequency; determining a correction coefficient according to a difference between the information of the sampling points having the second sample frequency and the prediction information of the sampling points having the second sample frequency; performing a correction on the preset deep neural network model according to the correction coefficient to generate a first speech processing model; and selecting a second digital speech data sample pair from the training data sequence according to the preset rule, performing a correction on the first speech processing model using the second digital speech data sample pair, and repeating above steps until the target speech processing model is determined.

4. The method according to claim 1, before obtaining a training data sequence, further comprising: performing sampling processing on a plurality of pieces of speech data in a speech data base respectively with the second sample frequency to obtain a sample sequence having the second sample frequency; and extracting a sample sequence having the first sample frequency from the sample sequence having the second sample frequency.

5. The method according to claim 3, before performing up-sampling processing on the first digital speech signal using the target speech processing model, further comprising: pre-processing the first digital speech signal, to obtain the information of the sampling points having the first sample frequency.

6. A method for processing a speech based on artificial intelligence, comprising: receiving a speech processing request, wherein the speech processing request comprises a first digital speech signal and a first sample frequency corresponding to the first digital speech signal; selecting a target speech processing model from a pre-trained speech processing model base according to the first sample frequency; and performing up-sampling processing on the first digital speech signal using the target speech processing model to generate a second digital speech signal having a second sample frequency, wherein the second sample frequency is larger than the first sample frequency; wherein performing up-sampling processing on the first digital speech signal using the target speech processing model to generate a second digital speech signal having a second sample frequency comprises: performing the up-sampling processing on the first digital speech signal to generate information of sampling points of the second digital speech signal having the second sample frequency; and generating the second digital speech signal having the second sample frequency according to the information of sampling points of the second digital speech signal having the second sample frequency.

7. The method according to claim 5, wherein performing up-sampling processing on the first digital speech signal using the target speech processing model to generate a second digital speech signal having a second sample frequency comprises: processing the first digital speech signal by interpolating and training using the target speech processing model, to generate information of sampling points to be interpolated; and forming the second digital speech signal having the second sample frequency according to the information of the sampling points having the first sample frequency and the information of sampling points to be interpolated.

8. The method according to claim 3, wherein the information of a sampling point is a wave amplitude of a speech signal corresponding to the sampling point.

9. A device for processing a speech based on artificial intelligence, comprising: a memory having computer programs executable by the processor; and a processor; wherein the processor is configured to: receive a speech processing request, wherein the speech processing request comprises a first digital speech signal and a first sample frequency corresponding to the first digital speech signal; obtain a training data sequence, wherein the training data sequence comprises a plurality of digital speech data sample pairs, each of the plurality of digital speech data sample pairs comprises a sample having the first sample frequency and a sample having the second sample frequency corresponding to a same speech; train a preset deep neural network model using the training data sequence to generate the target speech processing model; select a target speech processing model from a pre-trained speech processing model base according to the first sample frequency; and perform up-sampling processing on the first digital speech signal using the target speech processing model to generate a second digital speech signal having a second sample frequency, wherein the second sample frequency is larger than the first sample frequency.

10. The device according to claim 9, wherein the speech processing request further comprises the second sample frequency, and the processor is configured to select a target speech processing model by selecting the target speech processing model according to the first sample frequency and the second sample frequency.

11. The device according to claim 9, wherein the processor is configured to train a preset deep neural network model using the training data sequence to generate the target speech processing model by: selecting a first digital speech data sample pair from the training data sequence according to a preset rule; performing pre-processing on the first digital speech data sample pair, and obtain
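
The training and interpolation steps in claims 3, 4, and 7 can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the up-sampling factor is fixed at 2, and a single trainable weight stands in for the "preset deep neural network model". The update rule (prediction error times input) is one possible reading of the "correction coefficient", not the patent's stated method.

```python
from typing import List, Tuple

Signal = List[float]


def make_training_pair(high_rate: Signal, factor: int = 2) -> Tuple[Signal, Signal]:
    """Claim 4 idea: extract the low-rate sample from the high-rate sequence."""
    return high_rate[::factor], high_rate


def neighbor_sum(low: Signal, i: int) -> float:
    """Sum of the known point and its right neighbor (last point is held)."""
    right = low[i + 1] if i + 1 < len(low) else low[i]
    return low[i] + right


def train_weight(pairs: List[Tuple[Signal, Signal]], weight: float = 0.0,
                 learning_rate: float = 0.1, epochs: int = 50) -> float:
    """Claim 3 idea: predict, compare with the true points, correct, repeat."""
    for _ in range(epochs):
        for low, high in pairs:
            targets = high[1::2]  # points missing from the low-rate sample
            for i, target in enumerate(targets):
                s = neighbor_sum(low, i)
                error = target - weight * s          # difference from the prediction
                weight += learning_rate * error * s  # correction step (assumed form)
    return weight


def upsample(low: Signal, weight: float) -> Signal:
    """Claim 7 idea: interleave known points with predicted in-between points."""
    out: Signal = []
    for i, known in enumerate(low):
        out.extend((known, weight * neighbor_sum(low, i)))
    return out
```

On a linear ramp the learned weight approaches 0.5, which reduces this toy model to midpoint interpolation; a real network would take a wider context of sampling points and learn a non-linear mapping.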

Assignees

Baidu Online Network Technology (Beijing) Co., Ltd.

Inventors

Classifications

G06N3/047 · Probabilistic or stochastic networks
G06N3/045 · Combinations of networks
G06N3/08 (Primary) · Learning methods
G10L25/30 (Primary) · using neural networks
G10L15/16 · using artificial neural networks

Patent family

Related publications grouped by family.

View patent family 59431552

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10360899B2 cover?: The present disclosure provides a method and a device for processing a speech based on artificial intelligence. The method includes: receiving a speech processing request, in which the speech processing request includes a first digital speech signal and a first sample frequency corresponding to the first digital speech signal; selecting a target speech processing model from a pre-trained speech…

Related patents

Restructuring deep neural network acoustic models

Building conversational understanding systems using a toolset

Estimating a pitch lag

System and method of using neural transforms of robust audio features for speech processing
