Method and device for processing speech based on artificial intelligence

US10360899B2 · US · B2

Patent metadata
Publication number: US-10360899-B2
Application number: US-201715714820-A
Country: US
Kind code: B2
Filing date: Sep 25, 2017
Priority date: Mar 24, 2017
Publication date: Jul 23, 2019
Grant date: Jul 23, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a method and a device for processing a speech based on artificial intelligence. The method includes: receiving a speech processing request, in which the speech processing request includes a first digital speech signal and a first sample frequency corresponding to the first digital speech signal; selecting a target speech processing model from a pre-trained speech processing model base according to the first sample frequency; performing up-sampling processing on the first digital speech signal using the target speech processing model to generate a second digital speech signal having a second sample frequency, in which the second sample frequency is larger than the first sample frequency.
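The request flow in the abstract (receive a signal with its sample frequency, pick a model by that frequency, up-sample) can be sketched as a lookup followed by a call. This is a minimal illustration, not the patented implementation: `MODEL_BASE`, `process_request`, and `linear_2x` are hypothetical names, the 8000 Hz and 16000 Hz values are example figures, and plain linear interpolation stands in for a trained model.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: a "model" maps low-rate samples to high-rate samples.
Model = Callable[[List[float]], List[float]]


def linear_2x(samples: List[float]) -> List[float]:
    """Stand-in for a trained model: doubles the rate by linear interpolation."""
    out: List[float] = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)                # original sampling point
        out.append((s + nxt) / 2.0)  # interpolated sampling point
    return out


# Hypothetical model base keyed by (first, second) sample frequency in Hz.
MODEL_BASE: Dict[Tuple[int, int], Model] = {(8000, 16000): linear_2x}


def process_request(
    signal: List[float], first_hz: int, second_hz: Optional[int] = None
) -> Tuple[List[float], int]:
    """Select a model by sample frequency and up-sample the signal.

    If second_hz is omitted, the lowest available target rate is used
    (an assumption made for this sketch only).
    """
    candidates = [
        key
        for key in MODEL_BASE
        if key[0] == first_hz and (second_hz is None or key[1] == second_hz)
    ]
    if not candidates:
        raise LookupError(f"no model for {first_hz} Hz -> {second_hz} Hz")
    key = min(candidates)
    return MODEL_BASE[key](signal), key[1]


upsampled, out_hz = process_request([0.0, 1.0, 0.0], 8000)
print(out_hz, upsampled)
```

Keying the table by both frequencies lets the same lookup serve requests that also state the desired output rate, as the dependent claims describe.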

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing a speech based on artificial intelligence, comprising: receiving a speech processing request, wherein the speech processing request comprises a first digital speech signal and a first sample frequency corresponding to the first digital speech signal; selecting a target speech processing model from a pre-trained speech processing model base according to the first sample frequency; and performing up-sampling processing on the first digital speech signal using the target speech processing model to generate a second digital speech signal having a second sample frequency, wherein the second sample frequency is larger than the first sample frequency; before selecting a target speech processing model, further comprising: obtaining a training data sequence, wherein the training data sequence comprises a plurality of digital speech data sample pairs, each of the plurality of digital speech data sample pairs comprises a sample having the first sample frequency and a sample having the second sample frequency corresponding to a same speech; and training a preset deep neural network model using the training data sequence to generate the target speech processing model.

2. The method according to claim 1, wherein the speech processing request further comprises the second sample frequency, and selecting a target speech processing model comprises: selecting the target speech processing model according to the first sample frequency and the second sample frequency.

3. The method according to claim 1, wherein training a preset deep neural network model using the training data sequence to generate the target speech processing model comprises: selecting a first digital speech data sample pair from the training data sequence according to a preset rule; performing pre-processing on the first digital speech data sample pair, and obtaining information of sampling points having the first sample frequency and information of sampling points having the second sample frequency; inputting the information of the sampling points having the first sample frequency to the preset deep neural network model to generate prediction information of the sampling points having the second sample frequency; determining a correction coefficient according to a difference between the information of the sampling points having the second sample frequency and the prediction information of the sampling points having the second sample frequency; performing a correction on the preset deep neural network model according to the correction coefficient to generate a first speech processing model; and selecting a second digital speech data sample pair from the training data sequence according to the preset rule, performing a correction on the first speech processing model using the second digital speech data sample pair, and repeating above steps until the target speech processing model is determined.

4. The method according to claim 1, before obtaining a training data sequence, further comprising: performing sampling processing on a plurality of pieces of speech data in a speech data base respectively with the second sample frequency to obtain a sample sequence having the second sample frequency; and extracting a sample sequence having the first sample frequency from the sample sequence having the second sample frequency.

5. The method according to claim 3, before performing up-sampling processing on the first digital speech signal using the target speech processing model, further comprising: pre-processing the first digital speech signal, to obtain the information of the sampling points having the first sample frequency.

6. A method for processing a speech based on artificial intelligence, comprising: receiving a speech processing request, wherein the speech processing request comprises a first digital speech signal and a first sample frequency corresponding to the first digital speech signal; selecting a target speech processing model from a pre-trained speech processing model base according to the first sample frequency; and performing up-sampling processing on the first digital speech signal using the target speech processing model to generate a second digital speech signal having a second sample frequency, wherein the second sample frequency is larger than the first sample frequency; wherein performing up-sampling processing on the first digital speech signal using the target speech processing model to generate a second digital speech signal having a second sample frequency comprises: performing the up-sampling processing on the first digital speech signal to generate information of sampling points of the second digital speech signal having the second sample frequency; and generating the second digital speech signal having the second sample frequency according to the information of sampling points of the second digital speech signal having the second sample frequency.

7. The method according to claim 5, wherein performing up-sampling processing on the first digital speech signal using the target speech processing model to generate a second digital speech signal having a second sample frequency comprises: processing the first digital speech signal by interpolating and training using the target speech processing model, to generate information of sampling points to be interpolated; and forming the second digital speech signal having the second sample frequency according to the information of the sampling points having the first sample frequency and the information of sampling points to be interpolated.

8. The method according to claim 3, wherein the information of a sampling point is a wave amplitude of a speech signal corresponding to the sampling point.

9. A device for processing a speech based on artificial intelligence, comprising: a memory having computer programs executable by the processor; and a processor; wherein the processor is configured to: receive a speech processing request, wherein the speech processing request comprises a first digital speech signal and a first sample frequency corresponding to the first digital speech signal; obtain a training data sequence, wherein the training data sequence comprises a plurality of digital speech data sample pairs, each of the plurality of digital speech data sample pairs comprises a sample having the first sample frequency and a sample having the second sample frequency corresponding to a same speech; train a preset deep neural network model using the training data sequence to generate the target speech processing model; select a target speech processing model from a pre-trained speech processing model base according to the first sample frequency; and perform up-sampling processing on the first digital speech signal using the target speech processing model to generate a second digital speech signal having a second sample frequency, wherein the second sample frequency is larger than the first sample frequency.

10. The device according to claim 9, wherein the speech processing request further comprises the second sample frequency, and the processor is configured to select a target speech processing model by selecting the target speech processing model according to the first sample frequency and the second sample frequency.

11. The device according to claim 9, wherein the processor is configured to train a preset deep neural network model using the training data sequence to generate the target speech processing model by: selecting a first digital speech data sample pair from the training data sequence according to a preset rule; performing pre-processing on the first digital speech data sample pair, and obtain
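The training and interpolation steps recited in claims 3, 4, and 7 can be illustrated with a toy example. This is a sketch under loud assumptions, not the claimed method: a two-weight linear predictor stands in for the "preset deep neural network model", the "correction" is an ordinary gradient step on the prediction error, the rate ratio is fixed at 2, and every function name here is hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[List[float], List[float]]  # (low-rate sample, high-rate sample)


def make_pair(high: Sequence[float]) -> Pair:
    """Claim 4 idea: extract the low-rate sequence from the high-rate one.

    Assumption: the rate ratio is 2, so every second point is kept.
    """
    high = list(high)
    return high[::2], high


def predict_midpoints(w: Sequence[float], low: Sequence[float]) -> List[float]:
    """Predict the amplitude between each pair of neighbouring low-rate points."""
    return [w[0] * a + w[1] * b for a, b in zip(low, low[1:])]


def train(pairs: Sequence[Pair], lr: float = 0.05, epochs: int = 1000) -> List[float]:
    """Claim 3 idea: predict, compare with the high-rate sample, correct, repeat.

    The "preset rule" for choosing pairs is simply list order here.
    """
    w = [0.0, 0.0]
    for _ in range(epochs):
        for low, high in pairs:
            targets = high[1::2]  # the points removed by make_pair
            preds = predict_midpoints(w, low)
            n = min(len(preds), len(targets))
            if n == 0:
                continue
            # Mean gradient of the squared prediction error for this pair.
            g0 = sum((preds[i] - targets[i]) * low[i] for i in range(n)) / n
            g1 = sum((preds[i] - targets[i]) * low[i + 1] for i in range(n)) / n
            w = [w[0] - lr * g0, w[1] - lr * g1]
    return w


def upsample(w: Sequence[float], low: Sequence[float]) -> List[float]:
    """Claim 7 idea: interleave original points with predicted points."""
    if not low:
        return []
    out: List[float] = []
    for x, m in zip(low, predict_midpoints(w, low)):
        out.extend([x, m])
    out.append(low[-1])
    return out


pairs = [
    make_pair([0.0, 1.0, 2.0, 3.0, 4.0]),
    make_pair([4.0, 3.0, 2.0, 1.0, 0.0]),
]
weights = train(pairs)
restored = upsample(weights, [0.0, 2.0, 4.0])
print(weights, restored)
```

On these linear ramps the only weights that fit every pair are 0.5 each, so the toy predictor learns to average neighbouring points. A real model would take a wider context window and learn a nonlinear mapping.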

Assignees

Baidu Online Network Technology (Beijing) Co., Ltd.

Inventors

Classifications

  • Probabilistic or stochastic networks · CPC title

  • Combinations of networks · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • G10L25/30 · Primary

    using neural networks · CPC title

  • using artificial neural networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10360899B2 cover?
The present disclosure provides a method and a device for processing a speech based on artificial intelligence. The method includes: receiving a speech processing request, in which the speech processing request includes a first digital speech signal and a first sample frequency corresponding to the first digital speech signal; selecting a target speech processing model from a pre-trained speech…
Who is the assignee on this patent?
Baidu Online Network Technology (Beijing) Co., Ltd.
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 23, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).