Who is the assignee on this patent?

Baidu Online Network Technology Beijing Co Ltd, Shanghai Xiaodu Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G10L15/22. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 14, 2021 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Voice processing method, apparatus and device

US11200899B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11200899-B2
Application number	US-201916674176-A
Country	US
Kind code	B2
Filing date	Nov 5, 2019
Priority date	Jan 28, 2019
Publication date	Dec 14, 2021
Grant date	Dec 14, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure provides a voice processing method, apparatus and device, where the method includes: acquiring, by a terminal device, first voice information; and acquiring, by the terminal device, response information corresponding to the first voice information, and performing an operation corresponding to the response information according to a type of the response information, where the type of the response information is at least one of a voice type, a text type, an image type, a video type, and a program operation type, which improves the flexibility of the voice processing.
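
The type-based handling described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `ResponseType`, `Response`, and `perform_operation` are hypothetical, the handlers only return a description of the action instead of driving real audio or display hardware, and the play/display mapping follows claim 8 as quoted on this page.

```python
from dataclasses import dataclass
from enum import Enum


class ResponseType(Enum):
    """The five response types named in the abstract (enum is illustrative)."""
    VOICE = "voice"
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    PROGRAM_OPERATION = "program_operation"


@dataclass
class Response:
    """Hypothetical container for response information."""
    type: ResponseType
    payload: str


def perform_operation(response: Response) -> str:
    """Choose an action from the response type and describe it.

    A real terminal device would call its audio, display, or application
    APIs here; this sketch returns a string so the dispatch is easy to see.
    """
    if response.type is ResponseType.VOICE:
        return f"play audio: {response.payload}"
    if response.type in (ResponseType.TEXT, ResponseType.IMAGE):
        return f"display: {response.payload}"
    if response.type is ResponseType.VIDEO:
        return f"play video: {response.payload}"
    if response.type is ResponseType.PROGRAM_OPERATION:
        return f"run program operation: {response.payload}"
    raise ValueError(f"unsupported response type: {response.type}")
```

For example, `perform_operation(Response(ResponseType.TEXT, "Sunny, 25 degrees"))` takes the display branch, while a `PROGRAM_OPERATION` response would be routed to whatever application action the payload names.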

First claim

Opening claim text (preview).

What is claimed is:

1. A voice processing method, comprising: acquiring, by a terminal device, first voice information; and acquiring, by the terminal device, response information corresponding to the first voice information, and performing an operation corresponding to the response information according to a type of the response information, wherein the type of the response information is at least one of a voice type, a text type, an image type, a video type, and a program operation type; wherein the acquiring, by a terminal device, first voice information, comprises: acquiring, by the terminal device, the first voice information through a first execution object; and the acquiring, by the terminal device, response information corresponding to the first voice information, and performing an operation corresponding to the response information according to a type of the response information, comprises: acquiring, by the terminal device, the response information corresponding to the first voice information through a second execution object, and performing the operation corresponding to the response information according to the type of the response information through the second execution object, wherein the first execution object and the second execution object execute in parallel.

2. The method according to claim 1, wherein the first execution object is a first thread, and the second execution object is a second thread; or the first execution object is a first hardware processing component, and the second execution object is a second hardware processing component.

3. The method according to claim 1, wherein the acquiring, by the terminal device, the response information corresponding to the first voice information through a second execution object, comprises: transmitting, by the terminal device, request information to a server, wherein the request information comprises voice representation information, and the voice representation information is determined according to the first voice information; and receiving, by the terminal device, the response information transmitted by the server.

4. The method according to claim 3, wherein the voice representation information comprises the first voice information; and correspondingly, the response information is determined by the server according to the first voice information.

5. The method according to claim 3, before the transmitting, by the terminal device, request information to a server, further comprising: acquiring, by the terminal device, text information and an audio characteristic of the first voice information; correspondingly, the voice representation information comprises the text information and the audio characteristic; the response information is determined by the server according to the text information and the audio characteristic; wherein the audio characteristic comprises at least one of a voiceprint, a volume, a length, a sound wave amplitude, and a sound wave frequency of the first voice information.

6. The method according to claim 1, wherein the acquiring, by the terminal device, the response information corresponding to the first voice information through a second execution object, comprises: determining, by the terminal device, whether the first voice information is a real user statement through the second execution object; and when the terminal device determines that the first voice information is a real user statement, generating, by the terminal device, the response information through the second execution object.

7. The method according to claim 6, wherein the determining, by the terminal device, whether the first voice information is a real user statement through the second execution object, comprises: acquiring, by the terminal device, characteristic information of the first voice information through the second execution object, wherein the characteristic information comprises at least one of the following: an audio characteristic of the first voice information, a text characteristic of the first voice information, text information of the first voice information, context information of the first voice information, and an interaction behavior characteristic of a user, wherein the interaction behavior characteristic is configured to indicate a behavior characteristic of the user inputting voice information in the terminal device; acquiring, by the terminal device, a probability that the first voice information is a real user statement according to the characteristic information of the first voice information; and determining, by the terminal device, whether the first voice information is a real user statement according to the probability.

8. The method according to claim 1, wherein the performing, by the terminal device, the operation corresponding to the response information according to the type of the response information through the second execution object, comprises: when the type of the response information is a voice type, playing, by the terminal device, the response information of the voice type; when the type of the response information is a text type or an image type, displaying, by the terminal device, the response information of the text type or the image type; when the type of the response information is a video type, playing, by the terminal device, the response information of the video type; and when the type of the response information is a program operation type, performing, by the terminal device, a program operation corresponding to the response information.

9. A voice processing apparatus, comprising: at least one processor and a memory; wherein the memory stores a computer executed instruction; and the at least one processor, when executing the computer executed instruction stored in the memory, is configured to: acquire first voice information; acquire response information corresponding to the first voice information; and perform an operation corresponding to the response information according to a type of the response information, wherein the type of the response information is at least one of a voice type, a text type, an image type, a video type, and a program operation type; wherein the at least one processor is further configured to: acquire the first voice information through a first execution object; acquire the response information corresponding to the first voice information through a second execution object, wherein the first execution object and the second execution object execute in parallel; and perform the operation corresponding to the response information according to the type of the response information through the second execution object.

10. The apparatus according to claim 9, wherein the first execution object is a first thread, and the second execution object is a second thread; or the first execution object is a first hardware processing component, and the second execution object is a second hardware processing component.

11. The apparatus according to claim 9, wherein the at least one processor is further configured to: transmit request information to a server, wherein the request information comprises voice representation information, and the voice representation information is determined according to the first voice information; and receive the response information transmitted by the server.

12. The apparatus according to claim 11, wherein the voice representation information comprises the first voice information; and correspondingly, the response information is determined by the server according to the first voice information.

13. The apparatus according to claim 11, wherein the at least on
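
Claims 1 and 2 describe a first execution object that acquires voice information and a second execution object that obtains and acts on the response, with the two running in parallel. The following Python sketch shows one way that arrangement could look when both execution objects are threads. It is an illustration under stated assumptions, not the patented implementation: `acquire_voice`, `respond`, `run_pipeline`, and the `STOP` sentinel are hypothetical names, the "voice information" is plain strings, and the response is a placeholder string in place of a server call or local generation.

```python
import queue
import threading

# Hypothetical sentinel telling the second thread that no more input is coming.
STOP = object()


def acquire_voice(utterances, voice_queue):
    """First execution object: hands each piece of voice information over
    as soon as it is captured, without waiting for earlier responses."""
    for utterance in utterances:
        voice_queue.put(utterance)
    voice_queue.put(STOP)


def respond(voice_queue, results):
    """Second execution object: takes voice information from the queue,
    obtains a response, and performs the corresponding operation.
    Here the "operation" is just recording a placeholder response."""
    while True:
        item = voice_queue.get()
        if item is STOP:
            break
        results.append(f"response to: {item}")


def run_pipeline(utterances):
    """Run both execution objects as threads and collect the responses."""
    voice_queue = queue.Queue()
    results = []
    first = threading.Thread(target=acquire_voice, args=(utterances, voice_queue))
    second = threading.Thread(target=respond, args=(voice_queue, results))
    second.start()
    first.start()
    first.join()
    second.join()
    return results
```

Because the queue decouples the two threads, acquisition can continue while an earlier utterance is still being answered. With a single consumer and a FIFO queue, responses come out in the order the utterances were captured.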

Classifications

G10L2015/223
Execution procedure of a spoken command · CPC title
G10L15/22 (Primary)
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G10L15/30
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
G06F3/167
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title

Patent family

Related publications grouped by family.

View patent family 66502775
