Control method and control apparatus for speech interaction

US11615784B2 · US · B2

Patent metadata
Publication number: US-11615784-B2
Application number: US-202017118869-A
Country: US
Kind code: B2
Filing date: Dec 11, 2020
Priority date: Jun 30, 2020
Publication date: Mar 28, 2023
Grant date: Mar 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result.

First claim

Opening claim text (preview).

What is claimed is:

1. A control method for speech interaction, comprising: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result, wherein the wake-up word result comprises a first confidence and a second confidence, the first confidence is configured to represent a reliability that the audio signal comprises a target wake-up word, the second confidence is configured to represent a reliability that the audio signal comprises an ordinary wake-up word, the number of syllables of the target wake-up word is no more than that of the ordinary wake-up word, and playing the prompt tone and/or executing the speech instruction in the audio signal based on the wake-up word result comprises: executing the speech instruction in a case that the first confidence reaches a first confidence threshold; playing the prompt tone in a case that the second confidence reaches a second confidence threshold and the first confidence fails to reach the first confidence threshold.

2. The method of claim 1, wherein before or when executing the speech instruction in the audio signal based on the wake-up word result, the method further comprises: withholding from playing the prompt tone.

3. The method of claim 1, wherein the ordinary wake-up word comprises at least one target wake-up word; and detecting the wake-up word in the audio signal comprises: performing a primary detection on the target wake-up word in the audio signal by employing a wake-up word detection model to obtain a first detection result; performing a secondary detection on the target wake-up word within a set period after the primary detection to obtain a second detection result; and determining the first confidence and the second confidence based on the first detection result and the second detection result.

4. The method of claim 1, wherein the speech instruction is obtained by detecting a part subsequent to the wake-up word in the audio signal.

5. The method of claim 1, wherein the method is executed by a speech interaction terminal; and executing the speech instruction in the case that the first confidence reaches the first confidence threshold comprises: sending the audio signal comprising the target wake-up word and the speech instruction subsequent to the target wake-up word to a server in the case that the first confidence reaches the first confidence threshold, such that the server detects the wake-up word at a front part of the audio signal and the speech instruction subsequent to the wake-up word; and obtaining the speech instruction from the server and executing the speech instruction.

6. The method of claim 1, wherein the target wake-up word is a word with less than four syllables; and the ordinary wake-up word is a word with four or more syllables.

7. The method of claim 1, wherein the number of syllables of the target wake-up word is same as that of the ordinary wake-up word.

8. A method for controlling a speech interaction, comprising: obtaining an audio signal; detecting a wake-up word at a front part of the audio signal and detecting a speech instruction subsequent to the wake-up word, to obtain a wake-up word result and a speech instruction result; and controlling a speech interaction terminal to play a prompt tone and/or to execute the speech instruction based on at least one of the wake-up word result and the speech instruction result, wherein the wake-up word result comprises a third confidence and a fourth confidence, the third confidence is configured to represent a reliability that the front part of the audio signal comprises a target wake-up word, the fourth confidence is configured to represent a reliability that the front part of the audio signal comprises an ordinary wake-up word, the number of syllables of the target wake-up word is no more than that of the ordinary wake-up word, and controlling the speech interaction terminal to play the prompt tone and/or to execute the speech instruction based on the at least one of the wake-up word result and the speech instruction result comprises: controlling the speech interaction terminal to execute the speech instruction based on the speech instruction result in a case that the third confidence reaches a third confidence threshold; controlling the speech interaction terminal to play the prompt tone in a case that the third confidence fails to reach the third confidence threshold; controlling the speech interaction terminal to execute the speech instruction and/or to play the prompt tone based on the speech instruction result in a case that the fourth confidence reaches a fourth confidence threshold; and controlling the speech interaction terminal to send a dummy instruction in a case that the fourth confidence fails to reach the fourth confidence threshold and the third confidence fails to reach the third confidence threshold.

9. The method of claim 8, wherein detecting the wake-up word at the front part of the audio signal and detecting the speech instruction subsequent to the wake-up word, to obtain the wake-up word result and the speech instruction result comprise: performing wake-up word detection on a front part of a recognition text of the audio signal to obtain a wake-up word detection result of the front part; determining an interaction confidence of the audio signal based on at least one of an acoustic feature representation of the audio signal and a textual feature representation associated with the recognition text of the audio signal, the interaction confidence indicating a reliability that the audio signal is taken as the speech instruction for interacting with the speech interaction terminal; determining a match condition between the recognition text and the audio signal, the match condition indicating a level that the recognition text correctly reflects information comprised in the audio signal; and obtaining the wake-up word result and the speech instruction result based on the interaction confidence, the match condition and the wake-up word detection result of the front part.

10. The method of claim 8, wherein the method is executed by a server; and obtaining the audio signal comprises: receiving the audio signal sent by the speech interaction terminal.

11. A control apparatus for speech interaction, comprising: a non-transitory computer-readable medium including computer-executable instructions stored thereon, and an instruction execution system which is configured by the instructions to implement at least one of: a collecting module, configured to collect an audio signal; a detecting module, configured to detect a wake-up word in the audio signal to obtain a wake-up word result; and an executing module, configured to play a prompt tone and/or to execute a speech instruction in the audio signal based on the wake-up word result, the wake-up word result comprises a first confidence and a second confidence, the first confidence is configured to represent a reliability that the audio signal comprises a target wake-up word, the second confidence is configured to represent a reliability that the audio signal comprises an ordinary wake-up word, the number of syllables of the target wake-up word is no more than that of the ordinary wake-up word, and the executing module comprises: an instruction executing module, configured to execute the speech instruction in a case that the first confidence reaches a first confidence threshold; and a playing module, configured to play the prompt tone in a case that the second confidence reaches the second confidence thresho
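
The terminal-side branching recited in claim 1 can be illustrated with a minimal Python sketch. This is not code from the patent: the names `WakeWordResult`, `Action`, and `decide_action`, and the default threshold value of 0.8, are hypothetical and chosen only for illustration. "Reaches a threshold" is assumed here to mean greater than or equal, and the `NONE` outcome for the case where neither threshold is reached is also an assumption, since claim 1 does not address that case.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Possible outcomes of the decision (names are illustrative)."""
    EXECUTE_INSTRUCTION = "execute_instruction"
    PLAY_PROMPT_TONE = "play_prompt_tone"
    NONE = "none"  # assumption: claim 1 does not cover this case


@dataclass(frozen=True)
class WakeWordResult:
    # Reliability that the audio contains the target wake-up word.
    first_confidence: float
    # Reliability that the audio contains the ordinary wake-up word.
    second_confidence: float


def decide_action(
    result: WakeWordResult,
    first_threshold: float = 0.8,   # hypothetical value
    second_threshold: float = 0.8,  # hypothetical value
) -> Action:
    """Choose between executing the instruction and playing the prompt tone."""
    # Target wake-up word detected: execute the speech instruction.
    # Per claim 2, no prompt tone is played on this path.
    if result.first_confidence >= first_threshold:
        return Action.EXECUTE_INSTRUCTION
    # Only the ordinary wake-up word detected: play the prompt tone.
    if result.second_confidence >= second_threshold:
        return Action.PLAY_PROMPT_TONE
    # Neither threshold reached: do nothing (assumption).
    return Action.NONE


if __name__ == "__main__":
    print(decide_action(WakeWordResult(0.9, 0.9)))
    print(decide_action(WakeWordResult(0.5, 0.9)))
    print(decide_action(WakeWordResult(0.5, 0.5)))
```

Checking the first confidence before the second mirrors the claim wording, in which the prompt tone is played only when the second confidence reaches its threshold and the first confidence does not.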

Assignees

  • Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

  • Word spotting · CPC title

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Feedback of the input speech · CPC title

  • G06F3/167 · Primary

    Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11615784B2 cover?
The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the …
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 28, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).