Conditional wake word eventing based on environment
US-11361756-B2 · Jun 14, 2022 · US
US11615784B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11615784-B2 |
| Application number | US-202017118869-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 11, 2020 |
| Priority date | Jun 30, 2020 |
| Publication date | Mar 28, 2023 |
| Grant date | Mar 28, 2023 |
The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result.
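Claim 1 makes the abstract's "playing a prompt tone and/or executing a speech instruction" step concrete. The wake-up word result carries two confidences, and the terminal branches on them. The Python sketch below shows only that branching step. The threshold values, the `Action` and `WakeWordResult` names, and the behavior when neither threshold is reached (here: ignore the audio) are illustrative assumptions, not taken from the patent. How the confidences themselves are computed (claim 3) is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    EXECUTE_INSTRUCTION = "execute_instruction"  # first branch; no prompt tone (claim 2)
    PLAY_PROMPT_TONE = "play_prompt_tone"        # second branch
    IGNORE = "ignore"  # assumption: claim 1 does not say what happens if neither threshold is reached


@dataclass(frozen=True)
class WakeWordResult:
    first_confidence: float   # reliability that the audio contains the target wake-up word
    second_confidence: float  # reliability that the audio contains the ordinary wake-up word


def decide_terminal_action(
    result: WakeWordResult,
    first_threshold: float = 0.8,   # hypothetical value
    second_threshold: float = 0.6,  # hypothetical value
) -> Action:
    """Map a wake-up word result to an action, following the branch order of claim 1."""
    if result.first_confidence >= first_threshold:
        # Target wake-up word is reliable: execute the trailing speech instruction directly.
        return Action.EXECUTE_INSTRUCTION
    if result.second_confidence >= second_threshold:
        # Only the ordinary wake-up word is reliable: play the prompt tone and wait for a command.
        return Action.PLAY_PROMPT_TONE
    return Action.IGNORE
```

Checking the target wake-up word first mirrors the claim's wording: the prompt tone is played only when the second confidence reaches its threshold *and* the first confidence does not.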
What is claimed is:

1. A control method for speech interaction, comprising: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result, wherein the wake-up word result comprises a first confidence and a second confidence, the first confidence is configured to represent a reliability that the audio signal comprises a target wake-up word, the second confidence is configured to represent a reliability that the audio signal comprises an ordinary wake-up word, the number of syllables of the target wake-up word is no more than that of the ordinary wake-up word, and playing the prompt tone and/or executing the speech instruction in the audio signal based on the wake-up word result comprises: executing the speech instruction in a case that the first confidence reaches a first confidence threshold; playing the prompt tone in a case that the second confidence reaches a second confidence threshold and the first confidence fails to reach the first confidence threshold.

2. The method of claim 1, wherein before or when executing the speech instruction in the audio signal based on the wake-up word result, the method further comprises: withholding from playing the prompt tone.

3. The method of claim 1, wherein the ordinary wake-up word comprises at least one target wake-up word; and detecting the wake-up word in the audio signal comprises: performing a primary detection on the target wake-up word in the audio signal by employing a wake-up word detection model to obtain a first detection result; performing a secondary detection on the target wake-up word within a set period after the primary detection to obtain a second detection result; and determining the first confidence and the second confidence based on the first detection result and the second detection result.

4. The method of claim 1, wherein the speech instruction is obtained by detecting a part subsequent to the wake-up word in the audio signal.

5. The method of claim 1, wherein the method is executed by a speech interaction terminal; and executing the speech instruction in the case that the first confidence reaches the first confidence threshold comprises: sending the audio signal comprising the target wake-up word and the speech instruction subsequent to the target wake-up word to a server in the case that the first confidence reaches the first confidence threshold, such that the server detects the wake-up word at a front part of the audio signal and the speech instruction subsequent to the wake-up word; and obtaining the speech instruction from the server and executing the speech instruction.

6. The method of claim 1, wherein the target wake-up word is a word with less than four syllables; and the ordinary wake-up word is a word with four or more syllables.

7. The method of claim 1, wherein the number of syllables of the target wake-up word is same as that of the ordinary wake-up word.

8. A method for controlling a speech interaction, comprising: obtaining an audio signal; detecting a wake-up word at a front part of the audio signal and detecting a speech instruction subsequent to the wake-up word, to obtain a wake-up word result and a speech instruction result; and controlling a speech interaction terminal to play a prompt tone and/or to execute the speech instruction based on at least one of the wake-up word result and the speech instruction result, wherein the wake-up word result comprises a third confidence and a fourth confidence, the third confidence is configured to represent a reliability that the front part of the audio signal comprises a target wake-up word, the fourth confidence is configured to represent a reliability that the front part of the audio signal comprises an ordinary wake-up word, the number of syllables of the target wake-up word is no more than that of the ordinary wake-up word, and controlling the speech interaction terminal to play the prompt tone and/or to execute the speech instruction based on the at least one of the wake-up word result and the speech instruction result comprises: controlling the speech interaction terminal to execute the speech instruction based on the speech instruction result in a case that the third confidence reaches a third confidence threshold; controlling the speech interaction terminal to play the prompt tone in a case that the third confidence fails to reach the third confidence threshold; controlling the speech interaction terminal to execute the speech instruction and/or to play the prompt tone based on the speech instruction result in a case that the fourth confidence reaches a fourth confidence threshold; and controlling the speech interaction terminal to send a dummy instruction in a case that the fourth confidence fails to reach the fourth confidence threshold and the third confidence fails to reach the third confidence threshold.

9. The method of claim 8, wherein detecting the wake-up word at the front part of the audio signal and detecting the speech instruction subsequent to the wake-up word, to obtain the wake-up word result and the speech instruction result comprise: performing wake-up word detection on a front part of a recognition text of the audio signal to obtain a wake-up word detection result of the front part; determining an interaction confidence of the audio signal based on at least one of an acoustic feature representation of the audio signal and a textual feature representation associated with the recognition text of the audio signal, the interaction confidence indicating a reliability that the audio signal is taken as the speech instruction for interacting with the speech interaction terminal; determining a match condition between the recognition text and the audio signal, the match condition indicating a level that the recognition text correctly reflects information comprised in the audio signal; and obtaining the wake-up word result and the speech instruction result based on the interaction confidence, the match condition and the wake-up word detection result of the front part.

10. The method of claim 8, wherein the method is executed by a server; and obtaining the audio signal comprises: receiving the audio signal sent by the speech interaction terminal.

11. A control apparatus for speech interaction, comprising: a non-transitory computer-readable medium including computer-executable instructions stored thereon, and an instruction execution system which is configured by the instructions to implement at least one of: a collecting module, configured to collect an audio signal; a detecting module, configured to detect a wake-up word in the audio signal to obtain a wake-up word result; and an executing module, configured to play a prompt tone and/or to execute a speech instruction in the audio signal based on the wake-up word result, the wake-up word result comprises a first confidence and a second confidence, the first confidence is configured to represent a reliability that the audio signal comprises a target wake-up word, the second confidence is configured to represent a reliability that the audio signal comprises an ordinary wake-up word, the number of syllables of the target wake-up word is no more than that of the ordinary wake-up word, and the executing module comprises: an instruction executing module, configured to execute the speech instruction in a case that the first confidence reaches a first confidence threshold; and a playing module, configured to play the prompt tone in a case that the second confidence reaches the second confidence thresho
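Claim 8 moves the decision to the server (claim 10), adds a speech instruction result, and introduces a dummy instruction. As written, its conditions overlap: a third-confidence miss calls for a prompt tone, while a miss on both the third and fourth confidences calls for a dummy instruction. The Python sketch below resolves this with an assumed priority order (target wake-up word first, then ordinary wake-up word, otherwise dummy). It treats the speech instruction result as the recognized instruction text, or `None` when no instruction follows the wake-up word. That ordering, the names, the threshold values, and the meaning of the dummy instruction are assumptions for illustration, not the patent's implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class ServerCommand(Enum):
    EXECUTE_INSTRUCTION = "execute_instruction"
    PLAY_PROMPT_TONE = "play_prompt_tone"
    DUMMY = "dummy"  # the claim's "dummy instruction"; assumed here to make the terminal do nothing


@dataclass(frozen=True)
class ServerWakeWordResult:
    third_confidence: float   # reliability that the front part holds the target wake-up word
    fourth_confidence: float  # reliability that the front part holds the ordinary wake-up word


def decide_server_commands(
    wake: ServerWakeWordResult,
    instruction: Optional[str],     # hypothetical: recognized instruction text, or None
    third_threshold: float = 0.8,   # hypothetical value
    fourth_threshold: float = 0.6,  # hypothetical value
) -> FrozenSet[ServerCommand]:
    """Commands sent to the terminal, checked in an assumed priority order."""
    if wake.third_confidence >= third_threshold:
        # Target wake-up word: execute the instruction that follows it.
        return frozenset({ServerCommand.EXECUTE_INSTRUCTION})
    if wake.fourth_confidence >= fourth_threshold:
        # Ordinary wake-up word (third confidence missed): prompt, and execute too if an instruction was heard.
        commands = {ServerCommand.PLAY_PROMPT_TONE}
        if instruction:
            commands.add(ServerCommand.EXECUTE_INSTRUCTION)
        return frozenset(commands)
    # Neither wake-up word is reliable: send the dummy instruction.
    return frozenset({ServerCommand.DUMMY})
```

The function returns a set rather than a single value to reflect the claim's "execute the speech instruction and/or play the prompt tone" wording in the ordinary wake-up word branch.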
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
Word spotting · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Feedback of the input speech · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title