Adaptive speech endpoint detector
US-10339918-B2 · Jul 2, 2019 · US
US11790893B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11790893-B2 |
| Application number | US-202017039169-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 30, 2020 |
| Priority date | Nov 22, 2019 |
| Publication date | Oct 17, 2023 |
| Grant date | Oct 17, 2023 |
Official abstract text for this publication.
A voice processing method is disclosed. The voice processing method applies first and second sentence vectors extracted from first and second utterances, that are included in one dialog group and are separated from each other, to a learning model and generates an output from which at least one word having an overlapping meaning is removed. The voice processing method can be associated with an artificial intelligence module, an unmanned aerial vehicle (UAV), a robot, an augmented reality (AR) device, a virtual reality (VR) device, devices related to 5G services, and the like.
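The abstract's core step removes words of the second utterance whose meaning overlaps the first utterance. That step can be illustrated with a short sketch. It is not the patented implementation. It assumes cosine similarity as the similarity measure and an illustrative threshold of 0.8, and the document specifies neither. The vectors are supplied by hand in place of the extracted sentence vectors, and a simple threshold rule stands in for the pre-trained learning model, which is not reproduced. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (an assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def concatenate(first_sentence_vec: Vector, sub_vectors: List[Vector]) -> Vector:
    """Model input: the first sentence vector followed by the second
    utterance's per-word sub-vectors (which together form its sentence vector)."""
    out = list(first_sentence_vec)
    for vec in sub_vectors:
        out.extend(vec)
    return out


def remove_overlapping_words(
    first_sentence_vec: Vector,
    second_words: List[str],
    sub_vectors: List[Vector],
    threshold: float = 0.8,  # hypothetical value; the patent gives none
) -> Tuple[List[str], List[str]]:
    """Split the second utterance's words into (kept, removed).

    A word is removed when the similarity between its sub-vector and the
    first sentence vector is equal to or greater than the threshold.
    """
    kept: List[str] = []
    removed: List[str] = []
    for word, vec in zip(second_words, sub_vectors):
        if cosine_similarity(first_sentence_vec, vec) >= threshold:
            removed.append(word)
        else:
            kept.append(word)
    return kept, removed


if __name__ == "__main__":
    first = [1.0, 0.0, 0.0]  # toy vector standing in for the first utterance
    words = ["weather", "in", "Seoul"]
    subs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(len(concatenate(first, subs)))  # 12
    print(remove_overlapping_words(first, words, subs))  # (['in', 'Seoul'], ['weather'])
```

With these toy vectors, only "weather" has a sub-vector close to the first utterance's vector (similarity of about 0.99), so it is the one word dropped. In the claimed method, a trained model would learn this decision rather than rely on a fixed rule.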
Opening claim text (preview).
What is claimed is:

1. A voice processing method for controlling an artificial intelligence device, the voice processing method comprising: in response to detecting, by a processor in the artificial intelligence device, a stop signal during a reception of a first utterance, temporarily pausing the reception of the first utterance; receiving, by the processor, a sub-utterance while the reception of the first utterance is temporarily paused; outputting, by the processor, a first result corresponding to the sub-utterance while the reception of the first utterance is temporarily paused; receiving, by the processor, a second utterance after a termination of a temporary pause state based on the stop signal; applying, by the processor, a concatenated vector concatenating first and second sentence vectors extracted from the first and second utterances to a pre-trained learning model to generate an output from which at least one word having an overlapping meaning is removed; and outputting, by the processor, a second result according to the output generated by the pre-trained learning model, the second result being different than the first result, wherein the stop signal is a voice signal corresponding to one of a hesitation word, a silent delay, or a preset temporary pause keyword or sound, wherein the artificial intelligence device is prevented from providing an answer to the first utterance while the reception of the first utterance is paused, wherein the second sentence vector is a vector concatenating a plurality of sub-vectors extracted from at least one word included in the second utterance, wherein generating the output comprises: calculating a similarity between the first sentence vector and at least one of the plurality of sub-vectors constituting the second sentence vector; and in response to determining that the first sentence vector and the at least one of the plurality of sub-vectors have the overlapping meaning based on the similarity, generating the output from which the at least one word having the overlapping meaning is removed, and wherein the at least one word having the overlapping meaning is a word corresponding to at least one of the plurality of sub-vectors having a calculated similarity that is equal to or greater than a threshold.

2. The voice processing method of claim 1, further comprising, if the reception of the first utterance is temporarily paused, waiting for an additional voice input for the first utterance that is input before the temporary pause state.

3. The voice processing method of claim 1, wherein the first sentence vector is a vector representing an overall content of the first utterance.

4. The voice processing method of claim 1, wherein the first and second sentence vectors are extracted by a convolutional neural network (CNN).

5. The voice processing method of claim 1, wherein the learning model is a learning model based on an artificial neural network, wherein the artificial neural network includes an input layer, a hidden layer, and an output layer each having at least one node.

6. The voice processing method of claim 5, wherein the learning model is a learning model based on a recurrent neural network (RNN).

7. The voice processing method of claim 5, wherein at least some nodes in the artificial neural network have different weights in order to generate the output.

8. The voice processing method of claim 1, wherein the second utterance is an utterance belonging to a same dialog group as the first utterance.

9. A non-transitory computer readable recording medium on which a program for implementing the method according to claim 1 is recorded.

10. A voice processing method for controlling an artificial intelligence device, the voice processing method comprising: in response to detecting, by a processor in the artificial intelligence device, a stop signal while a first utterance is transmitted to a server, temporarily pausing transmission of the first utterance; receiving, by the processor, a sub-utterance while the transmission of the first utterance is temporarily paused; outputting, by the processor, a first result corresponding to the sub-utterance while the transmission of the first utterance is temporarily paused; transmitting, by the processor, a second utterance to the server after a termination of a temporary pause state based on the stop signal; applying, by the processor, a concatenated vector concatenating first and second sentence vectors extracted from the first and second utterances to a pre-trained learning model and receiving, from the server, an output from which at least one word having an overlapping meaning is removed; and outputting, by the processor, a second result according to the output from the server, the second result being different than the first result, wherein the stop signal is a voice signal corresponding to one of a hesitation word, a silent delay, or a preset temporary pause keyword or sound, wherein the artificial intelligence device is prevented from providing an answer to the first utterance while the reception of the first utterance is paused, wherein the second sentence vector is a vector concatenating a plurality of sub-vectors extracted from at least one word included in the second utterance, wherein generating the output comprises: calculating a similarity between the first sentence vector and at least one of the plurality of sub-vectors constituting the second sentence vector; and in response to determining that the first sentence vector and the at least one of the plurality of sub-vectors have the overlapping meaning based on the similarity, generating the output from which the at least one word having the overlapping meaning is removed, and wherein the at least one word having the overlapping meaning is a word corresponding to at least one of the plurality of sub-vectors having a calculated similarity that is equal to or greater than a threshold.

11. The voice processing method of claim 10, further comprising: receiving, from a network, downlink control information (DCI) used to schedule the transmission of the first and second utterances; and transmitting the first and second utterances to the network based on the DCI.

12. The voice processing method of claim 11, further comprising: performing an initial access procedure with the network based on a synchronization signal block (SSB); and transmitting the first and second utterances to the network via a physical uplink shared channel (PUSCH), wherein the SSB and a demodulation reference signal (DM-RS) of the PUSCH are QCLed for QCL (quasi co-located) type D.

13. The voice processing method of claim 12, further comprising: controlling a communication module to transmit the first and second utterances to an AI processor included in the network; and controlling the communication module to receive AI processing information from the AI processor, wherein the AI processing information is voice information synthesized based on the output from which the at least one word having the overlapping meaning is removed.

14. An artificial intelligence device for voice processing, comprising: a memory configured to store utterances from a user; and a processor configured to: detect a stop signal during a reception of a first utterance, and temporarily pause the reception of the first utterance, receive a sub-utterance while the reception of the first utterance is temporarily paused, output a first result corresponding to the sub-utterance while the reception of the first utterance is temporarily paused, receive a second utterance after a termination of a temp
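The control flow in claim 1 has three stages. Reception pauses on a stop signal. While paused, the device answers only the sub-utterance. It then resumes with a second utterance. A small state holder can illustrate this flow. The sketch rests on several illustrative assumptions that the claims do not state: the stop-word sets, the 2-second silence limit, and the limit of one pause per dialog. The claims also do not say how the temporary pause state ends, so the rule that answering the sub-utterance ends the pause is likewise an assumption. The class and method names are hypothetical. Claim 10's server transmission and the network details of claims 11 to 13 are not modeled.

```python
from typing import Callable, List, Tuple

# Hypothetical stop-signal settings: claim 1 names the categories (hesitation
# word, silent delay, preset temporary pause keyword) but gives no values.
HESITATION_WORDS = {"um", "uh", "hmm"}
PAUSE_KEYWORDS = {"wait"}
SILENCE_LIMIT_S = 2.0


class PausableReceiver:
    """Sketch of the pause/resume flow in claim 1.

    Simplifications (assumptions, not from the patent): at most one pause per
    dialog, and answering the sub-utterance ends the temporary pause state.
    """

    def __init__(self) -> None:
        self.paused = False
        self.pause_done = False
        self.first: List[str] = []
        self.second: List[str] = []

    @staticmethod
    def _is_stop_word(word: str) -> bool:
        w = word.lower()
        return w in HESITATION_WORDS or w in PAUSE_KEYWORDS

    def word(self, word: str) -> None:
        """Feed one recognized word of the user's main request."""
        if self.paused:
            raise RuntimeError("reception is paused; expected a sub-utterance")
        if self._is_stop_word(word):
            if not self.pause_done:
                self.paused = True  # stop signal: pause the first utterance
            return  # stop words are never stored as utterance content
        (self.second if self.pause_done else self.first).append(word)

    def silence(self, seconds: float) -> None:
        """A long enough silent delay also acts as a stop signal."""
        if not self.paused and not self.pause_done and seconds >= SILENCE_LIMIT_S:
            self.paused = True

    def may_answer_first(self) -> bool:
        """The device must not answer the first utterance while paused."""
        return not self.paused

    def sub_utterance(self, text: str, answer: Callable[[str], str]) -> str:
        """Answer a side question while paused and return that first result."""
        if not self.paused:
            raise RuntimeError("no temporary pause in progress")
        first_result = answer(text)
        self.paused = False  # assumption: answering ends the pause state
        self.pause_done = True
        return first_result

    def utterances(self) -> Tuple[List[str], List[str]]:
        """First and second utterances, ready for the overlap-removal step."""
        return self.first, self.second


if __name__ == "__main__":
    r = PausableReceiver()
    for w in ["book", "a", "table", "um"]:
        r.word(w)
    print(r.may_answer_first())  # False: paused by the hesitation word
    print(r.sub_utterance("what time is it", lambda q: "It is 7 pm"))
    for w in ["for", "two"]:
        r.word(w)
    print(r.utterances())  # (['book', 'a', 'table'], ['for', 'two'])
```

Keeping the two utterances as separate lists mirrors the claim's structure: claim 1 extracts a sentence vector from each utterance separately before the overlap-removal step.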
using artificial neural networks · CPC title
of uplink data flows · CPC title
in the downlink direction of a wireless link, i.e. towards a terminal · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
specially adapted for particular use · CPC title