Voice processing method based on artificial intelligence

US11790893B2 · US · B2

Patent metadata
Publication number: US-11790893-B2
Application number: US-202017039169-A
Country: US
Kind code: B2
Filing date: Sep 30, 2020
Priority date: Nov 22, 2019
Publication date: Oct 17, 2023
Grant date: Oct 17, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A voice processing method is disclosed. The voice processing method applies first and second sentence vectors extracted from first and second utterances, that are included in one dialog group and are separated from each other, to a learning model and generates an output from which at least one word having an overlapping meaning is removed. The voice processing method can be associated with an artificial intelligence module, an unmanned aerial vehicle (UAV), a robot, an augmented reality (AR) device, a virtual reality (VR) device, devices related to 5G services, and the like.
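The overlap-removal idea in the abstract, together with the similarity threshold recited in the first claim, can be illustrated with a minimal sketch. This is not the patented implementation: the toy word vectors, the mean-pooled sentence vector, the cosine measure, and the 0.8 threshold are all assumptions chosen for illustration, and a plain threshold filter stands in for the pre-trained learning model.

```python
import math

# Hypothetical 3-dimensional word embeddings, invented for this sketch.
WORD_VECS = {
    "weather": [1.0, 0.0, 0.0],
    "today": [0.9, 0.1, 0.0],
    "in": [0.0, 0.0, 1.0],
    "seoul": [0.0, 1.0, 0.0],
}


def mean_vector(vectors):
    """Average a list of equal-length vectors (assumed stand-in for a sentence encoder)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def remove_overlapping_words(first_vec, second_words, threshold=0.8):
    """Drop each word of the second utterance whose sub-vector has a
    similarity to the first sentence vector at or above the threshold."""
    return [
        word
        for word in second_words
        if cosine(first_vec, WORD_VECS[word]) < threshold
    ]


first_words = ["weather", "today"]
second_words = ["weather", "in", "seoul"]

# First sentence vector: one vector for the overall content of the first utterance.
first_vec = mean_vector([WORD_VECS[w] for w in first_words])

# Second sentence vector: concatenation of the per-word sub-vectors.
second_vec = [x for w in second_words for x in WORD_VECS[w]]

# Concatenated vector that would be fed to a learning model (not modeled here).
model_input = first_vec + second_vec

kept = remove_overlapping_words(first_vec, second_words)
print(kept)
```

Here "weather" in the second utterance is dropped because its sub-vector is nearly parallel to the first sentence vector, while "in" and "seoul" fall below the assumed threshold and are kept.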

First claim

Opening claim text (preview).

What is claimed is:

1. A voice processing method for controlling an artificial intelligence device, the voice processing method comprising: in response to detecting, by a processor in the artificial intelligence device, a stop signal during a reception of a first utterance, temporarily pausing the reception of the first utterance; receiving, by the processor, a sub-utterance while the reception of the first utterance is temporarily paused; outputting, by the processor, a first result corresponding to the sub-utterance while the reception of the first utterance is temporarily paused; receiving, by the processor, a second utterance after a termination of a temporary pause state based on the stop signal; applying, by the processor, a concatenated vector concatenating first and second sentence vectors extracted from the first and second utterances to a pre-trained learning model to generate an output from which at least one word having an overlapping meaning is removed; and outputting, by the processor, a second result according to the output generated by the pre-trained learning model, the second result being different than the first result, wherein the stop signal is a voice signal corresponding to one of a hesitation word, a silent delay, or a preset temporary pause keyword or sound, wherein the artificial intelligence device is prevented from providing an answer to the first utterance while the reception of the first utterance is paused, wherein the second sentence vector is a vector concatenating a plurality of sub-vectors extracted from at least one word included in the second utterance, wherein generating the output comprises: calculating a similarity between the first sentence vector and at least one of the plurality of sub-vectors constituting the second sentence vector; and in response to determining that the first sentence vector and the at least one of the plurality of sub-vectors have the overlapping meaning based on the similarity, generating the output from which the at least one word having the overlapping meaning is removed, and wherein the at least one word having the overlapping meaning is a word corresponding to at least one of the plurality of sub-vectors having a calculated similarity that is equal to or greater than a threshold.

2. The voice processing method of claim 1, further comprising, if the reception of the first utterance is temporarily paused, waiting for an additional voice input for the first utterance that is input before the temporary pause state.

3. The voice processing method of claim 1, wherein the first sentence vector is a vector representing an overall content of the first utterance.

4. The voice processing method of claim 1, wherein the first and second sentence vectors are extracted by a convolutional neural network (CNN).

5. The voice processing method of claim 1, wherein the learning model is a learning model based on an artificial neural network, wherein the artificial neural network includes an input layer, a hidden layer, and an output layer each having at least one node.

6. The voice processing method of claim 5, wherein the learning model is a learning model based on a recurrent neural network (RNN).

7. The voice processing method of claim 5, wherein at least some nodes in the artificial neural network have different weights in order to generate the output.

8. The voice processing method of claim 1, wherein the second utterance is an utterance belonging to a same dialog group as the first utterance.

9. A non-transitory computer readable recording medium on which a program for implementing the method according to claim 1 is recorded.

10. A voice processing method for controlling an artificial intelligence device, the voice processing method comprising: in response to detecting, by a processor in the artificial intelligence device, a stop signal while a first utterance is transmitted to a server, temporarily pausing transmission of the first utterance; receiving, by the processor, a sub-utterance while the transmission of the first utterance is temporarily paused; outputting, by the processor, a first result corresponding to the sub-utterance while the transmission of the first utterance is temporarily paused; transmitting, by the processor, a second utterance to the server after a termination of a temporary pause state based on the stop signal; applying, by the processor, a concatenated vector concatenating first and second sentence vectors extracted from the first and second utterances to a pre-trained learning model and receiving, from the server, an output from which at least one word having an overlapping meaning is removed; and outputting, by the processor, a second result according to the output from the server, the second result being different than the first result, wherein the stop signal is a voice signal corresponding to one of a hesitation word, a silent delay, or a preset temporary pause keyword or sound, wherein the artificial intelligence device is prevented from providing an answer to the first utterance while the reception of the first utterance is paused, wherein the second sentence vector is a vector concatenating a plurality of sub-vectors extracted from at least one word included in the second utterance, wherein generating the output comprises: calculating a similarity between the first sentence vector and at least one of the plurality of sub-vectors constituting the second sentence vector; and in response to determining that the first sentence vector and the at least one of the plurality of sub-vectors have the overlapping meaning based on the similarity, generating the output from which the at least one word having the overlapping meaning is removed, and wherein the at least one word having the overlapping meaning is a word corresponding to at least one of the plurality of sub-vectors having a calculated similarity that is equal to or greater than a threshold.

11. The voice processing method of claim 10, further comprising: receiving, from a network, downlink control information (DCI) used to schedule the transmission of the first and second utterances; and transmitting the first and second utterances to the network based on the DCI.

12. The voice processing method of claim 11, further comprising: performing an initial access procedure with the network based on a synchronization signal block (SSB); and transmitting the first and second utterances to the network via a physical uplink shared channel (PUSCH), wherein the SSB and a demodulation reference signal (DM-RS) of the PUSCH are QCLed for QCL (quasi co-located) type D.

13. The voice processing method of claim 12, further comprising: controlling a communication module to transmit the first and second utterances to an AI processor included in the network; and controlling the communication module to receive AI processing information from the AI processor, wherein the AI processing information is voice information synthesized based on the output from which the at least one word having the overlapping meaning is removed.

14. An artificial intelligence device for voice processing, comprising: a memory configured to store utterances from a user; and a processor configured to: detect a stop signal during a reception of a first utterance, and temporarily pause the reception of the first utterance, receive a sub-utterance while the reception of the first utterance is temporarily paused, output a first result corresponding to the sub-utterance while the reception of the first utterance is temporarily paused, receive a second utterance after a termination of a temp
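The pause-and-resume flow recited in claim 1 can be sketched as a small state machine. This is a hedged illustration only: the hesitation words, the pause keyword, the 2-second silence limit, the class and method names, and the explicit `resume()` call that ends the pause are all hypothetical, since the claim text does not say how these are configured or how the pause terminates.

```python
# Hypothetical stop-signal vocabulary and silence limit, assumed for this sketch.
HESITATION_WORDS = {"um", "uh", "hmm"}
PAUSE_KEYWORDS = {"wait"}
SILENCE_LIMIT_S = 2.0


def is_stop_signal(token, silence_s=0.0):
    """True for a hesitation word, a preset pause keyword, or a long silent delay."""
    text = token.strip().lower()
    return (
        text in HESITATION_WORDS
        or text in PAUSE_KEYWORDS
        or silence_s >= SILENCE_LIMIT_S
    )


class PausableListener:
    """Tracks the first utterance, the temporary pause, and the second utterance."""

    def __init__(self):
        self.state = "FIRST"  # FIRST -> PAUSED -> SECOND
        self.first = []
        self.second = []

    def on_token(self, token, silence_s=0.0):
        """Consume one recognized token of the first or second utterance."""
        if self.state == "FIRST":
            if is_stop_signal(token, silence_s):
                # Pause reception; no answer to the first utterance is given yet.
                self.state = "PAUSED"
            else:
                self.first.append(token)
        elif self.state == "SECOND":
            self.second.append(token)
        else:
            raise RuntimeError("reception is paused; use on_sub_utterance()")

    def on_sub_utterance(self, text, answer):
        """Answer a side request while paused and return the first result."""
        if self.state != "PAUSED":
            raise RuntimeError("sub-utterances are only handled while paused")
        return answer(text)

    def resume(self):
        """End the temporary pause so the second utterance can be received."""
        if self.state != "PAUSED":
            raise RuntimeError("not paused")
        self.state = "SECOND"


listener = PausableListener()
for tok in ["tell", "me", "the", "cast", "um"]:
    listener.on_token(tok)
first_result = listener.on_sub_utterance(
    "what time is it", lambda t: "answer: " + t
)
listener.resume()
for tok in ["of", "the", "drama"]:
    listener.on_token(tok)
print(listener.first, first_result, listener.second)
```

After the pause ends, the collected `first` and `second` token lists would be the inputs to the overlap-removal step that produces the second result.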

Assignees

LG Electronics Inc.

Inventors

Classifications

  • G10L15/16 · Primary

    using artificial neural networks · CPC title

  • of uplink data flows · CPC title

  • in the downlink direction of a wireless link, i.e. towards a terminal · CPC title

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • specially adapted for particular use · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11790893B2 cover?
A voice processing method is disclosed. The voice processing method applies first and second sentence vectors extracted from first and second utterances, that are included in one dialog group and are separated from each other, to a learning model and generates an output from which at least one word having an overlapping meaning is removed. The voice processing method can be associated with an a…
Who is the assignee on this patent?
LG Electronics Inc.
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 17, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).