Multiple user recognition with voiceprints on online social networks
US-11223699-B1 · Jan 11, 2022 · US
US11735172B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11735172-B2 |
| Application number | US-202117239894-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 26, 2021 |
| Priority date | Apr 26, 2021 |
| Publication date | Aug 22, 2023 |
| Grant date | Aug 22, 2023 |
Official abstract text for this publication.
A voice-based system is configured to process commands in a flexible format, for example, in which a wake word does not necessarily have to occur at the beginning of an utterance. As in natural speech, the system being addressed may be named within or at the end of a spoken utterance rather than at the beginning, or depending on the context, may not be named at all.
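The abstract's central point is that the assistant's name need not open the utterance: it may come first, sit in the middle, come last, or be absent. The Python sketch below is a minimal illustration, not the patented method. It tokenizes a transcript and reports where a single-word assistant name occurs. The function `locate_name`, its position labels, and the example name "Nina" are hypothetical.

```python
from typing import Optional, Tuple


def locate_name(transcript: str, names: set[str]) -> Tuple[Optional[int], str]:
    """Return (token index, position label) for the first assistant name found.

    Position labels are hypothetical: 'start', 'middle', 'end', or 'absent'.
    Only single-word names are handled in this sketch.
    """
    # Lowercase and strip simple punctuation so "Nina," matches "nina".
    tokens = [t.strip(".,!?").lower() for t in transcript.split()]
    tokens = [t for t in tokens if t]
    wanted = {n.lower() for n in names}
    for i, tok in enumerate(tokens):
        if tok in wanted:
            if i == 0:
                return i, "start"
            if i == len(tokens) - 1:
                return i, "end"
            return i, "middle"
    return None, "absent"


print(locate_name("Turn on the lights, Nina", {"Nina"}))
```

A real system would work on recognizer output (possibly several competing hypotheses) rather than clean text, and would weigh the name's position together with acoustic and contextual evidence, as the claims describe.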
Opening claim text (preview).
What is claimed is:
1. A method for processing speech commands, comprising: receiving a first audio input from a user; determining whether the first audio input comprises a valid system-direct command; and after determining that the first audio input comprises a first system-directed command, acting on said command; wherein determining whether the first audio input comprises a system-directed command includes automatically transcribing the first audio input to produce a first transcribed input, determining a location in the first transcribed input of a first word associated with a first assistant, determining that the first audio input has acoustic characteristics of a spoken command, and determining that the first audio input comprises a system-directed command based on the determined location of the first word and the determining that the first audio input has acoustic characteristics of a spoken command; and wherein acting on the command comprises invoking the first assistant to act on the first command.
2. The method of claim 1, wherein automatically transcribing the first audio input includes applying an automated speech recognition procedure that permits the first word to occur at a plurality of locations in the output of said speech recognition procedure.
3. The method of claim 2, wherein the automated speech recognition procedure uses a statistical language model that permits the first word to occur at the plurality of locations with different probabilities in different locations.
4. The method of claim 3, wherein the statistical language model is determined from a training corpus of utterances in which the first word occurs in various locations in said utterances, and in system directed and not system directed utterances.
5. The method of claim 3, wherein the statistical language model is configured with a plurality of names that are permitted to occur in locations in which the first word can occur.
6. The method of claim 5, further comprising determining the plurality of names for configuring the statistical language model based on an environment of the user.
7. The method of claim 1, wherein determining whether the audio input comprises a system directed command further includes: determining whether the first transcribed input has characteristics of a spoken command.
8. The method of claim 1, further comprising: receiving a second audio input from the user; determining whether the second audio input comprises a valid system-direct command; and after determining that the second audio input does not comprise a system-directed command, preventing invoking of an assistant to act using the second audio input; wherein determining whether the second audio input comprises a system-directed command includes automatically transcribing the second audio input to produce a second transcribed input, determining a location in the second transcribed input of the first word associated with the first assistant, determining that the second audio input does not comprise a system-directed command based on at least one of a determined location of the first word in the second transcribed input and acoustic characteristics of the second audio input.
9. The method of claim 8, wherein determining that the second audio input does not comprise a system-directed command is based on the determined location of the first word is not a permitted location for said first word.
10. The method of claim 1, further comprising: receiving a second audio input from the user or a different user; and determining that the second audio input is a non-system-directed input.
11. The method of claim 1, wherein determining whether the audio input comprises a system-directed command comprises using an auxiliary input comprising at least one of: a video signal representing a facial region of the user; and a manual input from the user.
12. The method of claim 1, wherein the first word associated with the first assistant comprises a name of the first assistant.
13. The method of claim 1, wherein determining the location of the first word comprises determining that the first word occurred at a location other than a beginning of the first command.
14. The method of claim 1, wherein automatically transcribing the audio input uses a speech recognition language model that is configurable to set the first word associated with the first assistant to a user-specified name for the first assistant.
15. The method of claim 14, wherein setting the first word associated with the first assistant to the user-specified name does not require retraining the speech recognition language model.
16. The method of claim 1, wherein determining whether the audio input comprises a system-directed command includes determining to which of a plurality of assistants the command is directed.
17. The method of claim 16, wherein determining to which of the plurality of assistants the command is directed comprises at least one of (a) determining which of a plurality of words associated with respective of the assistants is located in the audio input and (b) determining with which assistant the meaning of the command is associated.
18. The method of claim 16, wherein determining whether the audio input comprises a system-directed command using different criteria associated with different assistants to determine whether the input comprises a system-directed command.
19. The method of claim 18, wherein the different criteria comprise (a) a first criterion requiring that a first word associated with a first assistant be located at the beginning of a command, and (b) a second criterion that permits a second word associated with a second assistant to be at a location other than the beginning of the command and the meaning of the command is associated with the second assistant.
20. The method of claim 1, wherein determining whether the audio input comprises a system-directed command comprises using a state of a dialog between the system and the user in the determining.
21. A voice-based assistant comprising: an audio input device; a computing device configured to: receive a first audio input from a user; determine whether the first audio input comprises a valid system-direct command; and after determining that the first audio input comprises a first system-directed command, act on said command; wherein determining whether the first audio input comprises a system-directed command includes automatically transcribing the first audio input to produce a first transcribed input, determining a location in the first transcribed input of a first word associated with a first assistant, determining that the first audio input has acoustic characteristics of a spoken command, and determining that the first audio input comprises a system-directed command based on the determined location of the first word and the determining that the first audio input has acoustic characteristics of a spoken command; and wherein acting on the command comprises invoking the first assistant to act on the first command.
22. A non-transitory machine-readable medium comprising instructions stored thereon, wherein the instructions when executed by a processor cause the processor to: receive a first audio input from a user; determine whether the first audio input comprises a valid system-direct command; and after determining that the first audio input comprises a first system-directed command, act on said command; wherein determ
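Claim 1 combines two signals: where the assistant's name sits in the transcript, and whether the audio sounds like a spoken command. Claims 16 to 20 extend this to several assistants with different placement criteria and to the state of an ongoing dialog. The Python sketch below shows one way such a decision rule could be wired together. Everything in it is hypothetical: the `Assistant` and `Hypothesis` types, the 0.5 acoustic threshold, the intent labels, and the order in which the checks run are illustrative choices, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assistant:
    name: str                 # lowercase assistant name used as the "first word"
    name_must_lead: bool      # hypothetical per-assistant criterion (cf. claim 19)
    intents: frozenset[str]   # hypothetical intents whose meaning belongs to it


@dataclass
class Hypothesis:
    tokens: list[str]         # lowercased transcription of the audio input
    acoustic_score: float     # hypothetical 0..1 "sounds like a command" score
    intent: Optional[str]     # hypothetical output of a separate NLU step


def route(h: Hypothesis, assistants: list[Assistant],
          in_dialog_with: Optional[str] = None,
          threshold: float = 0.5) -> Optional[str]:
    """Return the name of the assistant the input is directed to, or None."""
    # Acoustic gate: audio that does not sound like a command is rejected.
    if h.acoustic_score < threshold:
        return None
    for a in assistants:
        if a.name not in h.tokens:
            continue
        pos = h.tokens.index(a.name)
        # A name at the very beginning is accepted for every assistant.
        if pos == 0:
            return a.name
        # Elsewhere, accept only if this assistant permits it AND the meaning
        # of the command belongs to this assistant (cf. claim 19).
        if not a.name_must_lead and h.intent in a.intents:
            return a.name
    # No name in a permitted location: if a dialog with an assistant is already
    # under way, treat the input as directed to it (cf. claim 20); else None.
    return in_dialog_with


assistants = [
    Assistant("nina", True, frozenset({"lights"})),
    Assistant("max", False, frozenset({"music"})),
]
print(route(Hypothesis(["play", "some", "jazz", "max"], 0.9, "music"), assistants))
```

With this configuration "Play some jazz, Max" is routed to `max`, because that assistant permits a trailing name and the command's meaning matches it, while "Turn on the lights, Nina" is rejected because `nina` is configured to require its name at the start.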
Probabilistic grammars, e.g. word n-grams · CPC title
Training · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Execution procedure of a spoken command · CPC title
using neural networks · CPC title
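Claims 3, 5, 14 and 15 call for a statistical language model that lets the assistant's name occur at several positions with different probabilities, and that can accept a user-specified name without retraining. The claims shown here do not say how this is built. One standard construction with both properties, assumed here and consistent with the "Probabilistic grammars, e.g. word n-grams" tag, is a class-based n-gram model. In the Python sketch below, the class token `<NAME>`, the add-one smoothing, and `ClassBigramLM` are illustrative assumptions.

```python
import math
from collections import Counter

NAME = "<NAME>"  # hypothetical class token standing in for any assistant name


class ClassBigramLM:
    """Toy class-based bigram model with add-one smoothing (illustrative only)."""

    def __init__(self, corpus: list[list[str]]):
        # Training sentences are lowercase, with assistant names already replaced
        # by NAME, so counts are learned for the class rather than for one name.
        self.bigrams: Counter = Counter()
        self.history: Counter = Counter()
        self.vocab = {"</s>"}
        for sent in corpus:
            toks = ["<s>"] + sent + ["</s>"]
            self.vocab.update(sent)
            for prev, cur in zip(toks, toks[1:]):
                self.bigrams[(prev, cur)] += 1
                self.history[prev] += 1
        self.names: set[str] = set()

    def set_names(self, names: list[str]) -> None:
        # Only the within-class distribution changes; the bigram counts are
        # untouched, so adding a user-chosen name requires no retraining.
        self.names = {n.lower() for n in names}

    def _cls(self, word: str) -> str:
        return NAME if word in self.names else word

    def logprob(self, words: list[str]) -> float:
        toks = ["<s>"] + [w.lower() for w in words] + ["</s>"]
        v = len(self.vocab)
        total = 0.0
        for prev, cur in zip(toks, toks[1:]):
            p_cls, c_cls = self._cls(prev), self._cls(cur)
            # P(class | previous class), add-one smoothed.
            p = (self.bigrams[(p_cls, c_cls)] + 1) / (self.history[p_cls] + v)
            if c_cls == NAME:
                p /= len(self.names)  # P(name | NAME): uniform over configured names
            total += math.log(p)
        return total


corpus = [
    [NAME, "turn", "on", "the", "lights"],
    ["turn", "on", "the", "lights", NAME],
    ["turn", "on", "the", "lights"],
]
lm = ClassBigramLM(corpus)
lm.set_names(["Nina"])
print(lm.logprob(["Nina", "turn", "on", "the", "lights"]))
```

Because training counts are kept for the `<NAME>` class rather than for a particular name, `set_names` can swap in a new name, or a set of names drawn from the user's environment (cf. claim 6), without touching the counts. Position dependence comes from the bigram context: a name after `<s>` or before `</s>` scores differently from one in mid-sentence, depending on where names appeared in the training corpus.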