Flexible-format voice command

US11735172B2 · US · B2

Patent metadata
Publication number: US-11735172-B2
Application number: US-202117239894-A
Country: US
Kind code: B2
Filing date: Apr 26, 2021
Priority date: Apr 26, 2021
Publication date: Aug 22, 2023
Grant date: Aug 22, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A voice-based system is configured to process commands in a flexible format, for example, in which a wake word does not necessarily have to occur at the beginning of an utterance. As in natural speech, the system being addressed may be named within or at the end of a spoken utterance rather than at the beginning, or depending on the context, may not be named at all.
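The idea of a name that may appear at the start, inside, or at the end of an utterance, or not at all, can be sketched as follows. This is a minimal illustration that works on an already-transcribed string and is not the patent's implementation; the function `locate_name` and the assistant name "Aria" are hypothetical.

```python
from typing import Optional


def locate_name(transcript: str, name: str) -> Optional[str]:
    """Return 'start', 'middle', or 'end' for the first occurrence of the
    assistant name in the transcript, or None if the name is absent."""
    # Normalise: lowercase and strip simple punctuation from each word.
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    name_words = name.lower().split()
    n = len(name_words)
    if n == 0:
        return None
    for i in range(len(words) - n + 1):
        if words[i:i + n] == name_words:
            if i == 0:
                return "start"
            if i + n == len(words):
                return "end"
            return "middle"
    return None


# Hypothetical utterances addressed to an assistant called "Aria".
print(locate_name("Aria, play some jazz", "Aria"))
print(locate_name("Could you, Aria, play some jazz", "Aria"))
print(locate_name("Play some jazz, Aria", "Aria"))
print(locate_name("Play some jazz", "Aria"))
```

The last case returns None: per the abstract, whether an unnamed utterance is still treated as a command would depend on context, which this sketch does not model.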

First claim

Opening claim text (preview).

What is claimed is:
1. A method for processing speech commands, comprising: receiving a first audio input from a user; determining whether the first audio input comprises a valid system-direct command; and after determining that the first audio input comprises a first system-directed command, acting on said command; wherein determining whether the first audio input comprises a system-directed command includes automatically transcribing the first audio input to produce a first transcribed input, determining a location in the first transcribed input of a first word associated with a first assistant, determining that the first audio input has acoustic characteristics of a spoken command, and determining that the first audio input comprises a system-directed command based on the determined location of the first word and the determining that the first audio input has acoustic characteristics of a spoken command; and wherein acting on the command comprises invoking the first assistant to act on the first command.
2. The method of claim 1, wherein automatically transcribing the first audio input includes applying an automated speech recognition procedure that permits the first word to occur at a plurality of locations in the output of said speech recognition procedure.
3. The method of claim 2, wherein the automated speech recognition procedure uses a statistical language model that permits the first word to occur at the plurality of locations with different probabilities in different locations.
4. The method of claim 3, wherein the statistical language model is determined from a training corpus of utterances in which the first word occurs in various locations in said utterances, and in system directed and not system directed utterances.
5. The method of claim 3, wherein the statistical language model is configured with a plurality of names that are permitted to occur in locations in which the first word can occur.
6. The method of claim 5, further comprising determining the plurality of names for configuring the statistical language model based on an environment of the user.
7. The method of claim 1, wherein determining whether the audio input comprises a system directed command further includes: determining whether the first transcribed input has characteristics of a spoken command.
8. The method of claim 1, further comprising: receiving a second audio input from the user; determining whether the second audio input comprises a valid system-direct command; and after determining that the second audio input does not comprise a system-directed command, preventing invoking of an assistant to act using the second audio input; wherein determining whether the second audio input comprises a system-directed command includes automatically transcribing the second audio input to produce a second transcribed input, determining a location in the second transcribed input of the first word associated with the first assistant, determining that the second audio input does not comprise a system-directed command based on at least one of a determined location of the first word in the second transcribed input and acoustic characteristics of the second audio input.
9. The method of claim 8, wherein determining that the second audio input does not comprise a system-directed command is based on the determined location of the first word is not a permitted location for said first word.
10. The method of claim 1, further comprising: receiving a second audio input from the user or a different user; and determining that the second audio input is a non-system-directed input.
11. The method of claim 1, wherein determining whether the audio input comprises a system-directed command comprises using an auxiliary input comprising at least one of: a video signal representing a facial region of the user; and a manual input from the user.
12. The method of claim 1, wherein the first word associated with the first assistant comprises a name of the first assistant.
13. The method of claim 1, wherein determining the location of the first word comprises determining that the first word occurred at a location other than a beginning of the first command.
14. The method of claim 1, wherein automatically transcribing the audio input uses a speech recognition language model that is configurable to set the first word associated with the first assistant to a user-specified name for the first assistant.
15. The method of claim 14, wherein setting the first word associated with the first assistant to the user-specified name does not require retraining the speech recognition language model.
16. The method of claim 1, wherein determining whether the audio input comprises a system-directed command includes determining to which of a plurality of assistants the command is directed.
17. The method of claim 16, wherein determining to which of the plurality of assistants the command is directed comprises at least one of (a) determining which of a plurality of words associated with respective of the assistants is located in the audio input and (b) determining with which assistant the meaning of the command is associated.
18. The method of claim 16, wherein determining whether the audio input comprises a system-directed command using different criteria associated with different assistants to determine whether the input comprises a system-directed command.
19. The method of claim 18, wherein the different criteria comprise (a) a first criterion requiring that a first word associated with a first assistant be located at the beginning of a command, and (b) a second criterion that permits a second word associated with a second assistant to be at a location other than the beginning of the command and the meaning of the command is associated with the second assistant.
20. The method of claim 1, wherein determining whether the audio input comprises a system-directed command comprises using a state of a dialog between the system and the user in the determining.
21. A voice-based assistant comprising: an audio input device; a computing device configured to: receive a first audio input from a user; determine whether the first audio input comprises a valid system-direct command; and after determining that the first audio input comprises a first system-directed command, act on said command; wherein determining whether the first audio input comprises a system-directed command includes automatically transcribing the first audio input to produce a first transcribed input, determining a location in the first transcribed input of a first word associated with a first assistant, determining that the first audio input has acoustic characteristics of a spoken command, and determining that the first audio input comprises a system-directed command based on the determined location of the first word and the determining that the first audio input has acoustic characteristics of a spoken command; and wherein acting on the command comprises invoking the first assistant to act on the first command.
22. A non-transitory machine-readable medium comprising instructions stored thereon, wherein the instructions when executed by a processor cause the processor to: receive a first audio input from a user; determine whether the first audio input comprises a valid system-direct command; and after determining that the first audio input comprises a first system-directed command, act on said command; wherein determ
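A rough sketch of the decision described in claim 1 (name location combined with acoustic evidence) and the per-assistant criteria of claims 16 to 19 might look like the following. Everything here is an illustrative assumption rather than the claimed implementation: the `AssistantCriteria` type, the `route_command` function, the assistant names "car" and "aria", the 0.5 threshold, and the `acoustic_score` value, which stands in for the output of some upstream classifier. Names are assumed to be single words, and the meaning-based test in claim 19(b) is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AssistantCriteria:
    name: str             # word associated with the assistant (hypothetical)
    allow_anywhere: bool  # False: the name must open the utterance


def route_command(
    transcript: str,
    acoustic_score: float,
    assistants: List[AssistantCriteria],
    threshold: float = 0.5,  # arbitrary illustrative cut-off
) -> Optional[str]:
    """Return the name of the assistant to invoke, or None if the input is
    not treated as a system-directed command."""
    # Acoustic evidence: reject input that does not sound like a command.
    if acoustic_score < threshold:
        return None
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    for assistant in assistants:
        name = assistant.name.lower()
        if name not in words:
            continue
        position = words.index(name)
        # Location evidence: apply this assistant's own placement rule.
        if position == 0 or assistant.allow_anywhere:
            return assistant.name
    return None


assistants = [
    AssistantCriteria(name="car", allow_anywhere=False),
    AssistantCriteria(name="aria", allow_anywhere=True),
]
print(route_command("turn up the heat aria", 0.9, assistants))
print(route_command("i washed the car", 0.9, assistants))
```

In the second call the word "car" appears, but not at the start, so the stricter criterion for that assistant rejects it and no assistant is invoked.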

Assignees

Cerence Operating Co

Inventors

Classifications

  • G10L15/197 (Primary): Probabilistic grammars, e.g. word n-grams

  • Training

  • G10L15/22 (Primary): Procedures used during a speech recognition process, e.g. man-machine dialogue

  • Execution procedure of a spoken command

  • using neural networks

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Cerence Operating Co
What technology area does this patent fall under?
Primary CPC classification G10L15/197. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 22, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).