What technology area does this patent fall under?

Primary CPC classification G10L15/197. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 22, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Flexible-format voice command

US11735172B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11735172-B2
Application number	US-202117239894-A
Country	US
Kind code	B2
Filing date	Apr 26, 2021
Priority date	Apr 26, 2021
Publication date	Aug 22, 2023
Grant date	Aug 22, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A voice-based system is configured to process commands in a flexible format, for example, in which a wake word does not necessarily have to occur at the beginning of an utterance. As in natural speech, the system being addressed may be named within or at the end of a spoken utterance rather than at the beginning, or depending on the context, may not be named at all.
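The flexible placement described in the abstract can be illustrated with a small Python sketch that reports where an assistant's name falls in a transcribed utterance. This is only an illustration, not the method disclosed in the patent: the function name `find_assistant_name`, the assistant name "Aria", and the three position labels are hypothetical, and the sketch assumes a single-word name and an already-transcribed utterance.

```python
from typing import Optional


def find_assistant_name(transcript: str, name: str) -> Optional[str]:
    """Return 'start', 'middle', or 'end' for where `name` occurs, or None.

    Hypothetical helper: assumes a single-word assistant name and a plain
    text transcript; punctuation attached to words is stripped.
    """
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    target = name.lower()
    if target not in words:
        return None  # the assistant is not named in this utterance
    idx = words.index(target)  # position of the first occurrence
    if idx == 0:
        return "start"
    if idx == len(words) - 1:
        return "end"
    return "middle"


if __name__ == "__main__":
    print(find_assistant_name("Aria, play some jazz", "Aria"))
    print(find_assistant_name("Play some jazz, Aria", "Aria"))
    print(find_assistant_name("Turn up the heat", "Aria"))
```

A conventional wake-word system would accept only the first case; the abstract describes a system that can also accept the second and, depending on context, the third.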

First claim

Opening claim text (preview).

What is claimed is:
1. A method for processing speech commands, comprising: receiving a first audio input from a user; determining whether the first audio input comprises a valid system-direct command; and after determining that the first audio input comprises a first system-directed command, acting on said command; wherein determining whether the first audio input comprises a system-directed command includes automatically transcribing the first audio input to produce a first transcribed input, determining a location in the first transcribed input of a first word associated with a first assistant, determining that the first audio input has acoustic characteristics of a spoken command, and determining that the first audio input comprises a system-directed command based on the determined location of the first word and the determining that the first audio input has acoustic characteristics of a spoken command; and wherein acting on the command comprises invoking the first assistant to act on the first command.
2. The method of claim 1, wherein automatically transcribing the first audio input includes applying an automated speech recognition procedure that permits the first word to occur at a plurality of locations in the output of said speech recognition procedure.
3. The method of claim 2, wherein the automated speech recognition procedure uses a statistical language model that permits the first word to occur at the plurality of locations with different probabilities in different locations.
4. The method of claim 3, wherein the statistical language model is determined from a training corpus of utterances in which the first word occurs in various locations in said utterances, and in system directed and not system directed utterances.
5. The method of claim 3, wherein the statistical language model is configured with a plurality of names that are permitted to occur in locations in which the first word can occur.
6. The method of claim 5, further comprising determining the plurality of names for configuring the statistical language model based on an environment of the user.
7. The method of claim 1, wherein determining whether the audio input comprises a system directed command further includes: determining whether the first transcribed input has characteristics of a spoken command.
8. The method of claim 1, further comprising: receiving a second audio input from the user; determining whether the second audio input comprises a valid system-direct command; and after determining that the second audio input does not comprise a system-directed command, preventing invoking of an assistant to act using the second audio input; wherein determining whether the second audio input comprises a system-directed command includes automatically transcribing the second audio input to produce a second transcribed input, determining a location in the second transcribed input of the first word associated with the first assistant, determining that the second audio input does not comprise a system-directed command based on at least one of a determined location of the first word in the second transcribed input and acoustic characteristics of the second audio input.
9. The method of claim 8, wherein determining that the second audio input does not comprise a system-directed command is based on the determined location of the first word is not a permitted location for said first word.
10. The method of claim 1, further comprising: receiving a second audio input from the user or a different user; and determining that the second audio input is a non-system-directed input.
11. The method of claim 1, wherein determining whether the audio input comprises a system-directed command comprises using an auxiliary input comprising at least one of: a video signal representing a facial region of the user; and a manual input from the user.
12. The method of claim 1, wherein the first word associated with the first assistant comprises a name of the first assistant.
13. The method of claim 1, wherein determining the location of the first word comprises determining that the first word occurred at a location other than a beginning of the first command.
14. The method of claim 1, wherein automatically transcribing the audio input uses a speech recognition language model that is configurable to set the first word associated with the first assistant to a user-specified name for the first assistant.
15. The method of claim 14, wherein setting the first word associated with the first assistant to the user-specified name does not require retraining the speech recognition language model.
16. The method of claim 1, wherein determining whether the audio input comprises a system-directed command includes determining to which of a plurality of assistants the command is directed.
17. The method of claim 16, wherein determining to which of the plurality of assistants the command is directed comprises at least one of (a) determining which of a plurality of words associated with respective of the assistants is located in the audio input and (b) determining with which assistant the meaning of the command is associated.
18. The method of claim 16, wherein determining whether the audio input comprises a system-directed command using different criteria associated with different assistants to determine whether the input comprises a system-directed command.
19. The method of claim 18, wherein the different criteria comprise (a) a first criterion requiring that a first word associated with a first assistant be located at the beginning of a command, and (b) a second criterion that permits a second word associated with a second assistant to be at a location other than the beginning of the command and the meaning of the command is associated with the second assistant.
20. The method of claim 1, wherein determining whether the audio input comprises a system-directed command comprises using a state of a dialog between the system and the user in the determining.
21. A voice-based assistant comprising: an audio input device; a computing device configured to: receive a first audio input from a user; determine whether the first audio input comprises a valid system-direct command; and after determining that the first audio input comprises a first system-directed command, act on said command; wherein determining whether the first audio input comprises a system-directed command includes automatically transcribing the first audio input to produce a first transcribed input, determining a location in the first transcribed input of a first word associated with a first assistant, determining that the first audio input has acoustic characteristics of a spoken command, and determining that the first audio input comprises a system-directed command based on the determined location of the first word and the determining that the first audio input has acoustic characteristics of a spoken command; and wherein acting on the command comprises invoking the first assistant to act on the first command.
22. A non-transitory machine-readable medium comprising instructions stored thereon, wherein the instructions when executed by a processor cause the processor to: receive a first audio input from a user; determine whether the first audio input comprises a valid system-direct command; and after determining that the first audio input comprises a first system-directed command, act on said command; wherein determ
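The decision described in claim 1 (word location plus acoustic characteristics) and the per-assistant criteria of claim 19 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: `AssistantRule`, `route_command`, and the assistant names are hypothetical, the acoustic determination is reduced to a precomputed boolean, transcription is assumed to have already produced a word list, and the meaning-based check of claim 19(b) is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AssistantRule:
    """Hypothetical per-assistant criterion (loosely modeled on claim 19)."""
    name: str         # word associated with the assistant
    start_only: bool  # True: the word must be the first word of the command


def route_command(words: List[str], sounds_like_command: bool,
                  rules: List[AssistantRule]) -> Optional[str]:
    """Return the name of the assistant to invoke, or None to ignore the input.

    `sounds_like_command` stands in for the acoustic determination, which a
    real system would compute from the audio signal itself.
    """
    if not sounds_like_command or not words:
        return None  # no assistant is invoked for non-command audio
    lowered = [w.lower() for w in words]
    for rule in rules:
        target = rule.name.lower()
        if target not in lowered:
            continue  # this assistant's word is absent from the transcript
        position = lowered.index(target)
        if rule.start_only and position != 0:
            continue  # word found, but not at a permitted location
        return rule.name
    return None


if __name__ == "__main__":
    rules = [AssistantRule("car", start_only=True),
             AssistantRule("aria", start_only=False)]
    print(route_command(["play", "jazz", "aria"], True, rules))
    print(route_command(["wash", "the", "car"], True, rules))
```

In this sketch the strict assistant ("car") is invoked only when named first, while the flexible one ("aria") is accepted at any position, so an ordinary sentence that merely mentions a car does not trigger anything.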

Assignees

Cerence Operating Co

Inventors

Classifications

G10L15/197 · Primary
Probabilistic grammars, e.g. word n-grams · CPC title
G10L15/063
Training · CPC title
G10L15/22 · Primary
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G10L2015/223
Execution procedure of a spoken command · CPC title
G06V10/82
using neural networks · CPC title

Patent family

Related publications grouped by family.

View patent family 83694497

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11735172B2 cover?: A voice-based system is configured to process commands in a flexible format, for example, in which a wake word does not necessarily have to occur at the beginning of an utterance. As in natural speech, the system being addressed may be named within or at the end of a spoken utterance rather than at the beginning, or depending on the context, may not be named at all.
Who is the assignee on this patent?: Cerence Operating Co

Related patents

Multiple user recognition with voiceprints on online social networks

Automated control of noise reduction or noise masking

Method and system for controlling home assistant devices
