Contextual suppression of assistant command(s)

US11557293B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11557293-B2
Application number  US-202117321994-A
Country             US
Kind code           B2
Filing date         May 17, 2021
Priority date       May 17, 2021
Publication date    Jan 17, 2023
Grant date          Jan 17, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of the audio data (e.g., that follows the warm word) to generate ASR output, and determine, based on processing the ASR output, whether a user intended the assistant command to be performed. Additional or alternative implementations can process the stream of audio data using a speaker identification (SID) model to determine whether the audio data is sufficient to identify the user that provided a spoken utterance captured in the stream of audio data, and determine if that user is authorized to cause performance of the assistant command.
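
The flow in the abstract can be illustrated with a toy sketch. It is not the patented implementation: text tokens stand in for the audio stream, a dictionary lookup stands in for the warm word model, and a cue-word check stands in for ASR plus intent determination. The names `WARM_WORDS`, `NEGATING_CUES`, `Decision`, and `decide`, and the `speaker_authorized` flag (a stand-in for the SID check), are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical warm words mapped to assistant commands (illustrative only).
WARM_WORDS = {"stop": "stop_alarm", "answer": "answer_call"}

# Hypothetical cue words suggesting the warm word was not meant as a command.
NEGATING_CUES = {"don't", "not", "never", "won't"}


@dataclass
class Decision:
    warm_word: Optional[str]  # detected warm word, if any
    command: Optional[str]    # assistant command associated with it
    perform: bool             # whether the command should be performed


def decide(tokens: List[str], speaker_authorized: bool = True) -> Decision:
    """Toy stand-in for the abstract's flow, operating on text tokens."""
    for i, token in enumerate(tokens):
        if token in WARM_WORDS:
            preamble = tokens[:i]        # words preceding the warm word
            postamble = tokens[i + 1:]   # words following the warm word
            context = preamble + postamble
            # Assumption: any negating cue in the context means "not intended".
            intended = not any(t in NEGATING_CUES for t in context)
            return Decision(token, WARM_WORDS[token],
                            intended and speaker_authorized)
    # No warm word detected: nothing to perform.
    return Decision(None, None, False)


print(decide("stop".split()))
print(decide("please don't stop".split()))
```

In this sketch a bare "stop" triggers the command, while "please don't stop" suppresses it because the preamble contains a negating cue. A real system would use trained models for each stage rather than word lists.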

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by one or more processors, the method comprising: processing, using a warm word model, a stream of audio data to monitor for an occurrence of one or more particular words or phrases, the stream of audio data being generated by one or more microphones of a client device of a user, and each of the one or more particular words or phrases being associated with an assistant command; in response to determining a portion of the audio data corresponds to one or more of the particular words or phrases: processing, using an automatic speech recognition (ASR) model, a preamble portion of the audio data and/or a postamble portion of the audio data to generate ASR output, wherein the preamble portion of the audio data precedes the portion of the audio data that corresponds to the one or more particular words or phrases, and wherein the postamble portion of the audio data follows the portion of the audio data that corresponds to the one or more particular words or phrases; and determining, based on processing the ASR output, whether the user intended the one or more particular words or phrases to cause performance of the assistant command; in response to determining the user did not intend the one or more particular words or phrases to cause performance of the assistant command that is associated one or more of the particular words or phrases: refraining from causing an automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases; and in response to determining the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases: causing the automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases.

2. The method of claim 1, further comprising: detecting an occurrence of a warm word activation event; and in response to detecting the occurrence of the warm word activation event, activating one or more currently dormant automated assistant functions that utilize the warm word model, wherein processing the stream of audio data using the warm word model to monitor for the occurrence of the one or more particular words or phrases is in response to activating the one or more currently dormant automated assistant functions that utilize the warm word model.

3. The method of claim 2, wherein the warm word activation event comprises one or more of: a phone call being received at the client device, a text message being received at the client device, an email being received at the client device, an alarm sounding at the client device, a timer sounding at the client device, media being played at the client device or an additional client device in an environment of the client device, a notification being received at the client device, a location of the client device, or a software application being accessible at the client device.

4. The method of claim 1, wherein determining whether the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with the one or more particular words or phrases based on processing the ASR output comprises: processing, using a natural language understanding (NLU) model, the ASR output to generate NLU output, wherein the ASR output is generated based on the preamble portion of the audio data, but not the postamble portion of the audio data; and determining, based on the NLU output, whether the user intended the one or more particular words or phrases to cause performance of the assistant command.

5. The method of claim 4, further comprising: in response to determining the NLU output is insufficient for determining whether the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases: processing, using the ASR model, the postamble portion of the audio data to generate additional ASR output; and determining, based on processing the additional ASR output, whether the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases.

6. The method of claim 1, wherein determining whether the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases based on processing the ASR output comprises: processing, using a natural language understanding (NLU) model, the ASR output to generate NLU output, wherein the ASR output is generated based on both the preamble portion of the audio data and the postamble portion of the audio data; and determining, based on the NLU output, whether the user intended the one or more particular words or phrases to cause performance of the assistant command.

7. The method of claim 6, further comprising: in response to determining the NLU output is insufficient for determining whether the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases: processing, using the ASR model, an additional postamble portion of the audio data to generate additional ASR output, wherein the additional postamble portion of the audio data follows the postamble portion of the audio data; and determining, based on processing the additional ASR output, whether the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases.

8. The method of claim 1, further comprising: processing, using an endpointing model, the stream of audio data to generate a plurality of timestamps for a spoken utterance that is captured in the stream of audio data, and that includes the one or more particular words or phrases.

9. The method of claim 8, wherein the plurality of timestamps comprise at least a first timestamp associated with a first time when the user began providing the spoken utterance, a second timestamp associated with a second time, that is subsequent to the first time, when the user began providing the one or more particular words or phrases included in the spoken utterance, a third timestamp associated with a third time, that is subsequent to the second time, when the user finished providing the one or more particular words or phrases included in the spoken utterance, and a fourth timestamp associated with a fourth time, that is subsequent to the third time, when the user finished providing the spoken utterance.

10. The method of claim 9, wherein the preamble portion of the audio data includes any audio data that corresponds to the spoken utterance between the first timestamp and the second timestamp.

11. The method of claim 9, wherein the postamble portion of the audio data includes any audio data that corresponds to the spoken utterance between the third timestamp and the fourth timestamp.

12. The method of claim 1, further comprising: activating one or more currently dormant automated assistant functions that utilize the ASR model in response to determining that the spoken utterance includes one or more of the particular words or phrases.

13. The method of claim 1, further comprising: processing, using the ASR model, and along with the preamble portion of the audio data and/or the postamble portion audio data, t
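
Two ideas from the claims lend themselves to a short illustration: slicing the preamble and postamble out of an utterance using the four endpointing timestamps (claims 8 to 11), and checking the preamble first, then falling back to the postamble when the first result is inconclusive (claims 4 and 5). The sketch below is a minimal illustration under stated assumptions, not the claimed method: `UtteranceTimestamps`, `split_utterance`, and `staged_intent` are hypothetical names, timestamps are assumed to be in seconds, and the `classify` callback stands in for ASR plus NLU, returning `None` when its output is insufficient.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class UtteranceTimestamps:
    utterance_start: float  # first timestamp: user began the utterance
    warm_word_start: float  # second timestamp: user began the warm word
    warm_word_end: float    # third timestamp: user finished the warm word
    utterance_end: float    # fourth timestamp: user finished the utterance


def split_utterance(samples: Sequence, ts: UtteranceTimestamps,
                    sample_rate: int) -> Tuple[Sequence, Sequence]:
    """Return (preamble, postamble) sample slices around the warm word."""
    def idx(seconds: float) -> int:
        # Assumption: timestamps are seconds from the start of `samples`.
        return int(round(seconds * sample_rate))

    preamble = samples[idx(ts.utterance_start):idx(ts.warm_word_start)]
    postamble = samples[idx(ts.warm_word_end):idx(ts.utterance_end)]
    return preamble, postamble


def staged_intent(preamble_text: str, postamble_text: str,
                  classify: Callable[[str], Optional[bool]]) -> Optional[bool]:
    """Classify the preamble first; consult the postamble only if needed."""
    verdict = classify(preamble_text)
    if verdict is None:  # preamble result was insufficient
        verdict = classify(postamble_text)
    return verdict  # may still be None; the caller decides what to do then
```

A small usage example, with integer "samples" at a sample rate of 1 and a toy classifier that treats empty text as insufficient:

```python
ts = UtteranceTimestamps(1.0, 3.0, 5.0, 8.0)
pre, post = split_utterance(list(range(10)), ts, sample_rate=1)
print(pre, post)

toy = lambda text: None if not text else ("don't" not in text)
print(staged_intent("", "the music", toy))
print(staged_intent("please don't", "", toy))
```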

Assignees

Google LLC

Inventors

Classifications

  • Word spotting · CPC title

  • Execution procedure of a spoken command · CPC title

  • Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11557293B2 cover?
Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of …
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 17, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).