Voice commands for an automated assistant utilized in smart dictation

US2022366910A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2022366910-A1
Application number  US-202117322765-A
Country             US
Kind code           A1
Filing date         May 17, 2021
Priority date       May 17, 2021
Publication date    Nov 17, 2022
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods described herein relate to determining whether to incorporate recognized text, that corresponds to a spoken utterance of a user of a client device, into a transcription displayed at the client device, or to cause an assistant command, that is associated with the transcription and that is based on the recognized text, to be performed by an automated assistant implemented by the client device. The spoken utterance is received during a dictation session between the user and the automated assistant. Implementations can process, using automatic speech recognition model(s), audio data that captures the spoken utterance to generate the recognized text. Further, implementations can determine whether to incorporate the recognized text into the transcription or cause the assistant command to be performed based on touch input being directed to the transcription, a state of the transcription, and/or audio-based characteristic(s) of the spoken utterance.
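The decision the abstract describes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `DictationContext`, `Action`, `COMMAND_WORDS`, and `decide`, the use of an empty transcription as the "state of the transcription", a preceding pause as the audio-based characteristic, and the specific rule ordering. The publication does not state these rules; it only names the three kinds of signal.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    INCORPORATE_TEXT = "incorporate_text"
    PERFORM_COMMAND = "perform_command"


@dataclass
class DictationContext:
    recognized_text: str            # output of speech recognition (assumed given)
    touch_on_transcription: bool    # touch input is directed to the transcription
    transcription_is_empty: bool    # hypothetical stand-in for "state of the transcription"
    pause_before_utterance: bool    # hypothetical stand-in for an audio-based characteristic


# Hypothetical command vocabulary; not taken from the publication.
COMMAND_WORDS = {"send", "delete", "clear", "undo"}


def decide(ctx: DictationContext) -> Action:
    """Choose between dictating the text and treating it as a command."""
    words = ctx.recognized_text.strip().lower().split()
    looks_like_command = bool(words) and words[0] in COMMAND_WORDS

    if not looks_like_command:
        return Action.INCORPORATE_TEXT
    if ctx.transcription_is_empty:
        # Nothing to act on yet, so treat the words as dictated text.
        return Action.INCORPORATE_TEXT
    if ctx.touch_on_transcription or ctx.pause_before_utterance:
        return Action.PERFORM_COMMAND
    return Action.INCORPORATE_TEXT
```

Under these assumed rules, "delete that" spoken while the user is touching a non-empty transcription is treated as a command, while the same words spoken mid-sentence with no touch and no pause are dictated as text.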

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by one or more processors, the method comprising: receiving audio data that captures a spoken utterance of a user of a client device, the audio data being generated one or more microphones of the client device, and the audio data being received while touch input of the user is being directed to a transcription that is displayed at the client device via a software application accessible at the client device; determining, based on the touch input of the user being directed to the transcription and the spoken utterance, whether to: incorporate recognized text, that corresponds to the spoken utterance, into the transcription, or perform an assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance; in response to determining to incorporate the recognized text that corresponds to the spoken utterance into the transcription: automatically incorporating the recognized text that corresponds to the spoken utterance into the transcription; and in response to determining to perform the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance: causing an automated assistant to perform the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance.

2. The method of claim 1, further comprising: processing, using an automatic speech recognition (ASR) model, the audio data that captures the spoken utterance to generate the recognized text that corresponds to the spoken utterance.

3. The method of claim 2, further comprising: processing, using a natural language understanding (NLU) model, the recognized text that corresponds to the spoken utterance to generate annotated recognized text.

4. The method of claim 3, further comprising: determining the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance, wherein determining the assistant command, that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance, is based on the annotated recognized text.

5. The method of claim 1, wherein the touch input of the user is being directed to one or more textual segments of the transcription that is displayed at the client device.

6. The method of claim 5, wherein the touch input of the user graphically demarcates one or more of the textual segments of the transcription that is displayed at the client device.

7. The method of claim 6, wherein determining whether to incorporate the recognized text, that corresponds to the spoken utterance, into the transcription, or to perform the assistant command, that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance, comprises determining to perform the assistant command based on the touch input of the user graphically demarcating one or more of the textual segments of the transcription.

8. The method of claim 1, wherein the touch input of the user is being directed to one or more fields of the transcription that is displayed at the client device.

9. The method of claim 8, wherein determining whether to incorporate the recognized text, that corresponds to the spoken utterance, into the transcription, or to perform the assistant command, that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance, comprises determining to perform the assistant command based on the touch input of the user being directed to one or more fields of the transcription.

10. The method of claim 1, wherein automatically incorporating the recognized text that corresponds to the spoken utterance into the transcription comprises: causing the recognized text to be visually displayed to the user via the software application accessible at the client device as part of the transcription.

11. The method of claim 10, wherein causing the recognized text to be visually displayed to the user via the software application accessible at the client device as part of the transcription comprises causing the recognized text to be maintained in the transcription after additional text is incorporated into the transcription.

12. A method implemented by one or more processors, the method comprising: receiving audio data that captures a spoken utterance of a user of a client device, the audio data being generated one or more microphones of the client device, and the audio data being received while a transcription is being displayed at the client device via a software application accessible at the client device; processing, using an automatic speech recognition (ASR) model, the audio data that captures the spoken utterance to generate recognized text that corresponds to the spoken utterance; processing, using a natural language understanding (NLU) model, the recognized text that corresponds to the spoken utterance to generate annotated recognized text; processing, using an audio-based machine learning (ML) model, the audio data that captures the spoken utterance to determine one or more audio-based characteristics of the spoken utterance; determining, based on one or more of the annotated recognized text or one or more of the audio-based characteristics of the spoken utterance, whether to: incorporate recognized text, that corresponds to the spoken utterance, into the transcription, or perform an assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance; in response to determining to incorporate the recognized text that corresponds to the spoken utterance into the transcription: automatically incorporating the recognized text that corresponds to the spoken utterance into the transcription; and in response to determining to perform the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance: causing an automated assistant to perform the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance.

13. The method of claim 12, further comprising: determining the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance, wherein determining the assistant command, that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance, is based on the annotated recognized text.

14. The method of claim 12, wherein the audio-based ML model is an endpointing model trained to detect pauses in the spoken utterance, and wherein one or more of the audio-based characteristics of the spoken utterance correspond to one or more of pauses in the spoken utterance.

15. The method of claim 14, wherein determining whether to incorporate the recognized text, that corresponds to the spoken utterance, into the transcription, or to perform the assistant command, that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance, comprises determining to perform the assistant command associated with the transcription based on one or more of the pauses in the spoken utterance.

16.
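Claims 14 and 15 rely on pauses detected in the spoken utterance. The claims call for a trained endpointing model; the sketch below swaps in a plain energy threshold to show what a list of detected pauses might look like. The function name `find_pauses`, the per-frame energy input, and both default parameters are hypothetical and do not come from the publication.

```python
from typing import List, Sequence, Tuple


def find_pauses(
    frame_energies: Sequence[float],
    silence_threshold: float = 0.1,   # assumed value; frames below this count as silent
    min_pause_frames: int = 3,        # assumed value; shorter silent runs are ignored
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs for silent runs; end is exclusive.

    This is a simple energy-threshold stand-in, not a trained endpointing model.
    """
    pauses: List[Tuple[int, int]] = []
    run_start = None
    for i, energy in enumerate(frame_energies):
        if energy < silence_threshold:
            if run_start is None:
                run_start = i  # a silent run begins here
        else:
            if run_start is not None and i - run_start >= min_pause_frames:
                pauses.append((run_start, i))
            run_start = None
    # Close a silent run that extends to the end of the audio.
    if run_start is not None and len(frame_energies) - run_start >= min_pause_frames:
        pauses.append((run_start, len(frame_energies)))
    return pauses
```

A downstream step could then treat an utterance that follows a detected pause as a candidate assistant command rather than dictated text, which is one possible reading of claim 15.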

Assignees

Google LLC

Inventors

Classifications

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • using a touch-screen or digitiser, e.g. input of commands through traced gestures · CPC title

  • using natural language modelling · CPC title

  • Machine learning · CPC title

  • Execution procedure of a spoken command · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022366910A1 cover?
Systems and methods described herein relate to determining whether to incorporate recognized text, that corresponds to a spoken utterance of a user of a client device, into a transcription displayed at the client device, or to cause an assistant command, that is associated with the transcription and that is based on the recognized text, to be performed by an automated assistant implemented by t…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 17, 2022 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).