Voice commands for an automated assistant utilized in smart dictation

US12106758B2 · US · B2

Patent metadata
Publication number: US-12106758-B2
Application number: US-202117322765-A
Country: US
Kind code: B2
Filing date: May 17, 2021
Priority date: May 17, 2021
Publication date: Oct 1, 2024
Grant date: Oct 1, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods described herein relate to determining whether to incorporate recognized text, that corresponds to a spoken utterance of a user of a client device, into a transcription displayed at the client device, or to cause an assistant command, that is associated with the transcription and that is based on the recognized text, to be performed by an automated assistant implemented by the client device. The spoken utterance is received during a dictation session between the user and the automated assistant. Implementations can process, using automatic speech recognition model(s), audio data that captures the spoken utterance to generate the recognized text. Further, implementations can determine whether to incorporate the recognized text into the transcription or cause the assistant command to be performed based on touch input being directed to the transcription, a state of the transcription, and/or audio-based characteristic(s) of the spoken utterance.

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by one or more processors, the method comprising: receiving audio data that captures a spoken utterance of a user of a client device, the audio data being generated by one or more microphones of the client device; determining whether touch input of the user is being simultaneously directed to a transcription, that is displayed at the client device via a software application accessible at the client device, at the same time the audio data that captures the spoken utterance is received; in response to determining that no touch input of the user is being simultaneously directed to the transcription at the same time the audio data that captures the spoken utterance is received: determining to incorporate recognized text, that corresponds to the spoken utterance, into the transcription; in response to determining that touch input of the user is being simultaneously directed to the transcription at the same time the audio data that captures the spoken utterance is received: determining, based on one or more terms of the spoken utterance, whether to: incorporate the recognized text, that corresponds to the spoken utterance, into the transcription, or perform an assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance; in response to determining to incorporate the recognized text that corresponds to the spoken utterance into the transcription: automatically incorporating the recognized text that corresponds to the spoken utterance into the transcription; and in response to determining to perform the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance: causing an automated assistant to perform the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance.

2. The method of claim 1, further comprising: processing, using an automatic speech recognition (ASR) model, the audio data that captures the spoken utterance to generate the recognized text that corresponds to the spoken utterance.

3. The method of claim 2, further comprising: processing, using a natural language understanding (NLU) model, the recognized text that corresponds to the spoken utterance to generate annotated recognized text.

4. The method of claim 3, further comprising: determining the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance, wherein determining the assistant command, that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance, is based on the annotated recognized text.

5. The method of claim 1, wherein the touch input of the user is being directed to one or more textual segments of the transcription that is displayed at the client device.

6. The method of claim 5, wherein the touch input of the user graphically demarcates one or more of the textual segments of the transcription that is displayed at the client device.

7. The method of claim 6, wherein determining whether to incorporate the recognized text, that corresponds to the spoken utterance, into the transcription, or to perform the assistant command, that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance, comprises determining to perform the assistant command based on the touch input of the user graphically demarcating one or more of the textual segments of the transcription.

8. The method of claim 1, wherein the touch input of the user is being directed to one or more fields of the transcription that is displayed at the client device.

9. The method of claim 8, wherein determining whether to incorporate the recognized text, that corresponds to the spoken utterance, into the transcription, or to perform the assistant command, that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance, comprises determining to perform the assistant command based on the touch input of the user being directed to one or more fields of the transcription.

10. The method of claim 1, wherein automatically incorporating the recognized text that corresponds to the spoken utterance into the transcription comprises: causing the recognized text to be visually displayed to the user via the software application accessible at the client device as part of the transcription.

11. The method of claim 10, wherein causing the recognized text to be visually displayed to the user via the software application accessible at the client device as part of the transcription comprises causing the recognized text to be maintained in the transcription after additional text is incorporated into the transcription.

12. A system comprising: at least one processor; and memory storing instructions that, when executed, cause the at least one processor to be operable to: receive audio data that captures a spoken utterance of a user of a client device, the audio data being generated by one or more microphones of the client device; determine whether touch input of the user is being simultaneously directed to a transcription, that is displayed at the client device via a software application accessible at the client device, the audio data that captures the spoken utterance is received; in response to determining that no touch input of the user is being simultaneously directed to the transcription at the same time the audio data that captures the spoken utterance is received: determine to incorporate recognized text, that corresponds to the spoken utterance, into the transcription; in response to determining that touch input of the user is being simultaneously directed to the transcription at the same time the audio data that captures the spoken utterance is received: determine, based on one or more terms of the spoken utterance, whether to: incorporate the recognized text, that corresponds to the spoken utterance, into the transcription, or perform an assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance; in response to determining to incorporate the recognized text that corresponds to the spoken utterance into the transcription: automatically incorporate the recognized text that corresponds to the spoken utterance into the transcription; and in response to determining to perform the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance: cause an automated assistant to perform the assistant command that is associated with the transcription and that is based on the recognized text that corresponds to the spoken utterance.

13. The system of claim 12, wherein the at least one processor is further operable to: process, using an automatic speech recognition (ASR) model, the audio data that captures the spoken utterance to generate the recognized text that corresponds to the spoken utterance.

14. The system of claim 13, wherein the at least one processor is further operable to: process, using a natural language understanding (NLU) model, the recognized text that corresponds to the spoken utterance to generate annotated recognized text.

15. The system of claim 14, wherein the at least one processor is further operable to: determine the assistan
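
The branching recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the recognized text is already available (for example from the ASR step of claim 2), and the names `Utterance`, `Action`, `route`, and the `COMMAND_TERMS` vocabulary are hypothetical. The claim only says that, when touch input is directed to the transcription, the decision is "based on one or more terms of the spoken utterance"; the first-word lookup below is one assumed way to make that decision.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    INCORPORATE_TEXT = auto()   # add the recognized text to the transcription
    PERFORM_COMMAND = auto()    # hand the utterance to the automated assistant


# Hypothetical command vocabulary, chosen only for illustration.
COMMAND_TERMS = frozenset({"delete", "bold", "underline", "capitalize", "send"})


@dataclass(frozen=True)
class Utterance:
    recognized_text: str          # assumed output of a speech recognition step
    touch_on_transcription: bool  # touch directed at the transcription while audio arrived


def route(utterance: Utterance) -> Action:
    """Choose between dictation and an assistant command, following claim 1's branches."""
    # Branch 1: no simultaneous touch input, so the text is always dictated.
    if not utterance.touch_on_transcription:
        return Action.INCORPORATE_TEXT

    # Branch 2: touch input is present, so the terms of the utterance decide.
    # Assumption: an utterance whose first word is a known command term is a command.
    terms = utterance.recognized_text.lower().split()
    if terms and terms[0] in COMMAND_TERMS:
        return Action.PERFORM_COMMAND
    return Action.INCORPORATE_TEXT
```

Under these assumptions, `route(Utterance("delete this", False))` yields `Action.INCORPORATE_TEXT`, because without touch input the words are simply dictated, while the same words spoken with a touch on the transcription yield `Action.PERFORM_COMMAND`.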

Assignees

Google LLC

Inventors

Classifications

  • Execution procedure of a spoken command

  • Procedures used during a speech recognition process, e.g. man-machine dialogue

  • using natural language modelling

  • using a touch-screen or digitiser, e.g. input of commands through traced gestures

  • Machine learning

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12106758B2 cover?
Systems and methods described herein relate to determining whether to incorporate recognized text, that corresponds to a spoken utterance of a user of a client device, into a transcription displayed at the client device, or to cause an assistant command, that is associated with the transcription and that is based on the recognized text, to be performed by an automated assistant implemented by t…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Published Oct 1, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).