Arranging and/or clearing speech-to-text content without a user providing express instructions

US12431138B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12431138-B2
Application number  US-202418677629-A
Country             US
Kind code           B2
Filing date         May 29, 2024
Priority date       May 17, 2021
Publication date    Sep 30, 2025
Grant date          Sep 30, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations described herein relate to an application and/or automated assistant that can identify arrangement operations to perform for arranging text during speech-to-text operations—without a user having to expressly identify the arrangement operations. In some instances, a user that is dictating a document (e.g., an email, a text message, etc.) can provide a spoken utterance to an application in order to incorporate textual content. However, in some of these instances, certain corresponding arrangements are needed for the textual content in the document. The textual content that is derived from the spoken utterance can be arranged by the application based on an intent, vocalization features, and/or contextual features associated with the spoken utterance and/or a type of the application associated with the document, without the user expressly identifying the corresponding arrangements. In this way, the application can infer content arrangement operations from a spoken utterance that only specifies the textual content.
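
As a rough illustration of how vocalization features might drive arrangement, the sketch below starts a new paragraph whenever the speaker pauses for long enough. It is not taken from the patent: the abstract does not say which vocalization features are used or how, so the choice of pauses, the `Segment` type, the `arrange_by_pauses` function, and the one-second threshold are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical threshold: the source does not state any pause length.
PARAGRAPH_PAUSE_S = 1.0


@dataclass
class Segment:
    text: str              # transcribed words for this stretch of speech
    pause_before_s: float  # silence preceding the segment, in seconds


def arrange_by_pauses(segments, threshold=PARAGRAPH_PAUSE_S):
    """Join segments, starting a new paragraph after a long pause."""
    paragraphs = []
    for seg in segments:
        # The first segment always opens a paragraph; later segments open
        # a new one only when the preceding pause reaches the threshold.
        if not paragraphs or seg.pause_before_s >= threshold:
            paragraphs.append([seg.text])
        else:
            paragraphs[-1].append(seg.text)
    return "\n\n".join(" ".join(p) for p in paragraphs)


dictated = [
    Segment("Hi Adam,", 0.0),
    Segment("thanks for the update.", 1.5),
    Segment("See you Friday.", 0.2),
]
print(arrange_by_pauses(dictated))
```

The user in this example only speaks the words; the blank line after the greeting is inferred from the pause rather than from a spoken command such as "new paragraph".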

First claim

Opening claim text (preview).

We claim:

1. A method implemented by one or more processors, the method comprising: receiving, at a computing device, a spoken utterance that is directed to a first application from a user, wherein the spoken utterance corresponds to a request for the first application to perform a speech-to-text operation for incorporating text into a field of a second application; generating, based on the spoken utterance, textual content data that characterizes textual content to be incorporated into the field of the second application, wherein the second application is different from the first application; generating, based on a type of application of the second application, content arrangement data that characterizes an arrangement, within the field of the second application, of a first portion of the textual content relative to a second portion of the textual content, wherein the content arrangement data that characterizes the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content and that is generated based on the type of the application differs based on different types of applications corresponding to the second application; and causing, based on the textual content data and the content arrangement data, the textual content to be incorporated into a field of the second application according to the arrangement, in response to the spoken utterance.

2. The method of claim 1, wherein the content arrangement data is generated further based on one or more prior interactions that involved the user providing other textual content to the type of application corresponding to the second application.

3. The method of claim 1, further comprising: determining, based on the request, the type of the second application.

4. The method of claim 3, further comprising: in response to determining that the type of the second application is a first type: generating, based on the type of the second application being the first type, first content arrangement data that characterizes a first arrangement as the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content.

5. The method of claim 4, further comprising: in response to determining that the type of the second application is a second type: generating, based on the type of the second application being the second type, second content arrangement data that characterizes a second arrangement as the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content, wherein the second arrangement differs from the first arrangement.

6. The method of claim 5, wherein the first type is one of an email application or a text messaging application, and wherein the second type is the other one of the email application or the text messaging application.

7. A system, comprising: at least one processor; and memory storing instructions that, when executed, cause the at least one processor to be operable to: receive, at a computing device, a spoken utterance that is directed to a first application from a user, wherein the spoken utterance corresponds to a request for the first application to perform a speech-to-text operation for incorporating text into a field of a second application; generate, based on the spoken utterance, textual content data that characterizes textual content to be incorporated into the field of the second application, wherein the second application is different from the first application; generate, based on a type of application of the second application, content arrangement data that characterizes an arrangement, within the field of the second application, of a first portion of the textual content relative to a second portion of the textual content, wherein the content arrangement data that characterizes the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content and that is generated based on the type of the application differs based on different types of applications corresponding to the second application; and cause, based on the textual content data and the content arrangement data, the textual content to be incorporated into a field of the second application according to the arrangement, in response to the spoken utterance.

8. The system of claim 7, wherein the content arrangement data is generated further based on one or more prior interactions that involved the user providing other textual content to the type of application corresponding to the second application.

9. The system of claim 7, wherein the at least one processor is further operable to: determine, based on the request, the type of the second application.

10. The system of claim 9, wherein the at least one processor is further operable to: in response to determining that the type of the second application is a first type: generate, based on the type of the second application being the first type, first content arrangement data that characterizes a first arrangement as the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content.

11. The system of claim 10, wherein the at least one processor is further operable to: in response to determining that the type of the second application is a second type: generate, based on the type of the second application being the second type, second content arrangement data that characterizes a second arrangement as the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content, wherein the second arrangement differs from the first arrangement.

12. The system of claim 11, wherein the first type is one of an email application or a text messaging application, and wherein the second type is the other one of the email application or the text messaging application.

13. A non-transitory computer-readable storage medium storing instructions that, when executed, cause at least one processor to perform operations, the operations comprising: receiving, at a computing device, a spoken utterance that is directed to a first application from a user, wherein the spoken utterance corresponds to a request for the first application to perform a speech-to-text operation for incorporating text into a field of a second application; generating, based on the spoken utterance, textual content data that characterizes textual content to be incorporated into the field of the second application, wherein the second application is different from the first application; generating, based on a type of application of the second application, content arrangement data that characterizes an arrangement, within the field of the second application, of a first portion of the textual content relative to a second portion of the textual content, wherein the content arrangement data that characterizes the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content and that is generated based on the type of the application differs based on different types of applications corresponding to the second application; and causing, based on the textual content data and the content arrangement data, the textual content to be incorporated into a field of the
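
The application-type dependence recited in claims 1 and 3 to 6 can be pictured with a minimal sketch in which the same two portions of dictated text are arranged differently for an email application and a text messaging application. The claims only require that the arrangement differ by application type; the specific separators chosen here, and the names `AppType`, `ContentArrangement`, `ARRANGEMENTS`, `generate_arrangement`, and `incorporate`, are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class AppType(Enum):
    EMAIL = "email"
    TEXT_MESSAGE = "text_message"


@dataclass(frozen=True)
class ContentArrangement:
    # What is placed between the first and second portion of the text.
    separator: str


# Hypothetical mapping: the claims do not say which arrangement goes with
# which application type, only that the two arrangements differ.
ARRANGEMENTS = {
    AppType.EMAIL: ContentArrangement(separator="\n\n"),
    AppType.TEXT_MESSAGE: ContentArrangement(separator=" "),
}


def generate_arrangement(app_type: AppType) -> ContentArrangement:
    """Pick content arrangement data from the type of the target application."""
    return ARRANGEMENTS[app_type]


def incorporate(first_portion: str, second_portion: str, app_type: AppType) -> str:
    """Return the text as it would be placed into the target field."""
    arrangement = generate_arrangement(app_type)
    return first_portion + arrangement.separator + second_portion


email_text = incorporate("Hi Adam,", "I am running late.", AppType.EMAIL)
sms_text = incorporate("Hi Adam,", "I am running late.", AppType.TEXT_MESSAGE)
```

In this sketch the email field receives the greeting and body as separate paragraphs, while the text message field receives them on one line. Claims 2 and 8 additionally allow prior user interactions to influence the arrangement, which this example does not model.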

Assignees

Google LLC

Inventors

Classifications

  • Execution procedure of a spoken command · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Semantic analysis · CPC title

  • Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191) · CPC title

  • Parsing for meaning understanding · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12431138B2 cover?
Implementations described herein relate to an application and/or automated assistant that can identify arrangement operations to perform for arranging text during speech-to-text operations—without a user having to expressly identify the arrangement operations. In some instances, a user that is dictating a document (e.g., an email, a text message, etc.) can provide a spoken utterance to an appli…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 30, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).