Arranging and/or clearing speech-to-text content without a user providing express instructions
US-2022366911-A1 · Nov 17, 2022 · US
US12431138B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12431138-B2 |
| Application number | US-202418677629-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 29, 2024 |
| Priority date | May 17, 2021 |
| Publication date | Sep 30, 2025 |
| Grant date | Sep 30, 2025 |
Official abstract text for this publication.
Implementations described herein relate to an application and/or automated assistant that can identify arrangement operations to perform for arranging text during speech-to-text operations—without a user having to expressly identify the arrangement operations. In some instances, a user that is dictating a document (e.g., an email, a text message, etc.) can provide a spoken utterance to an application in order to incorporate textual content. However, in some of these instances, certain corresponding arrangements are needed for the textual content in the document. The textual content that is derived from the spoken utterance can be arranged by the application based on an intent, vocalization features, and/or contextual features associated with the spoken utterance and/or a type of the application associated with the document, without the user expressly identifying the corresponding arrangements. In this way, the application can infer content arrangement operations from a spoken utterance that only specifies the textual content.
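As a rough illustration of the idea in the abstract, the sketch below arranges the same dictated portions differently depending on the type of the target application. It is a minimal sketch under stated assumptions, not the patented implementation: the `Arrangement` class, the `ARRANGEMENTS` table, the `arrange` function, and the choice of separators for each application type are all hypothetical and do not appear in the publication.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Arrangement:
    """Hypothetical stand-in for 'content arrangement data'."""

    separator: str  # text placed between consecutive portions


# Assumed mapping from application type to arrangement (illustrative only).
ARRANGEMENTS = {
    "email": Arrangement(separator="\n\n"),      # greeting, body, sign-off on separate paragraphs
    "text_message": Arrangement(separator=" "),  # everything on one line
}


def arrange(portions: list[str], app_type: str) -> str:
    """Join the portions of the textual content according to the application type."""
    # Unknown application types fall back to a single-line arrangement.
    arrangement = ARRANGEMENTS.get(app_type, Arrangement(separator=" "))
    return arrangement.separator.join(portions)


if __name__ == "__main__":
    dictated = ["Hi Adam,", "I will be late tomorrow.", "Thanks, William"]
    print(arrange(dictated, "email"))
    print(arrange(dictated, "text_message"))
```

The user only dictates the words; the line breaks in the email case come from the application type, which is the inference the abstract describes.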
Opening claim text (preview).
We claim:

1. A method implemented by one or more processors, the method comprising: receiving, at a computing device, a spoken utterance that is directed to a first application from a user, wherein the spoken utterance corresponds to a request for the first application to perform a speech-to-text operation for incorporating text into a field of a second application; generating, based on the spoken utterance, textual content data that characterizes textual content to be incorporated into the field of the second application, wherein the second application is different from the first application; generating, based on a type of application of the second application, content arrangement data that characterizes an arrangement, within the field of the second application, of a first portion of the textual content relative to a second portion of the textual content, wherein the content arrangement data that characterizes the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content and that is generated based on the type of the application differs based on different types of applications corresponding to the second application; and causing, based on the textual content data and the content arrangement data, the textual content to be incorporated into a field of the second application according to the arrangement, in response to the spoken utterance.

2. The method of claim 1, wherein the content arrangement data is generated further based on one or more prior interactions that involved the user providing other textual content to the type of application corresponding to the second application.

3. The method of claim 1, further comprising: determining, based on the request, the type of the second application.

4. The method of claim 3, further comprising: in response to determining that the type of the second application is a first type: generating, based on the type of the second application being the first type, first content arrangement data that characterizes a first arrangement as the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content.

5. The method of claim 4, further comprising: in response to determining that the type of the second application is a second type: generating, based on the type of the second application being the second type, second content arrangement data that characterizes a second arrangement as the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content, wherein the second arrangement differs from the first arrangement.

6. The method of claim 5, wherein the first type is one of an email application or a text messaging application, and wherein the second type is the other one of the email application or the text messaging application.

7. A system, comprising: at least one processor; and memory storing instructions that, when executed, cause the at least one processor to be operable to: receive, at a computing device, a spoken utterance that is directed to a first application from a user, wherein the spoken utterance corresponds to a request for the first application to perform a speech-to-text operation for incorporating text into a field of a second application; generate, based on the spoken utterance, textual content data that characterizes textual content to be incorporated into the field of the second application, wherein the second application is different from the first application; generate, based on a type of application of the second application, content arrangement data that characterizes an arrangement, within the field of the second application, of a first portion of the textual content relative to a second portion of the textual content, wherein the content arrangement data that characterizes the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content and that is generated based on the type of the application differs based on different types of applications corresponding to the second application; and cause, based on the textual content data and the content arrangement data, the textual content to be incorporated into a field of the second application according to the arrangement, in response to the spoken utterance.

8. The system of claim 7, wherein the content arrangement data is generated further based on one or more prior interactions that involved the user providing other textual content to the type of application corresponding to the second application.

9. The system of claim 7, wherein the at least one processor is further operable to: determine, based on the request, the type of the second application.

10. The system of claim 9, wherein the at least one processor is further operable to: in response to determining that the type of the second application is a first type: generate, based on the type of the second application being the first type, first content arrangement data that characterizes a first arrangement as the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content.

11. The system of claim 10, wherein the at least one processor is further operable to: in response to determining that the type of the second application is a second type: generate, based on the type of the second application being the second type, second content arrangement data that characterizes a second arrangement as the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content, wherein the second arrangement differs from the first arrangement.

12. The system of claim 11, wherein the first type is one of an email application or a text messaging application, and wherein the second type is the other one of the email application or the text messaging application.

13. A non-transitory computer-readable storage medium storing instructions that, when executed, cause at least one processor to perform operations, the operations comprising: receiving, at a computing device, a spoken utterance that is directed to a first application from a user, wherein the spoken utterance corresponds to a request for the first application to perform a speech-to-text operation for incorporating text into a field of a second application; generating, based on the spoken utterance, textual content data that characterizes textual content to be incorporated into the field of the second application, wherein the second application is different from the first application; generating, based on a type of application of the second application, content arrangement data that characterizes an arrangement, within the field of the second application, of a first portion of the textual content relative to a second portion of the textual content, wherein the content arrangement data that characterizes the arrangement, within the field of the second application, of the first portion of the textual content relative to the second portion of the textual content and that is generated based on the type of the application differs based on different types of applications corresponding to the second application; and causing, based on the textual content data and the content arrangement data, the textual content to be incorporated into a field of the
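The steps recited in the independent claims, together with the prior-interaction refinement of claims 2 and 8, can be loosely sketched as a small pipeline. Everything here is an assumption made for illustration: `TargetField`, `split_portions`, `choose_separator`, and `dictate` are hypothetical names, sentences are naively treated as the "portions" of textual content, and a separator string stands in for the content arrangement data. The claims do not prescribe any of these choices.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class TargetField:
    """Hypothetical field of the second application."""

    app_type: str  # e.g. "email" or "text_message"
    text: str = ""


def split_portions(transcript: str) -> list[str]:
    """Assumption: each sentence of the transcript is one portion of textual content."""
    return [part.strip() + "." for part in transcript.split(".") if part.strip()]


def choose_separator(app_type: str, prior_separators: dict[str, list[str]]) -> str:
    """Pick an arrangement for this application type.

    If the user has prior interactions with this type of application, reuse the
    separator they used most often; otherwise fall back to an assumed default.
    """
    history = prior_separators.get(app_type, [])
    if history:
        return Counter(history).most_common(1)[0][0]
    return "\n\n" if app_type == "email" else " "


def dictate(transcript: str, target: TargetField,
            prior_separators: dict[str, list[str]]) -> TargetField:
    """Generate content, generate an arrangement, then incorporate into the field."""
    portions = split_portions(transcript)                            # textual content data
    separator = choose_separator(target.app_type, prior_separators)  # arrangement data
    target.text = separator.join(portions)                           # incorporate into field
    return target


if __name__ == "__main__":
    history = {"text_message": ["\n", "\n", " "]}
    print(repr(dictate("Running late. See you at six.", TargetField("email"), history).text))
    print(repr(dictate("Running late. See you at six.", TargetField("text_message"), history).text))
```

Keeping the arrangement decision in its own function mirrors the claim structure, where the arrangement is generated from the application type separately from the transcription itself.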
Execution procedure of a spoken command · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Semantic analysis · CPC title
Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191) · CPC title
Parsing for meaning understanding · CPC title