Arranging and/or clearing speech-to-text content without a user providing express instructions

US12033637B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12033637-B2
Application number	US-202117337804-A
Country	US
Kind code	B2
Filing date	Jun 3, 2021
Priority date	May 17, 2021
Publication date	Jul 9, 2024
Grant date	Jul 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations described herein relate to an application and/or automated assistant that can identify arrangement operations to perform for arranging text during speech-to-text operations—without a user having to expressly identify the arrangement operations. In some instances, a user that is dictating a document (e.g., an email, a text message, etc.) can provide a spoken utterance to an application in order to incorporate textual content. However, in some of these instances, certain corresponding arrangements are needed for the textual content in the document. The textual content that is derived from the spoken utterance can be arranged by the application based on an intent, vocalization features, and/or contextual features associated with the spoken utterance and/or a type of the application associated with the document, without the user expressly identifying the corresponding arrangements. In this way, the application can infer content arrangement operations from a spoken utterance that only specifies the textual content.
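As an illustration only, the idea of inferring arrangement from a vocalization feature can be sketched in Python. The sketch follows the pause-based reading in claims 2 and 5 (a duration of time between spoken portions decides whether a carriage return separates them). The `SpokenSegment` type, the `arrange` function, and the 1.5-second threshold are all assumptions made for this example; the patent does not specify them.

```python
from dataclasses import dataclass


@dataclass
class SpokenSegment:
    text: str       # recognized text for this portion of the utterance
    start_s: float  # time the portion started, in seconds
    end_s: float    # time the portion ended, in seconds


# Hypothetical threshold: a pause at least this long becomes a line break.
LINE_BREAK_PAUSE_S = 1.5


def arrange(segments, pause_threshold_s=LINE_BREAK_PAUSE_S):
    """Join segments, inserting a newline wherever the pause is long enough."""
    if not segments:
        return ""
    parts = [segments[0].text]
    for prev, cur in zip(segments, segments[1:]):
        pause = cur.start_s - prev.end_s
        separator = "\n" if pause >= pause_threshold_s else " "
        parts.append(separator + cur.text)
    return "".join(parts)


segments = [
    SpokenSegment("Hi Sam", 0.0, 0.8),
    SpokenSegment("are we still on for lunch", 2.6, 4.4),
    SpokenSegment("tomorrow", 4.6, 5.1),
]
arranged = arrange(segments)
print(arranged)
```

Here the 1.8-second pause after "Hi Sam" produces a line break, while the 0.2-second pause before "tomorrow" produces an ordinary space. A real system would presumably combine this signal with intent and application context rather than rely on a single fixed threshold.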

First claim

Opening claim text (preview).

We claim:

1. A method implemented by one or more processors, the method comprising: receiving, at a computing device, a spoken utterance that is directed to a first application from a user, wherein the spoken utterance corresponds to a request for the first application to perform a speech-to-text operation for incorporating text into a field of a second application; generating, based on the spoken utterance, textual content data that characterizes textual content to be incorporated into the field of the second application, wherein the second application is different from the first application; generating, based on an intent associated with the spoken utterance, content arrangement data that characterizes an arrangement, within the field of the second application, of a first portion of the textual content relative to a second portion of the textual content; causing, based on the textual content data and the content arrangement data, the textual content to be incorporated into the field of the second application according to the arrangement, in response to the spoken utterance; receiving, at the computing device, an additional spoken utterance that is directed to the first application from the user, wherein the additional spoken utterance corresponds to an additional request for the first application to perform an additional speech-to-text operation for incorporating additional textual content into the field of the second application; and causing, in response to the additional spoken utterance, the second application to incorporate the additional textual content into the field of the second application; and causing, in response to the additional spoken utterance, the second application to perform one or more formatting operations that modify arrangement of the additional textual content relative to the textual content within the field, wherein the one or more formatting operations are not expressly identified by the user via the additional spoken utterance.

2. The method of claim 1, wherein generating the content arrangement data includes: determining a duration of time between a first spoken portion of the spoken utterance and a second spoken portion of the spoken utterance, wherein the arrangement of the first portion of the textual content relative to the second portion of the textual content is based on the duration of time.

3. The method of claim 2, wherein the arrangement includes a vertical position of the first portion of the textual content in the field of the second application relative to the second portion of the textual content.

4. The method of claim 3, wherein causing the textual content to be incorporated into the field of the second application according to the arrangement includes: causing, at the field of the second application, incorporation of the vertical position of the first portion of the textual content relative to the second portion of the textual content.

5. The method of claim 4, wherein causing incorporation of the vertical position of the first portion includes: incorporating carriage return data into the field of the second application following the first portion of the textual content, and incorporating the second portion of the textual content following the carriage return data.

6. The method of claim 1, wherein generating the textual content data includes: identifying one or more punctuation symbols to include in the textual content to be incorporated into the field of the second application, wherein the spoken utterance does not expressly identify a punctuation symbol to be incorporated into the field of the second application.

7. The method of claim 1, wherein the textual content data characterizes the natural language content embodied in the spoken utterance, and wherein the content arrangement data characterizes a formatting command that, when executed by the second application, causes the second application to arrange the first portion of the textual content separate from the second portion of the textual content within the field of the second application.

8. The method of claim 1, wherein the first application is an automated assistant and the second application is a word processing application, and the method further comprises: identifying the one or more formatting operations based on the additional spoken utterance and the textual content incorporated into the field of the second application.

9. A method implemented by one or more processors, the method comprising: receiving, at a computing device, a first spoken utterance that corresponds to a request for a first application to perform a speech-to-text operation for a user; causing, based on the first spoken utterance, textual content to be rendered within a field of a second application, wherein the textual content includes natural language content of the first spoken utterance; receiving, from the user, a second spoken utterance that corresponds to an additional request for the first application to remove a portion of the textual content from the field of the second application, wherein the additional request does not expressly identify the portion of the textual content to be removed; determining, in response to the second spoken utterance, an amount of content to remove from the textual content that is rendered within the field of the second application, wherein determining the amount of content to remove from the textual content comprises: identifying a length of a first segment of text of the textual content, wherein the first segment of text is a first portion of the textual content that was most recently incorporated into the field of the application; causing, in response to the second spoken utterance, the amount of content to be removed from the textual content that is rendered within the field of the second application; receiving, at the computing device, an additional instance of the second spoken utterance for removing an additional portion of the textual content from the field of the application, determining an additional amount of content to remove from the textual content, wherein the additional amount of content includes a second segment of the textual content having a length that is longer than the first segment of the textual content; and causing, in response to the additional instance of the second spoken utterance, the additional amount of content to be removed from the textual content that is rendered within the field of the second application.

10. The method of claim 9, wherein the amount of content to be removed is further based on a vocalization feature exhibited by the user when providing at least a portion of the first spoken utterance.

11. The method of claim 10, wherein the vocalization feature includes a duration of time between separate portions of the first spoken utterance, and wherein the separate portions of the first spoken utterance describe different respective portions of the textual content.

12. The method of claim 10, wherein the vocalization feature includes an intonation characteristic embodied in the first spoken utterance.

13. The method of claim 11, wherein determining the amount of content to remove from the textual content includes: determining that the vocalization feature of the first spoken utterance includes an express pronunciation of individual natural language characters, wherein the textual content that is rendered within the field of the second application includes the individual natural language characters; and determining a quantity of the individual natural language characters to remove from the textual content that is rendered wi
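The removal behaviour in claim 9 (a repeated removal request takes out a longer stretch of text than the first one did) can be illustrated with a small Python sketch. This is one possible reading, not the patented implementation: the `DictationField` class, its method names, and the doubling rule for repeated requests are hypothetical choices made for the example.

```python
class DictationField:
    """Toy text field that tracks dictated segments in insertion order."""

    def __init__(self):
        self.segments = []     # dictated segments, oldest first
        self.clear_streak = 0  # consecutive removal requests so far

    def dictate(self, text):
        """Append a newly dictated segment and reset the removal streak."""
        self.segments.append(text)
        self.clear_streak = 0

    def clear(self):
        """Remove recent content; each repeated request removes twice as many
        segments as the previous one (assumed rule). Returns the removed text."""
        self.clear_streak += 1
        count = min(2 ** (self.clear_streak - 1), len(self.segments))
        cut = len(self.segments) - count
        removed = self.segments[cut:]
        del self.segments[cut:]
        return " ".join(removed)

    @property
    def text(self):
        return " ".join(self.segments)


field = DictationField()
for spoken in ["Dear team,", "the meeting moved", "to Friday", "at noon"]:
    field.dictate(spoken)

first = field.clear()   # removes only the most recently dictated segment
second = field.clear()  # repeated request removes a longer stretch
print(repr(first), repr(second), repr(field.text))
```

The first request removes only the most recently incorporated segment, and the repeated request removes a longer one, without the user saying what to delete. Claims 10 to 13 indicate that vocalization features such as pauses or spelled-out characters can also influence the amount removed, which this sketch does not model.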

Assignees

Google LLC

Inventors

Classifications

G10L15/22 (Primary): Procedures used during a speech recognition process, e.g. man-machine dialogue
G10L2015/223: Execution procedure of a spoken command
G06F40/103: Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191)
G06F40/191: Automatic line break hyphenation
G10L15/26 (Primary): Speech to text systems (G10L15/08 takes precedence)

Patent family

Related publications grouped by family.

View patent family 83998866

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12033637B2 cover?: Implementations described herein relate to an application and/or automated assistant that can identify arrangement operations to perform for arranging text during speech-to-text operations—without a user having to expressly identify the arrangement operations. In some instances, a user that is dictating a document (e.g., an email, a text message, etc.) can provide a spoken utterance to an appli…
Who is the assignee on this patent?: Google LLC
What technology area does this patent fall under?: Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?: Publication date Jul 9, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Information processing device and information processing method

Device, system and method for controlling a plurality of voice recognition devices

Speech recognition arbitration logic

Canonicalizing search queries to natural language questions

Information processing device, information processing method, and program

Training punctuation models
