Information processing device and information processing method
US-11335334-B2 · May 17, 2022 · US
US12033637B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12033637-B2 |
| Application number | US-202117337804-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 3, 2021 |
| Priority date | May 17, 2021 |
| Publication date | Jul 9, 2024 |
| Grant date | Jul 9, 2024 |
Abstract
Implementations described herein relate to an application and/or automated assistant that can identify arrangement operations to perform for arranging text during speech-to-text operations—without a user having to expressly identify the arrangement operations. In some instances, a user that is dictating a document (e.g., an email, a text message, etc.) can provide a spoken utterance to an application in order to incorporate textual content. However, in some of these instances, certain corresponding arrangements are needed for the textual content in the document. The textual content that is derived from the spoken utterance can be arranged by the application based on an intent, vocalization features, and/or contextual features associated with the spoken utterance and/or a type of the application associated with the document, without the user expressly identifying the corresponding arrangements. In this way, the application can infer content arrangement operations from a spoken utterance that only specifies the textual content.
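The abstract describes the arrangement logic only in general terms. Claims 2 to 5 make one case concrete: the pause between two spoken portions decides whether the second portion is placed on a new line, using carriage return data. Below is a minimal Python sketch of that one idea, written under stated assumptions and not as the patented implementation. The `Segment` type, the `arrange` function, and the 1.5-second threshold are hypothetical, and the recognizer is assumed to report a start and end time for each spoken portion.

```python
from dataclasses import dataclass


# Hypothetical recognizer output: the text of one spoken portion and its timing in seconds.
@dataclass
class Segment:
    text: str
    start: float
    end: float


# Hypothetical threshold; the patent does not specify a value.
PAUSE_FOR_NEW_LINE = 1.5


def arrange(segments: list[Segment], pause_threshold: float = PAUSE_FOR_NEW_LINE) -> str:
    """Join spoken portions, inserting a line break after a long pause."""
    if not segments:
        return ""
    parts = [segments[0].text]
    for prev, cur in zip(segments, segments[1:]):
        gap = cur.start - prev.end  # duration of time between the two spoken portions
        # A long pause moves the next portion to a new line (a different vertical position);
        # a short pause keeps it on the same line, separated by a space.
        parts.append("\n" if gap >= pause_threshold else " ")
        parts.append(cur.text)
    return "".join(parts)


segments = [
    Segment("Hi Jim", 0.0, 0.8),
    Segment("thanks for the notes", 2.9, 4.1),
    Segment("see you Friday", 4.3, 5.2),
]
print(repr(arrange(segments)))  # 'Hi Jim\nthanks for the notes see you Friday'
```

Here the 2.1-second gap after "Hi Jim" produces a line break, while the 0.2-second gap keeps the last two portions on one line. According to the abstract, a real system may also weigh the intent, contextual features, and the type of the receiving application, none of which this sketch models.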
Claims (preview)
We claim:
1. A method implemented by one or more processors, the method comprising: receiving, at a computing device, a spoken utterance that is directed to a first application from a user, wherein the spoken utterance corresponds to a request for the first application to perform a speech-to-text operation for incorporating text into a field of a second application; generating, based on the spoken utterance, textual content data that characterizes textual content to be incorporated into the field of the second application, wherein the second application is different from the first application; generating, based on an intent associated with the spoken utterance, content arrangement data that characterizes an arrangement, within the field of the second application, of a first portion of the textual content relative to a second portion of the textual content; causing, based on the textual content data and the content arrangement data, the textual content to be incorporated into the field of the second application according to the arrangement, in response to the spoken utterance; receiving, at the computing device, an additional spoken utterance that is directed to the first application from the user, wherein the additional spoken utterance corresponds to an additional request for the first application to perform an additional speech-to-text operation for incorporating additional textual content into the field of the second application; and causing, in response to the additional spoken utterance, the second application to incorporate the additional textual content into the field of the second application; and causing, in response to the additional spoken utterance, the second application to perform one or more formatting operations that modify arrangement of the additional textual content relative to the textual content within the field, wherein the one or more formatting operations are not expressly identified by the user via the additional spoken utterance.
2. The method of claim 1, wherein generating the content arrangement data includes: determining a duration of time between a first spoken portion of the spoken utterance and a second spoken portion of the spoken utterance, wherein the arrangement of the first portion of the textual content relative to the second portion of the textual content is based on the duration of time.
3. The method of claim 2, wherein the arrangement includes a vertical position of the first portion of the textual content in the field of the second application relative to the second portion of the textual content.
4. The method of claim 3, wherein causing the textual content to be incorporated into the field of the second application according to the arrangement includes: causing, at the field of the second application, incorporation of the vertical position of the first portion of the textual content relative to the second portion of the textual content.
5. The method of claim 4, wherein causing incorporation of the vertical position of the first portion includes: incorporating carriage return data into the field of the second application following the first portion of the textual content, and incorporating the second portion of the textual content following the carriage return data.
6. The method of claim 1, wherein generating the textual content data includes: identifying one or more punctuation symbols to include in the textual content to be incorporated into the field of the second application, wherein the spoken utterance does not expressly identify a punctuation symbol to be incorporated into the field of the second application.
7. The method of claim 1, wherein the textual content data characterizes the natural language content embodied in the spoken utterance, and wherein the content arrangement data characterizes a formatting command that, when executed by the second application, causes the second application to arrange the first portion of the textual content separate from the second portion of the textual content within the field of the second application.
8. The method of claim 1, wherein the first application is an automated assistant and the second application is a word processing application, and the method further comprises: identifying the one or more formatting operations based on the additional spoken utterance and the textual content incorporated into the field of the second application.
9. A method implemented by one or more processors, the method comprising: receiving, at a computing device, a first spoken utterance that corresponds to a request for a first application to perform a speech-to-text operation for a user; causing, based on the first spoken utterance, textual content to be rendered within a field of a second application, wherein the textual content includes natural language content of the first spoken utterance; receiving, from the user, a second spoken utterance that corresponds to an additional request for the first application to remove a portion of the textual content from the field of the second application, wherein the additional request does not expressly identify the portion of the textual content to be removed; determining, in response to the second spoken utterance, an amount of content to remove from the textual content that is rendered within the field of the second application, wherein determining the amount of content to remove from the textual content comprises: identifying a length of a first segment of text of the textual content, wherein the first segment of text is a first portion of the textual content that was most recently incorporated into the field of the application; causing, in response to the second spoken utterance, the amount of content to be removed from the textual content that is rendered within the field of the second application; receiving, at the computing device, an additional instance of the second spoken utterance for removing an additional portion of the textual content from the field of the application, determining an additional amount of content to remove from the textual content, wherein the additional amount of content includes a second segment of the textual content having a length that is longer than the first segment of the textual content; and causing, in response to the additional instance of the second spoken utterance, the additional amount of content to be removed from the textual content that is rendered within the field of the second application.
10. The method of claim 9, wherein the amount of content to be removed is further based on a vocalization feature exhibited by the user when providing at least a portion of the first spoken utterance.
11. The method of claim 10, wherein the vocalization feature includes a duration of time between separate portions of the first spoken utterance, and wherein the separate portions of the first spoken utterance describe different respective portions of the textual content.
12. The method of claim 10, wherein the vocalization feature includes an intonation characteristic embodied in the first spoken utterance.
13. The method of claim 11, wherein determining the amount of content to remove from the textual content includes: determining that the vocalization feature of the first spoken utterance includes an express pronunciation of individual natural language characters, wherein the textual content that is rendered within the field of the second application includes the individual natural language characters; and determining a quantity of the individual natural language characters to remove from the textual content that is rendered wi
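Claim 9 describes a removal request that does not say how much to delete. The first request removes the most recently incorporated segment, and a repeated request removes a longer segment. A minimal Python sketch of that escalation follows, assuming the receiving field keeps track of the last dictated segment. The `DictationField` class, its methods, and the escalation policy (each repeated request removes at least twice as many characters as the previous one, extended back to a space so no word is split) are hypothetical illustrations, not the patent's method. The vocalization-based variants in claims 10 to 13 are not modeled.

```python
class DictationField:
    """Hypothetical field of a second application that receives dictated text."""

    def __init__(self) -> None:
        self.text = ""
        self.last_segment_len = 0  # length of the most recently incorporated segment
        self.prev_removed = 0      # characters removed by the previous request (0 = no streak)

    def incorporate(self, segment: str) -> None:
        # Separate consecutive dictated segments with a single space.
        if self.text:
            segment = " " + segment
        self.text += segment
        self.last_segment_len = len(segment)
        self.prev_removed = 0  # new dictation ends any removal streak

    def remove(self) -> str:
        """Remove content without the user saying how much; return what was removed."""
        if self.prev_removed == 0:
            # First request: remove the most recently incorporated segment.
            amount = self.last_segment_len
        else:
            # Repeated request (hypothetical policy): at least twice the previous amount,
            # moving the cut point back to a space so that no word is split.
            cut = max(len(self.text) - 2 * self.prev_removed, 0)
            while cut > 0 and self.text[cut] != " ":
                cut -= 1
            amount = len(self.text) - cut
        amount = min(amount, len(self.text))
        keep = len(self.text) - amount
        removed = self.text[keep:]
        self.text = self.text[:keep]
        self.prev_removed = amount
        return removed


field = DictationField()
for segment in ("Hi Jim, thanks for the notes.", "I will send the draft tomorrow.", "See you Friday."):
    field.incorporate(segment)
print(repr(field.remove()))  # ' See you Friday.'
print(repr(field.remove()))  # ' I will send the draft tomorrow.'
print(repr(field.text))      # 'Hi Jim, thanks for the notes.'
```

In this run the second request removes 32 characters, compared with 16 for the first, so the second removed segment is longer, as claim 9 requires. A third request would clear the remaining text. Any new dictation resets the streak.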
CPC classification titles:
- Procedures used during a speech recognition process, e.g. man-machine dialogue
- Execution procedure of a spoken command
- Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191)
- Automatic line break hyphenation
- Speech to text systems (G10L15/08 takes precedence)