Method and system for processing user spoken utterance
US-2021193141-A1 · Jun 24, 2021 · US
US12248748B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12248748-B2 |
| Application number | US-202418414095-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 16, 2024 |
| Priority date | Jun 15, 2022 |
| Publication date | Mar 11, 2025 |
| Grant date | Mar 11, 2025 |
Official abstract text for this publication.
Systems and methods for generating encoded text representations of spoken utterances are disclosed. Audio data is received for a spoken utterance and analyzed to identify a nonverbal characteristic, such as a sentiment, a speaking rate, or a volume. An encoded text representation of the spoken utterance is generated, comprising a text transcription and a visual representation of the nonverbal characteristic. The visual representation comprises a geometric element, such as a graph or shape, or a variation in a text attribute, such as font, font size, or color. Analysis of the audio data and/or generation of the encoded text representation can be performed using machine learning.
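The abstract's pipeline (measure a nonverbal characteristic, then render it as a text attribute) can be illustrated with a minimal Python sketch. It assumes word-level timestamps and amplitude samples are already available from a speech recognizer, and that a sentiment label is supplied by a separate classifier. The `Word` type, the `encode` function, the color palette, and the 75% to 200% font-size range are hypothetical illustrations, not the patent's implementation. Volume is approximated by root-mean-square amplitude and speaking rate by words per second.

```python
import html
import math
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word. Timing and samples are assumed to come from an upstream ASR aligner."""
    text: str
    start: float          # start time in seconds
    end: float            # end time in seconds
    samples: list[float]  # amplitude samples in [-1.0, 1.0] covering this word


# Hypothetical palette: sentiment label (from a separate, assumed classifier) -> text color.
SENTIMENT_COLORS = {"positive": "green", "negative": "red", "neutral": "black"}


def rms(samples: list[float]) -> float:
    """Root-mean-square amplitude, used here as a simple proxy for volume."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speaking_rate(words: list[Word]) -> float:
    """Words per second from the first word's start to the last word's end."""
    duration = words[-1].end - words[0].start
    return len(words) / duration if duration > 0 else 0.0


def encode(words: list[Word], sentiment: str) -> str:
    """Render the transcription as HTML, varying font size with relative volume
    and color with sentiment, followed by the overall speaking rate."""
    if not words:
        return ""
    levels = [rms(w.samples) for w in words]
    mean = sum(levels) / len(levels) or 1.0  # avoid dividing by zero on silence
    color = SENTIMENT_COLORS.get(sentiment, "black")
    spans = []
    for word, level in zip(words, levels):
        # Louder-than-average words get a larger font, clamped to 75%..200%.
        scale = max(0.75, min(2.0, level / mean))
        spans.append(
            f'<span style="font-size:{scale:.0%};color:{color}">'
            f"{html.escape(word.text)}</span>"
        )
    return " ".join(spans) + f" <small>({speaking_rate(words):.1f} words/s)</small>"


words = [
    Word("really", 0.0, 0.5, [0.8, -0.8]),
    Word("fine", 0.5, 1.0, [0.2, -0.2]),
]
print(encode(words, "negative"))
```

Font size and color are only two of the text attributes the abstract lists; a geometric element such as a volume graph would need a rendering layer this sketch does not attempt.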
Opening claim text (preview).
We claim:
1. A mobile device for generating encoded text to convey nonverbal meaning based on audio inputs, the mobile device comprising: at least one hardware processor; at least one hardware display screen; and at least one non-transitory memory carrying instructions that, when executed by the at least one hardware processor, cause the mobile device to: analyze audio data for a spoken utterance using a text encoding model to identify a nonverbal characteristic including a sentiment of the spoken utterance; generate, by the text encoding model, an encoded representation of the spoken utterance, the encoded representation comprising a transcription and a visual representation of the nonverbal characteristic of the spoken utterance; generate, based on the nonverbal characteristic, a prompt to input a second spoken utterance comprising at least one suggestion for changes to one or more different nonverbal characteristics indicative of a different sentiment; and cause display, on the at least one hardware display screen, of the encoded representation and the prompt.
2. The mobile device of claim 1, wherein the instructions further cause the mobile device to: modify the encoded representation in response to a received input; and receive an indication that the modified encoded representation is approved.
3. The mobile device of claim 1, wherein the instructions further cause the mobile device to: receive second audio data for the second spoken utterance; and incorporate the encoded representation into a message or a post using a mobile application executing on the mobile device.
4. The mobile device of claim 1, wherein generating the encoded representation of the spoken utterance further causes the mobile device to: automatically insert into the encoded representation an emoji or a set of characters based on the identified nonverbal characteristic of the spoken utterance.
5. The mobile device of claim 1, wherein the instructions further cause the mobile device to: receive visual data for the spoken utterance, wherein the visual data comprises at least one image or video; and analyze the visual data, wherein the nonverbal characteristic is identified based at least in part on analysis of the visual data.
6. The mobile device of claim 1, wherein the text encoding model includes a machine learning model that is trained, using a training dataset, to generate encoded representations based on audio data of spoken utterances.
7. The mobile device of claim 1, wherein the visual representation of the one or more nonverbal characteristics comprises a geometric element or a variation in a text attribute.
8. The mobile device of claim 1, wherein identifying the nonverbal characteristic of the spoken utterance further causes the mobile device to: detect, using a speech analytics model, a pitch, a timbre, a tone of voice, an inflection, a volume, or a speaking rate, or a change in a pitch, a timbre, a tone of voice, an inflection, a volume, or a speaking rate corresponding to the nonverbal characteristic.
9. A method for generating encoded text representations to convey nonverbal information based on audio inputs, the method comprising: analyzing audio data of a spoken utterance using a text encoding model to identify a nonverbal characteristic including a sentiment of the spoken utterance; generating, by the text encoding model, an encoded representation of the spoken utterance, the encoded representation comprising a transcription and a visual representation of the nonverbal characteristic of the spoken utterance; generating, based on the identified nonverbal characteristic of the spoken utterance, a prompt to input a second spoken utterance comprising at least one suggestion for changes to one or more different nonverbal characteristics; and causing display, via a user interface, of the generated encoded representation and the prompt.
10. The method of claim 9, further comprising: modifying the generated encoded representation in response to a received input; and receiving an indication that the modified encoded representation is approved.
11. The method of claim 9, further comprising: receiving second audio data for the second spoken utterance; and incorporating the displayed encoded representation into a message or a post using a mobile application.
12. The method of claim 9, wherein generating the encoded representation of the spoken utterance comprises automatically inserting into the encoded representation an emoji or a set of characters based on the identified nonverbal characteristic of the spoken utterance.
13. The method of claim 9, further comprising: receiving visual data for the spoken utterance, wherein the visual data comprises at least one image or video; and analyzing the visual data, wherein the nonverbal characteristic is identified based at least in part on an analysis of the visual data.
14. The method of claim 9, wherein the text encoding model includes a machine learning model that is trained, using a training dataset, to generate encoded representations based on audio data of spoken utterances.
15. The method of claim 9, wherein the visual representation of the one or more nonverbal characteristics comprises a geometric element or a variation in a text attribute.
16. The method of claim 9, wherein identifying the nonverbal characteristic of the spoken utterance comprises: detecting, using a speech analytics model, a pitch, a timbre, a tone of voice, an inflection, a volume, or a speaking rate, or a change in a pitch, a timbre, a tone of voice, an inflection, a volume, or a speaking rate corresponding to the nonverbal characteristic.
17. At least one computer-readable medium, excluding transitory signals, carrying instructions that, when executed by a computing system, cause the computing system to perform operations to generate encoded to convey nonverbal information based on audio inputs, the operations comprising: analyzing audio data for a spoken utterance using a text encoding model to identify a nonverbal characteristic including a sentiment of the spoken utterance; generating, by the text encoding model, an encoded representation of the spoken utterance, the encoded representation comprising a transcription and a visual representation of the nonverbal characteristic of the spoken utterance; generate, based on the identified nonverbal characteristic of the spoken utterance, a prompt to input a second spoken utterance comprising at least one suggestion for changes to one or more different nonverbal characteristics indicative of a different sentiment; and cause display of the generated encoded representation and the prompt.
18. The at least one computer-readable medium of claim 17, wherein the operations further comprise: modifying the generated encoded representation in response to a received input; and receiving an indication that the modified encoded representation is approved.
19. The at least one computer-readable medium of claim 17, wherein the operations further comprise: receiving second audio data for the second spoken utterance; and incorporating the displayed encoded representation into a message or a post using a mobile application.
20. The at least one computer-readable medium of claim 17, wherein generating the encoded representation of the spoken utterance further comprises: automatically inserting into the encoded representation an emoji or a set of characters based on the identified nonverbal characteristic
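Claims 1 and 4 recite two steps beyond the transcription itself: a prompt suggesting how a second utterance could change its nonverbal characteristics to convey a different sentiment, and automatic insertion of an emoji. The Python sketch below is a hypothetical rule-based stand-in; the claims recite a text encoding model, which claims 6 and 14 say may be machine-learned. The `Analysis` type, sentiment labels, emoji table, target feature ranges, and function names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Analysis:
    sentiment: str   # hypothetical label set: "angry", "calm", "excited"
    volume: float    # assumed normalized loudness, 0.0 (quiet) to 1.0 (loud)
    rate_wps: float  # speaking rate in words per second


# Hypothetical emoji per detected sentiment (claim 4: "an emoji or a set of characters").
EMOJI = {"angry": "\U0001F620", "calm": "\U0001F60C", "excited": "\U0001F604"}

# Hypothetical mapping: detected sentiment -> (alternative sentiment, feature ranges for it).
TARGETS = {
    "angry": ("calm", {"volume": (0.0, 0.5), "rate_wps": (1.5, 2.5)}),
    "calm": ("excited", {"volume": (0.6, 1.0), "rate_wps": (2.5, 4.0)}),
    "excited": ("calm", {"volume": (0.0, 0.5), "rate_wps": (1.5, 2.5)}),
}

# Advice phrases per feature: (value below target range, value above target range).
ADVICE = {
    "volume": ("speak louder", "speak more softly"),
    "rate_wps": ("speak faster", "speak more slowly"),
}


def insert_emoji(transcription: str, analysis: Analysis) -> str:
    """Append an emoji for the detected sentiment, if one is defined."""
    emoji = EMOJI.get(analysis.sentiment)
    return f"{transcription} {emoji}" if emoji else transcription


def build_prompt(analysis: Analysis) -> Optional[str]:
    """Suggest feature changes for a second utterance that would convey a different sentiment."""
    if analysis.sentiment not in TARGETS:
        return None
    target, ranges = TARGETS[analysis.sentiment]
    tips = []
    for feature, (low, high) in ranges.items():
        value = getattr(analysis, feature)
        too_low, too_high = ADVICE[feature]
        if value < low:
            tips.append(too_low)
        elif value > high:
            tips.append(too_high)
    if not tips:
        return None  # already within the target ranges; nothing to suggest
    return f"Want to sound {target}? Record again and " + " and ".join(tips) + "."


a = Analysis("angry", volume=0.9, rate_wps=3.2)
print(insert_emoji("I'm fine", a))
print(build_prompt(a))  # Want to sound calm? Record again and speak more softly and speak more slowly.
```

Fixed ranges keep the example readable; a trained model could instead predict which feature changes move an utterance toward a target sentiment.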
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Machine learning · CPC title
using non-speech characteristics · CPC title
Editing, e.g. inserting or deleting · CPC title
Learning methods · CPC title