Generating encoded text based on spoken utterances using machine learning systems and methods

US12248748B2 · US · B2

Patent metadata
Publication number: US-12248748-B2
Application number: US-202418414095-A
Country: US
Kind code: B2
Filing date: Jan 16, 2024
Priority date: Jun 15, 2022
Publication date: Mar 11, 2025
Grant date: Mar 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for generating encoded text representations of spoken utterances are disclosed. Audio data is received for a spoken utterance and analyzed to identify a nonverbal characteristic, such as a sentiment, a speaking rate, or a volume. An encoded text representation of the spoken utterance is generated, comprising a text transcription and a visual representation of the nonverbal characteristic. The visual representation comprises a geometric element, such as a graph or shape, or a variation in a text attribute, such as font, font size, or color. Analysis of the audio data and/or generation of the encoded text representation can be performed using machine learning.
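
As a rough illustration of the idea in the abstract, the sketch below maps nonverbal characteristics onto text attributes (color, font size, font style) wrapped around a transcription. It is not the patented implementation: the `NonverbalCharacteristics` type, the sentiment labels, the normalized volume scale, the speaking-rate threshold, and the HTML-style output are all assumptions made for this example.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class NonverbalCharacteristics:
    """Hypothetical container for characteristics named in the abstract."""
    sentiment: str        # assumed labels: "positive", "negative", "neutral"
    volume: float         # assumed normalized loudness in [0.0, 1.0]
    speaking_rate: float  # assumed unit: words per second


# Assumed mapping from sentiment label to text color.
SENTIMENT_COLORS = {"positive": "green", "negative": "red", "neutral": "black"}


def encode_text(transcription: str, nv: NonverbalCharacteristics) -> str:
    """Return the transcription wrapped in markup that varies text attributes."""
    color = SENTIMENT_COLORS.get(nv.sentiment, "black")
    # Louder speech gets a larger font: 12pt at volume 0.0 up to 24pt at 1.0.
    clamped_volume = min(max(nv.volume, 0.0), 1.0)
    font_size = round(12 + 12 * clamped_volume)
    # Fast speech is shown in italics; the 3.0 threshold is arbitrary.
    font_style = "italic" if nv.speaking_rate > 3.0 else "normal"
    style = f"color:{color};font-size:{font_size}pt;font-style:{font_style}"
    return f'<span style="{style}">{escape(transcription)}</span>'


example = encode_text(
    "See you soon",
    NonverbalCharacteristics(sentiment="negative", volume=0.5, speaking_rate=2.0),
)
print(example)
```

In a real system the characteristics would come from audio analysis, possibly by a trained model as the abstract notes; here they are supplied by hand so the mapping from characteristic to text attribute is easy to follow.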

First claim

Opening claim text (preview).

We claim:

1. A mobile device for generating encoded text to convey nonverbal meaning based on audio inputs, the mobile device comprising: at least one hardware processor; at least one hardware display screen; and at least one non-transitory memory carrying instructions that, when executed by the at least one hardware processor, cause the mobile device to: analyze audio data for a spoken utterance using a text encoding model to identify a nonverbal characteristic including a sentiment of the spoken utterance; generate, by the text encoding model, an encoded representation of the spoken utterance, the encoded representation comprising a transcription and a visual representation of the nonverbal characteristic of the spoken utterance; generate, based on the nonverbal characteristic, a prompt to input a second spoken utterance comprising at least one suggestion for changes to one or more different nonverbal characteristics indicative of a different sentiment; and cause display, on the at least one hardware display screen, of the encoded representation and the prompt.

2. The mobile device of claim 1, wherein the instructions further cause the mobile device to: modify the encoded representation in response to a received input; and receive an indication that the modified encoded representation is approved.

3. The mobile device of claim 1, wherein the instructions further cause the mobile device to: receive second audio data for the second spoken utterance; and incorporate the encoded representation into a message or a post using a mobile application executing on the mobile device.

4. The mobile device of claim 1, wherein generating the encoded representation of the spoken utterance further causes the mobile device to: automatically insert into the encoded representation an emoji or a set of characters based on the identified nonverbal characteristic of the spoken utterance.

5. The mobile device of claim 1, wherein the instructions further cause the mobile device to: receive visual data for the spoken utterance, wherein the visual data comprises at least one image or video; and analyze the visual data, wherein the nonverbal characteristic is identified based at least in part on analysis of the visual data.

6. The mobile device of claim 1, wherein the text encoding model includes a machine learning model that is trained, using a training dataset, to generate encoded representations based on audio data of spoken utterances.

7. The mobile device of claim 1, wherein the visual representation of the one or more nonverbal characteristics comprises a geometric element or a variation in a text attribute.

8. The mobile device of claim 1, wherein identifying the nonverbal characteristic of the spoken utterance further causes the mobile device to: detect, using a speech analytics model, a pitch, a timbre, a tone of voice, an inflection, a volume, or a speaking rate, or a change in a pitch, a timbre, a tone of voice, an inflection, a volume, or a speaking rate corresponding to the nonverbal characteristic.

9. A method for generating encoded text representations to convey nonverbal information based on audio inputs, the method comprising: analyzing audio data of a spoken utterance using a text encoding model to identify a nonverbal characteristic including a sentiment of the spoken utterance; generating, by the text encoding model, an encoded representation of the spoken utterance, the encoded representation comprising a transcription and a visual representation of the nonverbal characteristic of the spoken utterance; generating, based on the identified nonverbal characteristic of the spoken utterance, a prompt to input a second spoken utterance comprising at least one suggestion for changes to one or more different nonverbal characteristics; and causing display, via a user interface, of the generated encoded representation and the prompt.

10. The method of claim 9, further comprising: modifying the generated encoded representation in response to a received input; and receiving an indication that the modified encoded representation is approved.

11. The method of claim 9, further comprising: receiving second audio data for the second spoken utterance; and incorporating the displayed encoded representation into a message or a post using a mobile application.

12. The method of claim 9, wherein generating the encoded representation of the spoken utterance comprises automatically inserting into the encoded representation an emoji or a set of characters based on the identified nonverbal characteristic of the spoken utterance.

13. The method of claim 9, further comprising: receiving visual data for the spoken utterance, wherein the visual data comprises at least one image or video; and analyzing the visual data, wherein the nonverbal characteristic is identified based at least in part on an analysis of the visual data.

14. The method of claim 9, wherein the text encoding model includes a machine learning model that is trained, using a training dataset, to generate encoded representations based on audio data of spoken utterances.

15. The method of claim 9, wherein the visual representation of the one or more nonverbal characteristics comprises a geometric element or a variation in a text attribute.

16. The method of claim 9, wherein identifying the nonverbal characteristic of the spoken utterance comprises: detecting, using a speech analytics model, a pitch, a timbre, a tone of voice, an inflection, a volume, or a speaking rate, or a change in a pitch, a timbre, a tone of voice, an inflection, a volume, or a speaking rate corresponding to the nonverbal characteristic.

17. At least one computer-readable medium, excluding transitory signals, carrying instructions that, when executed by a computing system, cause the computing system to perform operations to generate encoded to convey nonverbal information based on audio inputs, the operations comprising: analyzing audio data for a spoken utterance using a text encoding model to identify a nonverbal characteristic including a sentiment of the spoken utterance; generating, by the text encoding model, an encoded representation of the spoken utterance, the encoded representation comprising a transcription and a visual representation of the nonverbal characteristic of the spoken utterance; generate, based on the identified nonverbal characteristic of the spoken utterance, a prompt to input a second spoken utterance comprising at least one suggestion for changes to one or more different nonverbal characteristics indicative of a different sentiment; and cause display of the generated encoded representation and the prompt.

18. The at least one computer-readable medium of claim 17, wherein the operations further comprise: modifying the generated encoded representation in response to a received input; and receiving an indication that the modified encoded representation is approved.

19. The at least one computer-readable medium of claim 17, wherein the operations further comprise: receiving second audio data for the second spoken utterance; and incorporating the displayed encoded representation into a message or a post using a mobile application.

20. The at least one computer-readable medium of claim 17, wherein generating the encoded representation of the spoken utterance further comprises: automatically inserting into the encoded representation an emoji or a set of characters based on the identified nonverbal characteristic
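
Two of the claimed steps lend themselves to a small sketch: inserting an emoji based on the identified sentiment, and generating a prompt that suggests changes to nonverbal characteristics before a second utterance is recorded. The code below is illustrative only and is not taken from the patent; the function names, sentiment labels, thresholds, emoji choices, and prompt wording are all assumptions.

```python
from typing import Optional

# Assumed mapping from sentiment label to an emoji.
EMOJI_BY_SENTIMENT = {"positive": "🙂", "negative": "🙁"}


def insert_emoji(transcription: str, sentiment: str) -> str:
    """Append an emoji for the sentiment, or return the text unchanged."""
    emoji = EMOJI_BY_SENTIMENT.get(sentiment)
    return f"{transcription} {emoji}" if emoji else transcription


def build_prompt(
    sentiment: str,
    volume: float,          # assumed normalized loudness in [0.0, 1.0]
    speaking_rate: float,   # assumed unit: words per second
    target_sentiment: str = "positive",
) -> Optional[str]:
    """Suggest changes for a second utterance, or None if no change is needed."""
    if sentiment == target_sentiment:
        return None
    suggestions = []
    if volume > 0.8:           # arbitrary "loud" threshold
        suggestions.append("speaking more softly")
    if speaking_rate > 3.0:    # arbitrary "fast" threshold
        suggestions.append("slowing down")
    if not suggestions:
        suggestions.append("varying your tone of voice")
    return (
        f"Your message sounded {sentiment}. To sound more {target_sentiment}, "
        "try recording again while " + " and ".join(suggestions) + "."
    )


print(insert_emoji("Great news", "positive"))
print(build_prompt("negative", volume=0.9, speaking_rate=3.5))
```

The rule-based suggestions stand in for whatever the text encoding model would produce; the claims only require that the prompt be generated based on the identified nonverbal characteristic.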

Assignees

Inventors

Classifications

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Machine learning · CPC title

  • using non-speech characteristics · CPC title

  • Editing, e.g. inserting or deleting · CPC title

  • Learning methods · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12248748B2 cover?
Systems and methods for generating encoded text representations of spoken utterances are disclosed. Audio data is received for a spoken utterance and analyzed to identify a nonverbal characteristic, such as a sentiment, a speaking rate, or a volume. An encoded text representation of the spoken utterance is generated, comprising a text transcription and a visual representation of the nonverbal c…
Who is the assignee on this patent?
T-Mobile USA, Inc.
What technology area does this patent fall under?
Primary CPC classification G06F40/126. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 11, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).