Analyzing the tone of textual data
US-2021073255-A1 · Mar 11, 2021 · US
US11380300B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11380300-B2 |
| Application number | US-202016777360-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 30, 2020 |
| Priority date | Oct 11, 2019 |
| Publication date | Jul 5, 2022 |
| Grant date | Jul 5, 2022 |
Official abstract text for this publication.
In particular embodiments, an apparatus comprises non-transitory computer-readable storage media and a processor, coupled to the media, that executes instructions to access a plurality of text and to generate, using one or more natural language understanding (NLU) models, one or more scores for at least a portion of the plurality of text. The apparatus determines, based on the scores, one or more prosodic values corresponding to the portion of the plurality of text. The apparatus determines, based on the one or more prosodic values, one or more speech synthesis markup language (SSML) tags. The apparatus then generates, based on the prosodic values, SSML-tagged data comprising each determined SSML tag and that tag's location in the plurality of text.
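The flow described in the abstract (scores, then prosodic values, then SSML tags) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the patent's implementation: the `SentenceScores` shape, the score ranges, the 20% cap, and the linear mapping are hypothetical, and the NLU models are replaced by hand-supplied scores. Only the `<speak>` and `<prosody>` elements with `pitch` and `rate` attributes come from standard SSML. Tag location is represented implicitly by wrapping each sentence.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class SentenceScores:
    """Scores an NLU model might emit for one sentence (hypothetical shape)."""
    text: str
    polarity: float      # assumed range: -1.0 (negative) to 1.0 (positive)
    subjectivity: float  # assumed range: 0.0 (objective) to 1.0 (subjective)


def prosodic_values(s: SentenceScores) -> dict:
    """Map scores to pitch and rate offsets in percent (illustrative rule only)."""
    # More subjective text is allowed a wider swing; the 20% cap is arbitrary.
    span = 20.0 * s.subjectivity
    return {
        "pitch": round(span * s.polarity),
        "rate": round(span * s.polarity / 2),
    }


def ssml_tag(s: SentenceScores) -> str:
    """Wrap one sentence in a <prosody> element built from its prosodic values."""
    p = prosodic_values(s)
    pitch = f"{p['pitch']:+d}%"    # relative change, e.g. "+10%"
    rate = f"{100 + p['rate']}%"   # percentage of the default rate, e.g. "105%"
    return f'<prosody pitch="{pitch}" rate="{rate}">{escape(s.text)}</prosody>'


def tag_text(sentences: list) -> str:
    """Produce SSML-tagged data for a list of scored sentences."""
    body = " ".join(ssml_tag(s) for s in sentences)
    return f"<speak>{body}</speak>"
```

For example, `tag_text([SentenceScores("Great news!", 1.0, 0.5)])` wraps the sentence in a `<prosody>` element with a raised pitch and a slightly faster rate, while a sentence with zero subjectivity is left at the default pitch and rate.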
Opening claim text (preview).
What is claimed is:
1. An apparatus, comprising: one or more non-transitory computer-readable storage media embodying instructions; and one or more processors coupled to the storage media and configured to execute the instructions to: access a plurality of text; generate, using one or more natural language understanding (NLU) models, a sentiment class score indicative of one or more emotions for at least a portion of the plurality of text and a subjectivity score indicative of subjectivity for at least the portion of the plurality of text; determine, based on the subjectivity score, a rate of change in pitch or rate values for the portion of the plurality of text; determine, based on the sentiment class score and the subjectivity score, one or more prosodic values corresponding to the portion of the plurality of text; determine, based on the one or more prosodic values, one or more speech synthesis markup language (SSML) tags corresponding to the one or more emotions indicated by the sentiment class score; and generate, based on the prosodic values, SSML-tagged data comprising the determined one or more SSML tags and respective tag location in the portion of the plurality of text.
2. The apparatus of claim 1, wherein: the apparatus further comprises a client computing device comprising a speaker; and the one or more processors are further configured to execute the instructions to: access the plurality of text based on a user input received at the client computing device; and initiate transmission of speech output to the speaker, wherein the speech output comprises the plurality of text with instructions to verbalize the portion of the plurality of text according to the SSML-tagged data.
3. The apparatus of claim 1, wherein: the apparatus further comprises a server computing device; and the one or more processors are further configured to execute the instructions to: receive an identification of the portion of the plurality of text based on an input of a user of a client computing device; and transmit the SSML-tagged data to the client computing device.
4. The apparatus of claim 1, wherein: the prosodic values comprise a pitch value and a rate value; and the one or more processors are further configured to execute the instructions to dynamically set minimum and maximum ranges for the pitch value and the rate value based on the subjectivity score.
5. The apparatus of claim 1, wherein the one or more processors are further configured to execute the instructions to: identify in the portion of the plurality of text a plurality of sentences and words; and generate a set of scores including one or more of: the subjectivity score for each sentence of the portion of the plurality of text; a polarity score for each sentence of the portion of the plurality of text; or an importance score for each sentence or each word of the portion of the plurality of text.
6. The apparatus of claim 1, wherein the one or more NLU models comprise a first NLU model configured to: categorize the portion of the plurality of text according to a set of topics; and generate a polarity score and the subjectivity score for each sentence of the portion of the plurality of text.
7. The apparatus of claim 6, wherein the one or more NLU models further comprise a second NLU model configured to generate an importance score for each of a plurality of portions of the plurality of text.
8. The apparatus of claim 7, wherein the plurality of portions of the plurality of text comprise one or more of a sentence, a phrase, or a word in the plurality of text.
9. The apparatus of claim 7, wherein the one or more NLU models further comprise a third NLU model configured to identify as a trending topic one or more words or phrases in the portions of the plurality of text.
10. The apparatus of claim 9, wherein the inflection characteristics comprise at least one of: an upward inflection, a downward inflection, or a circumflex inflection.
11. The apparatus of claim 1, wherein the one or more processors are further configured to execute the instructions to: generate word-level importance scores for words or phrases in the portion of the plurality of text; and determine, based on the word-level importance scores, inflection characteristics for the portion of the plurality of text.
12. The apparatus of claim 1, wherein the one or more prosodic values correspond to one or more of a pitch, a rate of speech, a volume of speech, an amount of emphasis, or a length of a pause.
13. The apparatus of claim 1, wherein to determine the one or more prosodic values based on the sentiment class score, the one or more processors are further configured to execute the instructions to: provide, to a neural network, the portion of the plurality of text and the sentiment class score from the one or more NLU models; and receive, from the neural network, the one or more prosodic values corresponding to the portion of the plurality of text.
14. One or more non-transitory computer-readable storage media embodying instructions that, when executed by one or more processors, cause the one or more processors to: access a plurality of text; generate, using one or more natural language understanding (NLU) models, a sentiment class score indicative of one or more emotions for at least a portion of the plurality of text and a subjectivity score indicative of subjectivity for at least the portion of the plurality of text; determine, based on the subjectivity score, a rate of change in pitch or rate values for the portion of the plurality of text; determine, based on the sentiment class score and the subjectivity score, one or more prosodic values corresponding to the portion of the plurality of text; determine, based on the one or more prosodic values, one or more speech synthesis markup language (SSML) tags corresponding to the one or more emotions indicated by the sentiment class score; and generate, based on the prosodic values, SSML-tagged data comprising the determined one or more SSML tags and respective tag location in the portion of the plurality of text.
15. The non-transitory computer-readable storage media of claim 14, wherein the instructions further comprise instructions to: access the plurality of text based on a user input received at the client computing device; and initiate transmission of speech output to the speaker, wherein the speech output comprises the plurality of text with instructions to verbalize the portion of the plurality of text according to the SSML-tagged data.
16. A method performed by one or more processors of a computing system, comprising: accessing a plurality of text; generating, using one or more natural language understanding (NLU) models a sentiment class score indicative of one or more emotions for at least a portion of the plurality of text and a subjectivity score indicative of subjectivity for at least the portion of the plurality of text; determine, based on the subjectivity score, a rate of change in pitch or rate values for the portion of the plurality of text; determining, based on the sentiment class score and the subjectivity score, one or more prosodic values corresponding to the portion of the plurality of text; determining, based on the one or more prosodic values, one or more speech synthesis markup language (SSML) tags corresponding to the one or more emotions indicated by the sentiment class score; and generating, based on the prosodic values, SSML-tagged data comprising the determined one or more SSML tags and respective tag in the portion of the plur
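Two elements of the claims lend themselves to a small illustration: setting minimum and maximum ranges for pitch and rate from the subjectivity score (claim 4), and deriving a rate of change in those values from the same score (claim 1). The Python sketch below is one possible reading, not the claimed method. The function names, the linear formulas, and the constants `base`, `extra`, and `max_step` are hypothetical, and subjectivity is assumed to lie between 0.0 and 1.0.

```python
def prosody_range(subjectivity: float, base: float = 5.0, extra: float = 25.0) -> tuple:
    """Return (min, max) percent offsets; the range widens with subjectivity.

    `base` and `extra` are illustrative constants, not values from the patent.
    """
    s = min(max(subjectivity, 0.0), 1.0)  # clamp to the assumed 0..1 range
    limit = base + extra * s
    return (-limit, limit)


def next_offset(previous: float, target: float, subjectivity: float,
                max_step: float = 10.0) -> float:
    """Move from `previous` toward `target`, limiting the change per sentence.

    The allowed step (a stand-in for "rate of change") grows with subjectivity,
    and the result is kept inside the range from `prosody_range`.
    """
    s = min(max(subjectivity, 0.0), 1.0)
    step = max_step * s
    lo, hi = prosody_range(s)
    moved = previous + max(-step, min(step, target - previous))
    return max(lo, min(hi, moved))
```

Under these assumptions, objective text (subjectivity near 0) keeps pitch and rate almost flat from one sentence to the next, while highly subjective text may swing further and faster.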
Architecture of speech synthesisers · CPC title
Pitch control · CPC title
Prosody rules derived from text; Stress or intonation · CPC title
Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD] · CPC title
Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title