Computerized system and method for formatted transcription of multimedia content
US-2017062010-A1 · Mar 2, 2017 · US
US12380277B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12380277-B2 |
| Application number | US-202117453446-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 3, 2021 |
| Priority date | Mar 23, 2018 |
| Publication date | Aug 5, 2025 |
| Grant date | Aug 5, 2025 |
Official abstract text for this publication.
Present embodiments include a prosody subsystem of a natural language understanding (NLU) framework that is designed to analyze collections of written messages for various prosodic cues to break down the collection into a suitable level of granularity (e.g., into episodes, sessions, segments, utterances, and/or intent segments) for consumption by other components of the NLU framework, enabling operation of the NLU framework. These prosodic cues may include, for example, source prosodic cues that are based on the author and the conversation channel associated with each message, temporal prosodic cues that are based on a respective time associated with each message, and/or written prosodic cues that are based on the content of each message. For example, to improve the domain specificity of the agent automation system, intent segments extracted by the prosody subsystem may be consumed by a training process for a ML-based structure subsystem of the NLU framework.
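The source and temporal cues described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Message` record, the `split_into_sessions` function, and the 30-minute default gap are hypothetical names and values chosen for illustration. The sketch groups messages by conversation channel (a source cue) and starts a new session whenever the delay between consecutive messages exceeds a threshold (a temporal cue).

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Message:
    # Hypothetical record layout; the source does not specify one.
    author: str
    channel: str
    timestamp: float  # seconds since some epoch
    text: str


def split_into_sessions(messages, max_gap_seconds=1800.0):
    """Split a conversation log into sessions.

    Messages are first grouped by channel (source cue). Within a channel,
    a new session starts when the gap to the previous message exceeds
    max_gap_seconds (temporal cue). The default threshold is an assumption.
    """
    sessions = []
    # groupby needs its input sorted by the grouping key.
    ordered = sorted(messages, key=lambda m: (m.channel, m.timestamp))
    for _, channel_msgs in groupby(ordered, key=lambda m: m.channel):
        current = []
        for msg in channel_msgs:
            if current and msg.timestamp - current[-1].timestamp > max_gap_seconds:
                sessions.append(current)
                current = []
            current.append(msg)
        if current:
            sessions.append(current)
    return sessions


log = [
    Message("alice", "chat", 0.0, "hi, my laptop won't boot"),
    Message("agent", "chat", 60.0, "have you tried a hard reset?"),
    Message("alice", "chat", 4000.0, "back again, still broken"),
    Message("bob", "email", 10.0, "please reset my password"),
]
for session in split_into_sessions(log):
    print(session[0].channel, [m.text for m in session])
```

A fuller version would also consult written cues (punctuation, emojis, emphasis) before deciding on a boundary; this sketch covers only the two cue types that need no text analysis.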
Opening claim text (preview).
What is claimed is:
1. A method of operating a natural language understanding (NLU) framework, comprising: receiving, via a prosody subsystem of the NLU framework, a conversation log comprising a plurality of written messages; dividing, via the prosody subsystem, a subset of the conversation log into a plurality of sessions based at least in part on temporal prosodic cues of the plurality of written messages, written prosodic cues of the plurality of written messages, a threshold delay stored in a database, a threshold number of messages stored in the database, or any combination thereof, the temporal prosodic cues indicating whether a delay between written messages of the plurality of written messages exceeds the threshold delay, whether a number of written messages sent within a time window exceeds the threshold number of messages, or both; dividing, via the prosody subsystem, each of the plurality of sessions into a plurality of conversation segments based at least in part on the temporal prosodic cues of the plurality of written messages, the written prosodic cues of the plurality of written messages, or any combination thereof; and providing the plurality of sessions, the plurality of conversation segments, or any combination thereof, to a behavior engine of the NLU framework, wherein the behavior engine is configured to generate episodic context information based on the plurality of sessions, the plurality of conversation segments, or the combination thereof.
2. The method of claim 1, comprising: dividing each of the plurality of conversation segments into a plurality of utterances based at least in part on the temporal prosodic cues of the plurality of written messages, the written prosodic cues of the plurality of written messages, or any combination thereof.
3. The method of claim 2, comprising: providing the plurality of utterances to a training process of a vocabulary subsystem of the NLU framework, wherein, within the training process, the plurality of utterances is used to generate a plurality of word vectors of a refined word vector distribution model that replaces a word vector distribution model of the vocabulary subsystem, wherein the NLU framework is configured to use the refined word vector distribution model to determine word vectors for words of received natural language requests.
4. The method of claim 2, comprising: dividing each of the plurality of utterances into a plurality of intent segments based at least in part on the temporal prosodic cues of the plurality of written messages, the written prosodic cues of the plurality of written messages, or any combination thereof.
5. The method of claim 4, comprising: providing the plurality of intent segments to a training process of a machine-learning (ML)-based parser of the NLU framework, wherein, within the training process, the NLU framework applies a plurality of other parsers of the NLU framework to generate a plurality of utterance trees for each intent segment, and in response to determining that a majority of the plurality of utterance trees for a particular intent segment are the same utterance tree, update a model of the ML-based parser such that the ML-based parser generates the same utterance tree for the particular intent segment.
6. The method of claim 4, comprising: providing the plurality of intent segments to a semantic mining pipeline of the NLU framework, wherein the semantic mining pipeline performs actions comprising: generating intent vectors for each of the plurality of intent segments; generating meaning clusters of intent vectors based on distances between the intent vectors; detecting stable ranges of cluster radius values for the meaning clusters; and generating an intent/entity model from the meaning clusters and the stable ranges of cluster radius values, wherein the intent/entity model stores relationships between a representative intent of each of the meaning clusters and corresponding intent segments as sample utterances, and wherein the NLU framework is configured to use the intent/entity model to classify intents of received natural language requests.
7. The method of claim 1, comprising: dividing, via the prosody subsystem, the conversation log into one or more conversation channel groups based on a conversational medium associated with each the plurality of written messages, wherein the subset of the conversation log corresponds to the one or more conversation channel groups.
8. The method of claim 1, wherein, to generate the episodic context information, the behavior engine generates an episode frame tree set in a persona context database of a persona of the behavior engine based at least in part on the plurality of sessions, the plurality of conversation segments, or the combination thereof.
9. The method of claim 8, wherein the episode frame tree set comprises an episode start time and an episode end time that are heuristically determined from plurality of sessions, the plurality of conversation segments, or the combination thereof.
10. The method of claim 1, wherein the temporal prosody cues are determined based on a respective time associated with each of the plurality of written messages of the conversation log.
11. The method of claim 1, wherein the written prosodic cues are determined based on punctuation, emojis, emphasis, or linguistic structure, or any combination thereof, within each of the plurality of written messages of the conversation log.
12. An agent automation system, comprising: at least one memory configured to store a conversation log including a plurality of written messages, and to store a natural language understanding (NLU) framework including a prosody subsystem and a vocabulary subsystem; and at least one processor configured to execute instructions of the NLU framework to cause the agent automation system to perform actions comprising: dividing, via a prosody subsystem of the NLU framework, a subset of the conversation log into a plurality of utterances based at least in part on temporal prosodic cues of the plurality of written messages, a threshold delay stored in a database, a threshold number of messages stored in the database, written prosodic cues of the plurality of written messages, or any combination thereof, the temporal prosodic cues indicating whether a delay between written messages of the plurality of written messages exceeds the threshold delay, whether a number of written messages sent within a time window exceeds the threshold number of messages, or both; and providing the plurality of utterances to a training process of the vocabulary subsystem, wherein, within the training process, the plurality of utterances is used to generate a plurality of word vectors of a refined word vector distribution model that replaces a word vector distribution model of the vocabulary subsystem, wherein the NLU framework is configured to use the refined word vector distribution model to determine word vectors for words of received natural language requests, the word vectors corresponding to semantic meanings of the words.
13. The system of claim 12, wherein the at least one processor is configured to execute the instructions of the NLU framework to cause the agent automation system to perform actions comprising: dividing, via the prosody subsystem, the conversation log into one or more conversation channel groups based on a conversational medium associated with each written message of the conversation log; dividing, via the prosody subsystem of the NLU framework, each of the one or more conversation channel groups into a plurality of sessions based at least in part on the temporal prosodic cues of
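The majority-agreement step recited in claim 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `majority_tree` and `collect_training_pairs` are hypothetical names, parsers are modeled as plain callables, and utterance trees are modeled as hashable values (nested tuples) so they can be compared for equality. The sketch stops at collecting agreed (segment, tree) pairs; how the ML-based parser's model is actually updated is not shown.

```python
from collections import Counter


def majority_tree(trees):
    """Return the tree produced by a strict majority of parsers, else None.

    Trees must be hashable (e.g. nested tuples) to be counted.
    """
    if not trees:
        return None
    tree, count = Counter(trees).most_common(1)[0]
    return tree if count > len(trees) / 2 else None


def collect_training_pairs(intent_segments, parsers):
    """Run every parser on every intent segment and keep the segments
    on which a majority of parsers agree, paired with the agreed tree."""
    pairs = []
    for segment in intent_segments:
        trees = [parse(segment) for parse in parsers]
        agreed = majority_tree(trees)
        if agreed is not None:
            pairs.append((segment, agreed))
    return pairs


# Toy stand-ins for real parsers (assumptions for illustration only).
def whitespace_parser(segment):
    return tuple(segment.split())


def lowercase_parser(segment):
    return tuple(segment.lower().split())


def flat_parser(segment):
    return (segment,)


pairs = collect_training_pairs(
    ["reset password"],
    [whitespace_parser, lowercase_parser, flat_parser],
)
print(pairs)
```

Requiring a strict majority (more than half) means that ties yield no training pair, which is one conservative reading of "a majority of the plurality of utterance trees".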
using artificial neural networks · CPC title
using prosody or stress · CPC title
Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title
Parsing · CPC title
Knowledge engineering; Knowledge acquisition · CPC title