Total correlation variational autoencoder strengthened with attentions for segmenting syntax and semantics
US-2022012425-A1 · Jan 13, 2022 · US
US12147771B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12147771-B2 |
| Application number | US-202117361878-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 29, 2021 |
| Priority date | Jun 29, 2021 |
| Publication date | Nov 19, 2024 |
| Grant date | Nov 19, 2024 |
Official abstract text for this publication.
Systems and methods for text summarization are described. In one example, a text summarization system receives an input utterance and determines whether the utterance should be included in a summary of the text. The text summarization system includes an embedding network, a convolution network, an encoding component, and a summary component. The embedding network generates a semantic embedding of an utterance. The convolution network generates a plurality of feature vectors based on the semantic embedding. The encoding component identifies a plurality of latent codes respectively corresponding to the plurality of feature vectors. The summary component identifies a prominent code among the latent codes and selects the utterance as a summary utterance based on the prominent code.
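The encoding step in the abstract maps each feature vector to a latent code. A minimal sketch of that lookup follows, assuming the Euclidean distance named in claim 9. The function names `nearest_code` and `quantize`, the two-dimensional vectors, and the three-entry codebook are all hypothetical and chosen only for illustration; the embedding and convolution networks that would produce the feature vectors are not modeled here.

```python
import math

def nearest_code(feature, codebook):
    """Return the index of the codebook entry closest to `feature`.

    Closeness is measured with Euclidean distance (an assumption taken
    from claim 9; the independent claims only require "a similarity").
    """
    distances = [math.dist(feature, code) for code in codebook]
    return min(range(len(codebook)), key=distances.__getitem__)

def quantize(features, codebook):
    """Map every feature vector to the index of its nearest latent code."""
    return [nearest_code(feature, codebook) for feature in features]

# Hypothetical toy data: a 3-entry codebook in a 2-dimensional space.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.9, 0.1), (0.1, 0.8), (0.2, 0.1)]

codes = quantize(features, codebook)
print(codes)  # [1, 2, 0]
```

Each utterance is thereby reduced to a short list of discrete code indices, which is the representation the summary component works on.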
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving text including an utterance; generating a semantic embedding of the utterance using an embedding network; generating a plurality of feature vectors based on the semantic embedding using a convolution network; identifying a first plurality of latent codes respectively corresponding to the plurality of feature vectors by identifying a closest latent code from a second plurality of latent codes of a codebook to each corresponding feature vector of the plurality of feature vectors, wherein the second plurality of latent codes of the codebook discretizes a semantic space based on a number of dimensions of the semantic space, and wherein the closest latent code is identified by computing a similarity between the closest latent code and the corresponding feature vector; identifying a prominent code among the first plurality of latent codes; and generating an indication that the utterance is a summary utterance based on the prominent code.
2. The method of claim 1, further comprising: receiving audio information; and converting the audio information to produce the text.
3. The method of claim 2, wherein: the audio information is received in a streaming format, and the utterance is selected as the summary utterance in real time.
4. The method of claim 2, further comprising: receiving video information; and identifying the audio information from the video information.
5. The method of claim 1, further comprising: identifying a plurality of summary utterances for the text; and generating a summary for the text based on the plurality of summary utterances.
6. The method of claim 1, further comprising: appending a sentence tag to the utterance, wherein the semantic embedding of the utterance corresponds to an output of the embedding network corresponding to the sentence tag.
7. The method of claim 1, wherein: a number of the latent codes in the second plurality of latent codes of the codebook is equal to a number of dimensions of the semantic embedding.
8. The method of claim 1, wherein: a number of dimensions of the first plurality of latent codes is equal to a number of filters of the convolution network.
9. The method of claim 1, further comprising: computing a Euclidean distance between each of the feature vectors and each of the second plurality of latent codes from the codebook, wherein the closest latent code is identified based on the Euclidean distance.
10. The method of claim 1, further comprising: identifying a plurality of text segments in the text; identifying a frequency for each latent code of the second plurality of latent codes from the codebook in each of the text segments; and identifying a set of prominent codes based on the frequency, wherein the prominent code is an element of the set of prominent codes.
11. The method of claim 10, further comprising: identifying a most frequent code from each of the text segments, wherein the set of prominent codes includes the most frequent code from each of the text segments.
12. The method of claim 10, further comprising: identifying a set of segment codes associated with a text segment associated with a predetermined location within the text; and refraining from including the set of segment codes in the set of prominent codes based on the association with the text segment, wherein the set of prominent codes includes the prominent code.
13. An apparatus comprising: an embedding network configured to generate a semantic embedding of an utterance; a convolution network generates a plurality of feature vectors based on the semantic embedding; and an encoding component configured to identify a plurality of first latent codes respectively corresponding to the plurality of feature vectors by identifying a closest latent code from a second plurality of latent codes of a codebook to each corresponding feature vector of the plurality of feature vectors, wherein the second plurality of latent codes discretizes a semantic space based on a number of dimensions of the semantic space, and wherein the closest latent code is identified by computing a similarity between the closest latent code and the corresponding feature vector; and a summary component configured to identify a prominent code among the first plurality of latent codes and to select the utterance as a summary utterance based on the prominent code.
14. The apparatus of claim 13, further comprising: an audio converter configured to receive audio information and convert the audio information to text, wherein the utterance is identified from the text.
15. The apparatus of claim 13, further comprising: a user interface configured to display the summary utterance.
16. The apparatus of claim 13, wherein: the summary component is further configured to generate a summary for a text based on the summary utterance.
17. A method of training a neural network, the method comprising: receiving a training set including an input utterance; generating a semantic embedding of the input utterance using an embedding network; generating a plurality of feature vectors based on the semantic embedding using a convolution network; identifying a first plurality of latent codes respectively corresponding to the plurality of feature vectors by identifying a closest latent code from a second plurality of latent codes of a codebook to each corresponding feature vector of the plurality of feature vectors, wherein the second plurality of latent codes discretizes a semantic space based on a number of dimensions of the semantic space, and wherein the closest latent code is identified by computing a similarity between the closest latent code and the corresponding feature vector; generating an output embedding based on the first plurality of latent codes using a convolutional decoder; generating an output text based on the output embedding; computing an autoencoder loss by comparing the input utterance and the output text; and updating parameters of the convolution network based on the autoencoder loss.
18. The method of claim 17, further comprising: computing a codebook loss by comparing each of the plurality of feature vectors with a corresponding latent code from the first plurality of latent codes, wherein the parameters are updated based on the codebook loss.
19. The method of claim 18, wherein: the codebook loss is based on a stop-gradient operator on the each of the plurality of feature vectors, a corresponding latent code from the plurality of latent codes, or both.
20. The method of claim 17, further comprising: updating the codebook based on the autoencoder loss.
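The frequency-based selection in claims 10 to 12 can be sketched as follows. This is one simplified reading, not the patented implementation: the names `prominent_codes`, `is_summary_utterance`, and `skip_segments` are hypothetical, claim 12 is approximated by skipping a whole segment at a chosen position, and the rule that an utterance qualifies when it contains any prominent code is an assumption, since the claims do not spell out how the indication is derived.

```python
from collections import Counter

def prominent_codes(segments, skip_segments=()):
    """Collect the most frequent code of each text segment.

    `segments` is a list of segments; each segment is a list of utterances;
    each utterance is a list of latent code indices. Segments whose index
    is in `skip_segments` contribute nothing (a simplified reading of
    claim 12's "predetermined location").
    """
    prominent = set()
    for index, segment in enumerate(segments):
        if index in skip_segments:
            continue
        counts = Counter(code for utterance in segment for code in utterance)
        if counts:
            prominent.add(counts.most_common(1)[0][0])
    return prominent

def is_summary_utterance(utterance_codes, prominent):
    """Assumed rule: an utterance qualifies if it carries a prominent code."""
    return any(code in prominent for code in utterance_codes)

# Hypothetical toy transcript: three segments of quantized utterances.
segments = [
    [[3, 3, 1], [3, 2]],
    [[5, 5], [5, 0, 1]],
    [[7, 7, 7]],
]

selected = prominent_codes(segments, skip_segments={0})
print(sorted(selected))                           # [5, 7]
print(is_summary_utterance([5, 0, 1], selected))  # True
print(is_summary_utterance([3, 2], selected))     # False
```

Skipping segment 0 here stands in for excluding, say, an opening segment of the text; which location is predetermined is not stated in the claims.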
Recognition of textual entities · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
Semantic analysis · CPC title
Discourse or dialogue representation · CPC title
using statistical methods · CPC title