Total correlation variational autoencoder strengthened with attentions for segmenting syntax and semantics

US11748567B2 · US · B2

Patent metadata
Publication number: US-11748567-B2
Application number: US-202016926525-A
Country: US
Kind code: B2
Filing date: Jul 10, 2020
Priority date: Jul 10, 2020
Publication date: Sep 5, 2023
Grant date: Sep 5, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are embodiments of a framework named total correlation variational autoencoder (TC_VAE) to disentangle syntax and semantics by making use of total correlation penalties of KL divergences. One or more Kullback-Leibler (KL) divergence terms in a loss for a variational autoencoder are decomposed so that generated hidden variables may be separated. Embodiments of the TC_VAE framework were examined on semantic similarity tasks and syntactic similarity tasks. Experimental results show that better disentanglement between syntactic and semantic representations has been achieved compared with state-of-the-art (SOTA) results on the same data sets in similar settings.
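
This page does not reproduce the patent's loss function. For orientation only, the block below shows the decomposition commonly used in total-correlation VAEs, which may or may not match the patent's exact formulation. The symbols are assumptions introduced here: q(z|x) is the encoder posterior, q(z) the aggregate posterior over the data, p(z) a prior assumed to factorize over latent dimensions z_j, and I_q(x; z) the mutual information between input and latent under q.

```latex
\mathbb{E}_{p(x)}\!\left[\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)\right]
  = \underbrace{I_q(x; z)}_{\text{index-code mutual information}}
  + \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big)}_{\text{total correlation}}
  + \underbrace{\sum_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}
```

Penalizing the middle term pushes the latent dimensions toward statistical independence, which is the sense in which a total correlation penalty encourages separated (disentangled) latent variables.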

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for segmenting latent representations comprising: given a variational autoencoder (VAE) model comprising an attention layer, a semantic encoder, a syntax encoder, and a decoder: generating, using an embedding layer, a sequence of embeddings for a sequence of tokens, in which the tokens are words or representations of words; generating, using the attention layer, a sequence of attention masks based on the sequence of embeddings; generating a sequence of hidden variables based on the sequence of embeddings and the sequence of attention masks; generating, using the semantic encoder, a first sequence of latent variables based on the sequence of hidden variables; generating, using the syntax encoder, a second sequence of latent variables based on the sequence of hidden variables; inferring, using the decoder, a sequence of reconstructed tokens and a sequence of reconstructed attention masks using at least information of the first sequence of latent variables from the semantic encoder and the second sequence of latent variables from the syntax encoder; and using the sequence of reconstructed tokens and the sequence of reconstructed attention masks to train at least the attention layer, the semantic encoder, the syntax encoder, and the decoder of the VAE model.

2. The computer-implemented method of claim 1 wherein each hidden variable in the sequence of hidden variables is generated by an element-wise multiplication between an embedding of the sequence of embeddings and a corresponding attention mask or masks of the sequence of attention masks.

3. The computer-implemented method of claim 1 wherein inferring a sequence of reconstructed tokens and a sequence of reconstructed attention masks based on at least information of at least the first and second sequences of latent variables comprising: combining a sequence of global latent variables with the first sequence of latent variables and the second sequence of latent variables to generate a first sequence of combined latent variables and a second sequence of combined latent variables respectively; receiving, at the decoder, the first sequence of combined latent variables and the second sequence of combined latent variables; and inferring the sequence of reconstructed tokens and the sequence of reconstructed attention masks.

4. The computer-implemented method of claim 1 further comprising: using the first sequence of latent variables, the second sequence of latent variables, or both to gauge semantics similarity of input words or representations of words, to gauge syntactic similarity of input words or representations of words, or for a natural language processing method.

5. The computer-implemented method of claim 4 wherein the step of using the sequence of reconstructed tokens and the sequence of reconstructed attention masks to train at least the attention layer, the semantic encoder, the syntax encoder, and the decoder comprises determining a loss comprises one or more total correlation (TC) terms to enforce disentanglement of latent variables.

6. The computer-implemented method of claim 5 wherein the one or more TC terms comprise a first Kullback-Leibler (KL) divergence for the semantic encoder and a second KL divergence for the syntax encoder.

7. The computer-implemented method of claim 6 wherein the first KL divergence is a KL divergence between a distribution of a first sequence of combined latent variables and a product of a factorial distribution for each latent variable in the first sequence of latent variables and a factorial distribution for each global latent variable in the first sequence of combined latent variables, and the second KL divergence is a KL divergence between a distribution of a second sequence of combined latent variables and a product of a factorial distribution for each latent variable in the second sequence of latent variables and a factorial distribution for each global latent variable in the second sequence of combined latent variables.

8. The computer-implemented method of claim 1 further comprising: for an input word or an input representation of a word that was input into a system comprising at least a trained attention layer, a trained semantic encoder, and a trained syntax encoder, using a combination of its corresponding first latent variable from the trained semantic encoder and its corresponding second latent variable from the trained syntax encoder.

9. A system for segmenting latent representations comprising: one or more processors; and a non-transitory computer-readable medium or media comprising one or more sets of instructions which, when executed by at least one of the one or more processors, causes steps to be performed comprising: given a variational autoencoder (VAE) model comprising a semantic encoder, a syntax encoder, and a decoder: generating a sequence of embeddings for a sequence of tokens, in which a token is a word or a representation of a word; generating a sequence of attention masks based on the sequence of embeddings; generating a sequence of hidden variables based on the sequence of embeddings with the sequence of attention masks; generating, using the semantic encoder, a first sequence of latent variables based on the sequence of hidden variables; generating, using the syntax encoder, a second sequence of latent variables based on the sequence of hidden variables; inferring, using the decoder, a sequence of reconstructed tokens and a sequence of reconstructed attention masks using at least information of the first sequence of latent variables from the semantic encoder and the second sequence of latent variables from the syntax encoder; and using the sequence of reconstructed tokens and the sequence of reconstructed attention masks to train at least the semantic encoder, the syntax encoder, and the decoder of the VAE model.

10. The system of claim 9 wherein each hidden variable in the sequence of hidden variables is generated by an element-wise multiplication between an embedding of the sequence of embeddings and a corresponding attention mask or masks of the sequence of attention masks.

11. The system of claim 9 wherein inferring a sequence of reconstructed tokens and a sequence of reconstructed attention masks based on at least information of at least the first and second sequences of latent variables comprises steps of: combining a sequence of global latent variables with the first sequence of latent variables and the second sequence of latent variables to generate a first sequence of combined latent variables and a second sequence of combined latent variables respectively; receiving, at the decoder, the first sequence of combined latent variables and the second sequence of combined latent variables; and inferring the sequence of reconstructed tokens and the sequence of reconstructed attention masks.

12. The system of claim 9 wherein the non-transitory computer-readable medium or media further comprises one or more sets of instructions which, when executed by at least one of the one or more processors, causes steps to be performed comprising: using the first sequence of latent variables, the second sequence of latent variables, or both to gauge semantics similarity of input words or representations of words, to gauge syntactic similarity of input words or representations of words, or for a natural language processing method.

13. The system of claim 9 wherein the step of using the sequence of reconstructed tokens and the sequence of reconstructed attention masks to train at least an attention layer, the semantic encoder, the syntax encoder, and the decoder
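
The data flow recited in claims 1 and 2 can be sketched roughly as follows. This is an illustrative toy, not the patented implementation: the function names, the parameter dictionary, the use of a sigmoid over a linear map as the attention mask, the Gaussian reparameterized sampling, and the concatenation of the two latents before decoding are all assumptions made here for readability. Only the overall order of steps (embeddings, attention masks, element-wise product, two encoders, one decoder producing reconstructed tokens and masks) comes from the claim text.

```python
import math
import random


def linear(vec, weights):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(vocab_size, emb_dim, latent_dim, rng):
    """Hypothetical parameter layout; every name here is an assumption."""
    return {
        "embedding": random_matrix(vocab_size, emb_dim, rng),
        "attention": random_matrix(emb_dim, emb_dim, rng),
        "sem_mu": random_matrix(latent_dim, emb_dim, rng),
        "sem_logvar": random_matrix(latent_dim, emb_dim, rng),
        "syn_mu": random_matrix(latent_dim, emb_dim, rng),
        "syn_logvar": random_matrix(latent_dim, emb_dim, rng),
        "dec_tokens": random_matrix(vocab_size, 2 * latent_dim, rng),
        "dec_mask": random_matrix(emb_dim, 2 * latent_dim, rng),
    }


def sample_latent(hidden, w_mu, w_logvar, rng):
    """Gaussian reparameterization: z = mu + sigma * eps (assumed form)."""
    mu = linear(hidden, w_mu)
    logvar = linear(hidden, w_logvar)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def encode(token_ids, params, rng):
    """Embeddings -> attention masks -> hidden variables -> two latents."""
    hidden, z_sem, z_syn = [], [], []
    for tid in token_ids:
        emb = params["embedding"][tid]
        # Assumed mask form: sigmoid of a linear map of the embedding.
        mask = [sigmoid(a) for a in linear(emb, params["attention"])]
        # Claim 2: hidden variable = embedding * mask, element-wise.
        h = [e * m for e, m in zip(emb, mask)]
        hidden.append(h)
        z_sem.append(sample_latent(h, params["sem_mu"], params["sem_logvar"], rng))
        z_syn.append(sample_latent(h, params["syn_mu"], params["syn_logvar"], rng))
    return hidden, z_sem, z_syn


def decode(z_sem, z_syn, params):
    """Reconstruct token logits and attention masks from both latents."""
    logits, masks = [], []
    for zs, zy in zip(z_sem, z_syn):
        z = zs + zy  # list concatenation of semantic and syntax latents
        logits.append(linear(z, params["dec_tokens"]))
        masks.append([sigmoid(a) for a in linear(z, params["dec_mask"])])
    return logits, masks


rng = random.Random(0)
params = init_params(vocab_size=3, emb_dim=4, latent_dim=2, rng=rng)
hidden, z_sem, z_syn = encode([0, 1, 2], params, rng)
logits, masks = decode(z_sem, z_syn, params)
```

In a trainable version, the reconstructed tokens and masks would be compared against the inputs to form a reconstruction loss, with the KL and total correlation terms of claims 5 to 7 added on top; that training step is omitted here.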

Assignees

Inventors

Classifications

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Generative networks · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

  • G06F40/284 (Primary)

    Lexical analysis, e.g. tokenisation or collocates · CPC title

  • G06F40/211 (Primary)

    Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11748567B2 cover?
Described herein are embodiments of a framework named total correlation variational autoencoder (TC_VAE) to disentangle syntax and semantics by making use of total correlation penalties of KL divergences. One or more Kullback-Leibler (KL) divergence terms in a loss for a variational autoencoder are decomposed so that generated hidden variables may be separated. Embodiments of the TC_VAE fra…
Who is the assignee on this patent?
Baidu USA LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 5, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).