Total correlation variational autoencoder strengthened with attentions for segmenting syntax and semantics

US2022012425A1 · US · A1

Patent metadata
Publication number: US-2022012425-A1
Application number: US-202016926525-A
Country: US
Kind code: A1
Filing date: Jul 10, 2020
Priority date: Jul 10, 2020
Publication date: Jan 13, 2022
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are embodiments of a framework named total correlation variational autoencoder (TC_VAE) to disentangle syntax and semantics by making use of total correlation penalties of KL divergences. One or more Kullback-Leibler (KL) divergence terms in a loss for a variational autoencoder are decomposed so that generated hidden variables may be separated. Embodiments of the TC_VAE framework were examined on semantic similarity tasks and syntactic similarity tasks. Experimental results show that better disentanglement between syntactic and semantic representations has been achieved compared with state-of-the-art (SOTA) results on the same data sets in similar settings.
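The KL decomposition mentioned in the abstract can be checked numerically on a toy discrete latent space. The sketch below is a minimal illustration under stated assumptions, not the loss used in the patent's embodiments: it takes a hypothetical joint distribution `q_joint` over two binary latent variables and a factorized prior, and verifies the standard identity KL(q(z) ‖ ∏_j p(z_j)) = TC + Σ_j KL(q(z_j) ‖ p(z_j)), where the total correlation TC = KL(q(z) ‖ ∏_j q(z_j)) measures how far the latent variables are from mutual independence. All function and variable names are illustrative.

```python
import math
from itertools import product

# Toy illustration (assumption): discrete latents, factorized prior.
# Not the patent's loss; it only demonstrates the generic TC decomposition.

def kl(p, q):
    """KL(p || q) for two dicts mapping outcomes to probabilities."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)

def marginals(joint, n_vars):
    """Marginal distribution of each coordinate of a joint over tuples."""
    margs = [dict() for _ in range(n_vars)]
    for z, pz in joint.items():
        for j, zj in enumerate(z):
            margs[j][zj] = margs[j].get(zj, 0.0) + pz
    return margs

def product_dist(margs):
    """Factorized distribution prod_j m_j(z_j) over the Cartesian product."""
    out = {}
    for z in product(*[sorted(m) for m in margs]):
        p = 1.0
        for j, zj in enumerate(z):
            p *= margs[j][zj]
        out[z] = p
    return out

def tc_decomposition(joint, prior_margs):
    """Return (total KL, total correlation, sum of dimension-wise KLs)."""
    n = len(prior_margs)
    q_margs = marginals(joint, n)
    total_kl = kl(joint, product_dist(prior_margs))
    tc = kl(joint, product_dist(q_margs))          # independent of the prior
    dim_kl = sum(kl(q_margs[j], prior_margs[j]) for j in range(n))
    return total_kl, tc, dim_kl

# Hypothetical correlated joint over two binary latents (z1, z2).
q_joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
prior = [{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}]
total, tc, dim = tc_decomposition(q_joint, prior)
print(f"KL={total:.4f}  TC={tc:.4f}  dim-wise KL={dim:.4f}")
```

Because the TC term does not depend on the prior, a loss can weight it separately from the dimension-wise terms, which is what makes a dedicated independence penalty possible. With the uniform prior above, the dimension-wise term vanishes and the whole KL is total correlation.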

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for segmenting latent representations comprising: generating, using an embedding layer, a sequence of embeddings for a sequence of tokens; generating, using an attention layer, a sequence of attention masks based on the sequence of embeddings; generating a sequence of hidden variables based on the sequence of embeddings and the sequence of attention masks; generating, using a first encoder and a second encoder respectively, a first sequence of latent variables and a second sequence of latent variables based on the sequence of hidden variables; and inferring, using a decoder, a sequence of reconstructed tokens and a sequence of reconstructed attention masks based on at least information of the first and second sequences of latent variables.

2. The computer-implemented method of claim 1 wherein each hidden variable in the sequence of hidden variables is generated by an element-wise multiplication between an embedding of the sequence of embeddings and a corresponding attention mask of the sequence of attention masks.

3. The computer-implemented method of claim 1 wherein inferring a sequence of reconstructed tokens and a sequence of reconstructed attention masks based on at least information of at least the first and second sequences of latent variables comprises: combining a sequence of global latent variables with the first sequence of latent variables and the second sequence of latent variables to generate a first sequence of combined latent variables and a second sequence of combined latent variables respectively; receiving, at the decoder, the first sequence of combined latent variables and the second sequence of combined latent variables; and inferring the sequence of reconstructed tokens and the sequence of reconstructed attention masks.

4. The computer-implemented method of claim 3 further comprising: using the sequence of reconstructed tokens and the sequence of reconstructed attention masks to establish a loss to train at least the attention layer, the first encoder, the second encoder, and the decoder.

5. The computer-implemented method of claim 4 wherein the loss comprises one or more total correlation (TC) terms to enforce disentanglement of latent variables.

6. The computer-implemented method of claim 5 wherein the one or more TC terms comprise a first Kullback-Leibler (KL) divergence for the first encoder and a second KL divergence for the second encoder.

7. The computer-implemented method of claim 6 wherein the first KL divergence is a KL divergence between a distribution of the first sequence of combined latent variables and a product of a factorial distribution for each latent variable in the first sequence of latent variables and a factorial distribution for each global latent variable in the first combined sequence; and the second KL divergence is a KL divergence between a distribution of the second sequence of combined latent variables and a product of a factorial distribution for each latent variable in the second sequence of latent variables and a factorial distribution for each global latent variable in the second combined sequence.

8. The computer-implemented method of claim 1 wherein the first encoder is a semantic encoder, and the second encoder is a syntax encoder.

9. A system for segmenting latent representations comprising: one or more processors; and a non-transitory computer-readable medium or media comprising one or more sets of instructions which, when executed by at least one of the one or more processors, cause steps to be performed comprising: generating a sequence of embeddings for a sequence of tokens; generating a sequence of attention masks based on the sequence of embeddings; generating a sequence of hidden variables based on the sequence of embeddings with the sequence of attention masks; generating respectively a first sequence of latent variables and a second sequence of latent variables based on the sequence of hidden variables; and inferring a sequence of reconstructed tokens and a sequence of reconstructed attention masks based on at least information of the first and second sequences of latent variables.

10. The system of claim 9 wherein each hidden variable in the sequence of hidden variables is generated by an element-wise multiplication between an embedding of the sequence of embeddings and a corresponding attention mask of the sequence of attention masks.

11. The system of claim 9 wherein inferring a sequence of reconstructed tokens and a sequence of reconstructed attention masks based on at least information of at least the first and second sequences of latent variables comprises steps of: combining a sequence of global latent variables with the first sequence of latent variables and the second sequence of latent variables to generate a first sequence of combined latent variables and a second sequence of combined latent variables respectively; receiving, at the decoder, the first sequence of combined latent variables and the second sequence of combined latent variables; and inferring the sequence of reconstructed tokens and the sequence of reconstructed attention masks.

12. The system of claim 11 wherein the non-transitory computer-readable medium or media further comprises one or more sets of instructions which, when executed by at least one of the one or more processors, cause steps to be performed comprising: using the sequence of reconstructed tokens and the sequence of reconstructed attention masks to establish a loss for system training.

13. The system of claim 12 wherein the loss comprises one or more total correlation (TC) terms to enforce disentanglement of latent variables, and the one or more TC terms comprise a first Kullback-Leibler (KL) divergence for the first encoder and a second KL divergence for the second encoder.

14. The system of claim 13 wherein the first KL divergence is a KL divergence between a distribution of the first sequence of combined latent variables and a product of a factorial distribution for each latent variable in the first sequence of latent variables and a factorial distribution for each global latent variable in the first combined sequence; and the second KL divergence is a KL divergence between a distribution of the second sequence of combined latent variables and a product of a factorial distribution for each latent variable in the second sequence of latent variables and a factorial distribution for each global latent variable in the second combined sequence.

15. A non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by at least one processor, cause steps for segmenting latent representations comprising: generating, using an embedding layer, a sequence of embeddings for a sequence of tokens; generating, using an attention layer, a sequence of attention masks based on the sequence of embeddings; generating a sequence of hidden variables based on the sequence of embeddings with the sequence of attention masks; generating, using a first encoder and a second encoder respectively, a first sequence of latent variables and a second sequence of latent variables based on the sequence of hidden variables; and inferring, using a decoder, a sequence of reconstructed tokens and a sequence of reconstructed attention masks based on at least information of at least the first and second sequences of latent variables.

16. The non-transitory computer-readable medium or media of claim 15 wherein each hidden variable in the sequence of hidden va
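The data flow recited in claims 1 to 3, with the semantic/syntax split of claim 8, can be sketched as an untrained forward pass. This is a hypothetical toy in plain Python, not the claimed implementation: the claims do not specify layer types, so the sketch assumes a linear-plus-sigmoid attention layer producing masks as wide as the embeddings, Gaussian encoders using the reparameterization trick, a single global latent vector shared by every position and concatenated onto each per-token latent, and linear decoder heads. `ToyTCVAE`, `GaussianEncoder`, and all dimensions are invented for illustration.

```python
import math
import random

# Hypothetical toy sketch; every layer choice below is an assumption.
random.seed(0)

def rand_matrix(rows, cols):
    """Randomly initialised (untrained) weight matrix."""
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

class GaussianEncoder:
    """Maps a hidden vector to a latent sample via the reparameterization trick."""
    def __init__(self, d_in, d_z):
        self.W_mu = rand_matrix(d_z, d_in)
        self.W_logvar = rand_matrix(d_z, d_in)

    def __call__(self, h):
        mu = matvec(self.W_mu, h)
        logvar = matvec(self.W_logvar, h)
        return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

class ToyTCVAE:
    def __init__(self, vocab, d_emb=4, d_z=2, d_g=2):
        self.vocab = vocab
        self.embedding = {t: [random.gauss(0.0, 1.0) for _ in range(d_emb)] for t in vocab}
        self.W_att = rand_matrix(d_emb, d_emb)       # attention layer (assumed linear + sigmoid)
        self.sem_enc = GaussianEncoder(d_emb, d_z)    # first encoder (semantic)
        self.syn_enc = GaussianEncoder(d_emb, d_z)    # second encoder (syntax)
        self.d_g = d_g
        d_dec = 2 * (d_z + d_g)                       # both combined latents, concatenated
        self.W_tok = rand_matrix(len(vocab), d_dec)   # decoder head: token logits
        self.W_mask = rand_matrix(d_emb, d_dec)       # decoder head: reconstructed masks

    def forward(self, tokens):
        embs = [self.embedding[t] for t in tokens]
        masks = [sigmoid(matvec(self.W_att, e)) for e in embs]
        # Claim 2: hidden variable = embedding (element-wise *) attention mask.
        hidden = [[ei * mi for ei, mi in zip(e, m)] for e, m in zip(embs, masks)]
        z_sem = [self.sem_enc(h) for h in hidden]
        z_syn = [self.syn_enc(h) for h in hidden]
        # Claim 3 (assumed form): one global latent, concatenated at every position.
        g = [random.gauss(0.0, 1.0) for _ in range(self.d_g)]
        comb_sem = [z + g for z in z_sem]
        comb_syn = [z + g for z in z_syn]
        rec_tokens, rec_masks = [], []
        for cs, cy in zip(comb_sem, comb_syn):
            dec_in = cs + cy
            logits = matvec(self.W_tok, dec_in)
            rec_tokens.append(self.vocab[max(range(len(logits)), key=logits.__getitem__)])
            rec_masks.append(sigmoid(matvec(self.W_mask, dec_in)))
        return masks, rec_tokens, rec_masks

model = ToyTCVAE(vocab=["the", "cat", "sat", "on", "mat"])
masks, rec_tokens, rec_masks = model.forward(["the", "cat", "sat"])
print(rec_tokens)                          # untrained weights: reconstructions are arbitrary
print(len(rec_masks), len(rec_masks[0]))   # → 3 4
```

Training (claims 4 to 7) would add reconstruction losses on the tokens and masks plus the per-encoder total correlation KL terms; that part is deliberately left out of this sketch.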

Assignees

Baidu USA LLC

Inventors

Classifications

  • Combinations of networks · CPC title

  • Probabilistic or stochastic networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Generative networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Baidu USA LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?
Published January 13, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).