What technology area does this patent fall under?

Primary CPC classification G06F40/284. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 13, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Total correlation variational autoencoder strengthened with attentions for segmenting syntax and semantics

US2022012425A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2022012425-A1
Application number	US-202016926525-A
Country	US
Kind code	A1
Filing date	Jul 10, 2020
Priority date	Jul 10, 2020
Publication date	Jan 13, 2022
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are embodiments of a framework named as total correlation variational autoencoder (TC_VAE) to disentangle syntax and semantics by making use of total correlation penalties of KL divergences. One or more Kullback-Leibler (KL) divergence terms in a loss for a variational autoencoder are discomposed so that generated hidden variables may be separated. Embodiments of the TC_VAE framework were examined on semantic similarity tasks and syntactic similarity tasks. Experimental results show that better disentanglement between syntactic and semantic representations have been achieved compared with state-of-the-art (SOTA) results on the same data sets in similar settings.
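Editorial note, not part of the official abstract: the "total correlation" the abstract refers to can be illustrated with the standard decomposition of the averaged KL term from the general disentangled-VAE literature. The block below is a sketch under stated assumptions: the prior factorizes as p(z) = ∏_j p(z_j), and q(z) denotes the aggregate posterior E_{p(x)}[q(z | x)]. The symbols x, z, and j are introduced here for illustration. The patent's own loss, which per claims 6 and 7 also involves global latent variables and separate terms for the two encoders, may differ in detail.

```latex
\mathbb{E}_{p(x)}\Big[\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)\Big]
  = \underbrace{I_q(x; z)}_{\text{mutual information}}
  + \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big)}_{\text{total correlation (TC)}}
  + \underbrace{\sum_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}
```

The middle term is zero exactly when the latent dimensions are statistically independent under q(z), which is why penalizing it is commonly used to encourage disentangled latent variables.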

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for segmenting latent representations comprising: generating, using an embedding layer, a sequence of embeddings for a sequence of tokens; generating, using an attention layer, a sequence of attention masks based on the sequence of embeddings; generating a sequence of hidden variables based on the sequence of embeddings and the sequence of attention masks; generating, using a first encoder and a second encoder respectively, a first sequence of latent variables and a second sequence of latent variables based on the sequence of hidden variables; and inferring, using a decoder, a sequence of reconstructed tokens and a sequence of reconstructed attention masks based on at least information of the first and second sequences of latent variables.
2. The computer-implemented method of claim 1 wherein each hidden variable in the sequence of hidden variables is generated by an element-wise multiplication between an embedding of the sequence of embeddings and a corresponding attention masks of the sequence of attention masks.
3. The computer-implemented method of claim 1 wherein inferring a sequence of reconstructed tokens and a sequence of reconstructed attention masks based on at least information of at least the first and second sequences of latent variables comprising: combining a sequence of global latent variables with the first sequence of latent variables and the second sequence of latent variables to generate a first sequence of combined latent variables and a second sequence of combined latent variables respectively; receiving, at the decoder, the first sequence of combined latent variables and the second sequence of combined latent variables; and inferring the sequence of reconstructed tokens and the sequence of reconstructed attention masks.
4. The computer-implemented method of claim 3 further comprising: using the sequence of reconstructed tokens and the sequence of reconstructed attention masks to establish a loss to train at least the attention layer, the first encoder, the second encoder, and the decoder.
5. The computer-implemented method of claim 4 wherein the loss comprises one or more total correlation (TC) terms to enforce disentanglement of latent variables.
6. The computer-implemented method of claim 5 wherein the one or more TC terms comprise a first Kullback-Leibler (KL) divergence for the first encoder and a second KL divergence for the second encoder.
7. The computer-implemented method of claim 6 wherein the first KL divergence is a KL divergence between a distribution of the first sequence of combined latent variables and a product of a factorial distribution for each latent variable in the first sequence of latent variables and a factorial distribution for each global latent variable in the first combined sequence, the second KL divergence is a KL divergence between a distribution of the second sequence of combined latent variables and a product of a factorial distribution for each latent variable in the second sequence of latent variables and a factorial distribution for each global latent variable in the second combined sequence.
8. The computer-implemented method of claim 1 wherein the first encoder is a semantic encoder, the second encoder is a syntax encoder.
9. A system for segmenting latent representations comprising: one or more processors; and a non-transitory computer-readable medium or media comprising one or more sets of instructions which, when executed by at least one of the one or more processors, causes steps to be performed comprising: generating a sequence of embeddings for a sequence of tokens; generating a sequence of attention masks based on the sequence of embeddings; generating a sequence of hidden variables based on the sequence of embeddings with the sequence of attention masks; generating respectively a first sequence of latent variables and a second sequence of latent variables based on the sequence of hidden variables; and inferring a sequence of reconstructed tokens and a sequence of reconstructed attention masks based on at least information of the first and second sequences of latent variables.
10. The system of claim 9 wherein each hidden variable in the sequence of hidden variables is generated by an element-wise multiplication between an embedding of the sequence of embeddings and a corresponding attention masks of the sequence of attention masks.
11. The system of claim 9 wherein inferring a sequence of reconstructed tokens and a sequence of reconstructed attention masks based on at least information of at least the first and second sequences of latent variables comprises steps of: combining a sequence of global latent variables with the first sequence of latent variables and the second sequence of latent variables to generate a first sequence of combined latent variables and a second sequence of combined latent variables respectively; receiving, at the decoder, the first sequence of combined latent variables and the second sequence of combined latent variables; and inferring the sequence of reconstructed tokens and the sequence of reconstructed attention masks.
12. The system of claim 11 wherein the non-transitory computer-readable medium or media further comprises one or more sets of instructions which, when executed by at least one of the one or more processors, causes steps to be performed comprising: using the sequence of reconstructed tokens and the sequence of reconstructed attention masks to establish a loss for system training.
13. The system of claim 12 wherein the loss comprises a total correlation (TC) terms to enforce disentanglement of latent variables, the one or more TC terms comprise a first Kullback-Leibler (KL) divergence for the first encoder and a second KL divergence for the second encoder.
14. The system of claim 13 wherein the first KL divergence is a KL divergence between a distribution of the first sequence of combined latent variables and a product of a factorial distribution for each latent variable in the first sequence of latent variables and a factorial distribution for each global latent variable in the first combined sequence, the second KL divergence is a KL divergence between a distribution of the second sequence of combined latent variables and a product of a factorial distribution for each latent variable in the second sequence of latent variables and a factorial distribution for each global latent variable in the second combined sequence.
15. A non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by at least one processor, causes steps for segmenting latent representations comprising: generating, using an embedding layer, a sequence of embeddings for a sequence of tokens; generating, using an attention layer, a sequence of attention masks based on the sequence of embeddings; generating a sequence of hidden variables based on the sequence of embeddings with the sequence of attention masks; generating, using a first encoder and a second encoder respectively, a first sequence of latent variables and a second sequence of latent variables based on the sequence of hidden variables; and inferring, using a decoder, a sequence of reconstructed tokens and a sequence of reconstructed attention masks based on at least information of at least the first and second sequences of latent variables.
16. The non-transitory computer-readable medium or media of claim 15 wherein each hidden variable in the sequence of hidden va
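Editorial illustration, not part of the claim text: the data flow recited in claims 1, 2, and 8 can be sketched in a few lines of plain Python. Everything specific here is an assumption made for illustration only: the sizes `VOCAB`, `EMB`, and `LATENT`, the random weights, the sigmoid gate used as the attention layer, the deterministic linear maps standing in for the two encoders (a real variational autoencoder would use stochastic encoders), and the concatenation used to feed both latent sequences to the decoder. The helper names `rand_matrix`, `linear`, and `forward` are hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

VOCAB, EMB, LATENT = 10, 4, 3  # hypothetical sizes, chosen only for illustration


def rand_matrix(rows, cols):
    """Random weight matrix standing in for trained parameters."""
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(x, w):
    """Vector-matrix product: x has len(w) entries, result has len(w[0])."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


embedding_table = rand_matrix(VOCAB, EMB)   # embedding layer
w_attn = rand_matrix(EMB, EMB)              # attention layer (assumed sigmoid gate)
w_sem = rand_matrix(EMB, LATENT)            # first encoder (semantic, per claim 8)
w_syn = rand_matrix(EMB, LATENT)            # second encoder (syntax, per claim 8)
w_dec_tok = rand_matrix(2 * LATENT, VOCAB)  # decoder head for tokens
w_dec_mask = rand_matrix(2 * LATENT, EMB)   # decoder head for attention masks


def forward(tokens):
    """Run one pass: tokens -> embeddings -> masks -> hidden -> latents -> outputs."""
    embeddings = [embedding_table[t] for t in tokens]
    masks = [[sigmoid(v) for v in linear(e, w_attn)] for e in embeddings]
    # Claim 2: each hidden variable is the element-wise product of an
    # embedding and its corresponding attention mask.
    hidden = [[e_i * m_i for e_i, m_i in zip(e, m)]
              for e, m in zip(embeddings, masks)]
    z_sem = [linear(h, w_sem) for h in hidden]
    z_syn = [linear(h, w_syn) for h in hidden]
    # The decoder uses both latent sequences; concatenation is an assumption.
    joint = [a + b for a, b in zip(z_sem, z_syn)]
    logits = [linear(z, w_dec_tok) for z in joint]
    recon_tokens = [max(range(VOCAB), key=lg.__getitem__) for lg in logits]
    recon_masks = [[sigmoid(v) for v in linear(z, w_dec_mask)] for z in joint]
    return {
        "embeddings": embeddings, "masks": masks, "hidden": hidden,
        "z_sem": z_sem, "z_syn": z_syn,
        "recon_tokens": recon_tokens, "recon_masks": recon_masks,
    }


out = forward([1, 4, 2, 7])
print(len(out["hidden"]), len(out["z_sem"][0]), len(out["recon_tokens"]))
```

With untrained random weights the reconstructed tokens are meaningless. The sketch only shows which quantity feeds which, and that the decoder returns both a token sequence and an attention-mask sequence, as the independent claims recite.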

Assignees

Baidu USA LLC

Inventors

Classifications

G06N3/045 · Combinations of networks
G06N3/047 · Probabilistic or stochastic networks
G06N3/044 · Recurrent networks, e.g. Hopfield networks
G06N3/0455 · Auto-encoder networks; Encoder-decoder networks
G06N3/0475 · Generative networks

Patent family

Related publications grouped by family.

View patent family 75994399

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022012425A1 cover?: Described herein are embodiments of a framework named as total correlation variational autoencoder (TC_VAE) to disentangle syntax and semantics by making use of total correlation penalties of KL divergences. One or more Kullback-Leibler (KL) divergence terms in a loss for a variational autoencoder are discomposed so that generated hidden variables may be separated. Embodiments of the TC_VAE fra…
Who is the assignee on this patent?: Baidu USA LLC

Related patents

System and method for semantic analysis of multimedia data using attention-based fusion network

Training a joint many-task neural network model using successive regularization

System and method for controllable machine text generation architecture

Assignment of semantic labels to a sequence of words using neural network architectures
