What technology area does this patent fall under?

Primary CPC classification G06F40/47. Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 4, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Sequence modeling using imputation

Patent metadata
Field	Value
Publication number	US-12242818-B2
Application number	US-202117797872-A
Country	US
Kind code	B2
Filing date	Feb 8, 2021
Priority date	Feb 7, 2020
Publication date	Mar 4, 2025
Grant date	Mar 4, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sequence modeling. One of the methods includes receiving an input sequence having a plurality of input positions; determining a plurality of blocks of consecutive input positions; processing the input sequence using a neural network to generate a latent alignment, comprising, at each of a plurality of input time steps: receiving a partial latent alignment from a previous input time step; selecting an input position in each block, wherein the token at the selected input position of the partial latent alignment in each block is a mask token; and processing the partial latent alignment and the input sequence using the neural network to generate a new latent alignment, wherein the new latent alignment comprises, at the selected input position in each block, an output token or a blank token; and generating, using the latent alignment, an output sequence.
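The decoding loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: `toy_model` is a hypothetical stand-in for the neural network that simply echoes each input token with a made-up confidence score, picking the highest-confidence masked position in each block follows the arg max selection mentioned in claim 17, and the final `collapse` step assumes a CTC-style rule (merge repeated tokens, then drop blanks), which this page does not spell out. All function and variable names are illustrative.

```python
MASK, BLANK = "<mask>", "_"


def toy_model(x, alignment):
    """Hypothetical stand-in for the neural network (an assumption).

    Returns one (token, confidence) pair per input position. This toy
    version ignores the partial alignment and echoes the input token
    with a deterministic pseudo-confidence.
    """
    return [(tok, ((i * 37) % 11 + 1) / 12.0) for i, tok in enumerate(x)]


def decode(x, block_size, model=toy_model):
    """Fill in a fully masked alignment, one position per block per step."""
    n = len(x)
    alignment = [MASK] * n
    # Blocks of consecutive input positions.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    # The number of steps equals the block size (compare claim 2).
    for _ in range(block_size):
        preds = model(x, alignment)
        chosen = []
        for block in blocks:
            masked = [i for i in block if alignment[i] == MASK]
            if masked:
                # Select the masked position with the highest confidence.
                chosen.append(max(masked, key=lambda i: preds[i][1]))
        # Commit all selected positions at once (one per block).
        for i in chosen:
            alignment[i] = preds[i][0]
    return alignment


def collapse(alignment):
    """Assumed CTC-style collapse: merge repeats, then remove blanks."""
    out, prev = [], None
    for tok in alignment:
        if tok != prev and tok != BLANK:
            out.append(tok)
        prev = tok
    return out


frames = ["h", "h", "_", "e", "l", "_", "l", "o"]
latent = decode(frames, block_size=4)
print(latent, "->", "".join(collapse(latent)))
```

With a block size of 4, the loop runs four steps regardless of the sequence length, because every block contributes one position per step.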

First claim

Opening claim text (preview).

What is claimed is:

1. A method of generating, from an input sequence having a respective input token at each of a plurality of input positions, an output sequence having a respective output token from a vocabulary of output tokens at each of a plurality of output positions, the method comprising: receiving the input sequence; determining a plurality of blocks, wherein each block comprises a plurality of input tokens having consecutive input positions from the input positions, wherein the input tokens comprise a first token modality associated with audio tokens, text tokens, or image tokens; processing the input sequence using a neural network to generate a latent alignment of the input sequence, wherein the latent alignment comprises, at each of the input positions, either an output token from the vocabulary of output tokens or a blank token, the processing comprising, at each of a plurality of input time steps: receiving a partial latent alignment from a previous input time step, wherein the partial latent alignment comprises, at each of the input positions, one of: an output token, a blank token, or a mask token; selecting an input position in each block, wherein the token at the selected input position of the partial latent alignment in each block is a mask token; and processing i) the partial latent alignment and ii) the input sequence using the neural network to generate a new latent alignment, wherein the new latent alignment comprises, at the selected input position in each block, an output token or a blank token; and generating, using the latent alignment, the output sequence, wherein the output tokens comprise a second token modality associated with audio tokens, text tokens, or image tokens.

2. The method of claim 1, wherein each block comprises a same number of input tokens, and wherein the same number of input tokens is equal to a number of input time steps.

3. The method of claim 1, wherein processing i) the partial latent alignment and ii) the input sequence using the neural network to generate a new latent alignment comprises: processing the input sequence using a first embedding subnetwork to generate an input sequence embedding; processing the partial latent alignment using a second embedding subnetwork to generate a partial latent alignment embedding; combining the partial latent alignment embedding and the input sequence embedding to generate a combined embedding; and processing the combined embedding using a self-attention subnetwork to generate the new latent alignment.

4. The method of claim 1, wherein the first token modality is an audio token modality comprising audio sample tokens and the second token modality is a text token modality comprising text sample tokens.

5. The method of claim 1, wherein the first token modality is a text token modality comprising text sample in a first language and the second token modality is a text token modality comprising text sample in a second language.

6. The method of claim 1, wherein processing i) the partial latent alignment and ii) the input sequence using the neural network to generate a new latent alignment comprises: upsampling the input sequence to generate a modified input sequence; and processing i) the partial latent alignment and ii) the modified input sequence using the neural network to generate the new latent alignment.

7. The method of claim 1, wherein the neural network has been trained by updating parameters $\theta$ of the neural network using an objective function that marginalizes over all possible new partial latent alignments that are compatible with a particular partial latent alignment.

8. The method of claim 7, wherein the objective function is: $J_{DP}(\theta) = \mathbb{E}_{a \sim q_{\phi'}}\left[\mathbb{E}_{\tilde{a} \sim r}\left[\log \sum_{a' \in \beta'(\tilde{a}, a)} p_\theta(a' \mid \tilde{a}, x)\right]\right]$, where $x$ is the input sequence, $a$ is a particular latent alignment, $\tilde{a}$ is a particular partial latent alignment of the latent alignment $a$, $\phi'$ is a pseudo-expert policy, $q_{\phi'}$ is a distribution over all possible latent alignments of $x$ under the pseudo-expert policy $\phi'$, $r(a)$ is a distribution over all possible masking permutations of latent alignments of $x$, and $\beta'(\tilde{a}, a)$ returns a set of all possible new partial latent alignments compatible with the particular partial latent alignment $\tilde{a}$ drawn from the distribution $q_{\phi'} \times r$.

9. The method of claim 1, wherein the neural network has been trained by updating parameters $\theta$ of the neural network using an objective function that computes a loss according to a pseudo-expert policy.

10. The method of claim 9, wherein the objective function is: $J_{IM}(\theta) = \mathbb{E}_{a \sim q_{\phi'}}\left[\mathbb{E}_{\tilde{a} \sim r}\left[\log p_\theta(a \mid \tilde{a}, x)\right]\right]$, where $x$ is the input sequence, $a$ is a particular latent alignment, $\tilde{a}$ is a particular partial latent alignment of the latent alignment $a$, $\phi'$ is a pseudo-expert policy, $q_{\phi'}$ is a distribution over all possible alignments of $x$ under the pseudo-expert policy $\phi'$, and $r(a)$ is a distribution over all possible masking permutations of alignments of $x$.

11. The method of claim 8, wherein $q_{\phi'} = \hat{a}^{*}_{\phi} + N$, where $N$ is a noise distribution and $\hat{a}^{*}_{\phi}$ is a best empirical alignment under an expert policy $\phi$, $\hat{a}^{*}_{\phi} = \arg\max_{a} q_{\phi}(a \mid x, y)$, where $q_{\phi}$ is a distribution over all possible alignments of $x$ under the expert policy $\phi$.

12. The method of claim 11, wherein $\hat{a}^{*}_{\phi}$ is computed using dynamic programming.

13. The method of claim 8, wherein $q_{\phi'} = q_{\theta'}$, where $q_{\theta'}$ is a stationary distribution created from a stale copy $\theta'$ of the parameters $\theta$ of the neural network.

14. The method of claim 7, wherein training the neural network comprises: sampling a particular latent alignment $a \sim q_{\phi'}$ for a particular input sequence $x$; sampling a particular partial latent alignment $\tilde{a} \sim r(a)$ by sampling a particular masking permutation from $r$ and applying the particular masking permutation to $a$; processing the particular partial latent alignment $\tilde{a}$ and the particular input sequence $x$ using the neural network to generate a prediction; computing the objective function; computing an error in the prediction using the computed objective function; backpropagating the error through the neural network to determine an update to the parameters $\theta$ of the neural network.

15. The method of claim 14, wherein the objective function is computed using dynamic programming.

16. The method of claim 7, wherein $r(a)$ is a Bernoulli or Uniform distribution.

17. The method of claim 1, wherein selecting an input position in each block comprises computing an arg max input position for each block in parallel across all blocks.

18. The method of claim 1, wh
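The imputation objective of claim 10 and the sampling steps of claim 14 can be illustrated for a single training example. The sketch below is only a rough reading of the claim text: it assumes a Bernoulli masking distribution (one of the options in claim 16), and it assumes the model's per-position predictions are conditionally independent given the partial alignment and the input, so that the log-probability factorizes over the masked positions. This page does not state that factorization. The names `sample_partial_alignment`, `imputation_log_likelihood`, and the hand-written `probs` table are hypothetical.

```python
import math
import random

MASK = "<mask>"


def sample_partial_alignment(a, keep_prob, rng):
    """Bernoulli masking: keep each token with probability keep_prob."""
    return [tok if rng.random() < keep_prob else MASK for tok in a]


def imputation_log_likelihood(a, a_tilde, probs):
    """Sum log p(a_i | a_tilde, x) over the masked positions only.

    probs[i] maps each candidate token to the model's probability at
    position i; here it is supplied by hand instead of by a network.
    Assumes per-position independence (not stated on this page).
    """
    total = 0.0
    for i, (tok, partial) in enumerate(zip(a, a_tilde)):
        if partial == MASK:
            total += math.log(probs[i][tok])
    return total


# A three-position alignment with "_" as the blank token.
alignment = ["h", "_", "i"]
# Made-up per-position distributions standing in for network output.
probs = [
    {"h": 0.90, "_": 0.05, "i": 0.05},
    {"h": 0.25, "_": 0.50, "i": 0.25},
    {"h": 0.50, "_": 0.25, "i": 0.25},
]
partial = sample_partial_alignment(alignment, keep_prob=0.5, rng=random.Random(0))
print(partial, imputation_log_likelihood(alignment, partial, probs))
```

Averaging this quantity over many sampled alignments and maskings would approximate the expectation in the objective; training would maximize it (equivalently, minimize its negative) with respect to the network parameters.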

Assignees

Google LLC

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/09
Supervised learning · CPC title
G06F40/284
Lexical analysis, e.g. tokenisation or collocates · CPC title
G06N3/045
Combinations of networks · CPC title
G06N3/047
Probabilistic or stochastic networks · CPC title

Patent family

Related publications grouped by family.

View patent family 74860400

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12242818B2 cover?: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sequence modeling. One of the methods includes receiving an input sequence having a plurality of input positions; determining a plurality of blocks of consecutive input positions; processing the input sequence using a neural network to generate a latent alignment, comprising, at each of a plurali…
Who is the assignee on this patent?: Google LLC