Sequence modeling using imputation

US12242818B2 · US · B2

Patent metadata
Publication number: US-12242818-B2
Application number: US-202117797872-A
Country: US
Kind code: B2
Filing date: Feb 8, 2021
Priority date: Feb 7, 2020
Publication date: Mar 4, 2025
Grant date: Mar 4, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sequence modeling. One of the methods includes receiving an input sequence having a plurality of input positions; determining a plurality of blocks of consecutive input positions; processing the input sequence using a neural network to generate a latent alignment, comprising, at each of a plurality of input time steps: receiving a partial latent alignment from a previous input time step; selecting an input position in each block, wherein the token at the selected input position of the partial latent alignment in each block is a mask token; and processing the partial latent alignment and the input sequence using the neural network to generate a new latent alignment, wherein the new latent alignment comprises, at the selected input position in each block, an output token or a blank token; and generating, using the latent alignment, an output sequence.
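The decoding loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Predictor` interface, the `toy_predict` stand-in for the neural network, the use of a per-position score to pick which masked position to fill, and the CTC-style `collapse` step used to turn the latent alignment into an output sequence are all hypothetical choices made for the example.

```python
from typing import Callable, List, Sequence, Tuple

MASK = "<mask>"    # position not yet decided
BLANK = "<blank>"  # position decided, emits nothing

# Hypothetical model interface (an assumption, not from the patent text):
# given the inputs and the current partial alignment, return one
# (token, score) pair per input position.
Predictor = Callable[[Sequence[str], Sequence[str]], List[Tuple[str, float]]]


def blockwise_decode(inputs: Sequence[str], block_size: int,
                     predict: Predictor) -> List[str]:
    """Fill one masked position per block at every step."""
    n = len(inputs)
    alignment = [MASK] * n
    # Blocks of consecutive input positions.
    blocks = [range(start, min(start + block_size, n))
              for start in range(0, n, block_size)]
    # block_size steps suffice: each step resolves one slot in every block.
    for _ in range(block_size):
        preds = predict(inputs, alignment)
        for block in blocks:
            masked = [i for i in block if alignment[i] == MASK]
            if masked:
                # Arg max over the still-masked positions of this block.
                best = max(masked, key=lambda i: preds[i][1])
                alignment[best] = preds[best][0]
    return alignment


def collapse(alignment: Sequence[str]) -> List[str]:
    """CTC-style collapse (assumed): merge repeats, then drop blanks."""
    output: List[str] = []
    previous = None
    for token in alignment:
        if token != previous and token != BLANK:
            output.append(token)
        previous = token
    return output


def toy_predict(inputs, alignment):
    """Stand-in for the neural network: even positions emit, odd are blank."""
    return [(tok.upper() if i % 2 == 0 else BLANK, -float(i))
            for i, tok in enumerate(inputs)]


latent = blockwise_decode(list("abcdef"), block_size=3, predict=toy_predict)
output = collapse(latent)
```

With six inputs and a block size of three, the loop runs three steps and fills two positions per step, one in each block, so the number of model calls depends on the block size rather than on the sequence length.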

First claim

Opening claim text (preview).

What is claimed is:

1. A method of generating, from an input sequence having a respective input token at each of a plurality of input positions, an output sequence having a respective output token from a vocabulary of output tokens at each of a plurality of output positions, the method comprising: receiving the input sequence; determining a plurality of blocks, wherein each block comprises a plurality of input tokens having consecutive input positions from the input positions, wherein the input tokens comprise a first token modality associated with audio tokens, text tokens, or image tokens; processing the input sequence using a neural network to generate a latent alignment of the input sequence, wherein the latent alignment comprises, at each of the input positions, either an output token from the vocabulary of output tokens or a blank token, the processing comprising, at each of a plurality of input time steps: receiving a partial latent alignment from a previous input time step, wherein the partial latent alignment comprises, at each of the input positions, one of: an output token, a blank token, or a mask token; selecting an input position in each block, wherein the token at the selected input position of the partial latent alignment in each block is a mask token; and processing i) the partial latent alignment and ii) the input sequence using the neural network to generate a new latent alignment, wherein the new latent alignment comprises, at the selected input position in each block, an output token or a blank token; and generating, using the latent alignment, the output sequence, wherein the output tokens comprise a second token modality associated with audio tokens, text tokens, or image tokens.

2. The method of claim 1, wherein each block comprises a same number of input tokens, and wherein the same number of input tokens is equal to a number of input time steps.

3. The method of claim 1, wherein processing i) the partial latent alignment and ii) the input sequence using the neural network to generate a new latent alignment comprises: processing the input sequence using a first embedding subnetwork to generate an input sequence embedding; processing the partial latent alignment using a second embedding subnetwork to generate a partial latent alignment embedding; combining the partial latent alignment embedding and the input sequence embedding to generate a combined embedding; and processing the combined embedding using a self-attention subnetwork to generate the new latent alignment.

4. The method of claim 1, wherein the first token modality is an audio token modality comprising audio sample tokens and the second token modality is a text token modality comprising text sample tokens.

5. The method of claim 1, wherein the first token modality is a text token modality comprising text sample in a first language and the second token modality is a text token modality comprising text sample in a second language.

6. The method of claim 1, wherein processing i) the partial latent alignment and ii) the input sequence using the neural network to generate a new latent alignment comprises: upsampling the input sequence to generate a modified input sequence; and processing i) the partial latent alignment and ii) the modified input sequence using the neural network to generate the new latent alignment.

7. The method of claim 1, wherein the neural network has been trained by updating parameters $\theta$ of the neural network using an objective function that marginalizes over all possible new partial latent alignments that are compatible with a particular partial latent alignment.

8. The method of claim 7, wherein the objective function is:

$$J_{DP}(\theta) = \mathbb{E}_{a \sim q_{\phi'}}\left[\mathbb{E}_{\tilde{a} \sim r}\left[\log \sum_{a' \in \beta'(\tilde{a}, a)} p_{\theta}(a' \mid \tilde{a}, x)\right]\right],$$

where $x$ is the input sequence, $a$ is a particular latent alignment, $\tilde{a}$ is a particular partial latent alignment of the latent alignment $a$, $\phi'$ is a pseudo-expert policy, $q_{\phi'}$ is a distribution over all possible latent alignments of $x$ under the pseudo-expert policy $\phi'$, $r(a)$ is a distribution over all possible masking permutations of latent alignments of $x$, and $\beta'(\tilde{a}, a)$ returns a set of all possible new partial latent alignments compatible with the particular partial latent alignment $\tilde{a}$ drawn from the distribution $q_{\phi'} \times r$.

9. The method of claim 1, wherein the neural network has been trained by updating parameters $\theta$ of the neural network using an objective function that computes a loss according to a pseudo-expert policy.

10. The method of claim 9, wherein the objective function is:

$$J_{IM}(\theta) = \mathbb{E}_{a \sim q_{\phi'}}\left[\mathbb{E}_{\tilde{a} \sim r}\left[\log p_{\theta}(a \mid \tilde{a}, x)\right]\right],$$

where $x$ is the input sequence, $a$ is a particular latent alignment, $\tilde{a}$ is a particular partial latent alignment of the latent alignment $a$, $\phi'$ is a pseudo-expert policy, $q_{\phi'}$ is a distribution over all possible alignments of $x$ under the pseudo-expert policy $\phi'$, and $r(a)$ is a distribution over all possible masking permutations of alignments of $x$.

11. The method of claim 8, wherein $q_{\phi'} = \hat{a}^{*}_{\phi} + N$, where $N$ is a noise distribution and $\hat{a}^{*}_{\phi}$ is a best empirical alignment under an expert policy $\phi$,

$$\hat{a}^{*}_{\phi} = \arg\max_{a} q_{\phi}(a \mid x, y),$$

where $q_{\phi}$ is a distribution over all possible alignments of $x$ under the expert policy $\phi$.

12. The method of claim 11, wherein $\hat{a}^{*}_{\phi}$ is computed using dynamic programming.

13. The method of claim 8, wherein $q_{\phi'} = q_{\theta'}$, where $q_{\theta'}$ is a stationary distribution created from a stale copy $\theta'$ of the parameters $\theta$ of the neural network.

14. The method of claim 7, wherein training the neural network comprises: sampling a particular latent alignment $a \sim q_{\phi'}$ for a particular input sequence $x$; sampling a particular partial latent alignment $\tilde{a} \sim r(a)$ by sampling a particular masking permutation from $r$ and applying the particular masking permutation to $a$; processing the particular partial latent alignment $\tilde{a}$ and the particular input sequence $x$ using the neural network to generate a prediction; computing the objective function; computing an error in the prediction using the computed objective function; backpropagating the error through the neural network to determine an update to the parameters $\theta$ of the neural network.

15. The method of claim 14, wherein the objective function is computed using dynamic programming.

16. The method of claim 7, wherein $r(a)$ is a Bernoulli or Uniform distribution.

17. The method of claim 1, wherein selecting an input position in each block comprises computing an arg max input position for each block in parallel across all blocks.

18. The method of claim 1, wh
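The sampling steps recited in claims 14 and 16 can be illustrated with a small sketch. It assumes a Bernoulli masking distribution (one of the two options named in claim 16) and, as a simplification not stated in the claims, that the imputation log-likelihood of claim 10 factorizes over the masked positions. The function names and the `probs` structure (one token-to-probability dictionary per position, standing in for the network output) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

MASK = "<mask>"
BLANK = "<blank>"


def sample_partial_alignment(alignment: Sequence[str], mask_prob: float,
                             rng: random.Random) -> List[str]:
    """Bernoulli masking: hide each token independently with mask_prob."""
    return [MASK if rng.random() < mask_prob else token
            for token in alignment]


def imputation_log_likelihood(alignment: Sequence[str],
                              partial: Sequence[str],
                              probs: Sequence[Dict[str, float]]) -> float:
    """Sum log-probabilities of the true tokens at masked positions only.

    Assumes the model treats masked positions as conditionally independent
    given the partial alignment and the input (a simplification).
    """
    total = 0.0
    for token, seen, dist in zip(alignment, partial, probs):
        if seen == MASK:
            total += math.log(dist[token])
    return total


rng = random.Random(0)
alignment = ["a", BLANK, "b", "b"]
partial = sample_partial_alignment(alignment, mask_prob=0.5, rng=rng)

# Placeholder model output: a uniform guess over three tokens per position.
uniform = [{"a": 1 / 3, "b": 1 / 3, BLANK: 1 / 3} for _ in alignment]
score = imputation_log_likelihood(alignment, partial, uniform)
```

In an actual training loop the negated score would serve as the loss to backpropagate; the marginalized objective of claim 8 would instead sum over every compatible alignment, which this sketch does not attempt.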

Assignees

Inventors

Classifications (CPC titles)

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

  • Lexical analysis, e.g. tokenisation or collocates

  • Combinations of networks

  • Probabilistic or stochastic networks

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12242818B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sequence modeling. One of the methods includes receiving an input sequence having a plurality of input positions; determining a plurality of blocks of consecutive input positions; processing the input sequence using a neural network to generate a latent alignment, comprising, at each of a plurali…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/47. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 4, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).