What technology area does this patent fall under?

Primary CPC classification G10L15/16. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 29, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Convolution-augmented transformer models

US12373666B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12373666-B2
Application number	US-202418766038-A
Country	US
Kind code	B2
Filing date	Jul 8, 2024
Priority date	Dec 31, 2020
Publication date	Jul 29, 2025
Grant date	Jul 29, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to learn global interactions and relative-offset-based local correlations of the input data.
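As a rough illustration of the data flow described in the abstract and in the first claim on this page, the sketch below chains a feed-forward block, a self-attention block, a convolution block, and a second feed-forward block. It is a minimal sketch under stated assumptions, not the patented implementation: the names `add_residual`, `conformer_block`, and the `toy_*` sub-blocks are hypothetical, the residual connections around the attention and convolution blocks are assumed, and "half-step residual weights" is read here as scaling the feed-forward branch by 0.5.

```python
from typing import Callable, List

Frame = List[float]       # one time step (feature vector)
Sequence = List[Frame]    # frames ordered in time
SubBlock = Callable[[Sequence], Sequence]


def add_residual(x: Sequence, branch: Sequence, weight: float = 1.0) -> Sequence:
    """Return x + weight * branch, element by element."""
    return [[a + weight * b for a, b in zip(fx, fb)] for fx, fb in zip(x, branch)]


def conformer_block(x: Sequence, ffn1: SubBlock, attention: SubBlock,
                    conv: SubBlock, ffn2: SubBlock) -> Sequence:
    """Apply the four sub-blocks in the order given in the claims."""
    x = add_residual(x, ffn1(x), weight=0.5)  # first half-step feed-forward (0.5 is an assumption)
    x = add_residual(x, attention(x))         # self-attention: global interactions
    x = add_residual(x, conv(x))              # convolution: local correlations
    x = add_residual(x, ffn2(x), weight=0.5)  # second half-step feed-forward
    return x


# Toy stand-ins for the learned sub-blocks (hypothetical, for illustration only).
def toy_ffn(x: Sequence) -> Sequence:
    # Position-wise: each frame is transformed independently of the others.
    return [[2.0 * v for v in frame] for frame in x]


def toy_attention(x: Sequence) -> Sequence:
    # Global: every frame receives the mean of all frames (uniform attention weights).
    dim = len(x[0])
    mean = [sum(frame[j] for frame in x) / len(x) for j in range(dim)]
    return [list(mean) for _ in x]


def toy_conv(x: Sequence) -> Sequence:
    # Local: each frame is averaged with its immediate neighbours only.
    dim = len(x[0])
    out = []
    for t in range(len(x)):
        window = x[max(0, t - 1): t + 2]
        out.append([sum(f[j] for f in window) / len(window) for j in range(dim)])
    return out


out = conformer_block([[1.0], [2.0], [3.0]], toy_ffn, toy_attention, toy_conv, toy_ffn)
print(out)  # [[26.0], [32.0], [38.0]]
```

Passing the sub-blocks in as callables keeps the sketch focused on the ordering and the residual weighting. A real model would use learned layers in place of the toy functions.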

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system for efficiently processing data which accounts for both local and global dependencies, comprising: one or more processors; one or more non-transitory computer-readable media that collectively store: a machine-learned conformer model, wherein the machine-learned conformer model comprises: a first half-step feed-forward block configured to process a block input to generate a first feed-forward output, wherein the first half-step feed-forward block comprises half-step residual weights; a self-attention block configured to perform self-attention to process the first feed-forward output to generate an attention output; a convolutional block configured to receive and process the attention output of the self-attention block to generate a convolutional output; and a second half-step feed-forward block configured to process the convolutional output of the convolutional block to generate a second feed-forward output, wherein the second half-step feed-forward block comprises half-step residual weights; and instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising: obtaining input data; and processing the input data with the machine-learned conformer model to generate output data, wherein processing the input data with the machine-learned conformer model comprises determining position-wise local features and content-based global interactions based on processing the block input associated with the input data with the first half-step feed-forward block followed by processing with the self-attention block, the convolutional block, and the second half-step feed-forward block.

2. The system of claim 1, wherein the machine-learned conformer model further comprises: an audio encoder configured to encode the input data to generate the block input.

3. The system of claim 2, wherein the audio encoder comprises convolution subsampling layer.

4. The system of claim 1, wherein processing the input data with the machine-learned conformer model to generate the output data comprises: processing the input data with the first half-step feed-forward block to generate the first feed-forward output; processing the first feed-forward output with the self-attention block to generate the attention output; processing the attention output with the convolutional block to generate the convolutional output; processing the convolutional output with the second half-step feed-forward block to generate the second feed-forward output; and generating the output data based on the second feed-forward output.

5. The system of claim 4, wherein processing the input data with the machine-learned conformer model to generate the output data further comprises: adding variational noise to perform regularization.

6. The system of claim 1, wherein the machine-learned conformer model was trained on labeled speech data.

7. The system of claim 6, wherein the machine-learned conformer model was further trained on an additional dataset comprising a text-only corpus.

8. The system of claim 1, wherein the machine-learned conformer model further comprises a single layer decoder.

9. The system of claim 8, wherein the single layer decoder comprises a long short-term memory recurrent neural network.

10. The system of claim 1, wherein the convolutional block comprises a layer normalization block, a first pointwise convolution block, a plurality of activation blocks, a depthwise convolution block, a second pointwise convolution block, and a dropout block.

11. A computer-implemented method for efficiently processing data which accounts for both local and global dependencies, the method comprising: obtaining, by a computing system comprising one or more processors, input data; processing, by the computing system, the input data with a conformer model, wherein the conformer model comprises: a first half-step feed-forward block configured to process a block input to generate a first feed-forward output, wherein the first half-step feed-forward block comprises half-step residual weights; a self-attention block configured to perform self-attention to process the first feed-forward output to generate an attention output; a convolutional block configured to receive and process the attention output of the self-attention block to generate a convolutional output; and a second half-step feed-forward block configured to process the convolutional output of the convolutional block to generate a second feed-forward output, wherein the second half-step feed-forward block comprises half-step residual weights; wherein processing the input data with the conformer model comprises determining position-wise local features and content-based global interactions based on processing the block input associated with the input data with the first half-step feed-forward block followed by processing with the self-attention block, the convolutional block, and the second half-step feed-forward block; and in response to processing the input data with the conformer model, generating, by the computing system, an output data.

12. The method of claim 11, wherein the input data comprises audio data, and wherein the output data comprises text data descriptive of speech recognition for the audio data and further comprises sound separation data for the audio data.

13. The method of claim 11, wherein the output data is generated based on determining global interactions and local correlations from the input data.

14. The method of claim 13, wherein the attention output is descriptive of the global interactions determined by the self-attention block.

15. The method of claim 13, wherein the convolutional output is descriptive of the local correlations determined by the convolutional block.

16. The method of claim 11, wherein the input data comprises spectrograph data descriptive of human speech, and the output data comprises text data descriptive of speech recognized data for the human speech.

17. One or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations, the operations comprising: accessing data descriptive of a machine-learned conformer model that comprises one or more conformer blocks, each of the one or more conformer blocks configured to process a block input to generate a block output, each of the one or more conformer blocks comprising: a first half-step feed-forward block configured to process the block input to generate a first feed-forward output, wherein the first half-step feed-forward block comprises half-step residual weights; a self-attention block configured to perform self-attention to process the first feed-forward output to generate an attention output; a convolutional block configured to receive and process the attention output of the self-attention block to generate a convolutional output; and a second half-step feed-forward block configured to process the convolutional output of the convolutional block to generate a second feed-forward output, wherein the second half-step feed-forward block comprises half-step residual weights; and obtaining input data; and processing the input data with the machine-learned conformer model to generate output data, wherein processing the input data with the machine-learned conformer model comprises determining position-wise local features and content-based global interactions based on processing the block input associate…
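Claim 10 lists the components of the convolutional block without giving their exact wiring. The sketch below is one plausible arrangement, offered as an illustration rather than the patented design. Assumptions: the components run in the order shown; the "plurality of activation blocks" is taken to be a gated linear unit after the first pointwise convolution and a Swish activation after the depthwise convolution, as in the published Conformer architecture; all function names, weights, and sizes are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def layer_norm(frame, eps=1e-5):
    # Normalize one frame to zero mean and unit variance (no learned scale or shift).
    mean = sum(frame) / len(frame)
    var = sum((v - mean) ** 2 for v in frame) / len(frame)
    return [(v - mean) / math.sqrt(var + eps) for v in frame]


def pointwise_conv(x, weight):
    # Kernel-size-1 convolution: the same linear map applied to every frame.
    # weight has one row per output channel and one column per input channel.
    return [[sum(w * v for w, v in zip(row, frame)) for row in weight] for frame in x]


def glu(x):
    # Gated linear unit: the first half of the channels is gated by the second half.
    out = []
    for frame in x:
        half = len(frame) // 2
        out.append([a * sigmoid(b) for a, b in zip(frame[:half], frame[half:])])
    return out


def swish(x):
    return [[v * sigmoid(v) for v in frame] for frame in x]


def depthwise_conv(x, kernels):
    # kernels[j] is the 1-D filter for channel j; channels are never mixed.
    # Zero padding keeps the number of frames unchanged. As in most deep
    # learning libraries, this is a cross-correlation (the kernel is not flipped).
    steps = len(x)
    out = [[0.0] * len(kernels) for _ in range(steps)]
    for j, kernel in enumerate(kernels):
        half = len(kernel) // 2
        for t in range(steps):
            for k, tap in enumerate(kernel):
                s = t + k - half
                if 0 <= s < steps:
                    out[t][j] += tap * x[s][j]
    return out


def dropout(x, rate, rng, training):
    # Identity at inference; inverted dropout during training.
    if not training or rate == 0.0:
        return x
    keep = 1.0 - rate
    return [[v / keep if rng.random() < keep else 0.0 for v in frame] for frame in x]


def convolution_block(x, w_in, kernels, w_out, rate=0.1, training=False, rng=None):
    rng = rng or random.Random(0)
    h = [layer_norm(frame) for frame in x]
    h = pointwise_conv(h, w_in)      # expand dim -> 2 * dim
    h = glu(h)                       # first activation, back to dim
    h = depthwise_conv(h, kernels)   # local, per-channel mixing over time
    h = swish(h)                     # second activation
    h = pointwise_conv(h, w_out)     # dim -> dim
    return dropout(h, rate, rng, training)


rng = random.Random(0)
dim, steps = 2, 4
x = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(steps)]
w_in = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2 * dim)]
kernels = [[0.25, 0.5, 0.25] for _ in range(dim)]
w_out = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
y = convolution_block(x, w_in, kernels, w_out)
print(len(y), len(y[0]))  # 4 2
```

The depthwise convolution only mixes neighbouring frames within a channel, which is what makes this block the source of the local correlations named in claim 15. The pointwise convolutions handle mixing across channels.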

Assignees

Google LLC

Classifications

G06N3/0455
Auto-encoder networks; Encoder-decoder networks
G06N3/09
Supervised learning
G06N3/0464
Convolutional networks [CNN, ConvNet]
G10L15/16 (Primary)
using artificial neural networks
G06N20/00
Machine learning

Patent family

Related publications grouped by family.

View patent family 82119226

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12373666B2 cover?: Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to lear…
Who is the assignee on this patent?: Google LLC

Related patents

Long-context End-to-end Speech Recognition System

Designing and folding structural proteins from the primary amino acid sequence

Multistream acoustic models with dilations
