What technology area does this patent fall under?

Primary CPC classification G06N3/048. Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 24, 2025 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Routing to expert subnetworks in mixture-of-experts neural networks

Patent metadata
Field	Value
Publication number	US-2025131251-A1
Application number	US-202318834070-A
Country	US
Kind code	A1
Filing date	Jan 30, 2023
Priority date	Jan 28, 2022
Publication date	Apr 24, 2025
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes a neural network configured to perform the machine learning task, the neural network including one or more expert neural network blocks that each include a router that performs expert-choice routing between multiple expert neural networks.

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a neural network that is configured to process a network input and to generate a network output for the network input, the neural network comprising a sequence of one or more network blocks, the sequence comprising one or more expert network blocks configured to perform operations comprising: obtaining a block input that represents an intermediate representation of the network input, the block input comprising a plurality of elements; determining a plurality of sub-inputs from the block input, each sub-input comprising a respective different subset of the plurality of elements of the block input; for each of a plurality of expert subnetworks of the expert network block: processing the plurality of sub-inputs to generate a respective score for each sub-input; selecting one or more of the sub-inputs according to the respective scores; and for each selected sub-input, processing the selected sub-input using the expert subnetwork to generate a respective sub-output; for each of the plurality of sub-inputs, processing the sub-outputs corresponding to the sub-input generated by respective expert subnetworks to generate a combined sub-output for the sub-input; and generating a block output by combining the respective combined sub-outputs for the plurality of sub-inputs.

2. The system of claim 1, wherein each expert subnetwork is configured to process a same number of sub-inputs.

3. The system of claim 2, wherein the same number $k$ of sub-inputs processed by each expert subnetwork is equal to:

$$k = \frac{l \cdot c}{e}$$

wherein $l$ is a number of sub-inputs in the block input, $e$ is a number of expert subnetworks in the expert network block, and $c$ is a hyperparameter of the neural network representing an average number of sub-inputs to be processed per expert subnetwork.

4. The system of claim 1, wherein, for each expert subnetwork, processing the plurality of sub-inputs to generate a respective score for each sub-input comprises computing:

$$S = \mathrm{Softmax}(X \cdot W_g)$$

wherein $X \in \mathbb{R}^{l \times d}$ is a matrix that includes a respective row corresponding to each sub-input, $l$ is a number of sub-inputs in the block input, $d$ is a dimensionality of each sub-input, $W_g \in \mathbb{R}^{d \times e}$ is a matrix that includes a respective column corresponding to each expert subnetwork, and $e$ is a number of expert subnetworks in the expert network block.

5. The system of claim 4, wherein, for each expert subnetwork, selecting one or more of the sub-inputs according to the respective scores comprises computing:

$$G, I = \mathrm{TopK}(S^{\top}, k), \qquad P = \mathrm{Onehot}(I)$$

wherein $k$ is a number of sub-inputs selected by each expert subnetwork, $I \in \mathbb{R}^{e \times k}$ is a matrix whose $(i,j)$-th element identifies the sub-input that has the $j$-th-largest score for the $i$-th expert subnetwork, $G \in \mathbb{R}^{e \times k}$ is a matrix whose $(i,j)$-th element represents the score of the sub-input that has the $j$-th-largest score for the $i$-th expert subnetwork, and $P \in \mathbb{R}^{e \times k \times l}$ is a one-hot matrix whose $(i,j,m)$-th element is equal to one if the $m$-th sub-input has the $j$-th-largest score for the $i$-th expert subnetwork and zero otherwise.

6. The system of claim 1, wherein each sub-input is processed by at most a threshold number $b$ different expert subnetworks.

7. The system of claim 6, wherein for each expert subnetwork, selecting one or more of the sub-inputs according to the respective scores comprises: computing:

$$\max_{A} \; \langle S^{\top}, A \rangle + \lambda H(A)$$

$$\text{s.t.} \quad \forall i: \sum_{j'} A[i,j'] = k, \qquad \forall j: \sum_{i'} A[i',j] \le b, \qquad \forall i,j: 0 \le A[i,j] \le 1$$

wherein $\langle S^{\top}, A \rangle$ represents an inner product between $S^{\top}$ and $A$, and wherein $H(A) = \sum_{ij} -A[i,j] \log A[i,j]$; and computing:

$$G, I = \mathrm{TopK}(A, k), \qquad P = \mathrm{Onehot}(I)$$

wherein $k$ is a number of sub-inputs selected by each expert subnetwork, $I \in \mathbb{R}^{e \times k}$ is a matrix whose $(i,j)$-th element identifies the sub-input that has the $j$-th-largest score for the $i$-th expert subnetwork, and $G \in \mathbb{R}^{e \times k}$ is a matrix whose $(i,j)$-th element represents the score of the sub-input that has the $j$-th-largest score fo
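
The routing steps recited in claims 1 to 5 can be illustrated with a minimal pure-Python sketch. It is not the patent's implementation. The function names (`softmax`, `matmul`, `expert_choice_route`), the integer rounding of k, the softmax axis (over experts for each sub-input), the gate-weighted sum used to combine sub-outputs, and the zero output for sub-inputs that no expert selects are all assumptions made for illustration; the claim text does not fix those details. The sketch also assumes each expert returns a vector of the same length as its input.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def matmul(X, W):
    """Plain-Python product of X (l x d) and W (d x e)."""
    return [[sum(x[t] * W[t][j] for t in range(len(W)))
             for j in range(len(W[0]))] for x in X]


def expert_choice_route(X, W_g, experts, c):
    """Route l sub-inputs to e experts; each expert picks its own top-k."""
    l, e = len(X), len(experts)
    # Claim 3: k = l * c / e. Integer division and clamping are assumptions.
    k = min(l, max(1, (l * c) // e))
    # Claim 4: S = Softmax(X . W_g). Normalising over experts for each
    # sub-input is an assumption; the claim does not name the axis.
    S = [softmax(row) for row in matmul(X, W_g)]
    G, I = [], []  # per-expert top-k scores and sub-input indices (claim 5)
    out = [[0.0] * len(X[0]) for _ in range(l)]
    for i, expert in enumerate(experts):
        column = [S[m][i] for m in range(l)]  # row i of S transposed
        top = sorted(range(l), key=lambda m: column[m], reverse=True)[:k]
        I.append(top)
        G.append([column[m] for m in top])
        for m in top:
            y = expert(X[m])  # sub-output for a selected sub-input
            for t in range(len(y)):
                # Assumed combination rule: gate-weighted sum per sub-input.
                out[m][t] += column[m] * y[t]
    return out, G, I


# Hypothetical toy data: 4 sub-inputs of dimension 2, 2 experts, c = 1.
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
W_g = [[1.0, 0.0], [0.0, 1.0]]
experts = [lambda x: [2.0 * v for v in x], lambda x: [v + 1.0 for v in x]]
out, G, I = expert_choice_route(X, W_g, experts, c=1)
print(I)  # [[2, 0], [3, 1]]
```

With these toy values each expert selects exactly k = 2 sub-inputs, so the load per expert is fixed in advance, which is the property claim 2 describes. The cap of at most b experts per sub-input from claims 6 and 7 is not modelled here.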

Assignees

Google LLC

Inventors

Classifications

G06N3/048 (Primary)
Activation functions · CPC title
G06N3/098
Distributed learning, e.g. federated learning · CPC title
G06N3/0895
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
G06N3/084
Backpropagation, e.g. using gradient descent · CPC title
G06N3/042 (Primary)
Knowledge-based neural networks; Logical representations of neural networks · CPC title

Patent family

Related publications grouped by family.

View patent family 85410468

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025131251A1 cover?: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes a neural network configured to perform the machine learning task, the neural network including one or more expert neural network blocks that each include router that p…
Who is the assignee on this patent?: Google LLC