Compound model scaling for neural networks

US10909457B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10909457-B2
Application number  US-202016751081-A
Country             US
Kind code           B2
Filing date         Jan 23, 2020
Priority date       Jan 23, 2019
Publication date    Feb 2, 2021
Grant date          Feb 2, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for determining a final architecture for a neural network to perform a particular machine learning task is described. The method includes receiving a baseline architecture for the neural network, wherein the baseline architecture has a network width dimension, a network depth dimension, and a resolution dimension; receiving data defining a compound coefficient that controls extra computational resources used for scaling the baseline architecture; performing a search to determine a baseline width, depth and resolution coefficient that specify how to assign the extra computational resources to the network width, depth and resolution dimensions of the baseline architecture, respectively; determining a width, depth and resolution coefficient based on the baseline width, depth, and resolution coefficient and the compound coefficient; and generating the final architecture that scales the network width, network depth, and resolution dimensions of the baseline architecture based on the corresponding width, depth, and resolution coefficients.
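
The flow described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Baseline` container, the function names, the rounding rules, and the example numbers are all hypothetical. It uses the exponent form from claim 15, where each coefficient is the baseline coefficient raised to the power of the compound coefficient, and it simplifies the resolution dimension to a single input size.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Baseline:
    """Hypothetical container for the three scalable dimensions."""
    stage_channels: List[int]  # width: channels per stage
    stage_layers: List[int]    # depth: layers per stage
    resolution: int            # input height/width (simplified to one number)


def compound_coefficients(base_width: float, base_depth: float,
                          base_res: float, phi: float) -> Tuple[float, float, float]:
    """Exponent form (claim 15): baseline coefficient ** compound coefficient."""
    return base_width ** phi, base_depth ** phi, base_res ** phi


def scale(baseline: Baseline, width_c: float, depth_c: float,
          res_c: float) -> Baseline:
    """Scale each dimension by its coefficient.

    Rounding channels/resolution and taking the ceiling of layer counts
    are assumptions made so the results stay positive integers.
    """
    return Baseline(
        stage_channels=[max(1, round(c * width_c)) for c in baseline.stage_channels],
        stage_layers=[max(1, math.ceil(n * depth_c)) for n in baseline.stage_layers],
        resolution=max(1, round(baseline.resolution * res_c)),
    )


# Example values are illustrative only.
baseline = Baseline(stage_channels=[16, 32], stage_layers=[1, 2], resolution=224)
w, d, r = compound_coefficients(1.1, 1.2, 1.15, phi=1.0)
final = scale(baseline, w, d, r)
print(final)
```

With a compound coefficient of 0 every coefficient is 1, so the sketch returns the baseline unchanged; larger values grow all three dimensions together.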

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of determining a final architecture for a neural network to perform a particular machine learning task, the method comprising: receiving a baseline architecture for the neural network, wherein the baseline architecture has been trained to perform the particular machine learning task, and wherein the baseline architecture has a network width dimension, a network depth dimension, and a resolution dimension; receiving data defining a compound coefficient that controls extra computational resources used for scaling the baseline architecture; performing a search to determine a baseline width coefficient, a baseline depth coefficient, and a baseline resolution coefficient that specify how to assign the extra computational resources to the network width dimension, the network depth dimension, and the resolution dimension of the baseline architecture, respectively; determining a width coefficient, a depth coefficient, and a resolution coefficient based on the baseline width coefficient, the baseline depth coefficient, the baseline resolution coefficient and the compound coefficient, comprising: generating the width coefficient based on the compound coefficient and the baseline width coefficient, generating the depth coefficient based on the compound coefficient and the baseline depth coefficient, and generating the resolution coefficient based on the compound coefficient and the baseline resolution coefficient; and generating the final architecture that scales the network width, network depth, and resolution dimensions of the baseline architecture based on the corresponding width, depth, and resolution coefficients.

2. The method of claim 1, wherein the baseline architecture has a plurality of network stages and each of the plurality of network stages has a plurality of neural network layers.

3. The method of claim 2, wherein the plurality of neural network layers in each network stage of the baseline architecture share the same architecture.

4. The method of claim 2, wherein the network depth dimension of the baseline architecture is a set of numbers of layers in the plurality of network stages of the baseline architecture.

5. The method of claim 2, wherein each neural network layer in the baseline architecture is configured to receive an input tensor from a previous layer and to generate, for the input tensor, an output tensor to be fed as input to the next neural network layer, wherein the input tensor has a height dimension, a width dimension, and a channel dimension that specifies a number of channels in the input tensor.

6. The method of claim 5, wherein the network width dimension of the baseline architecture is a set of numbers of input channels associated with input tensors to the plurality of neural network layers of the baseline architecture.

7. The method of claim 5, wherein the resolution dimension of the baseline architecture is a set of height dimensions and width dimensions of input tensors to the plurality of neural network layers of the baseline architecture.

8. The method of claim 1, wherein generating the width coefficient based on the compound coefficient and the baseline width coefficient comprises: summing a constant and a product of the baseline width coefficient and the compound coefficient.

9. The method of claim 1, wherein generating the depth coefficient based on the compound coefficient and the baseline depth coefficient comprises: summing the constant and a product of the baseline depth coefficient and the compound coefficient.

10. The method of claim 1, wherein generating the resolution coefficient based on the compound coefficient and the baseline resolution coefficient comprises: summing the constant and a product of the baseline resolution coefficient and the compound coefficient.

11. The method of claim 10, wherein generating the final architecture comprises: scaling the network width dimension of the baseline architecture by the width coefficient; scaling the network depth dimension of the baseline architecture by the depth coefficient; and scaling the resolution of the baseline architecture by the resolution coefficient.

12. The method of claim 1, wherein performing the search to determine the baseline width coefficient, the baseline depth coefficient, and the baseline resolution coefficient comprises: performing a grid search on a range of values for each coefficient while using the compound coefficient to determine the baseline width coefficient, the baseline depth coefficient, and the baseline resolution coefficient.

13. The method of claim 1, further comprising: determining a performance score representing performance of the final architecture on the particular machine learning task, comprising: training the final architecture on the particular machine learning task to update values of parameters of the final architecture, and determining the performance of the trained final architecture on the particular machine learning task.

14. The method of claim 1, wherein the received data further includes target resource usage data that specifies (i) a target memory size that indicates the maximum memory size allowed for creating the final architecture, and (ii) a target number of operations that indicates the maximum number of operations that the final architecture can execute to perform the particular machine learning task.

15. The method of claim 1, wherein generating the width coefficient based on the compound coefficient and the baseline width coefficient comprise: raising the baseline width coefficient to the power of the compound coefficient, wherein generating the depth coefficient based on the compound coefficient and the baseline depth coefficient comprises: raising the baseline depth coefficient to the power of the compound coefficient, and wherein generating the resolution coefficient based on the compound coefficient and the baseline resolution coefficient comprises: raising the baseline resolution coefficient to the power of the compound coefficient.

16. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: receiving a baseline architecture for the neural network, wherein the baseline architecture has been trained to perform the particular machine learning task, and wherein the baseline architecture has a network width dimension, a network depth dimension, and a resolution dimension; receiving data defining a compound coefficient that controls extra computational resources used for scaling the baseline architecture; performing a search to determine a baseline width coefficient, a baseline depth coefficient, and a baseline resolution coefficient that specify how to assign the extra computational resources to the network width dimension, the network depth dimension, and the resolution dimension of the baseline architecture, respectively; determining a width coefficient, a depth coefficient, and a resolution coefficient based on the baseline width coefficient, the baseline depth coefficient, the baseline resolution coefficient and the compound coefficient, comprising: generating the width coefficient based on the compound coefficient and the baseline width coefficient, generating the depth coefficient based on the compound coefficient and the baseline depth coefficient, and generating the resolution coefficient based on the compound coefficient and the baseline resolution coefficient; and generating the final architecture
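
Claims 8 to 10 describe an additive way to derive each coefficient (a constant plus the product of the baseline coefficient and the compound coefficient), and claim 12 describes finding the baseline coefficients by grid search. The Python sketch below illustrates both under stated assumptions: the constant is taken to be 1, the candidate grid is arbitrary, and `score_fn` is a hypothetical stand-in for training and evaluating a scaled network. None of these specifics come from the claim text.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def additive_coefficient(base: float, phi: float, constant: float = 1.0) -> float:
    """Additive form (claims 8-10): constant + base * phi.

    constant=1.0 is an assumption; the claims only say "a constant".
    """
    return constant + base * phi


def grid_search(candidates: Iterable[float], phi: float,
                score_fn: Callable[[float, float, float], float]
                ) -> Tuple[float, float, float]:
    """Try every (width, depth, resolution) combination of baseline
    coefficients from `candidates` and keep the best-scoring one.

    score_fn(width_c, depth_c, res_c) is hypothetical: in practice it
    would build, train, and evaluate the scaled network.
    """
    values = list(candidates)
    best, best_score = None, float("-inf")
    for bw, bd, br in product(values, repeat=3):
        score = score_fn(
            additive_coefficient(bw, phi),
            additive_coefficient(bd, phi),
            additive_coefficient(br, phi),
        )
        if score > best_score:
            best, best_score = (bw, bd, br), score
    if best is None:
        raise ValueError("candidates must not be empty")
    return best


def toy_score(width_c: float, depth_c: float, res_c: float) -> float:
    """Toy objective (illustrative only): prefers coefficients near fixed targets."""
    return -(abs(width_c - 1.2) + abs(depth_c - 1.1) + abs(res_c - 1.3))


best = grid_search([0.1, 0.2, 0.3], phi=1.0, score_fn=toy_score)
print(best)
```

An exhaustive grid over three coefficients costs one evaluation per combination, so a real search would likely keep the grid small and run it on the baseline network rather than on each scaled model.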

Assignees

  • Google LLC

Inventors

Classifications

  • G06N3/082 (Primary)

    modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • G06N3/045 (Primary)

    Combinations of networks · CPC title

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10909457B2 cover?
A method for determining a final architecture for a neural network to perform a particular machine learning task is described. The method includes receiving a baseline architecture for the neural network, wherein the baseline architecture has a network width dimension, a network depth dimension, and a resolution dimension; receiving data defining a compound coefficient that controls extra compu…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/082. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 2, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).