Decomposition of weight tensors in network with value quantization

US12061981B1 · US · B1

Patent metadata
Publication number: US-12061981-B1
Application number: US-202017089660-A
Country: US
Kind code: B1
Filing date: Nov 4, 2020
Priority date: Aug 13, 2020
Publication date: Aug 13, 2024
Grant date: Aug 13, 2024

Abstract

Official abstract text for this publication.

Some embodiments provide a method for training parameters of a network. The method receives a machine-trained (MT) network with multiple layers of computation nodes. Each computation node of a set of the layers computes an output value based on a set of input values and a set of trained weight values. A first layer of the MT network includes a first number of filters. The method replaces the first layer with (i) a second layer having a second number of filters that is less than the first number of filters and (ii) a third layer having the first number of filters. Output values of computation nodes of the second layer are quantized, and the third layer uses the quantized output values of the second layer as input values.
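The replacement described in the abstract can be pictured as a small data-flow sketch. The code below is illustrative only and is not taken from the patent: the uniform quantizer, its unsigned [0, max_value] range, the 4-bit default, and the names `quantize`, `apply_filters`, and `decomposed_layer` are all assumptions. Each filter is treated as a flat list of weights applied to one flattened input patch.

```python
def quantize(value, num_bits=4, max_value=1.0):
    """Assumed scheme: uniform quantization of [0, max_value] to 2**num_bits levels."""
    levels = (1 << num_bits) - 1
    clipped = min(max(value, 0.0), max_value)
    return round(clipped / max_value * levels) * max_value / levels


def apply_filters(inputs, filters):
    """Compute one dot product per filter; each filter is a list of weights."""
    return [sum(w * x for w, x in zip(f, inputs)) for f in filters]


def decomposed_layer(inputs, second_filters, third_filters, num_bits=4):
    """Second layer (fewer filters) -> quantized outputs -> third layer."""
    hidden = apply_filters(inputs, second_filters)          # one value per second-layer filter
    hidden_q = [quantize(h, num_bits) for h in hidden]      # quantized intermediate values
    return apply_filters(hidden_q, third_filters)           # one value per original filter


# Hypothetical sizes: the original layer had 4 filters of 3 weights each.
inputs = [0.2, 0.4, 0.6]
second_filters = [[1, 0, 0], [0, 1, 1]]            # 2 filters, 3 weights each
third_filters = [[1, 0], [0, 1], [1, 1], [1, -1]]  # 4 filters, 2 weights each
out = decomposed_layer(inputs, second_filters, third_filters)
```

In this toy configuration the third layer restores the original number of outputs (4) while reading only the 2 quantized values produced by the second layer.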

First claim

Opening claim text (preview).

We claim:

1. A method for training a plurality of parameters of a network, the method comprising: receiving a machine-trained (MT) network comprising a plurality of layers of computation nodes, wherein each computation node of a set of the layers computes an output value based on a set of input values and a set of trained weight values, wherein a first layer of the MT network comprises a first number of filters; for each of a number of training iterations: determining a decomposition of the first layer into (i) a second layer having a second number of filters that is less than the first number of filters, wherein output values of computation nodes of the second layer are quantized, and (ii) a third layer having the first number of filters, the third layer using the quantized output values of the second layer as input values; and after determining the decomposition of the first layer, retraining the weight values of the MT network, including the first layer, with a loss function that includes (i) a first term measuring a difference between expected outputs and generated outputs and (ii) a second term that biases the weight values of the first layer towards a matrix representing the weight values of the second and third layers; and after a last training iteration, replacing the trained first layer with the second and third layers using the decomposition determined during the last training iteration, the MT network with the second and third layers for execution by an inference circuit that uses quantized output values.

2. The method of claim 1, wherein for each training iteration, each filter of the second layer comprises a same number of weight values as each filter of the first layer and each filter of the third layer comprises a smaller number of weight values.

3. The method of claim 2, wherein the third layer for each training iteration is a 1×1 convolutional layer, such that a number of weight values in each filter of the third layer is equal to the second number of filters in the second layer.

4. The method of claim 2, wherein each filter of the first layer has an associated stride and zero-padding, wherein each filter of the second layer for each training iteration has the same associated stride and zero-padding.

5. The method of claim 4, wherein each filter in the third layer for each training iteration is a 1×1 convolutional filter with a stride of 1 and no zero-padding.

6. The method of claim 1, wherein: the first layer is trained with floating-point weight values; the weight values of the second and third layer, for each training iteration, are ternary weight values; and determining the decomposition of the first layer, for each training iteration, comprises decomposing the floating-point weight values of the first layer into (i) the ternary weight values of the second layer, (ii) a first set of scale values for the second layer, (iii) the ternary weight values of the third layer, and (iv) a second set of scale values for the third layer.

7. The method of claim 6, wherein decomposing the floating-point values of the first layer comprises performing singular value decomposition.

8. The method of claim 7, wherein the decomposition accounts for (i) relative importances of each weight value in the filters of the first layer and (ii) a sparsity requirement that defines a minimum number of weight values that are set to zero.

9. The method of claim 1, wherein the MT network is for execution by a neural network inference circuit that (i) uses quantized weight values and (ii) quantizes computation node output values to a particular number of bits.

10. The method of claim 1, wherein retraining the weight values of the MT network comprises: forward propagating a plurality of inputs through the MT network to generate the generated outputs; computing a value for the loss function; and backpropagating the computed value for the loss function to modify the weight values.

11. The method of claim 10, wherein noise is added to outputs of the first layer during forward propagation to simulate the quantization of the output values of the second layer.

12. The method of claim 11 further comprising determining a scale for the noise.

13. The method of claim 12, wherein the scale is based on portions of the input values to the first layer.

14. The method of claim 1, wherein the second number of filters that is less than the first number of filters changes between at least first and second training iterations.

15. A non-transitory machine-readable medium storing a program which when executed by at least one processing unit trains a plurality of parameters of a network, the program comprising sets of instructions for: receiving a machine-trained (MT) network comprising a plurality of layers of computation nodes, wherein each computation node of a set of the layers computes an output value based on a set of input values and a set of trained weight values, wherein a first layer of the MT network comprises a first number of filters; for each of a number of training iterations: determining a decomposition of the first layer into (i) a second layer having a second number of filters that is less than the first number of filters, wherein output values of computation nodes of the second layer are quantized, and (ii) a third layer having the first number of filters, the third layer using the quantized output values of the second layer as input values; and after determining the decomposition of the first layer, retraining the weight values of the MT network, including the first layer, with a loss function that includes (i) a first term measuring a difference between expected outputs and generated outputs and (ii) a second term that biases the weight values of the first layer towards a matrix representing the weight values of the second and third layers; and after a last training iteration, replacing the trained first layer with the second and third layers using the decomposition determined during the last training iteration, the MT network with the second and third layers for execution by an inference circuit that uses quantized output values.

16. The non-transitory machine-readable medium of claim 15, wherein for each training iteration, each filter of the second layer comprises a same number of weight values as each filter of the first layer and each filter of the third layer comprises a smaller number of weight values.

17. The non-transitory machine-readable medium of claim 16, wherein: each filter of the first layer has an associated stride and zero-padding; each filter of the second layer for each training iteration has the same associated stride and zero-padding; and each filter in the third layer for each training iteration is a 1×1 convolutional filter with a stride of 1 and no zero-padding, such that a number of weight values in each filter of the third layer is equal to the second number of filters in the second layer.

18. The non-transitory machine-readable medium of claim 15, wherein: the first layer is trained with floating-point weight values; the weight values of the second and third layer, for each training iteration, are ternary weight values; and the set of instructions for determining the decomposition of the first layer, for each training iteration, comprises a set of instructions for decomposing the floating-point weight values of the first layer into (i) the ternary weight values of the second layer, (ii) a first set of scale values for the second layer, (iii) the ternary weight values of the third layer, and (iv) a second set
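The two-term loss function recited in claims 1 and 15 can be illustrated with a minimal sketch. The claims do not specify the exact form of either term, so everything concrete here is an assumption: mean squared error stands in for the first term, a squared Frobenius distance between the first-layer weight matrix and the product of the third- and second-layer weight matrices stands in for the second term, and the names `task_loss`, `decomposition_penalty`, `total_loss`, and the weighting factor `strength` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def task_loss(expected, generated):
    """First term (assumed form): mean squared error between expected and generated outputs."""
    return sum((e - g) ** 2 for e, g in zip(expected, generated)) / len(expected)


def decomposition_penalty(first_weights, third_weights, second_weights):
    """Second term (assumed form): squared distance between the first-layer
    weights (N x D) and the product of third-layer (N x R) and
    second-layer (R x D) weights."""
    approx = matmul(third_weights, second_weights)
    return sum((w - a) ** 2
               for row_w, row_a in zip(first_weights, approx)
               for w, a in zip(row_w, row_a))


def total_loss(expected, generated, first_weights, third_weights,
               second_weights, strength=0.1):
    """Sum of the output-error term and the weighted decomposition term."""
    penalty = decomposition_penalty(first_weights, third_weights, second_weights)
    return task_loss(expected, generated) + strength * penalty


# Hypothetical sizes: N = 3 filters of D = 2 weights, decomposed with R = 1.
first = [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]]
second = [[1.0, 1.0]]
third = [[1.5], [3.5], [2.5]]
loss = total_loss([1.0, 0.0], [0.5, 0.5], first, third, second)
```

Minimizing the second term during retraining pulls the first-layer weights toward a matrix that the smaller second and third layers can reproduce, which is the biasing effect the claim describes.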

Assignees

Perceive Corp

Classifications

  • using electronic means · CPC title

  • Combinations of networks · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12061981B1 cover?
Some embodiments provide a method for training parameters of a network. The method receives a machine-trained (MT) network with multiple layers of computation nodes. Each computation node of a set of the layers computes an output value based on a set of input values and a set of trained weight values. A first layer of the MT network includes a first number of filters. The method replaces the fi…
Who is the assignee on this patent?
Perceive Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 13, 2024 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).