Data splitting by gradient direction for neural networks
US-2020134451-A1 · Apr 30, 2020 · US
US12511528B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12511528-B2 |
| Application number | US-201916249279-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 16, 2019 |
| Priority date | Jul 4, 2018 |
| Publication date | Dec 30, 2025 |
| Grant date | Dec 30, 2025 |
Abstract
A neural network method and apparatus are provided. A processor-implemented neural network method includes calculating respective individual gradient values for updating a weight of a neural network, calculating a residual gradient value based on an accumulated gradient value obtained by accumulating the individual gradient values and a bit digit representing the weight, tuning the respective individual gradient values to correspond to a bit digit of the residual gradient value, summing the tuned respective individual gradient values, the residual gradient value, and the weight, and updating the weight and the residual gradient value based on a result of the summing to train the neural network.
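The update described in the abstract can be sketched with plain integer arithmetic, which makes the "bit digits" explicit. This is a minimal sketch, not the patented implementation: the bit widths `W_FRAC` and `R_FRAC`, the helper names `tune` and `update`, and the choice to truncate toward zero are all hypothetical. The weight is stored in units of its own LSB, and the residual gradient is stored in units of a finer LSB, so gradients too small to change the weight accumulate in the residual until they do.

```python
"""Minimal sketch of a residual-gradient weight update in fixed point.

Assumptions (not from the patent): the weight is an integer in units of
2**-W_FRAC, the residual gradient is an integer in units of 2**-R_FRAC,
and bits below the residual LSB are truncated toward zero.
"""

W_FRAC = 8   # hypothetical: weight LSB = 2**-8
R_FRAC = 16  # hypothetical: residual LSB = 2**-16 (finer than the weight LSB)
STEP = 1 << (R_FRAC - W_FRAC)  # residual units per weight LSB


def tune(grad: float) -> int:
    """Quantize a float gradient to residual-LSB units, dropping smaller bits."""
    return int(grad * (1 << R_FRAC))  # int() truncates toward zero


def update(weight: int, residual: int, grads: list[float]) -> tuple[int, int]:
    """Return (new_weight, new_residual) after adding the individual gradients."""
    # Sum the tuned gradients with the residual accumulated from earlier steps.
    intermediate = sum(tune(g) for g in grads) + residual
    # Portion of the sum that overlaps the weight's bit digits (toward zero).
    whole = abs(intermediate) // STEP
    effective = whole if intermediate >= 0 else -whole
    new_weight = weight + effective
    # Whatever does not reach the weight LSB stays in the residual.
    new_residual = intermediate - effective * STEP
    return new_weight, new_residual
```

With these hypothetical widths, a gradient of 0.001 (65 residual units) is smaller than one weight LSB (2**-8, about 0.0039), so a single step leaves the weight unchanged and the value is held in the residual. After four such steps the residual exceeds 256 units, one LSB moves into the weight, and the remaining 4 units stay in the residual.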
Claims (preview)
What is claimed is:
1. A processor-implemented neural network method, the method comprising: generating one or more respective individual gradient values for training a neural network by updating a weight of the neural network; tuning the one or more respective individual gradient values to correspond to bit digits of a residual gradient value, where each respective individual gradient value has respective bit digits; generating an intermediate summation value using the tuned one or more respective individual gradient values and the residual gradient value; summing the weight and the intermediate summation value; generating an updated residual gradient value by updating the residual gradient value to be a portion of the intermediate summation value not overlapping bit digits of the weight and storing the updated residual gradient value in an accumulation buffer; generating an updated weight by updating the weight with a portion of the intermediate summation value overlapping the bit digits of the weight, in response to the intermediate summation value overlapping bit digits of the weight, to train the neural network; and generating a trained neural network by training the neural network using the updated residual gradient value and updated weight, wherein the residual gradient value is dependent on an accumulating of one or more previous individual gradient values for updating the weight in a previous time.
2. The method of claim 1, wherein the updating of the residual gradient value comprises: determining an effective gradient value dependent on the result of the summing, where the effective gradient value has a value divisible by the least significant bit digit of the weight; and updating the residual gradient value by subtracting the effective gradient value from the result of the summing.
3. The method of claim 1, wherein the tuning of the one or more respective individual gradient values comprises: quantizing each of the one or more respective individual gradient values, including omitting respective values of the one or more respective individual gradient values that are less than a least significant bit digit of the residual gradient value; and padding each of the quantized one or more respective individual gradient values, wherein a value up to a bit digit corresponding to a most significant bit digit of the residual gradient value is present in each padded quantized one or more respective individual gradient values.
4. The method of claim 1, wherein the summing comprises: mapping the tuned one or more respective individual gradient values and the residual gradient value based on a set bit number, and calculating the intermediate summation value based on the mapped tuned one or more respective individual gradient values and the mapped residual gradient value; and mapping the weight based on the set bit number and summing the intermediate summation value and the weight.
5. The method of claim 1, wherein the summing comprises: padding the tuned one or more respective individual gradient values, the residual gradient value, and the weight; and summing the padded weight and the padded intermediate summation value of the padded tuned one or more respective individual gradient values and the padded residual gradient value.
6. The method of claim 1, wherein the updating of the weight comprises updating a bit digit value of a portion of the result of the summing, corresponding to the bit digit representing the weight, to the updated weight, and wherein the updating of the residual gradient value comprises updating a bit digit value of a remaining portion of the result of the summing, not corresponding to the bit digit representing the weight, to the residual gradient value.
7. The method of claim 1, further comprising: obtaining a sign bit that is a Most Significant Bit of the result of the summing; and adding the obtained sign bit such that the obtained sign bit is a Most Significant Bit of the updated weight and/or the updated residual gradient value.
8. A non-transitory computer-readable recording medium having recorded thereon computer readable instructions, which, when executed by one or more processors, performs the method of claim 1.
9. A processor-implemented neural network method, the method comprising: generating one or more respective individual gradient values for training a neural network by updating a weight of the neural network; tuning the one or more respective individual gradient values to correspond to bit digits of a residual gradient value, where each respective bit individual gradient value has respective bit digits; generate an intermediate concatenation value by concatenating a remaining value of the residual gradient value, excluding a sign bit, to the weight; summing the tuned one or more respective individual gradient values and the intermediate concatenation value; generate an updated residual gradient value by updating the residual gradient value to be a portion of the intermediate concatenation value not overlapping bit digits of the weight; generate an updated weight by updating the weight with a portion of the intermediate concatenation value overlapping the bit digits of the weight, in response to a summation of the tuned one or more respective individual gradient values overlapping bit digits of the weight, to train the neural network; and generating a trained neural network by training the neural network using the updated residual gradient value and updated weight, wherein the residual gradient value is dependent on an accumulating of one or more previous individual gradient values for updating the weight in a previous time.
10. The method of claim 9, wherein the updating of the residual gradient value comprises: determining an effective gradient value dependent on the result of the summing, where the effective gradient value has a value divisible by the least significant bit digit of the weight; and updating the residual gradient value by subtracting the effective gradient value from the result of the summing.
11. The method of claim 9, wherein the tuning of the one or more respective individual gradient values comprises: quantizing each of the one or more respective individual gradient values, including omitting respective values of the one or more respective individual gradient values that are less than a least significant bit digit of the residual gradient value; and padding each of the quantized one or more respective individual gradient values, wherein a value up to a bit digit corresponding to a most significant bit digit of the residual gradient value is present in each padded quantized one or more respective individual gradient values.
12. The method of claim 9, wherein the summing comprises: mapping the tuned one or more respective individual gradient values and the intermediate concatenation value based on a set bit number, and summing the mapped tuned one or more respective individual gradient values and the mapped intermediate concatenation value.
13. A non-transitory computer-readable recording medium having recorded thereon computer readable instructions, which, when executed by one or more processors, causes the one or more processors to perform the method of claim 9.
14. The method of claim 9, wherein the summing comprises: padding the tuned one or more respective individual gradient values and the intermediate concatenation value; and summing the padded tuned one or more respective individual gradient values and the padded intermediate concatenation value.
15.
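Claims 1 and 9 reach an updated weight and residual by different routes: claim 1 adds the tuned gradients to the residual before summing with the weight, while claim 9 first concatenates the residual bits beneath the weight and adds the gradients to that wider word. The sketch below models both routes with Python integers. It is an illustration under stated assumptions, not the claimed hardware: the widths `W_FRAC` and `R_FRAC`, the helper names, and the use of two's complement (so the residual is kept as non-negative low bits and the claims' separate sign-bit handling is not modeled) are all hypothetical. The "padding" of claims 3 and 11 is implicit because Python integers have arbitrary width.

```python
"""Sketch contrasting the separate-sum route (claim 1) with the
concatenation route (claim 9), using two's-complement integers.

Assumptions (not from the patent): weight LSB = 2**-8, residual LSB = 2**-16,
and the residual is stored as the non-negative low bits beneath the weight,
so no separate sign bit is modeled.
"""

W_FRAC, R_FRAC = 8, 16       # hypothetical bit widths
SHIFT = R_FRAC - W_FRAC      # residual bits that sit below the weight LSB
MASK = (1 << SHIFT) - 1


def tune(grad: float) -> int:
    # Quantize to residual-LSB units; bits below that LSB are dropped (toward zero).
    return int(grad * (1 << R_FRAC))


def split(total: int) -> tuple[int, int]:
    # Upper bits overlap the weight; the lower SHIFT bits become the new residual.
    return total >> SHIFT, total & MASK


def update_separate(weight: int, residual: int, grads: list[float]) -> tuple[int, int]:
    # Claim 1 style: form the intermediate sum first, then add the aligned weight.
    intermediate = sum(tune(g) for g in grads) + residual
    return split((weight << SHIFT) + intermediate)


def update_concat(weight: int, residual: int, grads: list[float]) -> tuple[int, int]:
    # Claim 9 style: concatenate the residual bits under the weight, then add gradients.
    assert 0 <= residual <= MASK, "residual must fit in the bits below the weight"
    concatenated = (weight << SHIFT) | residual
    return split(concatenated + sum(tune(g) for g in grads))
```

In this integer model the two routes always agree, because the shifted weight is a multiple of `2**SHIFT` and the split uses floor semantics (`>>` and `&`). For example, `update_concat(64, 0, [-0.005])` borrows from the weight and returns a weight of 62 with a residual of 185. Any practical difference between the two claims would lie in adder width and data movement, which this sketch does not capture.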
CPC classification titles:
- Architecture, e.g. interconnection topology
- Adding; Subtracting (G06F7/483 - G06F7/491, G06F7/544 - G06F7/556 take precedence)
- Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)}
- Learning methods
- Quantised networks; Sparse networks; Compressed networks