Learning coach for machine learning system
US-2020184337-A1 · Jun 11, 2020 · US
US12505349B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12505349-B2 |
| Application number | US-202217572625-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 10, 2022 |
| Priority date | Dec 12, 2017 |
| Publication date | Dec 23, 2025 |
| Grant date | Dec 23, 2025 |
Official abstract text for this publication.
A technique to prune weights of a neural network using an analytic threshold function h(w) provides a neural network having weights that have been optimally pruned. The neural network includes a plurality of layers in which each layer includes a set of weights w associated with the layer that enhance a speed performance of the neural network, an accuracy of the neural network, or a combination thereof. Each set of weights is based on a cost function C that has been minimized by back-propagating an output of the neural network in response to input training data. The cost function C is also minimized based on a derivative of the cost function C with respect to a first parameter of the analytic threshold function h(w) and on a derivative of the cost function C with respect to a second parameter of the analytic threshold function h(w).
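The abstract does not give a formula for h(w). As a minimal sketch, assume one smooth function with the shape the claims describe: close to 0 for weights near 0, close to 1 for large positive or negative weights, with two soft edges in between. The specific form `1 / (1 + exp(-beta^2 * w^2) / alpha^2)`, the function names, and the sample values of `alpha` and `beta` are illustrative assumptions, not text from this record. The weight function f(w) = w · h(w) follows claim 13.

```python
import math


def threshold(w, alpha, beta):
    """Hypothetical analytic threshold h(w): near 0 around w = 0, near 1 for large |w|."""
    # Assumed form (not stated on this page): h(w) = 1 / (1 + exp(-beta^2 * w^2) / alpha^2)
    return 1.0 / (1.0 + math.exp(-(beta * w) ** 2) / alpha ** 2)


def pruned_weight(w, alpha, beta):
    """Weight function f(w) = w * h(w): small weights are pushed toward zero."""
    return w * threshold(w, alpha, beta)


if __name__ == "__main__":
    alpha, beta = 0.01, 4.0  # illustrative values only
    for w in (-1.0, -0.1, 0.0, 0.1, 1.0):
        hv = threshold(w, alpha, beta)
        fv = pruned_weight(w, alpha, beta)
        print(f"w={w:+.2f}  h(w)={hv:.4f}  f(w)={fv:+.4f}")
```

Because this h(w) is differentiable in w, alpha, and beta, the pruning threshold can in principle be learned by gradient descent rather than fixed by hand, which is the point the abstract makes about the derivatives of C.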
Opening claim text (preview).
What is claimed is:

1. A data-processing device, comprising: a processor; and a memory, the data-processing device being configured as a neural network comprising a plurality of layers, at least one layer of the plurality of layers comprising a convolutional layer, each layer of the plurality of layers comprising a set of weights w associated with the layer that enhance a speed performance of the neural network, an accuracy of the neural network, or a combination thereof, each set of weights being pruned using an analytic threshold function h(w), the analytic threshold function h(w) comprising a first predetermined value for a first set of continuous weight values centered around 0, and a second predetermined value for a second set of continuous weight values and for a third set of continuous weight values, the first predetermined value being different from the second predetermined value, the second set of continuous weight values being different from and greater than the first set of continuous weight values and the third set of continuous weight values being different from and less than the first set of continuous weight values, when graphed the analytic threshold function h(w) comprising a first parameter that sets a sharpness characteristic of a first edge and of a second edge of the analytic threshold function h(w) between the first predetermined value and the second predetermined value, and a second parameter that sets a distance between the first edge and the second edge of the analytic threshold function h(w); and the processor being configured to: determine the neural network is trained based on each set of the weights being pruned; and perform an inference operation using the trained neural network.

2. The data-processing device of claim 1, wherein the first predetermined value equals 0 and the second predetermined value equals 1.

3. The data-processing device of claim 1, wherein the analytic threshold function h(w) further comprising a first edge between the first set of continuous weight values and the second set of continuous weight values and a second edge between the first set of continuous weight values and the third set of continuous weight values, the sharpness characteristic of the first edge and of the second edge between the first predetermined value and the second predetermined value being based on a value of the first parameter of the analytic threshold function h(w) and the distance between the first and second edges being based on a value of the second parameter of the analytic threshold function h(w).

4. The data-processing device of claim 3, wherein the analytic threshold function h(w) is proportional to β and inversely proportional to α, in which α is the first parameter, and β is the second parameter.

5. The data-processing device of claim 4, wherein an initial value for the first parameter α and an initial value for the second parameter β is based on a partial second derivative of the analytic threshold function h(w) with respect to w being equal to zero.

6. The data-processing device of claim 5, wherein the first parameter α and the second parameter β for each set of weights is based on: a cost function C that is minimized by back-propagating an output of the neural network in response to input training data, the cost function C being based on a number of layers, an index of weights in a final layer and one or more regularization parameters, a derivative of the cost function C with respect to a first parameter of the analytic threshold function h(w), and a derivative of the cost function C with respect to a second parameter of the analytic threshold function h(w).

7. The data-processing device of claim 6, wherein the cost function C is minimized based on the derivative of the cost function C with respect to the first parameter α by updating the first parameter α during back-propagating the output through the neural network, and wherein the cost function C is minimized based on the derivative of the cost function C with respect to the second parameter β by updating the second parameter β during back-propagating the output through the neural network.

8. The data-processing device of claim 7, wherein the cost function C is further minimized by updating values for weights w of each set of weights during back-propagating the output through the neural network.

9. The data-processing device of claim 1, wherein the neural network comprises a deep neural network.

10. A method to prune weights of a neural network, the method comprising: forming a weight function f(w) for weights w associated with each layer of a plurality of layers of the neural network based on an analytic threshold function h(w), the analytic threshold function h(w) comprising a first predetermined value for a first set of continuous weight values centered around 0, and a second predetermined value for a second set of continuous weight values and for a third set of continuous weight values, the first predetermined value being different from the second predetermined value, the second set of continuous weight values being different from and greater than the first set of continuous weight values and the third set of continuous weight values being different from and less than the first set of continuous weight values, when graphed the analytic threshold function h(w) further comprising a first edge between the first set of continuous weight values and the second set of continuous weight values and a second edge between the first set of continuous weight values and the third set of continuous weight values, a sharpness characteristic of each of the first and second edges between the first predetermined value and the second predetermined value being based on a value of a first parameter of the analytic threshold function h(w) and a distance between the first and second edges being based on a value of a second parameter of the analytic threshold function h(w); inputting training data to the neural network to generate an output based on the training data; back-propagating the output through the neural network; and minimizing a difference between the output and the training data to determine a set of weights w that enhance a speed performance of the neural network, an accuracy of the neural network, or a combination thereof, by minimizing a cost function C based on a derivative of the cost function C with respect to the first parameter and based on a derivative of the cost function C with respect to the second parameter; determining the neural network is trained based on minimizing the difference between the output and the training data; and performing an inference operation using the trained neural network.

11. The method of claim 10, wherein the analytic threshold function h(w) is proportional to β and inversely proportional to α, in which α is the first parameter, and β is the second parameter.

12. The method of claim 11, further comprising initializing the first parameter α and the second parameter β based on a partial second derivative of the analytic threshold function h(w) with respect to w being equal to zero.

13. The method of claim 11, wherein the weight function f(w) comprises weights of the neural network multiplied by the analytic threshold function h(w).

14. The method of claim 13, wherein the cost function C is based on a number of layers, an index of weights in a final layer and one or regularization parameters.

15. The method of claim 14, wherein minimizing the cost function C based on the derivative of the cost function C with respect to the first paramete
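Claims 6 to 8 and 10 describe updating the weights w and the threshold parameters α and β together during back-propagation, using the derivatives of the cost function C with respect to each. The following is a minimal sketch of that idea on a toy one-weight model. Everything specific here is an assumption for illustration: the form of `h`, the plain squared-error cost (the claimed cost function also depends on the number of layers and on regularization parameters, which are omitted), the learning rate, and all function names.

```python
import math


def h(w, alpha, beta):
    # Assumed threshold form; the claim text shown here does not give a formula.
    return 1.0 / (1.0 + math.exp(-(beta * w) ** 2) / alpha ** 2)


def cost(w, alpha, beta, x, y):
    # Toy cost: squared error of a one-weight model whose output is f(w) * x,
    # with f(w) = w * h(w). No regularization terms are included.
    return (w * h(w, alpha, beta) * x - y) ** 2


def gradients(w, alpha, beta, x, y):
    """Analytic dC/dw, dC/dalpha, dC/dbeta for the toy cost, via the chain rule."""
    hv = h(w, alpha, beta)
    s = hv * (1.0 - hv)                  # factor shared by every derivative of h
    dh_dw = 2.0 * beta ** 2 * w * s
    dh_dalpha = 2.0 * s / alpha
    dh_dbeta = 2.0 * beta * w ** 2 * s
    dc_df = 2.0 * (w * hv * x - y) * x   # derivative of the cost w.r.t. f(w)
    return (
        dc_df * (hv + w * dh_dw),        # dC/dw, since f = w * h
        dc_df * w * dh_dalpha,           # dC/dalpha
        dc_df * w * dh_dbeta,            # dC/dbeta
    )


def train(w, alpha, beta, x, y, lr=0.05, steps=200):
    """Plain gradient descent that updates w, alpha, and beta in the same step."""
    for _ in range(steps):
        gw, ga, gb = gradients(w, alpha, beta, x, y)
        w, alpha, beta = w - lr * gw, alpha - lr * ga, beta - lr * gb
    return w, alpha, beta


if __name__ == "__main__":
    start = (0.8, 0.5, 2.0)
    end = train(*start, x=1.0, y=1.0)
    print("cost before:", cost(*start, 1.0, 1.0))
    print("cost after: ", cost(*end, 1.0, 1.0))
```

In a real network the same three gradients would be computed per layer by an automatic-differentiation framework rather than by hand; the hand-written version is used here only to make the dependence on α and β explicit.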
Architecture, e.g. interconnection topology · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Backpropagation, e.g. using gradient descent · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Supervised learning · CPC title