System and method enabling one-hot neural networks on a machine learning compute platform
US-2020159534-A1 · May 21, 2020 · US
US12039439B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12039439-B2 |
| Application number | US-202017129038-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 21, 2020 |
| Priority date | Sep 20, 2017 |
| Publication date | Jul 16, 2024 |
| Grant date | Jul 16, 2024 |
An overall gradient vector is computed at a server from a set of ISA vectors corresponding to a set of worker machines. An ISA vector of a worker machine includes ISA instructions corresponding to a set of gradients, each gradient corresponding to a weight of a node of a neural network being distributedly trained in the worker machine. A set of register values is optimized for use in an approximation computation with an opcode to produce an x-th approximate gradient of an x-th gradient. A server ISA vector is constructed in which a server ISA instruction in an x-th position corresponds to the x-th gradient in the overall gradient vector. A processor at the worker machine is caused to update a set of weights of the neural network, using the set of optimized register values and the server ISA vector, thereby completing one iteration of training.
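The four steps in the abstract can be illustrated with a minimal Python sketch. It is a hypothetical reading, not the patented implementation: the opcode table, the elementwise averaging used to form the overall gradient vector, and the 1-D k-means (Lloyd's) fit used to optimize register values are all assumptions, since the text on this page does not specify them. Each ISA instruction is modeled as an (opcode, register index) pair, and, following claim 3, the worker adds the approximate gradient to the previous weight (any sign or learning-rate scaling is assumed to be folded into the gradient).

```python
# Hypothetical sketch of one training iteration as summarized in the abstract.
# Assumptions (not stated in the patent text): the opcode table, elementwise
# averaging for the overall gradient vector, and a 1-D Lloyd's (k-means) fit
# for the register values.
from typing import Callable, Dict, List, Tuple

# Hypothetical opcode table: each opcode maps a register value r to an approximation.
OPCODES: Dict[int, Callable[[float], float]] = {
    0: lambda r: r,        # use the register value as-is
    1: lambda r: -r,       # negate the register value
    2: lambda r: 0.5 * r,  # halve the register value
    3: lambda r: 0.0,      # approximate the gradient as zero
}

Instruction = Tuple[int, int]  # (opcode, register index)


def decode(isa_vector: List[Instruction], registers: List[float]) -> List[float]:
    """Turn an ISA vector into approximate gradients."""
    return [OPCODES[op](registers[idx]) for op, idx in isa_vector]


def encode(gradients: List[float], registers: List[float]) -> List[Instruction]:
    """For each gradient, pick the (opcode, register) pair whose result is closest."""
    pairs = [(op, idx) for op in OPCODES for idx in range(len(registers))]
    return [
        min(pairs, key=lambda p: abs(OPCODES[p[0]](registers[p[1]]) - g))
        for g in gradients
    ]


def overall_gradient(worker_vectors: List[List[Instruction]],
                     registers: List[float]) -> List[float]:
    """Server: decode every worker's ISA vector and average position by position."""
    decoded = [decode(v, registers) for v in worker_vectors]
    return [sum(column) / len(column) for column in zip(*decoded)]


def optimize_registers(gradients: List[float], registers: List[float],
                       iterations: int = 10) -> List[float]:
    """Server: move each register toward the mean of the gradient magnitudes nearest it."""
    regs = list(registers)
    magnitudes = [abs(g) for g in gradients]
    for _ in range(iterations):
        buckets: List[List[float]] = [[] for _ in regs]
        for m in magnitudes:
            nearest = min(range(len(regs)), key=lambda i: abs(regs[i] - m))
            buckets[nearest].append(m)
        # Empty buckets keep their previous register value.
        regs = [sum(b) / len(b) if b else r for b, r in zip(buckets, regs)]
    return regs


def worker_update(weights: List[float], server_isa: List[Instruction],
                  registers: List[float]) -> List[float]:
    """Worker: add each approximate gradient to the previous weight (as in claim 3)."""
    return [w + g for w, g in zip(weights, decode(server_isa, registers))]


# One iteration with two workers, two weights, and two registers.
registers = [0.01, 0.1]
worker_vectors = [encode([0.1, -0.01], registers), encode([0.1, 0.01], registers)]
overall = overall_gradient(worker_vectors, registers)  # [0.1, 0.0]
registers = optimize_registers(overall, registers)     # [0.0, 0.1]
server_isa = encode(overall, registers)                # [(0, 1), (0, 0)]
weights = worker_update([1.0, 1.0], server_isa, registers)
```

In this sketch only small integers cross the network per gradient; the precision lives in the shared register bank, which is the trade-off the claims describe.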
What is claimed is:

1. A method comprising: computing, using a processor and a memory at a parameter server, an overall gradient vector from a set of instruction set architecture (ISA) vectors corresponding to a set of worker machines, a first ISA vector of a first worker machine comprising a first set of ISA instructions corresponding to a first set of gradients, each gradient in the first set of gradients corresponding to a weight of a node of a first neural network instance being distributedly trained in the first worker machine; optimizing a set of register values such that when an optimized register value in the set of optimized register values is used in an approximation computation with an opcode from a set of opcodes the approximation computation produces an x-th approximate gradient that is within a tolerance value of an actual value of an x-th gradient in the overall gradient vector; constructing a server ISA vector, wherein in the server ISA vector, a server ISA instruction in an x-th position in the server ISA vector corresponds to the x-th gradient in the overall gradient vector; and causing a processor and a memory at the first worker machine to update a set of weights of a set of nodes of the first neural network instance being distributedly trained in the first worker machine, the set of weights being updated using the set of optimized register values and the server ISA vector.

2. The method of claim 1, further comprising: configuring, before initiating a first iteration of training a distributed set of neural network instances, the set of opcodes in each worker machine of the set of worker machines, wherein each worker machine in the set of worker machines trains one neural network instance from the set of distributed neural network instances, each neural network instance in the set of distributed neural network instances being identical, and wherein different neural network instances in different worker machines are subjected to different training inputs; and initializing the set of register values in each worker machine of the set of worker machines.

3. The method of claim 1, further comprising: transmitting the optimized set of register values and the server ISA vector to the set of worker machines, the transmitting causing: computing, using the processor and the memory of the first worker machine, a server gradient vector, the server gradient vector comprising a set of approximate gradients corresponding to the set of nodes, the set of approximate gradients including the x-th approximate gradient; and adding, using the processor and the memory of the first worker machine, the x-th approximate gradient to a previous x-th weight of an x-th node in the set of nodes of the first neural network instance at the first worker machine, the previous x-th weight being included in the set of weights, the adding being a part of the updating the set of weights, the adding forming an updated set of weights.

4. The method of claim 1, wherein updating the set of weights completes one iteration of distributed training of the first neural network instance and forms an iteration-trained first neural network instance, further comprising: transmitting the optimized set of register values and the server ISA vector to the set of worker machines, the transmitting causing: subjecting the iteration-trained first neural network instance to a new training input; computing a y-th gradient corresponding to a y-th weight of a y-th node of the iteration-trained first neural network instance; constructing a y-th ISA instruction in a new first ISA vector, the y-th ISA instruction comprising an opcode from the set of opcodes and an optimized register value from the set of optimized register values; and transmitting the new first ISA vector to the parameter server.

5. The method of claim 1, further comprising: forming, as a part of constructing the server ISA vector, the server ISA instruction by combining an opcode from the set of opcodes and an optimized register value from the set of optimized register values.

6. The method of claim 1, further comprising: computing, using the processor and the memory of the parameter server, a server gradient vector, the server gradient vector comprising a set of approximate gradients, the set of approximate gradients including the x-th approximate gradient; and adding, using the processor and the memory of the parameter server, the x-th approximate gradient to a previous x-th weight used by each x-th node in each set of nodes of each neural network instance in the set of worker machines, the adding forming an updated set of weights.

7. The method of claim 1, wherein the server ISA instruction comprises: a first number of bytes that is less than a second number of bytes needed to represent the gradient in the first set of gradients, wherein a set of bits corresponding to the first number of bytes is divided into a first subset of bits and a second subset of bits, wherein the first subset of bits representing an opcode from the set of opcodes, and wherein the second subset of bits representing an index into a bank of registers, the bank of registers holding the set of optimized register values.

8. The method of claim 7, wherein the set of opcodes includes as many opcodes as can be represented by the first subset of bits, and wherein the set of opcodes is a selected subset of a second set of opcodes.

9. The method of claim 7, wherein the bank of registers includes as many registers as can be represented by the second subset of bits, and wherein the bank of registers is a selected subset of a second bank of registers.

10. The method of claim 1, further comprising: concluding training the neural network instances responsive to determining that an n-th iteration overall gradient vector includes at least a threshold number of gradients that are within a specified tolerance of corresponding gradients in an (n−1)-th iteration gradient.

11. A computer usable program product comprising one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices, the stored program instructions comprising: program instructions to compute, using a processor and a memory at a parameter server, an overall gradient vector from a set of instruction set architecture (ISA) vectors corresponding to a set of worker machines, a first ISA vector of a first worker machine comprising a first set of ISA instructions corresponding to a first set of gradients, each gradient in the first set of gradients corresponding to a weight of a node of a first neural network instance being distributedly trained in the first worker machine; program instructions to optimize a set of register values such that when an optimized register value in the set of optimized register values is used in an approximation computation with an opcode from a set of opcodes the approximation computation produces an x-th approximate gradient that is within a tolerance value of an actual value of an x-th gradient in the overall gradient vector; program instructions to construct a server ISA vector, wherein in the server ISA vector, a server ISA instruction in an x-th position in the server ISA vector corresponds to the x-th gradient in the overall gradient vector; and program instructions to cause a processor and a memory at the first worker machine to update a set of weights of a set of nodes of the first neural network instance being distributedly trained in the first worker machine, the set of weights being updated using the set of optimized register values and the server ISA vector.

12. The computer usable program product
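Claims 7 through 10 describe a compact instruction format and a stopping rule. The Python sketch below is a hypothetical illustration only: it assumes a one-byte server ISA instruction split into a 2-bit opcode field and a 6-bit register-index field, and a 4-byte float32 gradient for the size comparison; the claims fix none of these widths. With 2 opcode bits, at most 4 opcodes are usable (claim 8), and with 6 index bits, at most 64 registers (claim 9). The `converged` helper is an assumed rendering of the claim 10 test, comparing the n-th and (n−1)-th overall gradient vectors.

```python
# Hypothetical one-byte server ISA instruction (claims 7-9) and stopping test (claim 10).
# Assumed field widths: 2 high bits of opcode, 6 low bits of register index.
from typing import List, Tuple

OPCODE_BITS = 2
INDEX_BITS = 6
MAX_OPCODES = 1 << OPCODE_BITS   # 4 opcodes fit in the opcode field (claim 8)
MAX_REGISTERS = 1 << INDEX_BITS  # 64 registers fit in the index field (claim 9)
FLOAT32_BYTES = 4                # assumed size of one uncompressed gradient


def pack(opcode: int, reg_index: int) -> int:
    """Pack an opcode and a register-bank index into one byte."""
    if not 0 <= opcode < MAX_OPCODES:
        raise ValueError("opcode does not fit in the opcode field")
    if not 0 <= reg_index < MAX_REGISTERS:
        raise ValueError("register index does not fit in the index field")
    return (opcode << INDEX_BITS) | reg_index


def unpack(byte: int) -> Tuple[int, int]:
    """Split one byte back into (opcode, register index)."""
    return byte >> INDEX_BITS, byte & (MAX_REGISTERS - 1)


def pack_vector(instructions: List[Tuple[int, int]]) -> bytes:
    """Serialize a server ISA vector at one byte per instruction."""
    return bytes(pack(op, idx) for op, idx in instructions)


def converged(current: List[float], previous: List[float],
              tolerance: float, threshold: int) -> bool:
    """True when at least `threshold` gradients moved by no more than `tolerance`."""
    stable = sum(1 for c, p in zip(current, previous) if abs(c - p) <= tolerance)
    return stable >= threshold


# Three instructions occupy 3 bytes instead of 3 * FLOAT32_BYTES = 12 bytes.
packed = pack_vector([(0, 1), (1, 0), (3, 63)])
```

Under these assumed widths, each gradient travels in a quarter of the space a float32 would need.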
Distributed learning, e.g. federated learning · CPC title
Supervised learning · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Multiprogramming arrangements · CPC title
Arrangements for executing machine instructions, e.g. instruction decode (for executing microinstructions G06F9/22) · CPC title