What technology area does this patent fall under?

Primary CPC classification G06N3/098. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 16, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

ISA-based compression in distributed training of neural networks

US12039439B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12039439-B2
Application number	US-202017129038-A
Country	US
Kind code	B2
Filing date	Dec 21, 2020
Priority date	Sep 20, 2017
Publication date	Jul 16, 2024
Grant date	Jul 16, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An overall gradient vector is computed at a server from a set of ISA vectors corresponding to a set of worker machines. An ISA vector of a worker machine including ISA instructions corresponding to a set of gradients, each gradient corresponding to a weight of a node of a neural network being distributedly trained in the worker machine. A set of register values is optimized for use in an approximation computation with an opcode to produce an x-th approximate gradient of an x-th gradient. A server ISA vector is constructed in which a server ISA instruction in an x-th position corresponds to the x-th gradient in the overall gradient vector. A processor at the worker machine is caused to update a set of weights of the neural network, using the set of optimized register values and the server ISA vector, thereby completing one iteration of training.
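The worker-side step described in the abstract can be sketched in Python. This is only an illustration under stated assumptions: the opcode table `OPCODES`, the `(opcode, register index)` tuple layout, and the function names `decode_gradients` and `update_weights` are hypothetical and do not come from the patent text shown on this page. The update adds the approximate gradient to the previous weight, following the wording of claim 3.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical opcode table. The page does not list the actual opcodes;
# each entry maps a register value to an approximate gradient.
OPCODES: Dict[int, Callable[[float], float]] = {
    0: lambda r: r,         # use the register value as is
    1: lambda r: -r,        # negate the register value
    2: lambda r: r / 2.0,   # halve the register value
    3: lambda r: -r / 2.0,  # negate and halve the register value
}

# Assumed in-memory layout of one ISA instruction: (opcode, register index).
Instruction = Tuple[int, int]


def decode_gradients(isa_vector: Sequence[Instruction],
                     registers: Sequence[float]) -> List[float]:
    """Turn each ISA instruction into the approximate gradient it stands for."""
    return [OPCODES[opcode](registers[reg]) for opcode, reg in isa_vector]


def update_weights(weights: Sequence[float],
                   isa_vector: Sequence[Instruction],
                   registers: Sequence[float]) -> List[float]:
    """Add the x-th approximate gradient to the x-th previous weight."""
    gradients = decode_gradients(isa_vector, registers)
    return [w + g for w, g in zip(weights, gradients)]


# Example values chosen only for illustration.
registers = [0.5, 0.01]
server_isa_vector = [(0, 0), (1, 1), (2, 0)]
weights = [1.0, 1.0, 1.0]
new_weights = update_weights(weights, server_isa_vector, registers)
```

In this sketch each instruction stands in for one full-precision gradient, so only the instruction vector and the register values need to be exchanged between the server and the workers.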

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: computing, using a processor and a memory at a parameter server, an overall gradient vector from a set of instruction set architecture (ISA) vectors corresponding to a set of worker machines, a first ISA vector of a first worker machine comprising a first set of ISA instructions corresponding to a first set of gradients, each gradient in the first set of gradients corresponding to a weight of a node of a first neural network instance being distributedly trained in the first worker machine; optimizing a set of register values such that when an optimized register value in the set of optimized register values is used in an approximation computation with an opcode from a set of opcodes the approximation computation produces an x-th approximate gradient that is within a tolerance value of an actual value of an x-th gradient in the overall gradient vector; constructing a server ISA vector, wherein in the server ISA vector, a server ISA instruction in an x-th position in the server ISA vector corresponds to the x-th gradient in the overall gradient vector; and causing a processor and a memory at the first worker machine to update a set of weights of a set of nodes of the first neural network instance being distributedly trained in the first worker machine, the set of weights being updated using the set of optimized register values and the server ISA vector.

2. The method of claim 1, further comprising: configuring, before initiating a first iteration of training a distributed set of neural network instances, the set of opcodes in each worker machine of the set of worker machines, wherein each worker machine in the set of worker machines trains one neural network instance from the set of distributed neural network instances, each neural network instance in the set of distributed neural network instances being identical, and wherein different neural network instances in different worker machines are subjected to different training inputs; and initializing the set of register values in each worker machine of the set of worker machines.

3. The method of claim 1, further comprising: transmitting the optimized set of register values and the server ISA vector to the set of worker machines, the transmitting causing: computing, using the processor and the memory of the first worker machine, a server gradient vector, the server gradient vector comprising a set of approximate gradients corresponding to the set of nodes, the set of approximate gradients including the x-th approximate gradient; and adding, using the processor and the memory of the first worker machine, the x-th approximate gradient to a previous x-th weight of an x-th node in the set of nodes of the first neural network instance at the first worker machine, the previous x-th weight being included in the set of weights, the adding being a part of the updating the set of weights, the adding forming an updated set of weights.

4. The method of claim 1, wherein updating the set of weights completes one iteration of distributed training of the first neural network instance and forms an iteration-trained first neural network instance, further comprising: transmitting the optimized set of register values and the server ISA vector to the set of worker machines, the transmitting causing: subjecting the iteration-trained first neural network instance to a new training input; computing a y-th gradient corresponding to a y-th weight of a y-th node of the iteration-trained first neural network instance; constructing a y-th ISA instruction in a new first ISA vector, the y-th ISA instruction comprising an opcode from the set of opcodes and an optimized register value from the set of optimized register values; and transmitting the new first ISA vector to the parameter server.

5. The method of claim 1, further comprising: forming, as a part of constructing the server ISA vector, the server ISA instruction by combining an opcode from the set of opcodes and an optimized register value from the set of optimized register values.

6. The method of claim 1, further comprising: computing, using the processor and the memory of the parameter server, a server gradient vector, the server gradient vector comprising a set of approximate gradients, the set of approximate gradients including the x-th approximate gradient; and adding, using the processor and the memory of the parameter server, the x-th approximate gradient to a previous x-th weight used by each x-th node in each set of nodes of each neural network instance in the set of worker machines, the adding forming an updated set of weights.

7. The method of claim 1, wherein the server ISA instruction comprises: a first number of bytes that is less than a second number of bytes needed to represent the gradient in the first set of gradients, wherein a set of bits corresponding to the first number of bytes is divided into a first subset of bits and a second subset of bits, wherein the first subset of bits representing an opcode from the set of opcodes, and wherein the second subset of bits representing an index into a bank of registers, the bank of registers holding the set of optimized register values.

8. The method of claim 7, wherein the set of opcodes includes as many opcodes as can be represented by the first subset of bits, and wherein the set of opcodes is a selected subset of a second set of opcodes.

9. The method of claim 7, wherein the bank of registers includes as many registers as can be represented by the second subset of bits, and wherein the bank of registers is a selected subset of a second bank of registers.

10. The method of claim 1, further comprising: concluding training the neural network instances responsive to determining that an n-th iteration overall gradient vector includes at least a threshold number of gradients that are within a specified tolerance of corresponding gradients in an (n−1)-th iteration gradient.

11. A computer usable program product comprising one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices, the stored program instructions comprising: program instructions to compute, using a processor and a memory at a parameter server, an overall gradient vector from a set of instruction set architecture (ISA) vectors corresponding to a set of worker machines, a first ISA vector of a first worker machine comprising a first set of ISA instructions corresponding to a first set of gradients, each gradient in the first set of gradients corresponding to a weight of a node of a first neural network instance being distributedly trained in the first worker machine; program instructions to optimize a set of register values such that when an optimized register value in the set of optimized register values is used in an approximation computation with an opcode from a set of opcodes the approximation computation produces an x-th approximate gradient that is within a tolerance value of an actual value of an x-th gradient in the overall gradient vector; program instructions to construct a server ISA vector, wherein in the server ISA vector, a server ISA instruction in an x-th position in the server ISA vector corresponds to the x-th gradient in the overall gradient vector; and program instructions to cause a processor and a memory at the first worker machine to update a set of weights of a set of nodes of the first neural network instance being distributedly trained in the first worker machine, the set of weights being updated using the set of optimized register values and the server ISA vector.

12. The computer usable program product
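The instruction layout of claims 7 to 9 and the stopping rule of claim 10 can be illustrated with a short Python sketch. The concrete widths (a one-byte instruction with 2 opcode bits and 6 register-index bits) and all names (`pack`, `unpack`, `encode_vector`, `has_converged`) are assumptions made for this example; the claims only require that the instruction be smaller than the gradient it replaces and that its bits be divided between an opcode and a register index.

```python
from typing import Iterable, Sequence, Tuple

# Assumed widths: one byte per instruction, versus e.g. 4 bytes for a
# 32-bit floating-point gradient.
OPCODE_BITS = 2    # allows 2**2 = 4 opcodes
REGISTER_BITS = 6  # allows 2**6 = 64 registers in the bank


def pack(opcode: int, reg_index: int) -> int:
    """Combine an opcode and a register index into one 8-bit instruction."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in the opcode bits")
    if not 0 <= reg_index < (1 << REGISTER_BITS):
        raise ValueError("register index does not fit in the register bits")
    return (opcode << REGISTER_BITS) | reg_index


def unpack(instruction: int) -> Tuple[int, int]:
    """Split an 8-bit instruction back into (opcode, register index)."""
    return instruction >> REGISTER_BITS, instruction & ((1 << REGISTER_BITS) - 1)


def encode_vector(instructions: Iterable[Tuple[int, int]]) -> bytes:
    """Serialize an ISA vector as one byte per gradient position."""
    return bytes(pack(op, reg) for op, reg in instructions)


def has_converged(current: Sequence[float], previous: Sequence[float],
                  tolerance: float, threshold_count: int) -> bool:
    """Stop when enough gradients are within tolerance of the previous iteration."""
    close = sum(1 for c, p in zip(current, previous) if abs(c - p) <= tolerance)
    return close >= threshold_count


payload = encode_vector([(0, 0), (2, 5), (3, 63)])
decoded = [unpack(b) for b in payload]
```

With these assumed widths, a vector of three gradients is sent as three bytes rather than twelve, and the sizes of the opcode set and the register bank are bounded by the bit widths, as claims 8 and 9 describe.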

Assignees

Inventors

Classifications

G06N3/098 · Primary
Distributed learning, e.g. federated learning · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title
G06F9/46
Multiprogramming arrangements · CPC title
G06F9/30
Arrangements for executing machine instructions, e.g. instruction decode (for executing microinstructions G06F9/22) · CPC title

Patent family

Related publications grouped by family.

View patent family 65721175

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12039439B2 cover?: An overall gradient vector is computed at a server from a set of ISA vectors corresponding to a set of worker machines. An ISA vector of a worker machine including ISA instructions corresponding to a set of gradients, each gradient corresponding to a weight of a node of a neural network being distributedly trained in the worker machine. A set of register values is optimized for use in an approx…
Who is the assignee on this patent?: IBM