Techniques for modifying and training a neural network
US-2021374518-A1 · Dec 2, 2021 · US
US12555362B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12555362-B2 |
| Application number | US-202318316365-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 12, 2023 |
| Priority date | Nov 13, 2020 |
| Publication date | Feb 17, 2026 |
| Grant date | Feb 17, 2026 |
Abstract
The technology of this application relates to a neural network model training method, an image processing method, and an apparatus in the field of artificial intelligence. In the training method, each of at least one first accelerator trains a neural network model based on at least one training sample. Before forward computation is performed at an i-th layer, different parameters of the i-th layer are obtained locally and from another accelerator, to obtain a complete model parameter of the i-th layer. The method can reduce the storage pressure on the first accelerator.
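The abstract's core idea, that each accelerator permanently keeps only part of a layer's parameters and assembles the complete layer just before computing it, can be sketched in a single process. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Accelerator` class, `shard`, `matvec`, and `sharded_forward` are hypothetical names, "obtaining parameters from another accelerator" is simulated by reading a peer object's list, and each layer is assumed to be a small square matrix stored row-major. The `peak_params` counter illustrates the storage argument: a rank holds its own slices plus at most one borrowed layer at a time.

```python
from typing import Dict, List

Vector = List[float]


def shard(flat: Vector, n: int) -> List[Vector]:
    """Split a flattened layer weight into n contiguous slices (one per accelerator)."""
    size = -(-len(flat) // n)  # ceiling division
    return [flat[k * size:(k + 1) * size] for k in range(n)]


def matvec(w: Vector, x: Vector) -> Vector:
    """Hypothetical square layer: w is a d*d matrix stored row-major."""
    d = len(x)
    return [sum(w[r * d + c] * x[c] for c in range(d)) for r in range(d)]


class Accelerator:
    """Hypothetical simulated accelerator that stores only its own slice of each layer."""

    def __init__(self, rank: int, shards: List[Vector]):
        self.rank = rank
        self.shards = shards                   # shards[i]: local slice of layer i
        self.borrowed: Dict[int, Vector] = {}  # complete weights assembled for a layer
        self.peak_params = 0

    def resident_params(self) -> int:
        # Local slices plus the parameters copied in from peers.
        local = sum(len(s) for s in self.shards)
        extra = sum(len(w) - len(self.shards[i]) for i, w in self.borrowed.items())
        return local + extra

    def gather_layer(self, i: int, peers: List["Accelerator"]) -> Vector:
        """Assemble layer i from the local slice and the peers' slices, in rank order."""
        full: Vector = []
        for acc in sorted(peers + [self], key=lambda a: a.rank):
            full.extend(acc.shards[i])
        self.borrowed[i] = full
        self.peak_params = max(self.peak_params, self.resident_params())
        return full

    def release_layer(self, i: int) -> None:
        """Drop the assembled copy once layer i has been computed."""
        self.borrowed.pop(i, None)


def sharded_forward(acc: Accelerator, peers: List[Accelerator], x: Vector, num_layers: int) -> Vector:
    for i in range(num_layers):
        w = acc.gather_layer(i, peers)  # complete model parameter of layer i
        x = matvec(w, x)
        acc.release_layer(i)
    return x


# Three 2x2 layers split across two simulated accelerators.
layers = [[1, 2, 3, 4], [0, 1, 1, 0], [2, 0, 0, 2]]
accs = [Accelerator(r, [shard(w, 2)[r] for w in layers]) for r in range(2)]
out = sharded_forward(accs[0], accs[1:], [1, 1], len(layers))
# out == [14, 6]; accs[0].peak_params == 8, versus 12 parameters for the full model
```

With two ranks the saving is modest; the resident footprint is roughly (total parameters / number of ranks) plus the largest single layer, so the gap widens as the model gets deeper.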
Claims (preview)
What is claimed is:
1. A neural network model training method, comprising: obtaining, by at least one first accelerator, at least one training sample; obtaining a forward computation result by performing, by the at least one first accelerator, forward computation of a neural network model on the at least one training sample, wherein before performing the forward computation at an i-th layer in the neural network model, the at least one first accelerator obtains a complete model parameter of the i-th layer by obtaining different parameters of the i-th layer locally and from another accelerator, wherein i is a positive integer; obtaining a first parameter gradient of the neural network model by performing, by the at least one first accelerator, backward computation based on the forward computation result; and updating, by the at least one first accelerator, a parameter of the neural network model based on the first parameter gradient of the neural network model.
2. The method according to claim 1, further comprising: after performing the forward computation at the i-th layer in the neural network model, releasing, by the at least one first accelerator, a parameter of the i-th layer obtained from the another accelerator.
3. The method according to claim 1, wherein before performing the backward computation at a j-th layer in the neural network model, the at least one first accelerator obtains a complete model parameter of the i-th layer by obtaining different parameters of the j-th layer locally and from another first accelerator, wherein j is a positive integer.
4. The method according to claim 1, wherein in a time period in which the at least one first accelerator performs the forward computation at any one or more layers before the i-th layer in the neural network model, the at least one first accelerator obtains the complete model parameter of the i-th layer by obtaining the different parameters of the i-th layer locally and from the another accelerator.
5. The method according to claim 1, wherein the at least one first accelerator is located in a first server.
6. The method according to claim 1, further comprising: sending, by the at least one first accelerator, the first parameter gradient to the another accelerator.
7. The method according to claim 6, wherein the at least one first accelerator sends a parameter gradient of a k-th layer in the first parameter gradient to the another accelerator in a time period in which the at least one first accelerator performs the backward computation at any one or more layers before the k-th layer in the neural network model, wherein k is a positive integer.
8. The method according to claim 1, further comprising: receiving, by the at least one first accelerator, a second parameter gradient of the neural network model sent by the another accelerator; and updating, by the at least one first accelerator, the parameter of the neural network model based on the first parameter gradient of the neural network model comprises: updating, by the at least one first accelerator, the parameter of the neural network model based on the first parameter gradient of the neural network model and the second parameter gradient of the neural network model.
9. An image processing method, comprising: obtaining, by a second accelerator, a to-be-processed image; and obtaining a processing result of the to-be-processed image by performing, by the second accelerator, forward computation of a target neural network model on the to-be-processed image, wherein before performing the forward computation at a p-th layer in the target neural network model, the second accelerator obtains a complete model parameter of the p-th layer by obtaining different parameters of the p-th layer locally and from another accelerator, wherein p is a positive integer.
10. The method according to claim 9, wherein after performing the forward computation at the p-th layer in the target neural network model, the second accelerator releases a parameter of the p-th layer obtained from the another accelerator.
11. The method according to claim 9, wherein in a time period in which the second accelerator performs the forward computation at any one or more layers before the p-th layer in the target neural network model, the second accelerator obtains the complete model parameter of the p-th layer by obtaining the different parameters of the p-th layer locally and from the another accelerator.
12. The method according to claim 9, further comprising: obtaining a parameter of the target neural network model by updating a parameter of a neural network model by at least one first accelerator based on a first parameter gradient of the neural network model; obtaining the first parameter gradient of the neural network model by performing backward computation by the at least one first accelerator based on a forward computation result; obtaining the forward computation result by performing the forward computation of the neural network model on at least one training sample by the at least one first accelerator; and obtaining a complete model parameter of an i-th layer in the neural network model by obtaining different parameters of the i-th layer locally and from the another accelerator.
13. The method according to claim 12, further comprising: when the at least one first accelerator performs the backward computation at a j-th layer in the neural network model, obtaining a complete model parameter of the j-th layer in the neural network model by obtaining different parameters of the j-th layer locally and from the another accelerator.
14. The method according to claim 13, further comprising obtaining the complete model parameter of the j-th layer in a time period in which the at least one first accelerator performs the backward computation at any one or more layers after the j-th layer in the neural network model.
15. The method according to claim 12, wherein the parameter of the target neural network model being obtained by updating the parameter of the neural network model by the at least one first accelerator based on the first parameter gradient of the neural network model comprises: obtaining the parameter of the target neural network model by updating the parameter of the neural network model by the at least one first accelerator based on the first parameter gradient of the neural network model and a second parameter gradient of the neural network model, wherein the second parameter gradient of the neural network model comprises a parameter gradient sent by the another accelerator and received by the at least one first accelerator.
16. A neural network model training apparatus, comprising: a processor; and a memory configured to store computer readable instructions that, when executed by the processor, cause the apparatus to: obtain at least one training sample; obtain a forward computation result by performing forward computation of a neural network model on the at least one training sample, wherein before performing the forward computation at an i-th layer in the neural network model, a complete model parameter of the i-th layer is obtained by obtaining different parameters of the i-th layer locally and from another accelerator, wherein i is a positive integer; obtain a first parameter gradient of the neural network model by performing backward computation based on the forward computation result; and update a parameter of the neural network model based on the first parameter gradient of the neural n
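The dependent claims add refinements to the basic step: releasing borrowed parameters after use (claim 2), re-assembling a layer before its backward computation (claim 3), obtaining a layer's parameters while an earlier layer is still being computed (claim 4), and updating parameters from the local gradient combined with a gradient received from another accelerator (claim 8). The following is a hypothetical sketch that strings these together for a toy model of elementwise layers. Assumptions, none of them taken from the claims: all ranks are simulated sequentially in one process; the "fetch" is list concatenation on a single worker thread, so the overlap is structural only; gradients are combined by plain averaging (the claims do not specify the combination rule); and each rank applies the averaged gradient only to the slice it owns. The overall pattern resembles what is commonly called fully sharded data parallelism, but it should not be read as the patented method itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Vector = List[float]
Shards = List[List[Vector]]  # shards_by_rank[r][i]: rank r's slice of layer i (hypothetical layout)


def gather(shards_by_rank: Shards, i: int) -> Vector:
    """Simulated all-gather: concatenate every rank's slice of layer i in rank order."""
    full: Vector = []
    for shards in shards_by_rank:
        full.extend(shards[i])
    return full


def forward(shards_by_rank: Shards, x: Vector, num_layers: int) -> List[Vector]:
    """Toy elementwise layers h_{i+1} = w_i * h_i. Layer i+1 is requested before
    layer i is computed (cf. claim 4). Returns the activations [h_0, ..., h_L]."""
    acts = [x]
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(gather, shards_by_rank, 0)
        for i in range(num_layers):
            w = pending.result()  # complete parameters of layer i
            if i + 1 < num_layers:
                pending = pool.submit(gather, shards_by_rank, i + 1)  # prefetch the next layer
            acts.append([wk * hk for wk, hk in zip(w, acts[-1])])
            del w  # drop this rank's reference to the assembled copy (cf. claim 2)
    return acts


def backward(shards_by_rank: Shards, acts: List[Vector], target: Vector, num_layers: int) -> List[Vector]:
    """Gradients of 0.5 * sum((h_L - target)^2) with respect to each layer's full weights."""
    g = [h - t for h, t in zip(acts[-1], target)]  # dL/dh_L
    grads: List[Vector] = [[] for _ in range(num_layers)]
    for i in reversed(range(num_layers)):
        w = gather(shards_by_rank, i)  # re-assemble layer i before its backward step (cf. claim 3)
        # Here g holds dL/dh_{i+1}.
        grads[i] = [gk * hk for gk, hk in zip(g, acts[i])]  # dL/dw_i = g * h_i
        g = [gk * wk for gk, wk in zip(g, w)]  # dL/dh_i = g * w_i
    return grads


def train_step(shards_by_rank: Shards, samples: List[Tuple[Vector, Vector]], lr: float, num_layers: int) -> None:
    """Rank r computes a gradient on samples[r]; the per-rank gradients are averaged
    (cf. claim 8) and each rank updates only the slice of every layer it owns."""
    per_rank = []
    for x, target in samples:
        acts = forward(shards_by_rank, x, num_layers)
        per_rank.append(backward(shards_by_rank, acts, target, num_layers))
    n = len(per_rank)
    for i in range(num_layers):
        width = len(per_rank[0][i])
        avg = [sum(grads[i][k] for grads in per_rank) / n for k in range(width)]
        offset = 0
        for shards in shards_by_rank:
            size = len(shards[i])
            shards[i] = [w - lr * gk for w, gk in zip(shards[i], avg[offset:offset + size])]
            offset += size


# Two ranks, two layers w_0 = [1, 2] and w_1 = [3, 1]; rank r owns element r of each layer.
shards_by_rank = [[[1.0], [3.0]], [[2.0], [1.0]]]
samples = [([1.0, 1.0], [0.0, 0.0]), ([1.0, 1.0], [3.0, 2.0])]
train_step(shards_by_rank, samples, lr=1.0, num_layers=2)
# shards_by_rank is now [[[-3.5], [1.5]], [[1.0], [-1.0]]]
```

With plain averaging, the second sample (whose target already equals the model output, so its gradient is zero) halves the step taken for the first sample; a real system would choose its gradient-combination rule and learning-rate scaling deliberately.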