Neural network computation circuit, control circuit therefor, and control method therefor
US-2024411520-A1 · Dec 12, 2024 · US
US2020410354A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020410354-A1 |
| Application number | US-201916455329-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 27, 2019 |
| Priority date | Jun 27, 2019 |
| Publication date | Dec 31, 2020 |
| Grant date | — |
Official abstract text for this publication.
Techniques are disclosed for debugging a neural network execution on a target processor. A reference processor may generate a plurality of first reference tensors for the neural network. The neural network may be repeatedly reduced to produce a plurality of lengths. For each of the lengths, a compiler converts the neural network into first machine instructions, the target processor executes the first machine instructions to generate a first device tensor, and the debugger program determines whether the first device tensor matches a first reference tensor. A shortest length is identified for which the first device tensor does not match the first reference tensor. Tensor output is enabled for a lower-level intermediate representation of the shortest neural network, and the neural network is converted into second machine instructions, which are executed by the target processor to generate a second device tensor.
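The length-reduction search described in the abstract can be sketched roughly as follows. This is only an illustrative toy, not the patented implementation: layers are modeled as plain Python functions, "compile and execute on the target processor" is stood in for by `run_layers`, and the network is shortened by truncating trailing layers, which is an assumption (the abstract says only that the network is "repeatedly reduced"). All names (`run_layers`, `tensors_match`, `shortest_failing_length`) are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Tensor = List[float]
Layer = Callable[[Tensor], Tensor]


def run_layers(layers: Sequence[Layer], x: Tensor) -> Tensor:
    """Apply layers in order; a stand-in for compiling and executing on a device."""
    for layer in layers:
        x = layer(x)
    return x


def tensors_match(a: Tensor, b: Tensor, tol: float = 1e-6) -> bool:
    """Element-wise comparison within a tolerance (the tolerance is an assumption)."""
    return len(a) == len(b) and all(abs(p - q) <= tol for p, q in zip(a, b))


def shortest_failing_length(
    reference_layers: Sequence[Layer],
    device_layers: Sequence[Layer],
    sample_input: Tensor,
) -> Optional[int]:
    """Return the shortest network length whose device output mismatches the reference."""
    # Reference tensors: the output after each layer on the reference processor.
    reference_tensors: List[Tensor] = []
    x = sample_input
    for layer in reference_layers:
        x = layer(x)
        reference_tensors.append(x)

    shortest = None
    # Reduce the network from its full length down to a single layer.
    for length in range(len(device_layers), 0, -1):
        device_tensor = run_layers(device_layers[:length], sample_input)
        if not tensors_match(device_tensor, reference_tensors[length - 1]):
            shortest = length  # keep overwriting, so the smallest failing length wins
    return shortest


# Toy example: the second layer is "miscompiled" on the device.
scale = lambda t: [2.0 * v for v in t]
add_one = lambda t: [v + 1.0 for v in t]
buggy_add = lambda t: [v + 1.5 for v in t]
relu = lambda t: [max(0.0, v) for v in t]

result = shortest_failing_length(
    [scale, add_one, relu], [scale, buggy_add, relu], [1.0, -2.0]
)
print(result)  # 2
```

In this toy run the one-layer network matches the reference while the two- and three-layer networks do not, so length 2 is reported as the place to look more closely.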
Opening claim text (preview).
What is claimed is:

1. A method of debugging a neural network execution on a target processor, the method comprising:
receiving, by a debugger program operating on a host system, a request to debug an execution of a neural network on the target processor, the neural network comprising a plurality of layers;
generating, using a reference processor on the host system and based on a first sample input, a plurality of first reference tensors for the neural network;
repeatedly reducing the plurality of layers of the neural network to produce a plurality of lengths, and for each particular length of a plurality of lengths:
  converting, by a compiler operating on the host system, the neural network having the particular length into first machine instructions;
  executing, using the target processor and based on the first sample input or on one of the plurality of first reference tensors, the first machine instructions to generate a first device tensor; and
  determining, by the debugger program, whether the first device tensor matches a first reference tensor of the plurality of first reference tensors;
identifying a shortest length of the plurality of lengths for which the first device tensor does not match the first reference tensor;
generating, using the reference processor and based on a second sample input, a plurality of second reference tensors for a lower-level representation of the neural network having the shortest length;
converting, by the compiler, the neural network having the shortest length into second machine instructions, wherein the second machine instructions includes additional instructions that enable tensor output for the lower-level representation;
executing, using the target processor and based on the second sample input or on one of the plurality of second reference tensors, the second machine instructions to generate a second device tensor for the lower-level representation; and
determining, by the debugger program, whether the second device tensor matches a second reference tensor of the plurality of second reference tensors.

2. The method of claim 1, wherein the additional instructions enable tensor output for multiple lower-level representations of the neural network.

3. The method of claim 2, wherein executing the second machine instructions further generates a third device tensor for a second lower-level representation of the neural network, wherein the lower-level representation is a first lower-level representation.

4. The method of claim 3, further comprising: determining, by the debugger program, whether the third device tensor matches a third reference tensor of the plurality of second reference tensors.

5. The method of claim 1, wherein the plurality of first reference tensors and the plurality of second reference tensors are generated by the debugger program.

6. A method of debugging a neural network execution on a target processor, the method comprising:
receiving a plurality of first reference tensors for a neural network;
repeatedly reducing a plurality of layers of the neural network to produce a plurality of lengths, and for each particular length of a plurality of lengths:
  converting, by a compiler, the neural network having the particular length into first machine instructions;
  executing, using the target processor, the first machine instructions to generate a first device tensor; and
  determining whether the first device tensor matches a first reference tensor of the plurality of first reference tensors;
identifying a shortened length of the plurality of lengths for which the first device tensor does not match the first reference tensor;
generating a plurality of second reference tensors for a lower-level representation of the neural network having the shortened length;
converting, by the compiler, the neural network having the shortened length into second machine instructions; and
executing, using the target processor, the second machine instructions to generate a second device tensor for the lower-level representation.

7. The method of claim 6, wherein the shortened length is a shortest length of the plurality of lengths.

8. The method of claim 6, further comprising: determining, by the debugger program, whether the second device tensor matches a second reference tensor of the plurality of second reference tensors.

9. The method of claim 6, wherein the second machine instructions include additional instructions that enable tensor output for the lower-level representation.

10. The method of claim 9, wherein the additional instructions enable tensor output for multiple lower-level representations of the neural network.

11. The method of claim 10, wherein executing the second machine instructions further generates a third device tensor for a second lower-level representation of the neural network, wherein the lower-level representation is a first lower-level representation.

12. The method of claim 11, further comprising: determining, by the debugger program, whether the third device tensor matches a third reference tensor of the plurality of second reference tensors.

13. The method of claim 6, wherein the plurality of first reference tensors and the plurality of second reference tensors are generated by the debugger program.

14. A non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations including:
receiving a plurality of first reference tensors for a neural network;
repeatedly reducing a plurality of layers of the neural network to produce a plurality of lengths, and for each particular length of a plurality of lengths:
  converting, by a compiler, the neural network having the particular length into first machine instructions;
  executing, using the target processor, the first machine instructions to generate a first device tensor; and
  determining whether the first device tensor matches a first reference tensor of the plurality of first reference tensors;
identifying a shortened length of the plurality of lengths for which the first device tensor does not match the first reference tensor;
generating a plurality of second reference tensors for a lower-level representation of the neural network having the shortened length;
converting, by the compiler, the neural network having the shortened length into second machine instructions; and
executing, using the target processor, the second machine instructions to generate a second device tensor for the lower-level representation.

15. The non-transitory computer-readable medium of claim 14, wherein the shortened length is a shortest length of the plurality of lengths.

16. The non-transitory computer-readable medium of claim 14, wherein the operations further comprise: determining, by the debugger program, whether the second device tensor matches a second reference tensor of the plurality of second reference tensors.

17. The non-transitory computer-readable medium of claim 14, wherein the second machine instructions include additional instructions that enable tensor output for the lower-level representation.

18. The non-transitory computer-readable medium of claim 17, wherein the additional instructions enable tensor output for multiple lower-level representations of the neural network.

19. The non-transitory computer-readable medium of claim 18, wherein executing the second machine instructions further generates a third device tensor for a s
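
The second phase of the claimed method, where tensor output is enabled for lower-level representations and each dumped device tensor is compared with its reference, might look something like the sketch below. It is a toy under stated assumptions: a lowered network is modeled as a list of named operations, "enabling tensor output" is modeled as recording every intermediate result, and the names `trace_ops` and `first_mismatch` are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tensor = List[float]
Op = Tuple[str, Callable[[Tensor], Tensor]]


def trace_ops(ops: List[Op], x: Tensor) -> Dict[str, Tensor]:
    """Run lowered ops in order, recording the tensor produced by each one."""
    dumped: Dict[str, Tensor] = {}
    for name, fn in ops:
        x = fn(x)
        dumped[name] = list(x)
    return dumped


def first_mismatch(
    reference: Dict[str, Tensor], device: Dict[str, Tensor], tol: float = 1e-6
) -> Optional[str]:
    """Return the name of the first op whose device tensor differs from the reference."""
    for name, ref in reference.items():  # dicts preserve insertion (execution) order
        dev = device.get(name)
        if dev is None or len(dev) != len(ref):
            return name
        if any(abs(p - q) > tol for p, q in zip(ref, dev)):
            return name
    return None


# Toy lowered form of one layer: load, multiply, add. The device "mul" is wrong.
reference_ops: List[Op] = [
    ("load", lambda t: list(t)),
    ("mul", lambda t: [2.0 * v for v in t]),
    ("add", lambda t: [v + 1.0 for v in t]),
]
device_ops: List[Op] = [
    ("load", lambda t: list(t)),
    ("mul", lambda t: [2.5 * v for v in t]),
    ("add", lambda t: [v + 1.0 for v in t]),
]

sample = [1.0, 2.0]
culprit = first_mismatch(trace_ops(reference_ops, sample), trace_ops(device_ops, sample))
print(culprit)  # mul
```

Comparing tensors op by op narrows the fault from "somewhere in the shortened network" to one lowered operation, which is the apparent purpose of the extra tensor-output instructions.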
Activation functions · CPC title
Combinations of networks · CPC title
using electronic means · CPC title
Learning methods · CPC title
Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00) · CPC title