Batch processing in a neural network processor
US-2016342890-A1 · Nov 24, 2016 · US
US10489704B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10489704-B2 |
| Application number | US-201916268457-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 5, 2019 |
| Priority date | Aug 5, 2016 |
| Publication date | Nov 26, 2019 |
| Grant date | Nov 26, 2019 |
Official abstract text for this publication.
Aspects for supporting operation data of different bit widths in neural networks are described herein. The aspects may include a processing module that includes one or more processors. The processors may be capable of processing data of one or more respective bit-widths. Further, the aspects may include a determiner module configured to receive one or more instructions that include one or more operands and one or more width fields. The operands may correspond to one or more operand types and each of the width fields may indicate an operand bit-width of one operand type. The determiner module may be further configured to identify at least one operand bit-width that is greater than each of the processors' bit-widths. In addition, the aspects may include a processor combiner configured to designate a combination of two or more of the processors to process the operands.
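The determiner and combiner steps described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented design: the names `Processor`, `needs_combination`, `designate_combination`, and `plan` are hypothetical, and the rule that a group of processors "covers" an operand once their bit-widths sum to the operand width is a simplification chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    bit_width: int  # widest operand this processor handles on its own


def needs_combination(operand_width, processors):
    """Determiner step: True if the operand is wider than every processor."""
    return all(operand_width > p.bit_width for p in processors)


def designate_combination(operand_width, processors):
    """Combiner step (assumed policy): take processors, widest first,
    until their summed bit-widths reach the operand width."""
    chosen, covered = [], 0
    for p in sorted(processors, key=lambda p: p.bit_width, reverse=True):
        if covered >= operand_width:
            break
        chosen.append(p)
        covered += p.bit_width
    if covered < operand_width:
        raise ValueError("not enough combined processor width")
    return chosen


def plan(width_fields, processors):
    """Map each operand type (key) to the processors designated for it."""
    result = {}
    for operand_type, width in width_fields.items():
        if needs_combination(width, processors):
            result[operand_type] = designate_combination(width, processors)
        else:
            # A single processor suffices; pick the narrowest one that fits.
            fitting = [p for p in processors if p.bit_width >= width]
            result[operand_type] = [min(fitting, key=lambda p: p.bit_width)]
    return result
```

For instance, with two 8-bit and two 16-bit processors, a 32-bit operand type exceeds every individual width, so `plan` would hand it to the two 16-bit processors together, while an 8-bit operand type stays on a single 8-bit processor.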
Opening claim text (preview).
We claim:

1. An apparatus for neural network processing, comprising: a processing module that includes multiple processors, wherein the multiple processors are capable of processing data of one or more respective bit-widths; a determiner module configured to: receive one or more instructions that include one or more operands and one or more width fields, wherein the one or more operands correspond to one or more operand types, and wherein each of the one or more width fields indicates an operand bit-width of one of the one or more operand types, and identify at least one of the one or more operand bit-widths that is greater than each of the one or more bit-widths, transmit the operands that correspond to the at least one of the one or more operand bit-widths; and a processor combiner configured to designate a combination of two or more of the multiple processors to process the operands that correspond to the at least one of the operand bit-widths.
2. The apparatus of claim 1, wherein the one or more operands and the one or more width fields are included in one of the one or more instructions.
3. The apparatus of claim 1, wherein the one or more operands are included in a first instruction and the one or more width fields are included in a second instruction.
4. The apparatus of claim 1, wherein the one or more instructions include one or more opcodes that indicate operations to be performed by the multiple processors.
5. The apparatus of claim 1, further comprising a caching unit configured to store data identified by the one or more operands.
6. The apparatus of claim 1, wherein the one or more operands in one of the one or more instructions include one or more iterators and one or more addresses, and wherein each of the one or more addresses corresponds to one of the width fields.
7. The apparatus of claim 1, wherein the one or more operands in one of the one or more instructions include at least a column count of a matrix, at least a row count of the matrix, at least one input address, or at least one output address, and wherein the input address and the output address respectively correspond to one of the width fields.
8. The apparatus of claim 1, wherein the one or more operands in one of the one or more instructions include at least one vector size, at least one input address, or at least one output address, and wherein the input address and the output address respectively correspond to one of the width fields.
9. The apparatus of claim 1, wherein the one or more operands in one of the one or more instructions include at least a column count of a matrix, at least a row count of the matrix, at least a vector size, at least one input address, at least one vector address, or at least one output address, and wherein the at least one input address, the at least one vector address, and the at least one output address respectively correspond to one of the width fields.
10. The apparatus of claim 1, further comprising a controller unit configured to transmit the one or more instructions to the determiner module.
11. The apparatus of claim 10, wherein the controller unit includes an instruction obtaining module configured to obtain the one or more instruction from an instruction storage device.
12. The apparatus of claim 11, wherein the controller unit includes a decoding module configured to decode each of the one or more instructions into respective one or more micro-instructions.
13. The apparatus of claim 12, wherein the controller unit includes a high-speed register configured to store scalar values included in the one or more instructions.
14. The apparatus of claim 13, wherein the controller unit includes a dependency processing unit configured to determine whether at least one of the one or more instructions has a dependency relationship with a previously received instruction.
15. The apparatus of claim 14, wherein the controller unit includes a storage queue module configured to store the one or more instructions while the dependency processing unit is determining an existence of the dependency relationship.
16. A method for neural network processing, comprising: receiving, by a determiner module, one or more instructions that include one or more operands and one or more width fields, wherein the one or more operands correspond to one or more operand types, and wherein each of the one or more width fields indicates an operand bit-width of one of the one or more operand types; identifying, by the determiner module, at least one of the one or more operand bit-widths that is greater than each of one or more bit-widths that multiple processors in a processing module are respectively capable of processing; transmitting, by the determiner module, the operands that correspond to the at least one operand bit-widths to a processor combiner; designating, by the processor combiner, a combination of two or more of the multiple processors to process the operands that correspond to the at least one of the operand bit-widths.
17. The method of claim 16, wherein the one or more operands and the one or more width fields are included in one of the one or more instructions.
18. The method of claim 16, wherein the one or more operands are included in a first instruction and the one or more width fields are included in a second instruction.
19. The method of claim 16, wherein the one or more instructions include one or more opcodes that indicate operations to be performed by the multiple processors.
20. The method of claim 16, further comprising storing, by a caching unit, data identified by the one or more operands.
21. The method of claim 16, wherein the one or more operands in one of the one or more instructions include one or more iterators and one or more addresses, and wherein each of the one or more addresses corresponds to one of the width fields.
22. The method of claim 16, wherein the one or more operands in one of the one or more instructions include at least a column count of a matrix, at least a row count of the matrix, at least one input address, or at least one output address, and wherein the input address and the output address respectively correspond to one of the width fields.
23. The method of claim 16, wherein the one or more operands in one of the one or more instructions include at least one vector size, at least one input address, or at least one output address, and wherein the input address and the output address respectively correspond to one of the width fields.
24. The method of claim 16, wherein the one or more operands in one of the one or more instructions include at least a column count of a matrix, at least a row count of the matrix, at least a vector size, at least one input address, at least one vector address, or at least one output address, and wherein the at least one input address, the at least one vector address, and the at least one output address respectively correspond to one of the width fields.
25. The method of claim 16, further comprising transmitting, by a controller unit, the one or more instructions to the determiner module.
26. The method of claim 25, further comprising obtaining, by an instruction obtaining module of the controller unit, the one or more instruction from an instruction storage device.
27. The method of claim 26, further comprising decoding, by a decoding module of the controller unit, each of
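As a rough illustration of the instruction layout in claims 9 and 24 and the dependency check in claim 14, the sketch below models a matrix-vector instruction whose addresses each carry their own width field. Everything here is an assumption made for the example: the class and field names, byte-addressed storage, and the choice to treat a "dependency relationship" as a read-after-write overlap between address ranges. The claims themselves do not specify how a dependency is detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressOperand:
    address: int     # start address in bytes (hypothetical storage layout)
    count: int       # number of elements stored at that address
    width_bits: int  # width field associated with this address

    def byte_range(self):
        # Round the total bit count up to whole bytes.
        size = (self.count * self.width_bits + 7) // 8
        return range(self.address, self.address + size)


@dataclass(frozen=True)
class MatVecInstruction:
    opcode: str                # hypothetical mnemonic, not from the patent
    rows: int                  # row count of the matrix
    cols: int                  # column count of the matrix
    matrix_in: AddressOperand  # input address plus its width field
    vector_in: AddressOperand  # vector address plus its width field
    out: AddressOperand        # output address plus its width field


def overlaps(a, b):
    """True if the two operands touch at least one common byte."""
    ra, rb = a.byte_range(), b.byte_range()
    return ra.start < rb.stop and rb.start < ra.stop


def depends_on(later, earlier):
    """Assumed rule: `later` depends on `earlier` when it reads
    any bytes that `earlier` writes (read-after-write)."""
    return (overlaps(later.matrix_in, earlier.out)
            or overlaps(later.vector_in, earlier.out))
```

Under these assumptions, a controller could hold a newly fetched instruction in a queue while `depends_on` is evaluated against earlier instructions still in flight, and release it once no overlap remains.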
according to one or more bits in the instruction, e.g. prefix, sub-opcode · CPC title
Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons · CPC title
Dependency mechanisms, e.g. register scoreboarding · CPC title
using electronic means · CPC title
Microinstruction function, e.g. input/output microinstruction; diagnostic microinstruction; microinstruction format · CPC title