Neural network unit
US-2018225116-A1 · Aug 9, 2018 · US
US12406174B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12406174-B2 |
| Application number | US-201816161867-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 16, 2018 |
| Priority date | Oct 16, 2018 |
| Publication date | Sep 2, 2025 |
| Grant date | Sep 2, 2025 |
Official abstract text for this publication.
Multi-agent instruction execution engines for neural inference processing are provided. In various embodiments, a neural core is provided. The neural core includes an instruction memory. The instruction memory comprises a plurality of instruction streams, each instruction stream associated with one of a plurality of agents. The instruction memory further comprises a plurality of shared functional units. The neural core is adapted to concurrently execute the plurality of instruction streams on the plurality of associated agents. The execution includes maintaining a separate program counter for each of the plurality of agents, determining a plurality of operations from the instructions of each instruction stream, and directing the operations to the shared functional units. The instructions of each instruction stream are statically scheduled prior to runtime to ensure their execution is conflict free.
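The execution model in the abstract (one program counter per agent, operations directed to shared functional units, no two agents on the same unit at once) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the names `Agent`, `run_core`, and `NOP`, the lockstep one-instruction-per-cycle timing, and the encoding of each instruction as the name of the functional unit it uses are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

NOP = None  # hypothetical encoding of a no-operation slot


@dataclass
class Agent:
    """One agent: an instruction stream plus its own program counter."""
    name: str
    stream: list  # each entry is a functional-unit name or NOP (assumed encoding)
    pc: int = 0   # separate program counter for this agent


def run_core(agents, units):
    """Step all agents in lockstep; return, per cycle, which agent drives each unit."""
    trace = []
    cycles = max(len(a.stream) for a in agents)
    for _ in range(cycles):
        issued = {}  # unit name -> agent name for this cycle
        for agent in agents:
            if agent.pc >= len(agent.stream):
                continue  # this agent's stream has finished
            op = agent.stream[agent.pc]
            agent.pc += 1  # each agent advances only its own counter
            if op is NOP:
                continue  # a NOP occupies the cycle without touching any unit
            if op not in units:
                raise ValueError(f"unknown functional unit: {op}")
            if op in issued:
                # A conflict-free static schedule should never reach this branch.
                raise RuntimeError(f"conflict on {op}: {issued[op]} vs {agent.name}")
            issued[op] = agent.name
        trace.append(issued)
    return trace


units = {"compute", "comm", "address"}
agents = [
    Agent("A", ["address", "compute", NOP]),
    Agent("B", ["comm", "address", "compute"]),
]
for cycle, issued in enumerate(run_core(agents, units)):
    print(cycle, issued)
```

In this sketch the runtime check exists only to show what "conflict free" means; the abstract describes the streams as statically scheduled before runtime, so a real core would presumably rely on the schedule rather than detect collisions while running.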
Opening claim text (preview).
What is claimed is:
1. A neural core comprising: an instruction memory, the instruction memory comprising a plurality of instruction streams, each instruction stream associated with one of a plurality of agents, each agent of the plurality of agents comprising a control engine and a program counter such that a first agent comprises a first control engine and a first program counter, each control engine of the plurality of control engines performing independent control operations, each program counter comprising a register; and a plurality of shared functional units, wherein the neural core is adapted to concurrently execute the plurality of instruction streams on the plurality of associated agents, wherein the execution comprises: modifying, by the first control engine, the first program counter, determining a plurality of operations from the instructions of each instruction stream, and directing the operations and at least one no-operation instruction to the shared functional units according to a prior modeling of access to the shared functional units and an offline state of each of the separate program counters, the operations being directed from the instruction memory and the at least one no-operation instruction delaying one or more of the plurality of agents to avoid simultaneous agent access to the shared functional units.
2. The neural core of claim 1, wherein the shared functional units comprise arithmetic, communication, address, and/or computation units.
3. The neural core of claim 1, wherein the each of the plurality of operations control one of the shared functional units.
4. The neural core of claim 1, wherein each of the plurality of instruction streams is statically scheduled.
5. The neural core of claim 4, wherein the static schedule is conflict free.
6. The neural core of claim 5, wherein the static schedule requires that no two operations access the same shared functional unit simultaneously.
7. The neural core of claim 1, wherein the plurality of operations are directed to the shared functional units at runtime.
8. The neural core of claim 7, wherein the plurality of operations are directed to the shared functional units within a sequence of time windows.
9. The neural core of claim 7, wherein directing the plurality of operations to the shared functional units comprises merging operations from each of the plurality of instruction streams.
10. The neural core of claim 9, wherein merging operations comprises detecting conflicts between operations directed to the same shared functional unit.
11. The neural core of claim 1, wherein determining the plurality of operations comprises decoding instructions of each instruction stream.
12. The neural core of claim 1, adapted to map the plurality of operations to any of the shared functional units.
13. The neural core of claim 1, wherein the instruction memory is logically segmented.
14. The neural core of claim 1, wherein the execution is divided into a plurality of cycles.
15. The neural core of claim 1, further comprising a plurality of parallel data paths, each comprising a subset of the plurality of shared functional units.
16. The neural core of claim 1, wherein the plurality of agents execute synchronously.
17. The neural core of claim 16, wherein synchronous execution is provided via a synchronization signal.
18. The neural core of claim 1, wherein the independent control operations comprise updating one or more loop counter and/or sequence counter.
19. A method comprising: reading a plurality of instruction streams from an instruction memory of a neural core, each instruction stream associated with one of a plurality of agents, each agent of the plurality of agents comprising a control engine and a program counter such that a first agent comprises a first control engine and a first program counter, each control engine of the plurality of control engines performing independent control operations, each program counter comprising a register; concurrently executing the plurality of agents by the neural core; modifying, by the first control engine, the first program counter; determining a plurality of operations from the instructions of each instruction stream; and directing the operations and at least one no-operation instruction to shared functional units of the neural core according to a prior modeling of access to the shared functional units and to an offline state of each of the separate program counters, the operations being directed from the instruction memory and the at least one no-operation instruction delaying one or more of the plurality of agents to avoid simultaneous agent access to the shared functional units.
20. The method of claim 19, further comprising: computing by the neural core a portion of a neural network layer.
21. A method comprising: executing a plurality of instruction streams, each by one of a plurality of agents, each agent of the plurality of agents comprising a control engine and a program counter such that a first agent comprises a first control engine and a first program counter, each control engine of the plurality of control engines performing independent control operations, each program counter comprising a register; and modifying, by the first control engine, the first program counter, wherein a plurality of shared functional units is controlled by the plurality of instruction streams, the plurality of the shared functional units performing an inference operation, and wherein a prior modeling of access to the shared functional units and offline states of the plurality of program counters are used to avoid simultaneous agent access to the shared functional units by delaying one or more of the plurality of agents using at least one no-operation instruction.
22. The method of claim 21, wherein the inference operations comprise computation, communication, or memory addressing operations.
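The claims describe avoiding simultaneous access by modeling unit usage and program counter state ahead of time and delaying agents with no-operation instructions. One way such an offline pass could look is sketched below. It is an assumption-laden illustration, not the scheduling method of the patent: the greedy fixed-priority policy, the function names `schedule` and `is_conflict_free`, the `"nop"` marker, and the one-unit-per-operation model are all hypothetical.

```python
NOP = "nop"  # hypothetical marker for a no-operation instruction


def schedule(streams):
    """Pad each agent's stream with NOPs so no unit is used twice in a cycle.

    streams: dict mapping agent name -> list of functional-unit names,
    in program order. Agents earlier in the dict get priority (assumed policy).
    """
    pcs = {a: 0 for a in streams}   # modeled (offline) program counter per agent
    out = {a: [] for a in streams}  # padded instruction stream per agent
    while any(pcs[a] < len(streams[a]) for a in streams):
        claimed = set()  # units already taken in the cycle being modeled
        for a in streams:
            if pcs[a] >= len(streams[a]):
                continue  # agent has no operations left
            unit = streams[a][pcs[a]]
            if unit in claimed:
                out[a].append(NOP)  # delay this agent by one cycle
            else:
                claimed.add(unit)
                out[a].append(unit)
                pcs[a] += 1
    return out


def is_conflict_free(padded):
    """Check that no two agents use the same unit in the same cycle."""
    length = max(len(s) for s in padded.values())
    for t in range(length):
        used = [s[t] for s in padded.values() if t < len(s) and s[t] != NOP]
        if len(used) != len(set(used)):
            return False
    return True


padded = schedule({"A": ["compute", "compute"], "B": ["compute", "comm"]})
print(padded)
print(is_conflict_free(padded))
```

The highest-priority unfinished agent always issues in each modeled cycle, so the loop terminates. A production scheduler would likely optimize NOP placement rather than delay lower-priority agents greedily; the sketch only shows that padding computed before runtime can make the merged streams collision-free without any runtime arbitration.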
Feedforward networks · CPC title
using electronic means · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Architecture, e.g. interconnection topology · CPC title