US12169771B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12169771-B2 |
| Application number | US-202017764691-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 15, 2020 |
| Priority date | Oct 16, 2019 |
| Publication date | Dec 17, 2024 |
| Grant date | Dec 17, 2024 |
Official abstract text for this publication.
Techniques in wavelet filtering for advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element to execute programmed instructions using the data and a router to route the wavelets in accordance with virtual channel specifiers. Each processing element is enabled to perform local filtering of wavelets received at the processing element, selectively, conditionally, and/or optionally discarding zero or more of the received wavelets, thereby preventing further processing of the discarded wavelets. The wavelet filtering is performed by one or more configurable wavelet filters operable in various modes, such as counter, sparse, and range modes.
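As a rough illustration of the local filtering described in the abstract, the following sketch models a processing element that inspects each received wavelet and either queues it for its compute element or discards it. The names `Wavelet`, `ProcessingElement`, and `should_discard` are hypothetical, and the predicate is only a stand-in: the abstract names counter, sparse, and range modes but does not define their behavior.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass(frozen=True)
class Wavelet:
    color: int    # virtual channel specifier (assumed here to be a small integer)
    data: float   # payload carried by the wavelet


@dataclass
class ProcessingElement:
    # Hypothetical filter predicate: returns True when a wavelet should be discarded.
    should_discard: Callable[[Wavelet], bool]
    input_queue: Deque[Wavelet] = field(default_factory=deque)
    discarded: int = 0

    def receive(self, wavelet: Wavelet) -> bool:
        """Queue the wavelet for the compute element unless the filter discards it."""
        if self.should_discard(wavelet):
            self.discarded += 1   # discarded wavelets get no further processing
            return False
        self.input_queue.append(wavelet)
        return True


# Illustrative predicate (an assumption): drop zero-valued wavelets on color 3.
pe = ProcessingElement(should_discard=lambda w: w.color == 3 and w.data == 0.0)
for w in [Wavelet(3, 0.0), Wavelet(3, 1.5), Wavelet(5, 0.0)]:
    pe.receive(w)
```

In this sketch the decision is made entirely at the receiving element, which mirrors the abstract's point that filtering is local and that a discarded wavelet never reaches the compute element.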
Opening claim text (preview).
What is claimed is:

1. A method comprising: communicating packets between a plurality of processing elements coupled via a fabric, wherein each of the communicated packets comprises a respective color field; receiving, from the fabric, a particular one of the communicated packets in a particular one of the processing elements, wherein the color field of the particular packet is a particular color value; managing a wavelet filter of the particular processing element, wherein the wavelet filter is enabled to retain a state comprising a filtered color; and queueing information of the particular packet to enable execution, by the particular processing element, of an instruction that uses the queued information, wherein the queueing is conditional based on one or more queueing criteria and a first one of the queueing criteria is that the particular color value and the filtered color are different.

2. A method comprising: communicating packets between a plurality of processing elements coupled via a fabric, wherein each of the communicated packets comprises a respective color field; receiving, from the fabric, a particular one of the communicated packets in a particular one of the processing elements, wherein the color field of the particular packet is a particular color value; managing a wavelet filter of the particular processing element, wherein the wavelet filter is enabled to retain a state comprising a filtered color; and updating a counter, wherein: the updating is conditional based on one or more updating criteria, a first one of the updating criteria is that the particular color value and the filtered color are identical, the particular packet is associated, based on the particular color value, with one or more input queues, each of the input queues is configurable to operate in one of a plurality of mutually exclusive input queue operating modes, a second one of the updating criteria is that at least one of the input queues is configured to operate in a particular one of the input queue operating modes, the particular input queue operating mode is indicated by an indicator, the updating is responsive to the receiving, and the state further comprises the counter and the indicator.

3. A method comprising: communicating packets between a plurality of processing elements coupled via a fabric, wherein each of the communicated packets comprises a respective color field; receiving, from the fabric, a particular one of the communicated packets in a particular one of the processing elements, wherein the color field of the particular packet is a particular color value; managing a wavelet filter of the particular processing element, wherein the wavelet filter is enabled to retain a state comprising a filtered color; queueing information of the particular packet to enable execution, by the particular processing element, of an instruction that uses the queued information, wherein the queueing is conditional based on one or more queueing criteria and a first one of the queueing criteria is that the particular color value and the filtered color are different; and updating a counter, wherein: the updating is conditional based on one or more updating criteria and a first one of the updating criteria is that the particular color value and the filtered color are identical, the updating is responsive to the receiving, and the state further comprises the counter.

4. The method of claim 3, wherein the state further comprises a filter mode indicator enabled to indicate a mutually exclusive one of a plurality of filter operating modes and a second one of the queueing criteria is that the filter mode indicator indicates a particular one of the filter operating modes.

5. The method of claim 3, wherein the state further comprises a filter mode indicator enabled to indicate a mutually exclusive one of a plurality of filter operating modes and a second one of the updating criteria is that the filter mode indicator indicates a particular one of the filter operating modes.

6. The method of claim 3, wherein the state further comprises a filter mode indicator enabled to indicate a mutually exclusive one of a plurality of filter operating modes, a second one of the queueing criteria is that the filter mode indicator indicates a particular one of the filter operating modes, and a second one of the updating criteria is that the filter mode indicator indicates the particular filter operating mode.

7. The method of claim 4, claim 5, or claim 6, wherein the filter operating modes comprise a counter mode, a sparse mode, and a range mode.

8. The method of claim 3, wherein the state further comprises a limit and a second one of the queueing criteria is that the counter is one of less than the limit and equal to the limit.

9. The method of claim 8, wherein the counter is conditionally resettable based on a comparison to the limit.

10. The method of claim 9, wherein the limit is a first limit, the state further comprises a second limit, and the first limit is conditionally loadable with the second limit.

11. The method of claim 2 or claim 3, wherein the managing is a first managing, the wavelet filter is a first wavelet filter, the state is a first state, the filtered color is a first filtered color, the updating is a first updating, the counter is a first counter, and the updating criteria is a first updating criteria; and further comprising: a second managing a second wavelet filter of the particular processing element, wherein the second wavelet filter is enabled to retain a second state comprising a second filtered color; and a second updating a second counter, wherein the second updating is conditional based on one or more second updating criteria and a first one of the second updating criteria is that the particular color value and the second filtered color are identical, and the second updating is responsive to the receiving.

12. The method of claim 1, claim 2, or claim 3, wherein one or more configuration registers of the particular processing element comprise one or more portions of the state.

13. The method of claim 12, wherein at least one of the configuration registers is modifiable by a load control register instruction that is executable by the particular processing element.

14. The method of claim 12, wherein at least one of the configuration registers is memory-mapped and modifiable by at least one memory store instruction that is executable by the particular processing element.

15. The method of claim 12, wherein at least one of the configuration registers is modifiable via a system interface of the particular processing element.

16. The method of claim 1 or claim 3, wherein the execution by the particular processing element is implemented at least in part by a compute element that the particular processing element comprises.

17. The method of claim 3, further comprising, transmitting to the fabric a copy of the particular packet, as one of the communicated packets, to another one of the processing elements, wherein the transmitting is irrespective of the queueing criteria and irrespective of the updating criteria.

18. The method of claim 1 or claim 3, wherein the queued information comprises an integer data value and the instruction comprises an integer arithmetic instruction.

19. The method of claim 1 or claim 3, wherein the queued information comprises a floating-point data value and the instruction comprises a floating-point arithmetic instruction.

20. The method of claim 1 or claim 3, wher
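One possible reading of claims 3, 8, and 9 can be sketched as a small state machine. This is an interpretation and not the patented implementation. It assumes that the queueing criteria are alternatives (a packet whose color differs from the filtered color is always queued), that a matching packet is queued only when the counter equals the limit (the "equal to" variant of claim 8), and that the counter is reset at that point (one way to realize claim 9). The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WaveletFilter:
    """Hypothetical filter state: filtered color, counter, and limit."""
    filtered_color: int
    limit: int
    counter: int = 0

    def on_receive(self, color: int) -> bool:
        """Return True when the packet's information should be queued."""
        if color != self.filtered_color:
            # Claim 3, first queueing criterion: the colors are different.
            # Assumption: this alone is sufficient to queue the packet.
            return True
        # Claim 3, first updating criterion: the colors are identical,
        # so the counter is updated in response to the receive.
        self.counter += 1
        if self.counter == self.limit:
            # Claim 8 ("equal to the limit" variant) lets the packet through;
            # claim 9's conditional reset is modeled as a reset to zero here.
            self.counter = 0
            return True
        return False  # assumed: matching packets below the limit are discarded


flt = WaveletFilter(filtered_color=7, limit=3)
decisions = [flt.on_receive(c) for c in [7, 7, 5, 7, 7, 7, 7]]
```

Under these assumptions the filter passes every third packet of the filtered color and leaves packets of other colors untouched. Other readings of the claims are equally plausible, for example queueing matching packets while the counter is less than the limit.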
Convolutional networks [CNN, ConvNet] · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Supervised learning · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
Bus transfer protocol, e.g. handshake; Synchronisation · CPC title