System decoder for training accelerators
US-2023039631-A1 · Feb 9, 2023 · US
US12591535B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12591535-B2 |
| Application number | US-202318395019-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 22, 2023 |
| Priority date | Dec 22, 2023 |
| Publication date | Mar 31, 2026 |
| Grant date | Mar 31, 2026 |
Official abstract text for this publication.
A device includes a plurality of hardware accelerator islands. The accelerator islands have a plurality of processing elements, a plurality of streaming engines, and a stream switch coupled to the plurality of processing elements and to the plurality of streaming engines. The stream switch streams data between the plurality of processing elements of the accelerator island, and between the plurality of streaming engines of the accelerator island and the plurality of processing elements of the accelerator island. Unidirectional stream switch connections (SSCONNs) are coupled between pairs of stream switches of the plurality of accelerator islands. The stream switches of the plurality of hardware accelerator islands and the SSCONNs form a run-time reconfigurable interconnection mesh between the plurality of processing elements of the plurality of hardware accelerator islands. In operation, the interconnection mesh streams data between processing elements of multiple hardware accelerator islands of the plurality of hardware accelerator islands.
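To make the topology in the abstract concrete, the following is a minimal sketch, not the patented implementation. It treats each accelerator island's stream switch as a node and each unidirectional SSCONN as a directed edge, so that adding or removing edges stands in for run-time reconfiguration. The `Mesh` class, its method names, and the island names are all hypothetical and do not come from the patent.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Mesh:
    """Toy model: islands are nodes, SSCONNs are directed edges (assumed names)."""
    # island name -> set of islands reachable over one SSCONN
    ssconns: dict = field(default_factory=dict)

    def add_island(self, name):
        self.ssconns.setdefault(name, set())

    def connect(self, src, dst):
        # An SSCONN is unidirectional: data flows only from src's stream
        # switch to dst's stream switch.
        self.add_island(src)
        self.add_island(dst)
        self.ssconns[src].add(dst)

    def disconnect(self, src, dst):
        # Stand-in for run-time reconfiguration: drop a route, keep the islands.
        self.ssconns[src].discard(dst)

    def route(self, src, dst):
        """Breadth-first search for a shortest island chain from src to dst, or None."""
        queue = deque([[src]])
        seen = {src}
        while queue:
            path = queue.popleft()
            if path[-1] == dst:
                return path
            for nxt in sorted(self.ssconns.get(path[-1], ())):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None


mesh = Mesh()
mesh.connect("island0", "island1")
mesh.connect("island1", "island2")
print(mesh.route("island0", "island2"))  # a forward path exists
print(mesh.route("island2", "island0"))  # no reverse SSCONNs were added
```

Because each SSCONN is modeled as one-way, a bidirectional link between two islands would need two `connect` calls, one in each direction.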
Opening claim text (preview).
The invention claimed is:
1. A device, comprising: a plurality of hardware accelerator islands, each including: a plurality of processing elements; a plurality of streaming engines; and a stream switch coupled to the plurality of processing elements and to the plurality of streaming engines, wherein the stream switch, in operation, streams data between the plurality of processing elements of the hardware accelerator island, and between the plurality of streaming engines of the hardware accelerator island and the plurality of processing elements of the hardware accelerator island; and a plurality of unidirectional stream switch connections (SSCONNs) coupled between pairs of stream switches of the plurality of hardware accelerator islands, wherein, the stream switches of the plurality of hardware accelerator islands and the SSCONNs form a run-time reconfigurable interconnection mesh between the plurality of processing elements of the plurality of hardware accelerator islands, and, in operation, the interconnection mesh streams data between processing elements of multiple hardware accelerator islands of the plurality of hardware accelerator islands, wherein at least one of the SSCONNs includes stream link conversion circuitry, which, in operation, converts data streamed via the at least one SSCONN between stream switches of the hardware accelerator islands operating with different data widths, with different channel configurations, or with different data widths and different channel configurations.
2. The device of claim 1, comprising: a plurality of unidirectional stream links including: unidirectional stream links coupled between processing elements and stream switches of respective hardware accelerator islands of the plurality of hardware accelerator islands; and unidirectional stream links coupled between a SSCONN and stream switches of a pair of stream switches coupled together by the SSCONN.
3. The device of claim 1, wherein at least one SSCONN includes an asynchronous first-in-first-out (FIFO) buffer, which, in operation, synchronizes data streamed via the at least one SSCONN between stream switches of hardware accelerator islands operating with different clocks.
4. The device of claim 1, wherein at least one of the SSCONNs includes virtual channel control circuitry, which, in operation, adds or removes virtual channel support to data streams streamed via the at least one SSCONN between stream switches providing different levels of virtual channel support.
5. The device of claim 1, wherein each of the plurality of hardware accelerator islands is coupled to each of the other hardware accelerator islands of the plurality of hardware accelerator islands via one or more SSCONNs.
6. The device of claim 1, wherein one of the plurality of hardware accelerator islands has a different number of processing elements than another of the plurality of hardware accelerator islands.
7. The device of claim 1, wherein, in operation: a first set of hardware accelerator islands of the plurality of hardware accelerator islands executes one or more tasks of a first neural network in parallel with execution of one or more tasks of a second neural network by a second set of hardware accelerator islands of the plurality of hardware accelerator islands.
8. The device of claim 1, wherein, in operation, multiple hardware accelerator islands of the plurality of accelerator islands process batches of a neural network task in parallel.
9. The device of claim 1, wherein, the interconnection mesh, in operation, streams data in parallel between a processing element of a hardware accelerator island of the plurality of hardware accelerator islands and multiple other processing elements coupled to the interconnection mesh.
10. The device of claim 1, comprising power control circuitry, which, in operation, applies real-time power tuning to individual hardware accelerator islands of the plurality of hardware accelerator islands.
11. The device of claim 10, wherein, in operation, the real-time power tuning is applied based on: detection of events by processes executing on hardware accelerator islands of the plurality of hardware accelerator islands; types of processes being executed by respective hardware accelerator islands of the plurality of hardware accelerator islands; operating environment conditions; or various combinations thereof.
12. The device of claim 10, wherein, in operation, the applying real-time power tuning includes: independently controlling operating frequencies of hardware accelerator islands of the plurality of hardware accelerator islands; independently controlling supply voltages of hardware accelerator islands of the plurality of hardware accelerator islands; independently controlling body-bias voltages of hardware accelerator islands of the plurality of hardware accelerator islands; independently controlling operational states of hardware accelerator islands of the plurality of hardware accelerator islands; or various combinations thereof.
13. The device of claim 1, wherein, in operation, the plurality of hardware accelerator islands are organized into multiple security regions on a hardware accelerator island basis.
14. The device of claim 13, wherein the organizing of the plurality of hardware accelerator islands into multiple security regions is based on configuration bus ID associated with respective hardware accelerator islands of the plurality of hardware accelerator islands.
15. The device of claim 1, wherein each of the hardware accelerator islands of the plurality of hardware accelerator islands includes a bus interface, which, in operation, couples the hardware accelerator island to a host system bus.
16. The device of claim 15, wherein, the bus interface of a first hardware accelerator island of the plurality of hardware accelerator islands, in operation, couples the first hardware accelerator island to a first host system bus; and the bus interface of a second hardware accelerator island of the plurality of hardware accelerator islands, in operation, couples the second hardware accelerator island to a second host system bus.
17. The device of claim 16, wherein the interconnection mesh, in operation, streams data from a processing element of the first hardware accelerator island to a processing element of the second hardware accelerator island.
18. The device of claim 1, wherein the run-time reconfigurable interconnection mesh, in operation, employs synchronizing mechanisms and back-pressure signaling.
19. A system, comprising: a memory; a host processor coupled to the memory; a host system bus; a plurality of hardware accelerator islands coupled to the host system bus, each including: a plurality of processing elements; a plurality of streaming engines; and a stream switch coupled to the plurality of processing elements and to the plurality of streaming engines, wherein the stream switch, in operation, streams data between the plurality of processing elements of the hardware accelerator island, and between the plurality of streaming engines of the hardware accelerator island and the plurality of processing elements of the hardware accelerator island; and a plurality of unidirectional stream switch connections (SSCONNs) coupled between pairs of stream switches of the plurality of hardware accelerator islands, wherein, the stream switches of the plurality of hardware accelerator islands and the SSCONNs form a run-time reconfigurable
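
Claim 1 recites stream link conversion circuitry that converts data between stream switches operating with different data widths. As a rough software analogy only, the sketch below repacks a stream of fixed-width words into words of another width. The function name `convert_width`, the least-significant-bits-first packing order, and the zero-padding of a trailing partial word are assumptions chosen for illustration; the claims shown here do not specify any of them.

```python
def convert_width(words, in_width, out_width):
    """Repack in_width-bit words into out_width-bit words (assumed LSB-first order)."""
    acc = 0      # bit accumulator holding not-yet-emitted bits
    nbits = 0    # number of valid bits currently held in acc
    out = []
    mask = (1 << out_width) - 1
    for w in words:
        if w < 0 or w >> in_width:
            raise ValueError(f"word {w} does not fit in {in_width} bits")
        acc |= w << nbits          # append the new word above the pending bits
        nbits += in_width
        while nbits >= out_width:  # emit as many full output words as possible
            out.append(acc & mask)
            acc >>= out_width
            nbits -= out_width
    if nbits:
        out.append(acc)            # trailing partial word, zero-padded in the high bits
    return out


narrow = [0x12, 0x34, 0x56, 0x78]
wide = convert_width(narrow, 8, 16)   # two 16-bit words built from four bytes
back = convert_width(wide, 16, 8)     # converting back recovers the original bytes
print([hex(x) for x in wide], back == narrow)
```

Hardware conversion circuitry would also have to handle channel configuration and flow control, which this sketch leaves out.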
using switching circuits, e.g. switching matrix, connection or expansion network (G06F13/4009 takes precedence) · CPC title
using an input/output type connection, e.g. channel, I/O port · CPC title