Stream-based modular and scalable HW accelerator sub-system with design-time parametric reconfigurable NPU cores

US12591535B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12591535-B2
Application number  US-202318395019-A
Country             US
Kind code           B2
Filing date         Dec 22, 2023
Priority date       Dec 22, 2023
Publication date    Mar 31, 2026
Grant date          Mar 31, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device includes a plurality of hardware accelerator islands. The accelerator islands have a plurality of processing elements, a plurality of streaming engines, and a stream switch coupled to the plurality of processing elements and to the plurality of streaming engines. The stream switch streams data between the plurality of processing elements of the accelerator island, and between the plurality of streaming engines of the accelerator island and the plurality of processing elements of the accelerator island. Unidirectional stream switch connections (SSCONNs) are coupled between pairs of stream switches of the plurality of accelerator islands. The stream switches of the plurality of hardware accelerator islands and the SSCONNs form a run-time reconfigurable interconnection mesh between the plurality of processing elements of the plurality of hardware accelerator islands. In operation, the interconnection mesh streams data between processing elements of multiple hardware accelerator islands of the plurality of hardware accelerator islands.
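To make the topology in the abstract concrete, here is a minimal Python sketch. It assumes each island has exactly one stream switch and treats each SSCONN as a directed edge between two switches. The names Island, Mesh, connect, disconnect, and route are hypothetical, and the breadth-first route search is an illustrative choice; the abstract does not say how routes through the mesh are chosen or configured.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Island:
    """Hypothetical model of one hardware accelerator island."""
    name: str
    num_pes: int      # processing elements on this island's stream switch
    num_engines: int  # streaming engines on the same stream switch


@dataclass
class Mesh:
    """Stream switches (one per island) joined by unidirectional SSCONNs."""
    islands: dict = field(default_factory=dict)
    ssconns: set = field(default_factory=set)  # {(src_island, dst_island)}

    def add_island(self, island):
        self.islands[island.name] = island

    def connect(self, src, dst):
        """Add one unidirectional SSCONN from src's stream switch to dst's."""
        if src == dst:
            raise ValueError("an SSCONN joins two different stream switches")
        self.ssconns.add((src, dst))

    def disconnect(self, src, dst):
        """Remove an SSCONN, modelling run-time reconfiguration of the mesh."""
        self.ssconns.discard((src, dst))

    def route(self, src, dst):
        """Breadth-first search for the shortest island-to-island path.

        Returns the island names a stream traverses, or None if unreachable.
        """
        if src not in self.islands or dst not in self.islands:
            raise KeyError("unknown island")
        prev = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = prev[node]
                return path[::-1]
            for a, b in sorted(self.ssconns):
                if a == node and b not in prev:
                    prev[b] = node
                    queue.append(b)
        return None


mesh = Mesh()
for name, pes in [("A", 4), ("B", 8), ("C", 4)]:  # islands may differ in PE count
    mesh.add_island(Island(name, num_pes=pes, num_engines=2))
for src, dst in [("A", "B"), ("B", "C"), ("C", "A")]:  # a one-way ring
    mesh.connect(src, dst)
print(mesh.route("A", "C"))  # ['A', 'B', 'C']
mesh.disconnect("B", "C")
print(mesh.route("A", "C"))  # None
```

Because every SSCONN is unidirectional, two islands that must exchange data in both directions need at least one SSCONN per direction; removing the B-to-C link above leaves no path from A to C in this one-way ring.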

First claim

Opening claim text (preview).

The invention claimed is:

1. A device, comprising: a plurality of hardware accelerator islands, each including: a plurality of processing elements; a plurality of streaming engines; and a stream switch coupled to the plurality of processing elements and to the plurality of streaming engines, wherein the stream switch, in operation, streams data between the plurality of processing elements of the hardware accelerator island, and between the plurality of streaming engines of the hardware accelerator island and the plurality of processing elements of the hardware accelerator island; and a plurality of unidirectional stream switch connections (SSCONNs) coupled between pairs of stream switches of the plurality of hardware accelerator islands, wherein, the stream switches of the plurality of hardware accelerator islands and the SSCONNs form a run-time reconfigurable interconnection mesh between the plurality of processing elements of the plurality of hardware accelerator islands, and, in operation, the interconnection mesh streams data between processing elements of multiple hardware accelerator islands of the plurality of hardware accelerator islands, wherein at least one of the SSCONNs includes stream link conversion circuitry, which, in operation, converts data streamed via the at least one SSCONN between stream switches of the hardware accelerator islands operating with different data widths, with different channel configurations, or with different data widths and different channel configurations.

2. The device of claim 1, comprising: a plurality of unidirectional stream links including: unidirectional stream links coupled between processing elements and stream switches of respective hardware accelerator islands of the plurality of hardware accelerator islands; and unidirectional stream links coupled between a SSCONN and stream switches of a pair of stream switches coupled together by the SSCONN.

3. The device of claim 1, wherein at least one SSCONN includes an asynchronous first-in-first-out (FIFO) buffer, which, in operation, synchronizes data streamed via the at least one SSCONN between stream switches of hardware accelerator islands operating with different clocks.

4. The device of claim 1, wherein at least one of the SSCONNs includes virtual channel control circuitry, which, in operation, adds or removes virtual channel support to data streams streamed via the at least one SSCONN between stream switches providing different levels of virtual channel support.

5. The device of claim 1, wherein each of the plurality of hardware accelerator islands is coupled to each of the other hardware accelerator islands of the plurality of hardware accelerator islands via one or more SSCONNs.

6. The device of claim 1, wherein one of the plurality of hardware accelerator islands has a different number of processing elements than another of the plurality of hardware accelerator islands.

7. The device of claim 1, wherein, in operation: a first set of hardware accelerator islands of the plurality of hardware accelerator islands executes one or more tasks of a first neural network in parallel with execution of one or more tasks of a second neural network by a second set of hardware accelerator islands of the plurality of hardware accelerator islands.

8. The device of claim 1, wherein, in operation, multiple hardware accelerator islands of the plurality of accelerator islands process batches of a neural network task in parallel.

9. The device of claim 1, wherein, the interconnection mesh, in operation, streams data in parallel between a processing element of a hardware accelerator island of the plurality of hardware accelerator islands and multiple other processing elements coupled to the interconnection mesh.

10. The device of claim 1, comprising power control circuitry, which, in operation, applies real-time power tuning to individual hardware accelerator islands of the plurality of hardware accelerator islands.

11. The device of claim 10, wherein, in operation, the real-time power tuning is applied based on: detection of events by processes executing on hardware accelerator islands of the plurality of hardware accelerator islands; types of processes being executed by respective hardware accelerator islands of the plurality of hardware accelerator islands; operating environment conditions; or various combinations thereof.

12. The device of claim 10, wherein, in operation, the applying real-time power tuning includes: independently controlling operating frequencies of hardware accelerator islands of the plurality of hardware accelerator islands; independently controlling supply voltages of hardware accelerator islands of the plurality of hardware accelerator islands; independently controlling body-bias voltages of hardware accelerator islands of the plurality of hardware accelerator islands; independently controlling operational states of hardware accelerator islands of the plurality of hardware accelerator islands; or various combinations thereof.

13. The device of claim 1, wherein, in operation, the plurality of hardware accelerator islands are organized into multiple security regions on a hardware accelerator island basis.

14. The device of claim 13, wherein the organizing of the plurality of hardware accelerator islands into multiple security regions is based on configuration bus ID associated with respective hardware accelerator islands of the plurality of hardware accelerator islands.

15. The device of claim 1, wherein each of the hardware accelerator islands of the plurality of hardware accelerator islands includes a bus interface, which, in operation, couples the hardware accelerator island to a host system bus.

16. The device of claim 15, wherein, the bus interface of a first hardware accelerator island of the plurality of hardware accelerator islands, in operation, couples the first hardware accelerator island to a first host system bus; and the bus interface of a second hardware accelerator island of the plurality of hardware accelerator islands, in operation, couples the second hardware accelerator island to a second host system bus.

17. The device of claim 16, wherein the interconnection mesh, in operation, streams data from a processing element of the first hardware accelerator island to a processing element of the second hardware accelerator island.

18. The device of claim 1, wherein the run-time reconfigurable interconnection mesh, in operation, employs synchronizing mechanisms and back-pressure signaling.

19. A system, comprising: a memory; a host processor coupled to the memory; a host system bus; a plurality of hardware accelerator islands coupled to the host system bus, each including: a plurality of processing elements; a plurality of streaming engines; and a stream switch coupled to the plurality of processing elements and to the plurality of streaming engines, wherein the stream switch, in operation, streams data between the plurality of processing elements of the hardware accelerator island, and between the plurality of streaming engines of the hardware accelerator island and the plurality of processing elements of the hardware accelerator island; and a plurality of unidirectional stream switch connections (SSCONNs) coupled between pairs of stream switches of the plurality of hardware accelerator islands, wherein, the stream switches of the plurality of hardware accelerator islands and the SSCONNs form a run-time reconfigurable
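Claim 1 turns on stream link conversion circuitry inside an SSCONN, and claim 18 adds back-pressure signaling. The Python sketch below is a hypothetical software model, not the patented circuit: the class name SSConnSketch, its parameters, the FIFO depth, and the low-bits-first packing order are all assumptions for illustration. It models only data-width conversion (not channel configurations, the clock-domain crossing of claim 3, or the virtual channels of claim 4), and it shows a push being refused when the FIFO is full, which is how back-pressure stalls the upstream stream switch in this model.

```python
from collections import deque


class SSConnSketch:
    """Hypothetical SSCONN model: bounded FIFO plus data-width conversion.

    Assumptions (not from the patent): widths are in bits, the wider width is
    an integer multiple of the narrower one, and narrow beats go low bits first.
    Clock-domain crossing and channel reconfiguration are not modelled.
    """

    def __init__(self, src_width, dst_width, depth):
        wide, narrow = max(src_width, dst_width), min(src_width, dst_width)
        if wide % narrow:
            raise ValueError("widths must be integer multiples of each other")
        self.src_width = src_width
        self.dst_width = dst_width
        self.ratio = wide // narrow
        self.depth = depth        # FIFO capacity in destination-width words
        self.fifo = deque()       # converted words awaiting the destination
        self._pending = []        # narrow input beats not yet packed

    def ready(self):
        """Back-pressure: True if one more source word can be accepted."""
        # Wide-to-narrow adds `ratio` words per push; otherwise at most one.
        needed = self.ratio if self.src_width > self.dst_width else 1
        return len(self.fifo) + needed <= self.depth

    def push(self, word):
        """Accept one source-width word; return False (stall) if not ready."""
        if not self.ready():
            return False
        word &= (1 << self.src_width) - 1
        if self.src_width > self.dst_width:    # split into narrower words
            mask = (1 << self.dst_width) - 1
            for i in range(self.ratio):
                self.fifo.append((word >> (i * self.dst_width)) & mask)
        elif self.src_width < self.dst_width:  # pack narrower beats
            self._pending.append(word)
            if len(self._pending) == self.ratio:
                packed = 0
                for i, beat in enumerate(self._pending):
                    packed |= beat << (i * self.src_width)
                self.fifo.append(packed)
                self._pending.clear()
        else:                                  # equal widths: pass through
            self.fifo.append(word)
        return True

    def pop(self):
        """Deliver one destination-width word, or None if the FIFO is empty."""
        return self.fifo.popleft() if self.fifo else None


down = SSConnSketch(src_width=64, dst_width=32, depth=4)
down.push(0x11112222_33334444)
print(hex(down.pop()), hex(down.pop()))  # 0x33334444 0x11112222

up = SSConnSketch(src_width=16, dst_width=32, depth=1)
up.push(0xBEEF)
up.push(0xDEAD)
print(up.push(0x1234))  # False: FIFO full, so the source must stall
print(hex(up.pop()))    # 0xdeadbeef
```

Making ready() a separate method mirrors the usual valid/ready handshake idea: the sender checks the signal before committing a word, so no data is dropped when the receiving side is slower.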

Assignees

STMicroelectronics International N.V.

Inventors

Classifications

  • G06F13/4022 · using switching circuits, e.g. switching matrix, connection or expansion network (G06F13/4009 takes precedence) · CPC title

  • G06F15/17 · Primary · using an input/output type connection, e.g. channel, I/O port · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12591535B2 cover?
A device includes a plurality of hardware accelerator islands. The accelerator islands have a plurality of processing elements, a plurality of streaming engines, and a stream switch coupled to the plurality of processing elements and to the plurality of streaming engines. The stream switch streams data between the plurality of processing elements of the accelerator island, and between the plura…
Who is the assignee on this patent?
STMicroelectronics International N.V.
What technology area does this patent fall under?
Primary CPC classification G06F13/4022. Mapped technology areas include Physics.
When was this patent published?
Published on March 31, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).