US2026017228A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2026017228-A1 |
| Application number | US-202418769220-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jul 10, 2024 |
| Priority date | Jul 10, 2024 |
| Publication date | Jan 15, 2026 |
| Grant date | — |
Official abstract text for this publication.
A hardware accelerator is disclosed that can flexibly be configured to support differing data types and differing operation flows. The hardware accelerator includes a plurality of fixed tensor operation logic units, tensor operation pipeline logic configured to receive from the processor a pipeline command including a software-defined tensor operation pipeline definition defining a plurality of tensor operation stages in a tensor operation pipeline and associated predetermined tensor operations to be performed at each of the defined tensor operation stages. The hardware accelerator is further configured to receive tensor data to be computed by the tensor operation pipeline, and implement the tensor operation pipeline to perform the tensor operations in each of the tensor operation stages on the tensor data, to thereby produce a tensor operation pipeline result for the tensor data, and output the tensor operation pipeline result to the processor.
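The abstract's idea of a software-defined pipeline running on fixed logic units can be illustrated with a small software model. This is only a sketch under stated assumptions, not the disclosed hardware: the unit names follow the operations listed in the claims, but `HARDWARE_ORDER`, `UNITS`, `run_pipeline`, and the per-stage parameter dictionaries are hypothetical, and the actual on-chip order of units is not given in the source. The model shows one reading of the claim language: the definition selects which units are on, while the order of execution is fixed by the layout.

```python
from typing import Callable, Dict, List

Tensor = List[int]

# Hypothetical fixed-function units. Each takes the tensor and its stage parameters.
UNITS: Dict[str, Callable[[Tensor, dict], Tensor]] = {
    "lookup": lambda t, p: [p["table"][x] for x in t],
    "add": lambda t, p: [x + p["operand"] for x in t],
    "subtract": lambda t, p: [x - p["operand"] for x in t],
}

# Assumed on-chip order; the real layout is not stated in the source.
HARDWARE_ORDER = ["lookup", "add", "subtract"]


def run_pipeline(definition: Dict[str, dict], tensor: Tensor) -> Tensor:
    """Run enabled units in hardware order; units missing from the definition are off."""
    unknown = set(definition) - set(HARDWARE_ORDER)
    if unknown:
        raise ValueError(f"no such logic unit(s): {sorted(unknown)}")
    for name in HARDWARE_ORDER:
        if name in definition:  # unit turned on by the pipeline command
            tensor = UNITS[name](tensor, definition[name])
    return tensor


# The definition lists "add" first, but the lookup unit still runs first
# because the stage order comes from the (assumed) layout, not the command.
definition = {"add": {"operand": 1}, "lookup": {"table": [0, 10, 20, 30]}}
result = run_pipeline(definition, [0, 1, 2, 3])
print(result)
```

In this model the "subtract" unit is simply skipped because the definition does not enable it, which mirrors the claim language that individual units can be turned on or off by command.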
Opening claim text (preview).
1. A hardware accelerator for use with a processor of a computing system, comprising: a configurable pipeline processing element array including a plurality of processing elements, each processing element including a configurable plurality of fixed tensor operation logic units, the configurable pipeline processing element array being configured to receive a tensor operation pipeline definition and tensor data from the processor, wherein each processing element is configured to process the tensor data by implementing a configurable tensor operation pipeline including one or more of the fixed tensor operation logic units according to the tensor operation pipeline definition; the tensor operation pipeline definition defines a plurality of stages, each stage specifying a corresponding one of the configurable plurality of fixed tensor operation logic units; the stages are in a predetermined order defined by an on-chip hardware layout, and individual fixed tensor operation logic units can be turned on or off by command; at least one of the stages includes a lookup table logic unit as the fixed tensor operation logic unit for that stage; and the configurable pipeline processing element array is configured to output a tensor operation pipeline result based on the processing of the tensor data by each tensor operation pipeline in each processing element.
2. The hardware accelerator of claim 1, wherein the configurable plurality of fixed tensor operation logic units are selected from the group consisting of a split logic unit, add logic unit, subtract logic unit, select logic unit, concatenate logic unit, and the lookup table logic unit.
3. (canceled)
4. The hardware accelerator of claim 1, wherein the tensor data is encoded with a distribution encoding, and the tensor operation pipeline decodes the distribution encoding.
5. The hardware accelerator of claim 1, wherein the tensor operation pipeline performs block scaling on the tensor data.
6. The hardware accelerator of claim 1, wherein the tensor operation logic units that form the tensor operation pipeline are separate from a tensor arithmetic unit of the hardware accelerator.
7. The hardware accelerator of claim 6, wherein the tensor operation pipeline result is passed to the tensor arithmetic unit for further on-chip processing prior to outputting the tensor operation pipeline result.
8. A hardware accelerator for use with a processor of a computing system, comprising: a configurable plurality of fixed tensor operation logic units configured to perform a plurality of predetermined types of tensor operations; and tensor operation pipeline logic configured to: receive from the processor a pipeline command including a software-defined tensor operation pipeline definition defining a plurality of tensor operation stages in a tensor operation pipeline and associated predetermined tensor operations to be performed at each of the defined tensor operation stages, wherein: each of the tensor operation stages specifies a corresponding one of the configurable plurality of fixed tensor operation logic units; the tensor operation stages are in a predetermined order defined by an on-chip hardware layout, and individual fixed tensor operation logic units can be turned on or off by command; and at least one of the tensor operation stages includes a lookup table logic unit as the fixed tensor operation logic unit for that tensor operation stage; receive tensor data to be computed by the tensor operation pipeline, and implement the tensor operation pipeline to perform the tensor operations in each of the tensor operation stages on the tensor data, to thereby produce a tensor operation pipeline result for the tensor data, and output the tensor operation pipeline result to the processor.
9. The hardware accelerator of claim 8, wherein the tensor data includes numerical parameters of a neural network.
10. The hardware accelerator of claim 9, wherein the numerical parameters of the neural network are floating point values including one or more mantissa bits and one or more exponent bits.
11. The hardware accelerator of claim 8, wherein the predetermined types of tensor operations are selected from the group consisting of split, add, subtract, select, concatenate, and perform a lookup to a lookup table.
12. The hardware accelerator of claim 11, wherein the lookup table is programmable to implement a user-defined function.
13. The hardware accelerator of claim 12, wherein the user-defined function is block scaling or decoding a distribution encoding of the tensor data.
14. The hardware accelerator of claim 11, wherein the tensor data includes floating point values and the split function splits floating point values into constituent mantissa and exponent portions.
15. A computing system comprising: a processor; and a hardware accelerator communicatively coupled to the processor, the hardware accelerator including a configurable plurality of fixed tensor operation units configured to perform a plurality of predetermined types of tensor operations, wherein the hardware accelerator is configured to receive from the processor a pipeline command including a software-defined tensor operation pipeline definition defining a plurality of tensor operation stages in a tensor operation pipeline and associated predetermined tensor operations to be performed at each of the defined tensor operation stages, each of the tensor operation stages specifies a corresponding one of the configurable plurality of fixed tensor operation logic units, the tensor operation stages are in a predetermined order defined by an on-chip hardware layout, and individual fixed tensor operation logic units can be turned on or off by command, at least one of the tensor operation stages includes a lookup table logic unit as the fixed tensor operation logic unit for that tensor operation stage, the hardware accelerator is further configured to receive tensor data to be computed by the tensor operation pipeline, and in response to receiving the tensor pipeline command and the tensor data, the hardware accelerator is configured to implement the tensor operation pipeline to perform the tensor operations in each of the tensor operation stages on the tensor data, to thereby produce a tensor operation pipeline result for the tensor data, and output the tensor operation pipeline result to the processor.
16. The computing system of claim 15, wherein the tensor data includes numerical parameters of a neural network.
17. The computing system of claim 16, wherein the numerical parameters of the neural network are floating point values including one or more mantissa bits and one or more exponent bits.
18. The computing system of claim 15, wherein the tensor operations are selected from the group consisting of split, add, subtract, select, concatenate, and perform a lookup to a lookup table.
19. The computing system of claim 18, wherein the tensor operations include the lookup table, and the lookup table is programmable to implement a user-defined function.
20. The computing system of claim 18, wherein the tensor data includes floating point values and the split function splits floating point values into constituent mantissa and exponent portions.
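Claims 14 and 20 describe a split function that separates floating point values into mantissa and exponent portions, and the claims also list add and concatenate operations. The following sketch shows what such a split looks like for an IEEE 754 single-precision value, and how a split, add, concatenate sequence could scale a value by a power of two. The function names are hypothetical, the 32-bit format is an assumption (the claims do not fix a bit width), and the source does not say that block scaling is implemented this way.

```python
import struct
from typing import Tuple


def split_float32(value: float) -> Tuple[int, int, int]:
    """Split a float32 into its sign bit, 8-bit biased exponent, and 23-bit mantissa."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa


def concatenate_float32(sign: int, exponent: int, mantissa: int) -> float:
    """Reassemble the three fields into a float32 value."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def scale_by_power_of_two(value: float, shift: int) -> float:
    """Split, add `shift` to the exponent, then concatenate (normal numbers only)."""
    sign, exponent, mantissa = split_float32(value)
    new_exponent = exponent + shift
    # Assumption: zero, subnormal, infinity, and NaN inputs or results are out of scope.
    if not (1 <= exponent <= 254 and 1 <= new_exponent <= 254):
        raise ValueError("only normal float32 inputs and results are handled")
    return concatenate_float32(sign, new_exponent, mantissa)


fields = split_float32(1.5)
scaled = scale_by_power_of_two(1.5, 2)
print(fields, scaled)
```

Adding to the exponent field multiplies the value by a power of two without touching the mantissa, which is why a split stage followed by an add stage and a concatenate stage is enough for this kind of scaling in the sketch.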
Dataflow computers · CPC title
Reconfigurable logic embedded in CPU, e.g. reconfigurable unit · CPC title
Reconfigurable logic implemented as a co-processor (instruction execution using a coprocessor G06F9/3877) · CPC title
Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS · CPC title
with reconfigurable architecture · CPC title