Hardware accelerator with configurable tensor operation pipeline

US2026017228A1 · US · A1

Patent metadata
Publication number: US-2026017228-A1
Application number: US-202418769220-A
Country: US
Kind code: A1
Filing date: Jul 10, 2024
Priority date: Jul 10, 2024
Publication date: Jan 15, 2026
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A hardware accelerator is disclosed that can flexibly be configured to support differing data types and differing operation flows. The hardware accelerator includes a plurality of fixed tensor operation logic units, tensor operation pipeline logic configured to receive from the processor a pipeline command including a software-defined tensor operation pipeline definition defining a plurality of tensor operation stages in a tensor operation pipeline and associated predetermined tensor operations to be performed at each of the defined tensor operation stages. The hardware accelerator is further configured to receive tensor data to be computed by the tensor operation pipeline, and implement the tensor operation pipeline to perform the tensor operations in each of the tensor operation stages on the tensor data, to thereby produce a tensor operation pipeline result for the tensor data, and output the tensor operation pipeline result to the processor.
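
The behavior summarized in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the disclosed hardware: the unit names, the `HARDWARE_ORDER` tuple, and the `run_pipeline` function are hypothetical, and the particular stage order is invented for illustration (the publication says only that the order is fixed by the on-chip layout). A pipeline definition here is a mapping from unit name to a configured operation; a unit missing from the mapping is treated as turned off and bypassed.

```python
from typing import Callable, Dict, List

Tensor = List[int]
Unit = Callable[[Tensor], Tensor]

# ASSUMPTION: an illustrative fixed stage order standing in for the order
# that the on-chip layout would impose. The real order is not given here.
HARDWARE_ORDER = ("lookup", "add", "subtract")


def lookup_unit(table: Dict[int, int]) -> Unit:
    """Model a lookup table unit programmed with `table`."""
    return lambda xs: [table[x] for x in xs]


def add_unit(constant: int) -> Unit:
    """Model an add unit that adds `constant` to every element."""
    return lambda xs: [x + constant for x in xs]


def subtract_unit(constant: int) -> Unit:
    """Model a subtract unit that subtracts `constant` from every element."""
    return lambda xs: [x - constant for x in xs]


def run_pipeline(definition: Dict[str, Unit], data: Tensor) -> Tensor:
    """Apply the enabled units to `data` in the fixed hardware order."""
    unknown = set(definition) - set(HARDWARE_ORDER)
    if unknown:
        raise ValueError(f"no such unit(s): {sorted(unknown)}")
    result = list(data)
    for name in HARDWARE_ORDER:
        unit = definition.get(name)  # None means the unit is turned off
        if unit is not None:
            result = unit(result)
    return result


# Software-defined pipeline: lookup and add are on, subtract is off.
definition = {
    "add": add_unit(1),
    "lookup": lookup_unit({0: 10, 1: 20, 2: 30}),
}
print(run_pipeline(definition, [2, 0, 1]))  # [31, 11, 21]
```

Although the definition lists `add` before `lookup`, the lookup runs first, because in this model the software selects and configures units but does not reorder them.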

First claim

Opening claim text (preview).

1. A hardware accelerator for use with a processor of a computing system, comprising: a configurable pipeline processing element array including a plurality of processing elements, each processing element including a configurable plurality of fixed tensor operation logic units, the configurable pipeline processing element array being configured to receive a tensor operation pipeline definition and tensor data from the processor, wherein each processing element is configured to process the tensor data by implementing a configurable tensor operation pipeline including one or more of the fixed tensor operation logic units according to the tensor operation pipeline definition; the tensor operation pipeline definition defines a plurality of stages, each stage specifying a corresponding one of the configurable plurality of fixed tensor operation logic units; the stages are in a predetermined order defined by an on-chip hardware layout, and individual fixed tensor operation logic units can be turned on or off by command; at least one of the stages includes a lookup table logic unit as the fixed tensor operation logic unit for that stage; and the configurable pipeline processing element array is configured to output a tensor operation pipeline result based on the processing of the tensor data by each tensor operation pipeline in each processing element.

2. The hardware accelerator of claim 1, wherein the configurable plurality of fixed tensor operation logic units are selected from the group consisting of a split logic unit, add logic unit, subtract logic unit, select logic unit, concatenate logic unit, and the lookup table logic unit.

3. (canceled)

4. The hardware accelerator of claim 1, wherein the tensor data is encoded with a distribution encoding, and the tensor operation pipeline decodes the distribution encoding.

5. The hardware accelerator of claim 1, wherein the tensor operation pipeline performs block scaling on the tensor data.

6. The hardware accelerator of claim 1, wherein the tensor operation logic units that form the tensor operation pipeline are separate from a tensor arithmetic unit of the hardware accelerator.

7. The hardware accelerator of claim 6, wherein the tensor operation pipeline result is passed to the tensor arithmetic unit for further on-chip processing prior to outputting the tensor operation pipeline result.

8. A hardware accelerator for use with a processor of a computing system, comprising: a configurable plurality of fixed tensor operation logic units configured to perform a plurality of predetermined types of tensor operations; and tensor operation pipeline logic configured to: receive from the processor a pipeline command including a software-defined tensor operation pipeline definition defining a plurality of tensor operation stages in a tensor operation pipeline and associated predetermined tensor operations to be performed at each of the defined tensor operation stages, wherein: each of the tensor operation stages specifies a corresponding one of the configurable plurality of fixed tensor operation logic units; the tensor operation stages are in a predetermined order defined by an on-chip hardware layout, and individual fixed tensor operation logic units can be turned on or off by command; and at least one of the tensor operation stages includes a lookup table logic unit as the fixed tensor operation logic unit for that tensor operation stage; receive tensor data to be computed by the tensor operation pipeline, and implement the tensor operation pipeline to perform the tensor operations in each of the tensor operation stages on the tensor data, to thereby produce a tensor operation pipeline result for the tensor data, and output the tensor operation pipeline result to the processor.

9. The hardware accelerator of claim 8, wherein the tensor data includes numerical parameters of a neural network.

10. The hardware accelerator of claim 9, wherein the numerical parameters of the neural network are floating point values including one or more mantissa bits and one or more exponent bits.

11. The hardware accelerator of claim 8, wherein the predetermined types of tensor operations are selected from the group consisting of split, add, subtract, select, concatenate, and perform a lookup to a lookup table.

12. The hardware accelerator of claim 11, wherein the lookup table is programmable to implement a user-defined function.

13. The hardware accelerator of claim 12, wherein the user-defined function is block scaling or decoding a distribution encoding of the tensor data.

14. The hardware accelerator of claim 11, wherein the tensor data includes floating point values and the split function splits floating point values into constituent mantissa and exponent portions.

15. A computing system comprising: a processor; and a hardware accelerator communicatively coupled to the processor, the hardware accelerator including a configurable plurality of fixed tensor operation units configured to perform a plurality of predetermined types of tensor operations, wherein the hardware accelerator is configured to receive from the processor a pipeline command including a software-defined tensor operation pipeline definition defining a plurality of tensor operation stages in a tensor operation pipeline and associated predetermined tensor operations to be performed at each of the defined tensor operation stages, each of the tensor operation stages specifies a corresponding one of the configurable plurality of fixed tensor operation logic units, the tensor operation stages are in a predetermined order defined by an on-chip hardware layout, and individual fixed tensor operation logic units can be turned on or off by command, at least one of the tensor operation stages includes a lookup table logic unit as the fixed tensor operation logic unit for that tensor operation stage, the hardware accelerator is further configured to receive tensor data to be computed by the tensor operation pipeline, and in response to receiving the tensor pipeline command and the tensor data, the hardware accelerator is configured to implement the tensor operation pipeline to perform the tensor operations in each of the tensor operation stages on the tensor data, to thereby produce a tensor operation pipeline result for the tensor data, and output the tensor operation pipeline result to the processor.

16. The computing system of claim 15, wherein the tensor data includes numerical parameters of a neural network.

17. The computing system of claim 16, wherein the numerical parameters of the neural network are floating point values including one or more mantissa bits and one or more exponent bits.

18. The computing system of claim 15, wherein the tensor operations are selected from the group consisting of split, add, subtract, select, concatenate, and perform a lookup to a lookup table.

19. The computing system of claim 18, wherein the tensor operations include the lookup table, and the lookup table is programmable to implement a user-defined function.

20. The computing system of claim 18, wherein the tensor data includes floating point values and the split function splits floating point values into constituent mantissa and exponent portions.
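
Several claims refer to splitting floating point values into mantissa and exponent portions and to block scaling. The sketch below shows one way a split, add, and concatenate sequence could scale a block of values by a shared power of two. It is an illustration under assumptions rather than the claimed implementation: the IEEE 754 binary32 format, the function names, and the idea of realizing block scaling as an exponent addition are all choices made for this example and are not stated in the claim text.

```python
import struct
from typing import List, Tuple


def split_float32(value: float) -> Tuple[int, int, int]:
    """Split a binary32 value into (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def concatenate_float32(sign: int, exponent: int, mantissa: int) -> float:
    """Reassemble the three bit fields into a binary32 value."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def block_scale(block: List[float], shared_exponent: int) -> List[float]:
    """Multiply each value by 2**shared_exponent via split, add, concatenate.

    ASSUMPTION: block scaling is modeled as adding one shared exponent to
    every element's exponent field. Zero, subnormal, infinite, and NaN
    inputs are passed through unchanged to keep the sketch short.
    """
    scaled = []
    for value in block:
        sign, exponent, mantissa = split_float32(value)
        if exponent in (0, 255):
            scaled.append(value)
            continue
        new_exponent = exponent + shared_exponent  # the "add" stage
        if not 0 < new_exponent < 255:
            raise OverflowError("scaled exponent leaves the normal range")
        scaled.append(concatenate_float32(sign, new_exponent, mantissa))
    return scaled


print(split_float32(-2.5))                # (1, 128, 2097152)
print(block_scale([1.0, -2.5, 0.0], 3))   # [8.0, -20.0, 0.0]
```

Only the exponent field changes, so each mantissa passes through untouched. That is why a split stage followed by an integer add and a concatenate stage is enough for power-of-two scaling in this model.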

Assignees

Inventors

Classifications

  • Dataflow computers · CPC title

  • Reconfigurable logic embedded in CPU, e.g. reconfigurable unit · CPC title

  • Reconfigurable logic implemented as a co-processor (instruction execution using a coprocessor G06F9/3877) · CPC title

  • Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS · CPC title

  • with reconfigurable architecture · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026017228A1 cover?
A hardware accelerator is disclosed that can flexibly be configured to support differing data types and differing operation flows. The hardware accelerator includes a plurality of fixed tensor operation logic units, tensor operation pipeline logic configured to receive from the processor a pipeline command including a software-defined tensor operation pipeline definition defining a plurality of…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F15/80. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 15, 2026 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).