Hybrid hardware accelerator and programmable array architecture

US12314217B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12314217-B2
Application number: US-202117560637-A
Country: US
Kind code: B2
Filing date: Dec 23, 2021
Priority date: Dec 23, 2021
Publication date: May 27, 2025
Grant date: May 27, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed for the use of a hybrid architecture that combines a programmable processing array and a hardware accelerator. The hybrid architecture dedicates the most computationally intensive blocks to the hardware accelerator, while maintaining flexibility for additional computations to be performed by the programmable processing array. An interface is also described for coupling the processing array to the hardware accelerator, which achieves a division of functionality and connects the programmable processing array components to the hardware accelerator components without sacrificing flexibility. This results in a balance between power/area and flexibility.
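The division of work described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented implementation: the function names (`accelerator_terms`, `hybrid_process`), the odd-order polynomial form, and the coefficient values are all hypothetical and chosen for illustration. The fixed function stands in for the hardware accelerator, and the caller-supplied `extra_term` stands in for the flexible computation left to the programmable processing array.

```python
from typing import Callable, List, Sequence


def accelerator_terms(samples: Sequence[complex],
                      coeffs: Sequence[complex]) -> List[complex]:
    """Hypothetical fixed-function block.

    For each sample x, sums the non-linear terms c_k * x * |x|^(2k).
    The polynomial form is an assumption made for this sketch.
    """
    out = []
    for x in samples:
        mag2 = abs(x) ** 2
        out.append(sum(c * x * mag2 ** k for k, c in enumerate(coeffs)))
    return out


def hybrid_process(samples: Sequence[complex],
                   coeffs: Sequence[complex],
                   extra_term: Callable[[complex], complex]) -> List[complex]:
    """Combine the fixed-function result with a programmable extra term."""
    hw = accelerator_terms(samples, coeffs)    # models the hardware accelerator
    sw = [extra_term(x) for x in samples]      # models the programmable array
    return [h + s for h, s in zip(hw, sw)]     # per-sample summation


if __name__ == "__main__":
    result = hybrid_process([1 + 0j, 2 + 0j], [1.0, 0.5], lambda x: 0.1 * x)
    print(result)
```

Changing `extra_term` changes the flexible part of the computation without touching the fixed block, which is the trade-off between power/area and flexibility that the abstract describes.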

First claim

Opening claim text (preview).

What is claimed is:
1. A system on a chip (SoC), comprising: an array of processing elements, each one of the processing elements being configured to perform processing operations on an array of data samples; a hardware accelerator; and a data interface comprising a buffer, the data interface being coupled to the array of processing elements and to the hardware accelerator, wherein the data interface is configured to transfer the array of data samples from the array of processing elements to the hardware accelerator by storing the array of data samples at an address location in the buffer based upon a respective one of the processing elements from which the array of data samples was received, and wherein the hardware accelerator is configured to compute, based upon the transferred array of data samples, a set of computed terms in accordance with a predetermined processing function to generate processed data samples including the set of computed terms.
2. The SoC of claim 1, wherein the array of processing elements comprises a vector processor, and wherein the processing elements comprise execution units of the vector processor.
3. The SoC of claim 1, wherein the hardware accelerator is configured to compute the set of computed terms in accordance with the predetermined processing function by summing non-linear terms computed based upon the transferred array of data samples.
4. The SoC of claim 3, wherein the hardware accelerator is configured to compute the set of computed terms in accordance with the predetermined processing function by summing the non-linear terms, which are evaluated in accordance with a polynomial function.
5. The SoC of claim 1, wherein the hardware accelerator comprises a set of lookup tables (LUTs) that contain data entries corresponding to the set of computed terms, and wherein hardware accelerator is configured to compute the set of computed terms in accordance with the predetermined processing function by correlating the array of data samples transferred via the data interface to the LUT data entries.
6. The SoC of claim 1, wherein: one of the processing elements in the array of processing elements is configured to perform processing operations on the array of data samples to generate an array of processed data samples including a set of additional computed terms, the data interface is configured to synchronize the set of additional computed terms with the set of computed terms, and the hardware accelerator further comprises summation circuitry configured to add the set of additional computed terms with the set of computed terms such that the processed data samples include the set of computed terms and the set of additional computed terms.
7. The SoC of claim 1, wherein: the data interface is configured to transfer the processed data samples including the set of computed terms to one of the processing elements as a first array of data samples, the one of the processing elements in the array of processing elements is configured to perform processing operations on the first array of data samples to generate a second array of data samples including a set of modified computed terms, and the hardware accelerator is configured to output further processed data samples including the set of modified computed terms based upon the second array of data samples.
8. The SoC of claim 1, wherein the predetermined processing function comprises a digital pre-distortion (DPD) function, and wherein the hardware accelerator is configured to compute, as the terms in accordance with the DPD function, DPD coefficients for a wireless data transmission.
9. The SoC of claim 1, wherein the data interface further comprises routing circuitry, and wherein the routing circuitry is configured to transfer the array of data samples from the array of processing elements to the hardware accelerator by writing the array of data samples to a predetermined range of addresses in the buffer based upon the respective one of the array of processing elements from which the array of data samples were received.
10. The SoC of claim 1, wherein: the hardware accelerator is a first hardware accelerator from among a plurality of hardware accelerators comprising a second hardware accelerator, the first hardware accelerator being associated with a first port, and the second hardware accelerator being associated with a second port, each of the first and the second hardware accelerator is configured to compute a respective set of computed terms in accordance with a predetermined processing function based upon a respective array of data samples transferred via the data interface, and the processed data samples including the set of computed terms represent a summation of each respective set of computed terms computed via the first and the second hardware accelerator.
11. A wireless device, comprising: an array of processing elements, each one of the processing elements being configured to perform processing operations on an array of data samples; a hardware accelerator; and a data interface comprising a buffer, the data interface being coupled to the array of processing elements and to the hardware accelerator, wherein the data interface is configured to transfer the array of data samples from the array of processing elements to the hardware accelerator by storing the array of data samples at an address location in the buffer based upon a respective one of the processing elements from which the array of data samples was received, and wherein the hardware accelerator is configured to compute, based upon the transferred array of data samples, a set of computed terms in accordance with a predetermined processing function to generate processed data samples including the set of computed terms.
12. The wireless device of claim 11, wherein the array of processing elements comprises a vector processor, and wherein the processing elements comprise execution units of the vector processor.
13. The wireless device of claim 11, wherein the hardware accelerator is configured to compute the set of computed terms in accordance with the predetermined processing function by summing non-linear terms computed based upon the transferred array of data samples.
14. The wireless device of claim 13, wherein the hardware accelerator is configured to compute the set of computed terms in accordance with the predetermined processing function by summing the non-linear terms, which are evaluated in accordance with a polynomial function.
15. The wireless device of claim 11, wherein the hardware accelerator comprises a set of lookup tables (LUTs) that contain data entries corresponding to the set of computed terms, and wherein hardware accelerator is configured to compute the set of computed terms in accordance with the predetermined processing function by correlating the array of data samples transferred via the data interface to the LUT data entries.
16. The wireless device of claim 11, wherein: one of the processing elements in the array of processing elements is configured to perform processing operations on the array of data samples to generate an array of processed data samples including a set of additional computed terms, the data interface is configured to synchronize the set of additional computed terms with the set of computed terms, and the hardware accelerator further comprises summation circuitry configured to add the set of additional computed terms with the set of computed terms such that the processed data samples include the set of computed terms and the set of additional computed terms.
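Two mechanisms recited in the claims lend themselves to a short software model: storing samples at a buffer address chosen by the originating processing element (claims 1 and 9), and computing terms by looking samples up in a table (claim 5). The following is a minimal sketch, assuming a fixed-size address range per processing element and a magnitude-indexed lookup table. The class `DataInterface`, the function `lut_accelerate`, the `slot_size` and `full_scale` parameters, and the indexing scheme are all hypothetical; the claims do not specify them.

```python
from typing import List, Sequence


class DataInterface:
    """Hypothetical model of a buffer with one address range per processing element."""

    def __init__(self, num_pes: int, slot_size: int) -> None:
        self.num_pes = num_pes
        self.slot_size = slot_size
        self.buffer: List[float] = [0.0] * (num_pes * slot_size)

    def base_address(self, pe_id: int) -> int:
        # The address depends only on which processing element sent the data.
        if not 0 <= pe_id < self.num_pes:
            raise ValueError(f"unknown processing element {pe_id}")
        return pe_id * self.slot_size

    def write_from_pe(self, pe_id: int, samples: Sequence[float]) -> None:
        if len(samples) > self.slot_size:
            raise ValueError("sample array does not fit in the address range")
        base = self.base_address(pe_id)
        self.buffer[base:base + len(samples)] = list(samples)

    def read_slot(self, pe_id: int) -> List[float]:
        base = self.base_address(pe_id)
        return self.buffer[base:base + self.slot_size]


def lut_accelerate(samples: Sequence[float], lut: Sequence[float],
                   full_scale: float = 1.0) -> List[float]:
    """Map each sample magnitude to a table entry (assumed indexing scheme)."""
    last = len(lut) - 1
    return [lut[min(int(abs(x) / full_scale * len(lut)), last)] for x in samples]


if __name__ == "__main__":
    iface = DataInterface(num_pes=4, slot_size=4)
    iface.write_from_pe(2, [0.1, 0.6, 2.0])
    print(lut_accelerate(iface.read_slot(2), [0.0, 10.0, 20.0, 30.0]))
```

Because the address range is derived from the source element alone, the accelerator side can locate each element's data without any extra handshake in this model; a real design would also need the synchronization that claim 6 mentions.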

Assignees

Intel Corp

Inventors

Classifications

  • for solving equations {, e.g. nonlinear equations, general mathematical optimization problems (optimization specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title

  • System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package · CPC title

  • Vector processors · CPC title

  • with reconfigurable architecture · CPC title

  • Array of vector units · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12314217B2 cover?
Techniques are disclosed for the use of a hybrid architecture that combines a programmable processing array and a hardware accelerator. The hybrid architecture dedicates the most computationally intensive blocks to the hardware accelerator, while maintaining flexibility for additional computations to be performed by the programmable processing array. An interface is also described for coupling …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F15/8092. Mapped technology areas include Physics.
When was this patent published?
Publication date May 27, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).