Memory-network processor with programmable optimizations

US11900124B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11900124-B2
Application number: US-202318092712-A
Country: US
Kind code: B2
Filing date: Jan 3, 2023
Priority date: May 24, 2013
Publication date: Feb 13, 2024
Grant date: Feb 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various embodiments are disclosed of a multiprocessor system with processing elements optimized for high performance and low power dissipation and an associated method of programming the processing elements. Each processing element may comprise a fetch unit and a plurality of address generator units and a plurality of pipelined datapaths. The fetch unit may be configured to receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields. First and second address generator units may generate, based on different fields of the multi-part instruction, addresses from which to retrieve first and second data for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction. The execution units may perform operations using a single pipeline or multiple pipelines based on third and fourth fields of the multi-part instruction.
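
As an illustration of the abstract, the following is a minimal sketch of how one multi-part instruction could drive two address generator units and select two execution operations from separate fields. The abstract does not give an encoding, so everything here is an assumption made for illustration: the 32-bit word, the four 8-bit fields, the stride-based address generation, and the names decode, AddressGenerator, and issue are all hypothetical.

```python
from dataclasses import dataclass

# Assumed layout (not from the patent): four 8-bit fields in a 32-bit word.
FIELD_BITS = 8
FIELD_MASK = (1 << FIELD_BITS) - 1


@dataclass
class MultiPartInstruction:
    agu0: int         # first field: controls the first address generator unit
    agu1: int         # second field: controls the second address generator unit
    op_single: int    # third field: operation issued to a single pipeline
    op_parallel: int  # fourth field: operation issued to several pipelines


def decode(word: int) -> MultiPartInstruction:
    """Split one instruction word into its four hypothetical fields."""
    return MultiPartInstruction(
        agu0=(word >> 24) & FIELD_MASK,
        agu1=(word >> 16) & FIELD_MASK,
        op_single=(word >> 8) & FIELD_MASK,
        op_parallel=word & FIELD_MASK,
    )


class AddressGenerator:
    """Toy address generator: a pointer advanced by a per-instruction stride."""

    def __init__(self, base: int) -> None:
        self.pointer = base

    def generate(self, stride: int) -> int:
        address = self.pointer
        self.pointer += stride
        return address


def issue(word, agu_a, agu_b, memory, registers):
    """Decode the word and load one operand per address generator unit."""
    inst = decode(word)
    # Each unit is driven by its own field, so the two can move independently.
    registers["r0"] = memory[agu_a.generate(inst.agu0)]
    registers["r1"] = memory[agu_b.generate(inst.agu1)]
    return inst
```

In this sketch the two loads fill registers that the current or a later instruction could consume, which mirrors the "for the multi-part instruction or a subsequent multi-part instruction" wording. The actual field widths and addressing modes are defined in the full description, not here.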

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus, comprising: one or more execution units that include multiple pipelines; a fetch unit configured to fetch a multi-part instruction from an instruction stream, wherein the multi-part instruction includes a plurality of fields; and a plurality of address generator units; wherein a first address generator unit of the plurality of address generator units is configured to generate, based on a first field of the multi-part instruction, an address from which to retrieve first data to be stored in a first register for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction; wherein a second address generator unit of the plurality of address generator units is configured to generate, based on a second field of the multi-part instruction, an address from which to retrieve second data to be stored in a second register for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction; and wherein the one or more execution units are configured to: perform a first operation, operating on operands from a plurality of registers, using a first pipeline of the multiple pipelines, based on a third field of the plurality of fields; and perform a second operation, operating on operands from the plurality of registers, using at least two pipelines of the multiple pipelines in parallel, based on a fourth field of the plurality of fields, wherein performing the second operation using the at least two pipelines in parallel comprises generating, by a second pipeline of the at least two pipelines, a first set of partial products and generating, by a third pipeline of the at least two pipelines, a second set of partial products dependent upon at least one partial product of the first set of partial products.

2. The apparatus of claim 1, wherein the first and second address generator units are independently controlled.

3. The apparatus of claim 1, wherein the first and second operations are for different threads of execution.

4. The apparatus of claim 1, wherein an execution unit is configured to forward a result operand for storage in a register.

5. The apparatus of claim 1, wherein the first address generator unit is configured to generate an address at which to store a result operand generated by an execution unit.

6. The apparatus of claim 1, wherein generating the first set of partial products comprises compressing a first number of partial products to a second number of partial products, wherein the second number is less than the first number.

7. The apparatus of claim 1, wherein each pipeline of the multiple pipelines comprises a plurality of respective multiplier units and adder units.

8. A method, comprising: fetching, by a fetch unit of a processor, a multi-part instruction from an instruction stream, wherein the multi-part instruction includes a plurality of fields; generating, by a first address generator unit of the processor based on a first field of the multi-part instruction, an address from which to retrieve first data to be stored in a first register for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction; generating by a second address generator unit of the processor based on a second field of the multi-part instruction, an address from which to retrieve second data to be stored in a second register for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction; performing a first operation that operates on operands from a plurality of registers, by one or more execution units, using a first pipeline of a plurality of pipelines, based on a third field of the plurality of fields; and performing a second operation that operates on operands from a plurality of registers, by one or more execution units, using at least two pipelines of the plurality of pipelines in parallel, based on a fourth field of the plurality of fields, wherein performing the second operation using the at least two pipelines in parallel comprises generating, by a second pipeline of the at least two pipelines, a first set of partial products and generating, by a third pipeline of the at least two pipelines, a second set of partial products dependent upon at least one partial product of the first set of partial products.

9. The method of claim 8, wherein the first data and the second data are input operands for different threads of execution.

10. The method of claim 8, further comprising forwarding, by an execution unit, a result operand for storage in one of multiple datapath registers.

11. The method of claim 8, further comprising: independently performing, by two or more execution units of the processor, first and second respective math operations based on different fields of a multi-part instruction.

12. The method of claim 8, wherein generating the first set of partial products comprises compressing a first number of partial products to a second number of partial products, wherein the second number is less than the first number.

13. The method of claim 8, wherein each pipeline of the multiple pipelines comprises a plurality of respective multiplier units and adder units.

14. A non-transitory computer-readable medium having instructions stored thereon that are executable by a computing device to perform operations comprising: fetching a multi-part instruction from an instruction stream, wherein the multi-part instruction includes a plurality of fields; generating, based on a first field of the multi-part instruction, an address from which to retrieve first data to be stored in a first register for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction; generating, based on a second field of the multi-part instruction, an address from which to retrieve second data to be stored in a second register for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction; performing a first operation that operates on operands from a plurality of registers, based on a third field of the plurality of fields; and performing a second operation that operates on operands from a plurality of registers using at least two pipelines of a plurality of pipelines in parallel, based on a fourth field of the plurality of fields, wherein performing the second operation using the at least two pipelines in parallel comprises generating, by a first pipeline of the at least two pipelines, a first set of partial products and generating, by a second pipeline of the at least two pipelines, a second set of partial products dependent upon at least one partial product of the first set of partial products.

15. The non-transitory computer-readable medium of claim 14, wherein the multi-part instruction indicates independent control of first and second address generator units that generate the addresses from which to retrieve the first and second data.

16. The non-transitory computer-readable medium of claim 14, wherein the first and second operations are for different threads of execution.

17. The non-transitory computer-readable medium of claim 14, wherein the first and second operations further comprise generating, based on the multi-part instruction, an address at which to store a result operand generated by an execution unit.

18. The non-transitory computer-readable medium of claim 14, wherein generating the first set of partial products comprises compressing a first number of partial products to a second number of partial products, wherein
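
The partial-product language in the claims can be made concrete with a small arithmetic sketch. It shows one possible reading, not the patented circuit: a first stage generates partial products for the low multiplier bits and compresses them to fewer terms, and a second stage generates partial products for the high bits and folds in the compressed output of the first stage, so that the second set depends on the first. The shift-and-add scheme, the 3:2 carry-save compressor, the even split of the multiplier bits, and all function names are assumptions chosen for illustration.

```python
def partial_products(a, b, bit_range):
    """One shifted copy of `a` (or zero) per bit of `b` in bit_range."""
    return [(a << i) if (b >> i) & 1 else 0 for i in bit_range]


def carry_save(x, y, z):
    """3:2 compressor: three addends in, two out, with the same total."""
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carry


def compress(terms):
    """Reduce a list of addends to at most two using 3:2 compressors."""
    terms = list(terms)
    while len(terms) > 2:
        x, y, z = terms.pop(), terms.pop(), terms.pop()
        terms.extend(carry_save(x, y, z))
    return terms


def multiply_two_stages(a, b, width=8):
    """Multiply non-negative a by b (b < 2**width) in two dependent stages."""
    half = width // 2
    # Stage one (hypothetical first pipeline): low multiplier bits,
    # compressed from `half` partial products down to two.
    first_set = compress(partial_products(a, b, range(0, half)))
    # Stage two (hypothetical second pipeline): high multiplier bits plus
    # the compressed terms from stage one, so this set depends on the first.
    second_set = compress(
        partial_products(a, b, range(half, width)) + first_set
    )
    # A final carry-propagate addition resolves the two remaining terms.
    return sum(second_set)
```

With width=8, stage one compresses four partial products to two, which is the kind of reduction the compression claims describe (a first number of partial products reduced to a smaller second number). A hardware design would evaluate the compressor tree in parallel rather than in a loop.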

Assignees

Coherent Logix Inc

Inventors

Classifications

  • to perform operations for flow control · CPC title

  • for indirect branch instructions · CPC title

  • from multiple instruction streams, e.g. multistreaming · CPC title

  • G06F9/3895 (Primary)

    for complex operations, e.g. multidimensional or interleaved address generators, macros · CPC title

  • Power or thermal control instructions · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11900124B2 cover?
Various embodiments are disclosed of a multiprocessor system with processing elements optimized for high performance and low power dissipation and an associated method of programming the processing elements. Each processing element may comprise a fetch unit and a plurality of address generator units and a plurality of pipelined datapaths. The fetch unit may be configured to receive a multi-part…
Who is the assignee on this patent?
Coherent Logix Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/3895. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 13, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).