Data selection network for a data processing engine in an integrated circuit

US11061673B1 · US · B1

Patent metadata
Field: Value
Publication number: US-11061673-B1
Application number: US-201815944393-A
Country: US
Kind code: B1
Filing date: Apr 3, 2018
Priority date: Apr 3, 2018
Publication date: Jul 13, 2021
Grant date: Jul 13, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An example core for data processing engine (DPE) includes a first register file configured to provide a first plurality of output lanes, a processor, coupled to the register file, including: a multiply-accumulate (MAC) circuit, and a first permute circuit coupled between the first register file and the MAC circuit. The first permute circuit is configured to generate a first vector by selecting a first set of output lanes from the first plurality of output lanes, and a second permute circuit coupled between the first register file and the MAC circuit. The second permute circuit is configured to generate a second vector by selecting a second set of output lanes from the first plurality of output lanes.
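The lane selection described in the abstract can be pictured with a small software model. This is only an illustrative sketch, not the patented circuit: the function name `permute`, the 8-lane register contents, and the selection indices are all hypothetical, and a lane is modeled as a plain integer.

```python
from typing import List, Sequence


def permute(register_lanes: Sequence[int], select: Sequence[int]) -> List[int]:
    """Build a vector by picking register output lanes by index.

    `select` may repeat or reorder indices, so the result need not be a
    contiguous slice of the register output (assumption for illustration).
    """
    return [register_lanes[i] for i in select]


# Hypothetical output lanes of the first register file.
register_lanes = [10, 11, 12, 13, 14, 15, 16, 17]

# Two permute circuits read the same register lanes with different selections.
first_vector = permute(register_lanes, [0, 1, 2, 3])
second_vector = permute(register_lanes, [7, 6, 5, 4])

print(first_vector)   # [10, 11, 12, 13]
print(second_vector)  # [17, 16, 15, 14]
```

Both vectors are drawn from the same set of register output lanes, which is the point the abstract makes about the first and second permute circuits.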

First claim

Opening claim text (preview).

What is claimed is:

1. A core for a data processing engine (DPE), comprising:
  a first register file configured to provide first register output lanes;
  a second register file configured to provide second register output lanes; and
  a processor, coupled to the first register file and the second register file, including:
    a multiply-accumulate (MAC) circuit;
    a first permute circuit coupled between the first register file and the MAC circuit, the first permute circuit coupled to each lane of the first register output lanes and not coupled to any lane of the second register output lanes, the first permute circuit being configured to generate a first vector by selecting, as first permute output lanes, a first set from the first register output lanes;
    a second permute circuit coupled between the first register file and the MAC circuit, the second permute circuit coupled to each lane of the first register output lanes and not coupled to any lane of the second register output lanes, the second permute circuit being configured to generate a second vector by selecting, as second permute output lanes, a second set from the first register output lanes;
    a third permute circuit coupled between the second register file and the MAC circuit, the third permute circuit coupled to each lane of the second register output lanes and not coupled to any lane of the first register output lanes, the third permute circuit being configured to generate a third vector by selecting, as third permute output lanes, a third set from the second register output lanes;
    a pre-adder circuit coupled between (i) both the first permute circuit and the second permute circuit and (ii) the MAC circuit, the pre-adder circuit being configured to process the first vector received from the first permute output lanes and the second vector received from the second permute output lanes, the pre-adder circuit having pre-adder output lanes coupled to the MAC circuit; and
    a special operation circuit coupled between (i) both the second permute circuit and the third permute circuit and (ii) the MAC circuit, the special operation circuit being configured to selectively output a unity value and a respective sign-extended version of the second vector received from the second permute output lanes and the third vector received from the third permute output lanes, the special operation circuit having special operation output lanes coupled to the MAC circuit.

2. The core of claim 1, wherein the MAC circuit comprises: a multiplier configured to process an output of the pre-adder circuit and an output of the special operation circuit; at least one post-adder configured to process an output of the multiplier; and an accumulator configured to process an output of the at least one post-adder.

3. The core of claim 2, wherein the at least one post-adder comprises a first post-adder coupled to a second post-adder.

4. The core of claim 2, further comprising: a first multiplexer configured to select among the first permute output lanes, the second permute output lanes, and the third permute output lanes; an upshift circuit coupled to an output of the first multiplexer; and a second multiplexer coupled to an output of the upshift circuit and to the output of the at least one post-adder; wherein the accumulator is further configured to process an output of the second multiplexer.

5. The core of claim 4, wherein the first multiplexer is further configured to select among an output of a third register file in addition to the first permute circuit output lanes, the second permute circuit output lanes, and the third permute circuit output lanes.

6. The core of claim 1, wherein the first register file and the second register file are portions of a single register file.

7. The core of claim 1, wherein the first permute circuit, the second permute circuit, and the third permute circuit are portions of a single permute circuit.

8. The core of claim 1, wherein a number of bits of the first permute output lanes is less than a number of bits of the first register output lanes, and a number of bits of the second permute output lanes is less than a number of bits of the first register output lanes.

9. An integrated circuit (IC), comprising: a data processing engine (DPE) array having a plurality of DPEs, each of the plurality of DPEs including a core, the core including:
  a first register file configured to provide first register output lanes;
  a second register file configured to provide second register output lanes; and
  a processor, coupled to the first register file and the second register file, including:
    a multiply-accumulate (MAC) circuit;
    a first permute circuit coupled between the first register file and the MAC circuit, the first permute circuit coupled to each lane of the first register output lanes and not coupled to any lane of the second register output lanes, the first permute circuit being configured to generate a first vector by selecting, as first permute output lanes, a first set from the first register output lanes;
    a second permute circuit coupled between the first register file and the MAC circuit, the second permute circuit coupled to each lane of the first register output lanes and not coupled to any lane of the second register output lanes, the second permute circuit being configured to generate a second vector by selecting, as second permute output lanes, a second set from the first register output lanes;
    a third permute circuit coupled between the second register file and the MAC circuit, the third permute circuit coupled to each lane of the second register output lanes and not coupled to any lane of the first register output lanes, the third permute circuit being configured to generate a third vector by selecting, as third permute output lanes, a third set from the second register output lanes;
    a pre-adder circuit coupled between (i) both the first permute circuit and the second permute circuit and (ii) the MAC circuit, the pre-adder circuit being configured to process the first vector received from the first permute output lanes and the second vector received from the second permute output lanes, the pre-adder circuit having pre-adder output lanes coupled to the MAC circuit; and
    a special operation circuit coupled between (i) both the second permute circuit and the third permute circuit and (ii) the MAC circuit, the special operation circuit being configured to selectively output a unity value and a respective sign-extended version of the second vector received from the second permute output lanes and the third vector received from the third permute output lanes, the special operation circuit having special operation output lanes coupled to the MAC circuit.

10. The IC of claim 9, wherein the MAC circuit comprises: a multiplier configured to process an output of the pre-adder circuit and an output of the special operation circuit; at least one post-adder configured to process an output of the multiplier; and an accumulator configured to process an output of the at least one post-adder.

11. The IC of claim 10, wherein the at least one post-adder comprises a first post-adder coupled to a second post-adder.

12. The IC of claim 10, further comprising: a first multiplexer configured to select among the first permute output lanes, the second permute output lanes, and the third permute output lanes; an upshift circuit coupled to an output of the first multiplexer; and a second multiplexer coupled to an output of the upshift circuit and to the output of the at least one post-adder; wherein the accumulator is further configured to process an output of the second multiplexer.

13. Th
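To make the data path in claims 1 and 2 easier to follow, here is a rough behavioral sketch of the chain permute, pre-adder, special operation, multiplier, post-adder, accumulator. It is a simplified model under several assumptions that the claim text does not state: the pre-adder is modeled as lane-wise addition, the special operation circuit is applied only to the third vector, the post-adder sums adjacent pairs of products, and all names, lane widths, and values are hypothetical.

```python
from typing import List, Sequence


def permute(lanes: Sequence[int], select: Sequence[int]) -> List[int]:
    """Select register output lanes by index to form a vector."""
    return [lanes[i] for i in select]


def pre_adder(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Assumed behavior: lane-wise sum of the first and second vectors."""
    return [x + y for x, y in zip(a, b)]


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def special_op(vec: Sequence[int], bits: int, unity: bool = False) -> List[int]:
    """Output either a unity value per lane or the sign-extended input lanes."""
    return [1] * len(vec) if unity else [sign_extend(v, bits) for v in vec]


def mac(acc: Sequence[int], x: Sequence[int], z: Sequence[int]) -> List[int]:
    """Multiplier, post-adder (assumed: sums adjacent product pairs), accumulator."""
    products = [a * b for a, b in zip(x, z)]
    sums = [products[i] + products[i + 1] for i in range(0, len(products), 2)]
    return [a + s for a, s in zip(acc, sums)]


# Hypothetical register contents; the second register file holds 8-bit lanes.
reg1 = [1, 2, 3, 4, 5, 6, 7, 8]
reg2 = [0xFF, 0x02, 0x7F, 0x80]

v1 = permute(reg1, [0, 1, 2, 3])   # first permute circuit
v2 = permute(reg1, [7, 6, 5, 4])   # second permute circuit
v3 = permute(reg2, [0, 1, 2, 3])   # third permute circuit

x = pre_adder(v1, v2)              # [9, 9, 9, 9]
z = special_op(v3, bits=8)         # [-1, 2, 127, -128]
result = mac([100, 200], x, z)     # [109, 191]

# With the unity output selected, the multiplier passes the pre-adder output through.
unity_result = mac([0, 0], x, special_op(v3, bits=8, unity=True))  # [18, 18]
```

Selecting the unity output turns the multiply step into a pass-through, so the same path can accumulate pre-adder results without a second operand.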

Assignees

Xilinx Inc

Inventors

Classifications

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • comprising data of variable length · CPC title

  • having multiple operands in a single register · CPC title

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title

  • G06F9/3001 · Primary

    Arithmetic instructions · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11061673B1 cover?
An example core for data processing engine (DPE) includes a first register file configured to provide a first plurality of output lanes, a processor, coupled to the register file, including: a multiply-accumulate (MAC) circuit, and a first permute circuit coupled between the first register file and the MAC circuit. The first permute circuit is configured to generate a first vector by selecting …
Who is the assignee on this patent?
Xilinx Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/30032. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 13, 2021 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).