Shared local memory read merge and multicast return

US2020402198A1 · US · A1

Patent metadata
Field                Value
Publication number   US-2020402198-A1
Application number   US-201916449545-A
Country              US
Kind code            A1
Filing date          Jun 24, 2019
Priority date        Jun 24, 2019
Publication date     Dec 24, 2020
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A general-purpose graphics processor comprising a first set of compute units, a second set of compute units, and a memory coupled with the first set of compute units and the second set of compute units is described. The memory is configured to merge a first read request to an address block of the memory with a second read request to the address block of the memory to reduce a number of memory accesses to a memory bank associated with the address block. The graphics processor can also include a memory arbiter that can multicast merged reads to the compute units associated with the merged reads.
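
The read merge described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design disclosed in the patent: the class name `MergingMemory`, the 64-byte `BLOCK_SIZE`, and the use of a dictionary as the pending-request table are all hypothetical choices made for illustration.

```python
BLOCK_SIZE = 64  # assumed address-block size in bytes (not taken from the patent)


class MergingMemory:
    """Toy model of a shared memory that merges pending reads per address block."""

    def __init__(self, data: bytes):
        self.data = data
        self.pending = {}        # block index -> list of requester ids
        self.bank_accesses = 0   # counts simulated memory-bank accesses

    def request_read(self, requester_id: int, address: int) -> None:
        block = address // BLOCK_SIZE
        # A request to a block that is already pending joins the existing
        # entry, so it will not trigger a bank access of its own.
        self.pending.setdefault(block, []).append(requester_id)

    def service(self):
        """Perform one bank access per pending block and return the results."""
        results = []
        for block, requesters in self.pending.items():
            self.bank_accesses += 1
            start = block * BLOCK_SIZE
            results.append((block, self.data[start:start + BLOCK_SIZE], requesters))
        self.pending = {}
        return results


mem = MergingMemory(bytes(range(256)))
mem.request_read(requester_id=0, address=0x40)  # block 1
mem.request_read(requester_id=5, address=0x48)  # block 1 again: merged
mem.request_read(requester_id=2, address=0x80)  # block 2: separate access
returns = mem.service()
```

In this model three read requests cost two bank accesses, because the two requests that fall in the same address block share one access and one read result.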

First claim

Opening claim text (preview).

What is claimed is:

1. A general-purpose graphics processor comprising: a first set of compute units; a second set of compute units; and a memory coupled with the first set of compute units and the second set of compute units, wherein the memory is to merge a first read request to an address block of the memory with a second read request to the address block of the memory to reduce a number of memory accesses to a memory bank associated with the address block.

2. The general-purpose graphics processor as in claim 1, wherein the memory includes a scoreboard to store a first entry for the first read request and a second entry for the second read request and the memory is to merge the first read request with the second read request upon determination that the first entry and the second entry have a matching address block.

3. The general-purpose graphics processor as in claim 2, wherein the first entry for the first read request and the second entry for the second read request are to store an identifier for a compute unit or thread associated with the respective read request.

4. The general-purpose graphics processor as in claim 3, wherein the first read request is associated with a first thread of the first set of compute units and the second read request is associated with a second thread of the first set of compute units.

5. The general-purpose graphics processor as in claim 3, wherein the first read request is associated with a first thread of the first set of compute units and the second read request is associated with a second thread of the second set of compute units.

6. The general-purpose graphics processor as in claim 1, further comprising a memory arbiter coupled with the first set of compute units and the second set of compute units, wherein the memory is coupled with the first set of compute units and the second set of compute units via the memory arbiter.

7. The general-purpose graphics processor as in claim 6, wherein the memory arbiter includes a thread dispatch buffer and a thread dispatch bus to dispatch threads for execution to the first set of compute units and the second set of compute units.

8. The general-purpose graphics processor as in claim 6, wherein the memory arbiter is to multicast a read result for the first read request and the second read request.

9. The general-purpose graphics processor as in claim 8, wherein the memory is to send a bitmask and a read return message to the memory arbiter, the read return message includes the read result for the first read request and the second read request, and the bitmask indicates compute units or threads associated with the read return message.

10. The general-purpose graphics processor as in claim 9, wherein the memory arbiter is to multicast the read result for the first read request and the second read request based on the bitmask.

11. The general-purpose graphics processor as in claim 10, wherein the memory arbiter includes one or more buffers to store multicast data before the multicast data is received by the compute units, wherein data is maintained in the one or more buffers until each compute unit associated with a multicast receives the multicast data.

12. A method comprising: receiving a first read request at a memory, the memory shared between a first set of compute units and a second set of compute units of a general-purpose graphics processor; receiving a second read request at the memory while the first read request is pending; determining that an address block associated with the first read request matches the address block associated with the second read request; and merging the second read request with the first read request.

13. The method as in claim 12, wherein merging the second read request with the first read request includes performing one access to one or more memory banks of the memory for the first read request and the second read request.

14. The method as in claim 13, further comprising determining that an address block associated with the first read request differs from the address block associated with the second read request and performing separate accesses to the one or more memory banks for the first read request and the second read request.

15. The method as in claim 12, further comprising transmitting a read return for a merged request, the read return including a merge mask to indicate multiple recipients for the read return.

16. The method as in claim 15, wherein the merge mask is a bit mask that identifies multiple compute units or threads associated with the read return.

17. The method as in claim 16, further comprising: receiving the read return for the merged request at a memory arbiter coupled with the multiple compute units or threads associated with the read return; and multicasting the read return to the multiple compute units or threads associated with the read return.

18. A general-purpose graphics processing system comprising: a memory device; and a general-purpose graphics processor coupled with the memory device, wherein the general-purpose graphics processor includes: a first set of compute units; a second set of compute units; a memory coupled with the first set of compute units and the second set of compute units, wherein the memory is to merge a first read request to an address block of the memory with a second read request to the address block of the memory to reduce a number of memory accesses to a memory bank associated with the address block; and a memory arbiter coupled with the first set of compute units, the second set of compute units and the memory, wherein the memory arbiter is to receive a read return message from the memory, the read return message including read data for the first read request and the second read request and transmit the read return message to compute units within the first set of compute units or second set of compute units associated with the first read request and the second read request.

19. The general-purpose graphics processing system as in claim 18, wherein the memory arbiter is to transmit the read return message to the compute units within the first set of compute units or second set of compute units associated with the first read request and the second read request as a single multicast message.

20. The general-purpose graphics processing system as in claim 19, wherein the memory arbiter is to transmit the single multicast message to compute units identified by a bitmask received from the memory.
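
The bitmask-driven multicast return and the arbiter buffering recited in the claims can be modeled in a few lines. The following is a hedged sketch, not the claimed hardware: the `Arbiter` class, the `build_mask` helper, and the convention that bit `i` of the mask stands for compute unit `i` are assumptions introduced here for illustration.

```python
def build_mask(requester_ids):
    """Build a bitmask with one bit set per requesting compute unit (assumed encoding)."""
    mask = 0
    for rid in requester_ids:
        mask |= 1 << rid
    return mask


class Arbiter:
    """Toy model of a memory arbiter that buffers multicast read returns."""

    def __init__(self):
        self.buffers = []  # each entry holds the data and the units still waiting

    def receive_return(self, data, mask):
        # The memory sends one read return plus a bitmask naming its recipients.
        self.buffers.append({"data": data, "outstanding": mask})

    def deliver(self, unit_id):
        """Hand unit `unit_id` every buffered return addressed to it."""
        bit = 1 << unit_id
        delivered = []
        for entry in self.buffers:
            if entry["outstanding"] & bit:
                delivered.append(entry["data"])
                entry["outstanding"] &= ~bit
        # An entry is released only after every targeted unit has received it.
        self.buffers = [e for e in self.buffers if e["outstanding"]]
        return delivered


arbiter = Arbiter()
mask = build_mask([0, 5])                 # merged request from units 0 and 5
arbiter.receive_return(b"block-1-data", mask)
first = arbiter.deliver(0)                # unit 0 receives the data
still_buffered = len(arbiter.buffers)     # entry is kept for unit 5
second = arbiter.deliver(5)               # unit 5 receives it; buffer is freed
```

The model delivers to one unit at a time so that the buffer-retention behavior is visible; a hardware multicast would typically reach all recipients that are ready in the same message.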

Assignees

Intel Corp

Inventors

Classifications

  • Energy efficient computing, e.g. low power processors, power management or thermal management

  • Image or video data

  • Power efficiency

  • for multiprocessing or multitasking

  • with a shared cache

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020402198A1 cover?
A general-purpose graphics processor comprising a first set of compute units, a second set of compute units, and a memory coupled with the first set of compute units and the second set of compute units is described. The memory is configured to merge a first read request to an address block of the memory with a second read request to the address block of the memory to reduce a number of memory a…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F3/0604. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 24, 2020 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).