Unified address space for multiple hardware accelerators using dedicated low latency links

US10802995B2 · US · B2

Patent metadata
Publication number: US-10802995-B2
Application number: US-201816046602-A
Country: US
Kind code: B2
Filing date: Jul 26, 2018
Priority date: Jul 26, 2018
Publication date: Oct 13, 2020
Grant date: Oct 13, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system may include a host processor coupled to a communication bus, a first hardware accelerator communicatively linked to the host processor through the communication bus, and a second hardware accelerator communicatively linked to the host processor through the communication bus. The first hardware accelerator and the second hardware accelerator are directly coupled through an accelerator link independent of the communication bus. The host processor is configured to initiate a data transfer between the first hardware accelerator and the second hardware accelerator directly through the accelerator link.
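The abstract's central point is that the host only initiates the transfer, while the payload travels over the accelerator link rather than the shared communication bus. The Python sketch below is a toy traffic model of that idea, not Xilinx's implementation. The `Accelerator` and `TrafficCounter` classes, the 16-byte command size (`CMD_BYTES`), and the host-bounce baseline used for contrast are all hypothetical.

```python
from dataclasses import dataclass

# Assumed size of the host's command descriptor on the bus (illustrative only).
CMD_BYTES = 16


@dataclass
class Accelerator:
    """Hypothetical accelerator holding a block of local device memory."""
    name: str
    memory: bytearray


@dataclass
class TrafficCounter:
    """Bytes moved on each path; the two paths are independent of each other."""
    bus_bytes: int = 0   # shared communication bus to the host
    link_bytes: int = 0  # dedicated accelerator-to-accelerator link


def host_bounce_copy(t: TrafficCounter, src: Accelerator, src_off: int,
                     dst: Accelerator, dst_off: int, length: int) -> None:
    """Assumed baseline without a link: the payload crosses the bus twice."""
    staged = bytes(src.memory[src_off:src_off + length])  # src -> host memory
    t.bus_bytes += length
    dst.memory[dst_off:dst_off + length] = staged          # host memory -> dst
    t.bus_bytes += length


def host_initiated_link_copy(t: TrafficCounter, src: Accelerator, src_off: int,
                             dst: Accelerator, dst_off: int, length: int) -> None:
    """Host sends only a command over the bus; the payload uses the link."""
    t.bus_bytes += CMD_BYTES                                # host -> src: instruction
    payload = bytes(src.memory[src_off:src_off + length])
    dst.memory[dst_off:dst_off + length] = payload          # src -> dst over the link
    t.link_bytes += length
```

For a 4096-byte transfer, the link path in this model puts only the 16-byte command on the bus, whereas the bounce baseline puts 8192 bytes on it.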

Claims

Full claim text. Claims 1, 9, and 15 are the independent claims.

What is claimed is:

1. A system, comprising: a host processor coupled to a communication bus; a first hardware accelerator communicatively linked to the host processor through the communication bus; and a second hardware accelerator communicatively linked to the host processor through the communication bus; wherein the first hardware accelerator and the second hardware accelerator are directly coupled through an accelerator link independent of the communication bus; and wherein the host processor is configured to initiate a data transfer between the first hardware accelerator and the second hardware accelerator directly through the accelerator link.

2. The system of claim 1, wherein the host processor is configured to communicate with the first hardware accelerator and the second hardware accelerator over the communication bus.

3. The system of claim 1, wherein the data transfer includes the first hardware accelerator accessing a memory of the second hardware accelerator through the accelerator link.

4. The system of claim 3, wherein the host processor is configured to access the memory of the second hardware accelerator by sending data including a target address to the first hardware accelerator, wherein the target address is translated by the host processor to correspond to the second hardware accelerator, and wherein the first hardware accelerator initiates a transaction to access the memory of the second hardware accelerator over the accelerator link based upon the target address.

5. The system of claim 1, wherein the second hardware accelerator is configured to adjust a target address for the data transfer by an upper bound of an address range for the second hardware accelerator in response to receiving a transaction via the accelerator link and determine whether the adjusted target address is local.

6. The system of claim 1, wherein the host processor is configured to initiate the data transfer between the first hardware accelerator and the second hardware accelerator based on a status of a direct memory access circuit of the second hardware accelerator coupled to the communication bus.

7. The system of claim 1, wherein the host processor is configured to automatically determine a sequence of the first hardware accelerator and the second hardware accelerator in a ring topology.

8. The system of claim 1, wherein the host processor is configured to track buffers corresponding to the first hardware accelerator and the second hardware accelerator using remote buffer flags.

9. A hardware accelerator, comprising: an endpoint configured to communicate with a host processor over a communication bus; a memory controller coupled to a memory local to the hardware accelerator; and a link circuit coupled to the endpoint and the memory controller, wherein the link circuit is configured to establish an accelerator link with a target hardware accelerator also coupled to the communication bus, wherein the accelerator link is a direct connection between the hardware accelerator and the target hardware accelerator that is independent of the communication bus.

10. The hardware accelerator of claim 9, wherein the link circuit is configured to initiate a data transfer with the target hardware accelerator over the accelerator link and the data transfer occurs in response to an instruction from the host processor received by the hardware accelerator over the communication bus.

11. The hardware accelerator of claim 9, wherein the link circuit comprises: a first memory-mapped to stream mapper circuit and a second memory-mapped to stream mapper circuit, each configured to convert data streams to memory mapped transactions and memory mapped transactions to data stream.

12. The hardware accelerator of claim 9, wherein the link circuit is configured to adjust a target address in a received transaction by an upper bound of an address range of the hardware accelerator and determine whether the adjusted target address is local.

13. The hardware accelerator of claim 11, wherein the link circuit comprises: a first transceiver configured to send and receive stream data; and a first retransmit engine coupled to the first transceiver and the first memory-mapped to stream mapper circuit.

14. The hardware accelerator of claim 13, wherein the link circuit further comprises: a second transceiver configured to send and receive stream data; and a second retransmit engine coupled to the second transceiver and the second memory-mapped to stream mapper circuit.

15. A method, comprising: receiving, within a first hardware accelerator, an instruction and a target address for a data transfer sent from a host processor over a communication bus; the first hardware accelerator comparing the target address with an upper bound of an address range corresponding to the first hardware accelerator; and in response to determining that the target address exceeds the address range based on the comparing, the first hardware accelerator initiating a transaction with a second hardware accelerator to perform a data transfer using an accelerator link that directly couples the first hardware accelerator and the second hardware accelerator.

16. The method of claim 15, wherein the accelerator link is independent of the communication bus.

17. The method of claim 15, wherein the initiating the transaction includes initiating a memory mapped transaction and converting the memory mapped transaction to a data stream to be sent over the accelerator link.

18. The method of claim 15, further comprising: in response to receiving the transaction in the second hardware accelerator, the second hardware accelerator modifying the target address by an upper bound of an address range of the second hardware accelerator and determining whether the modified target address is within the address range of the second hardware accelerator.

19. The method of claim 18, wherein the second hardware accelerator receives the transaction as a data stream and converts the data stream into a memory mapped transaction.

20. The method of claim 15, further comprising: determining a status of a direct memory access circuit of the second hardware accelerator; and initiating the data transfer in response to the status of the direct memory access circuit of the second hardware accelerator.
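Claims 5, 12, 15, and 18 describe how a target address is resolved. The first accelerator compares the address against the upper bound of its own range. If the address is out of range, the accelerator forwards the transaction over the accelerator link, where the receiver adjusts the address by its own upper bound and checks again. The Python sketch below models that resolution under assumptions the claims do not state: every accelerator exposes an equally sized local window `[0, window)`, the accelerators are chained in ring order (compare claim 7), and "adjusting by the upper bound" means subtracting it. The function name `route_target` is hypothetical.

```python
from typing import Tuple


def route_target(target: int, window: int, ring_size: int) -> Tuple[int, int]:
    """Resolve a target address in a ring of equally sized accelerators.

    Assumptions (not from the patent text): each accelerator's local range is
    [0, window), accelerators are concatenated in ring order, and adjusting an
    address "by the upper bound" means subtracting it.

    Returns (accelerator index, local offset). In a one-directional ring the
    index also equals the number of link hops from the first accelerator.
    """
    if target < 0:
        raise ValueError("target address must be non-negative")
    # Claim 15: the first accelerator compares the target with its upper bound.
    if target < window:
        return 0, target
    addr = target
    # Claims 5, 12, 18: each receiving accelerator adjusts the address by its
    # own upper bound, then checks whether the adjusted address is local.
    for index in range(1, ring_size):
        addr -= window
        if addr < window:
            return index, addr
    raise ValueError("target address lies beyond the unified address space")
```

With a 4 KiB window and four accelerators, `route_target(0x1800, 0x1000, 4)` returns `(1, 0x800)`. The address overflows the first accelerator's range and resolves to offset 0x800 on the next accelerator, one link hop away. The equal-window assumption matters: if windows differed in size, subtracting the receiver's own bound would no longer land on the correct offset.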

Assignees

Xilinx Inc

Inventors

Classifications

  • CPC title: using buffers

  • CPC title: Transactional memory (G06F9/528 takes precedence)

  • G06F13/404 (Primary) · CPC title: with address mapping

  • G06F13/161 (Primary) · CPC title: with latency improvement

  • G06F13/28 (Primary) · CPC title: using burst mode transfer, e.g. direct memory access {DMA}, cycle steal (G06F13/32 takes precedence)


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10802995B2 cover?
A system may include a host processor coupled to a communication bus, a first hardware accelerator communicatively linked to the host processor through the communication bus, and a second hardware accelerator communicatively linked to the host processor through the communication bus. The first hardware accelerator and the second hardware accelerator are directly coupled through an accelerator l…
Who is the assignee on this patent?
Xilinx Inc
What technology area does this patent fall under?
Primary CPC classification G06F13/404. Mapped technology areas include Physics.
When was this patent published?
Published Oct 13, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).