Minimizing usage of hardware counters in triggered operations for collective communication

US2019213146A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2019213146-A1
Application numberUS-201916353759-A
CountryUS
Kind codeA1
Filing dateMar 14, 2019
Priority dateMar 14, 2019
Publication dateJul 11, 2019
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Examples include a computing system having an input/output (I/O) device including a plurality of counters, each counter operating as one of a completion counter and a trigger counter, a processing device; and a memory device. The memory device stores instructions that, in response to execution by the processing device, cause the processing device to represent a plurality of triggered operations of collective communication for high-performance computing executable by the I/O device as a directed acyclic graph stored in the memory device, with triggered operations represented as vertices of the directed acyclic graph and dependencies between triggered operations represented as edges of the directed acyclic graph; traverse the directed acyclic graph using a first process to identify and mark vertices that can share a completion counter; and traverse the directed acyclic graph using a second process to assign a completion counter and a trigger counter for each vertex.

First claim

Opening claim text (preview).

What is claimed is: 1 . An apparatus comprising: an input/output (I/O) device including a plurality of counters, each counter operating as one of a completion counter and a trigger counter; a processing device; and a memory device coupled to the processing device, the memory device having instructions stored thereon that, in response to execution by the processing device, cause the processing device to: represent a plurality of triggered operations of collective communication for high-performance computing executable by the I/O device as a directed acyclic graph stored in the memory device, with triggered operations represented as vertices of the directed acyclic graph and dependencies between triggered operations represented as edges of the directed acyclic graph; traverse the directed acyclic graph using a first process to identify and mark vertices that can share a completion counter; and traverse the directed acyclic graph using a second process to assign a completion counter and a trigger counter for each vertex. 2 . The apparatus of claim 1 , the memory device having instructions stored thereon that, in response to execution by the processing device, cause the processing device to insert a root node and an edge from the root node to each vertex which has no input edges into the directed acyclic graph when multiple vertices of the directed acyclic graph have no input edges. 3 . The apparatus of claim 1 , wherein the first process is a breadth first search (BFS) process. 4 . The apparatus of claim 1 , wherein the second process is a topological sorting process. 5 . The apparatus of claim 1 , wherein instructions to identify and mark vertices that can share a completion counter comprises instructions to traverse the directed acyclic graph, and during traversal to identify a set of successor vertices S of each vertex u; identify a set of predecessor vertices P of each of the successor vertices S; when vertices in a set (P-u) have the set of successor vertices S as their only successor vertices, mark vertices in the set of predecessor vertices P as capable of sharing a completion counter; when vertices in a set (P-u) do not have the set of successor vertices S as their only successor vertices, mark vertices in the set of predecessor vertices P as not capable of sharing a completion counter; and mark vertex u as visited. 6 . The apparatus of claim 5 , wherein instructions to assign the completion counter comprise instructions to assign the completion counter by applying a first set of rules and based at least in part on the marked vertices, wherein the first set of rules comprises: if the vertices in the set of predecessor vertices P are capable of sharing completion counters and the vertices have at least one common parent vertex u, then the vertices in the set of predecessor vertices P are assigned the completion counter of vertex u as their completion counter. 7 . The apparatus of claim 6 , wherein the first set of rules comprises: if the vertices in the set of predecessor vertices P are capable of sharing completion counters but do not have any common parent vertex, and if one of the vertices s in the set of predecessor vertices P has an assigned completion counter C u that is not used as the trigger counter of any other vertex so far, assign completion counter C u as the completion counter for the vertices in the set P-{s}; if there is no vertex s in the set of predecessor vertices P with an assigned completion counter C u or there is no such completion counter C u that is not used as a trigger counter, assign an available completion counter from the completion counters of the parent vertices of the vertices in the set of predecessor vertices P; and if no completion counter from the completion counters of the parent vertices is available, initialize a new completion counter and assign the new completion counter to the vertices in the set of predecessor vertices P. 8 . The apparatus of claim 6 , wherein the first set of rules comprises: if vertex u is not marked to share a completion counter with any other vertices, assign to vertex u an available completion counter from one of the completion counters of parent vertices of vertex u; and if no completion counter is available, assign a new completion counter to vertex u. 9 . The apparatus of claim 1 , wherein assigning the trigger counter comprises assigning the trigger counter by applying a second set of rules and based at least in part on the marked vertices, wherein the second set of rules comprises: if a vertex v has no predecessor vertex, assign a new counter to as the trigger counter of vertex v. 10 . The apparatus of claim 9 , wherein the second set of rules comprises: if vertex v has a number of predecessor vertices equal to one, assign the completion counter of the predecessor vertex of vertex u as the trigger counter of vertex v. 11 . The apparatus of claim 9 , wherein the second set of rules comprises: if vertex v has a number of predecessor vertices greater than one, then if all the predecessor vertices have the same completion counter C u , assign the completion counter C u as the trigger counter of vertex v; if the completion counters of all the predecessor vertices are not the same, assign to vertex v the completion counter of the parent vertex (p) with out-degree=1 as the trigger counter of vertex v's, provided that vertex p's parents are also parents of the other parents of vertex v; otherwise, assign trigger counter of vertex v to be an available counter Cq of a parent vertex q such that Cq is not being used as an active trigger counter by any other vertex in the directed acyclic graph; otherwise allocate a new counter as the trigger counter of vertex v. 12 . A method of operating a computing system comprising: representing a plurality of triggered operations of collective communication for high-performance computing executable by an I/O device as a directed acyclic graph stored in a memory device, with triggered operations represented as vertices of the directed acyclic graph and dependencies between triggered operations represented as edges of the directed acyclic graph; traversing the directed acyclic graph using a first process to identify and mark vertices that can share a completion counter; and traversing the directed acyclic graph using a second process to assign a completion counter and a trigger counter of the I/O device for each vertex. 13 . The method of claim 12 , comprising inserting a root node and an edge from the root node to each vertex which has no input edges into the directed acyclic graph when multiple vertices of the directed acyclic graph have no input edges. 14 . The method of claim 12 , wherein identifying and marking vertices that can share a completion counter comprise traversing the directed acyclic graph, and during traversal, identifying a set of successor vertices S of each vertex u; identifying a set of predecessor vertices P of each of the successor vertices S; when vertices in a set (P-u) have the set of successor vertices S as their only successor vertices, marking vertices in the set of predecessor vertices P as capable of sharing a completion counter; when vertices in a set (P-u) do not have the set of successor vertices S as their only successor vertices, marking vertices in the set of predecessor vertices P as not capable of sharing a completion counter; and marking vertex u as visited. 15 . The method of claim 14 , wherein assigning the completion counter comprises assigning the completion counter by applying a first set of rules and

Assignees

Inventors

Classifications

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • G06F13/16Primary

    for access to memory bus (G06F13/28 takes precedence) · CPC title

  • Memory access · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019213146A1 cover?
Examples include a computing system having an input/output (I/O) device including a plurality of counters, each counter operating as one of a completion counter and a trigger counter, a processing device; and a memory device. The memory device stores instructions that, in response to execution by the processing device, cause the processing device to represent a plurality of triggered operations…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/9024. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Jul 11 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).