System, apparatus and method for barrier synchronization in a multi-threaded processor

US11061742B2 · US · B2

Patent metadata
Publication number: US-11061742-B2
Application number: US-201816019685-A
Country: US
Kind code: B2
Filing date: Jun 27, 2018
Priority date: Jun 27, 2018
Publication date: Jul 13, 2021
Grant date: Jul 13, 2021

Abstract

Official abstract text for this publication.

In one embodiment, a first processor core includes: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. Other embodiments are described and claimed.
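
The two-level arrangement in the abstract can be pictured with a small software model. This is only an illustrative sketch, not the patented hardware: the class and method names (`PipelineBarrierCircuit`, `CoreBarrierCircuit`, `configure_group`, `barrier_reached`) are hypothetical, and the arrival counting is borrowed from the wording of claim 1 (an active count compared against a configured count).

```python
from dataclasses import dataclass, field


@dataclass
class PipelineBarrierCircuit:
    """Hypothetical per-pipeline status store: one completion flag per barrier group."""
    completed: dict = field(default_factory=dict)  # group id -> bool

    def configure(self, group_id):
        self.completed[group_id] = False

    def notify_complete(self, group_id):
        self.completed[group_id] = True


class CoreBarrierCircuit:
    """Hypothetical core-level unit: counts arrivals and broadcasts completion."""

    def __init__(self, pipeline_circuits):
        self.pipeline_circuits = pipeline_circuits
        self.configured_count = {}  # group id -> number of threads in the group
        self.active_count = {}      # group id -> arrivals seen so far

    def configure_group(self, group_id, thread_count):
        if thread_count < 2:
            raise ValueError("a barrier group is formed of at least two threads")
        self.configured_count[group_id] = thread_count
        self.active_count[group_id] = 0
        # Every pipeline circuit tracks every configured group concurrently.
        for circuit in self.pipeline_circuits:
            circuit.configure(group_id)

    def barrier_reached(self, group_id):
        """Handle one barrier reach indication; return True if the group completed."""
        self.active_count[group_id] += 1
        if self.active_count[group_id] == self.configured_count[group_id]:
            for circuit in self.pipeline_circuits:
                circuit.notify_complete(group_id)
            return True
        return False


pipelines = [PipelineBarrierCircuit() for _ in range(4)]
core = CoreBarrierCircuit(pipelines)
core.configure_group("A", thread_count=3)
core.configure_group("B", thread_count=2)

first = core.barrier_reached("A")   # one of three threads in group A
second = core.barrier_reached("B")  # one of two threads in group B
third = core.barrier_reached("B")   # last thread of group B arrives
```

In this model, group B completes on its second arrival and every pipeline circuit sees the completion, while group A stays pending because only one of its three threads has arrived.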

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a first core comprising: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to: configure each pipeline barrier circuit of the plurality of pipeline barrier circuits into multiple collective configurations, wherein each of the multiple collective configurations is associated with a different barrier group of the plurality of barrier groups, and wherein each pipeline barrier circuit uses the multiple collective configurations concurrently to track the plurality of barrier groups; receive, from a first pipeline barrier circuit of the plurality of pipeline barrier circuits, a barrier reach indication indicating that a first thread of a first barrier group having a plurality of threads has reached a first barrier; in response to the received barrier reach indication, update an active count for the first barrier group based on the barrier reach indication; and in response to a determination that the active count corresponds to a configured count, send a barrier completion indication to each of the plurality of pipeline barrier circuits.

2. The processor of claim 1, the first core further comprising a local network to couple the plurality of pipeline barrier circuits to the core barrier circuit.

3. The processor of claim 1, wherein the core barrier circuit is to receive a configuration message to program the plurality of pipeline barrier circuits for the first barrier group, the configuration message including a count of the at least two threads of the first barrier group.

4. The processor of claim 3, wherein the first thread of the first barrier group is to reach the first barrier upon execution of a first barrier instruction, the first barrier instruction to cause the first thread to send the barrier reach indication to the first pipeline barrier circuit.

5. The processor of claim 1, wherein the core barrier circuit is to determine that the at least two threads of the first barrier group have reached the first barrier when the active count equals a configured value corresponding to a count of the at least two threads of the first barrier group.

6. The processor of claim 4, wherein in response to a second barrier instruction, the first thread is to poll the first pipeline barrier circuit of the plurality of pipeline barrier circuits to determine whether the first barrier group has reached a synchronization point.

7. The processor of claim 1, wherein the first thread, in response to a determination that the first barrier group has not reached a synchronization point, is to execute one or more instructions unassociated with the first barrier.

8. The processor of claim 1, wherein the core barrier circuit is coupled to a second core barrier circuit of a second core coupled to the first core, to enable the first barrier group to include at least one thread to execute on the first core and at least one other thread to execute on the second core.

9. The processor of claim 1, wherein the core barrier circuit is to couple to a second core barrier circuit of a second core, to enable the first barrier group to include at least one thread to execute on the first core and at least one other thread to execute on the second core, to enable a cross-socket barrier operation to occur, the second core included in a second processor socket, the first core included in a first processor socket.

10. At least non-transitory one computer readable storage medium having stored thereon instructions, which if performed by a machine cause the machine to perform a method comprising: tracking, in a core barrier circuit and a plurality of pipeline barrier circuits of a first core of a processor, a plurality of barrier groups associated with a plurality of execution pipelines of the first core, each of the plurality of barrier groups formed of at least two threads; configuring, by the core barrier circuit, each pipeline barrier circuit of the plurality of pipeline barrier circuits into multiple collective configurations, wherein each of the multiple collective configurations is associated with a different barrier group of the plurality of barrier groups, and wherein each pipeline barrier circuit uses the multiple collective configurations concurrently to track the plurality of barrier groups; receiving, in the core barrier circuit from a first pipeline barrier circuit of the plurality of pipeline barrier circuits in the first core, a barrier reach indication indicating that a first thread of a first barrier group having a plurality of threads has reached a first barrier, wherein the first thread is executed on a first pipeline associated with the first pipeline barrier circuit; updating, via the core barrier circuit, an active count for the first barrier group based on the barrier reach indication; determining, in the core barrier circuit, whether the active count corresponds to a configured count for the first barrier group; and in response to determining that the active count corresponds to the configured count, the core barrier circuit sending a barrier completion indication to each of the plurality of pipeline barrier circuits including the first pipeline barrier circuit.

11. The at least non-transitory one computer readable storage medium of claim 10, wherein, in response to receiving the barrier completion indication, the plurality of pipeline barrier circuits are to inform the plurality of threads of the first barrier group that the first barrier has been reached by the plurality of threads.

12. The at least one non-transitory computer readable storage medium of claim 11, wherein informing the plurality of threads comprises sending an interrupt to an interrupt controller to cause the interrupt controller to issue an interrupt to inform the plurality of threads.

13. The at least one non-transitory computer readable storage medium of claim 10, wherein the method further comprises receiving, in the core barrier circuit, a second barrier reach indication from a second pipeline barrier circuit of a second core of the processor, the second barrier reach indication to indicate that a second thread of the first barrier group has reached the first barrier, the second thread to execute on the second core.

14. The at least one non-transitory computer readable storage medium of claim 10, wherein the method further comprises receiving, in the core barrier circuit, a second barrier reach indication from a second pipeline barrier circuit of a first core of a second processor, the second barrier reach indication to indicate that a second thread of the first barrier group has reached the first barrier, the second thread to execute on the first core of the second processor, wherein a first processor socket comprises the processor and a second processor socket comprises the second processor.

15. A system comprising: a first processor having a plurality of first cores, each of the plurality of first cores comprising a plurality of execution pipelines and a hierarchical barrier circuit to monitor operation of a plurality of barrier groups, each barrier group having a plurality of threads, the hierarchical barrier circuit comprising: a plurality of pipeline barrier circuits to provide synchronization status informa
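
Claims 4 to 7 describe what is often called a split-phase barrier: a thread signals arrival with one instruction, then polls with a second instruction and keeps doing unrelated work until the group has synchronized. The sketch below imitates that behavior in software with ordinary Python threads. It is an analogy under stated assumptions, not the claimed circuit: `SplitPhaseBarrier`, `arrive`, `poll`, and `worker` are hypothetical names, and a lock stands in for the hardware that updates the active count.

```python
import threading


class SplitPhaseBarrier:
    """Hypothetical software stand-in for one barrier group's counters."""

    def __init__(self, configured_count):
        self._lock = threading.Lock()
        self._configured_count = configured_count  # threads in the group
        self._active_count = 0                     # arrivals so far
        self._complete = False

    def arrive(self):
        """Plays the role of the first barrier instruction: report arrival, do not block."""
        with self._lock:
            self._active_count += 1
            if self._active_count == self._configured_count:
                self._complete = True

    def poll(self):
        """Plays the role of the second barrier instruction: has the group synchronized?"""
        with self._lock:
            return self._complete


def worker(barrier, results, index):
    barrier.arrive()
    unrelated_work = 0
    while not barrier.poll():
        # Stand-in for instructions unassociated with the barrier (claim 7).
        unrelated_work += 1
    results[index] = unrelated_work


barrier = SplitPhaseBarrier(configured_count=4)
results = [None] * 4
threads = [
    threading.Thread(target=worker, args=(barrier, results, i)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each entry in `results` records how many units of unrelated work a thread completed while waiting. The values vary from run to run because they depend on thread scheduling; the last thread to arrive typically records zero.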

Assignees

Intel Corp

Inventors

Classifications

  • G06F9/522 (Primary)

    Barrier synchronisation · CPC title

  • Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking · CPC title

  • Synchronisation or serialisation instructions · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11061742B2 cover?
It covers a processor core with a plurality of execution pipelines, a pipeline barrier circuit associated with each pipeline that maintains status information for multiple barrier groups (each formed of at least two threads), and a core barrier circuit that controls the pipeline barrier circuits and informs them when a barrier group has reached a barrier.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/522. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 13, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.