Barrierless and fenceless shared memory synchronization

US2021286619A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2021286619-A1
Application number  US-202016818845-A
Country             US
Kind code           A1
Filing date         Mar 13, 2020
Priority date       Mar 13, 2020
Publication date    Sep 16, 2021
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

When communicating through shared memory, a producer thread generates a value that is written to a location in a shared memory. The value is read from the shared memory by a consumer thread. The challenge is to ensure that the consumer thread reads the location only after the value is written and is thereby synchronized. When a memory location is written by a producer thread, a flag that is simultaneously stored in the memory location along with the value is toggled. The consumer thread tracks information to determine whether the flag stored in the location indicates whether the producer has written the value to the location. The flag is read and written simultaneously with reading and writing the location in memory, thereby eliminating the need for a memory fence. After all of the consumer threads read the value, the location may be reused to write additional value(s) and simultaneously toggle the flag.
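The packed-flag idea in the abstract can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `Slot`, `publish`, and `try_consume`, the 31-bit payload, the flag position in bit 0, and the all-zero starting state are hypothetical, and `std::atomic` with relaxed ordering stands in for the single-word load and store of a parallel processor.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout (an assumption, not taken from the patent):
// bit 0 holds the flag, bits 1..31 hold the payload.
using Slot = std::atomic<std::uint32_t>;

// Producer side: the value and the flag travel in ONE store, so there is
// no separate flag write that would have to be ordered after the value.
inline void publish(Slot& slot, std::uint32_t value, std::uint32_t flag) {
    assert(value < (std::uint32_t{1} << 31));  // payload must fit in 31 bits
    slot.store((value << 1) | (flag & 1u), std::memory_order_relaxed);
}

// Consumer side: ONE load returns the flag and the value together.
// The consumer tracks which flag state means "written" for the current
// round and passes it as expected_flag. Returns true and fills `out`
// only when the slot carries that flag state.
inline bool try_consume(const Slot& slot, std::uint32_t expected_flag,
                        std::uint32_t& out) {
    const std::uint32_t word = slot.load(std::memory_order_relaxed);
    if ((word & 1u) != (expected_flag & 1u)) {
        return false;  // producer has not written this round yet
    }
    out = word >> 1;
    return true;
}
```

A consumer would typically call `try_consume` in a loop until it returns true, then flip its expected flag before the slot is reused. Because the flag and the value share one word, a load cannot observe the new flag alongside a stale value.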

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: executing a set of threads by a multi-threaded parallel processor to process inputs according to a sequence of instructions; generating a first value by a first thread in the set of threads; and writing the first value to a first location of a shared memory simultaneously with updating a first flag stored in the first location from a first state to a second state, wherein the first flag is initialized to the second state when execution of the sequence of instructions is initiated.

2. The computer-implemented method of claim 1, further comprising: determining, by a third thread, that the first flag has changed from the first state to the second state; reading, by the third thread, the first value from the first location before a second flag stored in a second location of the shared memory to be written by a second thread in the set of threads is updated; and processing the first value by the third thread to produce an output.

3. The computer-implemented method of claim 2, wherein determining the first flag has changed from the first state to the second state comprises simultaneously reading the first flag and the first value from the first location by the third thread.

4. The computer-implemented method of claim 3, wherein determining the first flag has changed from the first state to the second state further comprises comparing the first flag to a valid state.

5. The computer-implemented method of claim 1, wherein the first value and the first flag are encoded as one of a single 16-bit, 32-bit, 64-bit or 128-bit word.

6. The computer-implemented method of claim 1, wherein the first flag is stored in a position within the first location replacing a bit of the first value.

7. The computer-implemented method of claim 6, wherein the position corresponds to a least-significant bit of the first value.

8. The computer-implemented method of claim 1, wherein at least one additional value is associated with the first flag and, further comprising simultaneously writing the at least one additional value to the first memory location when the first value is written to the first memory location.

9. The computer-implemented method of claim 1, wherein execution of the sequence of instructions comprises execution of one or more operations using a neural network.

10. The computer-implemented method of claim 1, further comprising writing, by the third thread, the output to a third location simultaneously with updating a third flag stored in the third location in the shared memory from the first state to the second state.

11. The computer-implemented method of claim 10, further comprising: determining, by a fourth thread, that the third flag has changed from the first state to the second state; processing the output by the fourth thread to produce a fourth value; and updating the first flag stored in the first location from the second state to the first state simultaneously with writing the fourth value to the first location.

12. A system, comprising: a multi-core parallel processor coupled to a shared memory and configured to: execute a set of threads to process inputs according to a sequence of instructions; generate a first value by a first thread in the set of threads; and write the first value to a first location in the shared memory simultaneously with updating a first flag stored in the first location from a first state to a second state, wherein the first flag is initialized to the second state when execution of the sequence of instructions is initiated.

13. The system of claim 12, wherein the multi-core parallel processor is further configured to: determine, by a third thread, that the first flag has changed from the first state to the second state; read, by the third thread, the first value from the first location before a second flag stored in a second location of the shared memory to be written by a second thread in the set of threads is updated; and process the first value by the third thread to produce an output.

14. The system of claim 13, wherein determining the first flag has changed from the first state to the second state comprises simultaneously reading the first flag and the first value from the first location by the third thread.

15. The system of claim 14, wherein determining the first flag has changed from the first state to the second state further comprises comparing the first flag to a valid state.

16. The system of claim 13, wherein the multi-core parallel processor is further configured to write, by the third thread, the output to a third location simultaneously with updating a third flag stored in the third location in the shared memory from the first state to the second state.

17. The system of claim 13, in the multi-core parallel processor is further configured to: determine, by a fourth thread, that the third flag has changed from the first state to the second state; process the output by the fourth thread to produce a fourth value; and update the first flag stored in the first location from the second state to the first state simultaneously with writing the fourth value to the first location.

18. The system of claim 12, wherein the first value and the first flag are encoded as one of a single 16-bit, 32-bit, 64-bit or 128-bit word.

19. The system of claim 12, wherein the first flag is stored in a position within the first location replacing a bit of the first value.

20. A non-transitory computer-readable media storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of: executing a set of threads to process inputs according to a sequence of instructions; generating a first value by a first thread in the set of threads; and writing the first value to a first location of a shared memory simultaneously with updating a first flag stored in the first location from a first state to a second state, wherein the first flag is initialized to the second state when execution of the sequence of instructions is initiated.
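The encoding described in the dependent claims, where the flag replaces the least-significant bit of the value inside a single word, might look like the sketch below. It assumes a 32-bit IEEE-754 `float` payload, which the claims do not specify; `encode`, `decode`, `Decoded`, and `demo` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Overwrite the lowest bit of the float's bit pattern with the flag.
// The value loses its least-significant mantissa bit of precision.
inline std::uint32_t encode(float value, bool flag) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined type punning
    return (bits & ~std::uint32_t{1}) | (flag ? 1u : 0u);
}

struct Decoded {
    float value;  // payload with its lowest bit cleared
    bool flag;    // state carried in bit 0 of the word
};

// Split a word back into its payload and flag.
inline Decoded decode(std::uint32_t word) {
    const std::uint32_t bits = word & ~std::uint32_t{1};
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return {value, (word & 1u) != 0};
}

// Usage example: 1.5f has a zero lowest mantissa bit, so it round-trips
// exactly regardless of the flag state.
inline void demo() {
    assert(decode(encode(1.5f, true)).value == 1.5f);
    assert(decode(encode(1.5f, true)).flag);
}
```

The trade-off in this sketch is one bit of payload precision in exchange for reading and writing the flag in the same memory access as the value.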

Assignees

Nvidia Corp

Inventors

Classifications

  • from multiple instruction streams, e.g. multistreaming · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

  • Buffers; Shared memory; Pipes · CPC title

  • Atomic · CPC title

  • Condition code generation, e.g. Carry, Zero flag · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021286619A1 cover?
When communicating through shared memory, a producer thread generates a value that is written to a location in a shared memory. The value is read from the shared memory by a consumer thread. The challenge is to ensure that the consumer thread reads the location only after the value is written and is thereby synchronized. When a memory location is written by a producer thread, a flag that is sim…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/52. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 16, 2021 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).