Dynamically detecting uniformity and eliminating redundant computations to reduce power consumption

US11055097B2 · US · B2

Patent metadata
Publication number: US-11055097-B2
Application number: US-201314048647-A
Country: US
Kind code: B2
Filing date: Oct 8, 2013
Priority date: Oct 8, 2013
Publication date: Jul 6, 2021
Grant date: Jul 6, 2021

Abstract

Official abstract text for this publication.

One embodiment of the present invention includes techniques to decrease power consumption by reducing the number of redundant operations performed. In operation, a streaming multiprocessor (SM) identifies uniform groups of threads that, when executed, apply the same deterministic operation to uniform sets of input operands. Within each uniform group of threads, the SM designates one thread as the anchor thread. The SM disables execution units assigned to all of the threads except the anchor thread. The anchor execution unit, assigned to the anchor thread, executes the operation on the uniform set of input operands. Subsequently, the SM sets the outputs of the non-anchor threads included in the uniform group of threads to equal the value of the anchor execution unit output.
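The flow described in the abstract can be modeled in software as a rough sketch. The patent describes hardware (execution units that are disabled and outputs that are copied), so the Python below is only an illustration under stated assumptions: the function name `execute_warp`, the `OPS` table, and the choice of the lowest thread id as anchor are all hypothetical and are not taken from the patent text.

```python
from collections import defaultdict
import operator

# Hypothetical table of deterministic operators; names are illustrative only.
OPS = {"add": operator.add, "mul": operator.mul}

def execute_warp(lanes):
    """Simulate one instruction slot across a group of threads.

    lanes: list of (op_name, operands) pairs, indexed by thread id.
    Returns (outputs, executed): outputs[i] is thread i's result, and
    executed lists the thread ids whose execution unit actually ran.
    """
    # Threads with the same operator and equal operand values form a uniform group.
    groups = defaultdict(list)
    for tid, (op, operands) in enumerate(lanes):
        groups[(op, tuple(operands))].append(tid)

    outputs = [None] * len(lanes)
    executed = []
    for (op, operands), tids in groups.items():
        anchor = tids[0]                # assumption: lowest thread id is the anchor
        result = OPS[op](*operands)     # only the anchor's execution unit runs
        executed.append(anchor)
        for tid in tids:                # non-anchor threads copy the anchor output
            outputs[tid] = result
    return outputs, sorted(executed)

lanes = [("add", (2, 3)), ("add", (2, 3)), ("mul", (2, 3)), ("add", (2, 3))]
outputs, executed = execute_warp(lanes)
print(outputs)   # [5, 5, 6, 5]
print(executed)  # [0, 2]
```

In this model, four threads produce four outputs but only two operations are evaluated; in the hardware described, the saving would come from the disabled execution units rather than from skipped function calls.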

First claim

Opening claim text (preview).

The invention claimed is:

1. A system configured to eliminate redundant computations, the system comprising: a memory that includes a first set of operands associated with a first thread and a second set of operands associated with a second thread; and a streaming multiprocessor coupled to the memory and configured to: determine whether an operator corresponding to a given deterministic operation is assigned to both the first thread and the second thread; determine whether a value of each operand in the first set of operands equals a value of a corresponding operand in the second set of operands; in response to determining that the operator corresponding to the given deterministic operation is assigned to both the first thread and the second thread and that the value of each operand in the first set of operands equals the value of a corresponding operand in the second set of operands, determine that the first thread and the second thread are uniform; cause the first thread to execute, within a first execution unit, the operator on the first set of operands to generate a first output; in response to determining that the first thread and the second thread are uniform, activate a first uniformity signal; and in response to activating the first uniformity signal, cause the second thread to set a second output equal to the first output without executing the operator on the second set of operands.

2. The system of claim 1, wherein the first uniformity signal causes a second execution unit on which the second thread executes to be disabled.

3. The system of claim 2, wherein the first uniformity signal causes the second execution unit to be clock-gated.

4. The system of claim 2, wherein the second thread is associated with an output multiplexer that is configured to select between the first output and an output of the second execution unit.

5. The system of claim 2, wherein the first execution unit comprises an arithmetic logic unit.

6. The system of claim 1, wherein the streaming multiprocessor is further configured to set a first uniformity tag to indicate that the second output has been set equal to the first output.

7. The system of claim 6, wherein the streaming multiprocessor is further configured to perform write operations to store the first uniformity tag in the memory.

8. The system of claim 7, wherein determining that the value of each operand in the first set of operands equals the value of a corresponding operand in the second set of operands comprises determining that a first value of each uniformity tag associated with the second set of operands has been set.

9. The system of claim 1, wherein the memory further includes a third set of operands associated with the first thread and a fourth set of operands associated with the second thread, and the streaming multiprocessor is further configured to: determine that a second operator corresponding to a second deterministic operation is assigned to both the first thread and the second thread; determine that a value of an operand included in the third set of operands does not equal a value of a corresponding operand included in the fourth set of operands; cause the first thread to execute the second operator on the third set of operands to generate a third output; and cause the second thread to execute the second operator on the fourth set of operands to generate a fourth output.

10. The system of claim 1, wherein the streaming multiprocessor includes the first execution unit and one or more other execution units, wherein the first execution unit is an anchor execution unit, and wherein the streaming multiprocessor is configured to assign the first thread to the anchor execution unit.

11. The system of claim 1, wherein the streaming multiprocessor is further configured to select the first thread as an anchor thread based on a thread identifier associated with the first thread.

12. The system of claim 1, wherein the streaming multiprocessor is further configured to: determine that a third thread and a fourth thread are uniform based on a value of each operand included in a third set of operands equaling a value of a corresponding operand included in a fourth set of operands; subsequent to determining that the first thread and the second thread are uniform and subsequent to determining that the third thread and the fourth thread are uniform, determine that the first thread, the second thread, the third thread, and the fourth thread are uniform, wherein the first uniformity signal is activated in response to determining that the first thread, the second thread, the third thread, and the fourth thread are uniform, and wherein, in response to the first uniformity signal being activated, the third thread sets a third output equal to the first output without executing the operator on the third set of operands and, in response to the first uniformity signal being activated, the fourth thread sets a fourth output equal to the first output without executing the operator on the fourth set of operands.

13. A computer-implemented method for eliminating redundant computations, the method comprising: determining whether an operator corresponding to a given deterministic operation is assigned to both a first thread and a second thread; determining whether a value of each operand in a first set of operands equals a value of a corresponding operand in a second set of operands; in response to determining that the operator corresponding to the given deterministic operation is assigned to both the first thread and the second thread and that the value of each operand in the first set of operands equals the value of a corresponding operand in the second set of operands, determining that the first thread and the second thread are uniform; causing the first thread to execute the operator corresponding to the given deterministic operation on the first set of operands to generate a first output; in response to determining that the first thread and the second thread are uniform, activating a first uniformity signal; and in response to activating the first uniformity signal, causing the second thread to set a second output equal to the first output without executing the operator corresponding to the given deterministic operation on the second set of operands.

14. The method of claim 13, wherein the first thread executes on a first execution unit, and the first uniformity signal causes a second execution unit on which the second thread executes to be disabled.

15. The method of claim 14, wherein the first uniformity signal causes the second execution unit to be clock-gated.

16. The method of claim 14, wherein the second thread is associated with an output multiplexer that is configured to select between the first output and an output of the second execution unit.

17. The method of claim 14, wherein the first execution unit comprises an arithmetic logic unit.

18. The method of claim 13, further comprising setting a first uniformity tag to indicate that the second output has been set equal to the first output.

19. The method of claim 18, further comprising performing write operations to store the first uniformity tag in a memory.

20. The method of claim 18, wherein determining that the value of each operand in the first set of operands equals the value of a corresponding operand in the second set of operands comprises determining that a value of each uniformity tag associated with the second set of operands has been set.

21. The metho
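Claims 6 to 8 and 18 to 20 describe uniformity tags that record when an output was copied from the anchor, so that a later operand-equality check can consult the tags instead of comparing values. A minimal Python sketch of that idea follows. It simplifies heavily: the `Warp` class, its per-register tag dictionary, and the `alu_ops` counter are hypothetical names introduced for illustration, and a single tag per register for the whole thread group is an assumption, not the claimed structure.

```python
class Warp:
    """Toy model of a thread group whose registers carry a uniformity tag."""

    def __init__(self, num_threads):
        self.regs = [dict() for _ in range(num_threads)]  # one register file per thread
        self.tags = {}     # register name -> True if every thread holds the same value
        self.alu_ops = 0   # counts operations that were actually executed

    def load(self, reg, values):
        # On load, the tag is derived by comparing the values directly.
        for thread_regs, value in zip(self.regs, values):
            thread_regs[reg] = value
        self.tags[reg] = len(set(values)) == 1

    def add(self, dst, a, b):
        if self.tags[a] and self.tags[b]:
            # Both operands are tagged uniform: the anchor (thread 0) executes once
            # and every other thread copies its output; the output is tagged uniform.
            result = self.regs[0][a] + self.regs[0][b]
            self.alu_ops += 1
            for thread_regs in self.regs:
                thread_regs[dst] = result
            self.tags[dst] = True
        else:
            # Not tagged uniform: every thread executes on its own operands.
            for thread_regs in self.regs:
                thread_regs[dst] = thread_regs[a] + thread_regs[b]
                self.alu_ops += 1
            self.tags[dst] = False  # conservative: outputs are not compared here

w = Warp(4)
w.load("r0", [7, 7, 7, 7])
w.load("r1", [1, 2, 3, 4])
w.add("r2", "r0", "r0")   # uniform: one execution, result 14 in every thread
w.add("r3", "r2", "r1")   # r1 differs per thread: four executions
print(w.alu_ops)          # 5
```

The point of the tag in this sketch is that the second and later checks cost a lookup rather than a full operand comparison; how the tags are stored and read in the claimed system is not specified beyond the claim language above.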

Assignees

Nvidia Corp

Inventors

Classifications

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

  • from multiple instruction streams, e.g. multistreaming · CPC title

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • to perform conditional operations, e.g. using predicates or guards · CPC title

  • Instruction operation extension or modification · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11055097B2 cover?
One embodiment of the present invention includes techniques to decrease power consumption by reducing the number of redundant operations performed. In operation, a streaming multiprocessor (SM) identifies uniform groups of threads that, when executed, apply the same deterministic operation to uniform sets of input operands. Within each uniform group of threads, the SM designates one thread a…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30072. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 6, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).