What technology area does this patent fall under?

Primary CPC classification G06F1/3296. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 31, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Warp clustering

US9804666B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9804666-B2
Application number	US-201514721304-A
Country	US
Kind code	B2
Filing date	May 26, 2015
Priority date	May 26, 2015
Publication date	Oct 31, 2017
Grant date	Oct 31, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Units of shader work, such as warps or wavefronts, are grouped into clusters. An individual vector register file of a processor is operated as segments, where a segment may be independently operated in an active mode or a reduced power data retention mode. The scheduling of the clusters is selected so that a cluster is allocated a segment of the vector register file. Additional sequencing may be performed for a cluster to reach a synchronization point. Individual segments are placed into the reduced power data retention mode during a latency period when the cluster is waiting for execution of a request, such as a sample request.
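The mechanism in the abstract can be illustrated with a small software model. The Python sketch below is not taken from the patent: the names `Mode`, `Segment`, `Cluster`, `RegisterFile`, and all method names are hypothetical, and the one-segment-per-cluster allocation policy is a simplifying assumption. It only shows the bookkeeping the abstract describes: a register file operated as segments, a segment allocated to each cluster, and a segment switched to the data retention mode while its cluster waits on a sample request.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"
    RETENTION = "retention"  # reduced power; register contents are preserved


@dataclass
class Segment:
    index: int
    mode: Mode = Mode.ACTIVE


@dataclass
class Cluster:
    cluster_id: int
    warps: list        # units of shader work (warps or wavefronts)
    segment: Segment   # register-file segment allocated to this cluster
    waiting: bool = False


class RegisterFile:
    """Hypothetical model of one vector register file operated as segments."""

    def __init__(self, num_segments):
        self.segments = [Segment(i) for i in range(num_segments)]
        self.clusters = []

    def allocate(self, cluster_id, warps):
        # Assumption: exactly one segment per cluster, handed out in order.
        if len(self.clusters) >= len(self.segments):
            raise RuntimeError("no free segment for this cluster")
        segment = self.segments[len(self.clusters)]
        cluster = Cluster(cluster_id, list(warps), segment)
        self.clusters.append(cluster)
        return cluster

    def issue_sample_request(self, cluster):
        # The cluster starts waiting, so its segment can drop to retention mode.
        cluster.waiting = True
        cluster.segment.mode = Mode.RETENTION

    def sample_returned(self, cluster):
        # Wake the segment before the cluster resumes execution.
        cluster.segment.mode = Mode.ACTIVE
        cluster.waiting = False


if __name__ == "__main__":
    vrf = RegisterFile(num_segments=4)
    c0 = vrf.allocate(0, ["warp0", "warp1"])
    c1 = vrf.allocate(1, ["warp2", "warp3"])
    vrf.issue_sample_request(c0)
    print([s.mode.value for s in vrf.segments])
```

In this model only the segment of the waiting cluster changes mode; the other segments stay active, which mirrors the "independently operated" wording of the abstract. Real hardware would also have a wake-up cost, which this sketch ignores.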

First claim

Opening claim text (preview).

What is claimed is:

1. A method of reducing power consumption in a shader of a graphics processing system, the method comprising: organizing a vector register file into a plurality of segments of physical memory, with each segment having an active mode and a reduced power data retention mode independently selectable from other segments of the vector register file; allocating each of the segments as a resource for a respective one of a plurality of clusters of multiple shader units of work assigned to a processor and having temporal locality and spatial locality; scheduling execution of the clusters in a sequence; and placing each of the segments that are respectively associated with the clusters that are in an inactive state into the reduced power data retention mode during at least a portion of a latency period for a texture load for the clusters.
2. The method of claim 1, wherein the clusters are placed into the inactive state in response to completion of sending texture sample or memory load store commands of the cluster to an external unit.
3. The method of claim 1, wherein the clusters are placed into the inactive state in response to completion of sending texture sample or memory load store commands of the cluster to a texture unit.
4. The method of claim 1, further comprising using the vector register file as a resource for units of shader work in which each unit of shader work comprises a group of shader threads to perform Single Instruction Multiple Thread (SIMT) processing.
5. The method of claim 1, further comprising prioritizing the shader units of work within each of the clusters to reach a synchronization point for loading a texture sample.
6. The method of claim 1, wherein the clusters are assigned to consecutive shader tasks of a shader stage.
7. The method of claim 1, wherein each shader unit of work is a unit of thread scheduling.
8. A method of reducing power consumption in a shader of a graphics processing system, the method comprising: scheduling clusters of shader work for a plurality of processors, each cluster including a plurality of shader units of work assigned to a processor and having temporal locality and spatial locality; for each cluster, allocating a respective segment of physical memory of a vector register file as a resource, each segment having an active mode and a reduced power data retention mode independently selectable from other segments; scheduling execution of the clusters in a sequence; rotating execution of the clusters; and placing segments of inactive clusters into the reduced power data retention mode during at least a portion of a latency period for a texture load for the inactive clusters.
9. The method of claim 8, further comprising placing segments of inactive clusters into the reduced power data retention mode during at least a latency for a data access.
10. The method of claim 1, further comprising placing the segments of each of the clusters awaiting a data load into the reduced power data retention mode.
11. The method of claim 8, further comprising using the vector register file as a resource for units of shader work in which each unit of shader work has a group of shader threads to perform Single Instruction Multiple Thread (SIMT) processing.
12. The method of claim 8, further comprising prioritizing the shader units of work within each cluster to reach a synchronization point for loading a texture sample.
13. The method of claim 8, further comprising assigning the clusters to consecutive shader tasks of a shader stage.
14. The method of claim 8, wherein each shader unit of work is a unit of thread scheduling.
15. A graphics processing unit, comprising: a plurality of programmable processors to perform Single Instruction Multiple Thread (SIMT) processing of shading instructions, each programmable processor including a vector register file having a plurality of data segments, each segment having an active mode and a reduced power data retention mode independently selectable from other segments; a scheduler to schedule clusters of shader work for the plurality of programmable processors, each cluster including a plurality of shader units of work assigned to an individual processor and having temporal locality and spatial locality, with each cluster supported by a segment of the vector register file of the assigned individual processor, the scheduler for selecting a schedule to rotate execution of the clusters to place segments of inactive clusters into the reduced power data retention mode during at least a portion of a latency period associated with an operation request by the cluster; and an external memory comprising a texture unit, wherein segments of inactive clusters are placed in the reduced power data retention mode during at least a portion of a latency period associated with accessing the external memory for a texture access of a cluster.
16. The graphics processing unit of claim 15, further comprising a sequencer to prioritize the shader units of work within each cluster to reach a synchronization point.
17. The graphics processing unit of claim 15, further comprising a load and store unit to access the external memory, wherein segments of inactive clusters are placed into the reduced power data retention mode during at least a portion of a latency period associated with accessing the external memory for a cluster.
18. A graphics processing unit, comprising: a shader including a programmable processing element; a vector register file used as a resource for units of shader work in which each unit of shader work has a group of shader threads to perform Single Instruction Multiple Thread (SIMT) processing and multiple groups of shader threads are formed into a cluster, the vector register file allocated as a plurality of individual segments; a scheduler to group clusters of units of shader work and select a schedule to assign an individual cluster to a segment of the vector register file and place the segment into a reduced power data retention mode during a latency period when the cluster is waiting for a result of a sample request during at least a portion of a latency period associated with an operation request by the cluster; and an external memory comprising a texture unit, wherein segments of inactive clusters are placed in the reduced power data retention mode during at least a portion of a latency period associated with accessing the external memory for a texture access of a cluster.
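The scheduling steps in the method claims (run the clusters in a sequence, rotate execution, and keep the segment of an inactive cluster in retention mode during the texture-load latency) can be explored with a toy timing model. The following Python sketch is an illustration under stated assumptions, not the claimed implementation: the function name `simulate_rotation`, its parameters, and the cycle counts are hypothetical; a single processor runs one cluster at a time; a cluster becomes inactive only after all of its warps have reached their sample request; and the whole load latency is spent in retention mode with no wake-up cost.

```python
def simulate_rotation(clusters, load_latency, rounds):
    """Toy model of rotating cluster execution on one processor.

    clusters: list of clusters, each a list of per-warp cycle counts needed
              to reach the sample request (the synchronization point).
    load_latency: cycles between issuing a texture load and its return.
    rounds: how many times the scheduler rotates through all clusters.

    Returns (total_cycles, stall_cycles, retention_cycles_per_segment).
    """
    ready_at = [0] * len(clusters)   # cycle at which each cluster's load returns
    retention = [0] * len(clusters)  # cycles each segment spends in retention mode
    now = 0
    stall = 0
    for _ in range(rounds):
        for c, warp_cycles in enumerate(clusters):  # fixed rotation order
            if ready_at[c] > now:
                # Load not back yet: the processor idles until it returns.
                stall += ready_at[c] - now
                now = ready_at[c]
            # Run every warp of the cluster up to its sample request.
            now += sum(warp_cycles)
            # Cluster goes inactive; its segment sleeps for the full latency.
            # This counts every issued load, including ones still outstanding
            # when the simulation ends.
            ready_at[c] = now + load_latency
            retention[c] += load_latency
    return now, stall, retention


if __name__ == "__main__":
    four = [[5, 5]] * 4   # four clusters, two warps each, 10 cycles per cluster
    two = [[5, 5]] * 2
    print(simulate_rotation(four, load_latency=30, rounds=2))  # (80, 0, [60, 60, 60, 60])
    print(simulate_rotation(two, load_latency=30, rounds=2))   # (60, 20, [60, 60])
```

With four clusters the work of the other three covers the 30-cycle latency, so the processor never stalls while each segment still sleeps for the whole latency. With two clusters the processor idles for 20 cycles in the second round. The numbers are invented for the example and say nothing about real hardware.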

Assignees

Samsung Electronics Co Ltd

Inventors

Jiao Yang

Classifications

Code	CPC title
G06F1/3296 (Primary)	by lowering the supply or operating voltage
G06F1/26	Power supply means, e.g. regulation thereof (for memories G11C)
G06T1/60	Memory management
G06F1/3275 (Primary)	Power saving in memory, e.g. RAM, cache
Y02D10/00	Energy efficient computing, e.g. low power processors, power management or thermal management

Patent family

Related publications grouped by family.

Patent family: 57398468

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9804666B2 cover?: Units of shader work, such as warps or wavefronts, are grouped into clusters. An individual vector register file of a processor is operated as segments, where a segment may be independently operated in an active mode or a reduced power data retention mode. The scheduling of the clusters is selected so that a cluster is allocated a segment of the vector register file. Additional sequencing may b…
Who is the assignee on this patent?: Samsung Electronics Co Ltd