Increasing Thread Payload for 3D Pipeline with Wider SIMD Execution Width

US2017178384A1 · US · A1

Patent metadata
Publication number: US-2017178384-A1
Application number: US-201514976122-A
Country: US
Kind code: A1
Filing date: Dec 21, 2015
Priority date: Dec 21, 2015
Publication date: Jun 22, 2017
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Reducing SIMD fragmentation for SIMD execution widths of 32 or even 64 channels in a single hardware thread leads to better EU utilization. Increasing SIMD execution widths to 32 or 64 channels per thread, enables handling more vertices, patches, primitives and triangles per EU hardware thread. Modified 3D pipeline shader payloads can handle multiple patches in case of domain shaders or multiple primitives when primitive object instance count is greater than one in the case of geometry shaders and multiple triangles in case of pixel shaders.
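
The utilization argument in the abstract can be illustrated with a small sketch. This is not the patent's implementation: it assumes a hypothetical dispatcher that fills SIMD lanes with domain points, either starting a fresh thread for every patch or letting points from different patches share a thread. The names `pack_lanes` and `utilization`, the lane tag layout, and the patch sizes used below are all illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical lane tag: (patch_id, domain_point_index). Not from the patent.
Lane = Tuple[int, int]


def pack_lanes(patch_sizes: List[int], width: int, share_threads: bool) -> List[List[Lane]]:
    """Assign domain points to SIMD threads that each have `width` lanes.

    share_threads=False: a thread only holds points from a single patch.
    share_threads=True:  points from different patches may share a thread.
    """
    threads: List[List[Lane]] = []
    current: List[Lane] = []
    for patch_id, size in enumerate(patch_sizes):
        for point in range(size):
            if len(current) == width:
                # Thread is full: dispatch it and start a new one.
                threads.append(current)
                current = []
            current.append((patch_id, point))
        if not share_threads and current:
            # Without sharing, a patch boundary forces a partially filled thread.
            threads.append(current)
            current = []
    if current:
        threads.append(current)
    return threads


def utilization(threads: List[List[Lane]], width: int) -> float:
    """Fraction of available lanes that carry a domain point."""
    if not threads:
        return 0.0
    used = sum(len(t) for t in threads)
    return used / (len(threads) * width)


if __name__ == "__main__":
    sizes = [10, 10, 10, 10]  # assumed: four small patches, ten domain points each
    for share in (False, True):
        threads = pack_lanes(sizes, 32, share)
        print(share, len(threads), utilization(threads, 32))
```

With these assumed sizes, one patch per thread needs four SIMD32 threads that are each less than a third full, while sharing threads across patches fits the same 40 domain points into two threads.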

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: packing one of multiple vertices, patches, primitives or triangles in one graphics pipeline stage into one execution unit hardware thread.
2. The method of claim 1 including modifying the pipeline domain shader payload to handle multiple patches.
3. The method of claim 2 including: packing domain point data from different domain shader patches into one single instruction multiple data (SIMD) thread with each domain point occupying one SIMD lane; and storing an attribute for each domain point in its own partition in a register space addressable by a programmed thread.
4. The method of claim 1 including modifying the pipeline geometry shader payload to handle multiple primitives when primitive objects instance count is greater than one.
5. The method of claim 4 including replicating primitive unified return buffer handles into lanes containing an instance-ID of the primitive.
6. The method of claim 1 including modifying the pipeline pixel shader payload to handle multiple triangles.
7. The method of claim 6 including using barycentric parameters for attribute interpolation.
8. The method of claim 7 including delivering a payload to a pixel shader including barycentric parameters per pixel or per sample with a set of vertex attribute deltas per channel for each attribute.
9. The method of claim 1 including enabling attribute deltas from multiple triangles to be included in the same pixel shader payload.
10. The method of claim 1 including packing for an SIMD width of 32 channels per thread or higher.
11. One or more non-transitory computer readable media storing instructions to perform a sequence comprising: packing one of multiple vertices, patches, primitives or triangles in one graphics pipeline stage into one execution unit hardware thread.
12. The media of claim 11, further storing instructions to perform a sequence including modifying the pipeline domain shader payload to handle multiple patches.
13. The media of claim 12, further storing instructions to perform a sequence including: packing domain point data from different domain shader patches into one single instruction multiple data (SIMD) thread with each domain point occupying one SIMD lane; and storing an attribute for each domain point in its own partition in a register space addressable by a programmed thread.
14. The media of claim 11, further storing instructions to perform a sequence including modifying the pipeline geometry shader payload to handle multiple primitives when primitive objects instance count is greater than one.
15. The media of claim 14, further storing instructions to perform a sequence including replicating primitive unified return buffer handles into lanes containing an instance-ID of the primitive.
16. The media of claim 11, further storing instructions to perform a sequence including modifying the pipeline pixel shader payload to handle multiple triangles.
17. The media of claim 16, further storing instructions to perform a sequence including using barycentric parameters for attribute interpolation.
18. The media of claim 17, further storing instructions to perform a sequence including delivering a payload to a pixel shader including barycentric parameters per pixel or per sample with a set of vertex attribute deltas per channel for each attribute.
19. The media of claim 11, further storing instructions to perform a sequence including enabling attribute deltas from multiple triangles to be included in the same pixel shader payload.
20. The media of claim 11, further storing instructions to perform a sequence including packing for an SIMD width of 32 channels per thread or higher.
21. An apparatus comprising: a processor to pack one of multiple vertices, patches, primitives or triangles in one graphics pipeline stage into one execution unit hardware thread; and a memory coupled to said processor.
22. The apparatus of claim 21, said processor to modify the pipeline domain shader payload to handle multiple patches.
23. The apparatus of claim 22, said processor to pack domain point data from different domain shader patches into one single instruction multiple data (SIMD) thread with each domain point occupying one SIMD lane, and to store an attribute for each domain point in its own partition in a register space addressable by a programmed thread.
24. The apparatus of claim 21, said processor to modify the pipeline geometry shader payload to handle multiple primitives when primitive objects instance count is greater than one.
25. The apparatus of claim 24, said processor to replicate primitive unified return buffer handles into lanes containing an instance-ID of the primitive.
26. The apparatus of claim 21, said processor to modify the pipeline pixel shader payload to handle multiple triangles.
27. The apparatus of claim 26, said processor to use barycentric parameters for attribute interpolation.
28. The apparatus of claim 27, said processor to deliver a payload to a pixel shader including barycentric parameters per pixel or per sample with a set of vertex attribute deltas per channel for each attribute.
29. The apparatus of claim 21, said processor to enable attribute deltas from multiple triangles to be included in the same pixel shader payload.
30. The apparatus of claim 21, said processor to pack for an SIMD width of 32 channels per thread or higher.
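
The pixel shader claims (barycentric parameters per pixel plus vertex attribute deltas per channel, with deltas from multiple triangles in one payload) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed payload format: the page does not give the register layout, so the `AttributeDeltas` structure, the `(base, d1, d2)` delta convention, and the function names are hypothetical. The interpolation itself is the standard barycentric form `a0 + b1*(a1 - a0) + b2*(a2 - a0)`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AttributeDeltas:
    """Hypothetical per-triangle setup data for one scalar attribute."""
    base: float  # attribute value at vertex 0
    d1: float    # value at vertex 1 minus value at vertex 0
    d2: float    # value at vertex 2 minus value at vertex 0


def make_deltas(a0: float, a1: float, a2: float) -> AttributeDeltas:
    """Convert three vertex values into the assumed base-plus-deltas form."""
    return AttributeDeltas(base=a0, d1=a1 - a0, d2=a2 - a0)


def build_payload(
    lanes: List[Tuple[int, float, float]],
    triangles: Dict[int, AttributeDeltas],
) -> List[Tuple[float, float, AttributeDeltas]]:
    """Build a per-channel payload for one SIMD thread.

    Each lane is (triangle_id, b1, b2). The owning triangle's deltas are
    replicated into the lane, so pixels from different triangles can sit
    side by side in the same thread.
    """
    return [(b1, b2, triangles[tri_id]) for tri_id, b1, b2 in lanes]


def interpolate(payload: List[Tuple[float, float, AttributeDeltas]]) -> List[float]:
    """Evaluate the attribute in every lane from its own barycentrics and deltas."""
    return [d.base + b1 * d.d1 + b2 * d.d2 for b1, b2, d in payload]


if __name__ == "__main__":
    # Assumed example data: two triangles sharing one thread.
    triangles = {
        0: make_deltas(0.0, 1.0, 2.0),
        1: make_deltas(10.0, 20.0, 30.0),
    }
    lanes = [(0, 0.5, 0.25), (1, 0.25, 0.5), (0, 0.0, 0.0), (1, 1.0, 0.0)]
    print(interpolate(build_payload(lanes, triangles)))
```

Because every lane carries its own deltas, the shader body does not need to know which triangle a pixel came from; the same instruction sequence runs across all channels.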

Assignees

Inventors

Classifications

  • G06T1/20 (Primary)

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes · CPC title

  • G06T15/005 (Primary)

    General purpose rendering architectures · CPC title

  • Shading · CPC title

  • Concurrent instruction execution, e.g. pipeline or look ahead · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017178384A1 cover?
Reducing SIMD fragmentation for SIMD execution widths of 32 or even 64 channels in a single hardware thread leads to better EU utilization. Increasing SIMD execution widths to 32 or 64 channels per thread, enables handling more vertices, patches, primitives and triangles per EU hardware thread. Modified 3D pipeline shader payloads can handle multiple patches in case of domain shaders or multipl…
Who is the assignee on this patent?
Venkatesh Jayashree, Chen Gang, Raoux Thomas F, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 22, 2017 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).