Intelligent thread dispatch and vectorization of atomic operations

US10346166B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10346166-B2
Application number	US-201715581080-A
Country	US
Kind code	B2
Filing date	Apr 28, 2017
Priority date	Apr 28, 2017
Publication date	Jul 9, 2019
Grant date	Jul 9, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.
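The clustering step in the abstract can be illustrated with a small sketch. It is not the patented mechanism: it assumes, purely for illustration, that each thread works on one surface tile identified by an (x, y) coordinate, that "adjacent surfaces" means tiles sharing an edge, and that compute blocks are numbered so that consecutive indices are physically adjacent. The names `cluster_adjacent`, `assign_blocks`, and `threads_per_block` are hypothetical.

```python
from collections import deque


def cluster_adjacent(threads):
    """Group threads whose tiles share an edge (assumed meaning of 'adjacent').

    `threads` maps a thread id to a hypothetical (x, y) tile coordinate.
    Returns a list of clusters, each a sorted list of thread ids.
    """
    by_tile = {}
    for tid, tile in threads.items():
        by_tile.setdefault(tile, []).append(tid)

    seen = set()
    clusters = []
    for start in sorted(by_tile):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        # Breadth-first walk over occupied tiles that touch each other.
        while queue:
            x, y = queue.popleft()
            members.extend(by_tile[(x, y)])
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nxt in by_tile and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(sorted(members))
    return clusters


def assign_blocks(clusters, threads_per_block):
    """Give each cluster its own contiguous run of compute-block indices."""
    placement = {}
    next_block = 0
    for cluster in clusters:
        for i, tid in enumerate(cluster):
            placement[tid] = next_block + i // threads_per_block
        # Advance past the blocks this cluster used (ceiling division).
        next_block += -(-len(cluster) // threads_per_block)
    return placement


# Threads 0-2 touch each other; threads 3-4 form a separate group.
threads = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (5, 5), 4: (5, 6)}
clusters = cluster_adjacent(threads)
placement = assign_blocks(clusters, threads_per_block=2)
print(clusters)
print(placement)
```

In this toy model the two groups never share a compute block, which loosely mirrors the idea of launching unrelated threads on a separate set of blocks.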

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: one or more processors including a graphics processor, the one or more processors to: detect a plurality of threads corresponding to a plurality of workloads associated with tasks relating to the one or more processors; determine a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and further to physically cluster the first set of threads close together using a first set of adjacent compute blocks; facilitate vectorized lock operations such that multiple operands for the first set of threads are one or more of locked, modified, and written back simultaneously; and read operand width and vector length from a data stream and set or clear lock bits to perform operations on vector data of the data stream based on the operand width and the vector length.
2. The apparatus of claim 1, wherein the one or more processors are further to determine a second set of threads of the plurality of threads that are disjoined or dissimilar to each other, where the second set of threads are launched on a second set of compute blocks to avoid address conflict with the first set of compute blocks.
3. The apparatus of claim 1, wherein the first and second sets of compute blocks are backed by shared resources containing cache to keep locality in a memory space or a pixel space to provide utilization for common areas.
4. The apparatus of claim 1, wherein the one or more processors are to prefetch data into one or more caches simultaneously as one or more of the plurality of threads are being loaded into a shader core.
5. The apparatus of claim 1, wherein the graphics processor is co-located with an application processor on a common semiconductor package.
6. A method comprising: detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor; determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces; physically clustering the first set of threads close together using a first set of adjacent compute blocks; facilitating vectorized lock operations such that multiple operands for the first set of threads are one or more of locked, modified, and written back simultaneously; reading operand width and vector length from a data stream; and setting or clearing lock bits to perform operations on vector data of the data stream based on the operand width and the vector length.
7. The method of claim 6, further comprising determining a second set of threads of the plurality of threads that are disjoined or dissimilar to each other, where the second set of threads are launched on a second set of compute blocks to avoid address conflict with the first set of compute blocks.
8. The method of claim 7, wherein the first and second sets of compute blocks are backed by shared resources containing cache to keep locality in a memory space or a pixel space to provide utilization for common areas.
9. The method of claim 6, further comprising prefetching data into one or more caches simultaneously as one or more of the plurality of threads are being loaded into a shader core.
10. The method of claim 6, wherein the graphics processor is co-located with an application processor on a common semiconductor package.
11. At least one non-transitory machine-readable medium comprising instructions that when executed by a computing device, cause the computing device to perform operations comprising: detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor; determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces; physically clustering the first set of threads close together using a first set of adjacent compute blocks; facilitating vectorized lock operations such that multiple operands for the first set of threads are one or more of locked, modified, and written back simultaneously; reading operand width and vector length from a data stream; and set or clear lock bits to perform operations on vector data of the data stream based on the operand width and the vector length.
12. The machine-readable medium of claim 11, wherein the operations further comprise: determining a second set of threads of the plurality of threads that are disjoined or dissimilar to each other, where the second set of threads are launched on a second set of compute blocks to avoid address conflict with the first set of compute blocks.
13. The machine-readable medium of claim 12, wherein the first and second sets of compute blocks are backed by shared resources containing cache to keep locality in a memory space or a pixel space to provide utilization for common areas.
14. The machine-readable medium of claim 11, wherein the operations further comprise prefetching data into one or more caches simultaneously as one or more of the plurality of threads are being loaded into a shader core.
15. The machine-readable medium of claim 11, wherein the graphics processor is co-located with an application processor on a common semiconductor package.
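The vectorized lock operation recited in the independent claims (read an operand width and vector length from a data stream, set lock bits, modify, write back, clear lock bits) can be pictured with a toy software model. This is only a sketch under stated assumptions, not the claimed hardware: the stream layout (a one-byte operand width, a one-byte vector length, then little-endian operands), the class `VectorLockMemory`, and the method `vector_add` are all hypothetical, and the "simultaneous" update is merely simulated by sequential Python code.

```python
import struct


class VectorLockMemory:
    """Toy memory with one lock bit per element slot (hypothetical layout)."""

    def __init__(self, slots):
        self.values = [0] * slots
        self.lock_bits = 0  # bit i set means slot i is locked

    def vector_add(self, base, stream):
        # Assumed header: operand width in bytes, then vector length.
        width, vlen = struct.unpack_from("<BB", stream, 0)
        fmt = {1: "B", 2: "H", 4: "I", 8: "Q"}[width]
        operands = struct.unpack_from(f"<{vlen}{fmt}", stream, 2)
        if base + vlen > len(self.values):
            raise ValueError("vector does not fit in memory")

        # One mask covers every lane touched by this operation.
        mask = ((1 << vlen) - 1) << base
        if self.lock_bits & mask:
            return False  # a lane is already locked; the caller may retry

        self.lock_bits |= mask  # set lock bits for all lanes together
        wrap = 1 << (8 * width)  # operands wrap at the stated width
        updated = [
            (self.values[base + i] + operands[i]) % wrap for i in range(vlen)
        ]
        self.values[base:base + vlen] = updated  # write back all lanes
        self.lock_bits &= ~mask  # clear lock bits
        return True


mem = VectorLockMemory(8)
# Width 4 bytes, vector length 4, operands 10, 20, 30, 40.
stream = struct.pack("<BB4I", 4, 4, 10, 20, 30, 40)
ok = mem.vector_add(2, stream)

# Pretend another agent holds slot 3: the whole vector operation is refused.
mem.lock_bits = 0b1000
blocked = mem.vector_add(2, stream)
print(ok, blocked, mem.values)
```

Treating the lanes as a single bit mask is what lets the model lock or refuse the whole vector in one step rather than lane by lane.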

Assignees

Intel Corp

Inventors

Classifications

G06N3/044: Recurrent networks, e.g. Hopfield networks
G06N3/048: Activation functions
G06N3/045: Combinations of networks
G06T1/60: Memory management
G06F9/3836: Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution

Patent family

Related publications grouped by family.

View patent family 61655692

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10346166B2 cover?: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are si…
Who is the assignee on this patent?: Intel Corp
What technology area does this patent fall under?: Primary CPC classification G06F9/3009. Mapped technology areas include Physics.
When was this patent published?: Publication date Jul 9, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Data processing systems

Performing multi-convolution operations in a parallel processing system

Thread scheduling across heterogeneous processing elements with resource mapping