What technology area does this patent fall under?

Primary CPC classification G06F9/5044. Mapped technology areas include Physics.

When was this patent published?

Publication date May 25, 2021 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Scalar core integration

US11016929B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11016929-B2
Application number	US-201916354782-A
Country	US
Kind code	B2
Filing date	Mar 15, 2019
Priority date	Mar 15, 2019
Publication date	May 25, 2021
Grant date	May 25, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
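The following minimal Python sketch makes the scalar/vector split in the abstract concrete. The abstract does not say how "suitability" is decided. This sketch assumes a simple rule: an operation touching fewer than two data lanes goes to the scalar processor complex, and everything else goes to the vector processor complex. The names `Op`, `partition`, `run_scalar`, `run_vector` and `dispatch` are all hypothetical. The two "execution" functions are stand-ins that only tag each operation with the complex it was sent to.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Op:
    """Hypothetical operation record; the patent does not define one."""
    name: str   # e.g. "push", "fma"
    width: int  # number of data lanes the op touches (assumed criterion)


def partition(ops: List[Op], simd_threshold: int = 2) -> Tuple[List[Op], List[Op]]:
    """Split ops into a scalar subset and a vector subset.

    Assumption: ops narrower than `simd_threshold` lanes are treated as
    suitable for the scalar complex; all others go to the vector complex.
    """
    scalar: List[Op] = []
    vector: List[Op] = []
    for op in ops:
        (scalar if op.width < simd_threshold else vector).append(op)
    return scalar, vector


def run_scalar(ops: List[Op]) -> List[str]:
    # Stand-in for the scalar processor complex: produces the first set of outputs.
    return [f"scalar:{op.name}" for op in ops]


def run_vector(ops: List[Op]) -> List[str]:
    # Stand-in for the vector processor complex: produces the second set of outputs.
    return [f"vector:{op.name}" for op in ops]


def dispatch(ops: List[Op]) -> Tuple[List[str], List[str]]:
    """Partition the workload, then assign each subset to its complex."""
    scalar_ops, vector_ops = partition(ops)
    return run_scalar(scalar_ops), run_vector(vector_ops)
```

For example, `dispatch([Op("push", 1), Op("fma", 16), Op("load", 1)])` returns `(["scalar:push", "scalar:load"], ["vector:fma"])`. A lane-count threshold is only one plausible heuristic. A real pre-processor could equally classify by opcode, as the sketch after the claims does.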

First claim

Opening claim text (preview).

The invention claimed is:

1. A general purpose graphics processing device comprising: a scalar processor complex comprising a plurality of scalar processors; a vector processor complex comprising a plurality of vector processors; a hardware accelerator bank comprising a plurality of specialized hardware accelerators; and a pre-processor communicably coupled to the scalar processor complex and the vector processor complex, the pre-processor to: receive a set of workload instructions for a graphics workload received at the graphics processing device from a host complex; determine, based on an analysis of a binary translation of the set of workload instructions, a first subset of operations in the set of operations that is suitable for execution by the scalar processor complex, a second subset of operations in the set of operations that is suitable for execution by the vector processor complex, and a third subset of operations in the set of operations that is suitable for execution by the hardware accelerator bank; assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs; assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs; and assign the third subset of operations to the hardware accelerator bank for execution to generate a third set of outputs.

2. The general purpose graphics processing device of claim 1, the processor to: continue to process workload instructions until the workload is finished.

3. The general purpose graphics processing device of claim 2, the processor to: store the first set of outputs and the second set of outputs in a local memory.

4. The general purpose graphics processing device of claim 3, the processor to: synchronize the local memory with a memory in the host complex after the workload is finished processing.

5. The general purpose graphics processing device of claim 3, the processor to: synchronize an execution marker with the host complex after the workload is finished processing.

6. The general purpose graphics processing device of claim 1, the processor to: receive a binary translation of a code; and divide the binary translation into a plurality of code segments.

7. The general purpose graphics processing device of claim 1, wherein the first subset of operations comprises at least one of a stack push operation, a stack pop operation, a register spill operation; a register fill operation, a read operation, or a write operation.

8. A computer-implemented method comprising: receiving, by a pre-processor of a graphics processing device, a set of workload instructions for a graphics workload from a host complex, wherein the pre-processor is communicably coupled to a scalar processor complex comprising a plurality of scalar processors, to a vector processor complex comprising a plurality of vector processors, and to a hardware accelerator bank comprising a plurality of specialized hardware accelerators in the graphics processing device; determining, by the pre-processor based on an analysis of a binary translation of the set of workload instructions, a first subset of operations in the set of operations that is suitable for execution by the scalar processor complex, a second subset of operations in the set of operations that is suitable for execution by the vector processor complex, and a third subset of operations in the set of operations that is suitable for execution by the hardware accelerator bank; assigning, by the pre-processor, the first subset of operations to the scalar processor complex for execution to generate a first set of outputs; assigning, by the pre-processor, the second subset of operations to the vector processor complex for execution to generate a second set of outputs; and assigning, by the pre-processor, the third subset of operations to the hardware accelerator bank for execution to generate a third set of outputs.

9. The method of claim 8, further comprising: continuing to process workload instructions until the workload is finished.

10. The method of claim 9, further comprising storing the first set of outputs and the second set of outputs in a local memory.

11. The method of claim 10, further comprising: synchronizing the local memory with a memory in the host complex after the workload is finished processing.

12. The method of claim 10, further comprising: synchronizing an execution marker with the host complex after the workload is finished processing.

13. The method of claim 8, further comprising: receiving a binary translation of a code; and dividing the binary translation into a plurality of code segments.

14. The method of claim 8, wherein the first subset of operations comprises at least one of a stack push operation, a stack pop operation, a register spill operation; a register fill operation, a read operation, or a write operation.

15. One or more non-transitory computer-readable medium comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: receive, by a pre-processor of a graphics processing device, a set of workload instructions for a graphics workload from a host complex, wherein the pre-processor is communicably coupled to a scalar processor complex comprising a plurality of scalar processors, to a vector processor complex comprising a plurality of vector processors, and to a hardware accelerator bank comprising a plurality of specialized hardware accelerators in the graphics processing device; determine, by the pre-processor based on an analysis of a binary translation of the set of workload instructions, a first subset of operations in the set of operations that is suitable for execution by the scalar processor complex, a second subset of operations in the set of operations that is suitable for execution by the vector processor complex, and a third subset of operations in the set of operations that is suitable for execution by the hardware accelerator bank; assign, by the pre-processor, the first subset of operations to the scalar processor complex for execution to generate a first set of outputs; assign, by the pre-processor, the second subset of operations to the vector processor complex for execution to generate a second set of outputs; and assign, by the pre-processor, the third subset of operations to the hardware accelerator bank for execution to generate a third set of outputs.

16. The computer-readable medium of claim 15, comprising one or more instructions that when executed on the at least one processor configure the at least one processor to: continue to process workload instructions until the workload is finished.

17. The computer-readable medium of claim 16, comprising one or more instructions that when executed on the at least one processor configure the at least one processor to: store the first set of outputs and the second set of outputs in a local memory.

18. The computer-readable medium of claim 17, comprising one or more instructions that when executed on the at least one processor configure the at least one processor to: synchronize the local memory with a memory in the host complex after the workload is finished processing.

19. The computer-readable medium of claim 17, comprising one or more instructions that when executed on the at least one processor configure the at least one processor to: synchronize an execution marker with the host complex after th
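The claims add a third target (the hardware accelerator bank) and several dependent steps:

- dividing a binary translation into code segments (claims 6 and 13);
- processing until the workload is finished (claims 2, 9 and 16);
- storing outputs in a local memory (claims 3, 10 and 17);
- synchronizing that memory with the host afterwards (claims 4, 11 and 18).

The minimal Python sketch below strings these steps together under stated assumptions. The binary translation is modelled as a list of operation-kind strings. The scalar operation kinds come from claims 7 and 14. The accelerator kinds, the segment size and every function name are hypothetical. For simplicity, the sketch also keeps accelerator outputs in local memory, although claims 3, 10 and 17 mention only the first and second sets of outputs.

```python
from typing import Dict, List

# Operation kinds that claims 7 and 14 list for the scalar processor complex.
SCALAR_KINDS = {"stack_push", "stack_pop", "register_spill",
                "register_fill", "read", "write"}
# Hypothetical accelerator-bound kinds; the claims do not name any.
ACCEL_KINDS = {"texture_sample", "ray_intersect"}


def classify(kind: str) -> str:
    """Return the target complex for one operation kind (assumed rule)."""
    if kind in SCALAR_KINDS:
        return "scalar"
    if kind in ACCEL_KINDS:
        return "accelerator"
    return "vector"  # default: treat remaining ops as data-parallel work


def segment(translated: List[str], size: int) -> List[List[str]]:
    """Divide a binary translation (a list of op kinds) into code segments."""
    return [translated[i:i + size] for i in range(0, len(translated), size)]


def process_workload(translated: List[str],
                     host_memory: Dict[str, List[str]],
                     segment_size: int = 4) -> None:
    """Route every op, buffer outputs locally, then sync to the host."""
    local_memory: Dict[str, List[str]] = {
        "scalar": [], "vector": [], "accelerator": []}
    # Keep processing segments until the whole workload is finished.
    for seg in segment(translated, segment_size):
        for kind in seg:
            target = classify(kind)
            # Stand-in output: record which complex handled which op.
            local_memory[target].append(f"{target}:{kind}")
    # After the workload finishes, synchronize local memory with host memory.
    host_memory.update(local_memory)
```

As a usage example, calling `process_workload(["stack_push", "fma", "texture_sample", "read", "mul"], host, segment_size=2)` on an empty dict `host` leaves `host` holding `["scalar:stack_push", "scalar:read"]` under `"scalar"`, `["vector:fma", "vector:mul"]` under `"vector"`, and `["accelerator:texture_sample"]` under `"accelerator"`. Deferring the host update until after the loop mirrors the claims' ordering, in which synchronization happens "after the workload is finished processing".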

Assignees

Intel Corp

Inventors

Classifications

Code	Role	CPC title
G06F9/5044	Primary	considering hardware capabilities
G06F9/3877		using a secondary processor, e.g. coprocessor (peripheral processor G06F13/12)
G06F15/8069	Primary	using a cache
G06F9/30163		with implied specifier, e.g. top of stack
G06F9/3836		Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution

Patent family

Related publications grouped by family.

Patent family ID: 69845534

Related patents

Method and Computing System for Handling Instruction Execution Using Affine Register File on Graphic Processing Unit

Instruction set for supporting wide scalar pattern matches

Speculative scalarization in vector processing

Scalarization of Vector Processing

Performing multi-convolution operations in a parallel processing system
