Concurrent multi-datatype execution within a processing resource

US12175252B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12175252-B2
Application number: US-202217839856-A
Country: US
Kind code: B2
Filing date: Jun 14, 2022
Priority date: Apr 24, 2017
Publication date: Dec 24, 2024
Grant date: Dec 24, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread architecture, the general-purpose graphics compute unit to concurrently execute the first instruction and the second instruction.
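
As a rough illustration of the concurrency the abstract describes, the toy Python model below issues at most one floating-point operation and one integer operation in the same cycle. It is a sketch under stated assumptions, not the patented design: the function name `run_dual_issue`, the queue-based issue model, and the choice of a multiply for the floating-point unit and an add for the integer unit are all hypothetical.

```python
from collections import deque


def run_dual_issue(fp_ops, int_ops):
    """Toy model (assumption): each cycle issues at most one floating-point
    operation and one integer operation, so the two streams overlap."""
    fp_queue, int_queue = deque(fp_ops), deque(int_ops)
    trace = []
    cycle = 0
    while fp_queue or int_queue:
        issued = {}
        if fp_queue:
            a, b = fp_queue.popleft()
            issued["fp"] = float(a) * float(b)  # hypothetical floating-point unit
        if int_queue:
            a, b = int_queue.popleft()
            issued["int"] = int(a) + int(b)  # hypothetical integer unit
        trace.append((cycle, issued))
        cycle += 1
    return trace


# Two floating-point operations and three integer operations.
trace = run_dual_issue([(1.5, 2.0), (0.5, 4.0)], [(3, 4), (10, -2), (7, 7)])
for cycle, issued in trace:
    print(cycle, issued)
```

In this model the five operations complete in three cycles, because the floating-point and integer streams share cycles rather than waiting on each other.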

First claim

Opening claim text (preview).

What is claimed is:
1. A graphics processing unit (GPU) comprising: a processing cluster comprising a plurality of multiprocessors interconnected via a data crossbar, the plurality of multiprocessors configured to distribute processed data among the plurality of multiprocessors directly via the data crossbar, from a first multiprocessor of the plurality of multiprocessors to a second multiprocessor of the plurality of multiprocessors, wherein a multiprocessor of the plurality of multiprocessors comprises: an instruction cache to store a first instruction and a second instruction, the first instruction to cause the multiprocessor to perform a floating-point operation and the second instruction to cause the multiprocessor to perform an integer operation; and a plurality of general-purpose graphics compute units having a single instruction, multiple thread architecture, the plurality of general-purpose graphics compute units including a first general-purpose graphics compute unit to execute the first instruction concurrently with execution of the second instruction by a second general-purpose graphics compute unit.
2. The GPU as in claim 1, wherein the floating-point operation includes a 32-bit floating point input operand and the integer operation includes a 32-bit integer input operation.
3. The GPU as in claim 1, wherein the floating-point operation includes a 16-bit floating point input operand and the integer operation includes a 16-bit integer input operation.
4. The GPU as in claim 3, wherein the 16-bit floating point input operand is in a half-precision floating point format.
5. The GPU as in claim 1, additionally including a scheduler to independently schedule threads of the first instruction and the second instruction to the multiprocessor, wherein threads of the first instruction and the second instruction have independent thread state.
6. The GPU as in claim 1, wherein the instruction cache is to store a third instruction, the third instruction to cause the GPU to perform a mixed precision matrix multiply operation via the multiprocessor.
7. The GPU as in claim 6, the third instruction is to cause the multiprocessor to compute a 32-bit product from two or more 8-bit integer operands in response to the third instruction.
8. The GPU as in claim 6, wherein the third instruction is to cause the multiprocessor to compute a 32-bit product from two or more 16-bit floating-point operands in response to the third instruction.
9. The GPU as in claim 8, wherein the two or more 16-bit floating-point operands are half-precision floating-point operands.
10. A data processing system comprising: a memory device configured to store instructions; and a graphics processing unit (GPU) to execute the instructions, the instructions including a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, and the second instruction to cause the GPU to perform an integer operation, the GPU including: a processing cluster comprising a plurality of multiprocessors interconnected via a data crossbar, the plurality of multiprocessors configured to distribute processed data among the plurality of multiprocessors directly via the data crossbar, from a first multiprocessor of the plurality of multiprocessors to a second multiprocessor of the plurality of multiprocessors, wherein a multiprocessor of the plurality of multiprocessors includes a plurality of general-purpose graphics compute units having a single instruction, multiple thread architecture, the plurality of general-purpose graphics compute units including a first general-purpose graphics compute unit to execute the first instruction concurrently with execution of the second instruction by a second general-purpose graphics compute unit.
11. The data processing system as in claim 10, wherein the floating-point operation includes a 32-bit floating point input operand and the integer operation includes a 32-bit integer input operation.
12. The data processing system as in claim 10, wherein the floating-point operation includes a 16-bit floating point input operand and the integer operation includes a 16-bit integer input operation.
13. The data processing system as in claim 12, wherein the 16-bit floating point input operand is in a half-precision floating point format.
14. The data processing system as in claim 10, additionally including a scheduler to independently schedule threads of the first instruction and the second instruction to the multiprocessor, wherein threads of the first instruction and the second instruction have independent thread state.
15. The data processing system as in claim 10, wherein the instructions include a third instruction, the third instruction to cause the GPU to perform a mixed precision matrix multiply operation.
16. The data processing system as in claim 15, wherein the third instruction is to cause the multiprocessor to compute a 32-bit product from two or more 8-bit integer operands in response to the third instruction.
17. The data processing system as in claim 15, wherein the third instruction is to cause the multiprocessor to compute a 32-bit product from two or more 16-bit floating-point operands in response to the third instruction.
18. The data processing system as in claim 17, wherein the two or more 16-bit floating-point operands are half-precision floating-point operands.
19. A method comprising: decoding a first instruction and a second instruction on a graphics processing unit (GPU), the GPU having a single instruction, multiple thread architecture; independently scheduling multiple threads of the first instruction and the second instruction, the multiple threads of the first instruction and the second instruction having independent thread state; simultaneously executing the first instruction and the second instruction on a first multiprocessor of the GPU, wherein executing the first instruction includes performing a floating-point operation and executing the second instruction includes performing an integer operation; and distributing a result of the first instruction and the second instruction to a second multiprocessor directly via a data crossbar that couples the first multiprocessor with the second multiprocessor, the first multiprocessor and the second multiprocessor included within a processing cluster comprising a plurality of multiprocessors.
20. The method as in claim 19, wherein the floating-point operation includes a 32-bit floating point input operand and the integer operation includes a 32-bit integer input operation.
21. The method as in claim 19, wherein the floating-point operation includes a 16-bit floating point input operand and the integer operation includes a 16-bit integer input operation.
22. The method as in claim 21, wherein the 16-bit floating point input operand is in a half-precision floating point format.
23. The method as in claim 19, additionally comprising: decoding a third instruction, the third instruction to cause the GPU to perform a mixed precision matrix multiply operation; and in response to the third instruction, computing a 32-bit product from two or more 8-bit integer operands or two or more 16-bit floating-point operands.
24. The method as in claim 23, wherein the two or more 16-bit floating-point operands are half-precision floating-point operands.
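
Several claims recite a mixed precision matrix multiply that computes a 32-bit product from 8-bit integer operands or from 16-bit floating-point operands. The claims do not spell out the arithmetic, so the Python sketch below is only one plausible reading: narrow operands are multiplied and accumulated into a 32-bit result, as in one element of a matrix product. All helper names (`to_fp16`, `to_fp32`, `wrap_int32`, `dot_int8_to_int32`, `dot_fp16_to_fp32`) are hypothetical, and the wraparound and rounding behavior are assumptions rather than anything stated in the patent.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def wrap_int32(x):
    """Wrap an integer into signed 32-bit range (assumed overflow behavior)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def dot_int8_to_int32(a, b):
    """Multiply 8-bit integer operands and accumulate in a 32-bit integer."""
    for v in (*a, *b):
        if not -128 <= v <= 127:
            raise ValueError("operand does not fit in a signed 8-bit integer")
    acc = 0
    for x, y in zip(a, b):
        acc = wrap_int32(acc + x * y)
    return acc


def dot_fp16_to_fp32(a, b):
    """Multiply half-precision operands and accumulate in single precision."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + to_fp16(x) * to_fp16(y))
    return acc


int_result = dot_int8_to_int32([127, 127, -128, 1], [127, 127, -128, 1])
fp_result = dot_fp16_to_fp32([1.5, 2.0], [2.0, 0.25])
print(int_result, fp_result)
```

The integer example sums to 48643, which already exceeds the signed 16-bit range; this is the usual motivation for pairing narrow operands with a wider accumulator.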

Assignees

Intel Corp

Inventors

Classifications

  • Quantised networks; Sparse networks; Compressed networks

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

  • Distributed learning, e.g. federated learning

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12175252B2 cover?
One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an intege…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 24, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).