Compute optimization mechanism for deep neural networks

US11922535B2 · US · B2

Patent metadata
Publication number: US-11922535-B2
Application number: US-202318168207-A
Country: US
Kind code: B2
Filing date: Feb 13, 2023
Priority date: Apr 24, 2017
Publication date: Mar 5, 2024
Grant date: Mar 5, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
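The arrangement described in the abstract can be pictured with a small data-structure sketch. This is only an illustrative software model, not the patented hardware or any vendor API: the names `CoreType`, `CoreSet`, `Multiprocessor`, and `build_example`, the channel indices, and the register ranges are all hypothetical. The check that the two channels differ reflects the "distinct" wording of claim 1 rather than the abstract itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class CoreType(Enum):
    MATRIX = "matrix"  # "first type": multi-dimensional matrix operations
    GPGPU = "gpgpu"    # "second type": general-purpose GPU operations


@dataclass
class CoreSet:
    core_type: CoreType
    memory_channel: int  # hypothetical channel index
    registers: range     # hypothetical slice of the shared register file


@dataclass
class Multiprocessor:
    register_file: list
    core_sets: dict = field(default_factory=dict)

    def add_core_set(self, core_set: CoreSet) -> None:
        # Each core set is tied to its own memory channel (assumed distinct).
        for other in self.core_sets.values():
            if other.memory_channel == core_set.memory_channel:
                raise ValueError("core sets must use distinct memory channels")
        self.core_sets[core_set.core_type] = core_set

    def operands(self, core_type: CoreType) -> list:
        # Return the operands held in the registers assigned to this core set.
        core_set = self.core_sets[core_type]
        return [self.register_file[i] for i in core_set.registers]


def build_example() -> Multiprocessor:
    # 16 registers holding placeholder values 100..115.
    mp = Multiprocessor(register_file=list(range(100, 116)))
    mp.add_core_set(CoreSet(CoreType.MATRIX, memory_channel=0, registers=range(0, 8)))
    mp.add_core_set(CoreSet(CoreType.GPGPU, memory_channel=1, registers=range(8, 16)))
    return mp
```

Calling `build_example().operands(CoreType.MATRIX)` returns the values in the first eight registers, while the GPGPU set reads the remaining eight. Both sets share one register file but are bound to different channels.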

First claim

Opening claim text (preview).

What is claimed is:

1. A graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including: a register file to store a plurality of different types of operands; and a plurality of processing cores, including: a first set of processing cores of a first type to perform multi-dimensional matrix operations on a first set of operands in a first set of registers of the register file, wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file and the first set of processing cores of the first type are associated with a first memory channel of a memory device coupled with the at least one of the one or more multiprocessors; and a second set of processing cores of a second type, the second set of processing cores being different from the first set of processing cores, the second set of processing cores to perform general purpose graphics processing unit (GPGPU) operations on a second set of operands in a second set of registers of the register file, wherein the second set of processing cores of the second type are associated with a second memory channel of the memory device coupled with the at least one of the one or more multiprocessors, the second memory channel is distinct from the first memory channel, and the memory device is external to the at least one of the one or more multiprocessors.

2. The graphics processing unit as in claim 1, wherein the second set of processing cores comprises: a set of floating point units (FPUs) to execute instructions to perform floating point operations, the set of FPUs to perform 32-bit floating point (FP32) operations and 16-bit floating point (FP16) operations; and a set of integer units to execute instructions to perform integer operations.

3. The graphics processing unit as in claim 2, wherein the set of FPUs includes first FPUs to perform the FP32 operations and second FPUs to perform the FP16 operations.

4. The graphics processing unit as in claim 1, wherein the first set of operands include one or more 64-bit operands.

5. The graphics processing unit as in claim 1, wherein the first set of processing cores of the first type is configured to perform an in-place matrix to vector transformation for a first type of operand stored in the register file, the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination within the register file, wherein the source includes a register address start limit, stride, number of elements, and element size.

6. The graphics processing unit as in claim 1, wherein the one or more multiprocessors have a single instruction multiple thread (SIMT) architecture.

7. A method to facilitate processing of data at a graphics processing unit (GPU) including one or more multiprocessors, the method comprising: receiving, at a first set of processing cores of a first type, a first set of operands from first registers of a register file, wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file; receiving, at a second set of processing cores of a second type, a second set of operands from second registers of the register file, the second set of processing cores being different from the first set of processing cores; performing multi-dimensional matrix math operations on the first set of operands at the first set of processing cores of the first type, wherein the first set of processing cores of the first type are associated with a first memory channel of a memory device coupled with at least one of the one or more multiprocessors; and performing general-purpose graphics processing unit (GPGPU) operations on the second set of operands at the second set of processing cores, wherein the second set of processing cores of the second type are associated with a second memory channel of the memory device coupled with the at least one of the one or more multiprocessors, the second memory channel is distinct from the first memory channel, and the memory device is external to the at least one of the one or more multiprocessors.

8. The method as in claim 7, wherein performing the GPGPU operations at the second set of processing cores comprises: executing instructions at a set of floating point units (FPUs) to perform floating point operations, wherein executing the instructions at the set of FPUs comprises performing 32-bit floating point (FP32) operations and 16-bit floating point (FP16) operations; and executing instructions at a set of integer units to perform integer operations.

9. The method as in claim 8, further comprising performing the FP32 operations at first FPUs of the set of FPUs and performing the FP16 operations at second FPUs of the set of FPUs.

10. The method as in claim 7, wherein the first set of operands include one or more 64-bit operands.

11. The method as in claim 7, further comprising performing an in-place matrix to vector transformation for a first type of operand stored in the register file via the first set of processing cores of the first type, wherein the in-place matrix to vector transformation includes a set of operations having a source and destination, the source and destination are within the register file, and the source includes a register address start limit, stride, number of elements, and element size.

12. The method as in claim 7, wherein the GPU includes one or more multiprocessors comprising the first set of processing cores of the first type, the second set of processing cores of the second type, and an instruction cache to store a first instruction associated with the first set of operands and a second instruction associated with the second set of operands, wherein the one or more multiprocessors have a single instruction multiple thread (SIMT) architecture.

13. A graphics processing system comprising: a graphics processing unit comprising one or more multiprocessors having a single instruction multiple thread (SIMT) architecture, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores, including: a first set of processing cores of a first type to perform multi-dimensional matrix operations on a first set of operands in a first set of registers of the register file, wherein the first set of processing cores of the first type includes circuitry to execute instructions to perform matrix operations on the first set of operands in the first set of registers of the register file and the first set of processing cores of the first type are associated with a first memory channel of a memory device coupled with the at least one of the one or more multiprocessors; and a second set of processing cores of a second type, the second set of processing cores being different from the first set of processing cores, the second set of processing cores to perform general purpose graphics processing unit (GPGPU) operations on a second set of operands in a second set of registers of the register file, wherein the second set of processing cores of the second type are associated with a second memory channel of the memory device coupled with the at least one of the one or more multiprocessors, the second memory channel is distinct from the first memory channel, and the memory device is external to the at least one of the one or more multiprocessors.
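Claims 5 and 11 describe an in-place matrix to vector transformation whose source is given by a start address, stride, number of elements, and element size, with source and destination both inside the register file. The sketch below is one possible software reading of that descriptor, not the claimed circuitry: the register file is modeled as a flat `bytearray`, and `SourceDescriptor`, `matrix_to_vector`, and `dst_start` are hypothetical names. The claim's phrase "register address start limit" is ambiguous, so this sketch assumes a single starting byte offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDescriptor:
    start: int         # byte offset of the first source element (assumed meaning)
    stride: int        # bytes between consecutive source elements
    num_elements: int  # how many elements to gather
    element_size: int  # bytes per element


def matrix_to_vector(register_file: bytearray, src: SourceDescriptor, dst_start: int) -> None:
    """Gather strided elements and pack them contiguously at dst_start."""
    last_src = src.start + max(src.num_elements - 1, 0) * src.stride + src.element_size
    dst_end = dst_start + src.num_elements * src.element_size
    if src.start < 0 or dst_start < 0 or max(last_src, dst_end) > len(register_file):
        raise IndexError("descriptor reaches outside the register file")

    # Read every source element before writing, so overlapping ranges stay correct.
    elements = []
    for i in range(src.num_elements):
        offset = src.start + i * src.stride
        elements.append(bytes(register_file[offset:offset + src.element_size]))

    # Write the gathered elements back into the same register file, densely packed.
    for i, element in enumerate(elements):
        offset = dst_start + i * src.element_size
        register_file[offset:offset + src.element_size] = element
```

As a usage example, a 4x4 matrix of one-byte elements stored row-major in bytes 0 to 15 can have its second column extracted with `SourceDescriptor(start=1, stride=4, num_elements=4, element_size=1)`. The stride equals the row length, so successive reads step down the column.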

Assignees

Inventors

Classifications

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Distributed learning, e.g. federated learning

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11922535B2 cover?
Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first s…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 5, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).