Compute optimization mechanism for deep neural networks

US12198221B2 · US · B2

Patent metadata
Publication number: US-12198221-B2
Application number: US-202418436494-A
Country: US
Kind code: B2
Filing date: Feb 8, 2024
Priority date: Apr 24, 2017
Publication date: Jan 14, 2025
Grant date: Jan 14, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
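
The arrangement in the abstract can be pictured with a small data model. The following is only a minimal sketch, not the patented implementation: the class names (`CoreType`, `CoreSet`, `Multiprocessor`), the core counts, and the channel indices are all hypothetical, chosen to show one register file holding mixed operand types and two sets of cores, each associated with its own memory channel.

```python
from dataclasses import dataclass, field
from enum import Enum


class CoreType(Enum):
    FIRST = "first"    # first type of processing core
    SECOND = "second"  # second type of processing core


@dataclass(frozen=True)
class CoreSet:
    core_type: CoreType
    core_count: int      # hypothetical size, not stated in the abstract
    memory_channel: int  # hypothetical index of the associated memory channel


@dataclass
class Multiprocessor:
    # Register file holding operands of several types, keyed by register name.
    register_file: dict = field(default_factory=dict)
    core_sets: tuple = ()

    def channel_for(self, core_type: CoreType) -> int:
        """Return the memory channel associated with the given core type."""
        for core_set in self.core_sets:
            if core_set.core_type is core_type:
                return core_set.memory_channel
        raise KeyError(core_type)


mp = Multiprocessor(
    register_file={"r0": 1.5, "r1": 7},  # mixed operand types (float, int)
    core_sets=(
        CoreSet(CoreType.FIRST, core_count=8, memory_channel=0),
        CoreSet(CoreType.SECOND, core_count=4, memory_channel=1),
    ),
)
print(mp.channel_for(CoreType.FIRST), mp.channel_for(CoreType.SECOND))
```

In this sketch the two core sets never share a channel index, which mirrors the abstract's statement that the first and second sets are associated with a first and a second memory channel respectively.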

First claim

Opening claim text (preview).

What is claimed is:

1. A graphics processing apparatus comprising: a system interconnect to a host processor; and a plurality of graphics processing clusters, each of the plurality of graphics processing clusters including a plurality of multiprocessors coupled via a crossbar interconnect, the crossbar interconnect to enable transfer of data from a first multiprocessor of the plurality of multiprocessors to a second multiprocessor of the plurality of multiprocessors, a graphics processing cluster of the plurality of graphics processing clusters including: a register file to store a plurality of different types of operands; a first plurality of processing resources of a first type configurable to process a first number of threads having operands stored in a first number of registers of the register file; and a second plurality of processing resources of a second type configurable to process a second number of threads having operands stored in a second number of registers of the register file, the first number of threads greater than the second number of threads and the second number of registers greater than the first number of registers.

2. The graphics processing apparatus of claim 1, wherein the first plurality of processing resources is configured to perform multi-dimensional matrix operations on the operands stored in the first number of registers.

3. The graphics processing apparatus of claim 1, wherein the second plurality of processing resources is configured to perform graphics operations on the operands stored in the second number of registers.

4. The graphics processing apparatus of claim 1, further comprising compute circuitry to select processing resources from the first plurality of processing resources and the second plurality of processing resources to execute a workload.

5. The graphics processing apparatus of claim 4, wherein the compute circuitry is to select processing resources of the first type to process a first type of application workload and to select the processing resources of the second type to process a second type of application workload.

6. The graphics processing apparatus of claim 1, further comprising a memory device coupled with the plurality of graphics processing clusters.

7. The graphics processing apparatus of claim 6, wherein the memory device includes a high bandwidth memory (HBM) including a plurality of memory channels.

8. The graphics processing apparatus of claim 7, wherein a first memory channel of the HBM is configured to couple with one or more processing resources of the first plurality of processing resources of a first type and a second memory channel of the HBM is configured to couple with one or more processing resources of the second plurality of processing resources of a second type, the second memory channel distinct from the first memory channel.

9. The graphics processing apparatus of claim 1, wherein the register file is configured to perform matrix-vector transformations.

10. The graphics processing apparatus of claim 1, further comprising a shared local memory (SLM) configured to perform matrix-vector transformations.

11. A method comprising: storing operands for a plurality of different types of operations to a register file of a graphics processor including a plurality of graphics processing clusters, each of the plurality of graphics processing clusters including a plurality of multiprocessors coupled via a crossbar interconnect, the crossbar interconnect to enable transfer of data from a first multiprocessor of the plurality of multiprocessors to a second multiprocessor of the plurality of multiprocessors; processing a first number of threads having operands stored in a first number of registers of the register file via a first plurality of processing resources of a first type; and processing second number of threads having operands stored in a second number of registers of the register file via a second plurality of processing resources of a second type, the first number of threads greater than the second number of threads and the second number of registers greater than the first number of registers.

12. The method of claim 11, comprising: performing multi-dimensional matrix operations on the operands stored in the first number of registers via the first plurality of processing resources; and performing graphics operations on the operands stored in the second number of registers via the second plurality of processing resources.

13. The method of claim 11, further comprising selecting processing resources to execute a workload via compute circuitry of the graphics processor, including selecting processing resources of the first type to process a first type of application workload and selecting the processing resources of the second type to process a second type of application workload.

14. The method of claim 11, further comprising performing matrix-vector transformations via the register file of the graphics processor.

15. The method of claim 11, further comprising performing matrix-vector transformations via shared local memory (SLM) of the graphics processor.

16. A graphics processing system comprising: a system interconnect to a host processor; a memory device coupled with the system interconnect; and a graphics processor coupled with the system interconnect and the memory device, the graphics processor including a plurality of graphics processing clusters, each of the plurality of graphics processing clusters including a plurality of multiprocessors coupled via a crossbar interconnect, the crossbar interconnect to enable transfer of data from a first multiprocessor of the plurality of multiprocessors to a second multiprocessor of the plurality of multiprocessors, a graphics processing cluster of the plurality of graphics processing clusters including: a register file to store a plurality of different types of operands; a first plurality of processing resources of a first type configurable to process a first number of threads having operands stored in a first number of registers of the register file; and a second plurality of processing resources of a second type configurable to process a second number of threads having operands stored in a second number of registers of the register file, the first number of threads greater than the second number of threads and the second number of registers from greater than the first number of registers.

17. The graphics processing system of claim 16, wherein the first plurality of processing resources is configured to perform multi-dimensional matrix operations on the operands stored in the first number of registers and the second plurality of processing resources is configured to perform graphics operations on the operands stored in the second number of registers.

18. The graphics processing system of claim 16, further comprising compute circuitry to select processing resources to execute a workload, the compute circuitry to select processing resources of the first type to process a first type of application workload and to select the processing resources of the second type to process a second type of application workload.

19. The graphics processing system of claim 16, wherein the memory device includes a high bandwidth memory (HBM) including a plurality of memory channels, a first memory channel of the HBM is configured to couple with one or more processing resources of the first plurality of processing resources of a first type, and a second memory channel of the HBM is configured to couple with one or mo
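
The thread and register relation recited in claim 1, together with the workload-based selection of claims 4 and 5, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the claimed circuitry: `ResourceConfig`, `satisfies_claim_1_relation`, `select_resources`, the workload labels `"matrix"` and `"graphics"`, and all numeric values are hypothetical. Mapping matrix workloads to the first type and graphics workloads to the second type is an assumption that combines claims 2, 3, and 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceConfig:
    name: str
    threads: int    # number of threads this resource type is configured to process
    registers: int  # number of registers holding those threads' operands


def satisfies_claim_1_relation(first: ResourceConfig, second: ResourceConfig) -> bool:
    """First type runs more threads; second type uses more registers."""
    return first.threads > second.threads and second.registers > first.registers


def select_resources(workload_kind: str,
                     first: ResourceConfig,
                     second: ResourceConfig) -> ResourceConfig:
    """Pick a resource type by workload kind (hypothetical labels)."""
    if workload_kind == "matrix":
        return first
    if workload_kind == "graphics":
        return second
    raise ValueError(f"unknown workload kind: {workload_kind!r}")


# Hypothetical numbers chosen only to satisfy the claimed inequalities.
matrix_units = ResourceConfig("first type", threads=64, registers=32)
graphics_units = ResourceConfig("second type", threads=16, registers=128)

print(satisfies_claim_1_relation(matrix_units, graphics_units))
print(select_resources("matrix", matrix_units, graphics_units).name)
```

The check is asymmetric by design: swapping the two configurations violates both inequalities, so the relation distinguishes which resource type plays the "first" role.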

Assignees

Intel Corp

Inventors

Classifications

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Distributed learning, e.g. federated learning

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12198221B2 cover?
Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first s…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 14, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).