Machine learning training architecture for programmable devices

US2025199762A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2025199762-A1
Application number  US-202519069183-A
Country             US
Kind code           A1
Filing date         Mar 3, 2025
Priority date       Mar 27, 2019
Publication date    Jun 19, 2025
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry. In some embodiments, the multiplication is implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry.
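
To make the systolic-array idea concrete, here is a minimal software sketch of an output-stationary array computing a matrix product. It is an illustration under stated assumptions, not the circuit the publication describes: the function name `systolic_matmul`, the skewed input timing, and the use of a single scalar multiply-accumulate per processing element (instead of the hybrid floating-point dot-product circuitry named in the abstract) are all hypothetical simplifications.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Assumption (not from the patent): processing element (i, j) holds one
    accumulator, row i of `a` enters from the left delayed by i cycles, and
    column j of `b` enters from the top delayed by j cycles. Operand pair k
    therefore meets at element (i, j) on cycle i + j + k.
    """
    n, k_dim, m = len(a), len(b), len(b[0])
    acc = [[0.0] * m for _ in range(n)]
    # The last pair reaches the far-corner element on cycle n + m + k_dim - 3.
    total_cycles = n + m + k_dim - 2
    for cycle in range(total_cycles):
        for i in range(n):
            for j in range(m):
                k = cycle - i - j
                if 0 <= k < k_dim:
                    # One multiply-accumulate per element per cycle.
                    acc[i][j] += a[i][k] * b[k][j]
    return acc


result = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The cycle loop only models when each operand pair arrives; a hardware array performs all element updates within a cycle in parallel.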

First claim

Opening claim text (preview).

What is claimed is:

1. An integrated circuit, comprising: a floating-point multiplier; a fixed-point multiplier; and an adder coupled to the floating-point multiplier and the fixed-point multiplier, wherein the adder generates output data based on receiving a first signal from the floating-point multiplier and a second signal from the fixed-point multiplier.
2. The integrated circuit of claim 1, wherein the floating-point multiplier comprises hard logic circuitry, and wherein the fixed-point multiplier comprises hard logic circuitry and soft logic circuitry.
3. The integrated circuit of claim 1, wherein the floating-point multiplier and the fixed-point multiplier receive input signals of a first floating-point format, and wherein the floating-point multiplier outputs signals in a second floating-point format that is different than the first floating-point format.
4. The integrated circuit of claim 3, wherein the fixed-point multiplier outputs signals in a third floating-point format that is different than the first and second floating-point formats.
5. The integrated circuit of claim 4, comprising a format conversion circuit coupled to the floating-point multiplier, wherein the format conversion circuit converts the first signal from the second floating-point format to the third floating-point format having a greater number of exponent bits than the first floating-point format.
6. The integrated circuit of claim 3, wherein the first floating-point format is a BFLOAT16 format having one sign bit, eight exponent bits, and at most seven fraction bits.
7. The integrated circuit of claim 4, wherein the adder generates an amount of truncation for the third floating-point format and the third floating-point format has an adjustable number of fraction bits.
8. The integrated circuit of claim 1, comprising interface circuitry configurable to receive a first data matrix and a second data matrix from off-chip memory.
9. The integrated circuit of claim 8, comprising a load circuit coupled to the interface circuitry, wherein the load circuit receives first matrix data and second matrix data.
10. The integrated circuit of claim 9, comprising a multiplier circuit configurable to generate the first signal and the second signal based on loading the first matrix data and the second matrix data in the floating-point multiplier and the fixed-point multiplier.
11. The integrated circuit of claim 10, wherein the multiplier circuit generates the first signal and the second signal based at least in part by: loading a first portion of the first matrix data and the second matrix data in the floating-point multiplier; and loading a second portion of the first matrix data and the second matrix data in the fixed-point multiplier.
12. The integrated circuit of claim 1, comprising accumulation storage to receive the output data.
13. The integrated circuit of claim 12, wherein the adder generates the output data based on feedback data from the accumulation storage.
14. The integrated circuit of claim 1, comprising circuitry to compensate a latency discrepancy between routing to the floating-point multiplier and routing to the fixed-point multiplier.
15. A machine learning training circuit, comprising: a load circuit configurable to receive, from off-chip memory, first matrix data and second matrix data; a multiplier circuit configurable to generate result data based on loading the first matrix data and the second matrix data in a floating-point multiplier and in a fixed-point multiplier; and a store circuit configurable to write, to the off-chip memory, the result data.
16. The machine learning training circuit of claim 15, comprising one or more systolic arrays, wherein the multiplier circuit is configurable to generate result data based on loading the first matrix data and the second matrix data in the floating-point multiplier and in the fixed-point multiplier using the one or more systolic arrays.
17. The machine learning training circuit of claim 15, wherein the multiplier circuit generates the result data based at least in part by: loading a first portion of the first matrix data and the second matrix data in the floating-point multiplier; and loading a second portion of the first matrix data and the second matrix data in the fixed-point multiplier.
18. Circuitry, comprising: a floating-point multiplier; a fixed-point multiplier; and one or more delay registers to delay first input data transmitted to the floating-point multiplier relative to second input data transmitted to the fixed-point multiplier.
19. The circuitry of claim 18, wherein a delay added via the one or more delay registers is configurable to compensate for a latency discrepancy between the floating-point multiplier and the fixed-point multiplier.
20. The circuitry of claim 18, comprising an adder coupled to the floating-point multiplier and the fixed-point multiplier, wherein the adder generates output data based on the first input data and the second input data.
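
The claims pair a floating-point multiplier with a fixed-point multiplier feeding a shared adder, and name BFLOAT16 (one sign bit, eight exponent bits, seven fraction bits) as an input format. The sketch below is one possible software reading of that arrangement, not the claimed circuit. The helper names, the even/odd split of operand pairs between the two paths, the truncating conversion, and the treatment of zeros and subnormals are all assumptions made for illustration; infinities and NaNs are not handled.

```python
import math
import struct


def to_bfloat16_bits(x: float) -> int:
    """Keep the top 16 bits of the float32 encoding (truncation, no rounding)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def from_bfloat16_bits(bits: int) -> float:
    """Expand 16 bfloat16 bits back to a Python float."""
    (x,) = struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))
    return x


def split_fields(bits: int):
    """Return (sign, exponent, fraction): 1, 8, and 7 bits respectively."""
    return bits >> 15, (bits >> 7) & 0xFF, bits & 0x7F


def fixed_point_product(a_bits: int, b_bits: int) -> float:
    """Multiply two normal bfloat16 values with an integer mantissa multiply."""
    sa, ea, fa = split_fields(a_bits)
    sb, eb, fb = split_fields(b_bits)
    if ea == 0 or eb == 0:
        return 0.0  # assumption: zeros and subnormals are flushed to zero
    mantissa = (0x80 | fa) * (0x80 | fb)  # 8-bit by 8-bit integer product
    exponent = (ea - 127) + (eb - 127) - 14  # 7 fraction bits per operand
    sign = -1.0 if sa ^ sb else 1.0
    return sign * math.ldexp(mantissa, exponent)


def hybrid_dot(xs, ys) -> float:
    """Sum products from two paths; the even/odd routing is hypothetical."""
    total = 0.0
    for idx, (x, y) in enumerate(zip(xs, ys)):
        xb, yb = to_bfloat16_bits(x), to_bfloat16_bits(y)
        if idx % 2 == 0:
            # "Floating-point multiplier" path: native float multiply.
            total += from_bfloat16_bits(xb) * from_bfloat16_bits(yb)
        else:
            # "Fixed-point multiplier" path: integer mantissa multiply.
            total += fixed_point_product(xb, yb)
    return total


dot = hybrid_dot([1.5, 2.0], [2.0, -3.0])
```

Both paths produce the same value for a given pair of normal bfloat16 inputs here; the point of the sketch is only that the mantissa product can be computed by an integer multiplier while the exponents are handled separately.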

Assignees

Altera Corp

Inventors

Classifications

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • with variable precision · CPC title

  • H03M7/24 · Primary

    Conversion to or from floating-point codes · CPC title

  • Half or full adders, i.e. basic adder cells for one denomination · CPC title

  • Activation functions · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Altera Corp
What technology area does this patent fall under?
Primary CPC classification H03M7/24. Mapped technology areas include Electricity.
When was this patent published?
Publication date Jun 19, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).