Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06N3/084. Mapped technology areas include Physics.

When was this patent published?

Published Oct 14, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Neural network activation compression with narrow block floating-point

US12443848B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12443848-B2
Application number	US-201816237197-A
Country	US
Kind code	B2
Filing date	Dec 31, 2018
Priority date	Dec 31, 2018
Publication date	Oct 14, 2025
Grant date	Oct 14, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
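The conversion the abstract describes can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the 8-bit and 4-bit mantissa widths, and the choice of one shared exponent per block are all assumptions made for illustration, and the narrowing step goes through ordinary floats, which is only one of the possible conversion routes.

```python
import math

def bfp_quantize(values, mantissa_bits):
    """Quantize a block of floats to one shared exponent plus signed integer mantissas."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # Shared exponent e chosen so that max_abs < 2**e (frexp returns 0.5 <= m < 1).
    shared_exp = math.frexp(max_abs)[1]
    # One bit is reserved for the sign, leaving mantissa_bits - 1 magnitude bits.
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Python's round() rounds halves to even; hardware may use another rounding mode.
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return shared_exp, mantissas

def bfp_dequantize(shared_exp, mantissas, mantissa_bits):
    """Expand a block back to ordinary floats."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]

def compress(shared_exp, mantissas, wide_bits, narrow_bits):
    """Re-quantize a wide block to a narrower mantissa width (hypothetical helper)."""
    return bfp_quantize(bfp_dequantize(shared_exp, mantissas, wide_bits), narrow_bits)

activations = [0.75, -0.5, 0.1875, 0.0625]
wide = bfp_quantize(activations, 8)      # first, wider format
narrow = compress(*wide, 8, 4)           # second, narrower format kept in memory
restored = bfp_dequantize(*narrow, 4)    # what back propagation would read back
print(wide, narrow, restored)
```

In this sketch the narrow block keeps the large values exactly and loses the small ones (0.1875 comes back as 0.25 and 0.0625 as 0.0), which is the precision traded for a smaller memory footprint.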

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system comprising: one or more hardware processors; at least one memory coupled to the one or more hardware processors; and one or more computer-readable storage media storing computer-executable instructions, or hardware comprising logic implementing the computer-executable instructions, that, when executed by the computing system, cause the computing system to perform operations comprising: for each of multiple layers of a neural network comprising a plurality of layers: performing forward propagation for a given layer of the multiple layers of the neural network using a set of input data to produce activation values in a first quantized block floating-point format for the given layer, the first quantized block floating-point format having a first numerical precision; converting at least one of the produced activation values in the first quantized block floating-point format for the given layer to a second, different, quantized block floating-point format to produce compressed activation values for the given layer, the second quantized block floating-point format having a second numerical precision less than the first numerical precision; storing the compressed activation values to provide stored compressed activation values for the given layer; and for layers of the multiple layers that are not a final layer of the neural network, forward propagating the activation values in the first quantized block-floating point format for the given layer to a next layer of the multiple layers; calculating a measure of loss for an initial set of input data provided to the neural network based on results of forward propagation through the multiple layers using the activation values in the first quantized block floating-point format provided by an output layer of the neural network, wherein the output layer receives a set of input values of a prior layer of the multiple layers in the first quantized block floating-point format; retrieving the stored compressed activation values for the multiple layers; decompressing the stored compressed activation values for given layers of the multiple layers from the second quantized block floating-point format to the first block floating point format to provide decompressed activation values for the given layers of the multiple layers; and performing backpropagation for the multiple layers of the neural network using the measure of loss and respective decompressed activation values for respective given layers of the multiple layers.

2. The computing system of claim 1, wherein the second quantized block floating-point format has a lower-precision mantissa and/or a lower-precision exponent than the first quantized block floating-point format.

3. The computing system of claim 1, the operations further comprising: converting the activation values in the first quantized block floating-point format to a normal-precision format, producing converted normal-precision values; and converting the converted normal-precision values to the second quantized block floating-point format.

4. The computing system of claim 1, wherein the second quantized block floating-point format has a different sharing format of a common exponent than the first quantized block floating-point format, the different sharing format being different based on per-row, per-column, or per-tile sharing of a common exponent for the compressed activation values.

5. The computing system of claim 1, the operations further comprising: further compressing the compressed activation values prior to the storing by performing at least one or more of: entropy compression, zero compression, run length encoding, compressed sparse row compression, or compressed sparse column compression.

6. The computing system of claim 1, the operations further comprising: during the backpropagation, performing a gradient operation with the decompressed activation values for the respective given layers.

7. The computing system of claim 1, the operations further comprising: performing forward propagation for a layer of the plurality of layers, the layer of the plurality of layers not being a layer of the multiple layers, to provide further activation values in the first quantized block floating-point format; converting the further activation values to a third quantized block floating-point format, the third quantized block floating-point format having a numerical precision lower than the second numerical precision; and storing the further activation values in the third quantized block floating-point format for use during the backpropagation.

8. The computing system of claim 1, wherein: the one or more hardware processors comprise a first processor comprising at least one of: a tensor processing unit, a neural network accelerator, a graphics processing unit, or a processor implemented in a reconfigurable logic array; and the at least one memory comprises bulk memory, the bulk memory being situated on a different integrated circuit than the first processor.

9. The computing system of claim 8, wherein the bulk memory includes dynamic random access memory (DRAM) or embedded DRAM and the computing system further comprises a hardware accelerator including a memory temporarily storing the activation values for at least a portion of only one layer of the multiple layers of the neural network, the at least one memory comprising hardware accelerator memory, the hardware accelerator memory comprising static RAM (SRAM) or a register file.

10. A method, implemented in a computing environment comprising at least one hardware processor and at least one memory coupled to the at least one hardware processor, the method comprising: for each of multiple layers of a neural network comprising a plurality of layers: performing forward propagation for a given layer of the multiple layers of the neural network using a set of input data to produce activation values in a first quantized block floating-point format for the given layer, the first quantized block floating-point format having a first numerical precision; converting at least one of the produced activation values in the first quantized block floating-point format for the given layer to a second, different, quantized block floating-point format to produce compressed activation values for the given layer, the second quantized block floating-point format having a second numerical precision less than the first numerical precision; storing the compressed activation values to provide stored compressed activation values for the given layer; and for layers of the multiple layers that are not a final layer of the neural network, forward propagating the activation values in the first quantized block-floating point format for the given layer to a next layer of the multiple layers; calculating a measure of loss for an initial set of input data provided to the neural network based on results of forward propagation through the multiple layers using the activation values in the first quantized block floating-point format provided by an output layer of the neural network, wherein the output layer receives a set of input values of a prior layer of the multiple layers in the first quantized block floating-point format; retrieving the stored compressed activation values for the multiple layers; decompressing the stored compressed activation values for given layers of the multiple layers from the second quantized block floating-point format to the first quantized block floating point format to provide decompressed activation values for the given layers of the multiple layers; and performing backpropagation for the multiple layers of the neural network using the measure of los…
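The sequence recited in claim 1 (forward pass in a wider format, a compressed copy stored per layer, loss computed from the uncompressed output, decompression before backpropagation) can be sketched as a toy training step. Everything concrete here is an assumption for illustration and not taken from the patent: the layers are elementwise `relu(w * x)` with a single scalar weight, the loss is mean squared error, the mantissa widths are 8 and 4 bits, and `to_bfp`, `from_bfp`, and `train_step` are hypothetical names.

```python
import math

WIDE_BITS, NARROW_BITS = 8, 4  # hypothetical mantissa widths for the two formats

def to_bfp(values, bits):
    """Return (shared exponent, signed integer mantissas) for one block."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    exp = math.frexp(max_abs)[1]
    scale = 2.0 ** (bits - 1 - exp)
    limit = 2 ** (bits - 1) - 1
    return exp, [max(-limit, min(limit, round(v * scale))) for v in values]

def from_bfp(exp, mantissas, bits):
    """Expand a block back to ordinary floats."""
    scale = 2.0 ** (exp - (bits - 1))
    return [m * scale for m in mantissas]

def train_step(weights, inputs, targets):
    """One forward and backward pass over a chain of layers y = relu(w * x)."""
    stash = []  # compressed activations, one block per layer
    acts = inputs
    for w in weights:
        pre = [max(0.0, w * a) for a in acts]
        # Emulate the first (wider) format by rounding the layer output through it.
        acts = from_bfp(*to_bfp(pre, WIDE_BITS), WIDE_BITS)
        # Store a narrower copy; the wide values still feed the next layer.
        stash.append(to_bfp(acts, NARROW_BITS))
    n = len(acts)
    loss = sum((a - t) ** 2 for a, t in zip(acts, targets)) / n
    grad = [2.0 * (a - t) / n for a, t in zip(acts, targets)]
    weight_grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        # Decompress the stored activations needed by this layer's gradient.
        layer_out = from_bfp(*stash[i], NARROW_BITS)
        layer_in = inputs if i == 0 else from_bfp(*stash[i - 1], NARROW_BITS)
        grad = [g if y > 0.0 else 0.0 for g, y in zip(grad, layer_out)]  # ReLU mask
        weight_grads[i] = sum(g * x for g, x in zip(grad, layer_in))
        grad = [g * weights[i] for g in grad]  # gradient passed to the previous layer
    return loss, weight_grads

loss, grads = train_step([0.5, 2.0], [1.0, 0.5, 0.25, 0.125], [0.0] * 4)
print(loss, grads)
```

Because the stored copies are narrower than the values used in the forward pass, the weight gradients are approximations: in this run the smallest activation in each block decompresses to zero and drops out of the gradient sums.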

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

G06N5/046: Forward inferencing; Production systems
G06F9/5027: the resource being a machine, e.g. CPUs, Servers, Terminals
G06F9/30025: Format conversion instructions, e.g. Floating-Point to Integer, decimal conversion
G06F7/49915: Mantissa overflow or underflow in handling floating-point numbers
G06N20/00: Machine learning

Patent family

Related publications grouped by family.

View patent family 69160458
