Methods and hardware for inter-layer data format conversion in neural networks

US12572785B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12572785-B2
Application number: US-202217860439-A
Country: US
Kind code: B2
Filing date: Jul 8, 2022
Priority date: Jul 8, 2022
Publication date: Mar 10, 2026
Grant date: Mar 10, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to a method of inter-layer format conversion for a neural network, the neural network comprising at least two computation layers including a first layer to process first data in a first data format and a second layer to process second data in a second data format, the method comprising: extracting data statistics from data output by the first layer, said data statistics being representative of the data output by the first layer; determining one or more conversion parameters based on the extracted data statistics and the second data format; and generating the second data for the second layer by modifying said data output by the first layer using the one or more conversion parameters.
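The three steps in the abstract (extract statistics, determine conversion parameters, generate the second data) can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the first layer emits floating-point values, the second layer expects signed 8-bit integers, the statistics are a minimum and maximum, and the conversion parameters are a scale factor and zero point. The names `ConversionParams`, `extract_statistics`, `determine_conversion_params`, and `convert` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ConversionParams:
    """Hypothetical container for the conversion parameters."""
    scale: float     # real-value width of one integer step
    zero_point: int  # integer code that represents the real value 0.0


def int_range(num_bits: int) -> Tuple[int, int]:
    """Representable range of a signed integer with num_bits bits."""
    return -(1 << (num_bits - 1)), (1 << (num_bits - 1)) - 1


def extract_statistics(values: Sequence[float]) -> Tuple[float, float]:
    """Step 1: statistics representative of the first layer's output (min, max)."""
    return min(values), max(values)


def determine_conversion_params(lo: float, hi: float, num_bits: int = 8) -> ConversionParams:
    """Step 2: derive scale and zero point from the statistics and the target format."""
    qmin, qmax = int_range(num_bits)
    # Assumption: widen the range so that 0.0 is always exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    if hi == lo:
        return ConversionParams(scale=1.0, zero_point=0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return ConversionParams(scale, max(qmin, min(qmax, zero_point)))


def convert(values: Sequence[float], params: ConversionParams, num_bits: int = 8) -> List[int]:
    """Step 3: generate the second data by modifying the first layer's output."""
    qmin, qmax = int_range(num_bits)
    return [
        max(qmin, min(qmax, round(v / params.scale) + params.zero_point))
        for v in values
    ]


first_layer_output = [-1.0, 0.0, 0.5, 3.0]
lo, hi = extract_statistics(first_layer_output)
params = determine_conversion_params(lo, hi, num_bits=8)
second_layer_input = convert(first_layer_output, params, num_bits=8)
print(params, second_layer_input)
```

Because the range is derived from the data actually produced by the first layer, the full 8-bit range is used for that data rather than for a fixed, possibly much wider, range chosen ahead of time.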

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of operation of a neural network hardware accelerator, the method comprising: executing a neural network on the hardware accelerator, the neural network comprising at least two computation layers including a first layer to process first data in a first data format and a second layer to process second data in a second data format, the second data format differing from the first data format; extracting, by the hardware accelerator, data statistics from data output by the first layer, said data statistics being representative of the data output by the first layer; determining, by the hardware accelerator, one or more conversion parameters based on the extracted data statistics and the second data format; generating, by the hardware accelerator, the second data for the second layer by modifying said data output by the first layer using the one or more conversion parameters; and providing, by the hardware accelerator, the generated second data to the second layer for processing.

2. The method of claim 1, further comprising obtaining, by the hardware accelerator, one or more format parameters representative of the second data format, wherein determining one or more conversion parameters is performed using the one or more format parameters.

3. The method of claim 2, wherein said one or more format parameters comprises one or more of: a bit size, a precision, fixed point, floating point.

4. The method of claim 1, wherein the first data format is a first number format and the second data format is a second number format.

5. The method of claim 4, wherein the first number format or the second number format comprises an integer format, a floating point format, or a block floating point format.

6. The method of claim 4, wherein modifying said data output by the first layer using the one or more conversion parameters comprises converting said data output by the first layer from a first precision to a second precision.

7. The method of claim 4, wherein modifying said data output by the first layer using the one or more conversion parameters comprises converting a size of said data output by the first layer from a first number of bits to a second number of bits.

8. The method of claim 1, wherein said data statistics comprises one or more of: a mean, a variance, a minimum, a maximum, or a combination thereof.

9. The method of claim 1, wherein said one or more conversion parameters comprise one or more of: an exponent of said data output by the first layer, a scale factor between said first data and said second data, a zero point, an indication of linearity or non-linearity, an indication of exponent bias, or a combination thereof.

10. The method of claim 1, further comprising generating, by the hardware accelerator, subsequent second data for the second layer by modifying subsequent data output by the first layer using at least one of the one or more conversion parameters.

11. A non-transitory computer-readable medium comprising machine-readable code which, when executed by a processor of a neural network hardware accelerator, causes the processor to perform a method comprising: executing a neural network, the neural network comprising at least two computation layers including a first layer to process first data in a first data format and a second layer to process second data in a second data format, the second data format differing from the first data format; extracting data statistics from data output by the first layer, said data statistics being representative of the data output by the first layer; determining one or more conversion parameters based on the extracted data statistics and the second data format; generating the second data for the second layer by modifying said data output by the first layer using the one or more conversion parameters; and providing the generated second data to the second layer for processing.

12. Hardware for executing a neural network, the neural network comprising at least two computation layers including a first layer to process first data in a first data format and a second layer to process second data in a second data format, the second data format differing from the first data format, the hardware further comprising: statistics extraction circuitry configured for extracting data statistics from data output by the first layer, said data statistics being representative of the data output by the first layer; format deduction circuitry configured for determining one or more conversion parameters based on the extracted data statistics and the second data format; and modification circuitry configured for generating the second data for the second layer by modifying said data output by the first layer using the one or more conversion parameters and for providing the generated second data to the second layer for processing.

13. The hardware of claim 12, wherein said format deduction circuitry is further configured to obtain one or more format parameters representative of the second data format, and said format deduction module is configured to determine said one or more conversion parameters using the one or more format parameters.

14. The hardware of claim 13, wherein said one or more format parameters comprises one or more of: a bit size, a precision, fixed point, floating point.

15. The hardware of claim 12, wherein the first data format is a first number format and the second data format is a second number format.

16. The hardware of claim 15, wherein said modification module modifies said data output by the first layer using the one or more conversion parameters by converting said data output by the first layer from a first precision to a second precision.

17. The hardware of claim 12, wherein said data statistics comprises one or more of: a mean, a variance, a minimum, a maximum, or a combination thereof.

18. The hardware of claim 12, wherein said one or more conversion parameters comprise one or more of: an exponent of said data output by the first layer, a scale factor between said first data and said second data, or a combination thereof.

19. A method of quantization-aware training of a neural network executing on a hardware accelerator, the neural network comprising at least two computation layers including a first layer to process first data in a first data format and a second layer to process second data in a second data format, the second data format differing from the first data format, the method comprising: extracting, by the hardware accelerator, data statistics from data output by the first layer, said data statistics being representative of the data output by the first layer; determining, by the hardware accelerator, one or more conversion parameters based on the extracted data statistics and the second data format; generating, by the hardware accelerator, the second data for the second layer by modifying said data output by the first layer using the one or more conversion parameters; and providing, by the hardware accelerator, the generated second data to the second layer for processing.
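The claims name an exponent as one possible conversion parameter, a block floating point format as one possible number format, and the reuse of conversion parameters for subsequent first-layer output. The Python sketch below shows how those pieces could fit together. It is an illustration under stated assumptions, not the claimed circuitry: the statistic is the maximum absolute value, the target is an 8-bit signed mantissa with one shared power-of-two exponent, and out-of-range values saturate. The function names `shared_exponent` and `to_mantissas` are hypothetical.

```python
import math
from typing import List, Sequence


def shared_exponent(max_abs: float, mantissa_bits: int = 8) -> int:
    """Choose an exponent e so that max_abs / 2**e fits a signed mantissa.

    math.frexp returns (m, k) with max_abs == m * 2**k and 0.5 <= m < 1,
    so max_abs < 2**k and max_abs / 2**(k - (mantissa_bits - 1)) < 2**(mantissa_bits - 1).
    """
    if max_abs == 0.0:
        return 0
    _, k = math.frexp(max_abs)
    return k - (mantissa_bits - 1)


def to_mantissas(values: Sequence[float], exponent: int, mantissa_bits: int = 8) -> List[int]:
    """Convert values to integer mantissas that share one exponent, saturating on overflow."""
    qmin = -(1 << (mantissa_bits - 1))
    qmax = (1 << (mantissa_bits - 1)) - 1
    return [
        max(qmin, min(qmax, round(math.ldexp(v, -exponent))))  # v * 2**(-exponent)
        for v in values
    ]


# First batch: extract the statistic and determine the conversion parameter.
first_output = [0.5, -6.0, 3.25]
exponent = shared_exponent(max(abs(v) for v in first_output))
first_converted = to_mantissas(first_output, exponent)

# Subsequent batch: reuse the same exponent without recomputing statistics.
# Assumption: values beyond the earlier range saturate (10.0 does here).
subsequent_output = [1.0, -2.5, 10.0]
subsequent_converted = to_mantissas(subsequent_output, exponent)

print(exponent, first_converted, subsequent_converted)
```

Reusing the exponent avoids a second statistics pass, at the cost of saturation when later data exceeds the range seen earlier. Whether and when to refresh the parameters is a design choice this sketch does not address.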

Assignees

Advanced Risc Mach Ltd

Inventors

Classifications

  • Programmable structures, i.e. where the code converter contains apparatus which is operator-changeable to modify the conversion process

  • Convolutional networks [CNN, ConvNet]

  • Characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Conversion to or from floating-point codes

  • G06N3/0495 (primary)

    Quantised networks; Sparse networks; Compressed networks

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12572785B2 cover?
The present disclosure relates to a method of inter-layer format conversion for a neural network, the neural network comprising at least two computation layers including a first layer to process first data in a first data format and a second layer to process second data in a second data format, the method comprising: extracting data statistics from data output by the first layer, said data stat…
Who is the assignee on this patent?
Advanced Risc Mach Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/0495. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 10, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).