Hardware implementation of an attention-based neural network

US2024127044A1 · US · A1

Patent metadata
Publication number: US-2024127044-A1
Application number: US-202318211202-A
Country: US
Kind code: A1
Filing date: Jun 16, 2023
Priority date: Jun 17, 2022
Publication date: Apr 18, 2024
Grant date: (none listed)

Abstract

A computer-implemented method for selecting numerical formats suitable for use in configuring a hardware implementation of an attention-based neural network. A dataset of test input sequences for the neural network is obtained. Each test input sequence is padded with padding values. For each padded input sequence, a padding mask is generated identifying the part of the padded input sequence that contains the padding values. An attention mask is generated from each padding mask, using an outer product operation. The padded input sequences and attention masks are processed through the neural network. During the processing, statistics are collected, describing ranges of values obtained at various layers of the neural network. Numerical formats are selected for the various layers based on the collected statistics.
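The data flow described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the applicant's implementation. The following are all hypothetical choices:

- the padding-mask convention (1 for a real element, 0 for padding);
- the value of `NEG_LARGE`;
- the `toy_scores` function standing in for a real test network;
- the power-of-two exponent rule used to pick a fixed-point format.

The two statistics tracked (raw attention scores and Softmax outputs) give ranges for two different layers, as claim 1 requires. Passing two different padding masks to `attention_mask` yields a cross-attention mask in the spirit of claim 7. The row/column orientation used for that case is also an assumption.

```python
import math

NEG_LARGE = -1e9  # hypothetical "large negative value" (cf. claim 9)

def pad(seq, length, pad_value=0):
    """Pad a sequence to a fixed length with pad_value."""
    return seq + [pad_value] * (length - len(seq))

def padding_mask(seq_len, length):
    """Assumed convention: 1 marks a real element, 0 marks padding."""
    return [1] * seq_len + [0] * (length - seq_len)

def attention_mask(row_mask, col_mask):
    """Outer product of two padding masks, mapped to 0 (keep) or NEG_LARGE (mask out).
    Same mask twice -> self-attention mask; two different masks -> cross-attention mask."""
    return [[0.0 if r * c == 1 else NEG_LARGE for c in col_mask] for r in row_mask]

def masked_softmax(scores, mask_row):
    """Add the attention-mask row to the Softmax input (cf. claim 10), then normalise."""
    z = [s + m for s, m in zip(scores, mask_row)]
    z_max = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - z_max) for v in z]
    total = sum(e)
    return [v / total for v in e]

def toy_scores(tokens):
    """Hypothetical stand-in for Q @ K^T in a real test network."""
    return [[0.1 * a * b for b in tokens] for a in tokens]

class RangeStats:
    """Running min/max of the values observed at one layer."""
    def __init__(self):
        self.lo, self.hi = math.inf, -math.inf

    def update(self, values):
        self.lo = min(self.lo, min(values))
        self.hi = max(self.hi, max(values))

def select_exponent(lo, hi, bits=8):
    """Assumed rule: smallest e with max(|lo|, |hi|) <= (2**(bits-1) - 1) * 2**e."""
    max_abs = max(abs(lo), abs(hi))
    if max_abs == 0:
        return 0
    return math.ceil(math.log2(max_abs / (2 ** (bits - 1) - 1)))

FIXED_LENGTH = 4
dataset = [[3, 1, 4], [1, 5], [9, 2, 6, 5]]  # toy sequences of varying length
stats = {"scores": RangeStats(), "softmax": RangeStats()}

for seq in dataset:
    padded = pad(seq, FIXED_LENGTH)
    mask = padding_mask(len(seq), FIXED_LENGTH)
    att_mask = attention_mask(mask, mask)
    for score_row, mask_row in zip(toy_scores(padded), att_mask):
        stats["scores"].update(score_row)  # range before the mask is added
        stats["softmax"].update(masked_softmax(score_row, mask_row))

formats = {name: select_exponent(s.lo, s.hi) for name, s in stats.items()}
print(formats)  # {'scores': -3, 'softmax': -7}
```

Here the scores peak at about 8.1, so an 8-bit signed format needs exponent -3 (largest representable value 127 × 2^-3 = 15.875). The Softmax outputs stay below 1 and get exponent -7. Score statistics are recorded before the mask is added so that `NEG_LARGE` does not inflate that layer's range.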

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for selecting numerical formats for use in configuring a hardware implementation of an attention-based neural network, the method comprising: obtaining a representation of the attention-based neural network; implementing the representation as a test neural network; obtaining a dataset of first test input sequences for the attention-based neural network, wherein the dataset includes first test input sequences of varying length; padding each first test input sequence with padding values to produce a respective first padded input sequence of a first fixed length; generating, for each first padded input sequence, a respective first padding mask identifying the part of the first padded input sequence that contains the padding values; generating a first attention mask from each first padding mask, wherein the generating comprises an outer product operation applied to the first padding mask; processing the first padded input sequences and the first attention masks through the test neural network; collecting statistics describing ranges of values obtained during said processing, wherein the statistics describe ranges of values for at least two different layers of the attention-based neural network; and selecting numerical formats for the at least two different layers based on the collected statistics.

2. The method of claim 1, wherein the attention-based neural network comprises a decoder, wherein the first test input sequences are test input sequences for the decoder, wherein processing each of the first padded input sequences and respective first attention masks through the test neural network comprises executing the decoder for a number of iterations equal to the first fixed length, wherein the statistics are collected over all of the iterations.

3. The method of claim 2, wherein, at each iteration the decoder produces an output sequence, and, at each iteration other than an initial iteration, the input to the decoder comprises the output sequence from the preceding iteration.

4. The method of claim 1, wherein the attention based neural network comprises an encoder, and wherein the first test input sequences are test input sequences for the encoder.

5. The method of claim 4, wherein the attention-based neural network further comprises a decoder, the method further comprising: obtaining a dataset of second test input sequences, wherein the second test input sequences are test input sequences for the decoder; padding each second test input sequence with padding values to produce a respective second padded input sequence of a second fixed length; generating, for each second padded input sequence, a respective second padding mask identifying the part of the second padded input sequence that contains the padding values; and generating a second attention mask from each second padding mask, wherein the generating comprises an outer product operation applied to the second padding mask, wherein the method further comprises selecting a numerical format for at least one layer of the decoder.

6. The method of claim 5, further comprising executing the decoder for a number of iterations equal to the second fixed length, wherein, at each iteration the decoder produces an output sequence, and, at each iteration other than an initial iteration, the input to the decoder comprises the output sequence from the preceding iteration, wherein the statistics are collected over all of the iterations.

7. The method of claim 5, further comprising generating a cross-attention mask from each first padding mask and respective second padding mask, comprising an outer product of the first padding mask with the second padding mask, wherein the method further comprises: processing the first padded input sequences, the second padded input sequences, the first attention masks, the second attention masks, and the cross-attention masks through the test neural network, and selecting numerical formats for any one or any combination of two or more of: the first padded input sequences, the second padded input sequences, the first attention masks, the second attention masks, and the cross-attention masks.

8. The method of claim 1, wherein the attention-based neural network comprises a scaled dot-product attention calculation.

9. The method of claim 1, wherein each first attention mask comprises: a plurality of zeros, in locations corresponding to the elements of the respective first input sequence; and one or more large negative values, in locations corresponding to the padding values of the respective first padded input sequence.

10. The method of claim 1, wherein the attention-based neural network comprises a Softmax function, and wherein the processing comprises adding the first attention mask to an input to the Softmax function.

11. The method of claim 1, wherein the attention-based neural network comprises a transformer network.

12. The method of claim 1, wherein the attention-based neural network comprises a layer normalisation.

13. The method of claim 12, wherein the hardware implementation is configured to perform a set of available elementary neural network operations, the method further comprising: mapping the layer normalisation to an equivalent representation comprising a plurality of elementary neural network operations from the set of available elementary neural network operations; and selecting numerical formats for said plurality of elementary neural network operations, wherein each of the plurality of elementary neural network operations is selected from the list consisting of: a convolution operation; an element-wise subtraction operation; an element-wise multiplication operation; a reciprocal operation; a square root operation; an element-wise division operation; a rectified linear activation function; a local response normalisation; and an element-wise addition.

14. The method of claim 13, wherein the plurality of elementary neural network operations implements: a first convolution operation, applied to an input to the layer normalisation, to calculate a mean of the input; an element-wise subtraction operation, to subtract the mean from the input; a first element-wise multiplication operation, to calculate the square of the output of the element-wise subtraction operation; and a second convolution operation, applied to the output of the first element-wise multiplication operation, to calculate the variance about the mean.

15. The method of claim 14, wherein the plurality of elementary neural network operations implements: a square root operation and a reciprocal operation, applied after the second convolution operation, to calculate the reciprocal of the standard deviation; and a second element-wise multiplication operation, to multiply the output of the element-wise subtraction operation by the reciprocal of the standard deviation.

16. The method of claim 14, wherein the plurality of elementary neural network operations implements: a square root operation, applied to the output of the second convolution operation, to calculate the standard deviation; and an element-wise division operation, to divide the output of the element-wise subtraction operation by the reciprocal of the standard deviation.

17. The method of claim 1, wherein the attention-based neural network comprises a matrix multiplication operation defined in two or more dimensions between a first tensor X having dimensions [..., Q, ...] and a second tensor Y
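Claims 13 to 16 map layer normalisation onto elementary operations. The plain-Python sketch below follows the order given in claims 14 and 15. It also offers, as an option, a square-root-then-divide route along the lines of claim 16. In that route the centred input is divided by the standard deviation itself, which is what makes it agree numerically with the reciprocal-multiply route.

Several details are assumptions, not taken from the claims:

- modelling each mean as a "valid" 1-D convolution with a constant 1/N kernel;
- the `eps` stabiliser;
- the omission of gain and bias parameters;
- all function names.

The returned trace holds every intermediate value whose range would feed per-operation format selection.

```python
import math

def conv1d_valid(x, kernel):
    """'Valid' 1-D convolution (cross-correlation, as in most NN frameworks)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]

def layer_norm_elementary(x, eps=1e-5, use_division=False):
    """Layer normalisation built only from elementary operations.

    Returns the normalised vector and a trace of each intermediate value.
    eps is an assumed stabiliser; gain/bias parameters are omitted.
    """
    n = len(x)
    averaging_kernel = [1.0 / n] * n
    trace = {}
    trace["mean"] = conv1d_valid(x, averaging_kernel)[0]                     # first convolution
    trace["centred"] = [v - trace["mean"] for v in x]                          # element-wise subtraction
    trace["squared"] = [c * c for c in trace["centred"]]                       # first element-wise multiplication
    trace["variance"] = conv1d_valid(trace["squared"], averaging_kernel)[0]    # second convolution
    trace["std"] = math.sqrt(trace["variance"] + eps)                          # square root
    if use_division:
        out = [c / trace["std"] for c in trace["centred"]]                     # element-wise division
    else:
        trace["inv_std"] = 1.0 / trace["std"]                                  # reciprocal
        out = [c * trace["inv_std"] for c in trace["centred"]]                 # second element-wise multiplication
    return out, trace

def layer_norm_reference(x, eps=1e-5):
    """Direct formula, for comparison."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

x = [1.0, 2.0, 3.0, 4.0]
via_reciprocal, trace = layer_norm_elementary(x)
via_division, _ = layer_norm_elementary(x, use_division=True)
reference = layer_norm_reference(x)
print(all(abs(a - r) < 1e-9 for a, r in zip(via_reciprocal, reference)))  # True
print(all(abs(b - r) < 1e-9 for b, r in zip(via_division, reference)))    # True
```

Because each elementary operation has its own entry in `trace`, range statistics (and hence a numerical format) can be gathered per operation rather than for the layer normalisation as a whole.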

Assignees

Imagination Tech Ltd

Classifications

  • G06N3/063 (Primary): using electronic means
  • Combinations of networks
  • G06N3/0455 (Primary): Auto-encoder networks; Encoder-decoder networks
  • Learning methods
  • Quantised networks; Sparse networks; Compressed networks

Frequently asked questions

What does patent US2024127044A1 cover?
A computer-implemented method for selecting numerical formats for a hardware implementation of an attention-based neural network. Padded test input sequences and attention masks derived from their padding masks are run through the network, and per-layer value ranges are used to choose the formats (see the Abstract).
Who is the assignee on this patent?
Imagination Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
The A1 publication date is Thursday, April 18, 2024. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).