Method and System for Activity Prediction, Prefetching and Preloading of Computer Assets by A Client-Device
US-2023050882-A1 · Feb 16, 2023 · US
US2024127044A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024127044-A1 |
| Application number | US-202318211202-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 16, 2023 |
| Priority date | Jun 17, 2022 |
| Publication date | Apr 18, 2024 |
| Grant date | — |
Official abstract text for this publication.
A computer-implemented method for selecting numerical formats suitable for use in configuring a hardware implementation of an attention-based neural network. A dataset of test input sequences for the neural network is obtained. Each test input sequence is padded with padding values. For each padded input sequence, a padding mask is generated identifying the part of the padded input sequence that contains the padding values. An attention mask is generated from each padding mask, using an outer product operation. The padded input sequences and attention masks are processed through the neural network. During the processing, statistics are collected, describing ranges of values obtained at various layers of the neural network. Numerical formats are selected for the various layers based on the collected statistics.
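The padding and masking steps in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the applicant's implementation: the function names, the padding value `0`, the convention that the padding mask holds `1` for real elements and `0` for padding, and the constant `-1e9` as the "large negative value" are all hypothetical choices that the text above does not fix.

```python
NEG_LARGE = -1e9  # assumed stand-in for a "large negative value"
PAD_VALUE = 0     # assumed padding value


def pad_sequence(seq, fixed_len, pad_value=PAD_VALUE):
    """Pad a variable-length sequence up to a fixed length."""
    if len(seq) > fixed_len:
        raise ValueError("sequence is longer than the fixed length")
    return list(seq) + [pad_value] * (fixed_len - len(seq))


def padding_mask(seq_len, fixed_len):
    """1 marks a real element, 0 marks padding (assumed convention)."""
    return [1] * seq_len + [0] * (fixed_len - seq_len)


def outer(a, b):
    """Outer product of two vectors, returned as a list of rows."""
    return [[x * y for y in b] for x in a]


def attention_mask(row_mask, col_mask, neg=NEG_LARGE):
    """Build an additive attention mask from an outer product of padding masks.

    Entries where both positions are real become 0.0; every entry that
    touches padding becomes a large negative value.
    """
    return [[0.0 if v else neg for v in row] for row in outer(row_mask, col_mask)]


seq = [5, 7, 9]
padded = pad_sequence(seq, 5)      # [5, 7, 9, 0, 0]
pm = padding_mask(len(seq), 5)     # [1, 1, 1, 0, 0]
am = attention_mask(pm, pm)        # 5 x 5 additive mask
```

Passing two different padding masks to `attention_mask` gives a rectangular mask, which is one way to read the cross-attention mask of claim 7.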
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for selecting numerical formats for use in configuring a hardware implementation of an attention-based neural network, the method comprising: obtaining a representation of the attention-based neural network; implementing the representation as a test neural network; obtaining a dataset of first test input sequences for the attention-based neural network, wherein the dataset includes first test input sequences of varying length; padding each first test input sequence with padding values to produce a respective first padded input sequence of a first fixed length; generating, for each first padded input sequence, a respective first padding mask identifying the part of the first padded input sequence that contains the padding values; generating a first attention mask from each first padding mask, wherein the generating comprises an outer product operation applied to the first padding mask; processing the first padded input sequences and the first attention masks through the test neural network; collecting statistics describing ranges of values obtained during said processing, wherein the statistics describe ranges of values for at least two different layers of the attention-based neural network; and selecting numerical formats for the at least two different layers based on the collected statistics.
2. The method of claim 1, wherein the attention-based neural network comprises a decoder, wherein the first test input sequences are test input sequences for the decoder, wherein processing each of the first padded input sequences and respective first attention masks through the test neural network comprises executing the decoder for a number of iterations equal to the first fixed length, wherein the statistics are collected over all of the iterations.
3. The method of claim 2, wherein, at each iteration the decoder produces an output sequence, and, at each iteration other than an initial iteration, the input to the decoder comprises the output sequence from the preceding iteration.
4. The method of claim 1, wherein the attention based neural network comprises an encoder, and wherein the first test input sequences are test input sequences for the encoder.
5. The method of claim 4, wherein the attention-based neural network further comprises a decoder, the method further comprising: obtaining a dataset of second test input sequences, wherein the second test input sequences are test input sequences for the decoder; padding each second test input sequence with padding values to produce a respective second padded input sequence of a second fixed length; generating, for each second padded input sequence, a respective second padding mask identifying the part of the second padded input sequence that contains the padding values; and generating a second attention mask from each second padding mask, wherein the generating comprises an outer product operation applied to the second padding mask, wherein the method further comprises selecting a numerical format for at least one layer of the decoder.
6. The method of claim 5, further comprising executing the decoder for a number of iterations equal to the second fixed length, wherein, at each iteration the decoder produces an output sequence, and, at each iteration other than an initial iteration, the input to the decoder comprises the output sequence from the preceding iteration, wherein the statistics are collected over all of the iterations.
7. The method of claim 5, further comprising generating a cross-attention mask from each first padding mask and respective second padding mask, comprising an outer product of the first padding mask with the second padding mask, wherein the method further comprises: processing the first padded input sequences, the second padded input sequences, the first attention masks, the second attention masks, and the cross-attention masks through the test neural network, and selecting numerical formats for any one or any combination of two or more of: the first padded input sequences, the second padded input sequences, the first attention masks, the second attention masks, and the cross-attention masks.
8. The method of claim 1, wherein the attention-based neural network comprises a scaled dot-product attention calculation.
9. The method of claim 1, wherein each first attention mask comprises: a plurality of zeros, in locations corresponding to the elements of the respective first input sequence; and one or more large negative values, in locations corresponding to the padding values of the respective first padded input sequence.
10. The method of claim 1, wherein the attention-based neural network comprises a Softmax function, and wherein the processing comprises adding the first attention mask to an input to the Softmax function.
11. The method of claim 1, wherein the attention-based neural network comprises a transformer network.
12. The method of claim 1, wherein the attention-based neural network comprises a layer normalisation.
13. The method of claim 12, wherein the hardware implementation is configured to perform a set of available elementary neural network operations, the method further comprising: mapping the layer normalisation to an equivalent representation comprising a plurality of elementary neural network operations from the set of available elementary neural network operations; and selecting numerical formats for said plurality of elementary neural network operations, wherein each of the plurality of elementary neural network operations is selected from the list consisting of: a convolution operation; an element-wise subtraction operation; an element-wise multiplication operation; a reciprocal operation; a square root operation; an element-wise division operation; a rectified linear activation function; a local response normalisation; and an element-wise addition.
14. The method of claim 13, wherein the plurality of elementary neural network operations implements: a first convolution operation, applied to an input to the layer normalisation, to calculate a mean of the input; an element-wise subtraction operation, to subtract the mean from the input; a first element-wise multiplication operation, to calculate the square of the output of the element-wise subtraction operation; and a second convolution operation, applied to the output of the first element-wise multiplication operation, to calculate the variance about the mean.
15. The method of claim 14, wherein the plurality of elementary neural network operations implements: a square root operation and a reciprocal operation, applied after the second convolution operation, to calculate the reciprocal of the standard deviation; and a second element-wise multiplication operation, to multiply the output of the element-wise subtraction operation by the reciprocal of the standard deviation.
16. The method of claim 14, wherein the plurality of elementary neural network operations implements: a square root operation, applied to the output of the second convolution operation, to calculate the standard deviation; and an element-wise division operation, to divide the output of the element-wise subtraction operation by the reciprocal of the standard deviation.
17. The method of claim 1, wherein the attention-based neural network comprises a matrix multiplication operation defined in two or more dimensions between a first tensor X having dimensions [ . . . , Q, . . . ] and a second tensor Y
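The layer-normalisation decomposition of claims 14 and 15, together with the range statistics and format selection of claims 1 and 13, might look roughly like the following. Treat it as a rough sketch rather than the claimed method: the 1-D input, the helper names, the `eps` term, and the rule that picks a power-of-two exponent for a signed fixed-point format of a given bit width are all assumptions introduced here for illustration.

```python
import math


def conv_mean(values):
    """Stand-in for a convolution whose weights all equal 1/N (assumed 1-D)."""
    weight = 1.0 / len(values)
    return sum(weight * v for v in values)


def layer_norm_elementary(x, eps=1e-5, ranges=None):
    """Layer normalisation expressed as a chain of elementary operations.

    eps is an assumption added to avoid division by zero. If `ranges` is a
    dict, it is updated with the (min, max) seen at each recorded step.
    """
    def record(name, vals):
        if ranges is not None:
            lo, hi = ranges.get(name, (math.inf, -math.inf))
            ranges[name] = (min(lo, min(vals)), max(hi, max(vals)))
        return vals

    mean = conv_mean(x)                                   # first convolution
    centred = record("sub", [v - mean for v in x])        # element-wise subtraction
    squared = record("square", [c * c for c in centred])  # element-wise multiplication
    variance = conv_mean(squared)                         # second convolution
    inv_std = 1.0 / math.sqrt(variance + eps)             # square root, then reciprocal
    return record("scale", [c * inv_std for c in centred])


def select_exponent(max_abs, bit_width=8):
    """Smallest exponent e such that (2**(bit_width-1) - 1) * 2**e covers max_abs.

    Hypothetical selection rule for a signed fixed-point format.
    """
    if max_abs == 0:
        return 0
    return math.ceil(math.log2(max_abs / (2 ** (bit_width - 1) - 1)))


ranges = {}
out = layer_norm_elementary([1.0, 2.0, 3.0, 4.0], ranges=ranges)
formats = {name: select_exponent(max(abs(lo), abs(hi)))
           for name, (lo, hi) in ranges.items()}
```

Calling `layer_norm_elementary` repeatedly with the same `ranges` dict accumulates the extremes over a whole test dataset before any format is chosen.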
using electronic means · CPC title
Combinations of networks · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Learning methods · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title