Specialized fixed function hardware for efficient convolution
US-2018307980-A1 · Oct 25, 2018 · US
US10771089B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10771089-B2 |
| Application number | US-201916390084-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 22, 2019 |
| Priority date | Apr 23, 2018 |
| Publication date | Sep 8, 2020 |
| Grant date | Sep 8, 2020 |
Official abstract text for this publication.
A method of data compression performed by at least one core communicating with a central memory. The input data presents a two-dimensional input array formed by a plurality data items stored contiguously in the central memory according to a contiguous direction. The method comprises a step of wavelet transform comprising the following sub-steps: forming from the input array at least one tile comprising a plurality of consecutive data block columns, each data block column being formed by a plurality of lines of consecutive data items according to the contiguous direction, the length of each line being a multiple of the cache line length; and for each data block column computing dot products between a filter vector and each group of N lines using fused multiply-add instructions for the core.
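The filtering sub-step described in the abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the patented implementation: the helper names `split_block_columns` and `filter_block_column`, the Haar coefficients, and the group stride of 2 are all hypothetical choices that the abstract does not specify. Each inner update is a single multiply-add, which a vectorized implementation would map onto a fused multiply-add instruction over a whole line at once.

```python
def split_block_columns(array, block_width):
    """Cut a row-major 2D array into consecutive block columns.

    Each block column keeps every row but only `block_width` consecutive
    items per line (in the patent, a multiple of the cache line length).
    """
    return [[row[c:c + block_width] for row in array]
            for c in range(0, len(array[0]), block_width)]


def filter_block_column(lines, filt, step=2):
    """Dot product between `filt` (length N) and each group of N lines.

    `step` (assumed to be 2 here) is the distance between the first lines
    of two successive groups; the source text does not state it.
    """
    n = len(filt)
    width = len(lines[0])
    out = []
    for start in range(0, len(lines) - n + 1, step):
        acc = [0.0] * width
        for k in range(n):
            coeff = filt[k]
            row = lines[start + k]
            for j in range(width):
                # One multiply-add per item; hardware would use an FMA here.
                acc[j] = acc[j] + coeff * row[j]
        out.append(acc)
    return out


# Assumed example filters (Haar averaging and differencing), N = 2.
HAAR_LOW = [0.5, 0.5]
HAAR_HIGH = [0.5, -0.5]

array = [[1.0, 2.0, 3.0, 4.0],
         [3.0, 4.0, 5.0, 6.0],
         [5.0, 5.0, 5.0, 5.0],
         [7.0, 9.0, 7.0, 9.0]]

blocks = split_block_columns(array, 2)
low = [filter_block_column(b, HAAR_LOW) for b in blocks]
high = [filter_block_column(b, HAAR_HIGH) for b in blocks]
```

Because every line of a block column is contiguous in memory, the innermost loop walks consecutive addresses, which is the access pattern the abstract relies on for cache-friendly vector processing.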
Opening claim text (preview).
The invention claimed is:

1. A method of input data compression performed by at least one core communicating with a central memory, the core being associated to an operating cache memory able to store data items, and comprising a plurality of vector registers able to store data items to be processed, each vector register presenting a predetermined register length, the operating cache memory comprising a plurality of cache lines, each cache line presenting a predetermined cache line length which is a multiple of the register length; the input data presents a two-dimensional input array formed by a plurality data items stored contiguously in the central memory according to a contiguous direction; the method comprising a step of two-dimensional wavelet transform implemented using a low band-pass filter vector of dimension N and a high band-pass filter vector of dimension N, said step comprising the following sub-steps: A) forming from the input array at least one tile comprising a plurality of consecutive data block columns, each data block column being formed by a plurality of lines of consecutive data items according to the contiguous direction, the length of each line being a multiple of the cache line length; C) for the or each tile, processing each data block column and for each data block column, computing dot products between the low band-pass or high band-pass filter vector and each group of N lines using fused multiply-add instructions for the core, wherein one tile is formed during the sub-step A), said tile corresponding to said two-dimensional input array, and wherein the step of two-dimensional wavelet transform further comprises a sub-step D′) of transposing the two-dimensional input array, the sub-steps A) and C) being performed before the sub-step D′) for a non-transposed input array and then, after the sub-step D′) for a transposed input array.

2. The method according to claim 1, wherein the sub-step C) is performed n times where n is a level of the wavelet transform.

3. The method according to claim 1, wherein several tiles are formed during the sub-step A), each tile corresponding to a part of said two-dimensional input array which the operating cache memory is able to store entirely, preferably the operating cache memory is able to store entirely each tile and the result of its processing.

4. The method according to claim 3, wherein the step of two-dimensional wavelet transform further comprises a sub-step D) of transposing of each tile, the sub-step C) being performed before the sub-step D) for each non-transposed tile and then, after the sub-step D) for each transposed tile.

5. The method according to claim 4, wherein, during the sub-step D), each tile is transposed in the operating cache memory.

6. The method according to claim 1, wherein each data item of two-dimensional input array intended to form the first data item in the corresponding line of the corresponding data block column in the operating cache memory, is stored in the central memory using an address which is a multiple of a predetermined alignment value depending on the cache line length.

7. The method according to claim 1, wherein the operating cache memory is the level-2 cache of the core or the level-1 cache of the core.

8. The method according to claim 1, wherein the step of two-dimensional wavelet transform further comprises a sub-step B) of padding at least one tile with a predetermined value so as the number of data items in this tile in each direction presents a number multiple of 2^n.

9. The method according to claim 1, further comprising a step of quantization of data obtained after the step of two-dimensional wavelet transform.

10. The method according to claim 9, further comprising at least one step of lossless compression of data obtained after the step of quantization.

11. The method of compressed data extraction comprising steps configured to decompress input data compressed with the method according to claim 1.

12. A computer program product comprising software instructions which, when executed by a computer system, implement a method of input data compression performed by at least one core communicating with a central memory, the core being associated to an operating cache memory able to store data items, and comprising a plurality of vector registers able to store data items to be processed, each vector register presenting a predetermined register length, the operating cache memory comprising a plurality of cache lines, each cache line presenting a predetermined cache line length which is a multiple of the register length; the input data presents a two-dimensional input array formed by a plurality data items stored contiguously in the central memory according to a contiguous direction; the method comprising a step of two-dimensional wavelet transform implemented using a low band-pass filter vector of dimension N and a high band-pass filter vector of dimension N, said step comprising the following sub-steps: A) forming from the input array at least one tile comprising a plurality of consecutive data block columns, each data block column being formed by a plurality of lines of consecutive data items according to the contiguous direction, the length of each line being a multiple of the cache line length; C) for the or each tile, processing each data block column and for each data block column, computing dot products between the low band-pass or high band-pass filter vector and each group of N lines using fused multiply-add instructions for the core, wherein one tile is formed during the sub-step A), said tile corresponding to said two-dimensional input array, wherein the step of two-dimensional wavelet transform further comprises a sub-step D′) of transposing the two-dimensional input array, the sub-steps A) and C) being performed before the sub-step D′) for a non-transposed input array and then, after the sub-step D′) for a transposed input array.

13. A computer system for input data compression comprising a central memory and at least one core communicating with the central memory; the core being associated to an operating cache memory able to store data items, and comprising a plurality of vector registers able to store data items to be processed, each vector register presenting a predetermined register length, the operating cache memory comprising a plurality of cache lines, each cache line presenting a predetermined cache line length which is a multiple of the register length; the core being configured to carry out a method of input data compression; the input data presents a two-dimensional input array formed by a plurality data items stored contiguously in the central memory according to a contiguous direction; the method comprising a step of two-dimensional wavelet transform implemented using a low band-pass filter vector of dimension N and a high band-pass filter vector of dimension N, said step comprising the following sub-steps: A) forming from the input array at least one tile comprising a plurality of consecutive data block columns, each data block column being formed by a plurality of lines of consecutive data items according to the contiguous direction, the length of each line being a multiple of the cache line length; C) for the or each tile, processing each data block column and for each data block column, computing dot products between the low band-pass or high band-pass filter vector and each group of N lines using fused multiply-add instructions for the core, wherein one tile is formed during the sub-step A), said tile corresponding to said two-dimensional input array, wherein the step of two-dimensional wavelet transform
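The ordering of sub-steps in the claims (padding, filtering, transposition, filtering again, then quantization) can be illustrated with a small single-tile sketch. Everything specific here is an assumption made for illustration: the function names, the Haar filter coefficients, the group stride of 2, the stacking of the low band before the high band, and the uniform quantizer are not taken from the claims.

```python
def pad_to_multiple(array, n_levels, fill=0.0):
    """Sub-step B) sketch: pad so both dimensions are multiples of 2**n_levels."""
    m = 2 ** n_levels
    rows, cols = len(array), len(array[0])
    new_rows = -(-rows // m) * m  # ceiling division, then scale back up
    new_cols = -(-cols // m) * m
    padded = [row + [fill] * (new_cols - cols) for row in array]
    padded += [[fill] * new_cols for _ in range(new_rows - rows)]
    return padded


def filter_lines(lines, filt, step=2):
    """Sub-step C) sketch: dot product of `filt` with each group of N lines."""
    n = len(filt)
    out = []
    for start in range(0, len(lines) - n + 1, step):
        acc = [0.0] * len(lines[0])
        for k in range(n):
            for j, value in enumerate(lines[start + k]):
                acc[j] = acc[j] + filt[k] * value  # multiply-add per item
        out.append(acc)
    return out


def transpose(array):
    """Sub-step D') sketch: swap rows and columns."""
    return [list(col) for col in zip(*array)]


def wavelet_level_2d(array, low, high):
    """C) on the array, D') transpose, then C) again on the transposed array.

    The result is left in the transposed orientation.
    """
    first = filter_lines(array, low) + filter_lines(array, high)
    turned = transpose(first)
    return filter_lines(turned, low) + filter_lines(turned, high)


def quantize(coeffs, step):
    """Assumed uniform quantizer: nearest integer multiple of `step`."""
    return [[round(v / step) for v in row] for row in coeffs]


HAAR_LOW = [0.5, 0.5]    # assumed example filters, N = 2
HAAR_HIGH = [0.5, -0.5]

data = [[4.0, 4.0, 8.0],
        [4.0, 4.0, 8.0],
        [2.0, 2.0, 6.0]]

padded = pad_to_multiple(data, n_levels=1)
coeffs = wavelet_level_2d(padded, HAAR_LOW, HAAR_HIGH)
quantized = quantize(coeffs, step=0.5)
```

Filtering always runs across groups of lines, so transposing between the two passes lets the same routine handle both directions of the two-dimensional transform while still reading contiguous lines.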
Arrangements specific to bandpass modulators · CPC title
Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code · CPC title
employing the use of a dictionary, e.g. LZ78 · CPC title
of parts of caches, e.g. directory or tag array · CPC title
characterised by the number of quantisers and their type and resolution · CPC title