Scalable FIR filter
US-2019348970-A1 · Nov 14, 2019 · US
US9966932B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9966932-B2 |
| Application number | US-201314785359-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 19, 2013 |
| Priority date | Apr 19, 2013 |
| Publication date | May 8, 2018 |
| Grant date | May 8, 2018 |
Abstract
An apparatus for parallel filtering, including a multi-granularity memory, a data cache device, a coefficient buffer broadcast device, a vector operation device and a command queue device. The multi-granularity memory is configured to store data to be filtered, filter coefficients and filtering result data. The data cache device is configured to cache, read and update the data to be filtered. The coefficient buffer broadcast device is configured to cache and broadcast the read filter coefficients. The command queue device is configured to store and output a queue of operation commands for the parallel filtering operation. The vector operation device is configured to perform a vector operation based on the data to be filtered and the output coefficient data, and write an operation result into the multi-granularity filtering result storage unit. A method is also provided. The apparatus and method have a fast filtering speed, a smaller number of accesses, an improved usage efficiency, a reduced power consumption and a wide application scope.
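The abstract describes a BS-wide vector operation fed by a data cache and a coefficient broadcast device. The following is a minimal Python sketch of that dataflow under stated assumptions: a 1-D FIR filter in the "valid" correlation form y[n] = sum_k h[k]·x[n+k], a vector width of BS lanes, and one filter coefficient duplicated (broadcast) across all BS lanes per multiply-accumulate step. The function names (`broadcast`, `vector_mac`, `parallel_fir`) are hypothetical. The patent does not disclose software, so this only illustrates how BS outputs can be produced per pass.

```python
from typing import List


def broadcast(value: float, bs: int) -> List[float]:
    """Duplicate one scalar coefficient into BS lanes (hypothetical broadcast step)."""
    return [value] * bs


def vector_mac(acc: List[float], a: List[float], b: List[float]) -> List[float]:
    """Lane-wise multiply-accumulate: acc[i] + a[i] * b[i] for every lane i."""
    return [s + x * y for s, x, y in zip(acc, a, b)]


def parallel_fir(x: List[float], h: List[float], bs: int) -> List[float]:
    """Compute y[n] = sum_k h[k] * x[n + k] for every n where the window fits,
    producing up to BS outputs per pass (assumed 'valid' correlation form)."""
    n_out = len(x) - len(h) + 1
    y: List[float] = []
    for n0 in range(0, n_out, bs):
        lanes = min(bs, n_out - n0)           # last pass may be narrower than BS
        acc = [0.0] * lanes                   # vector accumulating register
        for k, coeff in enumerate(h):
            window = x[n0 + k : n0 + k + lanes]   # BS data to be filtered
            acc = vector_mac(acc, window, broadcast(coeff, lanes))
        y.extend(acc)                         # write back BS results at once
    return y


if __name__ == "__main__":
    print(parallel_fir([1, 2, 3, 4, 5], [1, 1], 2))
```

Each pass reads every coefficient once and reuses it across all BS lanes. This is one plausible reading of the abstract's "smaller number of accesses", not a statement of how the claimed hardware schedules its reads.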
Claims (preview)
What is claimed is:
1. An apparatus for parallel filtering comprising: a multi-granularity memory, a data cache device, a coefficient buffer broadcast device (30), a vector operation device and a command queue device, wherein: the multi-granularity memory is configured to store data to be filtered and filter coefficients, which are read from a matrix of data to be filtered and a matrix of filter coefficients, respectively, for parallel filtering operation, and filtering result data obtained after the filtering operation, the multi-granularity memory comprising a multi-granularity to-be-filtered data storage unit, a multi-granularity filter coefficient storage unit and a multi-granularity filtering result storage unit; wherein multi-granularity to-be-filtered data storage unit and the multi-granularity filter coefficient storage unit each have a read/write bit width, denoted as BS, identical to an operational size of the vector operation device; wherein the vector operation device is configured to execute BS operations, and write BS results into the multi-granularity to-be-filtered data storage unit and the multi-granularity filter coefficient storage unit simultaneously; the data cache device is configured to cache the data to be filtered as read from the multi-granularity to-be-filtered data storage unit, and read and update the data to be filtered, the data cache device comprising a data cache body and a data buffer control unit; the coefficient buffer broadcast device is configured to cache the filter coefficients as read from the multi-granularity filter coefficient storage unit, and broadcast the data to be filtered by duplicating the data to be filtered into BS copies to obtain output coefficient data having a width of BS data elements, the coefficient buffer broadcast device comprising a coefficient buffer entity and a plurality of coefficient buffer control units: a read control logic unit, an initialization logic unit and an update logic unit; the command queue device is configured to store and output to the vector operation device a queue of operation commands for the parallel filtering operation; and the vector operation device is configured to perform a vector operation based on the data to be filtered as read from the data cache device and the output coefficient data as read from the coefficient buffer broadcast device, and write an operation result into the multi-granularity filtering result storage unit; wherein the coefficient buffer entity is configured to cache the filter coefficients in the matrix of filter coefficients; the read control logic unit is configured to control an operation to read the coefficient buffer entity; the initialization logic unit is configured to initialize the coefficient buffer entity when an initialization start signal, which is an input signal to the coefficient buffer broadcast device, becomes valid; and the update logic unit is configured to read, when the coefficient buffer entity is not sufficient for holding all the filter coefficients in the multi-granularity filter coefficient storage unit, excessive filter coefficients from the multi-granularity filter coefficient storage unit and store them in the coefficient buffer entity.
2. The apparatus of claim 1, wherein, in operation, the apparatus first reads the data to be filtered in the matrix of data to be filtered from the multi-granularity to-be-filtered data storage unit in columns and caches it in the data cache device, while reading the filter coefficients in the matrix of filter coefficients from the multi-granularity filter coefficient storage unit in columns and caching them in the coefficient buffer broadcast device; the vector operation device is configured to read the data to be filtered from the data cache device, read the output coefficient data that has been broadcasted from the coefficient buffer broadcast device, and then perform the filtering operation on the read data based on the operation commands from the command queue device and write the operation result into the multi-granularity filtering result storage unit.
3. The apparatus of claim 1, wherein the data cache body comprises an upper region, a lower region and a main region, the data to be filtered being distributed over the main region, the first columns of the upper region.
4. The apparatus of claim 1, wherein the input signal to the coefficient buffer broadcast device comprises: a read enabling signal, a filter coefficient number indicator signal, the data read from the multi-granularity to-be-filtered data storage unit by the initialization logic unit or the update logic unit, an update signal transmitted from the vector multiplier and vector operation device to the coefficient buffer broadcast device, and the initialization start signal; an output signal comprises: a read request, read granularity and read address signal from the initialization logic unit or the update logic unit to the multi-granularity to-be-filtered data storage unit, and the output coefficient data obtained by broadcasting the data read from the coefficient buffer entity by the read control logic unit.
5. The apparatus of claim 1, wherein the vector operation device is a vector multiplier and accumulator device: wherein the vector multiplier and accumulator device comprises a vector multiplier unit, a vector adder unit, a vector accumulating register unit and an operation control logic unit, wherein the vector multiplier unit and the vector adder unit each have an operational size of BS data elements and the vector accumulating register unit is configured to store BS result values; and the operation control logic control is configured to transmit to the data cache device an initialization start signal comprising a signal set, including a read data buffer enabling signal, a read data buffer column number signal and a read data buffer in-column offset signal, and a column shift signal, transmit to the coefficient buffer broadcast device an initialization start signal, a read coefficient buffer enabling signal and an update signal, and write the filtering result back into the multi-granularity filtering result storage unit.
6. The apparatus of claim 5, wherein the vector multiplier and accumulator device is configured to: first read the data to be filtered and the output coefficient data from the data cache device and the coefficient buffer broadcast device as operands for multiplying operation by the vector multiplier unit; then add an operation result at the vector multiplier unit to a current value at the vector accumulating register unit by the vector adder unit; and finally generate and write every BS filtering results back into the multi-granularity filtering result storage unit under control of the operation control logic unit.
7. A method for parallel filtering used in the apparatus of claim 1, the method comprising: Step 1): reading a number, BS, of data to be filtered from a data cache device and a number, BS, of output coefficient data from a coefficient buffer broadcast device, the BS data to be filtered being first data of first BS rows in a matrix of data to be filtered, while, in a signal set for a vector multiplier and accumulator device, a read data buffer enabling signal is valid, a column number in a read data buffer column number signal corresponds to a column number of the read data and a read data buffer in-column offset signal is valid, a read coefficient buffer enabling signal is valid, and the data to be filtered and the output coefficients are read at an input terminal of the vector multiplier and accumulator device; Step 2): multiplying, at the vector multiplier unit, the read output coefficients with the data to be filtered, respectively; Step 3-1): adding a multiplication res
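Claims 2, 6 and 7 describe reading the data matrix in columns, treating the first data of the first BS rows as one BS-wide vector, multiplying it by broadcast coefficients, and accumulating into a vector register before writing BS results back. The sketch below is one hypothetical interpretation of that loop as 2-D "valid" filtering, out[r][c] = sum_{i,j} coeffs[i][j]·data[r+i][c+j], with one vector lane per output row. The function name, the correlation form, and the loop order are assumptions. The data cache's upper, lower and main regions (claim 3), the command queue and the coefficient update logic are not modelled.

```python
from typing import List

Matrix = List[List[float]]


def filter_2d_row_parallel(data: Matrix, coeffs: Matrix, bs: int) -> Matrix:
    """Hypothetical row-parallel 2-D filter: up to BS output rows per pass,
    one vector lane per row, one broadcast coefficient per multiply step."""
    kr, kc = len(coeffs), len(coeffs[0])
    out_rows = len(data) - kr + 1
    out_cols = len(data[0]) - kc + 1
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for r0 in range(0, out_rows, bs):              # one band of BS rows
        lanes = min(bs, out_rows - r0)
        for c in range(out_cols):                   # one output column per pass
            acc = [0.0] * lanes                     # vector accumulating register
            for j in range(kc):                     # coefficient column
                col = [row[c + j] for row in data]  # column read of the data matrix
                for i in range(kr):                 # in-column offset
                    vec = col[r0 + i : r0 + i + lanes]   # BS data to be filtered
                    coef = [coeffs[i][j]] * lanes        # broadcast coefficient
                    acc = [a + x * w for a, x, w in zip(acc, vec, coef)]
            for lane in range(lanes):               # write back BS results
                out[r0 + lane][c] = acc[lane]
    return out


if __name__ == "__main__":
    d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(filter_2d_row_parallel(d, [[1, 0], [0, 1]], 2))
```

With this layout, the BS operands for a given column and in-column offset form a contiguous slice of one column, which fits the column-wise reads in the claims. The claims preview does not state whether the patented apparatus orders its loops this way.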
Computation saving measures; Accelerating measures (computations per se G06F) · CPC title
Measures to reduce power consumption · CPC title
Multiplier and or accumulator units · CPC title