Instruction and logic to provide vector scatter-op and gather-op functionality
US-2017357514-A1 · Dec 14, 2017 · US
US11829322B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11829322-B2 |
| Application number | US-202017124442-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 16, 2020 |
| Priority date | Dec 31, 2015 |
| Publication date | Nov 28, 2023 |
| Grant date | Nov 28, 2023 |
Official abstract text for this publication.
A vector memory subsystem for use with a programmable mixed-radix vector processor (“PVP”) capable of calculating discrete Fourier transform (“DFT/IDFT”) values. In an exemplary embodiment, an apparatus includes a vector memory bank and a vector memory system (VMS) that generates input memory addresses that are used to store input data into the vector memory bank. The VMS also generates output memory addresses that are used to unload vector data from the memory bank. The input memory addresses are used to shuffle the input data in the memory bank based on a radix factorization associated with an N-point DFT, and the output memory addresses are used to unload the vector data from the memory bank to compute radix factors of the radix factorization.
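The abstract's idea of shuffling input data according to a radix factorization can be illustrated with a textbook decimation-in-time Cooley-Tukey scheme. The sketch below is not taken from the patent: it assumes a mixed-radix digit-reversal address pattern, and the names `shuffle_addresses`, `mixed_radix_dft`, and `naive_dft` are hypothetical. It shows how write addresses derived from a factorization such as 12 = 4 × 3 let each later radix stage read its operands from regularly strided locations.

```python
import cmath


def shuffle_addresses(factors):
    """Write address for each input index n (assumed digit-reversal scheme).

    With N = r1 * r2 * ... * rk and n = q1 + r1*(q2 + r2*(q3 + ...)),
    sample n is stored at q1*(N/r1) + q2*(N/(r1*r2)) + ... .
    """
    n_points = 1
    for r in factors:
        n_points *= r
    addrs = []
    for n in range(n_points):
        rem, addr, block = n, 0, n_points
        for r in factors:
            block //= r
            rem, digit = divmod(rem, r)
            addr += digit * block
        addrs.append(addr)
    return addrs


def mixed_radix_dft(x, factors):
    """N-point DFT computed stage by stage over the shuffled memory."""
    n_points = len(x)
    addrs = shuffle_addresses(factors)
    if len(addrs) != n_points:
        raise ValueError("product of factors must equal len(x)")
    mem = [0j] * n_points
    for n, addr in enumerate(addrs):  # input shuffle
        mem[addr] = complex(x[n])
    m = 1  # size of the sub-transforms already completed
    for r in reversed(factors):
        span = r * m
        for base in range(0, n_points, span):
            for k in range(m):
                # Gather r operands at stride m and apply twiddle factors.
                t = [mem[base + q * m + k]
                     * cmath.exp(-2j * cmath.pi * q * k / span)
                     for q in range(r)]
                # Radix-r butterfly, written back in place.
                for p in range(r):
                    mem[base + p * m + k] = sum(
                        t[q] * cmath.exp(-2j * cmath.pi * q * p / r)
                        for q in range(r))
        m = span
    return mem


def naive_dft(x):
    """Direct O(N^2) DFT used only as a reference."""
    n_points = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
                for n in range(n_points))
            for k in range(n_points)]
```

For the factorization `[2, 2]` the assumed scheme reduces to ordinary bit reversal. A hardware address generator would presumably compute such addresses incrementally rather than with division, which this sketch does not attempt to model.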
Opening claim text (preview).
What is claimed is:
1. A programmable network processing unit configured to facilitate discrete Fourier transform (“DFT”) operations for data processing, comprising: a memory containing a “ping” and “pong” memory banks for facilitating selection of read or write operation to enhance efficiency of data flows; a vector load unit coupled to the memory and configured to retrieve a data flow from the memory in accordance with address information from an address generator; a vector dynamic scaling unit coupled to the vector loading unit and configured to scale the data flow to generate parallel scaled samples operating within a predefined amplitude in a bit-width of a data-path for facilitating radix computation; and a vector data twiddle multiplier coupled to the vector dynamic scaling unit and configured to multiply scaled samples with twiddle factors.
2. The programmable network processing unit of claim 1, further comprising a vector staging buffer coupled to the vector dynamic scaling unit and operable to facilitate parallel data output to the vector data twiddle multiplier.
3. The programmable network processing unit of claim 2, wherein the vector staging buffer stores scaled vector data in a temporary memory in a first order, and outputs the scaled vector data from the temporary memory in a second order.
4. The programmable network processing unit of claim 1, wherein the vector load unit, vector dynamic scaling unit, and the vector data twiddle multiplier form a sequence of operations as a vector data-path pipeline.
5. The programmable network processing unit of claim 1, further comprising a finite state machine controller coupled to the memory and configured to generate radix engine control signals in accordance with an input of index.
6. The programmable network processing unit of claim 1, further comprising a mixed radix engine coupled to the memory and capable of being reconfigurable based on radix engine control signals for facilitating generation of a radix result in accordance with scaled vector data from a vector data-path pipeline.
7. The programmable network processing unit of claim 1, further comprising an output staging buffer coupled to a mixed radix engine and configured to buffer intermediate radix results generated by the mixed radix engine.
8. The programmable network processing unit of claim 1, further comprising an output interface streamer coupled to the memory and configured to retrieve result from a staging buffer for ordering data sequence.
9. The programmable network processing unit of claim 1, further comprising an output vector ping-pong buffer coupled to an output interface streamer and configured to generate discrete Fourier transform (“DFT”) and inverse DFT (“IDFT”) data for a downstream entity in a sequential order.
10. The programmable network processing unit of claim 1, further comprising a programmable vector mixed-radix engine coupled to the vector data twiddle multiplier and operable to perform a selected radix computation selected from a plurality of radix computations for generating a radix result.
11. The programmable network processing unit of claim 10, wherein the plurality of radix computations includes radix3, radix4, radix5, and radix6 computations.
12. The programmable network processing unit of claim 1, further comprising a configuration look up table (“LUT”) coupled to the memory and configured to store index values selectable by an index.
13. A method for processing unit configured to facilitate discrete Fourier transform (“DFT”) and inverse DFT (“IDFT”) operations for data processing, comprising: loading a data stream from a ping-pong memory bank and forwarding the data stream to vector load unit for passing through a vector data-path pipeline; generating multiple samples in response to the data stream and forwarding the samples to a vector dynamic scaling unit; scaling the samples to keep amplitudes within a predefined bit-width of a data-path for radix computation; and forwarding scaled samples to a vector data twiddle multiplier for multiplying the scaled samples with twiddle factors.
14. The method of claim 13, wherein loading the data stream includes transmitting parallel data to a vector data twiddle multiplier for facilitating a vector mixed-radix computation.
15. The method of claim 13, further comprising: receiving an index from an external component; and selecting one of index values representing size of DFT/IDFT stored in a configuration look up table (“LUT”) based on the index.
16. The method of claim 13, further comprising instructing a vector input shuffling controller to store input data in a vector memory bank based on selected index value.
17. The method of claim 13, further comprising facilitating to program a programmable vector mixed-radix engine in accordance with an index value.
18. The method of claim 13, further comprising: generating input memory addresses for storing input data into a vector memory bank; and generating output memory addresses for retrieving vector data from the vector memory bank to compute radix factors of a radix factorization.
19. The method of claim 13, further comprising staging vector data from a memory bank and staging outputs of vector data from a temporary memory.
20. An apparatus for processing unit configured to facilitate discrete Fourier transform operations for data processing, comprising: means for loading a data stream from a ping-pong memory bank and forwarding the data stream to vector load unit for passing through a vector data-path pipeline; means for generating multiple samples in response to the data stream and forwarding the samples to a vector dynamic scaling unit; means for scaling the samples to keep amplitudes within a predefined bit-width of a data-path for radix computation; and means for forwarding scaled samples to a vector data twiddle multiplier for multiplying the scaled samples with twiddle factors.
21. The apparatus of claim 20, wherein means for loading the data stream includes means for transmitting parallel data to a vector data twiddle multiplier for facilitating a vector mixed-radix computation.
22. The apparatus of claim 20, further comprising: means for receiving an index from an external component; and means for selecting one of index values representing size of discrete Fourier transform (“DFT”) and inverse DFT (“IDFT”) stored in a configuration look up table (“LUT”) based on the index.
23. The apparatus of claim 20, further comprising means for instructing a vector input shuffling controller to store input data in a vector memory bank based on selected index value.
24. The apparatus of claim 20, further comprising means for facilitating to program a programmable vector mixed-radix engine in accordance with an index value.
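The scaling and twiddle-multiplication steps recited in the independent claims can be pictured with a small sketch. The claims do not say how the scaling is done, so the following assumes a block floating-point approach (one shared right shift per vector) on integer samples; `dynamic_scale`, `twiddle_multiply`, and their parameters are hypothetical names chosen for illustration only.

```python
import cmath


def dynamic_scale(samples, bit_width):
    """Scale a vector of (re, im) integer pairs by one shared right shift.

    Assumption: every component must fit a signed two's-complement word
    of `bit_width` bits. Returns the scaled samples and the shift used,
    so a later stage could undo or track the scaling.
    """
    limit = (1 << (bit_width - 1)) - 1
    peak = max(max(abs(re), abs(im)) for re, im in samples)
    shift = 0
    while (peak >> shift) > limit:
        shift += 1
    scaled = [(re >> shift, im >> shift) for re, im in samples]
    return scaled, shift


def twiddle_multiply(scaled, n_points, stride):
    """Multiply lane q by the twiddle factor exp(-2*pi*j*q*stride/n_points)."""
    out = []
    for q, (re, im) in enumerate(scaled):
        w = cmath.exp(-2j * cmath.pi * q * stride / n_points)
        out.append(complex(re, im) * w)
    return out


# Usage: three parallel samples scaled into a 16-bit data-path,
# then multiplied by twiddle factors for a 4-point stage.
lanes = [(1000, -2000), (30000, 5), (-70000, 12)]
scaled_lanes, used_shift = dynamic_scale(lanes, bit_width=16)
rotated = twiddle_multiply(scaled_lanes, n_points=4, stride=1)
```

Sharing one shift across the vector keeps the relative amplitudes of the parallel samples intact, which is one common reason to scale per block rather than per sample; whether the claimed unit works this way is not stated in the text above.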
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Details on data memory access · CPC title
Discrete Fourier transforms · CPC title
Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm · CPC title