Instruction and logic to provide vector scatter-op and gather-op functionality
US-2017357514-A1 · Dec 14, 2017 · US
US10311018B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10311018-B2 |
| Application number | US-201615292015-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 12, 2016 |
| Priority date | Dec 31, 2015 |
| Publication date | Jun 4, 2019 |
| Grant date | Jun 4, 2019 |
Official abstract text for this publication.
A vector memory subsystem for use with a programmable mixed-radix vector processor (“PVP”) capable of calculating discrete Fourier transform (“DFT/IDFT”) values. In an exemplary embodiment, an apparatus includes a vector memory bank and a vector memory system (VMS) that generates input memory addresses that are used to store input data into the vector memory bank. The VMS also generates output memory addresses that are used to unload vector data from the vector memory bank. The input memory addresses are used to shuffle the input data in the vector memory bank based on a radix factorization associated with an N-point DFT, and the output memory addresses are used to unload the vector data from the vector memory bank to compute radix factors of the radix factorization.
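The abstract states that the input addresses shuffle data according to a radix factorization of N, but this record does not give the actual mapping. As a hedged illustration only, the sketch below uses the textbook mixed-radix digit-reversal permutation, a common ordering for Cooley-Tukey style transforms. The function names `digit_reversed_addresses` and `shuffle_into_bank`, the flat list standing in for the vector memory bank, and the 12 = 3 × 4 factorization are assumptions made for the example, not details taken from the patent.

```python
def digit_reversed_addresses(radices):
    """Return a list addr where addr[n] is the storage address of input sample n.

    Assumption: textbook mixed-radix digit reversal, used here only as an
    illustration; it is not the patent's actual address generator.
    """
    n_points = 1
    for r in radices:
        n_points *= r

    addresses = []
    for n in range(n_points):
        remainder, addr = n, 0
        for r in radices:
            # Peel off the least significant digit of n in this radix and
            # append it to addr, so the first digit ends up most significant.
            remainder, digit = divmod(remainder, r)
            addr = addr * r + digit
        addresses.append(addr)
    return addresses


def shuffle_into_bank(samples, radices):
    """Store samples into a flat list standing in for the vector memory bank."""
    addresses = digit_reversed_addresses(radices)
    if len(samples) != len(addresses):
        raise ValueError("len(samples) must equal the product of the radices")
    bank = [None] * len(samples)
    for n, value in enumerate(samples):
        bank[addresses[n]] = value
    return bank


print(digit_reversed_addresses([2, 2, 2]))  # [0, 4, 2, 6, 1, 5, 3, 7]
print(digit_reversed_addresses([3, 4]))     # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

With radices `[3, 4]`, samples 0, 3, 6, 9 land at consecutive addresses 0 to 3, so in this sketch a 4-point computation can read its inputs from one contiguous group.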
Opening claim text (preview).
What is claimed is:
1. An apparatus, comprising: a vector memory bank; and a vector memory system (VMS) coupled to the vector memory bank and configured to include a vector memory address generator and a vector data path pipeline which includes a vector load unit, vector dynamic scaling unit, vector input staging buffer, and vector data twiddle multiplier, wherein the vector memory address generator generates input memory addresses that are used to store input data into the vector memory bank and generates output memory addresses that are used to unload vector data from the vector memory bank, wherein the input memory addresses are used to shuffle the input data in the vector memory bank based on a radix factorization associated with an N-point DFT, and wherein the output memory addresses are used to unload the vector data from the vector memory bank to compute radix factors of the radix factorization.
2. The apparatus of claim 1, wherein an input of the vector load unit of the vector data path pipeline, coupled to the vector memory bank, is configured to receive the vector data from the vector memory bank; and wherein output of the vector load unit is coupled to an input of the vector dynamic scaling unit of the vector data path pipeline for providing data scaling operation.
3. The apparatus of claim 2, wherein the vector data path pipeline carries twelve data values per clock cycle.
4. The apparatus of claim 2, wherein the vector data path pipeline comprises a vector scaling unit that scales the vector data to generate scaled vector data.
5. The apparatus of claim 4, wherein the vector data path pipeline comprises a vector staging buffer that stores the scaled vector data in a temporary memory in a first order, and outputs the scaled vector data from the temporary memory in a second order.
6. The apparatus of claim 5, wherein the vector data path pipeline comprises a twiddle multiplier that multiplies the scaled vector data by twiddle factors to form multiplied scaled vector data.
7. The apparatus of claim 6, further comprising a configurable mixed radix engine, wherein the configurable mixed radix engine is configurable to perform a selected radix computation selected from a plurality of radix computations, and wherein the configurable mixed radix engine performs the selected radix computation on the multiplied scaled vector data to generate a radix result.
8. The apparatus of claim 7, wherein the plurality of radix computations comprise radix3, radix4, radix5, and radix6 computations.
9. The apparatus of claim 7, further comprising an output staging buffer, and wherein the configurable mixed radix engine outputs the radix result to the output staging buffer.
10. The apparatus of claim 9, further comprising a vector feedback path coupled to the output staging buffer to pass the radix result to a vector store unit.
11. The apparatus of claim 10, wherein the vector store unit stores the radix result in the vector memory bank at the location at which the vector data was stored.
12. The apparatus of claim 10, wherein the vector feedback path comprises a scaling factor calculator that determines scaling factors that are input to the vector scaling unit.
13. The apparatus of claim 1, wherein the VMS generates the input memory addresses to organize the vector memory bank into a virtual folded memory.
14. A method, comprising: generating input memory addresses that are used to store input data into a vector memory bank, wherein the input memory addresses are used to shuffle the data in the memory bank based on a radix factorization associated with an N-point DFT; generating output memory addresses that are used to unload vector data from the vector memory bank to compute radix factors of the radix factorization; fetching data from the vector memory bank by a vector load unit after activating a vector data path pipeline and subsequently forwarding the data to a vector dynamic scaling unit of the vector data path pipeline; and scaling the data received from the vector load unit to keep signal within a predefined bit width.
15. The method of claim 14, further comprising receiving the vector data from the memory bank into the vector data path pipeline.
16. The method of claim 15, wherein the vector data path pipeline carries twelve data values per clock cycle.
17. The method of claim 14, further comprising staging the vector data from the memory bank, wherein the staging stores the vector data in a temporary memory in a first order, and wherein the staging outputs the vector data from the temporary memory in a second order.
18. The method of claim 17, further comprising multiplying the vector data by twiddle factors to form multiplied vector data.
19. The method of claim 18, further comprising performing a radix computation selected from a plurality of radix computations, wherein the radix computation is performed on the multiplied vector data by a configurable mixed radix engine.
20. The method of claim 14, further comprising generating the input memory addresses to organize the vector memory bank into a virtual folded memory.
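The claims describe a data path that scales vector data, multiplies it by twiddle factors, and then applies a selected radix computation. The following is a minimal floating-point sketch of that sequence, assuming a standard decimation-in-time Cooley-Tukey factorization. The names `dynamic_scale`, `small_dft`, and `mixed_radix_dft`, the power-of-two scaling rule, and the 16-bit default are hypothetical choices for illustration; this record does not state the patent's scaling rule, and real hardware would use fixed-point arithmetic rather than Python complex numbers.

```python
import cmath


def dynamic_scale(values, bit_width=16):
    """Divide by a power of two so every component fits a signed bit_width.

    Assumption: a simple block-scaling rule chosen for illustration.
    Returns the scaled values and the shift that was applied.
    """
    limit = 2 ** (bit_width - 1) - 1
    peak = max(max(abs(v.real), abs(v.imag)) for v in values)
    shift = 0
    while peak / (2 ** shift) > limit:
        shift += 1
    return [v / (2 ** shift) for v in values], shift


def small_dft(values):
    """Direct r-point DFT, standing in for one radix computation."""
    r = len(values)
    return [
        sum(values[n] * cmath.exp(-2j * cmath.pi * k * n / r) for n in range(r))
        for k in range(r)
    ]


def mixed_radix_dft(x, radices):
    """Decimation-in-time DFT for len(x) equal to the product of radices."""
    n = len(x)
    if not radices:
        if n != 1:
            raise ValueError("len(x) must equal the product of the radices")
        return list(x)
    r = radices[0]
    if n % r != 0:
        raise ValueError("len(x) must equal the product of the radices")
    m = n // r
    # Split into r interleaved sub-sequences and transform each one.
    subs = [mixed_radix_dft(x[q::r], radices[1:]) for q in range(r)]
    out = [0j] * n
    for k in range(m):
        # Twiddle multiply: rotate sub-result q by exp(-2*pi*i*q*k/n).
        column = [subs[q][k] * cmath.exp(-2j * cmath.pi * q * k / n) for q in range(r)]
        # Radix-r computation on the twiddled column.
        butterfly = small_dft(column)
        for s in range(r):
            out[k + s * m] = butterfly[s]
    return out


# Usage: scale a 12-point block, transform it as 3 x 4, then undo the scaling.
samples = [complex(40000 * ((n % 5) - 2), 1000 * n) for n in range(12)]
scaled, shift = dynamic_scale(samples, bit_width=16)
spectrum = [v * 2 ** shift for v in mixed_radix_dft(scaled, [3, 4])]
```

Because the scaling here is a power of two, it can be undone exactly after the transform; the radix list `[3, 4]` uses two of the radix sizes named in claim 8.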
Discrete Fourier transforms · CPC title
Details on data memory access · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm · CPC title