US2016124651A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016124651-A1 |
| Application number | US-201514920365-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 22, 2015 |
| Priority date | Nov 3, 2014 |
| Publication date | May 5, 2016 |
| Grant date | — |
Official abstract text for this publication.
This invention deals with the problem of parallelizing random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each of the plural parallel look up tables, and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part of a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more data to be used. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
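The flow described in the abstract can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the patented hardware instruction: the names `setup_tables`, `lut_read`, and `vector_add` are hypothetical, and replicating the whole block into every table is an assumption (the abstract says only that data is moved from main memory to each table).

```python
from typing import List, Sequence


def setup_tables(block: Sequence[int], num_tables: int) -> List[List[int]]:
    """Set up num_tables parallel tables.

    Assumption: every table receives a full copy of the block, so any
    lane can reach any element of the block.
    """
    return [list(block) for _ in range(num_tables)]


def lut_read(tables: Sequence[Sequence[int]], indexes: Sequence[int]) -> List[int]:
    """Model one look up table read: lane i receives tables[i][indexes[i]].

    In hardware all lanes would be read simultaneously; this loop only
    models the result, not the timing.
    """
    if len(indexes) != len(tables):
        raise ValueError("need exactly one index per table")
    return [table[index] for table, index in zip(tables, indexes)]


def vector_add(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Stand-in for a vector SIMD operation on the destination register."""
    return [x + y for x, y in zip(a, b)]


block = [10, 20, 30, 40, 50, 60, 70, 80]    # data held in main memory
tables = setup_tables(block, num_tables=4)  # four parallel tables
dest = lut_read(tables, [7, 0, 3, 3])       # four random reads in one step
result = vector_add(dest, [1, 1, 1, 1])     # SIMD operation on the gathered data
```

Here `dest` is `[80, 10, 40, 40]` and `result` is `[81, 11, 41, 41]`. Repeating `lut_read` with new indexes models reloading the vector destination register while the tables still hold useful data.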
Opening claim text (preview).
What is claimed is:
1. A method of data processing according to a predetermined algorithm having at least one data access pattern comprising the steps of: determining whether overhead of defining look up tables, moving data from memory to the look up tables and moving data to vector registers of each data access pattern is less than overhead of moving data to vector registers by plural scalar loads; and if the overhead of defining look up tables, moving data from memory to the look up tables and moving data to vector registers for a data access pattern is less than overhead of moving data to vector registers by plural scalar loads, setting up plural parallel look up tables, moving data required by the algorithm from main memory to each of said plural parallel look up tables, simultaneously moving data from each of said parallel look up tables to corresponding locations of a vector destination register, and performing at least one vector single instruction multiple data (SIMD) operation upon data in said vector destination register.
2. The method of data processing of claim 1, wherein: said step of setting up plural look up tables includes selecting an element size corresponding to a data size of said data access pattern.
3. The method of data processing of claim 2, wherein: said step of selecting an element size corresponding to a data size of said data access pattern selects an element size greater than or equal to said data size of said data access pattern.
4. The method of data processing of claim 1, wherein: said step of setting up plural look up tables includes selecting a number of parallel tables corresponding to said selected element size relative to a data width of vector registers.
5. The method of data processing of claim 1, wherein: said step of setting up plural look up tables includes selecting a table size corresponding to a density of data elements accessed to maximize a number of data elements accessible in a single look up table read instruction.
6. The method of data processing of claim 5, further comprising the steps of: partitioning a level one memory as part data cache and part directly addressable memory available as look up table memory; wherein said step of selecting a table size enables said partitioning of the level one memory to include an amount of data cache greater than a minimum data cache required by the algorithm.
7. The method of data processing of claim 1, further comprising the steps of: following performing the at least one vector single instruction multiple data (SIMD) operation, determining whether the algorithm may operate upon more data currently stored in the look up tables; if the algorithm may operate upon more data currently stored in the look up tables, simultaneously moving further data from each of said parallel look up tables to corresponding locations of said vector destination register, and performing at least one further vector single instruction multiple data (SIMD) operation upon data in said vector destination register.
8. The method of data processing of claim 7, further comprising the steps of: if the algorithm cannot operate upon more data currently stored in the look up tables, determining if the algorithm may operate on more data of the currently set up look up tables; if the algorithm may operate on more data of the currently set up look up tables, moving further data required by the algorithm from main memory to each of said plural parallel look up tables, simultaneously moving further data from each of said parallel look up tables to corresponding locations of said vector destination register, and performing at least one further vector single instruction multiple data (SIMD) operation upon data in said vector destination register.
9. The method of data processing of claim 1, wherein: said step of simultaneously moving data from each of said parallel look up tables to corresponding locations of a vector destination register includes receiving a plurality of table indexes equal in number to said number of tables, said table indexes from corresponding locations of a vector source register, recalling from each table an element corresponding to a corresponding table index, and storing each recalled element in said vector destination register at a location corresponding to a location of said corresponding table index in said vector source register.
10. The method of data processing of claim 9, wherein: said vector destination register includes sixteen data slots; and upon selecting a number of tables equal to one, said step of storing each recalled element in said vector destination register at a location stores said recalled element in a first data slot.
11. The method of data processing of claim 9, wherein: said vector destination register includes sixteen data slots; and upon selecting a number of tables equal to two, said step of storing each recalled element in said vector destination register at a location stores a first recalled element in a first data slot and a second recalled element in a ninth data slot.
12. The method of data processing of claim 9, wherein: said vector destination register includes sixteen data slots; and upon selecting a number of tables equal to four, said step of storing each recalled element in said vector destination register at a location stores a first recalled element in a first data slot, a second recalled element in a fifth data slot, a third recalled element in a ninth data slot and a fourth recalled element in a thirteenth data slot.
13. The method of data processing of claim 9, wherein: said vector destination register includes sixteen data slots; and upon selecting a number of tables equal to eight, said step of storing each recalled element in said vector destination register at a location stores a first recalled element in a first data slot, a second recalled element in a third data slot, a third recalled element in a fifth data slot, a fourth recalled element in a seventh data slot, a fifth recalled element in a ninth data slot, a sixth recalled element in an eleventh data slot, a seventh recalled element in a thirteenth data slot and an eighth recalled element in a fifteenth data slot.
14. The method of data processing of claim 9, wherein: said vector destination register includes sixteen data slots; and upon selecting a number of tables equal to sixteen, said step of storing each recalled element in said vector destination register at a location stores a first recalled element in a first data slot, a second recalled element in a second data slot, a third recalled element in a third data slot, a fourth recalled element in a fourth data slot, a fifth recalled element in a fifth data slot, a sixth recalled element in a sixth data slot, a seventh recalled element in a seventh data slot, an eighth recalled element in an eighth data slot, a ninth recalled element in a ninth data slot, a tenth recalled element in a tenth data slot, an eleventh recalled element in an eleventh data slot, a twelfth recalled element in a twelfth data slot, a thirteenth recalled element in a thirteenth data slot, a fourteenth recalled element in a fourteenth data slot, a fifteenth recalled element in a fifteenth data slot and a sixteenth recalled element in a sixteenth data slot.
15. The method of data processing of claim 9, wherein: said table indexes are not related to said corresponding elements as a function argument to a function value.
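The overhead test of claim 1 and the slot placement recited in claims 9 to 14 can be sketched as below. The slot pattern in those claims is consistent with placing the k-th recalled element (counting from zero) at slot k × (16 / number of tables); that formula is an inference from the listed slots, not wording from the claims. The function names, the cost parameters, and the use of `None` for slots the claims do not mention are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence

SLOTS = 16  # claims 10 to 14 recite a sixteen-slot vector destination register


def should_use_tables(setup_cost: float, fill_cost: float,
                      lut_read_cost: float, scalar_load_cost: float) -> bool:
    """Claim 1 decision: use tables only if their total overhead is lower.

    The cost values are hypothetical inputs; the claims do not say how
    the overheads are measured.
    """
    return setup_cost + fill_cost + lut_read_cost < scalar_load_cost


def slot_positions(num_tables: int) -> List[int]:
    """Zero-based destination slots for 1, 2, 4, 8 or 16 tables."""
    if num_tables not in (1, 2, 4, 8, 16):
        raise ValueError("table count must be 1, 2, 4, 8 or 16")
    stride = SLOTS // num_tables
    return [k * stride for k in range(num_tables)]


def lut_read_16(tables: Sequence[Sequence[int]],
                source: Sequence[int]) -> List[Optional[int]]:
    """Model the read of claim 9 on a sixteen-slot register.

    Each table index is taken from the source slot that matches the
    destination slot. Slots not named in the claims are left as None
    because their contents are not specified in this section.
    """
    if len(source) != SLOTS:
        raise ValueError("source register must have sixteen slots")
    dest: List[Optional[int]] = [None] * SLOTS
    for table, slot in zip(tables, slot_positions(len(tables))):
        dest[slot] = table[source[slot]]
    return dest


# Two tables: indexes are read from slots 0 and 8 of the source register.
source = [2] + [0] * 7 + [1] + [0] * 7
dest = lut_read_16([[5, 6, 7], [8, 9, 10]], source)
one_based_for_four = [s + 1 for s in slot_positions(4)]
```

With two tables, `dest[0]` is 7 and `dest[8]` is 9, matching the first and ninth data slots of claim 11, and `one_based_for_four` is `[1, 5, 9, 13]`, matching claim 12.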
Operand prefetching (cache prefetching G06F12/0862) · CPC title
LOAD or STORE instructions; Clear instruction · CPC title
to perform operations on memory · CPC title
Migration mechanisms · CPC title
Single storage device · CPC title