US10387151B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10387151-B2 |
| Application number | US-201113250223-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 30, 2011 |
| Priority date | Dec 31, 2007 |
| Publication date | Aug 20, 2019 |
| Grant date | Aug 20, 2019 |
Official abstract text for this publication.
Methods and apparatus are disclosed for accessing multiple data cache lines for scatter/gather operations. Embodiment of apparatus may comprise address generation logic to generate an address from an index of a set of indices for each of a set of corresponding mask elements having a first value. Line or bank match ordering logic matches addresses in the same cache line or different banks, and orders an access sequence to permit a group of addresses in multiple cache lines and different banks. Address selection logic directs the group of addresses to corresponding different banks in a cache to access data elements in multiple cache lines corresponding to the group of addresses in a single access cycle. A disassembly/reassembly buffer orders the data elements according to their respective bank/register positions, and a gather/scatter finite state machine changes the values of corresponding mask elements from the first value to a second value.
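The flow in the abstract can be sketched as a small software simulation. This is an illustrative model only, not the patented circuitry. The 64-byte line, the 8 banks, the `base + index * scale` address form, the mask encoding (1 for the first value, 0 for the second), and the names `gather`, `memory`, and `bank_line` are all assumptions made for the example. Each loop iteration stands for one access cycle in which every bank serves at most one cache line.

```python
LINE_BYTES = 64                       # assumed cache-line size
NUM_BANKS = 8                         # assumed bank count
BANK_BYTES = LINE_BYTES // NUM_BANKS  # each bank holds one slice of a line


def gather(memory, base, indices, mask, scale=4):
    """Simulate a masked gather; returns (dest, final_mask, cycles).

    memory is a dict mapping byte address -> value (hypothetical model).
    mask[i] == 1 means element i still has to be accessed; 0 means it
    does not need to be, or has already been, accessed.
    """
    mask = list(mask)
    dest = [None] * len(indices)
    cycles = 0
    while any(mask):
        bank_line = {}  # bank -> cache line that bank serves this cycle
        group = []
        for pos, pending in enumerate(mask):
            if not pending:
                continue
            # Address generation from the index of a pending mask element.
            addr = base + indices[pos] * scale
            line = addr // LINE_BYTES
            bank = (addr % LINE_BYTES) // BANK_BYTES
            # Join the group if the bank is free, or already serves this line.
            if bank_line.setdefault(bank, line) == line:
                group.append((pos, addr))
        for pos, addr in group:
            dest[pos] = memory[addr]  # place data at its register position
            mask[pos] = 0             # first value -> second value
        cycles += 1
    return dest, mask, cycles


memory = {addr: addr * 10 for addr in range(0, 4096, 4)}
dest, final_mask, cycles = gather(memory, 0, [0, 16, 2, 32], [1, 1, 1, 1])
```

In this run the byte addresses are 0, 64, 8, and 128. Addresses 0 and 8 sit in different banks and are served together, while 64 and 128 map to the same bank as address 0 but to other lines, so each waits for a later cycle.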
Opening claim text (preview).
What is claimed is:
1. A processor comprising: a cache memory having a plurality of banks to store data in mutually exclusive portions of a cache line; a first register comprising a plurality of data fields, wherein the plurality of data fields in the first register corresponds to a plurality of data elements accessible using a plurality of corresponding indices in a second register, wherein for each data field in the first register, a first value indicates the corresponding data element has not been accessed and a second value indicates that the corresponding data element does not need to be, or has already been, accessed using a corresponding index from the second register; a decode stage to decode a first instruction; and one or more execution units, responsive to the decoded first instruction, to: read the values of each of the plurality of data fields in the first register; for two or more of the plurality of data fields in the first register having the first value, determine a first pair of corresponding data elements stored in different banks of the cache memory, and simultaneously access the first pair of corresponding data elements in said different banks using their corresponding indices; and change the values of a pair of data fields in the first register corresponding to said first pair of corresponding data elements from the first value to the second value.
2. The processor of claim 1 wherein said simultaneously accessing the first pair of corresponding data elements means gathering the first pair of corresponding data elements from said different banks in a single cache access.
3. The processor of claim 1 wherein said simultaneously accessing the first pair of corresponding data elements means scattering the first pair of corresponding data elements to said different banks in a single cache access.
4. A processor comprising: a cache memory having a plurality of banks to store data in mutually exclusive portions of a cache line; a first register comprising data fields, wherein each data field in the first register corresponds to a data element to be written into a second register, wherein for each data field in the first register, a first value is to indicate the corresponding data element has not been written into the second register and a second value is to indicate that the corresponding data element does not need to be, or has already been, written into the second register; a decode stage to decode a first instruction; and one or more execution units, responsive to the decoded first instruction, to: read the values of each of the data fields in the first register; for a plurality of data fields in the first register having the first value, determine a first pair of corresponding data elements stored in different banks of the cache memory, and access said different banks using a second pair of addresses, corresponding to said first pair of corresponding data elements, to gather the first pair of corresponding data elements and write the first pair of corresponding data elements into the second register; and change the values of a third pair of data fields in the first register, corresponding to said first pair of corresponding data elements, from the first value to the second value.
5. The processor of claim 4 further comprising: a disassembly/reassembly buffer, coupled with the cache memory and with the second register, to order the first pair of corresponding data elements according to the respective positions of the third pair of data fields in the first register to be merged into the second register.
6. The processor of claim 4 further comprising: line or bank match ordering circuitry to match the second pair of addresses corresponding to different banks to determine the first pair of corresponding data elements.
7. A method comprising: decoding a first instruction; and executing the decoded first instruction, to: read values of each of a plurality of data fields in a first register, wherein the plurality of data fields in the first register corresponds to a plurality of data elements accessible using a plurality of corresponding indices in a second register, wherein for each data field in the first register, a first value indicates the corresponding data element has not been accessed and a second value indicates that the corresponding data element does not need to be, or has already been, accessed using a corresponding index from the second register, for two or more of the plurality of data fields in the first register having the first value, determine a first pair of corresponding data elements stored in different banks of a cache memory having a plurality of banks to store data in mutually exclusive portions of a cache line, and simultaneously access the first pair of corresponding data elements in said different banks using their corresponding indices; and change the values of a pair of data fields in the first register corresponding to said first pair of corresponding data elements from the first value to the second value.
8. The method of claim 7 wherein said simultaneously accessing the first pair of corresponding data elements means gathering the first pair of corresponding data elements from said different banks in a single cache access.
9. The method of claim 7 wherein said simultaneously accessing the first pair of corresponding data elements means scattering the first pair of corresponding data elements to said different banks in a single cache access.
10. A method comprising: decoding a first instruction; and executing the decoded first instruction, to: read values of each data field in a first register, wherein each data field in the first register corresponds to a data element to be written into a second register, wherein for each data field in the first register, a first value indicates the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element does not need to be, or has already been, written into the second register, for a plurality of data fields in the first register having the first value, determine a first pair of corresponding data elements stored in different banks of a cache memory having a plurality of banks to store data in mutually exclusive portions of a cache line, and access said different banks using a second pair of addresses, corresponding to said first pair of corresponding data elements, to gather the first pair of corresponding data elements and write the first pair of corresponding data element into the second register; and change the values of a third pair of data fields in the first register, corresponding to said first pair of corresponding data elements, from the first value to the second value.
11. The method of claim 10 further comprising: ordering the first pair of corresponding data elements according to the respective positions of the third pair of data fields in the first register to be merged into the second register.
12. The method of claim 10 further comprising: matching the second pair of addresses corresponding to different banks to determine the first pair of corresponding data elements.
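The pair-at-a-time scattering recited in claims 3 and 9 can be illustrated with a short sketch. It is a hypothetical software model, not the claimed hardware. The bank geometry, the address form `base + index * scale`, the `PENDING`/`DONE` encoding of the first and second values, and the fallback to a single access when no partner in a different bank exists are assumptions introduced for the example.

```python
LINE_BYTES = 64                       # assumed cache-line size
NUM_BANKS = 8                         # assumed bank count
BANK_BYTES = LINE_BYTES // NUM_BANKS
PENDING, DONE = 1, 0                  # assumed "first value" / "second value"


def bank_of(addr):
    """Bank that holds this byte's slice of its cache line."""
    return (addr % LINE_BYTES) // BANK_BYTES


def scatter_pairwise(memory, base, indices, values, mask, scale=4):
    """Scatter values[i] to base + indices[i] * scale for pending elements.

    Each loop iteration models one cache access that writes a pair of
    elements in different banks when such a pair exists. Returns
    (final_mask, accesses). memory is a dict address -> value.
    """
    mask = list(mask)
    accesses = 0
    while PENDING in mask:
        pending = [p for p, m in enumerate(mask) if m == PENDING]
        addrs = {p: base + indices[p] * scale for p in pending}
        first = pending[0]
        # Look for a partner whose address falls in a different bank.
        partner = next(
            (p for p in pending[1:]
             if bank_of(addrs[p]) != bank_of(addrs[first])),
            None,
        )
        # Assumption: with no partner, the element is written alone.
        chosen = [first] if partner is None else [first, partner]
        for p in chosen:
            memory[addrs[p]] = values[p]
            mask[p] = DONE            # first value -> second value
        accesses += 1
    return mask, accesses


memory = {}
final_mask, accesses = scatter_pairwise(
    memory, 0, [0, 16, 2, 32], [10, 11, 12, 13], [1, 1, 1, 1]
)
```

Here elements 0 and 2 (addresses 0 and 8) fall in different banks and are written in one access. Elements 1 and 3 share a bank, so each takes its own access. The sketch only tries partners for the first pending element, which is a simplification compared with ordering logic that could search all pending addresses.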
of multiple operands or results {(addressing multiple banks G06F12/06)} · CPC title
LOAD or STORE instructions; Clear instruction · CPC title
in hierarchically structured memory systems, e.g. virtual memory systems · CPC title
Special purpose registers · CPC title
Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication (G06F12/08 takes precedence) · CPC title