Systems, apparatuses, and methods for performing a horizontal add or subtract in response to a single instruction
US-9619226-B2 · Apr 11, 2017 · US
US10102888B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10102888-B2 |
| Application number | US-201715728293-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 9, 2017 |
| Priority date | Jun 28, 2013 |
| Publication date | Oct 16, 2018 |
| Grant date | Oct 16, 2018 |
Official abstract text for this publication.
A processor includes N-bit registers and a decode unit to receive a multiple register memory access instruction. The multiple register memory access instruction is to indicate a memory location and a register. The processor includes a memory access unit coupled with the decode unit and with the N-bit registers. The memory access unit is to perform a multiple register memory access operation in response to the multiple register memory access instruction. The operation is to involve N-bit data, in each of the N-bit registers comprising the indicated register. The operation is also to involve different corresponding N-bit portions of an M×N-bit line of memory corresponding to the indicated memory location. A total number of bits of the N-bit data in the N-bit registers to be involved in the multiple register memory access operation is to amount to at least half of the M×N-bits of the line of memory.
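
The operation in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `multi_register_load`, the use of consecutive register indices starting at the indicated register, and the requirement that the address be line-aligned are all hypothetical choices made for the example. It models a multiple register load with N = 128 and M = 4 (a 512-bit line) and enforces the "at least half of the line" condition.

```python
def multi_register_load(memory, address, registers, first_reg, count,
                        n_bits=128, line_bits=512):
    """Copy `count` different N-bit portions of one line into `count` registers.

    Assumptions (not from the source): registers are consecutive starting at
    `first_reg`, and `address` is aligned to the start of the line.
    """
    if line_bits % n_bits != 0:
        raise ValueError("line width must be a multiple of the register width")
    n_bytes = n_bits // 8
    line_bytes = line_bits // 8
    if address % line_bytes != 0:
        raise ValueError("address must be line-aligned in this sketch")
    if address + line_bytes > len(memory):
        raise ValueError("line lies outside the modeled memory")
    # Condition from the abstract: the bits involved amount to at least
    # half of the M x N-bit line (and cannot exceed the whole line).
    total_bits = count * n_bits
    if total_bits * 2 < line_bits or total_bits > line_bits:
        raise ValueError("operation must involve at least half of one line")
    if first_reg < 0 or first_reg + count > len(registers):
        raise ValueError("register range out of bounds")

    line = memory[address:address + line_bytes]
    for i in range(count):
        # Each register receives a different corresponding N-bit portion.
        registers[first_reg + i] = line[i * n_bytes:(i + 1) * n_bytes]


# Usage: load a full 512-bit line into four 128-bit registers.
memory = bytes(range(64)) * 2          # two 64-byte lines
regs = [None] * 8
multi_register_load(memory, 64, regs, first_reg=2, count=4)
```

With `count=2` the same call would involve exactly half of the line, which still satisfies the condition; `count=1` is rejected because 128 bits is less than half of 512 bits.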
Opening claim text (preview).
What is claimed is:
1. A processor comprising: a plurality of N-bit registers; a decode unit to decode a multiple register memory access instruction, the multiple register memory access instruction to indicate a memory location and to indicate a register; and a memory access unit coupled with the decode unit and coupled with the plurality of the N-bit registers, the memory access unit to perform a multiple register memory access operation in response to the multiple register memory access instruction, the multiple register memory access operation to involve N-bit data, in each of the plurality of the N-bit registers that are to comprise the indicated register, and different corresponding N-bit portions of an M×N-bit line of memory, that is to correspond to the indicated memory location, in which a total number of bits of the N-bit data, in the plurality of the N-bit registers to be involved in the multiple register memory access operation, is to amount to at least half of the M×N-bit line of memory.
2. The processor of claim 1, in which the memory access unit is to perform the operation in which the total number of bits of the N-bit data in the plurality of the N-bit registers to be involved in the multiple register memory access operation is to amount to all of the M×N-bit line of memory.
3. The processor of claim 1, in which the memory access unit is to perform the operation in which the total number of bits of the N-bit data in the plurality of the N-bit registers to be involved in the multiple register memory access operation is to amount to at least 256-bits.
4. The processor of claim 3, in which the memory access unit is to perform the operation in which the total number of bits of the N-bit data in the plurality of the N-bit registers to be involved in the multiple register memory access operation is to amount to at least 512-bits.
5. The processor of claim 1, in which the memory access unit is to perform the operation that is to involve the N-bit data in each of at least three N-bit registers.
6. The processor of claim 5, in which the memory access unit is to perform the operation that is to involve the N-bit data in each of at least four N-bit registers.
7. The processor of claim 1, in which the memory access unit is to perform the operation that is to involve 128-bit data, in each of at least four 128-bit registers, and the different corresponding N-bit portions which are 128-bit portions of the line of memory that is to be at least 512-bits.
8. The processor of claim 1, in which the memory access unit is to perform the operation that is to involve 256-bit data, in each of at least two 256-bit registers, and the different corresponding N-bit portions which are 256-bit portions of the line of memory that is to be at least 512-bits.
9. The processor of claim 1, in which the processor comprises a reduced instruction set computing (RISC) processor, and in which the multiple register memory access instruction comprises a multiple register load from memory instruction, and in which the memory access unit is to load the different N-bit portions of the M×N-bit line of memory in the plurality of the N-bit registers, in response to the multiple register load from memory instruction, in which the total number of bits of the different N-bit portions to be loaded in the plurality of the N-bit registers from the M×N-bit line of memory is to amount to at least half of the M×N-bit line of memory.
10. The processor of claim 9, in which the memory access unit is to load different 128-bit portions of the line of memory which is at least 512-bits in each of at least four 128-bit registers.
11. The processor of claim 9, in which the memory access unit is to load different 256-bit portions of the line of memory which is at least 512-bits in each of at least two 256-bit registers.
12. The processor of claim 1, in which the processor comprises a reduced instruction set computing (RISC) processor, and in which the multiple register memory access instruction comprises a multiple register write to memory instruction, and in which the memory access unit is to write the N-bit data, from the plurality of the N-bit registers, to the different corresponding N-bit portions of the M×N-bit line of memory, in response to the multiple register write to memory instruction, in which the total number of bits of the N-bit data to be written from the plurality of the N-bit registers to the M×N-bit line of memory is to amount to at least half of the M×N-bit line of memory, in which the at least half of the M×N-bit line of memory is at least 256-bits.
13. The processor of claim 1, in which the multiple register memory access instruction is to explicitly specify each of the plurality of registers.
14. The processor of claim 1, in which the multiple register memory access instruction is to specify a number of the plurality of registers.
15. A processor comprising: a cache to store a plurality of cache lines; a plurality of general purpose registers; a plurality of 128-bit packed data registers, including a first destination 128-bit packed data register, and a second destination 128-bit packed data register; an instruction fetch unit to fetch instructions, including a load from memory instruction; a decode unit to decode the load from memory instruction, the load from memory instruction indicating a starting memory location in a memory, the starting memory location associated with data to be loaded, and the load from memory instruction having a first field to specify the first destination 128-bit packed data register, and having a second field to specify the second destination 128-bit packed data register; and a memory access unit coupled to the decode unit, and coupled to the plurality of 128-bit packed data registers, the memory access unit to perform a load from memory operation in response to the decoded load from memory instruction, the load from memory operation to: load a first 128-bit data from the indicated starting memory location, and store the loaded first 128-bit data in the first destination 128-bit packed data register; and load a second 128-bit data, which is adjacent to the first 128-bit data, and store the loaded second 128-bit data in the second destination 128-bit packed data register.
16. The processor of claim 15, further comprising a plurality of write mask registers to predicate result vector writes.
17. The processor of claim 15, further comprising a 16-wide vector processing unit to execute double-precision float instructions.
18. The processor of claim 15, wherein the cache is to store 512-bit cache lines.
19. The processor of claim 15, wherein the cache is to store 1024-bit cache lines.
20. The processor of claim 15, wherein the cache is to store cache lines that are a multiple of a width of the 128-bit packed data registers.
21. The processor of claim 15, further comprising: a branch prediction unit; and a translation lookaside buffer (TLB).
22. The processor of claim 15, wherein the processor is a reduced instruction set computing (RISC) processor.
23. A method performed by a processor, the method comprising: storing a plurality of cache lines in a cache of the processor; storing data in a plurality of general purpose registers of the processor; fetching a load from memory instruction with an instruction fetch unit of the processor; decoding the load from memory instruction with a decode unit of the processor, the load from memory inst
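
The load from memory operation recited in claim 15 can be sketched in software as follows. This is a hedged illustration only: the `LoadPair` class, its field names, and `execute_load_pair` are hypothetical, and the sketch assumes that the "adjacent" second 128-bit data sits at the next-higher address, which the claim text does not state.

```python
from dataclasses import dataclass

REG_BYTES = 16  # one 128-bit packed data register


@dataclass(frozen=True)
class LoadPair:
    """Hypothetical decoded form of the load from memory instruction."""
    start: int  # starting memory location (byte address)
    dest1: int  # first field: first destination register index
    dest2: int  # second field: second destination register index


def execute_load_pair(insn, memory, packed_regs):
    """Load two adjacent 128-bit values into the two destination registers."""
    end = insn.start + 2 * REG_BYTES
    if insn.start < 0 or end > len(memory):
        raise IndexError("load extends outside the modeled memory")
    # First 128-bit data comes from the indicated starting location.
    packed_regs[insn.dest1] = memory[insn.start:insn.start + REG_BYTES]
    # Second 128-bit data is adjacent (assumed: immediately following).
    packed_regs[insn.dest2] = memory[insn.start + REG_BYTES:end]


# Usage: registers are modeled as a dict from index to 16-byte values.
memory = bytes(range(64))
packed_regs = {}
execute_load_pair(LoadPair(start=16, dest1=3, dest2=7), memory, packed_regs)
```

Because each destination has its own field, the two registers need not be consecutive, in contrast to a scheme that specifies only a base register and a count.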
using data shift registers · CPC title
LOAD or STORE instructions; Clear instruction · CPC title
having multiple operands in a single register · CPC title
with implied specifier, e.g. top of stack · CPC title