Compiler optimizations for vector instructions
US-9619214-B2 · Apr 11, 2017 · US
US2017123792A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017123792-A1 |
| Application number | US-201514930740-A |
| Country | US |
| Kind code | A1 |
| Filing date | Nov 3, 2015 |
| Priority date | Nov 3, 2015 |
| Publication date | May 4, 2017 |
| Grant date | — |
Abstract
A processor includes a register and a load store unit (LSU). The LSU loads data into the register from a memory. When in little endian mode, bytes from sequentially increasing memory addresses are loaded in order of corresponding sequentially increasing byte memory addresses from a first end (right end) of the register to a second end (left end) of the register. When in big endian mode, bytes from sequentially increasing memory addresses are loaded in order of corresponding sequentially increasing memory addresses from the second end (left end) of the register to the first end (right end) of the register. Therefore, regardless of whether the processor operates in little or big endian mode, the data in the register has its most significant byte on its left side and its least significant byte on its right side. This simplifies the execution of SIMD instructions because the data is aligned the same way in both endian modes.
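The load behavior described in the abstract can be modeled in a few lines of Python. This is a sketch, not the patented hardware: it assumes a 16-byte register (the source gives no width) represented as a Python integer whose most significant byte is the register's left end, and the helper names `load` and `lanes` are hypothetical. With that representation, `int.from_bytes` using the mode as the byte order places the lowest-addressed byte at the right end in little endian mode and at the left end in big endian mode.

```python
# Sketch of the endian-dependent load from the abstract.
# Assumptions (not from the source): a 16-byte register, modeled as a Python
# int whose most significant byte is the register's left end.

REG_BYTES = 16  # hypothetical register width


def load(memory: bytes, addr: int, mode: str) -> int:
    """Load REG_BYTES bytes starting at addr into a register value."""
    if mode not in ("little", "big"):
        raise ValueError("mode must be 'little' or 'big'")
    chunk = memory[addr:addr + REG_BYTES]
    if len(chunk) != REG_BYTES:
        raise ValueError("load runs past the end of memory")
    # little: the byte at addr lands at the right end (least significant byte)
    # big: the byte at addr lands at the left end (most significant byte)
    return int.from_bytes(chunk, mode)


def lanes(reg: int, elem_bytes: int) -> list:
    """Split the register into elements, listed from the right end leftward."""
    bits = 8 * elem_bytes
    mask = (1 << bits) - 1
    return [(reg >> (bits * j)) & mask for j in range(REG_BYTES // elem_bytes)]


words = [0x11223344, 0x55667788, 0x99AABBCC, 0xDDEEFF00]
for mode in ("little", "big"):
    # Memory holds the words in the byte order native to the current mode.
    memory = b"".join(w.to_bytes(4, mode) for w in words)
    reg = load(memory, 0, mode)
    print(mode, hex(reg), [hex(x) for x in lanes(reg, 4)])
```

In both modes each 4-byte lane holds an intact word with its most significant byte on the left, so no byte swapping is needed; only the lane holding the lowest-addressed element differs (rightmost in little endian mode, leftmost in big endian mode). The same `load` works unchanged for any element size, which is the property claims 8 and 14 rely on.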
Claims
What is claimed is:

1. A processor system comprising: a register; and a load store unit (LSU) configured to load data into the register from a memory, wherein when in little endian mode bytes from sequentially increasing memory addresses are loaded in order of corresponding sequentially increasing byte memory addresses from a first end of the register to a second end of the register, and wherein when in big endian mode bytes from sequentially increasing memory addresses are loaded in order of corresponding sequentially increasing memory addresses from the second end of the register to the first end of the register.

2. The processor system of claim 1 wherein bytes are stored within the register as data elements, wherein the data elements are sized according to one of the group of: byte, half-word, word, double-word, and quad-word.

3. The processor system of claim 2 wherein the register is a first source register and further comprising: an arithmetic logic unit (ALU) configured to execute an instruction using data from the first source register and a second source register, wherein when performing one or more operations from the group of: addition, subtraction, and multiplication, bits within individual data elements that are input to the ALU from the first source register and the second source register are not reordered regardless of whether the system is operating in big endian mode or little endian mode.

4. The processor system of claim 3 wherein the instruction is a single instruction multiple data (SIMD) instruction.

5. The processor system of claim 3 further comprising: input reordering logic configured to rearrange an order of bits from the first source register before data from the first source register is input to the ALU based, at least in part, on whether the processor system is operating in big endian mode or little endian mode.

6. The processor system of claim 5 wherein the SIMD instruction represents four instructions operating on four data element word pairs, wherein word data element pair i[3] and i[7] is processed together, word data element pair i[2] and i[6] is processed together, word data element pair i[1] and i[5] is processed together, and word data element pair i[0] and i[4] is processed together.

7. The processor system of claim 6 wherein the data element pairs are processed in parallel in multiple ALUs.

8. The processor system of claim 2 further comprising: a single load instruction configured to cause the LSU to load data with data elements from memory to the register without regard to a size of the data elements.

9. The processor system of claim 3 further comprising: output reordering logic configured to align bytes of an output value calculated by the ALU.

10. The processor system of claim 1 wherein the LSU is further configured to return bytes stored in the register to original memory byte addresses from which the bytes stored in the register were loaded.

11. The processor system of claim 1 further comprising: search logic configured to search in little endian mode for a byte value starting at the first end of the register and searching byte by byte for the byte value until reaching the second end of the register, and wherein the search logic is configured to search in big endian mode for a byte value starting at the second end of the register and searching byte by byte for the byte value until reaching the first end of the register.

12. A processor system, comprising: a load store unit (LSU) configured to execute load instructions and store instructions to access data in a memory comprising multiple distinct data elements, wherein the load instructions and store instructions do not differentiate as to a size of the multiple distinct data elements; and a register file configured to receive data in response to load instructions and to provide data for storing to memory in response to store instructions, wherein contents of a register differ depending on whether the register was loaded in a big endian or a little endian mode.

13. The processor system of claim 12, further comprising: an execution unit configured to perform Single Instruction Multiple Data (SIMD) operations on one or more source registers and configured to store a result of the operation in one or more destination registers, wherein the execution unit receives an indication of endian mode to identify a location within the one or more source registers where a particular data element is located.

14. The processor system of claim 12, wherein there is one load instruction and one store instruction for data to be used in a SIMD operation, regardless of intended element size of the operation to be performed.

15. The processor system of claim 14, wherein the intended element size is one of the group of: byte, half-word, word, double-word, and quad-word.

16. The processor system of claim 12, wherein the load store unit populates a destination register for a load instruction with a first appearing data element at a most significant byte portion of the destination register when operating in big endian mode.

17. The processor system of claim 12, wherein the load store unit populates a destination register for a load instruction with a first appearing data element at a least significant byte portion of the destination register when operating in little endian mode.

18. The processor system of claim 12, wherein the processor system executes a search instruction logically starting at one end of a source register for both big endian mode and little endian mode.

19. A processor system, comprising: a load store unit (LSU); a single load instruction to cause the LSU to load data from memory to a register, wherein in little endian mode the byte of the first memory address is loaded into the least significant byte (LSB) of the register with bytes of consecutively increasing addresses loaded next to each other in the register with the byte at the largest memory address loaded in the most significant byte (MSB) of the register, wherein in big endian mode the byte of the first memory address is loaded into the MSB of the register with bytes of consecutively increasing addresses loaded next to each other in the register with the byte at the largest memory address loaded in the LSB of the register; and a single store instruction to cause the LSU in little endian mode to store data from the register to memory at a starting memory address with the LSB of the register stored to the lowest starting memory address with consecutive bytes stored to consecutively increasing memory addresses until the MSB of the register is stored to the largest last memory address that is addressed by the single store instruction, and wherein the single store instruction is configured to cause the LSU in big endian mode to store data from the register to memory at a starting memory address with the MSB of the register stored to the lowest starting memory address with consecutive bytes stored to consecutively increasing memory addresses until the LSB of the register is stored to the largest last memory address that is addressed by the single store instruction.

20. The processor system of claim 19 further comprising: an execution pipeline; and an execution unit configured to perform Single Instruction Multiple Data (SIMD) operations on one or more source registers loaded by the single load instruction and configured to store a result of the operation in one or more destination registers.
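Claims 3, 6 and 7 describe SIMD arithmetic in which the bits of each data element are not reordered in either endian mode. The Python sketch below models a four-lane word addition on a hypothetical 16-byte register held as an integer (left end = most significant byte). The numbering i[0]..i[3] for the first source register and i[4]..i[7] for the second is an assumption chosen to match the pairs named in claim 6, and all helper names are illustrative.

```python
# Sketch of a four-lane SIMD word add (claims 3, 6, 7).
# Assumptions: hypothetical 16-byte register held as a Python int (left end =
# MSB); i[0..3] come from src1 and i[4..7] from src2 (hypothetical numbering).

REG_BYTES = 16
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def load(memory: bytes, mode: str) -> int:
    return int.from_bytes(memory[:REG_BYTES], mode)


def store(reg: int, mode: str) -> bytes:
    return reg.to_bytes(REG_BYTES, mode)


def encode(values: list, mode: str) -> bytes:
    """Lay out 32-bit words in memory in the byte order native to mode."""
    return b"".join(v.to_bytes(4, mode) for v in values)


def words(reg: int) -> list:
    """Word lanes, listed from the right end of the register leftward."""
    return [(reg >> (WORD_BITS * j)) & WORD_MASK for j in range(4)]


def simd_add_words(src1: int, src2: int) -> int:
    # Note: no endian-mode argument; element bits are never reordered.
    i = words(src1) + words(src2)
    result = 0
    # Pairs (i[0], i[4]), (i[1], i[5]), (i[2], i[6]), (i[3], i[7]). Masking each
    # sum to 32 bits keeps a carry from crossing into the neighboring lane.
    for j in range(4):
        result |= ((i[j] + i[j + 4]) & WORD_MASK) << (WORD_BITS * j)
    return result


lhs = [1, 0xFFFFFFFF, 100, 7]
rhs = [2, 1, 23, 0x10]
for mode in ("little", "big"):
    out = store(simd_add_words(load(encode(lhs, mode), mode), load(encode(rhs, mode), mode)), mode)
    print(mode, [int.from_bytes(out[k:k + 4], mode) for k in range(0, REG_BYTES, 4)])
    # both modes print [3, 0, 123, 23]
```

Because each load leaves every word with its most significant byte on the left, the same lane arithmetic produces correctly encoded results in both modes. The four lane sums are independent of one another, which is what would allow them to run in parallel ALUs as in claim 7.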
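Claims 10 and 19 describe the store that mirrors the load. A minimal Python sketch, assuming a hypothetical 16-byte register held as an integer whose most significant byte (MSB) is the left end; `load` and `store` are illustrative names, not part of the patent. `int.to_bytes` with the mode as byte order writes the LSB to the lowest address in little endian mode and the MSB to the lowest address in big endian mode, so a load followed by a store in the same mode returns every byte to its original address.

```python
# Sketch of the claim 19 store and the claim 10 round trip.
# Assumption: hypothetical 16-byte register held as a Python int whose
# most significant byte (MSB) is the register's left end.

REG_BYTES = 16  # hypothetical register width


def load(memory: bytes, addr: int, mode: str) -> int:
    """Little endian: byte at addr -> LSB. Big endian: byte at addr -> MSB."""
    chunk = bytes(memory[addr:addr + REG_BYTES])
    if len(chunk) != REG_BYTES:
        raise ValueError("access runs past the end of memory")
    return int.from_bytes(chunk, mode)  # raises ValueError for a bad mode


def store(memory: bytearray, addr: int, reg: int, mode: str) -> None:
    """Little endian: LSB -> lowest address. Big endian: MSB -> lowest address."""
    if addr + REG_BYTES > len(memory):
        raise ValueError("access runs past the end of memory")
    memory[addr:addr + REG_BYTES] = reg.to_bytes(REG_BYTES, mode)


original = bytes(range(0x40, 0x60))  # 32 bytes of sample memory
for mode in ("little", "big"):
    reg = load(original, 8, mode)
    copy = bytearray(len(original))
    store(copy, 8, reg, mode)
    # Every byte returns to the address it was loaded from (claim 10).
    assert copy[8:8 + REG_BYTES] == original[8:8 + REG_BYTES]
```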
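Claims 11 and 18 describe a byte search whose starting end depends on the endian mode. The sketch below assumes, beyond what the claims state, that the search reports how many bytes it examined before the match (or `None` if there is none); the register model (a hypothetical 16-byte Python integer, left end = most significant byte) and the helper names are also illustrative. Starting at the right end in little endian mode and at the left end in big endian mode means both modes visit bytes in increasing memory-address order, so the result equals the byte's offset in memory.

```python
from typing import Optional

REG_BYTES = 16  # hypothetical register width


def load(memory: bytes, mode: str) -> int:
    """Hypothetical register model: int whose MSB is the register's left end."""
    if len(memory) < REG_BYTES:
        raise ValueError("need at least REG_BYTES bytes")
    return int.from_bytes(memory[:REG_BYTES], mode)


def search(reg: int, value: int, mode: str) -> Optional[int]:
    """Scan byte by byte; return how many bytes precede the match, or None."""
    if mode == "little":
        positions = range(REG_BYTES)  # first (right) end toward the left end
    elif mode == "big":
        positions = range(REG_BYTES - 1, -1, -1)  # second (left) end toward the right
    else:
        raise ValueError("mode must be 'little' or 'big'")
    for step, pos in enumerate(positions):
        # pos counts bytes from the right end of the register
        if (reg >> (8 * pos)) & 0xFF == value:
            return step
    return None


data = b"hello, world!!!!"  # exactly 16 bytes
for mode in ("little", "big"):
    print(mode, search(load(data, mode), ord("o"), mode))  # 4 in both modes
```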
CPC classification titles:
- Arithmetic instructions
- Operand accessing
- Register arrangements
- Organisation of register space, e.g. banked or distributed register file
- LOAD or STORE instructions; Clear instruction