Instruction set for eliminating misaligned memory accesses during processing of an array having misaligned data rows

US9910670B2 · US · B2

Patent metadata
Publication number: US-9910670-B2
Application number: US-201414327534-A
Country: US
Kind code: B2
Filing date: Jul 9, 2014
Priority date: Jul 9, 2014
Publication date: Mar 6, 2018
Grant date: Mar 6, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processor is described having an instruction execution pipeline. The instruction execution pipeline includes an instruction fetch stage to fetch an instruction. The instruction format of the instruction specifies a first input vector, a second input vector and a third input operand. The instruction execution pipeline comprises an instruction decode stage to decode the instruction. The instruction execution pipeline includes a functional unit to execute the instruction. The functional unit includes a routing network to route a first contiguous group of elements from a first end of one of the input vectors to a second end of the instruction's resultant vector, and, route a second contiguous group of elements from a second end of the other of the input vectors to a first end of the instruction's resultant vector. The first and second ends are opposite vector ends. The first and second groups of contiguous elements are defined from the third input operand. The instruction is not capable of routing non-contiguous groups of elements from the input vectors to the instruction's resultant vector. A software pipeline that uses the instruction is also described.
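
The routing described in the abstract can be modeled in a few lines of Python. This is a minimal sketch and not the patented hardware: the names `merge_ends` and `split_from_mask` are hypothetical, index 0 is assumed to be the "left" end of a vector, and the mask encoding (a run of ones followed by a run of zeros) is an assumption chosen only to illustrate the contiguity restriction.

```python
def merge_ends(a, b, k):
    """Model the instruction with a scalar third operand k (assumed meaning).

    The contiguous group at the right end of `a` (everything after the first
    k elements) is routed to the left end of the result, and the contiguous
    group at the left end of `b` (its first k elements) is routed to the
    right end of the result.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("input vectors must have the same length")
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and the vector length")
    return a[k:] + b[:k]


def split_from_mask(mask):
    """Derive k from a mask vector (hypothetical encoding).

    Set bits mark result positions filled from the first input. Only a
    leading run of set bits is accepted, so a mask that would select
    non-contiguous groups is rejected.
    """
    lead = 0
    while lead < len(mask) and mask[lead]:
        lead += 1
    if any(mask[lead:]):
        raise ValueError("mask would select non-contiguous groups")
    return len(mask) - lead


a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [8, 9, 10, 11, 12, 13, 14, 15]
print(merge_ends(a, b, 3))  # [3, 4, 5, 6, 7, 8, 9, 10]
print(merge_ends(a, b, split_from_mask([1, 1, 1, 1, 1, 0, 0, 0])))  # same result
```

The mirrored orientation (left end of the first input routed to the right end of the result) would be the same model with the slices swapped.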

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: an instruction execution pipeline comprising: an instruction fetch stage to fetch a first instruction and a second instruction, an instruction format of the first instruction specifying a first register storing a first input vector that is misaligned with respect to memory addressing space, a second register storing a second, next input vector that is misaligned with respect to the memory addressing space, a first resultant vector, and a third input operand, and an instruction format of the second instruction specifying the second register storing the second, next input vector that is misaligned with respect to the memory addressing space, a third register storing a third, next input vector that is misaligned with respect to the memory addressing space, a second resultant vector, and a fourth input operand; an instruction decoder to decode said first instruction into a decoded first instruction, and decode said second instruction into a decoded second instruction; and an execution unit to: execute the decoded first instruction to cause a routing network to route a first contiguous group of elements from a first end of the first input vector to a second end of said first resultant vector, route a second contiguous group of elements from a second end of the second, next input vector to a first end of said first resultant vector, said first end of the first input vector and said second end of the second, next input vector being opposite vector ends, and preserve the second, next input vector in the second register after execution of the decoded first instruction, wherein said first and second contiguous groups of elements of the first input vector and the second, next input vector are defined from said third input operand, and execute the decoded second instruction to cause the routing network to route a first contiguous group of elements from a first end of the second, next input vector to a second end of said second resultant vector, route a second contiguous group of elements from a second end of the third, next input vector to a first end of said second resultant vector, said first end of the second, next input vector and said second end of the third, next input vector being opposite vector ends, wherein said first and second contiguous groups of elements of the second, next input vector and the third, next input vector are defined from said fourth input operand.

2. The processor of claim 1, wherein said third input operand is specified as a scalar.

3. The processor of claim 1, wherein said third input operand is embodied with a mask vector.

4. The processor of claim 1, wherein said first end of the first input vector is a left end and said second end of the second, next input vector is a right end.

5. The processor of claim 1, wherein said first end of the first input vector is a right end and said second end of the second, next input vector is a left end.

6. The processor of claim 1, wherein the execution unit is to execute the decoded second instruction to further preserve the third, next input vector in the third register after execution of the decoded second instruction.

7. The processor of claim 1, wherein the first resultant vector is stored as a resultant of the first instruction in a register that is not the first register and not the second register.

8. A non-transitory machine readable medium containing program code stored therein that when processed by a computing system causes a method to be performed, said method comprising: detecting processing of an array having misaligned data rows; compiling said processing of said array into a software pipelined loop a program code sequence having a first instruction and a second instruction, an instruction format of the first instruction specifying a first register storing a first input vector of the array that is misaligned with respect to memory addressing space, a second register storing a second, next input vector of the array that is misaligned with respect to the memory addressing space, a first resultant vector, and a third input operand, and an instruction format of the second instruction specifying the second register storing the second, next input vector of the array that is misaligned with respect to the memory addressing space, a third register storing a third, next input vector of the array that is misaligned with respect to the memory addressing space, a second resultant vector, and a fourth input operand; decoding said first instruction into a decoded first instruction; decoding said second instruction into a decoded second instruction; executing the decoded first instruction to cause a routing network to route a first contiguous group of elements from a first end of the first input vector to a second end of said first resultant vector, route a second contiguous group of elements from a second end of the second, next input vector to a first end of said first resultant vector, said first end of the first input vector and said second end of the second, next input vector being opposite vector ends, and preserve the second, next input vector in the second register after execution of the decoded first instruction, wherein said first and second contiguous groups of elements of the first input vector and the second, next input vector are defined from said third input operand; and executing the decoded second instruction to cause the routing network to route a first contiguous group of elements from a first end of the second, next input vector to a second end of said second resultant vector, route a second contiguous group of elements from a second end of the third, next input vector to a first end of said second resultant vector, said first end of the second, next input vector and said second end of the third, next input vector being opposite vector ends, wherein said first and second contiguous groups of elements of the second, next input vector and the third, next input vector are defined from said fourth input operand.

9. The non-transitory machine readable medium of claim 8, wherein peeling is not used in formulating said program code sequence.

10. The non-transitory machine readable medium of claim 8, wherein said first resultant vector is an aligned row of said array.

11. The non-transitory machine readable medium of claim 10, wherein said program code sequence includes code to process said aligned row.

12. The non-transitory machine readable medium of claim 10, wherein said first resultant vector includes sections of two different rows of said array.

13. The non-transitory machine readable medium of claim 10, wherein the executing of the decoded second instruction further comprises preserving the third, next input vector in the third register after execution of the decoded second instruction.

14. The non-transitory machine readable medium of claim 13, wherein memory accesses for the rows are not aligned with row boundaries.

15. The non-transitory machine readable medium of claim 10, wherein the first resultant vector is stored as a resultant of the first instruction in a register that is not the first register and not the second register.

16. The non-transitory machine readable medium of claim 15, wherein memory accesses for the rows are not aligned with row boundaries.

17. A computing system comprising: a system memory; a processor coupled to said system memory, said processor comprising an instruction execution pipeline comprising: an instruction fetch stage to fetch a first instruction and a second instruction, an instruction format of the firs
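
The chaining of the first and second instructions in the claims, where the second register is preserved and reused as the first input of the next instruction, can be illustrated with a small software model. This is a sketch under stated assumptions rather than the claimed implementation: memory is a flat Python list, `aligned_load`, `merge_ends`, and `realign_rows` are hypothetical names, every row is exactly one vector wide, and all rows start `offset` elements past an aligned boundary.

```python
def aligned_load(memory, chunk_index, width):
    """Hypothetical aligned load: read one vector starting at a multiple of width."""
    start = chunk_index * width
    return memory[start:start + width]


def merge_ends(a, b, k):
    """Right end of `a` goes to the left of the result, left end of `b` to the right."""
    return a[k:] + b[:k]


def realign_rows(memory, width, offset, row_count):
    """Rebuild `row_count` rows that each begin `offset` elements past an
    aligned boundary, using only aligned loads.

    Each aligned vector is loaded once. The vector loaded in one iteration
    is kept (modeling the preserved register) and becomes the first input
    of the merge in the next iteration.
    """
    rows = []
    prev = aligned_load(memory, 0, width)  # loop prologue: first aligned vector
    for r in range(row_count):
        nxt = aligned_load(memory, r + 1, width)
        rows.append(merge_ends(prev, nxt, offset))
        prev = nxt  # reuse instead of reloading from memory
    return rows


# Rows of 4 elements stored starting at element 1, so each row straddles
# two aligned 4-element vectors.
memory = list(range(12))
print(realign_rows(memory, width=4, offset=1, row_count=2))
# [[1, 2, 3, 4], [5, 6, 7, 8]]
```

In this model no load ever starts at a misaligned position, and no separate scalar iterations are needed at the start of the loop, which is one way to read the reference to peeling in claim 9.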

Assignees

Intel Corp

Inventors

Classifications

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

  • using a mask

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9910670B2 cover?
A processor is described having an instruction execution pipeline. The instruction execution pipeline includes an instruction fetch stage to fetch an instruction. The instruction format of the instruction specifies a first input vector, a second input vector and a third input operand. The instruction execution pipeline comprises an instruction decode stage to decode the instruction. The instruc…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30036. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 6, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).