Instruction set for eliminating misaligned memory accesses during processing of an array having misaligned data rows

US2016011870A1 · US · A1

Patent metadata
Publication number: US-2016011870-A1
Application number: US-201414327534-A
Country: US
Kind code: A1
Filing date: Jul 9, 2014
Priority date: Jul 9, 2014
Publication date: Jan 14, 2016
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processor is described having an instruction execution pipeline. The instruction execution pipeline includes an instruction fetch stage to fetch an instruction. The instruction format of the instruction specifies a first input vector, a second input vector and a third input operand. The instruction execution pipeline comprises an instruction decode stage to decode the instruction. The instruction execution pipeline includes a functional unit to execute the instruction. The functional unit includes a routing network to route a first contiguous group of elements from a first end of one of the input vectors to a second end of the instruction's resultant vector, and, route a second contiguous group of elements from a second end of the other of the input vectors to a first end of the instruction's resultant vector. The first and second ends are opposite vector ends. The first and second groups of contiguous elements are defined from the third input operand. The instruction is not capable of routing non-contiguous groups of elements from the input vectors to the instruction's resultant vector. A software pipeline that uses the instruction is also described.
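
The routing described in the abstract can be modeled in a few lines of software. The sketch below is an illustrative model only, not the patented hardware. It assumes the third operand is a scalar count `k`, that vectors are Python lists of equal length, and that "first end" means the left end; the function name `align_merge` is hypothetical and does not appear in the patent.

```python
def align_merge(a, b, k):
    """Software model of the two-source contiguous routing (assumed semantics).

    The rightmost len(a) - k elements of `a` are routed to the left end of
    the result, and the leftmost k elements of `b` are routed to the right
    end of the result. Only contiguous groups can be selected.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("input vectors must have the same length")
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and the vector length")
    # a[k:] is the contiguous group taken from the right end of `a`;
    # b[:k] is the contiguous group taken from the left end of `b`.
    return a[k:] + b[:k]
```

Under these assumptions, calling `align_merge` on the vectors `[0..7]` and `[8..15]` with `k = 3` yields `[3, 4, 5, 6, 7, 8, 9, 10]`. Because a single split point defines both groups, the model cannot express a non-contiguous selection, which mirrors the restriction stated in the abstract.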

First claim

Opening claim text (preview).

1. A processor, comprising: an instruction execution pipeline comprising: an instruction fetch stage to fetch an instruction, the instruction format of the instruction specifying a first input vector, a second input vector and a third input operand; an instruction decode stage to decode said instruction; a functional unit to execute the instruction, the functional unit including a routing network to route a first contiguous group of elements from a first end of one of said input vectors to a second end of said instruction's resultant vector, and, route a second contiguous group of elements from a second end of the other of said input vectors to a first end of said instruction's resultant vector, said first and second ends being opposite vector ends, wherein, said first and second groups of contiguous elements are defined from said third input operand, said instruction not capable of routing non-contiguous groups of elements from said input vectors to said instruction's resultant vector.

2. The processor of claim 1 wherein said third input operand is specified as a scalar.

3. The processor of claim 1 wherein said third input operand is embodied with a mask vector.

4. The processor of claim 1 wherein said first end is a left end and said second end is a right end.

5. The processor of claim 1 wherein said first end is a right end and said second end is a left end.

6. The processor of claim 1 wherein said instruction execution pipeline comprises a second functional unit to execute a second instruction, said second functional unit including a routing network to route a first contiguous group of elements from the second end of a first input vector to the first end of said instruction's resultant vector, and, route a second contiguous group of elements from the first end of a second input vector to the second end of the second instruction's resultant vector, wherein, said first and second groups of contiguous elements are defined from a third input operand of said second instruction.

7. The processor of claim 6 wherein the first and second functional units are the same functional unit.

8. A machine readable medium containing program code stored therein that when processed by a computing system causes a method to be performed by a compiler, said method comprising: detecting processing of an array having misaligned data rows; compiling said processing of said array into a software pipelined loop program code sequence having an instruction whose instruction format specifies a first input vector, a second input vector and a third input operand, said instruction to route a first contiguous group of elements from a first end of one of said input vectors to a second end of said instruction's resultant vector, and, route a second contiguous group of elements from a second end of the other of said input vectors to a first end of said instruction's resultant vector, said first and second ends being opposite vector ends, wherein, said first and second groups of contiguous elements are defined from said third input operand, and wherein, said instruction is not capable of routing non-contiguous groups of elements from said input vectors to said instruction's resultant vector.

9. The machine readable medium of claim 8 wherein peeling is not used in formulating said program code sequence.

10. The machine readable medium of claim 8 wherein said instruction's resultant vector is an aligned row of said array.

11. The machine readable medium of claim 10 wherein said program code sequence includes code to process said aligned row.

12. The machine readable medium of claim 10 wherein said instruction's resultant vector includes sections of two different rows of said array.

13. The machine readable medium of claim 10 wherein the software pipelined loop on a per cycle basis accepts as an input a next vector having a leading section of a third row and trailing section of second row in an array and wherein the instruction accepts the next vector and a previous cycle's input vector as said first and second input vectors, the previous cycle's vector having a leading section of said second vector and trailing section of a first row in the array.

14. The machine readable medium of claim 13 wherein memory accesses for the rows are not aligned with row boundaries.

15. The machine readable medium of claim 10 wherein the software pipelined loop on a per cycle basis writes a next output vector having a trailing section of a first row and a leading section of a second row, wherein the instruction accepts as said first and second input vectors results of processing performed on the first and second rows.

16. The machine readable medium of claim 15 wherein memory accesses for the rows are not aligned with row boundaries.

17. A computing system, comprising: a system memory; a processor coupled to said system memory, said processor comprising an instruction execution pipeline comprising: an instruction fetch stage to fetch an instruction, the instruction format of the instruction specifying a first input vector, a second input vector and a third input operand; an instruction decode stage to decode said instruction; a functional unit to execute the instruction, the functional unit including a routing network to route a first contiguous group of elements from a first end of one of said input vectors to a second end of said instruction's resultant vector, and, route a second contiguous group of elements from a second end of the other of said input vectors to a first end of said instruction's resultant vector, said first and second ends being opposite vector ends, wherein, said first and second groups of contiguous elements are defined from said third input operand, said instruction not capable of routing non-contiguous groups of elements from said input vectors to said instruction's resultant vector.

18. The computing system of claim 17 wherein said third input operand is specified as a scalar.

19. The computing system of claim 17 wherein said third input operand is embodied with a mask vector.

20. The computing system of claim 17 wherein said first end is a left end and said second end is a right end.

21. The computing system of claim 17 wherein said first end is a right end and said second end is a left end.

22. The computing system of claim 17 wherein said system memory contains compiled code to process an array having misaligned data rows wherein aligned accesses are made to said system memory to process said array's data.
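
One plausible reading of the software pipelined loop in claims 8 and 13 can be sketched as follows. This is a simplified model under stated assumptions, not the compiler output the patent describes: memory is a flat Python list, each row is exactly one vector wide, rows begin `offset` elements past an aligned boundary, and the names `align_merge`, `aligned_load` and `read_rows` are hypothetical.

```python
def align_merge(a, b, k):
    # Assumed instruction model: right end of `a` goes to the left end of
    # the result, left end of `b` goes to the right end of the result.
    return a[k:] + b[:k]


def aligned_load(mem, chunk_index, width):
    # Models an aligned memory access: always starts on a multiple of `width`.
    start = chunk_index * width
    return mem[start:start + width]


def read_rows(mem, offset, width, num_rows):
    """Recover misaligned rows using only aligned loads.

    `mem` must hold at least (num_rows + 1) * width elements so that the
    final aligned load stays in bounds.
    """
    rows = []
    # Prologue: the first aligned chunk holds the leading part of row 0.
    prev = aligned_load(mem, 0, width)
    for i in range(num_rows):
        # Each iteration performs one aligned load; the chunk holds the
        # trailing part of row i and the leading part of row i + 1.
        nxt = aligned_load(mem, i + 1, width)
        # Combine the previous and next chunks into one complete row.
        rows.append(align_merge(prev, nxt, offset))
        prev = nxt
    return rows
```

In this model every iteration issues exactly one aligned load and one merge, and no separate peeled iterations are needed to handle the misaligned start, which is consistent with claim 9. With a width of 4, an offset of 3, and memory laid out as three padding elements, the values 0 through 11, and one trailing padding element, `read_rows` returns the three rows `[0, 1, 2, 3]`, `[4, 5, 6, 7]` and `[8, 9, 10, 11]`.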

Assignees

Intel Corp

Inventors

Classifications

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • Instruction analysis, e.g. decoding, instruction word fields · CPC title

  • Instruction prefetching · CPC title

  • using a mask · CPC title

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30036. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 14, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).