Instruction and logic for cache-based speculative vectorization

US9690582B2 · US · B2

Patent metadata
Publication number: US-9690582-B2
Application number: US-201314143576-A
Country: US
Kind code: B2
Filing date: Dec 30, 2013
Priority date: Dec 30, 2013
Publication date: Jun 27, 2017
Grant date: Jun 27, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processor includes a decoder to decode an instruction, a scheduler to schedule the instruction, and an execution unit to execute the instruction. The instruction is to load a memory operation applicable to a quantity of addresses into an execution vector. The execution vector includes a plurality of vector positions for respective addresses. The instruction is further to evaluate, for a given address in the execution vector at a vector position, whether a cache indicates that a previous memory operation was performed at a higher vector position than the vector position of the given address. The instruction is also to determine, based on the evaluation whether the cache indicates that the previous memory operation was performed at a higher vector position than the vector position of the given address, whether the memory operation will cause a memory error.
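One way to make the abstract's test precise is to write it as a condition over per-position bits. This is a sketch under an assumed encoding; the symbols VL (vector length), a_i (the address in vector position i), and M_k(a) (the cache bit recording that a previous memory operation to address a was performed at position k) are introduced here for illustration and do not appear in the patent text. A potential memory error is flagged when some lane i finds a bit set at a strictly higher position k > i; if the bits for an address are packed into an integer M(a) with bit k standing for position k, that is the same as shifting out positions 0..i and testing for a nonzero remainder:

```latex
\begin{aligned}
\text{error} &\iff \exists\, i \in \{0,\dots,VL-1\},\ \exists\, k \in \{i+1,\dots,VL-1\} : M_k(a_i) = 1 \\
             &\iff \exists\, i : \left\lfloor M(a_i) / 2^{\,i+1} \right\rfloor \neq 0,
             \qquad M(a) = \sum_{k=0}^{VL-1} M_k(a)\, 2^{k}
\end{aligned}
```

Worked example with VL = 4: if only position 3 is marked for an address, M = 2^3 = 8. When that address appears in lane 0, floor(8 / 2^1) = 4, which is nonzero, so lane 0 is flagged. When the same address appears in lane 3, floor(8 / 2^4) = 0, so lane 3 by itself is not flagged.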

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising, within a processor: decoding a first instruction, the first instruction to: load a memory operation applicable to a quantity of addresses into an execution vector, the execution vector including a plurality of vector positions for respective addresses, the addresses arising from a plurality of different cache lines of a cache; evaluate, for each address in the execution vector and during execution of the first instruction, each address being at a respective vector position, whether the cache indicates that a previous memory operation to the address was performed at a higher vector position than the vector position of the address in the execution vector; and determine, based on an evaluation result indicating whether, for at least one address in the execution vector, the cache indicates that a previous memory operation to the address was performed at a higher vector position than the vector position of the address in the execution vector, whether the memory operation will cause a memory error; scheduling the first instruction; and executing the first instruction.

2. The method of claim 1, wherein the first instruction is further to determine a plurality of vector positions indicated in a given cache line of the cache that are less than or equal to half of a vector length of the processor.

3. The method of claim 1, further comprising: decoding a second instruction, the second instruction to set an indication of performance of the memory operation on each of the addresses in the execution vector in the cache; scheduling the second instruction; and executing the second instruction.

4. The method of claim 3, wherein, for each of the addresses in the execution vector, a position of the indication of performance of the memory operation on the address matches the vector position of the memory operation in the execution vector for the address.

5. The method of claim 1, wherein the first instruction is further to: evaluate a plurality of bits in the cache corresponding to vector positions of the processor; and associate a given bit with previous application of the memory operation to an address at a specified vector position in the execution vector.

6. The method of claim 5, wherein, for at least one of the addresses in the execution vector, the first instruction is further to: identify an entry in the cache with a data value corresponding to the address; identify, for the address at the vector position in the execution vector, a leftmost bit indicating that the memory operation was applied to the address; evaluate whether the leftmost bit is to the right of the vector position of the address; and determine, based on the evaluation of whether the leftmost bit is to the right of the vector position of the addresses, whether the memory operation applied to the address will cause a memory error.

7. A processor, comprising: a decoder including circuitry to decode a first instruction, the first instruction to: load a memory operation applicable to a quantity of addresses into an execution vector, the execution vector including a plurality of vector positions for respective addresses, the addresses arising from a plurality of different cache lines of a cache; evaluate, for each address in the execution vector and during execution of the first instruction, each address being at a respective vector position, whether the cache indicates that a previous memory operation to the address was performed at a higher vector position than the vector position of the address in the execution vector; and determine, based on an evaluation result indicating whether, for at least one address in the execution vector, the cache indicates that a previous memory operation to the address was performed at a higher vector position than the vector position of the address in the execution vector, whether the memory operation will cause a memory error; a scheduler including circuitry to schedule the first instruction; and an execution unit including circuitry to execute the first instruction.

8. The processor of claim 7, wherein the first instruction is further to determine a plurality of vector positions indicated in a given cache line of the cache that are less than or equal to half of a vector length of the processor.

9. The processor of claim 7, wherein: the decoder is further to decode a second instruction, the second instruction to set an indication of performance of the memory operation on each of the addresses in the execution vector in the cache; the scheduler is further to schedule the second instruction; and the execution unit is further to execute the second instruction.

10. The processor of claim 9, wherein, for each of the addresses in the execution vector, a position of the indication of performance of the memory operation on the address matches the vector position of the memory operation in the execution vector for the address.

11. The processor of claim 7, wherein the first instruction is further to: evaluate a plurality of bits in the cache corresponding to vector positions of the processor; and associate a given bit with previous application of the memory operation to an address at a specified vector position in the execution vector.

12. The processor of claim 11, wherein, for at least one of the addresses in the execution vector, the first instruction is further to: identify an entry in the cache with a data value corresponding to the address; identify, for the address at the vector position in the execution vector, a leftmost bit indicating that the memory operation was applied to the address; evaluate whether the leftmost bit is to the right of the vector position of the address; and determine, based on the evaluation of whether the leftmost bit is to the right of the vector position of the addresses, whether the memory operation applied to the address will cause a memory error.

13. The processor of claim 7, wherein: the decoder is further to decode a second instruction, the second instruction to clear identifiers of the cache; the scheduler is further to schedule the second instruction; and the execution unit is further to execute the second instruction.

14. A system, comprising: a memory; and a processor communicatively coupled to the memory and including: a decoder including circuitry to decode a first instruction, the first instruction to: load a memory operation applicable to a quantity of addresses into an execution vector, the execution vector including a plurality of vector positions for respective addresses, the addresses arising from a plurality of different cache lines of a cache; evaluate, for each address in the execution vector and during execution of the first instruction, each address being at a respective vector position, whether the cache indicates that a previous memory operation to the address was performed at a higher vector position than the vector position of the address in the execution vector; and determine, based on an evaluation result indicating whether, for at least one address in the execution vector, the cache indicates that a previous memory operation to the address was performed at a higher vector position than the vector position of the address in the execution vector, whether the memory operation will cause a memory error; a scheduler including circuitry to schedule the first instruction; and an execution unit including circuitry to execute the first instruction.

15. The system of claim 14, wherein the first instruction is further to determine a plurality of vector positions indicate…
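Read together, the claims describe a mark/check/clear cycle: a second instruction sets, for each address, an indication whose position matches the address's vector position (claims 3, 4, 9, and 10); the first instruction checks for indications at higher positions (claims 1, 7, and 14); and another second instruction clears the cache's identifiers (claim 13). The toy model below is a sketch under stated assumptions, not the patented hardware. A Python dict stands in for the cache, an integer bitmask per address stands in for the per-position bits, and the class and method names (`PositionMarkedCache`, `mark`, `check`, `clear`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PositionMarkedCache:
    """Hypothetical software model of per-position bits kept with cached addresses.

    Assumed encoding: bit k of marks[address] set means the tracked memory
    operation was applied to that address from vector position k.
    """

    vector_length: int = 8                                # assumed vector length
    marks: dict[int, int] = field(default_factory=dict)  # address -> position bitmask

    def mark(self, addresses: list[int]) -> None:
        """Analogue of the 'set indication' instruction (claims 3-4, 9-10):
        the bit set for each address matches that address's vector position."""
        if len(addresses) > self.vector_length:
            raise ValueError("more addresses than vector positions")
        for pos, addr in enumerate(addresses):
            self.marks[addr] = self.marks.get(addr, 0) | (1 << pos)

    def check(self, addresses: list[int]) -> list[int]:
        """Analogue of the first instruction (claims 1, 7, 14): return the vector
        positions whose address was already marked at a strictly higher position."""
        return [pos for pos, addr in enumerate(addresses)
                if self.marks.get(addr, 0) >> (pos + 1)]

    def clear(self) -> None:
        """Analogue of the clear instruction (claim 13): drop all position marks."""
        self.marks.clear()


cache = PositionMarkedCache(vector_length=4)
cache.mark([0x100, 0x108, 0x110, 0x118])          # lanes 0-3 each mark one address
print(cache.check([0x118, 0x108, 0x200, 0x100]))  # lane 0 hits the address lane 3 marked: [0]
cache.clear()
print(cache.check([0x118, 0x108, 0x200, 0x100]))  # nothing marked after clearing: []
```

Returning the offending positions rather than a single flag is a design choice made here for readability; the claims only require determining whether the memory operation will cause a memory error, which in this model corresponds to `bool(cache.check(addresses))`.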

Classifications

  • Maintaining memory consistency · CPC title

  • Operand accessing · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9690582B2 cover?
It covers a processor instruction that loads a memory operation for several addresses into an execution vector and, for each address, checks whether the cache indicates that a previous memory operation to that address was performed at a higher vector position than the address's own position. The result of that check determines whether the memory operation will cause a memory error. The claims also cover the decoder, scheduler, and execution unit that decode, schedule, and execute the instruction.
Who are the inventors on this patent?
Vasudevan Nalini, Wu Youfeng, Wang Cheng, and 3 more
What technology area does this patent fall under?
Primary CPC classification G06F9/30043. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 27, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).