Vector indexed memory access plus arithmetic and/or logical operation processors, methods, systems, and instructions

US9552205B2 · US · B2

Patent metadata
Publication number: US-9552205-B2
Application number: US-201314040409-A
Country: US
Kind code: B2
Filing date: Sep 27, 2013
Priority date: Sep 27, 2013
Publication date: Jan 24, 2017
Grant date: Jan 24, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processor including a decode unit to receive a vector indexed load plus arithmetic and/or logical (A/L) operation plus store instruction. The instruction is to indicate a source packed memory indices operand that is to have a plurality of packed memory indices. The instruction is also to indicate a source packed data operand that is to have a plurality of packed data elements. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the instruction, is to load a plurality of data elements from memory locations corresponding to the plurality of packed memory indices, perform A/L operations on the plurality of packed data elements of the source packed data operand and the loaded plurality of data elements, and store a plurality of result data elements in the memory locations corresponding to the plurality of packed memory indices.
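The behavior described in the abstract can be modeled in software as a loop over the packed indices. The sketch below is an illustration only, not the patented hardware or any real instruction. The function name `gather_op_scatter`, the Python list standing in for memory, the default choice of addition as the A/L operation, and the operand order `op(loaded, value)` are all assumptions made for this example.

```python
import operator
from typing import Callable, List, Sequence


def gather_op_scatter(
    memory: List[int],
    indices: Sequence[int],
    data: Sequence[int],
    op: Callable[[int, int], int] = operator.add,
) -> None:
    """Hypothetical software model of load + A/L operation + store on indexed memory.

    For each lane i: memory[indices[i]] = op(memory[indices[i]], data[i]).
    Lanes are processed in order, so repeated indices accumulate here; how
    real hardware treats duplicate indices is not stated in the text above.
    """
    if len(indices) != len(data):
        raise ValueError("indices and data must have the same number of lanes")
    for idx, value in zip(indices, data):
        loaded = memory[idx]        # load the element selected by this index
        result = op(loaded, value)  # arithmetic/logical operation (assumed operand order)
        memory[idx] = result        # store the result back to the same location


memory = [10, 20, 30, 40, 50, 60, 70, 80]
gather_op_scatter(memory, indices=[6, 1, 3], data=[1, 2, 3])
print(memory)  # [10, 22, 30, 43, 50, 60, 71, 80]
```

In this model a single call replaces what would otherwise be three separate steps (an indexed load, an element-wise operation, and an indexed store), which is the combination the abstract attributes to one instruction.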

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a decode unit to decode a vector indexed load plus arithmetic/logical (A/L) operation plus store instruction that has a single opcode, the vector indexed load plus A/L operation plus store instruction to indicate a source packed memory indices operand that is to have a plurality of packed memory indices, and to indicate a source packed data operand that is to have a plurality of packed data elements; and an execution unit coupled with the decode unit, the execution unit, in response to the vector indexed load plus A/L operation plus store instruction, to load a plurality of data elements from memory locations corresponding to the plurality of packed memory indices, to perform A/L operations on the plurality of packed data elements of the source packed data operand and the loaded plurality of data elements, and to store a plurality of result data elements in the memory locations corresponding to the plurality of packed memory indices.

2. The processor of claim 1, wherein the instruction comprises a gather plus A/L operation plus scatter instruction.

3. The processor of claim 1, wherein the A/L operations comprise at least one of packed addition operations, packed subtraction operations, packed multiplication operations, packed division operations, packed multiply-add operations, packed shift operations, packed rotate operations, packed logical AND operations, packed logical OR operations, packed logical NOT operations, and packed logical AND NOT operations.

4. The processor of claim 1, wherein the A/L operations comprise at least one of packed addition operations and packed multiplication operations.

5. The processor of claim 1, wherein the processor is to perform the vector indexed load plus A/L operation plus store instruction without transferring the loaded data elements into a core.

6. The processor of claim 1, wherein the execution unit is in an uncore portion of the processor within a memory subsystem.

7. The processor of claim 6, wherein the decode unit is within a core, and wherein the execution unit is closer to a last level cache than to the core having the decode unit.

8. The processor of claim 1, wherein a portion of the execution unit that is to perform the A/L operations is to receive the loaded data elements from one of a last level of cache and a next to last level of cache.

9. The processor of claim 1, wherein the decode unit is to decode the vector indexed load plus A/L operation plus store instruction which is to be a masked vector indexed load plus A/L operation plus store instruction that is to indicate a source packed data operation mask operand.

10. The processor of claim 1, wherein the decode unit is to decode the vector indexed load plus A/L operation plus store instruction that is to indicate the source packed data operand that is at least 512-bits wide.

11. A method in a processor comprising: receiving a vector indexed load plus arithmetic/logical (A/L) operation plus store instruction having a single opcode, the vector indexed load plus A/L operation plus store instruction indicating a source packed memory indices operand having a plurality of packed memory indices, and indicating a source packed data operand having a plurality of packed data elements; and performing the vector indexed load plus A/L operation plus store instruction including: loading a plurality of data elements from memory locations corresponding to the plurality of packed memory indices; performing A/L operations on the plurality of packed data elements of the source packed data operand and the loaded plurality of data elements; and storing a plurality of result data elements in the memory locations corresponding to the plurality of packed memory indices.

12. The method of claim 11, wherein receiving comprises receiving a gather plus A/L operation plus scatter instruction.

13. The method of claim 11, wherein performing the A/L operations comprises performing at least one of packed addition operations, packed subtraction operations, packed multiplication operations, packed division operations, packed multiply-add operations, packed shift operations, packed rotate operations, packed logical AND operations, packed logical OR operations, packed logical NOT operations, and packed logical AND NOT operations.

14. The method of claim 11, wherein performing the A/L operations comprises performing at least one of packed addition operations and packed multiplication operations.

15. The method of claim 11, wherein performing the vector indexed load plus A/L operation plus store instruction completes without transferring the loaded data elements into a core.

16. The method of claim 11, wherein performing the A/L operations is performed by a unit in an uncore portion of the processor within a memory subsystem, and wherein the unit is closer to a last level cache than to a core into which the instruction was received.

17. The method of claim 11, wherein receiving comprises receiving a masked vector indexed load plus A/L operation plus store instruction that indicates a source packed data operation mask operand.

18. The method of claim 11, wherein receiving comprises receiving the instruction indicating the source packed data operand that is at least 512-bits wide.

19. A system to process instructions comprising: an interconnect; a dynamic random access memory (DRAM) coupled with the interconnect; and a processor coupled with the interconnect, the processor to receive a vector indexed load plus arithmetic/logical (A/L) operation plus store instruction having a single opcode and that is to indicate a source packed memory indices operand that is to have a plurality of packed memory indices, and that is to indicate a source packed data operand that is to have a plurality of packed data elements, the processor operable, in response to the vector indexed load plus A/L operation plus store instruction to load a plurality of data elements from memory locations in the DRAM corresponding to the plurality of packed memory indices, to perform A/L operations on the plurality of packed data elements of the source packed data operand and the loaded plurality of data elements, and to store a plurality of result data elements in destination storage locations.

20. The system of claim 19, wherein the destination storage locations comprise the memory locations corresponding to the plurality of packed memory indices in the DRAM.

21. The system of claim 19, wherein the instruction comprises a gather plus A/L operation plus scatter instruction.

22. An article of manufacture comprising a non-transitory machine-readable storage medium, the non-transitory machine-readable storage medium storing instructions including a vector indexed load plus arithmetic/logical (A/L) operation plus store instruction having a single opcode, the vector indexed load plus A/L operation plus store instruction to indicate a source packed memory indices operand that is to have a plurality of packed memory indices, and to indicate a source packed data operand that is to have a plurality of packed data elements, the vector indexed load plus A/L operation plus store instruction if executed by a machine is to be operable to cause the machine to perform operations comprising: load a plurality of data elements from memory locations corresponding to the plurality of packed memory indices; perform A/L operations on the plurality of packed data elements of the source packed data operand and
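Claims 9 and 17 add a source packed data operation mask operand. One plausible reading, sketched below, is that each mask element enables or disables the load, operation, and store for its lane. This interpretation is an assumption, as are the function name `masked_gather_op_scatter`, the boolean list used as the mask, and the list used as memory; the claim text shown here does not spell out the mask semantics.

```python
from typing import Callable, List, Sequence


def masked_gather_op_scatter(
    memory: List[int],
    indices: Sequence[int],
    data: Sequence[int],
    mask: Sequence[bool],
    op: Callable[[int, int], int],
) -> None:
    """Hypothetical model of a masked load + A/L operation + store.

    Lanes whose mask element is False are skipped entirely (assumed
    behavior): their memory locations are neither read nor written.
    """
    if not (len(indices) == len(data) == len(mask)):
        raise ValueError("indices, data, and mask must have the same length")
    for idx, value, enabled in zip(indices, data, mask):
        if not enabled:
            continue  # masked-off lane: leave memory[idx] untouched
        memory[idx] = op(memory[idx], value)


memory = [1, 2, 3, 4]
masked_gather_op_scatter(
    memory,
    indices=[0, 1, 2, 3],
    data=[10, 10, 10, 10],
    mask=[True, False, True, False],
    op=lambda loaded, value: loaded * value,  # packed multiplication, as in claim 4
)
print(memory)  # [10, 2, 30, 4]
```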

Assignees

Intel Corp

Inventors

Classifications

  • G06F9/30 · Primary

    Arrangements for executing machine instructions, e.g. instruction decode · CPC title

  • Vector or matrix data · CPC title

  • single instruction multiple data [SIMD] multiprocessors · CPC title

  • Prefetch instructions; cache control instructions · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9552205B2 cover?
A processor including a decode unit to receive a vector indexed load plus arithmetic and/or logical (A/L) operation plus store instruction. The instruction is to indicate a source packed memory indices operand that is to have a plurality of packed memory indices. The instruction is also to indicate a source packed data operand that is to have a plurality of packed data elements. The processor a…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 24, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).