Processing memory access instructions that have duplicate memory indices

US9842046B2 · US · B2

Patent metadata
Field · Value
Publication number · US-9842046-B2
Application number · US-201213631378-A
Country · US
Kind code · B2
Filing date · Sep 28, 2012
Priority date · Sep 28, 2012
Publication date · Dec 12, 2017
Grant date · Dec 12, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of an aspect includes receiving an instruction indicating a first source packed memory indices, a second source packed data operation mask, and a destination storage location. Memory indices of the packed memory indices are compared with one another. One or more sets of duplicate memory indices are identified. Data corresponding to each set of duplicate memory indices is loaded only once. The loaded data corresponding to each set of duplicate memory indices is replicated for each of the duplicate memory indices in the set. A packed data result is stored in the destination storage location in response to the instruction. The packed data result includes data elements from memory locations that are indicated by corresponding memory indices of the packed memory indices when not blocked by corresponding elements of the packed data operation mask.
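
The steps in the abstract can be pictured with a small software model. The Python sketch below is illustrative only and is not the patented hardware: the names `gather_dedup`, `memory`, and `dest` are hypothetical, lane 0 is taken to be the first element, and masked-off lanes are assumed to keep their prior destination value, which the abstract does not specify.

```python
from typing import List, Sequence, Tuple


def gather_dedup(
    memory: Sequence[int],
    indices: Sequence[int],
    mask: Sequence[int],
    dest: Sequence[int],
) -> Tuple[List[int], int]:
    """Masked gather that loads each set of duplicate indices only once.

    Returns the packed result and the number of loads performed.
    """
    result = list(dest)  # assumption: blocked lanes keep the old destination value
    done = [False] * len(indices)
    loads = 0
    for i, idx in enumerate(indices):
        if done[i] or not mask[i]:
            continue  # lane already filled by replication, or blocked by the mask
        value = memory[idx]  # the single load for this set of duplicates
        loads += 1
        # Replicate the loaded value to every unmasked lane with the same index.
        for j in range(i, len(indices)):
            if indices[j] == idx and mask[j]:
                result[j] = value
                done[j] = True
    return result, loads
```

With indices `[3, 1, 3, 3, 5, 1, 0, 3]` and mask `[1, 1, 1, 0, 1, 1, 0, 1]`, six lanes are active but the sketch performs only three loads, one per distinct unmasked index.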

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving an instruction, the instruction indicating a first source packed memory indices, indicating a second source packed data operation mask, and indicating a destination storage location; comparing memory indices of the first source packed memory indices with one another for equality; identifying one or more sets of duplicate identical memory indices; loading data corresponding to each set of duplicate memory indices only once; replicating the loaded data corresponding to each set of duplicate memory indices for each of the duplicate memory indices in the set that is not blocked by a corresponding element of the second source packed data operation mask; and storing a packed data result in the destination storage location in response to the instruction, the packed data result including data elements from memory locations that are indicated by corresponding memory indices of the packed memory indices when not blocked by corresponding elements of the second source packed data operation mask.

2. The method of claim 1, wherein comparing the memory indices comprises comparing the memory indices prior to generating memory addresses from the memory indices.

3. The method of claim 1, further comprising: generating a broadcast mask indicating a set of duplicate memory indices; and replicating data loaded for the set of duplicate memory indices for each of the duplicate memory indices in the set using the generated broadcast mask.

4. The method of claim 3, further comprising logically associating a load for the data with the generated broadcast mask.

5. The method of claim 1, further comprising: selecting a memory index of the packed memory indices; comparing the selected memory index to all more significant memory indices of the packed memory indices; and generating a broadcast mask to indicate at least one of the more significant memory indices that is a duplicate of the selected memory index.

6. The method of claim 1, wherein comparing comprises: comparing a first memory index to a plurality other memory indices before loading data for the first memory index; and comparing a second memory index to a plurality of other memory indices after loading the data for the first memory index.

7. The method of claim 1, wherein comparing comprises comparing each of the packed memory indices to one or more other memory indices before loading data for any of the packed memory indices.

8. The method of claim 1, wherein receiving comprises receiving an instruction indicating a first source packed memory indices that is at least 512-bits wide and that has 64-bit memory indices.

9. An apparatus comprising: a plurality of packed data registers; a decoder to decode an instruction, the instruction to indicate a first source packed memory indices; and execution logic coupled with the decoder and coupled with the packed data registers, the execution logic comprising: comparison logic including at least some circuitry to compare memory indices of the first source packed memory indices with one another for equality and to identify a set of duplicate equal memory indices; broadcast mask generation logic coupled with the comparison logic and including at least some circuitry, the broadcast mask generation logic to generate a broadcast mask indicating that is to indicate the set of duplicate memory indices; load logic including at least some circuitry to load data for the set of duplicate memory indices only once; and broadcast logic including at least some circuitry to broadcast the data loaded for the set of duplicate memory indices to each of the duplicate memory indices in the set using the generated broadcast mask.

10. The apparatus of claim 9, wherein the comparison logic is to compare the memory indices before memory addresses are generated from the memory indices.

11. The apparatus of claim 9, wherein the comparison logic is to: compare a first memory index to a plurality other memory indices before data is loaded for the first memory index; and compare a second memory index to a plurality of other memory indices after data is loaded for the first memory index.

12. The apparatus of claim 9, wherein the comparison logic is to batch compare each of the packed memory indices to one of all more significant memory indices and all less significant memory indices.

13. The apparatus of claim 9, wherein the comparison logic is to compare each of the packed memory indices to one or more other memory indices before data is loaded for any of the packed memory indices.

14. The apparatus of claim 9, wherein the comparison logic is also used to implement a vector conflict instruction.

15. The apparatus of claim 9, wherein the first packed memory indices is at least 512-bits wide and has 64-bit memory indices.

16. A system comprising: an interconnect; a processor coupled with the interconnect, the processor to receive an instruction, the instruction to indicate a first source packed memory indices, the processor comprising: comparison logic including at least some circuitry to compare the memory indices of the first source packed memory indices with one another for equality and to identify a set of duplicate equal memory indices; broadcast mask generation logic coupled with the comparison logic and including at least some circuitry, the broadcast mask generation logic to generate a broadcast mask that is to indicate the set of duplicate memory indices; load logic including at least some circuitry to load data for the set of duplicate memory indices only once; and broadcast logic including at least some circuitry to broadcast the data loaded for the set of duplicate memory indices to each of the duplicate memory indices in the set using the generated broadcast mask; and a dynamic random access memory (DRAM) coupled with the interconnect.

17. The system of claim 16, wherein the comparison logic is to: compare a first memory index to a plurality other memory indices before data is loaded for the first memory index; and compare a second memory index to a plurality of other memory indices after data is loaded for the first memory index.

18. The system of claim 16, wherein the comparison logic is to compare each of the packed memory indices to one or more other memory indices before data is loaded for any of the packed memory indices.

19. An article of manufacture comprising: a tangible non-transitory machine-readable storage medium storing a gather instruction, the gather instruction to indicate a packed memory indices, the gather instruction if executed by a machine operable to cause the machine to perform operations comprising to: compare the packed memory indices with one another for equality; identify a set of duplicate identical memory indices; generate a broadcast mask that is to indicate the set of duplicate memory indices; load data for the set of duplicate memory indices only once; and broadcast the data loaded for the set of duplicate memory indices to each of the duplicate memory indices in the set using the generated broadcast mask.

20. The article of manufacture of claim 19, wherein the machine-readable storage medium comprises a random access memory of the machine.

21. The article of manufacture of claim 19, wherein comparing the packed memory indices is performed in batch prior to loading data for any of the packed memory indices.

22. An apparatus comprising: a plurality of packed data regi
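
The broadcast mask of claims 3 and 5 can be modeled in software as a bit mask over lanes. The Python sketch below is a hedged illustration, not the claimed circuitry: `broadcast_mask` and `batch_broadcast_masks` are hypothetical names, lane 0 is assumed to be the least significant index, and the mask is assumed to mark only the more significant duplicates, not the selected lane itself.

```python
from typing import List, Sequence


def broadcast_mask(indices: Sequence[int], selected: int) -> int:
    """Compare the selected index to all more significant indices.

    Bit j of the returned mask is set when lane j (j > selected) holds
    the same memory index as lane `selected`.
    """
    mask = 0
    for j in range(selected + 1, len(indices)):
        if indices[j] == indices[selected]:
            mask |= 1 << j
    return mask


def batch_broadcast_masks(indices: Sequence[int]) -> List[int]:
    """Batch variant: compute every lane's mask before any load is issued."""
    return [broadcast_mask(indices, i) for i in range(len(indices))]
```

For the eight indices `[3, 1, 3, 3, 5, 1, 0, 3]` (eight 64-bit indices would fill a 512-bit source, as in claim 8), `broadcast_mask(indices, 0)` is `0b10001100`: lanes 2, 3, and 7 repeat the index held in lane 0. Calling `broadcast_mask` once per load corresponds loosely to the interleaved ordering of claim 6, while `batch_broadcast_masks` corresponds to comparing everything up front as in claim 7.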

Assignees

Intel Corp

Inventors

Classifications

  • using de-duplication of the data · CPC title

  • De-duplication techniques · CPC title

  • Saving storage space on storage systems · CPC title

  • G06F9/3838 · Primary

    Dependency mechanisms, e.g. register scoreboarding · CPC title

  • Arrangements for executing machine instructions, e.g. instruction decode (for executing microinstructions G06F9/22) · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9842046B2 cover?
A method of an aspect includes receiving an instruction indicating a first source packed memory indices, a second source packed data operation mask, and a destination storage location. Memory indices of the packed memory indices are compared with one another. One or more sets of duplicate memory indices are identified. Data corresponding to each set of duplicate memory indices is loaded only on…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3838. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 12, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).