Apparatus and method for sliding window data access

US9348592B2 · US · B2

Patent metadata
Publication number: US-9348592-B2
Application number: US-201113995793-A
Country: US
Kind code: B2
Filing date: Dec 22, 2011
Priority date: Dec 22, 2011
Publication date: May 24, 2016
Grant date: May 24, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method are described for fetching and storing a plurality of portions of a data stream into a plurality of registers. For example, a method according to one embodiment includes the following operations: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining the system memory addresses for each of the N designated portions of the data stream; fetching the N designated portions of the data stream from the system memory at the system memory addresses; and storing the N designated portions of the data stream into the N vector registers.
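The four operations in the abstract can be modeled in software. What follows is a minimal Python sketch, not the patented hardware: the helper name `sliding_window_load`, the register width of 4 elements, the default slide of one element, and the list standing in for system memory are all assumptions introduced for illustration. The overlap between consecutive windows reflects the "overlapping portions" language of claim 1.

```python
from typing import List, Sequence


def sliding_window_load(memory: Sequence[float], start: int, n: int,
                        width: int = 4, slide: int = 1) -> List[List[float]]:
    """Read n overlapping windows of `width` elements from `memory`.

    Hypothetical helper: positions are element indexes, not byte addresses.
    """
    # Step 1: determine the n destination "registers" (modeled as lists).
    registers: List[List[float]] = [[] for _ in range(n)]
    # Step 2: determine where each designated portion starts.
    starts = [start + i * slide for i in range(n)]
    if n and starts[-1] + width > len(memory):
        raise IndexError("window extends past end of memory")
    # Steps 3 and 4: fetch each portion and store it into its register.
    for reg, s in zip(registers, starts):
        reg.extend(memory[s:s + width])
    return registers


stream = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
regs = sliding_window_load(stream, start=0, n=3)
for i, r in enumerate(regs):
    print(i, r)
```

With a slide of one element, each modeled register holds the previous window shifted by one position, so neighboring registers share three of their four elements.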

First claim

Opening claim text (preview).

I claim:

1. A processor to execute an instruction to perform the operations of: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining system memory addresses for each of the N designated portions of the data stream; fetching the N designated portions of the data stream from the system memory at the system memory addresses, wherein the N designated portions of the data stream include overlapping portions within the data stream; and storing the N designated portions of the data stream into the N vector registers, wherein the instruction is a single instruction.

2. The processor as in claim 1 wherein determining the system memory addresses comprises directly determining a first system memory address from the instruction and calculating the remaining N−1 addresses by adding multiples of a slide value to the first system memory address.

3. The processor as in claim 2 wherein the slide value is set to be equal to a size of a data element of the data stream.

4. The processor as in claim 1 wherein the portions of the data stream comprise data elements of the data stream.

5. The processor as in claim 1 wherein the instruction is specified in the form INSTRUCTION REG1, COUNT, MEMLOCATION, where REG1 comprises a first vector register to store a first portion of a data stream, COUNT comprises the number of portions of the data stream to be fetched from the system memory, and MEMLOCATION comprises the memory location for the first portion of the data stream.

6. The processor as in claim 5 wherein COUNT is set to a value of 16 for 16 portions of the data stream.

7. The processor as in claim 1 wherein each of the N portions of the data stream comprise floating point values and wherein each of the N vector registers comprise floating point registers.

8. The processor as in claim 7 wherein each of the floating point values comprise scalar floating point values.

9. The processor as in claim 7 wherein each of the floating point values comprise double floating point values.

10. The processor as in claim 1 wherein each of the N portions of the data stream comprise integer values.

11. The processor as in claim 10 wherein each of the integer values comprise packed doubleword values.

12. The processor as in claim 10 wherein each of the integer values comprise packed quadword values.

13. A method comprising: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining system memory addresses for each of the N designated portions of the data stream; fetching the N designated portions of the data stream from the system memory at the system memory addresses, wherein the N designated portions of the data stream include overlapping portions within the data stream; and storing the N designated portions of the data stream into the N vector registers, wherein the method is performed through executing a single instruction by a processor.

14. The method as in claim 13 wherein determining the system memory addresses comprises directly determining a first system memory address from the single instruction and calculating the remaining N−1 addresses by adding multiples of a slide value to the first system memory address.

15. The method as in claim 14 wherein the slide value is set to be equal to a size of a data element of the data stream.

16. The method as in claim 13 wherein the portions of the data stream comprise data elements of the data stream.

17. The method as in claim 13 wherein the single instruction is specified in the form INSTRUCTION REG1, COUNT, MEMLOCATION, where REG1 comprises a first vector register to store a first portion of a data stream, COUNT comprises the number of portions of the data stream to be fetched from the system memory, and MEMLOCATION comprises the memory location for the first portion of the data stream.

18. The method as in claim 17 wherein COUNT is set to a value of 16 for 16 portions of the data stream.

19. The method as in claim 13 wherein each of the N portions of the data stream comprise floating point values and wherein each of the N vector registers comprise floating point registers.

20. The method as in claim 19 wherein each of the floating point values comprise scalar floating point values.

21. The method as in claim 19 wherein each of the floating point values comprise double floating point values.

22. The method as in claim 13 wherein each of the N portions of the data stream comprise integer values.

23. The method as in claim 22 wherein each of the integer values comprise packed doubleword values.

24. The method as in claim 22 wherein each of the integer values comprise packed quadword values.

25. A computer system comprising: a memory for storing program instructions and data; a processor to execute a single program instruction to perform the operations of: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining system memory addresses for each of the N designated portions of the data stream; fetching the N designated portions of the data stream from the system memory at the system memory addresses, wherein the N designated portions of the data stream include overlapping portions within the data stream; and storing the N designated portions of the data stream into the N vector registers.

26. The system as in claim 25 further comprising: a display adapter to render graphics images in response to execution of the single program instruction by the processor.

27. The system as in claim 26 further comprising: a user input interface to receive control signals from a user input device, the processor executing the single program instruction in response to the control signals.

28. A processor to execute an instruction comprising: means for determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; means for determining system memory addresses for each of the N designated portions of the data stream; means for fetching the N designated portions of the data stream from the system memory at the system memory addresses, wherein the N designated portions of the data stream include overlapping portions within the data stream; and means for storing the N designated portions of the data stream into the N vector registers, wherein the instruction is a single instruction.

29. The processor as in claim 28 wherein means for determining the system memory addresses comprises directly determining a first system memory address from the instruction and calculating the remaining N−1 addresses by adding multiples of a slide value to the first system memory address.

30. The processor as in claim 29 wherein the slide value is set to be equal to a size of a data element of the data stream.
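The address rule in claims 2 and 3 amounts to address(i) = MEMLOCATION + i × slide for i = 0 … N−1, with the slide equal to the element size. The Python sketch below models that rule together with the INSTRUCTION REG1, COUNT, MEMLOCATION form of claim 5. Several details are assumptions made only for this illustration and are not stated in the claims: the function names `window_addresses` and `execute`, the 4-byte single-precision element, the 16-byte register width, and the choice of consecutive destination registers REG1 through REG1+COUNT−1.

```python
import struct
from typing import Dict, List

ELEM_SIZE = 4    # bytes per single-precision element (assumed)
VEC_BYTES = 16   # assumed 128-bit vector register


def window_addresses(mem_location: int, count: int,
                     slide: int = ELEM_SIZE) -> List[int]:
    """First address comes from the operand; the rest add multiples of slide."""
    return [mem_location + i * slide for i in range(count)]


def execute(reg1: int, count: int, mem_location: int,
            memory: bytes) -> Dict[int, bytes]:
    """Hypothetical model of INSTRUCTION REG1, COUNT, MEMLOCATION.

    Assumes the destinations are consecutive registers starting at REG1.
    """
    regfile: Dict[int, bytes] = {}
    for i, addr in enumerate(window_addresses(mem_location, count)):
        chunk = memory[addr:addr + VEC_BYTES]
        if len(chunk) != VEC_BYTES:
            raise IndexError("read past end of memory")
        regfile[reg1 + i] = chunk
    return regfile


memory = struct.pack("<8f", *range(8))   # eight floats: 0.0 .. 7.0
regfile = execute(reg1=0, count=4, mem_location=0, memory=memory)
addrs = window_addresses(0, 4)
decoded = {r: struct.unpack("<4f", b) for r, b in regfile.items()}
print(addrs)        # byte addresses of the four windows
print(decoded[3])   # contents of the last modeled register
```

Because the slide (4 bytes) is smaller than the register width (16 bytes), consecutive reads overlap by 12 bytes, which is the overlapping-portions condition recited in the independent claims.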

Assignees

Inventors

Classifications

  • with variable precision · CPC title

  • Adapting program code to run in a different environment; Porting · CPC title

  • according to one or more bits in the instruction, e.g. prefix, sub-opcode · CPC title

  • Bit or string instructions · CPC title

  • Operand prefetching (cache prefetching G06F12/0862) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9348592B2 cover?
An apparatus and method are described for fetching and storing a plurality of portions of a data stream into a plurality of registers. For example, a method according to one embodiment includes the following operations: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining the system memory addresses for each of th…
Who is the assignee on this patent?
Jha Ashish, Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30014. Mapped technology areas include Physics.
When was this patent published?
Publication date May 24, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).