Method and apparatus to increase the speed of the load access and data return speed path using early lower address bits

US9891915B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9891915-B2
Application number  US-201414281663-A
Country             US
Kind code           B2
Filing date         May 19, 2014
Priority date       Mar 15, 2013
Publication date    Feb 13, 2018
Grant date          Feb 13, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A microprocessor implemented method for resolving dependencies for a load instruction in a load store queue (LSQ) is disclosed. The method comprises initiating a computation of a virtual address corresponding to the load instruction in a first clock cycle. It also comprises transmitting early calculated lower address bits of the virtual address to a load store queue (LSQ) in the same cycle as the initiating. Finally, it comprises performing a partial match in the LSQ responsive to and using the lower address bits to find a prior aliasing store, wherein the prior aliasing store stores to a same address as the load instruction.
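
The partial match described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the 12-bit width of the early lower address bits, the `StoreEntry` fields, and the function names are all hypothetical. It relies on one arithmetic fact: the low k bits of a sum depend only on the low k bits of the operands, so they are available before the upper carry chain resolves.

```python
from dataclasses import dataclass

LOW_BITS = 12                    # assumption: width of the early lower address bits
LOW_MASK = (1 << LOW_BITS) - 1


@dataclass
class StoreEntry:
    """Hypothetical LSQ store entry; a smaller age means older in program order."""
    age: int
    address: int  # full virtual address of the store
    data: int


def early_low_bits(base: int, offset: int) -> int:
    """Low bits of base + offset, computed from the low bits of the operands only."""
    return ((base & LOW_MASK) + (offset & LOW_MASK)) & LOW_MASK


def partial_match(lsq, low_bits, load_age):
    """Older stores whose lower address bits equal the load's lower bits."""
    return [e for e in lsq
            if e.age < load_age and (e.address & LOW_MASK) == low_bits]


lsq = [
    StoreEntry(age=1, address=0x7FFF2004, data=11),  # true alias of the load
    StoreEntry(age=2, address=0x10000004, data=22),  # same low bits, different upper bits
    StoreEntry(age=3, address=0x7FFF2008, data=33),  # different low bits
]
low = early_low_bits(0x7FFF1234, 0x0DD0)
print([e.age for e in partial_match(lsq, low, load_age=4)])  # prints [1, 2]
```

The second entry shows why the match is only partial: equal lower bits do not prove that the full addresses are equal, so the candidates still have to be checked against the complete virtual address.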

First claim

Opening claim text (preview).

What is claimed is:

1. A microprocessor implemented method for resolving dependencies for a load instruction in a load store queue (LSQ), said method comprising: initiating a computation of a virtual address corresponding to said load instruction in a first clock cycle; transmitting early calculated lower address bits of said virtual address to a load store queue (LSQ) in said first clock cycle, wherein said early calculated lower address bits are computed earlier and faster than upper bits of said virtual address to allow for earlier access times in a pipeline of said microprocessor; performing a partial match in said LSQ responsive to and using said lower address bits to find a prior aliasing store, wherein said prior aliasing store stores to a same address as said load instruction; and responsive to a determination that there is a partial match in said LSQ, storing a set of partially matched entries from said LSQ in a memory and performing a look-up on said set of partially matched entries in a second clock cycle.

2. The method of claim 1 further comprising: performing a prediction that said load instruction has a prior aliasing store in said LSQ; and responsive to a determination that there is no partial match in said LSQ and no prediction is available, retrieving data corresponding to said load instruction from a data cache memory in said second clock cycle.

3. The method of claim 2, wherein said prediction is based on prior instances of store-to-load forwarding for said load instruction.

4. The method of claim 2, further comprising: responsive to said determination that there is a partial match in said LSQ, waiting for said virtual address to fully compute to perform said look-up on said set of partially matched entries in said second clock cycle.

5. The method of claim 1, further comprising: routing said lower address bits to said LSQ using a higher metal route relative to other bits of said virtual address.

6. The method of claim 1, wherein said performing and said initiating are performed in a same cycle.

7. A processor unit configured to perform operations for resolving dependencies for a load instruction in a load store queue (LSQ), said operations comprising: initiating a computation of a virtual address corresponding to said load instruction in a first clock cycle; transmitting early calculated lower address bits of said virtual address to a load store queue (LSQ) in said first clock cycle, wherein said early calculated lower address bits are computed earlier and faster than upper bits of said virtual address to allow for earlier access times in a pipeline of said processor; performing a partial match in said LSQ responsive to and using said lower address bits to find a prior aliasing store, wherein said prior aliasing store stores to a same address as said load instruction; and responsive to a determination that there is a partial match in said LSQ, storing a set of partially matched entries from said LSQ in a memory and performing a look-up on said set of partially matched entries in a second clock cycle.

8. The processor unit of claim 7, wherein said operations further comprise: performing a prediction that said load instruction has a prior aliasing store in said LSQ; and responsive to a determination that there is no partial match in said LSQ and no prediction is available, retrieving data corresponding to said load instruction from a data cache memory in said second clock cycle.

9. The processor unit of claim 8, wherein said prediction is based on prior instances of store-to-load forwarding for said load instruction.

10. The processor unit of claim 8, wherein said operations further comprise: responsive to said determination that there is a partial match in said LSQ, waiting for said virtual address to fully compute to perform said look-up on said set of partially matched entries in said second clock cycle.

11. The processor unit of claim 7, wherein said operations further comprise: routing said lower address bits to said LSQ using a higher metal route relative to other bits of said virtual address.

12. The processor unit of claim 7, wherein said performing and said initiating are performed in a same cycle.

13. An apparatus configured to resolve dependencies for a load instruction in a load store queue (LSQ), said apparatus comprising: a memory; a processor communicatively coupled to said memory, wherein said processor is configured to process instructions out of order, and further wherein said processor is configured to perform operations comprising: initiating a computation of a virtual address corresponding to said load instruction in a first clock cycle; transmitting early calculated lower address bits of said virtual address to a load store queue (LSQ) in said first clock cycle, wherein said early calculated lower address bits are computed earlier and faster than upper bits of said virtual address to allow for earlier access times in a pipeline of said processor; performing a partial match in said LSQ responsive to and using said lower address bits to find a prior aliasing store, wherein said prior aliasing store stores to a same address as said load instruction; and responsive to a determination that there is a partial match in said LSQ, storing a set of partially matched entries from said LSQ in a storage and performing a look-up on said set of partially matched entries in a second clock cycle.

14. The apparatus of claim 13, wherein said operations further comprise: performing a prediction that said load instruction has a prior aliasing store in said LSQ; and responsive to a determination that there is no partial match in said LSQ and no prediction is available, retrieving data corresponding to said load instruction from a data cache memory in said second clock cycle.

15. The apparatus of claim 14, wherein said prediction is based on prior instances of store-to-load forwarding for said load instruction.

16. The apparatus of claim 14, wherein said operations further comprise: responsive to said determination that there is a partial match in said LSQ, waiting for said virtual address to fully compute to perform said look-up on said set of partially matched entries in said second clock cycle.

17. The apparatus of claim 13, wherein said operations further comprise: routing said lower address bits to said LSQ using a higher metal route relative to other bits of said virtual address.

18. The apparatus of claim 13, wherein said performing and said initiating are performed in a same cycle.
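
The two-cycle flow recited in claims 1, 2 and 4 can be walked through with a rough software model. This is a sketch under stated assumptions rather than the claimed circuit: the names `Store`, `cycle1`, `cycle2`, the 12-bit low mask, the dictionary used as a data cache, and the choice to forward from the youngest fully matching older store are all hypothetical. The claims also do not say what happens when a prediction exists but nothing partially matches, so the sketch simply falls back to the cache in that case.

```python
from dataclasses import dataclass

LOW_MASK = (1 << 12) - 1  # assumption: 12 early lower address bits


@dataclass
class Store:
    """Hypothetical LSQ store entry; a smaller age means older in program order."""
    age: int
    address: int
    data: int


def cycle1(lsq, base, offset, load_age):
    """First cycle: partial match on the early lower bits, saving the candidates."""
    low = ((base & LOW_MASK) + (offset & LOW_MASK)) & LOW_MASK
    return [s for s in lsq if s.age < load_age and (s.address & LOW_MASK) == low]


def cycle2(candidates, full_address, predicted_alias, dcache):
    """Second cycle: look up the saved candidates, or read the data cache."""
    if not candidates and not predicted_alias:
        # Claim 2: no partial match and no prediction, so read the data cache.
        return ("dcache", dcache.get(full_address, 0))
    # Claim 4: the full virtual address is now known, so compare it exactly.
    matches = [s for s in candidates if s.address == full_address]
    if matches:
        # Assumption: forward from the youngest older store that fully matches.
        return ("forward", max(matches, key=lambda s: s.age).data)
    # Assumption: a partial match that fails the full compare uses the cache.
    return ("dcache", dcache.get(full_address, 0))


lsq = [Store(1, 0x7FFF2004, 11), Store(2, 0x10000004, 22)]
dcache = {0x7FFF2004: 99, 0x1010: 55}

saved = cycle1(lsq, base=0x7FFF1234, offset=0x0DD0, load_age=3)
print(cycle2(saved, 0x7FFF1234 + 0x0DD0, False, dcache))  # prints ('forward', 11)

saved = cycle1(lsq, base=0x1000, offset=0x10, load_age=3)
print(cycle2(saved, 0x1000 + 0x10, False, dcache))        # prints ('dcache', 55)
```

Splitting the work this way keeps the expensive full-address comparison off the first cycle: only the entries that survive the cheap lower-bit filter are examined once the complete virtual address is available.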

Assignees

Intel Corp

Inventors

Classifications

  • LOAD or STORE instructions; Clear instruction · CPC title

  • the data cache being concurrently physically addressed · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9891915B2 cover?
A microprocessor implemented method for resolving dependencies for a load instruction in a load store queue (LSQ) is disclosed. The method comprises initiating a computation of a virtual address corresponding to the load instruction in a first clock cycle. It also comprises transmitting early calculated lower address bits of the virtual address to a load store queue (LSQ) in the same cycle as t…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30043. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 13, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).